Time Series Forecasting with Prophet

predicting the future with machine learning

Experiments
Time Series Analysis
Machine Learning
Published

April 20, 2024

Forecasting future trends is a common application in time series analysis. In this experiment, we will use Meta’s Prophet library to predict trends for births in Malaysia, based on available public data. Prophet is a forecasting tool developed by Meta that is available in Python and R. It is designed for analyzing time series data with daily observations that display patterns on different time scales.

Prophet handles missing data, shifts in the trend, and large outliers in a robust manner. It provides a straightforward way to include the effects of holidays and seasonality in the forecast. It decomposes time series data into trend, seasonality, and holiday effects, making it easy to understand the impact of these components on the forecast.
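
The decomposition described above is usually written as an additive model. The form below is a sketch in the notation commonly used for Prophet; the symbols are introduced here for illustration and are not used elsewhere in this article.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Here $g(t)$ is the trend, $s(t)$ the periodic (seasonal) component, $h(t)$ the holiday effect, and $\varepsilon_t$ the remaining error that the model does not explain.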

In this experiment, we will incorporate additional regressors, such as temperature and pollutant levels, to see how these factors influence birth rates. The approach allows us to account for external variables that might affect the trend and seasonality of births.
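
As a rough sketch of what this involves: Prophet expects an input frame with a `ds` column (dates) and a `y` column (the value to forecast), with each extra regressor as an additional column. The toy values and the helper name `to_prophet_frame` below are hypothetical and only illustrate the layout; the actual Prophet calls are shown as comments because they need the `prophet` package installed.

```python
import pandas as pd

# Toy stand-in for a merged births/weather table (values are made up).
toy = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
        "births": [1400, 1450, 1380],
        "temperature": [27.1, 27.6, 26.9],
        "pollutant_value": [41.0, 44.5, 39.8],
    }
)


def to_prophet_frame(df, regressors):
    """Rename columns to the ds/y layout Prophet expects, keeping the regressors."""
    frame = df.rename(columns={"date": "ds", "births": "y"})
    return frame[["ds", "y", *regressors]]


frame = to_prophet_frame(toy, ["temperature", "pollutant_value"])
print(frame.columns.tolist())

# With the prophet package installed, fitting could then look roughly like:
#   from prophet import Prophet
#   model = Prophet()
#   model.add_regressor("temperature")
#   model.add_regressor("pollutant_value")
#   model.fit(frame)
```

Note that any regressor used for fitting must also be supplied for the future dates being predicted, so its future values have to be known or forecast separately.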

Load the datasets

We will use two public Kaggle datasets: one containing weather data for Malaysia, and the other containing the daily number of births.

# Download https://www.kaggle.com/datasets/shahmirvarqha/weather-data-malaysia?select=full-weather.csv using the Kaggle API

!kaggle datasets download -p .data/ shahmirvarqha/weather-data-malaysia --unzip
Dataset URL: https://www.kaggle.com/datasets/shahmirvarqha/weather-data-malaysia
License(s): Attribution 4.0 International (CC BY 4.0)
Downloading weather-data-malaysia.zip to .data
100%|████████████████████████████████████████| 311M/311M [00:41<00:00, 7.94MB/s]
# Download https://www.kaggle.com/datasets/jylim21/malaysia-public-data

!kaggle datasets download -p .data/ jylim21/malaysia-public-data --unzip
Dataset URL: https://www.kaggle.com/datasets/jylim21/malaysia-public-data
License(s): Community Data License Agreement - Permissive - Version 1.0
Downloading malaysia-public-data.zip to .data
100%|█████████████████████████████████████████| 156k/156k [00:00<00:00, 648kB/s]
# Disable all warnings

import warnings

warnings.filterwarnings("ignore")

Let’s load these as dataframes and inspect the first few rows of each dataset.

# Load full_weather.csv and births.csv

import pandas as pd

weather = pd.read_csv(".data/full_weather.csv")
births = pd.read_csv(".data/births.csv")

Preprocessing the data

We will need to adjust the available data to fit our purposes, including filling in gaps and merging the data we are interested in.

# Display the first 5 rows of each dataframe

weather.head().style.background_gradient(cmap="Greens")
  datetime place city state temperature pressure dew_point humidity wind_speed gust wind_chill uv_index feels_like_temperature visibility solar_radiation pollutant_value precipitation_rate precipitation_total
0 1996-08-09 13:30:00 Tanjung Aru Kota Kinabalu Sabah 32.000000 1006.160000 25.000000 66.000000 9.000000 nan 32.000000 nan 39.000000 9.000000 nan nan nan nan
1 1996-08-09 13:30:00 Batu Maung Bayan Lepas Pulau Pinang 25.000000 1008.640000 24.000000 94.000000 4.000000 nan 25.000000 nan 25.000000 6.000000 nan nan nan nan
2 1996-08-09 13:30:00 Sepang Sepang Kuala Lumpur 29.000000 1006.970000 23.000000 70.000000 2.000000 nan 29.000000 nan 32.000000 9.000000 nan nan nan nan
3 1996-08-09 13:30:00 Kota Sentosa Kuching Sarawak 33.000000 1003.780000 24.000000 59.000000 nan nan 33.000000 nan 39.000000 9.000000 nan nan nan nan
4 1996-08-09 14:30:00 Sepang Sepang Kuala Lumpur nan nan nan nan 7.000000 nan nan nan nan 9.000000 nan nan nan nan
births.head().style.background_gradient(cmap="Greens")
  date state births
0 1920-01-01 Malaysia 96
1 1920-01-02 Malaysia 115
2 1920-01-03 Malaysia 111
3 1920-01-04 Malaysia 101
4 1920-01-05 Malaysia 95

The date columns in both datasets ('datetime' in the weather data, 'date' in the births data) are stored as strings, so we convert them to Pandas datetime objects.

# Convert weather['datetime'] and births['date'] from strings to datetime

weather["datetime"] = pd.to_datetime(weather["datetime"])
births["date"] = pd.to_datetime(births["date"])
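
When converting, it can be worth checking that the result really has a datetime dtype and that no values failed to parse. The sketch below uses a small made-up frame (the bad value is deliberate) and `errors="coerce"`, which turns unparseable strings into `NaT` instead of raising; whether the real datasets contain such values is not known here.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

# Toy column with one deliberately malformed entry.
toy = pd.DataFrame(
    {"datetime": ["1996-08-09 13:30:00", "1996-08-09 14:30:00", "not a date"]}
)

# errors="coerce" converts unparseable strings to NaT rather than raising.
parsed = pd.to_datetime(toy["datetime"], errors="coerce")

print(is_datetime64_any_dtype(parsed))  # dtype check
print(parsed.isna().sum())              # number of values that failed to parse
```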

The weather dataset contains multiple measurements in a single day, which we will need to aggregate to daily values. Also, different measurements are available for different locations - we will average these to get a single value for the whole country, as births are recorded at the national level.

# Average all features in the weather dataframe by day

# Reduce both timestamps to plain calendar dates so they share a daily key
weather["date"] = weather["datetime"].dt.date
births["date"] = births["date"].dt.date

# Drop the columns 'place', 'city', 'state', and 'datetime'
weather.drop(columns=["place", "city", "state", "datetime"], inplace=True)

# Group by date and calculate the mean
daily_average = weather.groupby("date").mean().reset_index()

# Replace the original DataFrame with the new one
weather = daily_average
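
One design choice worth noting: taking the mean over all rows for a day gives more weight to stations that report more often. A possible alternative, sketched below on made-up data, is a two-stage average: first per station per day, then across stations. This is an illustration of the trade-off, not the method used in this experiment.

```python
import pandas as pd

# Toy data: station A reports three times, station B once, on the same day.
toy = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2020-01-01 00:00",
                "2020-01-01 06:00",
                "2020-01-01 12:00",
                "2020-01-01 12:00",
            ]
        ),
        "place": ["A", "A", "A", "B"],
        "temperature": [30.0, 30.0, 30.0, 26.0],
    }
)
toy["date"] = toy["datetime"].dt.date

# Single-stage mean over all rows: station A dominates the result.
pooled = toy.groupby("date")["temperature"].mean()

# Two-stage mean: average within each station first, then across stations.
per_station = toy.groupby(["date", "place"])["temperature"].mean()
balanced = per_station.groupby(level="date").mean()

print(pooled.iloc[0], balanced.iloc[0])
```

In this toy case the pooled mean is pulled towards station A, while the two-stage mean treats both stations equally.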

Let’s look at summary statistics for each column of the aggregated weather dataset.

weather.describe().drop("count").style.background_gradient(cmap="Greens")
  temperature pressure dew_point humidity wind_speed gust wind_chill uv_index feels_like_temperature visibility solar_radiation pollutant_value precipitation_rate precipitation_total
mean 27.543296 1008.099890 23.834273 81.327786 6.376730 26.282709 27.471683 0.687915 30.504578 8.582930 146.499316 42.956782 5.215477 11.674262
std 0.889741 2.010950 1.144247 5.867251 2.388884 19.768379 0.835277 0.891434 1.526218 0.458520 38.194387 10.978799 81.753558 109.725370
min 23.444444 997.887076 13.897368 15.484936 1.163329 0.220339 23.361953 0.000000 23.777778 2.786753 0.000000 15.125438 0.000000 0.000000
25% 26.934070 1006.932785 23.573865 79.582090 5.683849 3.380416 26.909586 0.000000 29.448357 8.538924 124.749915 36.769231 0.031584 0.390435
50% 27.457458 1008.033583 24.001509 82.568477 6.919926 33.250000 27.404160 0.000000 30.364568 8.681920 149.442652 41.880000 0.212534 2.106173
75% 28.073267 1009.330966 24.411568 84.750354 7.739110 40.400000 27.971838 1.635242 31.464080 8.786272 170.528152 48.311878 0.543907 5.624009
max 33.000000 1016.209393 26.504087 95.250709 38.743243 593.000000 33.000000 2.972896 40.000000 9.000000 419.813559 136.627451 2539.750000 2539.750000

We have two separate datasets, one for weather and one for births. Let us merge these on the date column.

# Merge the two dataframes on the 'date' column (plain date objects after the .dt.date step above).
# The default inner join keeps only dates that appear in both dataframes.

data = pd.merge(births, weather, on="date")
data.drop(columns=["state"], inplace=True)
data.head().style.background_gradient(cmap="Greens")
  date births temperature pressure dew_point humidity wind_speed gust wind_chill uv_index feels_like_temperature visibility solar_radiation pollutant_value precipitation_rate precipitation_total
0 1996-08-09 1520 29.714286 1005.877143 24.000000 72.571429 5.857143 nan 29.714286 nan 33.571429 8.625000 nan nan nan nan
1 1996-08-17 1539 25.407407 1007.800690 23.769231 90.615385 7.695652 nan 25.423077 0.000000 26.615385 8.096774 nan nan nan nan
2 1996-08-18 1423 26.035714 1007.954800 23.464286 86.500000 6.775000 nan 25.700000 0.000000 27.660714 8.372881 nan nan nan nan
3 1996-09-11 1756 25.709677 1005.271724 23.709677 88.967742 5.631579 nan 25.709677 0.000000 27.032258 8.612903 nan nan nan nan
4 1996-09-12 1638 32.333333 1001.980000 23.833333 62.000000 9.833333 nan 32.333333 nan 37.833333 9.000000 nan nan nan nan

Let us also fill in any missing values with the mean of each column. This is a simple approach: it ignores when the gap occurs, so every missing day receives the same value. In practice, more sophisticated methods of filling in missing data would need to be considered, but for the purposes of this experiment it will suffice.

Show the code
# Fill in missing values for each numerical column with the mean of that column

mean_values = data.select_dtypes(include="number").mean()
data.fillna(mean_values, inplace=True)
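As a point of comparison, here is a minimal sketch of one alternative to mean imputation: time-weighted interpolation in pandas. The toy series below is hypothetical and is not taken from the weather dataset; it only illustrates how the two approaches fill the same gaps differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy series with irregular dates and two gaps (not the article's data)
dates = pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-05"])
toy = pd.DataFrame({"temperature": [10.0, np.nan, 16.0, np.nan]}, index=dates)

# Mean imputation: every gap receives the same column-wide average
mean_filled = toy.fillna(toy.mean())

# Time-weighted interpolation: gaps are filled from neighbouring observations;
# ffill/bfill cover any gaps left at the edges of the series
time_filled = toy.interpolate(method="time").ffill().bfill()

print(mean_filled["temperature"].tolist())
print(time_filled["temperature"].tolist())
```

Interpolation keeps filled values close to their neighbours in time, which may matter for slowly varying quantities such as temperature. Whether it is a better choice here would depend on how long the gaps in the real data are.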
Show the code
data.head().style.background_gradient(cmap="Greens")
  date births temperature pressure dew_point humidity wind_speed gust wind_chill uv_index feels_like_temperature visibility solar_radiation pollutant_value precipitation_rate precipitation_total
0 1996-08-09 1520 29.714286 1005.877143 24.000000 72.571429 5.857143 27.890241 29.714286 0.651795 33.571429 8.625000 145.609684 42.495870 5.762057 12.349303
1 1996-08-17 1539 25.407407 1007.800690 23.769231 90.615385 7.695652 27.890241 25.423077 0.000000 26.615385 8.096774 145.609684 42.495870 5.762057 12.349303
2 1996-08-18 1423 26.035714 1007.954800 23.464286 86.500000 6.775000 27.890241 25.700000 0.000000 27.660714 8.372881 145.609684 42.495870 5.762057 12.349303
3 1996-09-11 1756 25.709677 1005.271724 23.709677 88.967742 5.631579 27.890241 25.709677 0.000000 27.032258 8.612903 145.609684 42.495870 5.762057 12.349303
4 1996-09-12 1638 32.333333 1001.980000 23.833333 62.000000 9.833333 27.890241 32.333333 0.651795 37.833333 9.000000 145.609684 42.495870 5.762057 12.349303
Show the code
data.describe().drop("count").style.background_gradient(cmap="Greens")
  births temperature pressure dew_point humidity wind_speed gust wind_chill uv_index feels_like_temperature visibility solar_radiation pollutant_value precipitation_rate precipitation_total
mean 1399.317195 27.527325 1007.996160 23.783828 81.213123 6.569408 27.890241 27.451720 0.651795 30.441393 8.582386 145.609684 42.495870 5.762057 12.349303
std 151.534289 0.879650 1.962270 1.131681 5.905187 2.211017 15.385685 0.804632 0.871842 1.454741 0.455176 20.630437 9.242458 53.834741 72.250576
min 697.000000 23.444444 997.887076 13.897368 15.484936 1.163329 0.220339 23.361953 0.000000 23.777778 2.786753 0.000000 15.125438 0.000000 0.000000
25% 1299.000000 26.930111 1006.886622 23.554108 79.513711 6.182207 27.890241 26.924528 0.000000 29.463124 8.549533 145.609684 38.520319 0.356371 3.218137
50% 1411.000000 27.449486 1007.970255 23.967937 82.460700 6.919526 27.890241 27.435050 0.000000 30.406473 8.672687 145.609684 42.495870 5.762057 12.349303
75% 1509.000000 28.041932 1009.161633 24.359104 84.654464 7.738506 37.000000 27.917808 1.611262 31.328596 8.783403 145.609684 44.772431 5.762057 12.349303
max 2200.000000 33.000000 1016.209393 26.255048 95.250709 38.743243 593.000000 33.000000 2.972896 40.000000 9.000000 419.813559 136.627451 2539.750000 2539.750000

Visualising a few features

Let’s visualise the data to get a better understanding of the trends and seasonality, and to develop an intuition of what we are trying to forecast. We will focus on births, temperature, and pollutant levels.

Show the code
import matplotlib.pyplot as plt

fig, axs = plt.subplots(3, 1, figsize=(8, 9))

axs[0].plot(data["date"], data["births"])
axs[0].set_title("Daily Births")
axs[0].set_xlabel("Date")
axs[0].set_ylabel("Births")

axs[1].plot(data["date"], data["temperature"])
axs[1].set_title("Temperature")
axs[1].set_xlabel("Date")
axs[1].set_ylabel("Temperature")

axs[2].plot(data["date"], data["pollutant_value"])
axs[2].set_title("Pollution")
axs[2].set_xlabel("Date")
axs[2].set_ylabel("Pollution")

plt.tight_layout()
plt.show()

Building the model

We can now build a Prophet model that forecasts five years into the future. We will set Prophet’s change point prior scale explicitly, since it controls how flexible the trend is; note that the value used below, 0.05, is also Prophet’s default. First, we will forecast temperature and pollutant levels, and then we will forecast the number of births using these two features as regressors.

About the Change Point Prior Scale

The change point prior scale parameter controls the flexibility of the model. A higher value makes the model more flexible, allowing it to capture more fluctuations in the data. However, this can lead to overfitting, so it is important to tune carefully.

Prophet requires the input data to have two columns: ds and y. The ds column contains dates, and the y column the values we want to forecast - in our case temperature, pollutant levels, and births.

Show the code
from prophet import Prophet

future_period = 365 * 5
prior_scale = 0.05

# Prepare the data for Prophet
df_temperature = data[["date", "temperature"]].rename(
    columns={"date": "ds", "temperature": "y"}
)
df_pollutant = data[["date", "pollutant_value"]].rename(
    columns={"date": "ds", "pollutant_value": "y"}
)

# Initialize the Prophet model
model_temperature = Prophet(changepoint_prior_scale=prior_scale)
model_pollutant = Prophet(changepoint_prior_scale=prior_scale)

# Fit the model
model_temperature.fit(df_temperature)

# Make a dataframe to hold future predictions
future_temperature = model_temperature.make_future_dataframe(periods=future_period)
forecast_temperature = model_temperature.predict(future_temperature)

model_pollutant.fit(df_pollutant)
future_pollutant = model_pollutant.make_future_dataframe(periods=future_period)
forecast_pollutant = model_pollutant.predict(future_pollutant)

Prophet includes inbuilt methods to easily visualise the forecasted values, as well as uncertainty intervals. We will also include change points in the forecast plot, which are the points where the slope of the trend changes.

About Change Points

Prophet uses a piecewise linear model to capture the trend in the data. Change points are the points where the slope of the trend changes (the trend may speed up, slow down, or reverse), and they are automatically selected by the model. You can also manually specify individual change points if you have domain knowledge about the data.
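To make the idea of a piecewise linear trend concrete, here is a minimal NumPy sketch. It is not Prophet’s implementation: the function name, the single change point at t = 5, and the slope values are all hypothetical choices for illustration.

```python
import numpy as np


def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Trend with base slope k, offset m, and slope change deltas[j] at changepoints[j]."""
    t = np.asarray(t, dtype=float)
    s = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    # active[i, j] is 1 when time t[i] is at or past change point j
    active = (t[:, None] >= s[None, :]).astype(float)
    slope = k + active @ d
    # Offset correction keeps the line continuous at each change point
    offset = m + active @ (-s * d)
    return slope * t + offset


# Hypothetical example: slope 1.0 until t = 5, then the slope drops by 0.5
t = np.arange(0.0, 10.0)
trend = piecewise_linear_trend(t, k=1.0, m=0.0, changepoints=[5.0], deltas=[-0.5])
print(trend)
```

In this example the trend keeps rising after the change point, only more slowly. A larger change point prior scale lets the fitted slope changes be larger in magnitude, which is what makes the trend more flexible.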

Show the code
from prophet.plot import add_changepoints_to_plot
import matplotlib.pyplot as plt

# Create a figure with a 2-row, 1-column grid
fig, axs = plt.subplots(2, 1, figsize=(8, 9))

# Plot the temperature forecast on the first subplot
fig1 = model_temperature.plot(forecast_temperature, ax=axs[0], include_legend=True)
axs[0].set_title("Temperature Forecast with Changepoints")
add_changepoints_to_plot(axs[0], model_temperature, forecast_temperature)

# Plot the pollutant forecast on the second subplot
fig2 = model_pollutant.plot(forecast_pollutant, ax=axs[1], include_legend=True)
axs[1].set_title("Pollutant Forecast with Changepoints")
add_changepoints_to_plot(axs[1], model_pollutant, forecast_pollutant)

plt.tight_layout()
plt.show()

In addition, we can plot the components of the forecast, including the trend and seasonalities (and holidays, when they have been added to the model). This helps us understand how each component contributes to the forecast. Notice how, in the yearly seasonality plot, the model captures the peaks in temperature and pollutant levels during certain months.

Show the code
# Visualise the components of each forecast

fig3 = model_temperature.plot_components(forecast_temperature, figsize=(8, 6))
_ = fig3.suptitle("Temperature Forecast Components", fontsize=14)
fig4 = model_pollutant.plot_components(forecast_pollutant, figsize=(8, 6))
_ = fig4.suptitle("Pollutant Forecast Components", fontsize=14)

Predicting births

We now want to forecast the number of future births. In addition we want to use temperature and pollutant levels as regressors in the model. Let us build a new Prophet model that includes these regressors.

About Regressors

Including additional regressors lets the model account for external factors that might influence the trend and seasonality of the data. This can improve the accuracy of the forecast, especially if these factors have a significant impact on the target variable. In this case, we are including temperature and pollutant levels as regressors to illustrate how to use this feature in Prophet. In practice, these might not be the most relevant factors for predicting births.
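The sketch below illustrates how an additive regressor enters a model: the target is a trend plus a coefficient times the regressor, standardised here to zero mean and unit variance. It uses ordinary least squares on synthetic data as a stand-in, not Prophet’s own fitting procedure, and all names and values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
t = np.linspace(0.0, 1.0, n)
x = rng.normal(loc=27.5, scale=0.9, size=n)  # hypothetical regressor values

# Standardise the regressor to zero mean and unit variance
z = (x - x.mean()) / x.std()

# Synthetic target: linear trend plus a known regressor effect (beta = 3)
y = 100.0 + 20.0 * t + 3.0 * z

# Ordinary least squares on the columns [intercept, trend, regressor]
design = np.column_stack([np.ones(n), t, z])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope, beta = coef
print(round(float(beta), 3))
```

Because the synthetic target contains no noise, the fit recovers the regressor effect that was built in. With real data the estimated coefficient would be uncertain, and a coefficient near zero would suggest the regressor adds little.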

Show the code
df_births = data[["date", "births"]].rename(columns={"date": "ds", "births": "y"})

# Add temperature and pollutant values to the dataframe
df_births["temperature"] = data["temperature"]
df_births["pollutant_value"] = data["pollutant_value"]

Prophet also allows us to include factors such as holidays in the model. We can include public holidays in Malaysia, which will help the model to account for the impact of holidays on the number of births. We could also include other seasonalities or events that might affect the number of births, such as cultural or religious events. We are also including the temperature and pollutant levels as regressors in the model, as these might impact birth rates.

Other Regressors and Seasonalities

As an exercise, can you think of other regressors or seasonalities that might influence the number of births? For example, you could include certain economic indicators, social factors, or other external variables that might affect birth rates.

Show the code
model_births = Prophet(changepoint_prior_scale=prior_scale)

# Add Malaysian holidays to the model
model_births.add_country_holidays(country_name="MY")

# Add a monthly seasonality to the model
model_births.add_seasonality(name="monthly", period=30.5, fourier_order=5)

# Add the temperature and pollutant value to the model as regressors
model_births.add_regressor("temperature")
model_births.add_regressor("pollutant_value")

# Fit the model
model_births.fit(df_births)

# Make a dataframe to hold future predictions
future_births = model_births.make_future_dataframe(periods=future_period)

# Add forecasted temperature and pollutant values to future_births
future_births["temperature"] = forecast_temperature["yhat"][
    -len(future_births) :
].reset_index(drop=True)
future_births["pollutant_value"] = forecast_pollutant["yhat"][
    -len(future_births) :
].reset_index(drop=True)

# Predict the future
forecast_births = model_births.predict(future_births)
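Prophet represents each seasonality with a Fourier series, so the monthly seasonality above (period 30.5, Fourier order 5) corresponds to five sine and cosine pairs. The following sketch builds such features with NumPy. The function name is hypothetical, and the column ordering is an assumption rather than a statement of how Prophet arranges them internally.

```python
import numpy as np


def fourier_features(t_days, period, order):
    """Return an (n, 2 * order) matrix of sin/cos terms for the given period."""
    t = np.asarray(t_days, dtype=float)
    columns = []
    for n in range(1, order + 1):
        angle = 2.0 * np.pi * n * t / period
        columns.append(np.sin(angle))
        columns.append(np.cos(angle))
    return np.column_stack(columns)


# Day offsets 0..61 span exactly two periods of 30.5 days
t_days = np.arange(0, 62)
features = fourier_features(t_days, period=30.5, order=5)
print(features.shape)
```

A higher Fourier order adds faster-oscillating terms, which lets the seasonal curve change more sharply but also makes it easier to overfit.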

Now that we have completed a forecast, we can plot the predicted values, as well as uncertainty intervals, just as we did before for temperature and pollutant levels.

Show the code
# Visualize the forecast

fig5 = model_births.plot(forecast_births, include_legend=True, figsize=(8, 6))
fig5.gca().set_title("Births Forecast with Changepoints")

# Add changepoints to the plot
a = add_changepoints_to_plot(fig5.gca(), model_births, forecast_births)

Let us also plot the components of the forecast, including the trend, seasonality, holidays, and the impact of the regressors. This allows us to understand how these components contribute to the forecast and if and how the regressors influence the number of births.

Show the code
# Visualise the components of the forecast

fig6 = model_births.plot_components(forecast_births, figsize=(8, 12))
_ = fig6.suptitle("Births Forecast Components", fontsize=14)

Interestingly, there is a negative effect of many holidays on the number of births - this might be due to the fact that many births are planned, and people might avoid giving birth on holidays, or there might be other non-represented factors at play. Additionally we see that the number of births is highest between September and November.

Cross validating the model

Prophet provides a convenient way to cross-validate the model using historical data. This allows us to evaluate the performance of the model on past data and tune the hyperparameters accordingly. We will use cross-validation to assess the forecast accuracy of the model and identify any potential issues.

Cross validation in Prophet works on a rolling forecast origin: the model is trained on historical data up to a cutoff date and then used to forecast data we already know. We can then compare the forecasted values with the actual values to evaluate the model’s performance. The initial parameter specifies the size of the first training period, the period parameter specifies the spacing between successive cutoff dates, and the horizon parameter specifies how far ahead each forecast extends.
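The rolling forecast origin can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not Prophet’s exact cutoff logic: the function name and the date range are hypothetical, and only the three window sizes match the values used in this experiment.

```python
from datetime import date, timedelta


def rolling_cutoffs(first_day, last_day, initial_days, period_days, horizon_days):
    """Simplified rolling-origin cutoffs, listed oldest first."""
    initial = timedelta(days=initial_days)
    period = timedelta(days=period_days)
    horizon = timedelta(days=horizon_days)
    cutoffs = []
    # The latest cutoff leaves one full horizon of known data after it
    cutoff = last_day - horizon
    # Step backwards while enough history remains for the initial training window
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Hypothetical five-year history with the window sizes used in this experiment
cutoffs = rolling_cutoffs(date(2000, 1, 1), date(2004, 12, 31), 730, 180, 365)
for c in cutoffs:
    print("train up to", c, "then forecast to", c + timedelta(days=365))
```

Each cutoff yields one train-and-forecast round, so a shorter spacing between cutoffs gives more evaluation rounds at the cost of more model fits.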

About Cross Validation

Cross validation produces a dataframe with yhat, yhat_lower, yhat_upper and y columns. The yhat column contains the forecasted values, the yhat_lower and yhat_upper columns contain the uncertainty intervals, and the y column contains the actual values. We can use this dataframe to calculate evaluation metrics such as mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE).
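As a rough illustration of how such metrics follow from these columns, the sketch below computes MAE, MSE, RMSE, and interval coverage in plain Python. The rows are made-up values shaped like the cross-validation output, not results from the model.

```python
import math

# Hypothetical rows mimicking the cross-validation columns (values are made up)
rows = [
    {"y": 1349, "yhat": 1310.0, "yhat_lower": 1260.0, "yhat_upper": 1360.0},
    {"y": 1311, "yhat": 1220.0, "yhat_lower": 1160.0, "yhat_upper": 1270.0},
    {"y": 1399, "yhat": 1375.0, "yhat_lower": 1315.0, "yhat_upper": 1425.0},
    {"y": 1423, "yhat": 1435.0, "yhat_lower": 1385.0, "yhat_upper": 1490.0},
]

errors = [row["y"] - row["yhat"] for row in rows]
mae = sum(abs(e) for e in errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)
rmse = math.sqrt(mse)

# Coverage: share of actual values that fall inside the uncertainty interval
inside = [row["yhat_lower"] <= row["y"] <= row["yhat_upper"] for row in rows]
coverage = sum(inside) / len(rows)

print(mae, mse, rmse, coverage)
```

MAE and RMSE are expressed in the units of the target (births per day here), while coverage is a fraction between 0 and 1.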

Show the code
# Cross validate the model

from prophet.diagnostics import cross_validation

df_births_cv = cross_validation(
    model_births, initial="730 days", period="180 days", horizon="365 days"
)
df_births_cv.head().style.background_gradient(cmap="Greens")
  ds yhat yhat_lower yhat_upper y cutoff
0 1998-12-05 00:00:00 1311.555304 1256.298492 1367.429416 1349 1998-12-04 00:00:00
1 1998-12-06 00:00:00 1216.420565 1161.138190 1268.924482 1311 1998-12-04 00:00:00
2 1998-12-07 00:00:00 1373.392566 1314.221541 1423.385528 1399 1998-12-04 00:00:00
3 1998-12-08 00:00:00 1434.990681 1383.377583 1489.791256 1423 1998-12-04 00:00:00
4 1998-12-09 00:00:00 1449.400677 1398.352673 1504.466093 1420 1998-12-04 00:00:00

These metrics provide a quantitative measure of the model’s accuracy and can help us evaluate the performance of the model.

We are particularly interested in MAPE (Mean Absolute Percentage Error), which is a relative measure of the forecast accuracy. It is calculated as the average of the absolute percentage errors between the forecasted and actual values. A lower MAPE indicates a more accurate forecast.

\(MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{y_i - \hat{y}_i}{y_i} \right|\)

Prophet reports MAPE as a fraction, so multiply it by 100 to read it as a percentage. As an example, a MAPE of 0.046 indicates that forecasts are off by 4.6% of the actual value on average.
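A minimal sketch of the calculation in plain Python is shown below. The function and the sample values are hypothetical, and the function returns the fraction form, which matches the scale of the mape values reported for this model.

```python
def mape(actual, predicted):
    """Mean absolute percentage error as a fraction (0.05 means 5%)."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    # Each ratio is undefined when the actual value is zero
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return sum(ratios) / len(ratios)


# Hypothetical values: errors of 10%, 5%, and 0%
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
score = mape(actual, predicted)
print(f"{score:.3f} as a fraction, {score * 100:.1f}% as a percentage")
```

Because each error is divided by the actual value, MAPE is only meaningful when the actual values are well away from zero, which holds for daily birth counts in the hundreds or thousands.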

Show the code
from prophet.diagnostics import performance_metrics

df_births_cv_p = performance_metrics(df_births_cv)
df_births_cv_p.head().style.background_gradient(cmap="Greens")
  horizon mse rmse mae mape mdape smape coverage
0 37 days 00:00:00 7044.649211 83.932409 64.313467 0.046796 0.037140 0.045922 0.681334
1 38 days 00:00:00 7065.587597 84.057050 64.387762 0.046818 0.036911 0.045946 0.679283
2 39 days 00:00:00 7081.623627 84.152383 64.466984 0.046857 0.037161 0.045984 0.678496
3 40 days 00:00:00 7159.995715 84.616758 64.878598 0.047116 0.037406 0.046254 0.675635
4 41 days 00:00:00 7172.213365 84.688921 65.028362 0.047190 0.037506 0.046333 0.673288

We can plot the MAPE values for each forecast horizon to see how the forecast accuracy changes over time.

Show the code
# Plot the MAPE performance metric

from prophet.plot import plot_cross_validation_metric

fig7 = plot_cross_validation_metric(df_births_cv, metric="mape", figsize=(8, 6))

Notice how this metric stays relatively stable across the forecast horizon, around or just below 5%. This suggests that the model’s accuracy does not degrade much as we forecast further ahead, at least up to the one-year horizon used here.

Let us now plot the output of the cross-validation, showing the actual values and forecasted values superimposed on each other. This allows us to visually inspect the accuracy of the forecast over the period and horizon of the cross-validation.

Show the code
# Create a figure and axis

plt.figure(figsize=(8, 6))

# Plot actual values (y) as a scatter plot
plt.scatter(
    df_births_cv["ds"],
    df_births_cv["y"],
    color="blue",
    label="Actual Births (y)",
    alpha=1.0,
)

# Plot predicted values (yhat) as a scatter plot
plt.scatter(
    df_births_cv["ds"],
    df_births_cv["yhat"],
    color="red",
    label="Predicted Births (yhat)",
    alpha=0.5,
)

# Add labels and title
plt.xlabel("Date (ds)")
plt.ylabel("Births")
plt.title("Actual vs Predicted Births Over Time")
plt.legend()

# Show plot
plt.show()

Final remarks

This experiment demonstrates how Prophet can be used to forecast births in Malaysia by combining historical data with external factors such as temperature and pollutant levels. The model captured seasonal patterns and underlying trends, and its cross-validation performance was consistent, with a MAPE that stayed around 5% across the forecast horizon. Because we did not compare against a model without regressors, we cannot say how much temperature and pollutant levels contributed to that accuracy. Overall, the approach shows that Prophet is a practical tool for time series forecasting and lays the groundwork for further enhancements. Future work might explore additional variables, like economic or social indicators, to refine predictions even further.

Reuse

This work is licensed under CC BY.