Probabilistic forecasting: prediction intervals and prediction distribution
When trying to anticipate future values, most forecasting models predict the single most likely value. This is called point forecasting. Although knowing the expected value of a time series in advance is useful in almost every business case, this kind of prediction does not provide any information about the model's confidence or the uncertainty of the prediction.
Probabilistic forecasting, as opposed to point forecasting, is a family of techniques that predict the expected distribution of the outcome instead of a single future value. This type of forecasting provides much richer information, since it allows for the creation of prediction intervals: ranges of likely values within which the true value may fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.
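In notation (a brief formalization consistent with the percentile formulas used later in this guide), a prediction interval $[\hat{l}_t, \hat{u}_t]$ with nominal coverage $1 - \alpha$ satisfies

$$P(\hat{l}_t \leq y_t \leq \hat{u}_t) = 1 - \alpha$$

For example, an 80% prediction interval ($\alpha = 0.2$) is expected to contain the true value 80% of the time.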
There are multiple ways to estimate prediction intervals, most of which require that the residuals (errors) of the model follow a normal distribution. When this assumption does not hold, two commonly used alternatives are bootstrapping and quantile regression. To illustrate how skforecast estimates prediction intervals for multi-step forecasting, the following examples are shown:
- Prediction intervals based on bootstrapped residuals and a recursive multi-step forecaster.
- Prediction intervals based on quantile regression and a direct multi-step forecaster.
All forecasters in skforecast have four different methods that allow for probabilistic forecasting:
- `predict_bootstrapping`: this method generates multiple forecasting predictions through a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each bootstrapping iteration generates a different set of predictions. The output is a pandas DataFrame with one row for each predicted step and one column for each bootstrapping iteration.
- `predict_interval`: this method estimates quantile prediction intervals using the values generated with `predict_bootstrapping`.
- `predict_quantiles`: this method estimates a list of quantile predictions using the values generated with `predict_bootstrapping`.
- `predict_dist`: this method fits a parametric distribution using the values generated with `predict_bootstrapping`. Any of the continuous distributions available in scipy.stats can be used.
All four methods can use either in-sample residuals (the default) or out-sample residuals. In both cases, the residuals can be conditioned (binned) on the predicted value to account for a possible correlation between the predicted values and the residuals.
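As a minimal usage sketch of the four methods (the toy random series, the Ridge regressor, and all values here are made up for brevity; argument defaults may differ between skforecast versions):

# Minimal usage sketch of the four probabilistic prediction methods
# ==============================================================================
import numpy as np
import pandas as pd
from scipy.stats import norm
from sklearn.linear_model import Ridge
from skforecast.recursive import ForecasterRecursive

# Toy series: 200 hourly observations drawn from a normal distribution
y = pd.Series(
        data  = np.random.default_rng(seed=8).normal(loc=100, scale=10, size=200),
        index = pd.date_range(start='2020-01-01', periods=200, freq='h')
    )
forecaster = ForecasterRecursive(regressor=Ridge(), lags=24)
forecaster.fit(y=y)

boot_preds = forecaster.predict_bootstrapping(steps=5, n_boot=100)  # DataFrame, 5 rows x 100 columns
intervals  = forecaster.predict_interval(steps=5, interval=[10, 90])  # pred, lower_bound, upper_bound
quantiles  = forecaster.predict_quantiles(steps=5, quantiles=[0.1, 0.5, 0.9])
dist_preds = forecaster.predict_dist(steps=5, distribution=norm)  # fitted loc and scale per step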
⚠ Warning
As Rob J Hyndman explains in his blog, in real-world problems, almost all prediction intervals are too narrow. For example, nominal 95% intervals may only provide coverage between 71% and 87%. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. With forecasting models, there are at least four sources of uncertainty:
- The random error term
- The parameter estimates
- The choice of model for the historical data
- The continuation of the historical data generating process into the future
When producing prediction intervals for time series models, generally only the first of these sources is taken into account. Therefore, it is advisable to use test data to validate the empirical coverage of the interval and not solely rely on the expected coverage.
✎ Note
Conformal prediction is a relatively new framework that allows for the creation of confidence measures for predictions made by machine learning models. This method is on the roadmap of skforecast, but not yet available.
Prediction intervals using bootstrapped residuals
The error of a one-step-ahead forecast is defined as the difference between the actual value and the predicted value ($e_t = y_t - \hat{y}_{t|t-1}$). By assuming that future errors will be similar to past errors, it is possible to simulate different predictions by taking samples from the collection of errors previously seen in the past (i.e., the residuals) and adding them to the predictions.
Diagram of the bootstrapping prediction process.
Repeatedly performing this process creates a collection of slightly different predictions, which represent the distribution of possible outcomes due to the expected variance in the forecasting process.
Bootstrapping predictions.
Using the outcome of the bootstrapping process, prediction intervals can be computed by calculating the $\alpha/2$ and $1 - \alpha/2$ quantiles of the simulated values at each forecasting horizon.
Alternatively, it is also possible to fit a parametric distribution for each forecast horizon.
One of the main advantages of this strategy is that it requires only a single model to estimate any interval. However, performing hundreds or thousands of bootstrapping iterations can be computationally expensive and may not always be feasible.
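The following is a conceptual sketch of this process for a recursive forecaster, not skforecast's actual implementation. `point_forecast`, `last_window`, and `residuals` are simplified stand-ins (defined here with made-up values) for the fitted one-step model, the last observed values of the series, and the collection of past errors:

# Conceptual sketch of bootstrapped prediction intervals (illustrative only)
# ==============================================================================
import numpy as np

rng = np.random.default_rng(seed=123)

# Hypothetical stand-ins for a fitted one-step model, the last observed window
# of the series, and the collection of past errors (residuals)
def point_forecast(lags):
    return lags[-24:].mean()

last_window = 100 + 10 * rng.standard_normal(48)
residuals   = rng.normal(scale=5.0, size=1000)

n_boot, steps = 250, 24
boot_preds = np.empty((steps, n_boot))

for b in range(n_boot):
    lags = last_window.copy()
    for step in range(steps):
        # Point prediction perturbed with an error sampled from past residuals
        pred = point_forecast(lags) + rng.choice(residuals)
        boot_preds[step, b] = pred
        # Feed the simulated value back as a lag for the next step
        lags = np.append(lags[1:], pred)

# 80% interval: 10th and 90th percentiles at each forecasting horizon
lower_bound = np.percentile(boot_preds, 10, axis=1)
upper_bound = np.percentile(boot_preds, 90, axis=1)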
Libraries and data
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from skforecast.plot import plot_residuals
from skforecast.plot import plot_prediction_distribution
from skforecast.plot import plot_prediction_intervals
from pprint import pprint
# Modelling and Forecasting
# ==============================================================================
from scipy.stats import norm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_pinball_loss
from skforecast.recursive import ForecasterRecursive
from skforecast.direct import ForecasterDirect
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster
# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
plt.style.use('seaborn-v0_8-darkgrid')
⚠ Warning
To keep this user guide sufficiently illustrative, a real-world dataset is used. The data download takes about 1 minute, and some functions called during the guide may take a few seconds to run. We appreciate your patience.
# Data download
# ==============================================================================
data = fetch_dataset(name='bike_sharing_extended_features')
data.head(2)
bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, the dataset was enriched with supplementary features: calendar-based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, rolling temperature averages, and polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)
| date_time | users | weather | month_sin | month_cos | week_of_year_sin | week_of_year_cos | week_day_sin | week_day_cos | hour_day_sin | hour_day_cos | ... | temp_roll_mean_1_day | temp_roll_mean_7_day | temp_roll_max_1_day | temp_roll_min_1_day | temp_roll_max_7_day | temp_roll_min_7_day | holiday_previous_day | holiday_next_day | temp | holiday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011-01-08 00:00:00 | 25.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.258819 | 0.965926 | ... | 8.063334 | 10.127976 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
| 2011-01-08 01:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.500000 | 0.866025 | ... | 8.029166 | 10.113334 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |

2 rows × 90 columns
# One hot encoding of categorical variables
# ==============================================================================
encoder = ColumnTransformer(
[('one_hot_encoder', OneHotEncoder(sparse_output=False), ['weather'])],
remainder='passthrough',
verbose_feature_names_out=False
).set_output(transform="pandas")
data = encoder.fit_transform(data)
# Select exogenous variables to be included in the model
# ==============================================================================
exog_features = [
'weather_clear', 'weather_mist', 'weather_rain', 'month_sin', 'month_cos',
'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos',
'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos',
'sunset_hour_sin', 'sunset_hour_cos', 'temp', 'holiday'
]
data = data[['users'] + exog_features]
# Split train-validation-test
# ==============================================================================
data = data.loc['2011-05-30 23:59:00':, :]
end_train = '2012-08-30 23:59:00'
end_validation = '2012-11-15 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
Dates train : 2011-05-31 00:00:00 --- 2012-08-30 23:00:00 (n=10992) Dates validacion : 2012-08-31 00:00:00 --- 2012-11-15 23:00:00 (n=1848) Dates test : 2012-11-16 00:00:00 --- 2012-12-30 23:00:00 (n=1080)
# Plot time series partition
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 3))
data_train['users'].plot(label='train', ax=ax)
data_val['users'].plot(label='validation', ax=ax)
data_test['users'].plot(label='test', ax=ax)
ax.yaxis.set_major_formatter(ticker.EngFormatter())
ax.set_title('Number of users')
ax.legend();
Intervals with in-sample residuals
By default, intervals are computed using in-sample residuals (residuals from the training set), either by calling the `predict_interval()` method or by performing a full backtesting procedure. However, this can result in intervals that are too narrow (overly optimistic).
# Create and fit forecaster
# ==============================================================================
params = {
"n_estimators": 600,
"max_depth": 6,
"min_data_in_leaf": 88,
"learning_rate": 0.2520098236227423,
"feature_fraction": 0.6,
"max_bin": 75,
"reg_alpha": 1.0,
"reg_lambda": 0.8,
}
lags = 48
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(random_state=15926, verbose=-1, **params),
lags = lags
)
forecaster.fit(
y = data.loc[:end_validation, 'users'],
exog = data.loc[:end_validation, exog_features]
)
# In-sample residuals stored during fit
# ==============================================================================
print("Number of residuals stored:", len(forecaster.in_sample_residuals_))
forecaster.in_sample_residuals_
Number of residuals stored: 10000
array([-2.36857116, -2.77682126, -4.03075774, ..., -4.12456496, 4.45391786, 17.70362401])

Although the training set contains more observations, the forecaster stores at most a random sample of 10,000 residuals to limit memory usage.
# Predict interval with in-sample residuals
# ==============================================================================
predictions = forecaster.predict_interval(
exog = data_test[exog_features],
steps = 7,
interval = [10, 90]
)
predictions
|  | pred | lower_bound | upper_bound |
|---|---|---|---|
| 2012-11-16 00:00:00 | 70.009970 | 53.587645 | 85.206036 |
| 2012-11-16 01:00:00 | 45.679914 | 30.385990 | 62.928811 |
| 2012-11-16 02:00:00 | 19.225220 | 2.777457 | 34.043910 |
| 2012-11-16 03:00:00 | -0.039409 | -15.202064 | 15.646154 |
| 2012-11-16 04:00:00 | 0.154831 | -14.950966 | 15.733683 |
| 2012-11-16 05:00:00 | 37.330998 | 25.131931 | 54.885782 |
| 2012-11-16 06:00:00 | 116.737843 | 69.809702 | 143.786683 |
The `backtesting_forecaster()` function is used to generate the prediction intervals for the entire test set and to calculate the coverage of a given interval.

- `use_in_sample_residuals = True` is used to compute the intervals using in-sample residuals.
- The `interval` argument indicates the desired coverage probability of the prediction intervals. In this case, `interval` is set to `[10, 90]`, which means that the prediction intervals are calculated for the 10th and 90th percentiles, resulting in a theoretical coverage probability of 80%.
- The `n_boot` argument specifies the number of bootstrap samples used to estimate the prediction intervals. The larger the number of samples, the more accurate the intervals, but the longer the computation takes.
# Backtesting with prediction intervals in test data using in-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
steps = 24,
initial_train_size = len(data.loc[:end_validation]),
refit = False
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
exog = data[exog_features],
cv = cv,
metric = 'mean_absolute_error',
interval = [10, 90], # 80% prediction interval
n_boot = 250,
use_in_sample_residuals = True, # Use in-sample residuals
use_binned_residuals = False,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
predictions.head(5)
|  | pred | lower_bound | upper_bound |
|---|---|---|---|
| 2012-11-16 00:00:00 | 70.009970 | 53.587645 | 85.206036 |
| 2012-11-16 01:00:00 | 45.679914 | 30.385990 | 62.928811 |
| 2012-11-16 02:00:00 | 19.225220 | 2.777457 | 34.043910 |
| 2012-11-16 03:00:00 | -0.039409 | -15.202064 | 15.646154 |
| 2012-11-16 04:00:00 | 0.154831 | -14.950966 | 15.733683 |
# Function to calculate coverage of a given interval
# ======================================================================================
def empirical_coverage(y, lower_bound, upper_bound):
"""
Calculate coverage of a given interval
"""
return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))
# Plot intervals
# ==============================================================================
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "users",
initial_x_zoom = ['2012-12-01', '2012-12-20'],
title = "Real value vs predicted in test data",
xaxis_title = "Date time",
yaxis_title = "users",
)
# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
y = data.loc[end_validation:, 'users'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100 * coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 62.78 %
Area of the interval: 88854.52
The prediction intervals are overconfident: they are excessively narrow, so the empirical coverage falls well below the nominal coverage (80 %). This happens because in-sample residuals tend to overestimate the predictive capacity of the model.
Out-sample residuals (not conditioned on predicted values)
To address the issue of overoptimistic intervals, it is possible to use out-sample residuals (residuals from a validation set not seen during training) to estimate the prediction intervals. These residuals can be obtained through backtesting.
# Backtesting on validation data to obtain out-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
steps = 24,
initial_train_size = len(data.loc[:end_train]),
refit = False
)
_, predictions_val = backtesting_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'users'],
exog = data.loc[:end_validation, exog_features],
cv = cv,
metric = 'mean_absolute_error',
n_jobs = 'auto',
verbose = False,
show_progress = True
)
# Out-sample residuals distribution
# ==============================================================================
residuals = data.loc[predictions_val.index, 'users'] - predictions_val['pred']
print(pd.Series(np.where(residuals < 0, 'negative', 'positive')).value_counts())
plt.rcParams.update({'font.size': 8})
_ = plot_residuals(residuals=residuals, figsize=(7, 4))
positive    1118
negative     730
Name: count, dtype: int64
Then, the `set_out_sample_residuals()` method is used to store the computed out-sample residuals in the forecaster.
# Store out-sample residuals in the forecaster
# ==============================================================================
forecaster.set_out_sample_residuals(
y_true = data.loc[predictions_val.index, 'users'],
y_pred = predictions_val['pred']
)
Now that the new residuals have been added to the forecaster, the prediction intervals can be calculated using `use_in_sample_residuals = False`.
# Backtesting with prediction intervals in test data using out-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
steps = 24,
initial_train_size = len(data.loc[:end_validation]),
refit = False
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
exog = data[exog_features],
cv = cv,
metric = 'mean_absolute_error',
interval = [10, 90], # 80% prediction interval
n_boot = 250,
use_in_sample_residuals = False, # Use out-sample residuals
use_binned_residuals = False,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
predictions.head(5)
|  | pred | lower_bound | upper_bound |
|---|---|---|---|
| 2012-11-16 00:00:00 | 70.009970 | 29.697636 | 175.720849 |
| 2012-11-16 01:00:00 | 45.679914 | 4.456010 | 189.808649 |
| 2012-11-16 02:00:00 | 19.225220 | -9.908056 | 179.402612 |
| 2012-11-16 03:00:00 | -0.039409 | -17.221435 | 192.666849 |
| 2012-11-16 04:00:00 | 0.154831 | -31.879388 | 225.809309 |
# Plot intervals
# ==============================================================================
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "users",
initial_x_zoom = ['2012-12-01', '2012-12-20'],
title = "Real value vs predicted in test data",
xaxis_title = "Date time",
yaxis_title = "users",
)
# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
y = data.loc[end_validation:, 'users'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100 * coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
Predicted interval coverage: 75.37 %
Area of the interval: 301725.86

With out-sample residuals, the empirical coverage (75.37 %) is much closer to the nominal 80 %, at the cost of wider intervals (a larger total area).