Probabilistic forecasting
When trying to predict future values, most forecasting models aim to predict the single most likely value. This is called point-forecasting. Although knowing the expected value of a time series in advance is useful in almost any business case, this type of prediction provides no information about the model's confidence or the uncertainty of the prediction.
Probabilistic forecasting, as opposed to point-forecasting, is a family of techniques that allow the prediction of the expected distribution of the outcome rather than a single future value. This type of forecasting provides much richer information because it allows the creation of prediction intervals, the range of likely values where the true value may fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.
Skforecast implements several methods for probabilistic forecasting.
Bootstrapped residuals
Bootstrapping is a statistical technique that allows for estimating the distribution of a statistic by resampling the data with replacement. In the context of forecasting, bootstrapping the residuals of a model allows for estimating the distribution of the errors, which can be used to create prediction intervals. Four methods are available in Skforecast for bootstrapping the residuals:
- predict_bootstrapping: generates multiple forecasting predictions through a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each bootstrapping iteration produces a different set of predictions. The output is a pandas DataFrame with one row for each predicted step and one column for each bootstrapping iteration.
- predict_interval(method='bootstrapping'): estimates quantile prediction intervals from the values generated with predict_bootstrapping.
- predict_quantiles: estimates a list of quantile predictions from the values generated with predict_bootstrapping.
- predict_dist: fits a parametric distribution to the values generated with predict_bootstrapping. Any of the continuous distributions available in scipy.stats can be used.
The four methods can use in-sample residuals (the default) or out-of-sample residuals. In both cases, the residuals can be conditioned on the predicted value to account for possible correlation between the predicted values and the errors.
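The core idea behind bootstrapped intervals can be sketched in a few lines of plain Python. The function name and toy data below are illustrative, not part of the skforecast API: each point forecast is perturbed with errors resampled, with replacement, from the collection of past residuals, and empirical percentiles of the simulated values form the interval.

```python
import random

def bootstrap_intervals(point_forecasts, residuals, n_boot=500,
                        interval=(5, 95), seed=42):
    """Toy sketch: perturb each point forecast with resampled past errors
    and take empirical percentiles of the simulated values."""
    rng = random.Random(seed)
    intervals = []
    for pred in point_forecasts:
        # Each bootstrapping iteration adds a randomly resampled residual.
        simulated = sorted(pred + rng.choice(residuals) for _ in range(n_boot))
        lower = simulated[int(n_boot * interval[0] / 100)]
        upper = simulated[int(n_boot * interval[1] / 100) - 1]
        intervals.append((lower, upper))
    return intervals

# Past one-step-ahead errors of a fitted model and its point forecasts
residuals = [-1.2, -0.5, -0.1, 0.0, 0.3, 0.4, 0.9, 1.5]
for pred, (lo, hi) in zip([10.0, 11.0, 12.0],
                          bootstrap_intervals([10.0, 11.0, 12.0], residuals)):
    print(f"{pred:.1f} -> [{lo:.2f}, {hi:.2f}]")
```

Note that this sketch perturbs each step independently; actual multi-step bootstrapping feeds each simulated value back into the recursive prediction, so uncertainty accumulates over the forecast horizon.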
Discover how to use these methods in Probabilistic forecasting with bootstrapped residuals.
Conformal prediction
Conformal prediction is a framework for constructing prediction intervals that are guaranteed to contain the true value with a specified probability (coverage probability). It works by combining the predictions of a point-forecasting model with its past residuals, the differences between previous predictions and actual values. These residuals help estimate the uncertainty in the forecast and determine the width of the prediction interval that is then added to the point forecast. Skforecast implements Split Conformal Prediction (SCP) through the predict_interval(method='conformal') method.
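The mechanics of split conformal prediction can be illustrated with a minimal stdlib-only sketch (the function name is hypothetical, not the skforecast API): the interval half-width is the conformal quantile of the absolute errors measured on a held-out calibration set.

```python
import math

def split_conformal_interval(point_forecast, calibration_residuals, alpha=0.1):
    """Toy sketch of Split Conformal Prediction: the interval half-width is
    the conformal (1 - alpha) quantile of absolute calibration errors."""
    scores = sorted(abs(r) for r in calibration_residuals)
    n = len(scores)
    # Finite-sample correction: ceil((n + 1) * (1 - alpha)) rather than
    # n * (1 - alpha), clipped to n.
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    q = scores[k - 1]
    return point_forecast - q, point_forecast + q

# Residuals measured on a held-out calibration set
calibration_residuals = [0.2, -0.4, 1.1, -0.9, 0.6, -0.3, 0.8, -1.5, 0.1, 0.5]
lo, hi = split_conformal_interval(10.0, calibration_residuals, alpha=0.2)
print(f"[{lo:.1f}, {hi:.1f}]")  # -> [8.9, 11.1]
```

Because the half-width is a single number, this basic form produces symmetric intervals of constant width across the horizon; that simplicity is what makes the method so cheap.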
Discover how to use conformal methods in Probabilistic forecasting with conformal prediction.
Conformal methods can also calibrate prediction intervals generated by other techniques, such as quantile regression or bootstrapped residuals. In this case, the conformal method adjusts the prediction intervals to ensure that they remain valid with respect to the coverage probability. Skforecast provides this functionality through the ConformalIntervalCalibrator transformer.
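The calibration idea can be sketched in plain Python (an illustrative helper, not the ConformalIntervalCalibrator implementation): measure how far each calibration observation falls outside (or inside) its interval, then widen or shrink every interval by the conformal quantile of those scores.

```python
import math

def calibrate_intervals(intervals, y_calib, alpha=0.2):
    """Toy sketch of conformal calibration of existing prediction
    intervals, in the spirit of conformalized quantile regression."""
    # Score is positive when y lies outside its interval, negative when inside.
    scores = sorted(max(lo - y, y - hi)
                    for (lo, hi), y in zip(intervals, y_calib))
    n = len(scores)
    k = min(math.ceil((n + 1) * (1 - alpha)), n)
    q = scores[k - 1]
    # A positive q widens the intervals; a negative q shrinks them.
    return [(lo - q, hi + q) for lo, hi in intervals]

# Intervals that under-cover on the calibration set get widened
raw = [(9.5, 10.5)] * 10
y_calib = [10.2, 9.6, 11.0, 10.4, 9.0, 10.0, 9.7, 10.9, 10.3, 9.8]
calibrated = calibrate_intervals(raw, y_calib, alpha=0.2)
print(calibrated[0])  # -> (9.0, 11.0)
```

After calibration, the coverage on the calibration data is at least 1 − alpha by construction; the hope is that this transfers to new data drawn from the same distribution.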
Discover how to perform this calibration in Probabilistic forecasting, conformal calibration.
⚠ Warning
There are several well-established methods for conformal prediction, each with its own characteristics and assumptions. However, when applied to time series forecasting, their coverage guarantees are only valid for one-step-ahead predictions. For multi-step-ahead predictions, the coverage probability is not guaranteed. Skforecast implements Split Conformal Prediction (SCP) due to its simplicity and efficiency.
Quantile regression
Quantile regression is a technique for estimating the conditional quantiles of a response variable. By combining the predictions of two quantile regressors, an interval can be constructed, with each model estimating one of the bounds of the interval. For example, models trained on the 0.1 and 0.9 quantiles produce an 80% prediction interval (90% − 10% = 80%).
If a machine learning algorithm capable of modeling quantiles is used as the regressor in a forecaster, its predict method will return predictions for the specified quantile. By creating two forecasters, each configured with a different quantile, their predictions can be combined to generate a prediction interval.
Discover how to use this method in Probabilistic forecasting with quantile regression.
Which Method to Use?
There is no definitive answer to this question, as the resulting coverage may vary depending on the dataset and regressor. However, some general guidelines can be followed:
- Bootstrapped residuals: achieves good results in most cases, especially when the residuals are conditioned on the predicted value (use_binned_residuals = True). However, it is computationally expensive, especially with a large number of bootstrapping iterations, and may not scale well to large datasets or multiple time series.
- Conformal prediction: gives results similar to bootstrapping, but is much more computationally efficient.
- Quantile regression: an appropriate choice when the regressor is capable of modeling quantiles. However, two forecasters must be trained, one for each quantile, which can be computationally expensive.
None of these methods guarantee coverage probability for multi-step predictions. Therefore, it is strongly recommended to validate the empirical coverage obtained with the chosen method. If the coverage is not satisfactory, the prediction intervals can be calibrated using the ConformalIntervalCalibrator transformer.
For bootstrapped and conformal methods, it is recommended to use out-of-sample residuals (known as calibration residuals in the conformal literature), as they provide a more realistic estimate of the prediction uncertainty (use_in_sample_residuals = False). If in-sample residuals are used, the prediction intervals may be too narrow, resulting in low coverage.
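Validating empirical coverage amounts to counting how often the true values fall inside their intervals on a test set. A minimal helper (illustrative name and toy data, not part of the skforecast API):

```python
def empirical_coverage(intervals, y_true):
    """Fraction of observed values that fall inside their prediction interval."""
    hits = sum(lo <= y <= hi for (lo, hi), y in zip(intervals, y_true))
    return hits / len(y_true)

# A nominal 80% interval that, on test data, only covers 7 of 10 points
intervals = [(9.0, 11.0)] * 10
y_test = [10.2, 9.5, 11.4, 10.8, 8.7, 10.0, 9.1, 11.9, 10.5, 9.8]
print(empirical_coverage(intervals, y_test))  # -> 0.7, narrower than nominal
```

A measured coverage well below the nominal level, as in this toy example, is the signal that the intervals are too narrow and should be recalibrated.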
⚠ Warning
As Rob J Hyndman explains in his blog, in real-world problems, almost all prediction intervals are too narrow. For example, nominal 95% intervals may only provide coverage between 71% and 87%. This is a well-known phenomenon and arises because they do not account for all sources of uncertainty. With forecasting models, there are at least four sources of uncertainty:
- The random error term
- The parameter estimates
- The choice of model for the historical data
- The continuation of the historical data generating process into the future
When producing prediction intervals for time series models, generally only the first of these sources is taken into account. Therefore, it is advisable to use test data to validate the empirical coverage of the interval rather than relying only on the nominal coverage.
💡 Tip
For more examples on how to use probabilistic forecasting, check out the following articles: