Prediction intervals in multi-step forecasting models¶
When trying to anticipate future values, most forecasting models predict the most likely value, an approach known as point-forecasting. Although knowing the expected value of a time series in advance is useful in almost every business case, this kind of prediction does not provide any information about the model's confidence or the prediction uncertainty.
Probabilistic forecasting, as opposed to point-forecasting, is a family of techniques that allow for predicting the expected distribution of the outcome instead of a single future value. This type of forecasting provides much richer information since it allows for creating prediction intervals, the range of likely values where the true value may fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.
There are multiple ways to estimate prediction intervals, most of which require that the residuals (errors) of the model follow a normal distribution. When this property cannot be assumed, two alternatives commonly used are bootstrapping and quantile regression. To illustrate how skforecast allows estimating prediction intervals for multi-step forecasting, the following examples attempt to predict energy demand for a 7-day horizon. Two strategies are shown:
- Prediction intervals based on bootstrapped residuals and a recursive-multi-step forecaster.
- Prediction intervals based on quantile regression and a direct-multi-step forecaster.
  Warning
As Rob J Hyndman explains in his blog, in real-world problems, almost all prediction intervals are too narrow. For example, nominal 95% intervals may only provide coverage between 71% and 87%. This is a well-known phenomenon that arises because they do not account for all sources of uncertainty. With forecasting models, there are at least four sources of uncertainty:
- The random error term
- The parameter estimates
- The choice of model for the historical data
- The continuation of the historical data generating process into the future
Prediction intervals using bootstrapped residuals¶
The error of a one-step-ahead forecast is defined as $e_t = y_t - \hat{y}_{t|t-1}$. Assuming that future errors will be like past errors, it is possible to simulate different predictions by sampling from the collection of errors previously observed (i.e., the residuals) and adding them to the predictions.
Doing this repeatedly creates a collection of slightly different predictions (possible future paths) that represents the expected variance of the forecasting process.
Finally, prediction intervals can be computed by calculating the $\alpha/2$ and $1 - \alpha/2$ percentiles of the simulated data at each forecasting horizon.
The main advantage of this strategy is that only a single model is required to estimate any interval. The drawback is that running hundreds or thousands of bootstrapping iterations is computationally very expensive and not always feasible.
This type of prediction interval can be easily estimated using the predict_interval method of the ForecasterAutoreg and ForecasterAutoregCustom classes.
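Before turning to the skforecast API, the mechanics can be sketched with plain NumPy. This is a simplified illustration, not skforecast's implementation: it perturbs a fixed toy point forecast with sampled residuals, whereas the recursive forecaster also feeds each simulated value back into the lags.

# Bootstrapped intervals, simplified sketch (toy values, not skforecast internals)
# ==============================================================================
import numpy as np

rng = np.random.default_rng(123)
point_forecast = np.array([220., 233., 242., 220., 193., 194., 216.])  # toy 7-step forecast
residuals = rng.normal(scale=10., size=500)  # stand-in for the model's residuals

# Simulate 1000 possible future paths: one sampled residual per horizon step
paths = point_forecast + rng.choice(residuals, size=(1000, point_forecast.size))

# 80% interval: the 10th and 90th percentiles at each forecasting horizon
lower = np.percentile(paths, 10, axis=0)
upper = np.percentile(paths, 90, axis=0)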
Prediction intervals using quantile regression models¶
As opposed to linear regression, which is intended to estimate the conditional mean of the response variable given certain values of the predictor variables, quantile regression aims at estimating the conditional quantiles of the response variable. For a continuous distribution function, the $\alpha$-quantile $Q_{\alpha}(x)$ is defined such that the probability of $Y$ being smaller than $Q_{\alpha}(x)$ is, for a given $X=x$, equal to $\alpha$. For example, 36% of the population values are lower than the quantile $Q_{0.36}$. The best-known quantile is the 50% quantile, more commonly called the median.
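This definition can be checked empirically with NumPy (a quick illustrative example, unrelated to the demand data used below):

# Empirical check of the quantile definition
# ==============================================================================
import numpy as np

x = np.random.default_rng(123).normal(size=10_000)
q36 = np.quantile(x, 0.36)
print((x < q36).mean())  # close to 0.36 by construction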
By combining the predictions of two quantile regressors, it is possible to build an interval. Each model estimates one of the limits of the interval. For example, the models obtained for $Q = 0.1$ and $Q = 0.9$ produce an 80% prediction interval (90% - 10% = 80%).
Several machine learning algorithms are capable of modeling quantiles, among them LightGBM (used later in this document) and scikit-learn's GradientBoostingRegressor.
Just as the squared-error loss function is used to train models that predict the mean value, a specific loss function is needed to train models that predict quantiles. The most common metric used for quantile regression is called quantile loss or pinball loss:
$$\text{pinball}(y, \hat{y}) = \frac{1}{n_{\text{samples}}} \sum_{i=0}^{n_{\text{samples}}-1} \alpha \max(y_i - \hat{y}_i, 0) + (1 - \alpha) \max(\hat{y}_i - y_i, 0)$$

where $\alpha$ is the target quantile, $y$ the real value and $\hat{y}$ the quantile prediction.
It can be seen that the loss differs depending on the evaluated quantile: the higher the quantile, the more the loss function penalizes underestimates and the less it penalizes overestimates. As with MSE and MAE, the goal is to minimize this value (the lower the loss, the better).
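The asymmetry is easy to verify with scikit-learn's mean_pinball_loss, imported later in this document. For $\alpha = 0.9$, underestimating two observations by 2 units costs $0.9 \times 2 = 1.8$, while overestimating them by the same amount costs only $0.1 \times 2 = 0.2$:

# Pinball loss asymmetry, quick check
# ==============================================================================
from sklearn.metrics import mean_pinball_loss

y_true = [10, 10]
print(mean_pinball_loss(y_true, [8, 8], alpha=0.9))   # underestimates -> 1.8
print(mean_pinball_loss(y_true, [12, 12], alpha=0.9)) # overestimates  -> 0.2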
Two disadvantages of quantile regression, compared to the bootstrap approach to prediction intervals, are that each quantile needs its own regressor and that quantile regression is not available for all types of regression models. However, once the models are trained, inference is much faster since no iterative process is needed.
This type of prediction interval can be easily estimated using ForecasterAutoregDirect models.
Libraries¶
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
dark_style = {
    'figure.facecolor'  : '#212946',
    'axes.facecolor'    : '#212946',
    'savefig.facecolor' : '#212946',
    'axes.grid'         : True,
    'axes.grid.which'   : 'both',
    'axes.spines.left'  : False,
    'axes.spines.right' : False,
    'axes.spines.top'   : False,
    'axes.spines.bottom': False,
    'grid.color'        : '#2A3459',
    'grid.linewidth'    : '1',
    'text.color'        : '0.9',
    'axes.labelcolor'   : '0.9',
    'xtick.color'       : '0.9',
    'ytick.color'       : '0.9',
    'font.size'         : 12
}
plt.rcParams.update(dark_style)
# Modelling and Forecasting
# ==============================================================================
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from sklearn.metrics import mean_pinball_loss
# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
%config Completer.use_jedi = False
Data¶
Data used in this document were obtained from the R tsibbledata package. The dataset contains half-hourly electricity demand for Victoria (Australia), together with the temperature and an indicator of whether the day is a public holiday. In the following examples, data are aggregated at the daily level.
# Data download
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/' +
       'data/vic_elec.csv')
data = pd.read_csv(url, sep=',')
# Data preparation (aggregation at daily level)
# ==============================================================================
data['Time'] = pd.to_datetime(data['Time'], format='%Y-%m-%dT%H:%M:%SZ')
data = data.set_index('Time')
data = data.asfreq('30min')
data = data.sort_index()
data = data.drop(columns='Date')
data = data.resample(rule='D', closed='left', label='right')\
           .agg({'Demand': 'sum', 'Temperature': 'mean', 'Holiday': 'max'})
data.head()
| Time | Demand | Temperature | Holiday |
|---|---|---|---|
| 2012-01-01 | 82531.745918 | 21.047727 | True |
| 2012-01-02 | 227778.257304 | 26.578125 | True |
| 2012-01-03 | 275490.988882 | 31.751042 | True |
| 2012-01-04 | 258955.329422 | 24.567708 | False |
| 2012-01-05 | 213792.376946 | 18.191667 | False |
# Split data into train-val-test
# ==============================================================================
data = data.loc['2012-01-01 00:00:00': '2014-12-30 23:00:00']
end_train = '2013-12-31 23:59:00'
end_validation = '2014-9-30 23:59:00'
data_train = data.loc[: end_train, :].copy()
data_val = data.loc[end_train:end_validation, :].copy()
data_test = data.loc[end_validation:, :].copy()
print(f"Train dates : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Validation dates : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Test dates : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
Train dates      : 2012-01-01 00:00:00 --- 2013-12-31 00:00:00  (n=731)
Validation dates : 2014-01-01 00:00:00 --- 2014-09-30 00:00:00  (n=273)
Test dates       : 2014-10-01 00:00:00 --- 2014-12-30 00:00:00  (n=91)
# Plot time series partition
# ==============================================================================
fig, ax = plt.subplots(figsize=(11, 4))
data_train['Demand'].plot(label='train', ax=ax)
data_val['Demand'].plot(label='validation', ax=ax)
data_test['Demand'].plot(label='test', ax=ax)
ax.yaxis.set_major_formatter(ticker.EngFormatter())
ax.set_ylim(bottom=160_000)
ax.set_ylabel('MW')
ax.set_title('Energy demand')
ax.legend();
# Autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
plot_acf(data.Demand, ax=ax, lags=30)
plt.show()
Based on the autocorrelation plot, the last 7 days may be used as predictors.
Bootstrapping prediction intervals¶
A recursive-multi-step forecaster is trained and its hyper-parameters optimized. Then, prediction intervals based on bootstrapped residuals are estimated.
# Create forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = LGBMRegressor(random_state=123),
                 lags      = 7
             )
forecaster
=================
ForecasterAutoreg
=================
Regressor: LGBMRegressor(random_state=123)
Lags: [1 2 3 4 5 6 7]
Transformer for y: None
Transformer for exog: None
Window size: 7
Included exogenous: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: None
Training index type: None
Training index frequency: None
Regressor parameters: {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': -1, 'num_leaves': 31, 'objective': None, 'random_state': 123, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'silent': 'warn', 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0}
Creation date: 2022-09-24 08:58:30
Last fit date: None
Skforecast version: 0.5.0
Python version: 3.9.13
In order to find the best hyper-parameter values, a grid search is carried out. It is important not to include test data in the search; otherwise, the model could overfit to the test set.
# Grid search of hyper-parameters and lags
# ==============================================================================
# Regressor hyper-parameters
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

# Lags used as predictors
lags_grid = [7]

results_grid = grid_search_forecaster(
                   forecaster         = forecaster,
                   y                  = data.loc[:end_validation, 'Demand'],
                   param_grid         = param_grid,
                   lags_grid          = lags_grid,
                   steps              = 7,
                   refit              = True,
                   metric             = 'mean_squared_error',
                   initial_train_size = int(len(data_train)),
                   return_best        = True,
                   verbose            = False
               )
Number of models compared: 12.
loop lags_grid: 100%|███████████████████████████████████████| 1/1 [00:31<00:00, 31.48s/it]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3 4 5 6 7]
  Parameters: {'learning_rate': 0.01, 'max_depth': 10, 'n_estimators': 500}
  Backtesting metric: 411133164.00389796
Once the best hyper-parameters have been found, the forecaster is ready to estimate forecasting intervals.
# Predict intervals for next 7 steps
# ==============================================================================
predictions = forecaster.predict_interval(steps=7, interval=[10, 90], n_boot=1000)
predictions
|  | pred | lower_bound | upper_bound |
|---|---|---|---|
| 2014-10-01 | 223264.432587 | 214633.692784 | 231376.785240 |
| 2014-10-02 | 233596.210854 | 223051.862727 | 244752.538248 |
| 2014-10-03 | 242653.671597 | 226947.654906 | 258361.695192 |
| 2014-10-04 | 220842.603128 | 209967.438912 | 236404.408845 |
| 2014-10-05 | 193652.241931 | 184619.338330 | 218354.110033 |
| 2014-10-06 | 193894.071610 | 186107.802546 | 207869.512062 |
| 2014-10-07 | 216094.828372 | 193668.009711 | 229810.210476 |
A backtesting process can be applied in order to evaluate the forecaster's performance on test data and calculate the real coverage of the estimated interval.
# Backtesting
# ==============================================================================
metric, predictions = backtesting_forecaster(
                          forecaster         = forecaster,
                          y                  = data['Demand'],
                          initial_train_size = len(data_train) + len(data_val),
                          steps              = 7,
                          refit              = True,
                          interval           = [10, 90],
                          n_boot             = 1000,
                          metric             = 'mean_squared_error',
                          verbose            = False
                      )
predictions.head(4)
|  | pred | lower_bound | upper_bound |
|---|---|---|---|
| 2014-10-01 | 223264.432587 | 214633.692784 | 231376.785240 |
| 2014-10-02 | 233596.210854 | 223051.862727 | 244752.538248 |
| 2014-10-03 | 242653.671597 | 226947.654906 | 258361.695192 |
| 2014-10-04 | 220842.603128 | 209967.438912 | 236404.408845 |
# Interval coverage on test data
# ==============================================================================
inside_interval = np.where(
                      (data.loc[predictions.index, 'Demand'] >= predictions['lower_bound']) &
                      (data.loc[predictions.index, 'Demand'] <= predictions['upper_bound']),
                      True,
                      False
                  )
coverage = inside_interval.mean()
print(f"Coverage of the predicted interval on test data: {100 * coverage}")
Coverage of the predicted interval on test data: 65.93406593406593
The coverage of the predicted interval is lower than expected (80%).
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(11, 3))
data.loc[end_validation:, 'Demand'].plot(ax=ax, label='Demand')
ax.fill_between(
    predictions.index,
    predictions['lower_bound'],
    predictions['upper_bound'],
    color = 'deepskyblue',
    alpha = 0.3,
    label = '80% interval'
)
ax.yaxis.set_major_formatter(ticker.EngFormatter())
ax.set_ylabel('MW')
ax.set_title('Energy demand forecast')
ax.legend();
By default, training residuals are used to create the prediction intervals. However, other residuals can be used, for example, residuals obtained from a validation set. To do this, the new residuals must be stored inside the forecaster using the set_out_sample_residuals method. Once the new residuals have been added to the forecaster, set in_sample_residuals = False when using the predict_interval method.
# Simulated residuals
# ==============================================================================
simulated_residuals = pd.Series(np.random.normal(size=500))
forecaster.set_out_sample_residuals(residuals=simulated_residuals)
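For example, the earlier call can be repeated using the stored residuals (same interval and number of bootstrapping iterations):

# Predict intervals using out-of-sample residuals
# ==============================================================================
predictions = forecaster.predict_interval(
                  steps               = 7,
                  interval            = [10, 90],
                  n_boot              = 1000,
                  in_sample_residuals = False
              )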
Quantile forecasting models¶
As in the previous example, an 80% prediction interval is estimated for 7 steps-ahead predictions, but this time using quantile regression. In this example a LightGBM gradient boosting model is trained; however, the reader can use any other model by simply substituting the regressor definition.
# Create forecasters: one for each limit of the interval
# ==============================================================================
# The forecasters obtained for alpha=0.1 and alpha=0.9 produce an 80% prediction
# interval (90% - 10% = 80%).
# Forecaster for quantile 10%
forecaster_q10 = ForecasterAutoregDirect(
                     regressor = LGBMRegressor(
                                     objective     = 'quantile',
                                     metric        = 'quantile',
                                     alpha         = 0.1,
                                     learning_rate = 0.1,
                                     max_depth     = 10,
                                     n_estimators  = 100
                                 ),
                     lags  = 7,
                     steps = 7
                 )
# Forecaster for quantile 90%
forecaster_q90 = ForecasterAutoregDirect(
                     regressor = LGBMRegressor(
                                     objective     = 'quantile',
                                     metric        = 'quantile',
                                     alpha         = 0.9,
                                     learning_rate = 0.1,
                                     max_depth     = 10,
                                     n_estimators  = 100
                                 ),
                     lags  = 7,
                     steps = 7
                 )
forecaster_q10.fit(y=data['Demand'])
forecaster_q90.fit(y=data['Demand'])
Once the quantile forecasters are trained, they can be used to predict each of the bounds of the forecasting interval.
# Predict intervals for next 7 steps
# ==============================================================================
predictions_q10 = forecaster_q10.predict(steps=7)
predictions_q90 = forecaster_q90.predict(steps=7)
predictions = pd.DataFrame({
                  'lower_bound': predictions_q10,
                  'upper_bound': predictions_q90,
              })
predictions
|  | lower_bound | upper_bound |
|---|---|---|
| 2014-12-31 | 177607.329878 | 218885.714906 |
| 2015-01-01 | 178062.602517 | 242582.788349 |
| 2015-01-02 | 186213.019530 | 220677.491829 |
| 2015-01-03 | 184901.085939 | 204261.638256 |
| 2015-01-04 | 193237.899189 | 235310.200573 |
| 2015-01-05 | 196673.050873 | 284881.068713 |
| 2015-01-06 | 184148.733152 | 293018.478848 |
When validating a quantile regression model, a custom metric must be provided depending on the quantile being estimated. These metrics will be used again when tuning the hyper-parameters of each model.
# Loss function for each quantile (pinball_loss)
# ==============================================================================
def mean_pinball_loss_q10(y_true, y_pred):
    """
    Pinball loss for quantile 10.
    """
    return mean_pinball_loss(y_true, y_pred, alpha=0.1)

def mean_pinball_loss_q90(y_true, y_pred):
    """
    Pinball loss for quantile 90.
    """
    return mean_pinball_loss(y_true, y_pred, alpha=0.9)
# Backtesting on test data
# ==============================================================================
metric_q10, predictions_q10 = backtesting_forecaster(
                                  forecaster         = forecaster_q10,
                                  y                  = data['Demand'],
                                  initial_train_size = len(data_train) + len(data_val),
                                  steps              = 7,
                                  refit              = True,
                                  metric             = mean_pinball_loss_q10,
                                  verbose            = False
                              )

metric_q90, predictions_q90 = backtesting_forecaster(
                                  forecaster         = forecaster_q90,
                                  y                  = data['Demand'],
                                  initial_train_size = len(data_train) + len(data_val),
                                  steps              = 7,
                                  refit              = True,
                                  metric             = mean_pinball_loss_q90,
                                  verbose            = False
                              )
Predictions generated for each model are used to define the upper and lower limits of the interval.
# Interval coverage on test data
# ==============================================================================
inside_interval = np.where(
                      (data.loc[end_validation:, 'Demand'] >= predictions_q10['pred']) &
                      (data.loc[end_validation:, 'Demand'] <= predictions_q90['pred']),
                      True,
                      False
                  )
coverage = inside_interval.mean()
print(f"Coverage of the predicted interval: {100 * coverage}")
Coverage of the predicted interval: 61.53846153846154
The coverage of the predicted interval is much lower than expected (80%).
The hyper-parameters of the models were hand-tuned, and there is no reason why the same values should suit both the 10th and 90th percentile regressors. Therefore, a grid search is carried out for each forecaster.
# Grid search of hyper-parameters and lags for each quantile forecaster
# ==============================================================================
# Regressor hyper-parameters
param_grid = {
    'n_estimators': [100, 500],
    'max_depth': [3, 5, 10],
    'learning_rate': [0.01, 0.1]
}

# Lags used as predictors
lags_grid = [7]

results_grid_q10 = grid_search_forecaster(
                       forecaster         = forecaster_q10,
                       y                  = data.loc[:end_validation, 'Demand'],
                       param_grid         = param_grid,
                       lags_grid          = lags_grid,
                       steps              = 7,
                       refit              = False,
                       metric             = mean_pinball_loss_q10,
                       initial_train_size = int(len(data_train)),
                       return_best        = True,
                       verbose            = False
                   )

results_grid_q90 = grid_search_forecaster(
                       forecaster         = forecaster_q90,
                       y                  = data.loc[:end_validation, 'Demand'],
                       param_grid         = param_grid,
                       lags_grid          = lags_grid,
                       steps              = 7,
                       refit              = False,
                       metric             = mean_pinball_loss_q90,
                       initial_train_size = int(len(data_train)),
                       return_best        = True,
                       verbose            = False
                   )
Number of models compared: 12.
loop lags_grid: 100%|███████████████████████████████████████| 1/1 [00:05<00:00, 5.57s/it]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3 4 5 6 7]
  Parameters: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 500}
  Backtesting metric: 2713.0192469016706

Number of models compared: 12.
loop lags_grid: 100%|███████████████████████████████████████| 1/1 [00:05<00:00, 5.55s/it]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3 4 5 6 7]
  Parameters: {'learning_rate': 0.01, 'max_depth': 10, 'n_estimators': 500}
  Backtesting metric: 4094.3047516967745
Once the best hyperparameters have been found for each forecaster, a backtesting process is applied again using the test data.
# Backtesting on test data
# ==============================================================================
metric_q10, predictions_q10 = backtesting_forecaster(
                                  forecaster         = forecaster_q10,
                                  y                  = data['Demand'],
                                  initial_train_size = len(data_train) + len(data_val),
                                  steps              = 7,
                                  refit              = False,
                                  metric             = mean_pinball_loss_q10,
                                  verbose            = False
                              )

metric_q90, predictions_q90 = backtesting_forecaster(
                                  forecaster         = forecaster_q90,
                                  y                  = data['Demand'],
                                  initial_train_size = len(data_train) + len(data_val),
                                  steps              = 7,
                                  refit              = False,
                                  metric             = mean_pinball_loss_q90,
                                  verbose            = False
                              )
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(11, 3))
data.loc[end_validation:, 'Demand'].plot(ax=ax, label='Demand')
ax.fill_between(
    data.loc[end_validation:].index,
    predictions_q10['pred'],
    predictions_q90['pred'],
    color = 'deepskyblue',
    alpha = 0.3,
    label = '80% interval'
)
ax.yaxis.set_major_formatter(ticker.EngFormatter())
ax.set_ylabel('MW')
ax.set_title('Energy demand forecast')
ax.legend();
# Interval coverage
# ==============================================================================
inside_interval = np.where(
                      (data.loc[end_validation:, 'Demand'] >= predictions_q10['pred']) &
                      (data.loc[end_validation:, 'Demand'] <= predictions_q90['pred']),
                      True,
                      False
                  )
coverage = inside_interval.mean()
print(f"Coverage of the predicted interval: {100 * coverage}")
Coverage of the predicted interval: 75.82417582417582
After optimizing the hyper-parameters of each quantile forecaster, the coverage is closer to the expected one (80%).