Stacking ensemble machine learning models to improve forecasting¶
In machine learning, stacking is an ensemble technique that combines multiple models to reduce their biases and improve predictive performance. Specifically, the predictions of each model (base models) are stacked and used as input to a final model (metamodel) to compute the prediction.
Stacking is effective because it leverages the strengths of different algorithms and attempts to mitigate their individual weaknesses. By combining multiple models, it can capture complex patterns in the data and improve prediction accuracy.
However, stacking can be computationally expensive and requires careful tuning to avoid overfitting. To mitigate the risk of overfitting, it is highly recommended to train the final estimator on out-of-fold predictions obtained through cross-validation. In addition, obtaining diverse and well-performing base models is critical to the success of the stacking technique.
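The mechanics behind this recommendation can be sketched by hand with plain scikit-learn: each base model produces out-of-fold predictions via cross-validation, those predictions become the training matrix of the metamodel, and the base models are then refitted on the full training set. The following minimal sketch uses synthetic data and illustrative variable names; it only shows the idea, not the workflow used in the rest of this document.
# Manual sketch of the stacking logic (illustrative, synthetic data)
# ==============================================================================
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.1, size=200)

base_models = [Ridge(alpha=0.001), LGBMRegressor(n_estimators=100, verbose=-1)]

# Out-of-fold predictions of each base model are the metamodel's features,
# which avoids leaking the base models' training data into the metamodel.
oof_predictions = np.column_stack(
    [cross_val_predict(model, X, y, cv=5) for model in base_models]
)

# The metamodel learns how to combine the base models' predictions
metamodel = Ridge()
metamodel.fit(oof_predictions, y)

# Base models are refitted on all the data before predicting new samples
for model in base_models:
    model.fit(X, y)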
With scikit-learn, it is very easy to combine multiple regressors thanks to the `StackingRegressor` class. The `estimators` parameter corresponds to the list of estimators (base learners) that will be stacked in parallel on the input data. The `final_estimator` (metamodel) will use the predictions of those estimators as input.
✎ Note
See Stacking (ensemble) machine learning models to improve forecasting for a more detailed example of stacking models.
Libraries and Data¶
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
# Modelling and Forecasting
# ==============================================================================
from lightgbm import LGBMRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import StackingRegressor
from sklearn.model_selection import KFold
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster
from skforecast.model_selection import grid_search_forecaster
# Warnings
# ==============================================================================
import warnings
warnings.filterwarnings('ignore')
# Data
# ==============================================================================
data = fetch_dataset(name = 'fuel_consumption')
data = data.loc[:"2019-01-01", ['Gasolinas']]
data = data.rename(columns = {'Gasolinas': 'consumption'})
data.index.name = 'date'
data['consumption'] = data['consumption'] / 100000
data.head(3)
fuel_consumption
----------------
Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01. Obtained from
Corporación de Reservas Estratégicas de Productos Petrolíferos and Corporación
de Derecho Público tutelada por el Ministerio para la Transición Ecológica y el
Reto Demográfico. https://www.cores.es/es/estadisticas
Shape of the dataset: (644, 5)
| date | consumption |
|---|---|
| 1969-01-01 | 1.668752 |
| 1969-02-01 | 1.554668 |
| 1969-03-01 | 1.849837 |
In addition to the past values of the series (lags), a variable indicating the month of the year is included as an exogenous feature to help the model capture the seasonality of the series.
# Calendar features
# ==============================================================================
data['month_of_year'] = data.index.month
data.head(3)
| date | consumption | month_of_year |
|---|---|---|
| 1969-01-01 | 1.668752 | 1 |
| 1969-02-01 | 1.554668 | 2 |
| 1969-03-01 | 1.849837 | 3 |
To facilitate the training of the models, the search for optimal hyperparameters, and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation, and test.
# Split train-validation-test
# ==============================================================================
end_train = '2007-12-01 23:59:00'
end_validation = '2012-12-01 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
Dates train : 1969-01-01 00:00:00 --- 2007-12-01 00:00:00 (n=468) Dates validacion : 2008-01-01 00:00:00 --- 2012-12-01 00:00:00 (n=60) Dates test : 2013-01-01 00:00:00 --- 2019-01-01 00:00:00 (n=73)
StackingRegressor¶
With scikit-learn it is very easy to combine multiple regressors thanks to the `StackingRegressor` class. The `estimators` parameter corresponds to the list of estimators (base learners) that will be stacked in parallel on the input data, specified as a list of (name, estimator) tuples. The `final_estimator` (metamodel) will use the predictions of those estimators as input.
# Create stacking regressor
# ==============================================================================
params_ridge = {'alpha': 0.001}
params_lgbm = {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 500, 'verbose': -1}
estimators = [
('ridge', Ridge(**params_ridge)),
('lgbm', LGBMRegressor(random_state=42, **params_lgbm)),
]
stacking_regressor = StackingRegressor(
estimators = estimators,
final_estimator = Ridge(),
cv = KFold(n_splits=5, shuffle=False)
)
stacking_regressor
StackingRegressor(cv=KFold(n_splits=5, random_state=None, shuffle=False),
                  estimators=[('ridge', Ridge(alpha=0.001)),
                              ('lgbm', LGBMRegressor(max_depth=5, n_estimators=500,
                                                     random_state=42, verbose=-1))],
                  final_estimator=Ridge())
# Create forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = stacking_regressor,
lags = 12 # Last 12 months used as predictors
)
# Backtesting on test data
# ==============================================================================
cv = TimeSeriesFold(
steps = 12, # Forecast horizon
initial_train_size = len(data.loc[:end_validation]),
refit = False,
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['consumption'],
exog = data['month_of_year'],
cv = cv,
metric = 'mean_squared_error',
n_jobs = 'auto',
verbose = False
)
metric
| | mean_squared_error |
|---|---|
| 0 | 0.053776 |
Hyperparameter search of the StackingRegressor¶
When using a `StackingRegressor`, the hyperparameters of each regressor must be preceded by the name of the regressor followed by two underscores. For example, the `alpha` hyperparameter of the ridge regressor must be specified as `ridge__alpha`. The hyperparameters of the final estimator must be specified with the prefix `final_estimator__`.
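The same naming convention applies when setting parameters directly on the regressor. The snippet below is only an illustration (the values are arbitrary); `stacking_regressor.get_params().keys()` lists every available parameter name.
# Setting nested hyperparameters with the <name>__<parameter> convention
# ==============================================================================
stacking_regressor.set_params(
    ridge__alpha           = 1.0,   # illustrative value
    lgbm__n_estimators     = 200,   # illustrative value
    final_estimator__alpha = 0.5    # illustrative value
)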
# Grid search of hyperparameters and lags
# ==============================================================================
param_grid = {
'ridge__alpha': [0.1, 1, 10],
'lgbm__n_estimators': [100, 500],
'lgbm__max_depth': [3, 5, 10],
'lgbm__learning_rate': [0.01, 0.1],
'final_estimator__alpha': [0.1, 1]
}
# Lags used as predictors
lags_grid = [24]
cv = TimeSeriesFold(
steps = 12,
initial_train_size = len(data.loc[:end_train]),
refit = False,
)
results_grid = grid_search_forecaster(
forecaster = forecaster,
y = data['consumption'],
exog = data['month_of_year'],
lags_grid = lags_grid,
param_grid = param_grid,
cv = cv,
metric = 'mean_squared_error',
return_best = True,
n_jobs = 'auto',
verbose = False
)
results_grid.head()
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
  Parameters: {'final_estimator__alpha': 1, 'lgbm__learning_rate': 0.01, 'lgbm__max_depth': 3, 'lgbm__n_estimators': 100, 'ridge__alpha': 0.1}
  Backtesting metric: 0.06635424584881097
| | lags | lags_label | params | mean_squared_error | final_estimator__alpha | lgbm__learning_rate | lgbm__max_depth | lgbm__n_estimators | ridge__alpha |
|---|---|---|---|---|---|---|---|---|---|
| 0 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'final_estimator__alpha': 1, 'lgbm__learning_... | 0.066354 | 1.0 | 0.01 | 3.0 | 100.0 | 0.1 |
| 1 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'final_estimator__alpha': 1, 'lgbm__learning_... | 0.068780 | 1.0 | 0.01 | 3.0 | 100.0 | 1.0 |
| 2 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'final_estimator__alpha': 0.1, 'lgbm__learnin... | 0.069371 | 0.1 | 0.01 | 3.0 | 100.0 | 0.1 |
| 3 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'final_estimator__alpha': 1, 'lgbm__learning_... | 0.069450 | 1.0 | 0.10 | 10.0 | 500.0 | 0.1 |
| 4 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'final_estimator__alpha': 1, 'lgbm__learning_... | 0.069464 | 1.0 | 0.01 | 5.0 | 100.0 | 0.1 |
Once the best lags and hyperparameters have been found for each regressor in the ensemble (and the forecaster updated with them, since `return_best=True`), the test error is estimated through backtesting on the test set.
# Backtesting on test data
# ==============================================================================
cv = TimeSeriesFold(
steps = 12,
initial_train_size = len(data.loc[:end_validation]),
refit = False,
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['consumption'],
exog = data['month_of_year'],
cv = cv,
metric = 'mean_squared_error',
n_jobs = 'auto',
verbose = False
)
metric
| | mean_squared_error |
|---|---|
| 0 | 0.013506 |
Feature importance in StackingRegressor¶
When a `StackingRegressor` is used as the regressor of a forecaster, the forecaster's `get_feature_importances` method will not work. This is because `StackingRegressor` objects have neither a `feature_importances_` nor a `coef_` attribute. Instead, each of the regressors that make up the stack has to be inspected individually.
# Feature importances for each regressor in the stacking
# ==============================================================================
if forecaster.regressor.__class__.__name__ == 'StackingRegressor':
    importancia_pred = []
    for regressor in forecaster.regressor.estimators_:
        try:
            # Linear models expose their coefficients in `coef_`
            importancia = pd.DataFrame(
                              data = {
                                  'feature': forecaster.regressor.feature_names_in_,
                                  f'importance_{type(regressor).__name__}': regressor.coef_,
                                  f'importance_abs_{type(regressor).__name__}': np.abs(regressor.coef_)
                              }
                          ).set_index('feature')
        except AttributeError:
            # Tree-based models expose `feature_importances_` instead
            importancia = pd.DataFrame(
                              data = {
                                  'feature': forecaster.regressor.feature_names_in_,
                                  f'importance_{type(regressor).__name__}': regressor.feature_importances_,
                                  f'importance_abs_{type(regressor).__name__}': np.abs(regressor.feature_importances_)
                              }
                          ).set_index('feature')
        importancia_pred.append(importancia)
    importancia_pred = pd.concat(importancia_pred, axis=1)
else:
    importancia_pred = forecaster.get_feature_importances()
    importancia_pred['importance_abs'] = importancia_pred['importance'].abs()
    importancia_pred = importancia_pred.sort_values(by='importance_abs', ascending=False)

importancia_pred.head(5)
| feature | importance_Ridge | importance_abs_Ridge | importance_LGBMRegressor | importance_abs_LGBMRegressor |
|---|---|---|---|---|
| lag_1 | 0.020984 | 0.020984 | 59 | 59 |
| lag_2 | 0.216998 | 0.216998 | 1 | 1 |
| lag_3 | 0.188519 | 0.188519 | 0 | 0 |
| lag_4 | 0.200916 | 0.200916 | 0 | 0 |
| lag_5 | 0.106734 | 0.106734 | 0 | 0 |
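The weight that the ensemble gives to each base model can also be read from the fitted metamodel. Because the final estimator here is a Ridge regression trained on the base models' predictions, its `coef_` attribute contains one coefficient per base learner, in the same order as `estimators`. A brief sketch, assuming the forecaster has already been fitted (as it has here after the grid search with `return_best=True`); the variable names `metamodel` and `weights` are illustrative.
# Metamodel coefficients: weight given to each base model's predictions
# ==============================================================================
metamodel = forecaster.regressor.final_estimator_
weights = pd.Series(
              data  = metamodel.coef_,
              index = [name for name, _ in forecaster.regressor.estimators]
          )
weights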