The skforecast library combines a grid search strategy with backtesting to identify the combination of lags and hyperparameters that achieves the best prediction performance.
# Grid search hyperparameter and lags
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 12 # Placeholder, the value will be overwritten
             )

# Regressor hyperparameters
param_grid = {'n_estimators': [50, 100],
              'max_depth': [5, 10, 15]}

# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]

results_grid = grid_search_forecaster(
                   forecaster         = forecaster,
                   y                  = data.loc[:'2006-01-01'],
                   param_grid         = param_grid,
                   lags_grid          = lags_grid,
                   steps              = 12,
                   refit              = True,
                   metric             = 'mean_squared_error',
                   initial_train_size = len(data_train),
                   return_best        = True,
                   verbose            = False
               )
results_grid
| lags | params | metric | max_depth | n_estimators |
|---|---|---|---|---|
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 5, 'n_estimators': 50} | 0.0334486 | 5 | 50 |
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 10, 'n_estimators': 50} | 0.0392212 | 10 | 50 |
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 15, 'n_estimators': 100} | 0.0392658 | 15 | 100 |
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 5, 'n_estimators': 100} | 0.0395258 | 5 | 100 |
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 10, 'n_estimators': 100} | 0.0402408 | 10 | 100 |
| [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 15, 'n_estimators': 50} | 0.0407645 | 15 | 50 |
| [ 1 2 3 20] | {'max_depth': 15, 'n_estimators': 100} | 0.0439092 | 15 | 100 |
| [ 1 2 3 20] | {'max_depth': 5, 'n_estimators': 100} | 0.0449923 | 5 | 100 |
| [ 1 2 3 20] | {'max_depth': 5, 'n_estimators': 50} | 0.0462237 | 5 | 50 |
| [1 2 3] | {'max_depth': 5, 'n_estimators': 50} | 0.0486662 | 5 | 50 |
| [ 1 2 3 20] | {'max_depth': 10, 'n_estimators': 100} | 0.0489914 | 10 | 100 |
| [ 1 2 3 20] | {'max_depth': 10, 'n_estimators': 50} | 0.0501932 | 10 | 50 |
| [1 2 3] | {'max_depth': 15, 'n_estimators': 100} | 0.0505563 | 15 | 100 |
| [ 1 2 3 20] | {'max_depth': 15, 'n_estimators': 50} | 0.0512172 | 15 | 50 |
| [1 2 3] | {'max_depth': 5, 'n_estimators': 100} | 0.0531229 | 5 | 100 |
| [1 2 3] | {'max_depth': 15, 'n_estimators': 50} | 0.0602604 | 15 | 50 |
| [1 2 3] | {'max_depth': 10, 'n_estimators': 50} | 0.0609513 | 10 | 50 |
| [1 2 3] | {'max_depth': 10, 'n_estimators': 100} | 0.0673343 | 10 | 100 |
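The results contain 18 rows because every set of lags is paired with every hyperparameter combination. The sketch below enumerates those candidates using only the standard library. It is an illustration of how the grid expands, not skforecast's internal code, and the helpers `expand_lags` and `candidate_configurations` are hypothetical names introduced here. The rule that an integer n stands for lags 1 to n is inferred from the results, where 10 appears as [1 ... 10] and 3 as [1 2 3].

```python
from itertools import product

# Values copied from the grid search call above.
param_grid = {'n_estimators': [50, 100], 'max_depth': [5, 10, 15]}
lags_grid = [3, 10, [1, 2, 3, 20]]


def expand_lags(lags):
    """Hypothetical helper: an integer n stands for lags 1..n, a list is kept as is."""
    return list(range(1, lags + 1)) if isinstance(lags, int) else list(lags)


def candidate_configurations(param_grid, lags_grid):
    """Return every (lags, params) pair that a grid search has to evaluate."""
    names = list(param_grid)
    param_combinations = [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]
    return [
        (expand_lags(lags), params)
        for lags in lags_grid
        for params in param_combinations
    ]


candidates = candidate_configurations(param_grid, lags_grid)

# 3 lag sets x 2 values of n_estimators x 3 values of max_depth
print(len(candidates))
for lags, params in candidates[:3]:
    print(lags, params)
```

Since each candidate is evaluated by backtesting, with the model refitted in every fold when refit=True, the cost of the search grows quickly with the size of both grids.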
If return_best = True, the forecaster is retrained with the best combination of lags and hyperparameters found, using all the data passed to the search (here, the series up to 2006-01-01).
==========================================================================
<class 'skforecast.ForecasterAutoreg.ForecasterAutoreg.ForecasterAutoreg'>
==========================================================================
Regressor: RandomForestRegressor(max_depth=5, n_estimators=50, random_state=123)
Lags: [ 1 2 3 4 5 6 7 8 9 10]
Window size: 10
Included exogenous: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2006-01-01 00:00:00')]
Training index type: <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Training index frequancy: MS
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': 5, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 50, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False}
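To make the search-then-refit idea concrete, here is a minimal, self-contained sketch of grid search with backtesting. It is not skforecast's implementation. It assumes a toy forecaster that predicts the mean of the lagged values, searches over lags only (no regressor hyperparameters), and uses a synthetic series. All function names (`mean_of_lags_forecast`, `backtest_mse`, `toy_grid_search`) are hypothetical.

```python
from statistics import mean


def mean_of_lags_forecast(history, lags, steps):
    """Toy recursive forecaster: each prediction is the mean of the lagged values."""
    values = list(history)
    predictions = []
    for _ in range(steps):
        prediction = mean(values[-lag] for lag in lags)
        predictions.append(prediction)
        values.append(prediction)  # predictions are fed back in as new history
    return predictions


def backtest_mse(y, lags, initial_train_size, steps):
    """Backtest with an expanding history window and return the mean squared error."""
    squared_errors = []
    for start in range(initial_train_size, len(y), steps):
        horizon = min(steps, len(y) - start)
        predictions = mean_of_lags_forecast(y[:start], lags, horizon)
        actuals = y[start:start + horizon]
        squared_errors.extend((a - p) ** 2 for a, p in zip(actuals, predictions))
    return mean(squared_errors)


def toy_grid_search(y, lags_grid, initial_train_size, steps):
    """Score every lag set by backtesting and sort the results, lowest error first."""
    results = [
        {'lags': lags, 'metric': backtest_mse(y, lags, initial_train_size, steps)}
        for lags in lags_grid
    ]
    return sorted(results, key=lambda row: row['metric'])


# Synthetic series with a period of 4, used only for illustration
y = [10, 20, 30, 40] * 6

results = toy_grid_search(
    y, lags_grid=[[1, 2, 3], [4], [1, 4]], initial_train_size=12, steps=4
)
best = results[0]
print(best)

# Analogue of return_best=True: reuse the winning lags with all of y as history
print(mean_of_lags_forecast(y, best['lags'], steps=4))
```

Because the toy model has no fitted parameters, "retraining" here only means forecasting from the full series with the selected lags. With a real regressor, this last step is where the model would be fitted again on all the data.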