Hyperparameter tuning and lags selection¶
Hyperparameter tuning is a crucial aspect of developing accurate and effective machine learning models. In machine learning, hyperparameters are values that cannot be learned from data and must be set by the user before the model is trained. These hyperparameters can significantly impact the performance of the model, and tuning them carefully can improve its accuracy and generalization to new data. In the case of forecasting models, the lags included in the model can be considered as an additional hyperparameter.
Hyperparameter tuning involves systematically testing different values or combinations of hyperparameters (including lags) to find the optimal configuration that produces the best results. The Skforecast library offers various hyperparameter tuning strategies, including grid search, random search, and Bayesian search, that can be combined with backtesting to identify the optimal combination of lags and hyperparameters that achieve the best prediction performance.
Libraries¶
# Libraries
# ==============================================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import random_search_forecaster
from skforecast.model_selection import bayesian_search_forecaster
from sklearn.metrics import mean_squared_error
Data¶
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data[['y']]
data = data.sort_index()
# Train-val-test dates
# ==============================================================================
end_train = '2001-01-01 23:59:00'
end_val = '2006-01-01 23:59:00'
print(f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} (n={len(data.loc[:end_train])})")
print(f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()} (n={len(data.loc[end_train:end_val])})")
print(f"Test dates : {data.loc[end_val:].index.min()} --- {data.index.max()} (n={len(data.loc[end_val:])})")
# Plot
# ==============================================================================
fig, ax=plt.subplots(figsize=(7, 3))
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:end_val].plot(ax=ax, label='validation')
data.loc[end_val:].plot(ax=ax, label='test')
ax.legend();
Train dates      : 1991-07-01 00:00:00 --- 2001-01-01 00:00:00  (n=115)
Validation dates : 2001-02-01 00:00:00 --- 2006-01-01 00:00:00  (n=60)
Test dates       : 2006-02-01 00:00:00 --- 2008-06-01 00:00:00  (n=29)
Grid search¶
Grid search is a popular hyperparameter tuning technique that evaluates an exhaustive list of combinations of hyperparameters and lags to find the optimal configuration for a forecasting model. To perform a grid search with the Skforecast library, two grids are needed: one with different lags (`lags_grid`) and another with the hyperparameters (`param_grid`).
The grid search process involves the following steps:

1. `grid_search_forecaster` creates a copy of the forecaster object and replaces the `lags` argument with the first option appearing in `lags_grid`.

2. The function validates all combinations of hyperparameters presented in `param_grid` using backtesting.

3. The function repeats these two steps until it has evaluated all possible combinations of lags and hyperparameters.

4. If `return_best = True`, the original forecaster is trained with the best lags and hyperparameters configuration found during the grid search process.
# Grid search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 10 # Placeholder, the value will be overwritten
)
# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]
# Regressor hyperparameters
param_grid = {'n_estimators': [50, 100],
'max_depth': [5, 10, 15]}
results_grid = grid_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_val, 'y'],
param_grid = param_grid,
lags_grid = lags_grid,
steps = 12,
refit = True,
metric = 'mean_squared_error',
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
return_best = True,
verbose = False
)
Number of models compared: 18.
loop lags_grid: 100%|███████████████████████████████████████| 3/3 [00:13<00:00, 4.65s/it]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [ 1  2  3  4  5  6  7  8  9 10]
  Parameters: {'max_depth': 5, 'n_estimators': 50}
  Backtesting metric: 0.03344857370906804
results_grid
|    | lags                            | params                                 | mean_squared_error | max_depth | n_estimators |
|----|---------------------------------|----------------------------------------|--------------------|-----------|--------------|
| 6  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 50}   | 0.033449           | 5         | 50           |
| 8  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 10, 'n_estimators': 50}  | 0.039221           | 10        | 50           |
| 11 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 15, 'n_estimators': 100} | 0.039266           | 15        | 100          |
| 7  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 100}  | 0.039526           | 5         | 100          |
| 9  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 10, 'n_estimators': 100} | 0.040241           | 10        | 100          |
| 10 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 15, 'n_estimators': 50}  | 0.040765           | 15        | 50           |
| 17 | [1, 2, 3, 20]                   | {'max_depth': 15, 'n_estimators': 100} | 0.043909           | 15        | 100          |
| 13 | [1, 2, 3, 20]                   | {'max_depth': 5, 'n_estimators': 100}  | 0.044992           | 5         | 100          |
| 12 | [1, 2, 3, 20]                   | {'max_depth': 5, 'n_estimators': 50}   | 0.046224           | 5         | 50           |
| 0  | [1, 2, 3]                       | {'max_depth': 5, 'n_estimators': 50}   | 0.048666           | 5         | 50           |
| 15 | [1, 2, 3, 20]                   | {'max_depth': 10, 'n_estimators': 100} | 0.048991           | 10        | 100          |
| 14 | [1, 2, 3, 20]                   | {'max_depth': 10, 'n_estimators': 50}  | 0.050193           | 10        | 50           |
| 5  | [1, 2, 3]                       | {'max_depth': 15, 'n_estimators': 100} | 0.050556           | 15        | 100          |
| 16 | [1, 2, 3, 20]                   | {'max_depth': 15, 'n_estimators': 50}  | 0.051217           | 15        | 50           |
| 1  | [1, 2, 3]                       | {'max_depth': 5, 'n_estimators': 100}  | 0.053123           | 5         | 100          |
| 4  | [1, 2, 3]                       | {'max_depth': 15, 'n_estimators': 50}  | 0.060260           | 15        | 50           |
| 2  | [1, 2, 3]                       | {'max_depth': 10, 'n_estimators': 50}  | 0.060951           | 10        | 50           |
| 3  | [1, 2, 3]                       | {'max_depth': 10, 'n_estimators': 100} | 0.067334           | 10        | 100          |
Since `return_best = True`, the forecaster object is updated with the best configuration found and trained with the whole data set. This means that the final model obtained from grid search uses the combination of lags and hyperparameters that achieved the best backtesting metric. This final model can then be used for future predictions on new data, as sketched after the forecaster summary below.
forecaster
================= 
ForecasterAutoreg 
================= 
Regressor: RandomForestRegressor(max_depth=5, n_estimators=50, random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10] 
Transformer for y: None 
Transformer for exog: None 
Window size: 10 
Weight function included: False 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2006-01-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': 5, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 50, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
Creation date: 2023-04-07 00:01:41 
Last fit date: 2023-04-07 00:01:55 
Skforecast version: 0.7.0 
Python version: 3.10.0 
Forecaster id: None 
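Since the forecaster is already trained, it can produce forecasts beyond the data used in the search right away. A minimal sketch (the 12-step horizon is illustrative, not part of the original example):

# Forecast with the tuned forecaster (illustrative)
# ==============================================================================
# The forecaster was refitted on data up to end_val, so the predictions
# start right after 2006-01-01.
predictions = forecaster.predict(steps=12)
predictions.head(3)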
Random search¶
Random search is another hyperparameter tuning strategy available in the Skforecast library. In contrast to grid search, which tries out all possible combinations of hyperparameters and lags, random search samples a fixed number of candidates from the specified distributions. The number of hyperparameter combinations evaluated for each lags configuration is given by `n_iter`.
It is important to note that random sampling is only applied to the model hyperparameters, but not to the lags. All lags specified by the user are evaluated.
# Random search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 10 # Placeholder, the value will be overwritten
)
# Lags used as predictors
lags_grid = [3, 5]
# Regressor hyperparameters
param_distributions = {'n_estimators': np.arange(start=10, stop=100, step=1, dtype=int),
'max_depth': np.arange(start=5, stop=30, step=1, dtype=int)}
results = random_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_val, 'y'],
steps = 12,
lags_grid = lags_grid,
param_distributions = param_distributions,
n_iter = 5,
metric = 'mean_squared_error',
refit = True,
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
return_best = True,
random_state = 123,
verbose = False
)
results_grid.head(4)
Number of models compared: 10.
loop lags_grid: 100%|███████████████████████████████████████| 2/2 [00:07<00:00, 3.88s/it]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3 4 5]
  Parameters: {'n_estimators': 77, 'max_depth': 17}
  Backtesting metric: 0.03147248676391345
|    | lags                            | params                                 | mean_squared_error | max_depth | n_estimators |
|----|---------------------------------|----------------------------------------|--------------------|-----------|--------------|
| 6  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 50}   | 0.033449           | 5         | 50           |
| 8  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 10, 'n_estimators': 50}  | 0.039221           | 10        | 50           |
| 11 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 15, 'n_estimators': 100} | 0.039266           | 15        | 100          |
| 7  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 100}  | 0.039526           | 5         | 100          |
Bayesian search¶
Grid and random search can generate good results, especially when the search range is narrowed down. However, neither of them takes into account the results obtained so far, which prevents them from focusing the search on the regions of greatest interest while avoiding unnecessary ones.
An alternative is to use Bayesian optimization methods to search for hyperparameters. In general terms, Bayesian hyperparameter optimization consists of building a probabilistic model in which the objective function is the model validation metric (RMSE, AUC, accuracy, ...). With this strategy, the search is redirected at each iteration to the regions of greatest interest. The ultimate goal is to reduce the number of hyperparameter combinations with which the model is evaluated, choosing only the best candidates. This approach is particularly advantageous when the search space is very large or model evaluation is very slow.
It is worth noting that, in the context of skforecast, Bayesian search is applied only to the hyperparameters of the model, not to the lags; all lags specified by the user are evaluated.
Skforecast offers two Bayesian optimization engines: Scikit-Optimize and Optuna.
Optuna¶
In skforecast, Bayesian optimization with Optuna is performed using its Study object. The objective of the optimization is to minimize the metric generated by backtesting.
Additional parameters can be passed as dictionaries through the `kwargs_create_study` and `kwargs_study_optimize` arguments, which are forwarded to Optuna's `create_study` function and to the study's `optimize` method, respectively. These arguments are used to configure the study object and the optimization algorithm.
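For example, a deterministic sampler and a time budget can be configured as shown below. This is a minimal sketch: `TPESampler` and `timeout` are standard Optuna options, not part of the original example.

# Optional study and optimization settings (illustrative)
# ==============================================================================
import optuna

# Forwarded to optuna.create_study: TPE sampler with a fixed seed for reproducibility
kwargs_create_study = {'sampler': optuna.samplers.TPESampler(seed=123)}

# Forwarded to study.optimize: stop the search after 60 seconds
kwargs_study_optimize = {'timeout': 60}

These dictionaries can then be passed to `bayesian_search_forecaster` through the arguments of the same name (the example below passes empty dictionaries).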
To use Optuna in skforecast, the `search_space` argument must be a Python function that defines the hyperparameters to optimize. Optuna uses a Trial object to sample each search space. See the Study class for more details.
# Bayesian search hyperparameters and lags with Optuna
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 10 # Placeholder, the value will be overwritten
)
# Lags used as predictors
lags_grid = [3, 5]
# Regressor hyperparameters search space
def search_space(trial):
search_space = {'n_estimators' : trial.suggest_int('n_estimators', 10, 20),
                    'min_samples_leaf' : trial.suggest_int('min_samples_leaf', 1, 10),
'max_features' : trial.suggest_categorical('max_features', ['log2', 'sqrt'])
}
return search_space
results, frozen_trial = bayesian_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_val, 'y'],
lags_grid = lags_grid,
search_space = search_space,
steps = 12,
metric = 'mean_absolute_error',
refit = True,
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = True,
n_trials = 10,
random_state = 123,
return_best = False,
verbose = False,
engine = 'optuna',
kwargs_create_study = {},
kwargs_study_optimize = {}
)
results_grid.head(4)
Number of models compared: 20, 10 bayesian search in each lag configuration.
loop lags_grid: 100%|███████████████████████████████████████| 2/2 [00:03<00:00, 1.81s/it]
|    | lags                            | params                                 | mean_squared_error | max_depth | n_estimators |
|----|---------------------------------|----------------------------------------|--------------------|-----------|--------------|
| 6  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 50}   | 0.033449           | 5         | 50           |
| 8  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 10, 'n_estimators': 50}  | 0.039221           | 10        | 50           |
| 11 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 15, 'n_estimators': 100} | 0.039266           | 15        | 100          |
| 7  | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | {'max_depth': 5, 'n_estimators': 100}  | 0.039526           | 5         | 100          |
`frozen_trial` contains information about the trial that achieved the best results. See Optuna's Study class for more details.
frozen_trial
FrozenTrial(number=0, state=TrialState.COMPLETE, values=[0.15057479525044887], datetime_start=datetime.datetime(2023, 4, 7, 0, 2, 3, 982299), datetime_complete=datetime.datetime(2023, 4, 7, 0, 2, 4, 256723), params={'n_estimators': 17, 'min_samples_leaf': 3, 'max_features': 'sqrt'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'n_estimators': IntDistribution(high=20, log=False, low=10, step=1), 'min_samples_leaf': IntDistribution(high=10, log=False, low=1, step=1), 'max_features': CategoricalDistribution(choices=('log2', 'sqrt'))}, trial_id=0, value=None)
Scikit-optimize¶
Skopt performs Bayesian optimization with Gaussian processes. This is done with the gp_minimize function, where the objective value to be minimized is calculated by backtesting.
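The call mirrors the Optuna example, but with this engine the search space is defined with skopt.space objects instead of a function. A minimal sketch, assuming `engine = 'skopt'` is selected in `bayesian_search_forecaster` and that the hyperparameter ranges are expressed as `Integer` and `Categorical` dimensions (the ranges shown are illustrative):

# Bayesian search hyperparameters and lags with scikit-optimize (sketch)
# ==============================================================================
from skopt.space import Categorical, Integer

forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 10  # Placeholder, the value will be overwritten
             )

# Regressor hyperparameters search space (illustrative ranges)
search_space = {'n_estimators'    : Integer(10, 20, "uniform", name='n_estimators'),
                'min_samples_leaf': Integer(1, 10, "uniform", name='min_samples_leaf'),
                'max_features'    : Categorical(['log2', 'sqrt'], name='max_features')}

# The second returned element holds the engine's optimization details
results, optimize_results = bayesian_search_forecaster(
                                forecaster         = forecaster,
                                y                  = data.loc[:end_val, 'y'],
                                lags_grid          = [3, 5],
                                search_space       = search_space,
                                steps              = 12,
                                metric             = 'mean_absolute_error',
                                refit              = True,
                                initial_train_size = len(data.loc[:end_train]),
                                fixed_train_size   = True,
                                n_trials           = 10,
                                random_state       = 123,
                                return_best        = False,
                                verbose            = False,
                                engine             = 'skopt'
                            )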