Hyperparameter tuning and lags selection¶
Hyperparameter tuning is a crucial aspect of developing accurate and effective machine learning models. In machine learning, hyperparameters are values that cannot be learned from data and must be set by the user before the model is trained. These hyperparameters can significantly impact the performance of the model, and tuning them carefully can improve its accuracy and generalization to new data. In the case of forecasting models, the lags included in the model can be considered an additional hyperparameter.
Hyperparameter tuning involves systematically testing different values or combinations of hyperparameters (including lags) to find the optimal configuration that produces the best results. The skforecast library offers various hyperparameter tuning strategies, including grid search, random search, and Bayesian search, that can be combined with backtesting or one-step-ahead validation to identify the optimal combination of lags and hyperparameters that achieve the best prediction performance.
💡 Tip
The computational cost of hyperparameter tuning depends heavily on the backtesting approach chosen to evaluate each hyperparameter combination. In general, the duration of the tuning process increases with the number of re-trains involved in the backtesting.
To effectively speed up the prototyping phase, it is highly recommended to adopt a two-step strategy. First, use refit=False during the initial search to narrow down the range of values. Then, focus on the identified region of interest and apply a tailored backtesting strategy that meets the specific requirements of the use case.
For additional tips on backtesting strategies, refer to the following resource: Which backtesting strategy should I use?.
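As a rough sketch of this two-step strategy (it relies on the TimeSeriesFold class and the data and end_train objects defined later in this document), the two phases differ only in the cross-validation object passed to the search function:

# Two-step tuning strategy (sketch)
# ==============================================================================
# Step 1: fast exploratory search, no refit between folds.
cv_fast = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

# Step 2: re-evaluate the shortlisted candidates with a backtesting strategy
# closer to the intended use case, e.g. refitting the model at every fold.
cv_final = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = True
)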
✎ Note
All backtesting and grid search functions have been extended to include the n_jobs argument, allowing multi-process parallelization for improved performance. This applies to all functions of the different model_selection modules.
The benefits of parallelization depend on several factors, including the regressor used, the number of fits to be performed, and the volume of data involved. When the n_jobs parameter is set to 'auto', the level of parallelization is automatically selected based on heuristic rules that aim to choose the best option for each scenario.
For a more detailed look at parallelization, visit Parallelization in skforecast.
Libraries and data¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_squared_error
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import OneStepAheadFold
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import random_search_forecaster
from skforecast.model_selection import bayesian_search_forecaster
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "datetime"], "header": 0}
)
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y-%m-%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data[['y']]
data = data.sort_index()
# Train-val-test dates
# ==============================================================================
end_train = '2001-01-01 23:59:00'
end_val = '2006-01-01 23:59:00'
print(
    f"Train dates      : {data.index.min()} --- {data.loc[:end_train].index.max()}"
    f" (n={len(data.loc[:end_train])})"
)
print(
    f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()}"
    f" (n={len(data.loc[end_train:end_val])})"
)
print(
    f"Test dates       : {data.loc[end_val:].index.min()} --- {data.index.max()}"
    f" (n={len(data.loc[end_val:])})"
)
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:end_val].plot(ax=ax, label='validation')
data.loc[end_val:].plot(ax=ax, label='test')
ax.legend();
h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)

Train dates      : 1991-07-01 00:00:00 --- 2001-01-01 00:00:00  (n=115)
Validation dates : 2001-02-01 00:00:00 --- 2006-01-01 00:00:00  (n=60)
Test dates       : 2006-02-01 00:00:00 --- 2008-06-01 00:00:00  (n=29)
Grid search¶
Grid search is a popular hyperparameter tuning technique that evaluates an exhaustive set of combinations of hyperparameters and lags to find the optimal configuration for a forecasting model. To perform a grid search with the skforecast library, two grids are needed: one with different lags (lags_grid) and another with the hyperparameters (param_grid).
The grid search process involves the following steps:

1. grid_search_forecaster replaces the lags argument with the first option appearing in lags_grid.
2. The function validates all combinations of hyperparameters presented in param_grid using backtesting.
3. The function repeats these two steps until it has evaluated all possible combinations of lags and hyperparameters.
4. If return_best = True, the original forecaster is trained with the best lags and hyperparameters configuration found during the grid search process.
# Grid search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Lags used as predictors
lags_grid = {
    'lags_1': 3,
    'lags_2': 10,
    'lags_3': [1, 2, 3, 20]
}

# Regressor hyperparameters
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 15]
}

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results = grid_search_forecaster(
    forecaster    = forecaster,
    y             = data.loc[:end_val, 'y'],
    param_grid    = param_grid,
    lags_grid     = lags_grid,
    cv            = cv,
    metric        = 'mean_squared_error',
    return_best   = True,
    n_jobs        = 'auto',
    verbose       = False,
    show_progress = True
)
results
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3]
  Parameters: {'max_depth': 5, 'n_estimators': 100}
  Backtesting metric: 0.04387531272712768
  | lags | lags_label | params | mean_squared_error | max_depth | n_estimators |
---|---|---|---|---|---|---|
0 | [1, 2, 3] | lags_1 | {'max_depth': 5, 'n_estimators': 100} | 0.043875 | 5 | 100 |
1 | [1, 2, 3] | lags_1 | {'max_depth': 10, 'n_estimators': 100} | 0.043875 | 10 | 100 |
2 | [1, 2, 3] | lags_1 | {'max_depth': 15, 'n_estimators': 100} | 0.043875 | 15 | 100 |
3 | [1, 2, 3, 20] | lags_3 | {'max_depth': 15, 'n_estimators': 100} | 0.044074 | 15 | 100 |
4 | [1, 2, 3, 20] | lags_3 | {'max_depth': 10, 'n_estimators': 100} | 0.044074 | 10 | 100 |
5 | [1, 2, 3, 20] | lags_3 | {'max_depth': 5, 'n_estimators': 100} | 0.044074 | 5 | 100 |
6 | [1, 2, 3] | lags_1 | {'max_depth': 5, 'n_estimators': 50} | 0.045423 | 5 | 50 |
7 | [1, 2, 3] | lags_1 | {'max_depth': 15, 'n_estimators': 50} | 0.045423 | 15 | 50 |
8 | [1, 2, 3] | lags_1 | {'max_depth': 10, 'n_estimators': 50} | 0.045423 | 10 | 50 |
9 | [1, 2, 3, 20] | lags_3 | {'max_depth': 15, 'n_estimators': 50} | 0.046221 | 15 | 50 |
10 | [1, 2, 3, 20] | lags_3 | {'max_depth': 5, 'n_estimators': 50} | 0.046221 | 5 | 50 |
11 | [1, 2, 3, 20] | lags_3 | {'max_depth': 10, 'n_estimators': 50} | 0.046221 | 10 | 50 |
12 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 5, 'n_estimators': 100} | 0.047896 | 5 | 100 |
13 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 10, 'n_estimators': 100} | 0.047896 | 10 | 100 |
14 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 15, 'n_estimators': 100} | 0.047896 | 15 | 100 |
15 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 15, 'n_estimators': 50} | 0.051399 | 15 | 50 |
16 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 5, 'n_estimators': 50} | 0.051399 | 5 | 50 |
17 | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] | lags_2 | {'max_depth': 10, 'n_estimators': 50} | 0.051399 | 10 | 50 |
Since return_best = True
, the forecaster object is updated with the best configuration found and trained on the whole data set. This means that the final model obtained from the grid search uses the combination of lags and hyperparameters that achieved the best value of the chosen metric. This final model can then be used for future predictions on new data.
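Because the returned forecaster is already trained on the full data set, it can be used immediately to forecast future values; a short usage sketch:

# Predict with the tuned forecaster
# ==============================================================================
predictions = forecaster.predict(steps=12)  # forecast the next 12 months
predictions.head(3)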
forecaster
ForecasterRecursive
General Information
- Regressor: LGBMRegressor
- Lags: [1 2 3]
- Window features: None
- Window size: 3
- Exogenous included: False
- Weight function included: False
- Differentiation order: None
- Creation date: 2024-11-10 16:55:59
- Last fit date: 2024-11-10 16:56:01
- Skforecast version: 0.14.0
- Python version: 3.11.10
- Forecaster id: None
Exogenous Variables
- None
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2006-01-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Regressor Parameters
- {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 5, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 123, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}
Fit Kwargs
- {}
Random search¶
Random search is another hyperparameter tuning strategy available in the skforecast library. In contrast to grid search, which tries out all possible combinations of hyperparameters and lags, random search samples a fixed number of combinations from the specified distributions. The number of combinations evaluated is given by n_iter.
It is important to note that random sampling applies only to the model hyperparameters, not to the lags: all lag configurations specified by the user are evaluated.
# Random search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Lags used as predictors
lags_grid = [3, 5]

# Regressor hyperparameters
param_distributions = {
    'n_estimators': np.arange(start=10, stop=100, step=1, dtype=int),
    'max_depth': np.arange(start=5, stop=30, step=1, dtype=int)
}

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results = random_search_forecaster(
    forecaster          = forecaster,
    y                   = data.loc[:end_val, 'y'],
    lags_grid           = lags_grid,
    param_distributions = param_distributions,
    cv                  = cv,
    n_iter              = 5,
    metric              = 'mean_squared_error',
    return_best         = True,
    random_state        = 123,
    n_jobs              = 'auto',
    verbose             = False,
    show_progress       = True
)
results.head(4)
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3 4 5]
  Parameters: {'n_estimators': 96, 'max_depth': 19}
  Backtesting metric: 0.04313147793349785
  | lags | lags_label | params | mean_squared_error | n_estimators | max_depth |
---|---|---|---|---|---|---|
0 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'n_estimators': 96, 'max_depth': 19} | 0.043131 | 96 | 19 |
1 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'n_estimators': 94, 'max_depth': 28} | 0.043171 | 94 | 28 |
2 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'n_estimators': 77, 'max_depth': 17} | 0.043663 | 77 | 17 |
3 | [1, 2, 3] | [1, 2, 3] | {'n_estimators': 96, 'max_depth': 19} | 0.043868 | 96 | 19 |
Bayesian search¶
Grid and random search can generate good results, especially when the search range is narrowed down. However, neither of them takes into account the results obtained so far, which prevents them from focusing the search on the most promising regions and avoiding unnecessary evaluations.
An alternative is to use Bayesian optimization methods to search for hyperparameters. In general terms, Bayesian hyperparameter optimization consists of creating a probabilistic model in which the objective function is the model validation metric (RMSE, AUC, accuracy...). With this strategy, the search is redirected at each iteration to the regions of greatest interest. The ultimate goal is to reduce the number of hyperparameter combinations with which the model is evaluated, choosing only the best candidates. This approach is particularly advantageous when the search space is very large or the model evaluation is very slow.
⚠ Warning
lags_grid is no longer required when using bayesian_search_forecaster since skforecast 0.12.0. The lags argument is now included in the search_space. This allows the lags to be optimized along with the other hyperparameters of the regressor during the Bayesian search.
In skforecast, Bayesian optimization with Optuna is performed using its Study object. The objective of the optimization is to minimize the metric generated by backtesting.
Additional parameters can be included by passing dictionaries to the kwargs_create_study and kwargs_study_optimize arguments, which are forwarded to the create_study function and the optimize method, respectively. These arguments are used to configure the study object and the optimization algorithm.
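For example, a specific Optuna sampler or a time budget for the optimization can be configured this way. A minimal sketch (the TPESampler and the 60-second timeout are illustrative choices, not values used elsewhere in this document):

# Configuring the Optuna study and optimization (sketch)
# ==============================================================================
import optuna

kwargs_create_study   = {'sampler': optuna.samplers.TPESampler(seed=123)}
kwargs_study_optimize = {'timeout': 60}  # stop the optimization after 60 seconds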
To use Optuna in skforecast, the search_space argument must be a Python function that defines the hyperparameters to optimize over. Optuna uses a Trial object to generate each candidate from the search space.
# Bayesian search hyperparameters and lags with Optuna
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Search space
def search_space(trial):
    search_space = {
        'lags'            : trial.suggest_categorical('lags', [3, 5]),
        'n_estimators'    : trial.suggest_int('n_estimators', 10, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features'    : trial.suggest_categorical('max_features', ['log2', 'sqrt'])
    }
    return search_space

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results, best_trial = bayesian_search_forecaster(
    forecaster            = forecaster,
    y                     = data.loc[:end_val, 'y'],
    search_space          = search_space,
    cv                    = cv,
    metric                = 'mean_absolute_error',
    n_trials              = 10,
    random_state          = 123,
    return_best           = False,
    n_jobs                = 'auto',
    verbose               = False,
    show_progress         = True,
    kwargs_create_study   = {},
    kwargs_study_optimize = {}
)
results.head(4)
  | lags | params | mean_absolute_error | n_estimators | min_samples_leaf | max_features |
---|---|---|---|---|---|---|
0 | [1, 2, 3, 4, 5] | {'n_estimators': 19, 'min_samples_leaf': 3, 'm... | 0.126995 | 19 | 3 | sqrt |
1 | [1, 2, 3] | {'n_estimators': 15, 'min_samples_leaf': 4, 'm... | 0.153278 | 15 | 4 | sqrt |
2 | [1, 2, 3] | {'n_estimators': 13, 'min_samples_leaf': 3, 'm... | 0.160396 | 13 | 3 | sqrt |
3 | [1, 2, 3, 4, 5] | {'n_estimators': 14, 'min_samples_leaf': 5, 'm... | 0.172366 | 14 | 5 | log2 |
best_trial contains information about the trial that achieved the best results. See the Optuna Study class for more details.
# Optuna best trial in the study
# ==============================================================================
best_trial
FrozenTrial(number=7, state=1, values=[0.1269945910624239], datetime_start=datetime.datetime(2024, 11, 10, 16, 56, 2, 465549), datetime_complete=datetime.datetime(2024, 11, 10, 16, 56, 2, 526349), params={'lags': 5, 'n_estimators': 19, 'min_samples_leaf': 3, 'max_features': 'sqrt'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'lags': CategoricalDistribution(choices=(3, 5)), 'n_estimators': IntDistribution(high=20, log=False, low=10, step=1), 'min_samples_leaf': IntDistribution(high=10, log=False, low=1, step=1), 'max_features': CategoricalDistribution(choices=('log2', 'sqrt'))}, trial_id=7, value=None)
One-step-ahead validation¶
Hyperparameter and lag tuning involves systematically testing different values or combinations of hyperparameters (and/or lags) to find the optimal configuration that gives the best performance. The skforecast library provides two different methods to evaluate each candidate configuration:
- Backtesting: In this method, the model predicts several steps ahead in each iteration, using the same forecast horizon and retraining frequency strategy that would be used if the model were deployed. This simulates a real forecasting scenario where the model is retrained and updated over time. More information here.
- One-step-ahead: Evaluates the model using only one-step-ahead predictions. This method is faster because it requires fewer iterations, but it only tests the model's performance in the immediate next time step ($t+1$).
Each method uses a different evaluation strategy, so they may produce different results. However, in the long run, both methods are expected to converge to similar selections of optimal hyperparameters. Since the one-step-ahead method only assesses performance at the next time step, it is recommended to backtest the final model for a more accurate multi-step performance estimate.
💡 Tip
For a more detailed comparison of the results (execution time and metric) obtained with each strategy, visit Hyperparameters and lags search: backtesting vs one-step-ahead.
# Bayesian search with OneStepAheadFold
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Search space
def search_space(trial):
    search_space = {
        'lags'            : trial.suggest_categorical('lags', [3, 5]),
        'n_estimators'    : trial.suggest_int('n_estimators', 10, 20),
        'min_samples_leaf': trial.suggest_int('min_samples_leaf', 1, 10),
        'max_features'    : trial.suggest_categorical('max_features', ['log2', 'sqrt'])
    }
    return search_space

# Folds
cv = OneStepAheadFold(initial_train_size = len(data.loc[:end_train]))

results, best_trial = bayesian_search_forecaster(
    forecaster            = forecaster,
    y                     = data.loc[:end_val, 'y'],
    search_space          = search_space,
    cv                    = cv,
    metric                = 'mean_absolute_error',
    n_trials              = 10,
    random_state          = 123,
    return_best           = False,
    n_jobs                = 'auto',
    verbose               = False,
    show_progress         = True,
    kwargs_create_study   = {},
    kwargs_study_optimize = {}
)
results.head(4)
OneStepAheadValidationWarning: One-step-ahead predictions are used for faster model comparison, but they may not fully represent multi-step prediction performance. It is recommended to backtest the final model for a more accurate multi-step performance estimate. You can suppress this warning using: warnings.simplefilter('ignore', category=OneStepAheadValidationWarning)
  | lags | params | mean_absolute_error | n_estimators | min_samples_leaf | max_features |
---|---|---|---|---|---|---|
0 | [1, 2, 3, 4, 5] | {'n_estimators': 20, 'min_samples_leaf': 6, 'm... | 0.180137 | 20 | 6 | log2 |
1 | [1, 2, 3, 4, 5] | {'n_estimators': 14, 'min_samples_leaf': 5, 'm... | 0.180815 | 14 | 5 | log2 |
2 | [1, 2, 3, 4, 5] | {'n_estimators': 16, 'min_samples_leaf': 9, 'm... | 0.187584 | 16 | 9 | log2 |
3 | [1, 2, 3] | {'n_estimators': 14, 'min_samples_leaf': 7, 'm... | 0.188359 | 14 | 7 | log2 |
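Following the recommendation above, the configuration selected with the faster one-step-ahead search can be re-evaluated with backtesting before deployment. A minimal sketch, assuming the best_trial obtained in the previous search (note that the 'lags' entry must be separated from the regressor hyperparameters):

# Backtest the configuration selected by one-step-ahead validation (sketch)
# ==============================================================================
from skforecast.model_selection import backtesting_forecaster

regressor_params = {k: v for k, v in best_trial.params.items() if k != 'lags'}
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1, **regressor_params),
    lags      = best_trial.params['lags']
)
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)
metric_value, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y          = data.loc[:end_val, 'y'],
    cv         = cv,
    metric     = 'mean_absolute_error'
)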
Hyperparameter tuning with custom metric¶
In addition to commonly used metrics such as mean_squared_error, mean_absolute_error, and mean_absolute_percentage_error, users have the flexibility to define their own custom metric function, provided that it includes the arguments y_true (the true values of the series) and y_pred (the predicted values), and returns a numeric value (either a float or an int).
This flexibility enables users to evaluate the model's predictive performance in a wide range of scenarios, such as considering only certain months or days, excluding holidays, or focusing only on the last step of the predicted horizon.
The example below illustrates this: a 12-month horizon is forecasted, but the metric of interest considers only the last three months of each year. This is achieved by defining a custom metric function that takes into account only the relevant months, which is then passed as the metric argument of the search function.
# Custom metric
# ==============================================================================
def custom_metric(y_true, y_pred):
    """
    Calculate the mean squared error using only the predicted values of the last
    3 months of the year.
    """
    mask = y_true.index.month.isin([10, 11, 12])
    metric = mean_squared_error(y_true[mask], y_pred[mask])

    return metric
# Grid search hyperparameter and lags with custom metric
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]

# Regressor hyperparameters
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 15]
}

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results = grid_search_forecaster(
    forecaster    = forecaster,
    y             = data.loc[:end_val, 'y'],
    cv            = cv,
    param_grid    = param_grid,
    lags_grid     = lags_grid,
    metric        = custom_metric,
    return_best   = True,
    n_jobs        = 'auto',
    verbose       = False,
    show_progress = True
)
results.head(4)
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [ 1  2  3 20]
  Parameters: {'max_depth': 15, 'n_estimators': 100}
  Backtesting metric: 0.0681822427249296
  | lags | lags_label | params | custom_metric | max_depth | n_estimators |
---|---|---|---|---|---|---|
0 | [1, 2, 3, 20] | [1, 2, 3, 20] | {'max_depth': 15, 'n_estimators': 100} | 0.068182 | 15 | 100 |
1 | [1, 2, 3, 20] | [1, 2, 3, 20] | {'max_depth': 10, 'n_estimators': 100} | 0.068182 | 10 | 100 |
2 | [1, 2, 3, 20] | [1, 2, 3, 20] | {'max_depth': 5, 'n_estimators': 100} | 0.068182 | 5 | 100 |
3 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 5, 'n_estimators': 100} | 0.070472 | 5 | 100 |
Compare multiple metrics¶
All three functions (grid_search_forecaster, random_search_forecaster, and bayesian_search_forecaster) allow multiple metrics to be calculated for each forecaster configuration by providing a list. The list may include any combination of built-in metrics, such as mean_squared_error, mean_absolute_error, and mean_absolute_percentage_error, as well as user-defined custom metrics.
Note that if multiple metrics are specified, these functions select the best model based on the first metric in the list.
# Grid search hyperparameter and lags with multiple metrics
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]

# Regressor hyperparameters
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 15]
}

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results = grid_search_forecaster(
    forecaster    = forecaster,
    y             = data.loc[:end_val, 'y'],
    param_grid    = param_grid,
    lags_grid     = lags_grid,
    cv            = cv,
    metric        = ['mean_absolute_error', mean_squared_error, custom_metric],
    return_best   = True,
    n_jobs        = 'auto',
    verbose       = False,
    show_progress = True
)
results.head(4)
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3]
  Parameters: {'max_depth': 5, 'n_estimators': 100}
  Backtesting metric: 0.18359367014650177
  | lags | lags_label | params | mean_absolute_error | mean_squared_error | custom_metric | max_depth | n_estimators |
---|---|---|---|---|---|---|---|---|
0 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 5, 'n_estimators': 100} | 0.183594 | 0.043875 | 0.070472 | 5 | 100 |
1 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 10, 'n_estimators': 100} | 0.183594 | 0.043875 | 0.070472 | 10 | 100 |
2 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 15, 'n_estimators': 100} | 0.183594 | 0.043875 | 0.070472 | 15 | 100 |
3 | [1, 2, 3, 20] | [1, 2, 3, 20] | {'max_depth': 15, 'n_estimators': 100} | 0.184901 | 0.044074 | 0.068182 | 15 | 100 |
Compare multiple regressors¶
The grid search process can be easily extended to compare several machine learning models. This can be achieved with a simple for loop that iterates over each regressor and applies the grid_search_forecaster
function. This approach allows for a more thorough exploration and can help select the best model.
# Models to compare
from sklearn.ensemble import RandomForestRegressor
from lightgbm import LGBMRegressor
from sklearn.linear_model import Ridge
models = [
    RandomForestRegressor(random_state=123),
    LGBMRegressor(random_state=123, verbose=-1),
    Ridge(random_state=123)
]

# Hyperparameters to search for each model
param_grids = {
    'RandomForestRegressor': {'n_estimators': [50, 100], 'max_depth': [5, 15]},
    'LGBMRegressor': {'n_estimators': [20, 50], 'max_depth': [5, 10]},
    'Ridge': {'alpha': [0.01, 0.1, 1]}
}

# Lags used as predictors
lags_grid = [3, 5]

# Folds
cv = TimeSeriesFold(
    steps              = 3,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

df_results = pd.DataFrame()
for i, model in enumerate(models):

    print(f"Grid search for regressor: {model}")
    print("-------------------------")

    forecaster = ForecasterRecursive(
        regressor = model,
        lags      = 3
    )

    # Regressor hyperparameters
    param_grid = param_grids[list(param_grids)[i]]

    results = grid_search_forecaster(
        forecaster    = forecaster,
        y             = data.loc[:end_val, 'y'],
        param_grid    = param_grid,
        lags_grid     = lags_grid,
        cv            = cv,
        metric        = 'mean_squared_error',
        return_best   = False,
        n_jobs        = 'auto',
        verbose       = False,
        show_progress = True
    )

    # Create a column with model name
    results['model'] = list(param_grids)[i]

    df_results = pd.concat([df_results, results])

df_results = df_results.sort_values(by='mean_squared_error')
df_results.head(10)
df_results = df_results.sort_values(by='mean_squared_error')
df_results.head(10)
Grid search for regressor: RandomForestRegressor(random_state=123)
-------------------------
Grid search for regressor: LGBMRegressor(random_state=123, verbose=-1)
-------------------------
Grid search for regressor: Ridge(random_state=123)
-------------------------
  | lags | lags_label | params | mean_squared_error | max_depth | n_estimators | model | alpha |
---|---|---|---|---|---|---|---|---|
0 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'max_depth': 5, 'n_estimators': 50} | 0.050180 | 5.0 | 50.0 | LGBMRegressor | NaN |
1 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'max_depth': 10, 'n_estimators': 50} | 0.050180 | 10.0 | 50.0 | LGBMRegressor | NaN |
2 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 5, 'n_estimators': 50} | 0.050907 | 5.0 | 50.0 | LGBMRegressor | NaN |
3 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 10, 'n_estimators': 50} | 0.050907 | 10.0 | 50.0 | LGBMRegressor | NaN |
5 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 10, 'n_estimators': 20} | 0.056990 | 10.0 | 20.0 | LGBMRegressor | NaN |
4 | [1, 2, 3] | [1, 2, 3] | {'max_depth': 5, 'n_estimators': 20} | 0.056990 | 5.0 | 20.0 | LGBMRegressor | NaN |
7 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'max_depth': 10, 'n_estimators': 20} | 0.057542 | 10.0 | 20.0 | LGBMRegressor | NaN |
6 | [1, 2, 3, 4, 5] | [1, 2, 3, 4, 5] | {'max_depth': 5, 'n_estimators': 20} | 0.057542 | 5.0 | 20.0 | LGBMRegressor | NaN |
0 | [1, 2, 3] | [1, 2, 3] | {'alpha': 0.01} | 0.059814 | NaN | NaN | Ridge | 0.01 |
1 | [1, 2, 3] | [1, 2, 3] | {'alpha': 0.1} | 0.060078 | NaN | NaN | Ridge | 0.10 |
Saving results to file¶
The results of the hyperparameter search process can be saved to a file by setting the output_file argument to the desired path. The results are saved in tab-separated values (TSV) format and contain the hyperparameters, lags, and metrics of each configuration evaluated during the search.
The file is updated after each hyperparameter evaluation, so if the optimization is stopped partway through, the results obtained up to that point have already been stored. This can be useful for further analysis or to keep a record of the tuning process.
# Save results to file
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 10  # Placeholder, the value will be overwritten
)

# Lags used as predictors
lags_grid = [3, 10, [1, 2, 3, 20]]

# Regressor hyperparameters
param_grid = {
    'n_estimators': [50, 100],
    'max_depth': [5, 10, 15]
}

# Folds
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data.loc[:end_train]),
    refit              = False
)

results = grid_search_forecaster(
    forecaster    = forecaster,
    y             = data.loc[:end_val, 'y'],
    param_grid    = param_grid,
    lags_grid     = lags_grid,
    cv            = cv,
    metric        = 'mean_squared_error',
    return_best   = True,
    n_jobs        = 'auto',
    verbose       = False,
    show_progress = True,
    output_file   = "results_grid_search.txt"
)
`Forecaster` refitted using the best-found lags and parameters, and the whole data set:
  Lags: [1 2 3]
  Parameters: {'max_depth': 5, 'n_estimators': 100}
  Backtesting metric: 0.04387531272712768
# Read results file
# ==============================================================================
pd.read_csv("results_grid_search.txt", sep="\t")
  | lags | lags_label | params | mean_squared_error | max_depth | n_estimators |
---|---|---|---|---|---|---|
0 | [1 2 3] | [1 2 3] | {'max_depth': 5, 'n_estimators': 50} | 0.045423 | 5 | 50 |
1 | [1 2 3] | [1 2 3] | {'max_depth': 5, 'n_estimators': 100} | 0.043875 | 5 | 100 |
2 | [1 2 3] | [1 2 3] | {'max_depth': 10, 'n_estimators': 50} | 0.045423 | 10 | 50 |
3 | [1 2 3] | [1 2 3] | {'max_depth': 10, 'n_estimators': 100} | 0.043875 | 10 | 100 |
4 | [1 2 3] | [1 2 3] | {'max_depth': 15, 'n_estimators': 50} | 0.045423 | 15 | 50 |
5 | [1 2 3] | [1 2 3] | {'max_depth': 15, 'n_estimators': 100} | 0.043875 | 15 | 100 |
6 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 5, 'n_estimators': 50} | 0.051399 | 5 | 50 |
7 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 5, 'n_estimators': 100} | 0.047896 | 5 | 100 |
8 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 10, 'n_estimators': 50} | 0.051399 | 10 | 50 |
9 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 10, 'n_estimators': 100} | 0.047896 | 10 | 100 |
10 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 15, 'n_estimators': 50} | 0.051399 | 15 | 50 |
11 | [ 1 2 3 4 5 6 7 8 9 10] | [ 1 2 3 4 5 6 7 8 9 10] | {'max_depth': 15, 'n_estimators': 100} | 0.047896 | 15 | 100 |
12 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 5, 'n_estimators': 50} | 0.046221 | 5 | 50 |
13 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 5, 'n_estimators': 100} | 0.044074 | 5 | 100 |
14 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 10, 'n_estimators': 50} | 0.046221 | 10 | 50 |
15 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 10, 'n_estimators': 100} | 0.044074 | 10 | 100 |
16 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 15, 'n_estimators': 50} | 0.046221 | 15 | 50 |
17 | [ 1 2 3 20] | [ 1 2 3 20] | {'max_depth': 15, 'n_estimators': 100} | 0.044074 | 15 | 100 |