model_selection
skforecast.model_selection.model_selection.backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, fixed_train_size=False, exog=None, refit=False, interval=None, n_boot=500, random_state=123, in_sample_residuals=True, verbose=False, set_out_sample_residuals='deprecated')
Backtesting of forecaster model. If `refit` is False, the model is trained only once using the first `initial_train_size` observations. If `refit` is True, the model is trained in each iteration, increasing the training set. A copy of the original forecaster is created so it is not modified during the process.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`forecaster` | ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput | Forecaster model. | required |
`y` | pandas Series | Training time series values. | required |
`initial_train_size` | Optional[int] | Number of samples in the initial train split. If `None` and `forecaster` is already trained, no initial train is done and all data is used to evaluate the model; the first `len(forecaster.last_window)` observations are still needed to create the initial predictors, so no predictions are calculated for them. `None` is only allowed when `refit` is False. | required |
`fixed_train_size` | bool | If True, the train size doesn't increase but moves by `steps` in each iteration. | `False` |
`steps` | int | Number of steps to predict. | required |
`metric` | str, callable | Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}. If callable: function with arguments y_true, y_pred that returns a float. | required |
`exog` | pandas Series, pandas DataFrame | Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i]. | `None` |
`refit` | bool | Whether to re-fit the forecaster in each iteration. | `False` |
`interval` | Optional[list] | Confidence of the prediction interval estimated. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. If `None`, no intervals are estimated. Only available for forecasters of type ForecasterAutoreg and ForecasterAutoregCustom. | `None` |
`n_boot` | int | Number of bootstrapping iterations used to estimate prediction intervals. | `500` |
`random_state` | int | Sets a seed for the random generator, so that bootstrap intervals are always deterministic. | `123` |
`in_sample_residuals` | bool | If `True`, residuals from the training data are used as a proxy of prediction error to create prediction intervals. If `False`, out-of-sample residuals are used if they are already stored inside the forecaster. | `True` |
`set_out_sample_residuals` | Any | Deprecated since version 0.4.2, will be removed in version 0.5.0. | `'deprecated'` |
`verbose` | bool | Print number of folds and index of training and validation sets used for backtesting. | `False` |

Returns:

Type | Description |
---|---|
Tuple[float, pandas DataFrame] | Value of the metric, and the predictions together with their estimated interval if `interval` is not `None` (columns `pred`, `lower_bound`, `upper_bound`). |
Source code in skforecast/model_selection/model_selection.py
def backtesting_forecaster(
    forecaster,
    y: pd.Series,
    steps: int,
    metric: Union[str, callable],
    initial_train_size: Optional[int],
    fixed_train_size: bool=False,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: bool=False,
    interval: Optional[list]=None,
    n_boot: int=500,
    random_state: int=123,
    in_sample_residuals: bool=True,
    verbose: bool=False,
    set_out_sample_residuals: Any='deprecated'
) -> Tuple[float, pd.DataFrame]:
    '''
    Backtesting of forecaster model.

    If `refit` is False, the model is trained only once using the `initial_train_size`
    first observations. If `refit` is True, the model is trained in each iteration,
    increasing the training set. A copy of the original forecaster is created so
    it is not modified during the process.

    Parameters
    ----------
    forecaster : ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput
        Forecaster model.
    y : pandas Series
        Training time series values.
    initial_train_size : int, default `None`
        Number of samples in the initial train split. If `None` and `forecaster` is already
        trained, no initial train is done and all data is used to evaluate the model. However,
        the first `len(forecaster.last_window)` observations are needed to create the
        initial predictors, so no predictions are calculated for them.
        `None` is only allowed when `refit` is False.
    fixed_train_size : bool, default `False`
        If True, the train size doesn't increase but moves by `steps` in each iteration.
    steps : int
        Number of steps to predict.
    metric : str, callable
        Metric used to quantify the goodness of fit of the model.
        If string:
            {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}
        If callable:
            Function with arguments y_true, y_pred that returns a float.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].
    refit : bool, default `False`
        Whether to re-fit the forecaster in each iteration.
    interval : list, default `None`
        Confidence of the prediction interval estimated. Sequence of percentiles
        to compute, which must be between 0 and 100 inclusive. If `None`, no
        intervals are estimated. Only available for forecasters of type ForecasterAutoreg
        and ForecasterAutoregCustom.
    n_boot : int, default `500`
        Number of bootstrapping iterations used to estimate prediction
        intervals.
    random_state : int, default `123`
        Sets a seed for the random generator, so that bootstrap intervals are always
        deterministic.
    in_sample_residuals : bool, default `True`
        If `True`, residuals from the training data are used as a proxy of
        prediction error to create prediction intervals. If `False`, out-of-sample
        residuals are used if they are already stored inside the forecaster.
    set_out_sample_residuals : 'deprecated'
        Deprecated since version 0.4.2, will be removed in version 0.5.0.
    verbose : bool, default `False`
        Print number of folds and index of training and validation sets used for backtesting.

    Returns
    -------
    metric_value : float
        Value of the metric.
    backtest_predictions : pandas DataFrame
        Value of predictions and their estimated interval if `interval` is not `None`.
            column pred = predictions.
            column lower_bound = lower bound of the interval.
            column upper_bound = upper bound of the interval.
    '''

    if initial_train_size is not None and initial_train_size > len(y):
        raise Exception(
            'If used, `initial_train_size` must be smaller than length of `y`.'
        )

    if initial_train_size is not None and initial_train_size < forecaster.window_size:
        raise Exception(
            f"`initial_train_size` must be greater than "
            f"forecaster's window_size ({forecaster.window_size})."
        )

    if initial_train_size is None and not forecaster.fitted:
        raise Exception(
            '`forecaster` must be already trained if no `initial_train_size` is provided.'
        )

    if not isinstance(refit, bool):
        raise Exception(
            '`refit` must be boolean: True, False.'
        )

    if initial_train_size is None and refit:
        raise Exception(
            '`refit` is only allowed when there is an `initial_train_size`.'
        )

    if interval is not None and isinstance(forecaster, ForecasterAutoregMultiOutput):
        raise Exception(
            ('Interval prediction is only available when forecaster is of type '
             'ForecasterAutoreg or ForecasterAutoregCustom.')
        )

    if set_out_sample_residuals != 'deprecated':
        warnings.warn(
            ('`set_out_sample_residuals` is deprecated since version 0.4.2, '
             'will be removed in version 0.5.0.')
        )

    if refit:
        metric_value, backtest_predictions = _backtesting_forecaster_refit(
            forecaster          = forecaster,
            y                   = y,
            steps               = steps,
            metric              = metric,
            initial_train_size  = initial_train_size,
            fixed_train_size    = fixed_train_size,
            exog                = exog,
            interval            = interval,
            n_boot              = n_boot,
            random_state        = random_state,
            in_sample_residuals = in_sample_residuals,
            verbose             = verbose
        )
    else:
        metric_value, backtest_predictions = _backtesting_forecaster_no_refit(
            forecaster          = forecaster,
            y                   = y,
            steps               = steps,
            metric              = metric,
            initial_train_size  = initial_train_size,
            exog                = exog,
            interval            = interval,
            n_boot              = n_boot,
            random_state        = random_state,
            in_sample_residuals = in_sample_residuals,
            verbose             = verbose
        )

    return metric_value, backtest_predictions
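The no-refit fold logic can be sketched with plain NumPy and scikit-learn. This is a simplified illustration of what happens internally (one model fit on the initial split, then recursive multi-step prediction in folds of `steps` observations, each fold restarting from true observed values), not the library's actual `_backtesting_forecaster_no_refit` implementation; the series, lag count, and split sizes are made-up toy values.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Toy series: linear trend plus a little noise
rng = np.random.default_rng(123)
y = pd.Series(np.arange(50, dtype=float) + rng.normal(scale=0.5, size=50))

lags = 3
initial_train_size = 30
steps = 5

# Lag matrix: row i holds (y[i-1], ..., y[i-lags]); the target is y[i]
X = np.column_stack([y.shift(k) for k in range(1, lags + 1)])[lags:]
target = y.to_numpy()[lags:]

# Train once on the initial split (refit=False behaviour)
model = LinearRegression().fit(X[:initial_train_size - lags],
                               target[:initial_train_size - lags])

# Predict the rest of the series in folds of `steps` observations.
# Within a fold, predictions are fed back recursively as lagged inputs;
# each new fold restarts from the true observed values.
preds = []
for start in range(initial_train_size, len(y), steps):
    window = list(y.to_numpy()[start - lags:start])
    for _ in range(min(steps, len(y) - start)):
        x_new = np.array(window[::-1]).reshape(1, -1)
        preds.append(model.predict(x_new)[0])
        window = window[1:] + [preds[-1]]

metric_value = mean_squared_error(y.to_numpy()[initial_train_size:], preds)
```

One prediction is produced for every observation after the initial split, so `metric_value` summarises out-of-sample performance over `len(y) - initial_train_size` points.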
skforecast.model_selection.model_selection.grid_search_forecaster(forecaster, y, param_grid, steps, metric, initial_train_size, fixed_train_size=False, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)
Exhaustive search over specified parameter values for a Forecaster object.
Validation is done using time series backtesting.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`forecaster` | ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput | Forecaster model. | required |
`y` | pandas Series | Training time series values. | required |
`param_grid` | dict | Dictionary with parameters names (`str`) as keys and lists of parameter settings to try as values. | required |
`steps` | int | Number of steps to predict. | required |
`metric` | str, callable | Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}. If callable: function with arguments y_true, y_pred that returns a float. | required |
`initial_train_size` | int | Number of samples in the initial train split. | required |
`fixed_train_size` | bool | If True, the train size doesn't increase but moves by `steps` in each iteration. | `False` |
`exog` | pandas Series, pandas DataFrame | Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i]. | `None` |
`lags_grid` | Optional[list] | Lists of `lags` to try. Only used if forecaster is an instance of `ForecasterAutoreg`. | `None` |
`refit` | bool | Whether to re-fit the forecaster in each iteration of backtesting. | `False` |
`return_best` | bool | Refit the `forecaster` using the best found parameters on the whole data. | `True` |
`verbose` | bool | Print number of folds used for cv or backtesting. | `True` |

Returns:

Type | Description |
---|---|
pandas DataFrame | Metric value estimated for each combination of parameters. |
Source code in skforecast/model_selection/model_selection.py
def grid_search_forecaster(
    forecaster,
    y: pd.Series,
    param_grid: dict,
    steps: int,
    metric: Union[str, callable],
    initial_train_size: int,
    fixed_train_size: bool=False,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    lags_grid: Optional[list]=None,
    refit: bool=False,
    return_best: bool=True,
    verbose: bool=True
) -> pd.DataFrame:
    '''
    Exhaustive search over specified parameter values for a Forecaster object.
    Validation is done using time series backtesting.

    Parameters
    ----------
    forecaster : ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput
        Forecaster model.
    y : pandas Series
        Training time series values.
    param_grid : dict
        Dictionary with parameters names (`str`) as keys and lists of parameter
        settings to try as values.
    steps : int
        Number of steps to predict.
    metric : str, callable
        Metric used to quantify the goodness of fit of the model.
        If string:
            {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}
        If callable:
            Function with arguments y_true, y_pred that returns a float.
    initial_train_size : int
        Number of samples in the initial train split.
    fixed_train_size : bool, default `False`
        If True, the train size doesn't increase but moves by `steps` in each iteration.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].
    lags_grid : list of int, lists, numpy ndarray or range, default `None`
        Lists of `lags` to try. Only used if forecaster is an instance of
        `ForecasterAutoreg`.
    refit : bool, default `False`
        Whether to re-fit the forecaster in each iteration of backtesting.
    return_best : bool, default `True`
        Refit the `forecaster` using the best found parameters on the whole data.
    verbose : bool, default `True`
        Print number of folds used for cv or backtesting.

    Returns
    -------
    results : pandas DataFrame
        Metric value estimated for each combination of parameters.
    '''

    if isinstance(forecaster, ForecasterAutoregCustom):
        if lags_grid is not None:
            warnings.warn(
                '`lags_grid` ignored if forecaster is an instance of `ForecasterAutoregCustom`.'
            )
        lags_grid = ['custom predictors']
    elif lags_grid is None:
        lags_grid = [forecaster.lags]

    lags_list = []
    params_list = []
    metric_list = []

    param_grid = list(ParameterGrid(param_grid))

    print(
        f"Number of models compared: {len(param_grid)*len(lags_grid)}"
    )

    for lags in tqdm(lags_grid, desc='loop lags_grid', position=0, ncols=90):
        if isinstance(forecaster, (ForecasterAutoreg, ForecasterAutoregMultiOutput)):
            forecaster.set_lags(lags)
            lags = forecaster.lags.copy()
        for params in tqdm(param_grid, desc='loop param_grid', position=1, leave=False, ncols=90):
            forecaster.set_params(**params)
            metrics = backtesting_forecaster(
                forecaster         = forecaster,
                y                  = y,
                exog               = exog,
                steps              = steps,
                metric             = metric,
                initial_train_size = initial_train_size,
                fixed_train_size   = fixed_train_size,
                refit              = refit,
                interval           = None,
                verbose            = verbose
            )[0]
            lags_list.append(lags)
            params_list.append(params)
            metric_list.append(metrics)

    results = pd.DataFrame({
        'lags'  : lags_list,
        'params': params_list,
        'metric': metric_list
    })
    results = results.sort_values(by='metric', ascending=True)
    results = pd.concat([results, results['params'].apply(pd.Series)], axis=1)

    if return_best:
        best_lags = results['lags'].iloc[0]
        best_params = results['params'].iloc[0]
        best_metric = results['metric'].iloc[0]
        if isinstance(forecaster, (ForecasterAutoreg, ForecasterAutoregMultiOutput)):
            forecaster.set_lags(best_lags)
        forecaster.set_params(**best_params)
        forecaster.fit(y=y, exog=exog)
        print(
            f"`Forecaster` refitted using the best-found lags and parameters, "
            f"and the whole data set: \n"
            f"  Lags: {best_lags} \n"
            f"  Parameters: {best_params}\n"
            f"  Backtesting metric: {best_metric}\n"
        )

    return results
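How the results table is assembled and ranked can be sketched in isolation. The sketch below uses scikit-learn's `ParameterGrid` exactly as the source does, but the metric values are fabricated placeholders standing in for the `backtesting_forecaster(...)` call, and the search space (`alpha`, `fit_intercept`, the lag values) is hypothetical.

```python
import pandas as pd
from sklearn.model_selection import ParameterGrid

# Hypothetical search space: 2 lag configurations x 4 hyperparameter combinations
lags_grid = [3, 7]
param_grid = {'alpha': [0.01, 0.1], 'fit_intercept': [True, False]}

param_combos = list(ParameterGrid(param_grid))
print(f"Number of models compared: {len(param_combos) * len(lags_grid)}")  # prints 8

# Fabricated placeholder metrics; a real run calls backtesting_forecaster(...)
# once per (lags, params) pair and keeps the returned metric.
rows = [
    {'lags': lags, 'params': params,
     'metric': lags * 0.1 + params['alpha']}
    for lags in lags_grid
    for params in param_combos
]

# Assemble and rank results the same way the source does: sort by metric,
# then expand the params dict into one column per hyperparameter
results = pd.DataFrame(rows).sort_values(by='metric', ascending=True)
results = pd.concat([results, results['params'].apply(pd.Series)], axis=1)
```

With `return_best=True`, the real function then reads the first row of this ranked table and refits the forecaster on the whole series with those settings.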