model_selection

function
skforecast.model_selection.model_selection.backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, exog=None, refit=False, interval=None, n_boot=500, in_sample_residuals=True, set_out_sample_residuals=True, verbose=False)

Backtesting of forecaster model.

If refit is False, the model is trained only once using the initial_train_size first observations. If refit is True, the model is trained in each iteration increasing the training set.

Parameters
  • forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) Forecaster model.
  • y (pandas Series) Training time series values.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • initial_train_size (int, default `None`) Number of samples in the initial train split. If None and the forecaster is already trained, no initial training is done and all data is used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them. None is only allowed when refit is False.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • refit (bool, default `False`) Whether to re-fit the forecaster in each iteration.
  • interval (list, default `None`) Confidence of the estimated prediction interval, given as a sequence of percentiles to compute. Each percentile must be between 0 and 100 inclusive. If None, no intervals are estimated. Only available for forecasters of type ForecasterAutoreg and ForecasterAutoregCustom.
  • n_boot (int, default `500`) Number of bootstrapping iterations used to estimate prediction intervals.
  • in_sample_residuals (bool, default `True`) If True, residuals from the training data are used as proxy of prediction error to create prediction intervals.
  • set_out_sample_residuals (bool, default `True`) Save residuals generated during the cross-validation process as out of sample residuals. Ignored if forecaster is of class ForecasterAutoregMultiOutput.
  • verbose (bool, default `False`) Print number of folds and index of training and validation sets used for backtesting.
Returns
  • metric_value (numpy ndarray, shape (1,)) Value of the metric.
  • test_predictions (pandas DataFrame) Predicted values and, if interval is not None, their estimated interval. Column pred holds the predictions, column lower_bound the lower bound of the interval, and column upper_bound the upper bound of the interval.
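
The effect of refit can be illustrated with a small, dependency-free sketch. It does not call skforecast: `backtest_mean_model` is a hypothetical helper, and the "model" is a toy that always predicts the mean of its training data. A real autoregressive forecaster would presumably also use the most recent observations as lags for each prediction window, which this toy ignores. Only the training logic is shown: fit once when refit is False, or refit on a growing training set when refit is True.

```python
from statistics import fmean


def backtest_mean_model(y, initial_train_size, steps, refit=False):
    """Backtest a toy model that predicts the mean of its training data.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    # Initial fit on the first `initial_train_size` observations.
    fitted_mean = fmean(y[:initial_train_size])
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        if refit:
            # Retrain on everything observed so far: the training set grows.
            fitted_mean = fmean(y[:start])
        # The last window may be shorter than `steps`.
        horizon = min(steps, len(y) - start)
        predictions.extend([fitted_mean] * horizon)
    actual = y[initial_train_size:]
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predictions)]
    return fmean(squared_errors), predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mse_fixed, preds_fixed = backtest_mean_model(y, initial_train_size=4, steps=2, refit=False)
mse_refit, preds_refit = backtest_mean_model(y, initial_train_size=4, steps=2, refit=True)
print(preds_fixed, mse_fixed)
print(preds_refit, mse_refit)
```

On this trending toy series the refitted model tracks the data more closely, at the cost of one extra fit per iteration.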

function
skforecast.model_selection.model_selection.cv_forecaster(forecaster, y, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, set_out_sample_residuals=True, verbose=True)

Cross-validation of forecaster. The order of data is maintained and the training set increases in each iteration.

Parameters
  • forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) Forecaster model.
  • y (pandas Series) Training time series values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • allow_incomplete_fold (bool, default `True`) The last test partition is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
  • set_out_sample_residuals (bool, default `True`) Save residuals generated during the cross-validation process as out of sample residuals.
  • verbose (bool, default `True`) Print number of folds used for cross validation.
Returns
  • cv_metrics (1d numpy ndarray) Value of the metric for each fold.
  • predictions (pandas DataFrame) Predictions.

function
skforecast.model_selection.model_selection.grid_search_forecaster(forecaster, y, param_grid, initial_train_size, steps, metric, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)

Exhaustive search over specified parameter values for a Forecaster object. Validation is done using time series backtesting.

Parameters
  • forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) Forecaster model.
  • y (pandas Series) Training time series values.
  • param_grid (dict) Dictionary with parameters names (str) as keys and lists of parameter settings to try as values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • lags_grid (list of int, lists, np.ndarray or range, default `None`) Lists of lags to try. Only used if forecaster is an instance of ForecasterAutoreg.
  • refit (bool, default `False`) Whether to re-fit the forecaster in each iteration of backtesting.
  • return_best (bool, default `True`) Refit the forecaster using the best found parameters on the whole data.
  • verbose (bool, default `True`) Print number of folds used for cv or backtesting.
Returns
  • results (pandas DataFrame) Metric value estimated for each combination of parameters.
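
To get a feel for how many candidates an exhaustive search evaluates, the sketch below enumerates every pairing of a lag setting with a parameter setting. It is a standalone illustration, not the library's code: `expand_grid` is a hypothetical helper, and the hyperparameter names `n_estimators` and `max_depth` are assumed examples for some tree-based regressor.

```python
from itertools import product


def expand_grid(param_grid, lags_grid):
    """Return every (lags, params) candidate of an exhaustive search.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    names = list(param_grid)
    candidates = []
    for lags in lags_grid:
        # Cartesian product of all value lists, in the order of `names`.
        for values in product(*(param_grid[name] for name in names)):
            candidates.append((lags, dict(zip(names, values))))
    return candidates


# Assumed example grids: two lag settings and 2 x 3 parameter settings.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, 5, 10]}
lags_grid = [3, [1, 2, 7]]

candidates = expand_grid(param_grid, lags_grid)
print(len(candidates))  # 2 * 2 * 3 = 12 candidates
```

Each candidate is validated by backtesting, so the total cost grows with the product of all list lengths, and further with the number of backtesting iterations when refit is True.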

generator
skforecast.model_selection.model_selection.time_series_spliter(y, initial_train_size, steps, allow_incomplete_fold=True, verbose=True)

Split indices of a time series into multiple train-test pairs. The order of the data is maintained and the training set increases in each iteration.

Parameters
  • y (1d numpy ndarray, pandas Series) Training time series values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • allow_incomplete_fold (bool, default `True`) The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
  • verbose (bool, default `True`) Print number of splits created.
Yields
  • train (1d numpy ndarray) Training indices.
  • test (1d numpy ndarray) Test indices.
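
The splitting scheme described above can be sketched as a short generator. This is an illustration under assumptions, not the library's implementation: `expanding_window_split` is a hypothetical name, it takes the number of observations instead of the series itself, and it yields plain Python lists rather than numpy arrays to stay dependency-free.

```python
def expanding_window_split(n_obs, initial_train_size, steps, allow_incomplete_fold=True):
    """Yield (train, test) index lists with a growing training set.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    start = initial_train_size
    while start < n_obs:
        end = min(start + steps, n_obs)
        if end - start < steps and not allow_incomplete_fold:
            # The remaining observations cannot fill a whole fold: discard them.
            break
        # Train on everything before `start`, test on the next `steps` points.
        yield list(range(start)), list(range(start, end))
        start = end


for train, test in expanding_window_split(n_obs=10, initial_train_size=4, steps=4):
    print(f"train=0..{train[-1]}  test={test}")
```

With 10 observations, an initial training size of 4 and 4 steps, the second test fold holds only two indices. Passing `allow_incomplete_fold=False` drops that fold and leaves a single train-test pair.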