model_selection
skforecast.model_selection.model_selection.backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, exog=None, refit=False, interval=None, n_boot=500, in_sample_residuals=True, set_out_sample_residuals=True, verbose=False)
Backtesting of forecaster model.
If refit is False, the model is trained only once using the first initial_train_size observations. If refit is True, the model is trained in each iteration, increasing the training set.
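The effect of refit on the training window can be sketched with plain index ranges. This is a minimal illustration, not the library's implementation: the helper name backtest_folds is hypothetical, and it assumes that a shorter final test fold is kept.

```python
def backtest_folds(n_obs, initial_train_size, steps, refit=False):
    """Return (train_range, test_range) pairs, one per backtesting fold."""
    folds = []
    test_start = initial_train_size
    while test_start < n_obs:
        # Assumption: the last fold may hold fewer than `steps` observations.
        test_end = min(test_start + steps, n_obs)
        # With refit, training data grows up to the start of the test fold;
        # without it, only the first initial_train_size points are ever used.
        train_end = test_start if refit else initial_train_size
        folds.append((range(0, train_end), range(test_start, test_end)))
        test_start = test_end
    return folds


folds_fixed = backtest_folds(n_obs=10, initial_train_size=4, steps=3, refit=False)
folds_refit = backtest_folds(n_obs=10, initial_train_size=4, steps=3, refit=True)

for train, test in folds_refit:
    print(f"train 0..{train.stop - 1}  test {test.start}..{test.stop - 1}")
```

With refit=False both folds reuse the same four training observations; with refit=True the second fold trains on seven.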
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): Forecaster model.
y (pandas Series): Training time series values.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
initial_train_size (int, default `None`): Number of samples in the initial train split. If None and forecaster is already trained, no initial training is done and all data is used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors, so no predictions are calculated for them. None is only allowed when refit is False.
exog (pandas Series, pandas DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
refit (bool, default `False`): Whether to re-fit the forecaster in each iteration.
interval (list, default `None`): Confidence of the estimated prediction interval. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. If None, no intervals are estimated. Only available for forecasters of type ForecasterAutoreg and ForecasterAutoregCustom.
n_boot (int, default `500`): Number of bootstrapping iterations used to estimate prediction intervals.
in_sample_residuals (bool, default `True`): If True, residuals from the training data are used as a proxy of prediction error to create prediction intervals.
set_out_sample_residuals (bool, default `True`): Save residuals generated during the cross-validation process as out-of-sample residuals. Ignored if forecaster is of class ForecasterAutoregMultiOutput.
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions: pandas DataFrame
Value of predictions and their estimated interval if interval is not None.
column pred = predictions.
column lower_bound = lower bound of the interval.
column upper_bound = upper bound of the interval.
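How interval, n_boot and the residuals relate can be shown with a simplified one-step sketch. It is an assumption-laden illustration rather than the library's procedure: the helpers nearest_rank_percentile and bootstrap_interval are hypothetical, the residuals are made up, and a nearest-rank percentile stands in for whatever percentile method the library uses.

```python
import random


def nearest_rank_percentile(sorted_values, q):
    """Nearest-rank percentile of an already sorted list, q in [0, 100]."""
    if not 0 <= q <= 100:
        raise ValueError("percentiles must be between 0 and 100 inclusive")
    return sorted_values[round((len(sorted_values) - 1) * q / 100)]


def bootstrap_interval(pred, residuals, interval, n_boot=500, seed=0):
    """Return (lower_bound, upper_bound) for a single point prediction."""
    rng = random.Random(seed)
    # Each bootstrap iteration adds one randomly drawn residual to the prediction.
    simulated = sorted(pred + rng.choice(residuals) for _ in range(n_boot))
    lower_q, upper_q = interval
    return (
        nearest_rank_percentile(simulated, lower_q),
        nearest_rank_percentile(simulated, upper_q),
    )


residuals = [-1.5, -0.5, 0.0, 0.4, 1.1, 2.0]  # made-up residuals
pred = 10.0
lower_bound, upper_bound = bootstrap_interval(pred, residuals, interval=[5, 95])
print({"pred": pred, "lower_bound": lower_bound, "upper_bound": upper_bound})
```

The three printed values correspond to the pred, lower_bound and upper_bound columns described above.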
skforecast.model_selection.model_selection.cv_forecaster(forecaster, y, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, set_out_sample_residuals=True, verbose=True)
Cross-validation of forecaster. The order of data is maintained and the training set increases in each iteration.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): Forecaster model.
y (pandas Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (pandas Series, pandas DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
allow_incomplete_fold (bool, default `True`): The last test partition is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
set_out_sample_residuals (bool, default `True`): Save residuals generated during the cross-validation process as out-of-sample residuals.
verbose (bool, default `True`): Print number of folds used for cross validation.
Value of the metric for each fold.
Predictions (pandas DataFrame).
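The per-fold metric can be illustrated without the library. The sketch below assumes a naive forecaster that repeats the last training value; cv_naive is a hypothetical helper, and the real function works with the forecaster classes listed above.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_naive(y, initial_train_size, steps, allow_incomplete_fold=True):
    """Expanding-window cross-validation of a naive last-value forecaster."""
    fold_metrics = []
    predictions = []
    train_end = initial_train_size
    while train_end < len(y):
        test = y[train_end:train_end + steps]
        if len(test) < steps and not allow_incomplete_fold:
            break  # the trailing observations are discarded
        # "Fitting" the naive model means remembering the last training value.
        fold_pred = [y[train_end - 1]] * len(test)
        fold_metrics.append(mean_squared_error(test, fold_pred))
        predictions.extend(fold_pred)
        train_end += steps  # the training set grows by one fold
    return fold_metrics, predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fold_metrics, predictions = cv_naive(y, initial_train_size=3, steps=2)
print(fold_metrics)  # one value per fold: [2.5, 2.5, 1.0]
```

The third fold contains a single observation; passing allow_incomplete_fold=False drops it and leaves two metric values.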
skforecast.model_selection.model_selection.grid_search_forecaster(forecaster, y, param_grid, initial_train_size, steps, metric, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)
Exhaustive search over specified parameter values for a Forecaster object. Validation is done using time series backtesting.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): Forecaster model.
y (pandas Series): Training time series values.
param_grid (dict): Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (pandas Series, pandas DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
lags_grid (list of int, lists, np.ndarray or range): Lists of lags to try. Only used if forecaster is an instance of ForecasterAutoreg.
refit (bool, default `False`): Whether to re-fit the forecaster in each iteration of backtesting.
return_best (bool): Refit the forecaster using the best found parameters on the whole data.
verbose (bool, default `True`): Print number of folds used for cv or backtesting.
Metric value estimated for each combination of parameters.
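An exhaustive search evaluates every pairing of a lags setting with a parameter combination. The sketch below only enumerates and ranks candidates. The helper expand_grid, the example grids and the toy_metric scoring function are all hypothetical; in the real function the score comes from backtesting.

```python
from itertools import product


def expand_grid(param_grid, lags_grid):
    """Yield (lags, params) for every combination in the two grids."""
    names = list(param_grid)
    for lags in lags_grid:
        for values in product(*(param_grid[name] for name in names)):
            yield lags, dict(zip(names, values))


param_grid = {"n_estimators": [50, 100], "max_depth": [5, 10]}  # example values
lags_grid = [2, [1, 2, 12]]

candidates = list(expand_grid(param_grid, lags_grid))


def toy_metric(lags, params):
    """Made-up score standing in for a backtesting metric (lower is better)."""
    n_lags = lags if isinstance(lags, int) else len(lags)
    return (
        abs(params["max_depth"] - 5)
        + abs(params["n_estimators"] - 100) / 100
        + n_lags / 10
    )


# Rank all candidates by the metric and keep the best one.
results = sorted(
    ((toy_metric(lags, params), lags, params) for lags, params in candidates),
    key=lambda row: row[0],
)
best_metric, best_lags, best_params = results[0]
print(len(candidates), best_lags, best_params)
```

Two lags settings and four parameter combinations give eight candidates, so the cost of the search grows multiplicatively with each grid.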
skforecast.model_selection.model_selection.time_series_spliter(y, initial_train_size, steps, allow_incomplete_fold=True, verbose=True)
Split indices of a time series into multiple train-test pairs. The order of data is maintained and the training set increases in each iteration.
y (1d numpy ndarray, pandas Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
verbose (bool, default `True`): Print number of splits created.
Training indices.
Test indices (1d numpy ndarray).
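The splitting rule described above can be sketched as a small generator. This is an illustration under stated assumptions, not the library's code: expanding_window_split is a hypothetical name, and it yields plain lists rather than numpy arrays.

```python
def expanding_window_split(n_obs, initial_train_size, steps, allow_incomplete_fold=True):
    """Yield (train_indices, test_indices) with an expanding training window."""
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        if test_end - train_end < steps and not allow_incomplete_fold:
            break  # the trailing observations are discarded
        # Order is preserved: training indices always precede test indices.
        yield list(range(train_end)), list(range(train_end, test_end))
        train_end = test_end


splits = list(expanding_window_split(n_obs=10, initial_train_size=5, steps=2))
complete_only = list(
    expanding_window_split(n_obs=10, initial_train_size=5, steps=2, allow_incomplete_fold=False)
)

for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
```

With ten observations the default setting produces three splits, the last holding a single test index; disabling allow_incomplete_fold leaves two.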