model_selection
skforecast.model_selection.backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, exog=None, set_out_sample_residuals=True, verbose=False)
Backtesting (validation) of a ForecasterAutoreg, ForecasterAutoregCustom or
ForecasterAutoregMultiOutput object.
The model is trained only once, using the first initial_train_size observations.
In each iteration, steps predictions are generated and evaluated.
This evaluation is much faster than cv_forecaster() because the model is
trained only once.
If forecaster is already trained and initial_train_size is None,
no initial training is done and all the data is used to evaluate the model.
However, the first len(forecaster.last_window) observations are needed
to create the initial predictors, so no predictions are
calculated for them.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): ForecasterAutoreg, ForecasterAutoregCustom or ForecasterAutoregMultiOutput object.
y (1D np.ndarray, pd.Series): Training time series values.
steps (int, None): Number of steps to predict. Ignored if forecaster is a ForecasterAutoregMultiOutput, since this information is already stored inside it.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
initial_train_size (int, default `None`): Number of samples in the initial train split. If None and forecaster is already trained, no initial train is done and all data is used to evaluate the model. However, the first len(forecaster.last_window) observations are needed to create the initial predictors. Therefore, no predictions are calculated for them.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
set_out_sample_residuals (bool, default `True`): Save residuals generated during the cross-validation process as out of sample residuals.
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions: 1D np.ndarray Value of predictions.
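The train-once procedure described for backtesting_forecaster can be sketched in plain Python. This is only an illustration under stated assumptions, not the library's implementation: DriftForecaster and backtest_once are hypothetical names, the toy model stands in for a real forecaster, and the handling of a shorter final block is an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


class DriftForecaster:
    """Toy model (hypothetical): forecast = last value + k * average step change."""

    def fit(self, y):
        # "Training" here just estimates the average change per step.
        self.drift = (y[-1] - y[0]) / (len(y) - 1)
        return self

    def predict(self, last_window, steps):
        return [last_window[-1] + self.drift * k for k in range(1, steps + 1)]


def backtest_once(model, y, initial_train_size, steps):
    """Fit once on the first observations, then forecast the rest in blocks of `steps`."""
    model.fit(y[:initial_train_size])
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        horizon = min(steps, len(y) - start)  # assumption: last block may be shorter
        # The window is refreshed with observed values; the model is NOT refit.
        predictions.extend(model.predict(y[:start], horizon))
    metric_value = mean_squared_error(y[initial_train_size:], predictions)
    return metric_value, predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0, 11.0]
metric_value, predictions = backtest_once(
    DriftForecaster(), y, initial_train_size=4, steps=2
)
print(metric_value, predictions)
```

Because the drift is estimated only from the first four observations, the change in slope later in the series is never learned, which is the trade-off of fitting once.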
skforecast.model_selection.cv_forecaster(forecaster, y, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, set_out_sample_residuals=True, verbose=True)
Cross-validation of a ForecasterAutoreg, ForecasterAutoregCustom
or ForecasterAutoregMultiOutput object. The order of data is maintained
and the training set increases in each iteration.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): ForecasterAutoreg, ForecasterAutoregCustom or ForecasterAutoregMultiOutput object.
y (1D np.ndarray, pd.Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int, None): Number of steps to predict. Ignored if forecaster is a ForecasterAutoregMultiOutput, since this information is already stored inside it.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
allow_incomplete_fold (bool, default `True`): The last test partition is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded. This is automatically set to False when forecaster is a ForecasterAutoregMultiOutput.
set_out_sample_residuals (bool, default `True`): Save residuals generated during the cross-validation process as out of sample residuals.
verbose (bool, default `True`): Print number of folds used for cross validation.
Value of the metric for each fold.
redictions: 1D np.ndarray Predictions.
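For contrast with single-fit backtesting, the sketch below shows one way an expanding-window cross-validation could look, with the model refit in every fold and one metric value reported per fold. It is a simplified illustration, not the library's code: cross_validate and fit_drift are hypothetical names, and the drift model is a toy stand-in for a real forecaster.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_drift(y_train):
    """Toy 'training' step (hypothetical): average change per step."""
    return (y_train[-1] - y_train[0]) / (len(y_train) - 1)


def cross_validate(y, initial_train_size, steps, allow_incomplete_fold=True):
    """Refit on a growing training set and score each block of `steps` values."""
    fold_metrics, fold_predictions = [], []
    train_end = initial_train_size
    while train_end < len(y):
        test_end = min(train_end + steps, len(y))
        if test_end - train_end < steps and not allow_incomplete_fold:
            break  # drop the trailing observations that cannot fill a fold
        y_train, y_test = y[:train_end], y[train_end:test_end]
        drift = fit_drift(y_train)  # refit on the enlarged training set
        pred = [y_train[-1] + drift * k for k in range(1, len(y_test) + 1)]
        fold_metrics.append(mean_absolute_error(y_test, pred))
        fold_predictions.extend(pred)
        train_end = test_end  # the fold just evaluated joins the training set
    return fold_metrics, fold_predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 7.0, 9.0, 11.0]
fold_metrics, fold_predictions = cross_validate(y, initial_train_size=4, steps=2)
print(fold_metrics)
```

Refitting in each fold lets the second fold pick up part of the steeper trend, at the cost of one training run per fold.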
skforecast.model_selection.grid_search_forecaster(forecaster, y, param_grid, initial_train_size, steps, metric, exog=None, lags_grid=None, method='cv', allow_incomplete_fold=True, return_best=True, verbose=True)
Exhaustive search over specified parameter values for a Forecaster object. Validation is done using time series cross-validation or backtesting.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput): ForecasterAutoreg, ForecasterAutoregCustom or ForecasterAutoregMultiOutput object.
y (1D np.ndarray, pd.Series): Training time series values.
param_grid (dict): Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
lags_grid (list of int, lists, np.ndarray or range): Lists of lags to try. Only used if forecaster is an instance of ForecasterAutoreg.
method ({'cv', 'backtesting'}): Method used to estimate the metric for each parameter combination. 'cv' for time series cross-validation and 'backtesting' for simple backtesting. 'backtesting' is much faster since the model is fitted only once.
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
return_best (bool): Refit the forecaster using the best found parameters on the whole data.
verbose (bool, default `True`): Print number of folds used for cv or backtesting.
Metric value estimated for each combination of parameters.
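The exhaustive search can be pictured as expanding param_grid into every combination and scoring each one by backtesting. The sketch below is a minimal illustration under stated assumptions, not the library's implementation: grid_search and backtest_moving_average are hypothetical names, and the moving-average model with its window parameter is a toy stand-in for a real forecaster and its hyperparameters.

```python
from itertools import product


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def backtest_moving_average(y, initial_train_size, steps, window):
    """Score a toy moving-average forecaster (hypothetical stand-in for a model)."""
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        horizon = min(steps, len(y) - start)
        level = sum(y[start - window:start]) / window  # mean of the last `window` values
        predictions.extend([level] * horizon)
    return mean_absolute_error(y[initial_train_size:], predictions)


def grid_search(y, param_grid, initial_train_size, steps):
    """Evaluate every combination in `param_grid`; return results sorted by metric."""
    names = list(param_grid)
    results = []
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = backtest_moving_average(y, initial_train_size, steps, **params)
        results.append((params, score))
    results.sort(key=lambda item: item[1])  # lower error first
    return results


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
results = grid_search(y, {"window": [1, 2, 3]}, initial_train_size=4, steps=2)
best_params, best_score = results[0]
print(best_params, best_score)
```

With several keys in param_grid, itertools.product yields the full Cartesian product, so the number of evaluations grows multiplicatively with each added parameter list.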
skforecast.model_selection.time_series_spliter(y, initial_train_size, steps, allow_incomplete_fold=True, verbose=True)
Split indices of a time series into multiple train-test pairs. The order of the data is maintained and the training set increases in each iteration.
y (1D np.ndarray, pd.Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
verbose (bool, default `True`): Print number of splits created.
Training indices.
: 1D np.ndarray Test indices.
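The splitting rule described above (ordered data, growing training set, optional incomplete last fold) can be sketched as a small generator. This is an illustration of the idea only: expanding_window_splits is a hypothetical helper, it works on a series length rather than on the series itself, and it returns plain lists instead of NumPy arrays.

```python
def expanding_window_splits(n_obs, initial_train_size, steps, allow_incomplete_fold=True):
    """Yield (train_indices, test_indices) pairs with a growing training set.

    Hypothetical helper for illustration; not the skforecast implementation.
    """
    if initial_train_size >= n_obs:
        raise ValueError("initial_train_size must be smaller than the series length")

    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        if test_end - train_end < steps and not allow_incomplete_fold:
            break  # discard the trailing observations that cannot fill a fold
        yield list(range(train_end)), list(range(train_end, test_end))
        train_end = test_end  # the test block is absorbed into the next training set


splits = list(expanding_window_splits(n_obs=10, initial_train_size=4, steps=4))
for train_idx, test_idx in splits:
    print(len(train_idx), test_idx)
# 4 [4, 5, 6, 7]
# 8 [8, 9]
```

With allow_incomplete_fold=False the same call produces a single fold, because the last two observations cannot fill a test set of four.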