model_selection
skforecast.model_selection.model_selection.backtesting_forecaster(forecaster, y, steps, metric, initial_train_size, exog=None, refit=False, interval=None, n_boot=500, in_sample_residuals=True, verbose=False)
Backtesting of forecaster model.
If `refit` is `False`, the model is trained only once, using the first `initial_train_size` observations. If `refit` is `True`, the model is retrained in each iteration on a training set that grows with every iteration.
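The difference between the two `refit` modes can be illustrated with a minimal, dependency-free sketch. It does not call skforecast: `backtest_sketch` is a hypothetical helper, and the "model" is simply the mean of the training data, chosen only so the effect of retraining is easy to follow.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE, used here instead of a library metric."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_mean(train):
    """Hypothetical 'model': the mean of the training observations."""
    return sum(train) / len(train)


def backtest_sketch(y, initial_train_size, steps, refit=False):
    """Illustrative backtesting loop (an assumption, not the library code)."""
    model = fit_mean(y[:initial_train_size])  # initial fit, done once
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        if refit:
            # Retrain on everything observed before this fold.
            model = fit_mean(y[:start])
        horizon = min(steps, len(y) - start)
        predictions.extend([model] * horizon)
    metric = mean_squared_error(y[initial_train_size:], predictions)
    return metric, predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
mse_fixed, preds_fixed = backtest_sketch(y, initial_train_size=4, steps=2, refit=False)
mse_refit, preds_refit = backtest_sketch(y, initial_train_size=4, steps=2, refit=True)
```

With `refit=False` every fold reuses the model fitted on the first four observations; with `refit=True` the second fold is predicted by a model fitted on six observations, so its predictions differ.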
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) — Forecaster model.
y (pandas Series) — Training time series values.
steps (int) — Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) — Metric used to quantify the goodness of fit of the model.
initial_train_size (int, default `None`) — Number of samples in the initial train split. If `None` and `forecaster` is already trained, no initial training is done and all data is used to evaluate the model. However, the first `len(forecaster.last_window)` observations are needed to create the initial predictors, so no predictions are calculated for them. `None` is only allowed when `refit` is `False`.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].
refit (bool, default `False`) — Whether to re-fit the forecaster in each iteration.
interval (list, default `None`) — Confidence of the estimated prediction interval. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. If `None`, no intervals are estimated. Only available for forecasters of type ForecasterAutoreg and ForecasterAutoregCustom.
n_boot (int, default `500`) — Number of bootstrapping iterations used to estimate prediction intervals.
in_sample_residuals (bool, default `True`) — If `True`, residuals from the training data are used as a proxy of prediction error to create prediction intervals. If `False`, out_sample_residuals are used if they are already stored inside the forecaster.
verbose (bool, default `False`) — Print number of folds and index of training and validation sets used for backtesting.
Value of the metric.
test_predictions: pandas DataFrame
Value of predictions and their estimated interval if interval is not None.
column pred = predictions.
column lower_bound = lower bound of the interval.
column upper_bound = upper bound of the interval.
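As a rough illustration of how `interval`, `n_boot` and residuals relate to the `pred`, `lower_bound` and `upper_bound` columns, the sketch below bootstraps residuals around a single point prediction and reads off two percentiles. It is a simplified one-step version written for this page: `percentile` and `bootstrap_interval` are hypothetical helpers, not functions of the library, and the actual implementation may differ (for example in how multi-step forecasts are simulated).

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile for q in [0, 100]."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def bootstrap_interval(pred, residuals, interval=(5, 95), n_boot=500, seed=0):
    """Hypothetical helper: add resampled residuals to a point prediction."""
    rng = random.Random(seed)
    simulated = [pred + rng.choice(residuals) for _ in range(n_boot)]
    return {
        "pred": pred,
        "lower_bound": percentile(simulated, interval[0]),
        "upper_bound": percentile(simulated, interval[1]),
    }


# Residuals here are made-up numbers standing in for training errors.
row = bootstrap_interval(pred=10.0, residuals=[-2.0, -1.0, 0.0, 1.0, 2.0])
```

The resulting dictionary mirrors one row of the predictions table: the point prediction plus the two requested percentiles of the simulated values.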
skforecast.model_selection.model_selection.cv_forecaster(forecaster, y, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, verbose=True)
Cross-validation of forecaster. The order of data is maintained and the training set increases in each iteration.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) — Forecaster model.
y (pandas Series) — Training time series values.
initial_train_size (int) — Number of samples in the initial train split.
steps (int) — Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) — Metric used to quantify the goodness of fit of the model.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].
allow_incomplete_fold (bool, default `True`) — The last test partition is allowed to be incomplete if it does not reach `steps` observations. Otherwise, the latest observations are discarded.
verbose (bool, default `True`) — Print number of folds used for cross validation.
Value of the metric for each fold.
predictions: pandas DataFrame
Predictions.
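The expanding-window scheme and the per-fold metric can be sketched without the library as follows. `cv_sketch` is a hypothetical helper; the "model" is again just the training mean and the metric is a hand-written mean absolute error, both assumptions made to keep the example self-contained.

```python
def cv_sketch(y, initial_train_size, steps, allow_incomplete_fold=True):
    """Illustrative expanding-window cross-validation (not the library code)."""
    fold_metrics, predictions = [], []
    for start in range(initial_train_size, len(y), steps):
        test = y[start:start + steps]
        if len(test) < steps and not allow_incomplete_fold:
            break  # discard the trailing observations
        train = y[:start]  # training set grows with every fold
        pred = sum(train) / len(train)
        predictions.extend([pred] * len(test))
        # Mean absolute error of this fold only.
        fold_metrics.append(sum(abs(t - pred) for t in test) / len(test))
    return fold_metrics, predictions


y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
metrics, preds = cv_sketch(y, initial_train_size=3, steps=2)
metrics_strict, preds_strict = cv_sketch(
    y, initial_train_size=3, steps=2, allow_incomplete_fold=False
)
```

With eight observations, an initial train size of 3 and 2 steps, the last fold contains a single observation. It is evaluated when `allow_incomplete_fold=True` and dropped otherwise.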
skforecast.model_selection.model_selection.grid_search_forecaster(forecaster, y, param_grid, initial_train_size, steps, metric, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)
Exhaustive search over specified parameter values for a Forecaster object. Validation is done using time series backtesting.
forecaster (ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiOutput) — Forecaster model.
y (pandas Series) — Training time series values.
param_grid (dict) — Dictionary with parameter names (str) as keys and lists of parameter settings to try as values.
initial_train_size (int) — Number of samples in the initial train split.
steps (int) — Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) — Metric used to quantify the goodness of fit of the model.
exog (pandas Series, pandas DataFrame, default `None`) — Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].
lags_grid (list of int, lists, np.ndarray or range, default `None`) — Lists of `lags` to try. Only used if forecaster is an instance of `ForecasterAutoreg`.
refit (bool, default `False`) — Whether to re-fit the forecaster in each iteration of backtesting.
return_best (bool, default `True`) — Refit the `forecaster` using the best found parameters on the whole data.
verbose (bool, default `True`) — Print number of folds used for cv or backtesting.
Metric value estimated for each combination of parameters.
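The idea of an exhaustive search over `lags_grid` and `param_grid`, scored by backtesting, can be sketched as below. Nothing here calls skforecast: `backtest_metric` and `grid_search_sketch` are hypothetical helpers, and the toy model (a scaled mean of the last `lags` values, with a made-up `alpha` parameter) exists only to give the search something to tune.

```python
from itertools import product


def backtest_metric(y, lags, alpha, initial_train_size, steps):
    """Toy model: predict alpha * mean of the last `lags` values; return MSE."""
    errors = []
    for start in range(initial_train_size, len(y), steps):
        window = y[start - lags:start]
        pred = alpha * sum(window) / lags
        errors.extend((t - pred) ** 2 for t in y[start:start + steps])
    return sum(errors) / len(errors)


def grid_search_sketch(y, param_grid, lags_grid, initial_train_size, steps):
    """Evaluate every (lags, params) combination and sort by metric."""
    names = list(param_grid)
    results = []
    for lags in lags_grid:
        for values in product(*(param_grid[name] for name in names)):
            params = dict(zip(names, values))
            metric = backtest_metric(
                y, lags, initial_train_size=initial_train_size, steps=steps, **params
            )
            results.append({"lags": lags, "params": params, "metric": metric})
    # Lower error first, so the best combination is results[0].
    return sorted(results, key=lambda r: r["metric"])


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
results = grid_search_sketch(
    y, param_grid={"alpha": [0.5, 1.0]}, lags_grid=[1, 2],
    initial_train_size=2, steps=2,
)
best = results[0]
```

Each entry of `results` holds one combination and its backtesting metric, which is the kind of table the description above refers to. A final refit on the whole series with `best`, as `return_best` describes, is left out of the sketch.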
skforecast.model_selection.model_selection.time_series_splitter(y, initial_train_size, steps, allow_incomplete_fold=True, verbose=True)
Split indices of a time series into multiple train-test pairs. The order of the data is maintained and the training set increases in each iteration.
y (1d numpy ndarray, pandas Series) — Training time series values.
initial_train_size (int) — Number of samples in the initial train split.
steps (int) — Number of steps to predict.
allow_incomplete_fold (bool, default `True`) — The last test set is allowed to be incomplete if it does not reach `steps` observations. Otherwise, the latest observations are discarded.
verbose (bool, default `True`) — Print number of splits created.
Training indices.
: 1d numpy ndarray Test indices.
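A minimal sketch of the index arithmetic behind such a splitter is shown below. `splitter_sketch` is a hypothetical generator written for illustration; it takes the number of observations instead of the series itself and yields plain Python lists rather than numpy arrays, to stay dependency-free.

```python
def splitter_sketch(n_obs, initial_train_size, steps, allow_incomplete_fold=True):
    """Yield (train_indices, test_indices) pairs with an expanding train set."""
    for start in range(initial_train_size, n_obs, steps):
        end = min(start + steps, n_obs)
        if end - start < steps and not allow_incomplete_fold:
            break  # the trailing observations are discarded
        yield list(range(0, start)), list(range(start, end))


splits = list(splitter_sketch(n_obs=8, initial_train_size=3, steps=2))
splits_strict = list(
    splitter_sketch(n_obs=8, initial_train_size=3, steps=2, allow_incomplete_fold=False)
)

for train_idx, test_idx in splits:
    print(f"train 0..{train_idx[-1]}  test {test_idx}")
```

Every training set starts at index 0 and ends right before its test set, so temporal order is never violated. With 8 observations the third test set holds only index 7, and it is produced only when incomplete folds are allowed.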