model_selection_statsmodels

function
skforecast.model_selection_statsmodels.backtesting_autoreg_statsmodels(y, lags, initial_train_size, steps, metric, exog=None, verbose=False)

Backtesting (validation) of an AutoReg model from statsmodels v0.12. The model is trained only once, using the first initial_train_size observations. In each iteration, the next steps predictions are evaluated. Because the model is never refitted, this evaluation is much faster than cross-validation.

Parameters
  • y (1D np.ndarray, pd.Series) Training time series values.
  • lags (int, list) The number of lags to include in the model if an integer or the list of lag indices to include. For example, [1, 4] will only include lags 1 and 4 while lags=4 will include lags 1, 2, 3, and 4.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • exog (np.ndarray, pd.Series, pd.DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • verbose (bool, default `False`) Print number of folds used for backtesting.
Returns
  • metric_value (1D np.ndarray) Value of the metric.
  • test_predictions (1D np.ndarray) Predicted values.
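
The backtesting scheme described above can be sketched in plain Python. This is a minimal illustration, not skforecast's implementation: `backtest_once` is a hypothetical name, the stand-in "model" is simply the mean of the training window rather than an AutoReg fit, and letting the final block be shorter than `steps` is an assumption.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between observations and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def backtest_once(y, initial_train_size, steps):
    """Fit once on the first `initial_train_size` values, then evaluate the
    remaining values in consecutive blocks of at most `steps` observations."""
    train = y[:initial_train_size]
    fitted_level = sum(train) / len(train)  # stand-in for a fitted model (assumption)

    predictions = []
    folds = []
    start = initial_train_size
    while start < len(y):
        end = min(start + steps, len(y))
        folds.append((start, end))
        # The "model" is never refitted: every block reuses fitted_level.
        predictions.extend([fitted_level] * (end - start))
        start = end

    metric_value = mean_squared_error(y[initial_train_size:], predictions)
    return metric_value, predictions, folds


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
metric_value, predictions, folds = backtest_once(y, initial_train_size=4, steps=3)
print(folds)         # index ranges [start, end) of each evaluated block
print(metric_value)  # single metric computed over all test predictions
```

The key point is that the fitting step sits outside the loop, which is where the speed advantage over cross-validation comes from.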

function
skforecast.model_selection_statsmodels.cv_autoreg_statsmodels(y, lags, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, verbose=False)

Cross-validation of AutoReg model from statsmodels v0.12. The order of data is maintained and the training set increases in each iteration.

Parameters
  • y (1D np.ndarray, pd.Series) Training time series values.
  • lags (int, list) The number of lags to include in the model if an integer or the list of lag indices to include. For example, [1, 4] will only include lags 1 and 4 while lags=4 will include lags 1, 2, 3, and 4.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • exog (np.ndarray, pd.Series, pd.DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • allow_incomplete_fold (bool, default `True`) The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
  • verbose (bool, default `False`) Print number of folds used for cross-validation.
Returns
  • cv_metrics (1D np.ndarray) Value of the metric for each fold.
  • predictions (1D np.ndarray) Predictions.
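
The expanding-window splitting described above can be sketched as follows. This is an illustrative helper only: `expanding_window_splits` is a hypothetical name, and the exact fold boundaries used by the library are an assumption based on the parameter descriptions.

```python
def expanding_window_splits(n_obs, initial_train_size, steps,
                            allow_incomplete_fold=True):
    """Return (train_end, test_end) pairs. Training covers [0, train_end)
    and testing covers [train_end, test_end), so order is preserved."""
    splits = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        if test_end - train_end < steps and not allow_incomplete_fold:
            break  # discard the trailing observations
        splits.append((train_end, test_end))
        train_end = test_end  # the training set grows by the fold just tested
    return splits


splits_all = expanding_window_splits(n_obs=10, initial_train_size=4, steps=4)
splits_complete = expanding_window_splits(
    n_obs=10, initial_train_size=4, steps=4, allow_incomplete_fold=False
)
print(splits_all)       # includes the shorter final fold
print(splits_complete)  # final two observations are dropped
```

Unlike backtesting, a model would be refitted on `y[:train_end]` for every split, which is why one metric value per fold is returned.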

function
skforecast.model_selection_statsmodels.backtesting_sarimax_statsmodels(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Backtesting (validation) of a SARIMAX model from statsmodels v0.12. The model is trained only once, using the first initial_train_size observations. In each iteration, the next steps predictions are evaluated. Because the model is never refitted, this evaluation is much faster than cross-validation.

https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_forecasting.html

Parameters
  • y (1D np.ndarray, pd.Series) Training time series values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • order (tuple) The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may each be either an integer indicating the AR or MA order (so that all lags up to that order are included) or an iterable giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
  • seasonal_order (tuple) The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may each be either an integer indicating the AR or MA order (so that all lags up to that order are included) or an iterable giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
  • trend (str {'n', 'c', 't', 'ct'}, default `None`) Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
  • alpha (float, default 0.05) The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
  • exog (np.ndarray, pd.Series, pd.DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • sarimax_kwargs (dict, default `{}`) Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
  • fit_kwargs (dict, default `{'disp':0}`) Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
  • verbose (bool, default `False`) Print number of folds used for backtesting.
Returns
  • metric_value (np.ndarray, shape (1,)) Value of the metric.
  • test_predictions (2D np.ndarray) Predicted values and their estimated interval. Column 0 = predictions, column 1 = lower bound of the interval, column 2 = upper bound of the interval.
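
To make the relationship between alpha and the three-column output concrete, here is a small sketch. It assumes normally distributed forecast errors and uses made-up standard errors. `interval_rows` is a hypothetical helper, not part of skforecast or statsmodels.

```python
from statistics import NormalDist


def interval_rows(predictions, std_errors, alpha=0.05):
    """Build rows of [prediction, lower bound, upper bound] for a
    (1 - alpha) interval, assuming normal forecast errors (assumption)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    return [[p, p - z * se, p + z * se] for p, se in zip(predictions, std_errors)]


# Illustrative numbers only: two forecasts with standard errors 1.0 and 2.0.
rows = interval_rows([10.0, 11.0], [1.0, 2.0], alpha=0.05)
for prediction, lower, upper in rows:
    print(f"{prediction:.2f}  [{lower:.2f}, {upper:.2f}]")
```

A smaller alpha widens the interval: alpha = 0.01 corresponds to a 99% interval.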

function
skforecast.model_selection_statsmodels.cv_sarimax_statsmodels(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Cross-validation of SARIMAX model from statsmodels v0.12. The order of data is maintained and the training set increases in each iteration.

Parameters
  • y (1D np.ndarray, pd.Series) Training time series values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • order (tuple) The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may each be either an integer indicating the AR or MA order (so that all lags up to that order are included) or an iterable giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
  • seasonal_order (tuple) The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may each be either an integer indicating the AR or MA order (so that all lags up to that order are included) or an iterable giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
  • trend (str {'n', 'c', 't', 'ct'}, default `None`) Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
  • alpha (float, default 0.05) The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
  • exog (np.ndarray, pd.Series, pd.DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • allow_incomplete_fold (bool, default `True`) The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
  • sarimax_kwargs (dict, default `{}`) Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
  • fit_kwargs (dict, default `{'disp':0}`) Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
  • verbose (bool, default `False`) Print number of folds used for cross-validation.
Returns
  • cv_metrics (1D np.ndarray) Value of the metric for each fold.
  • predictions (2D np.ndarray) Predicted values and their estimated interval. Column 0 = predictions, column 1 = lower bound of the interval, column 2 = upper bound of the interval.
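
The iterable form of the trend parameter can be read as a list of on/off flags, one per polynomial power. The sketch below evaluates such a polynomial. `trend_value` and `TREND_STRINGS` are hypothetical names, the coefficients are arbitrary, and the string-to-flags mapping is an assumption based on the description of 'n', 'c', 't' and 'ct' above.

```python
# Assumed flag lists equivalent to the string shortcuts described above.
TREND_STRINGS = {"n": [], "c": [1], "t": [0, 1], "ct": [1, 1]}


def trend_value(t, trend_spec, coefficients):
    """Evaluate the trend polynomial at time t.

    trend_spec[i] == 1 means the term t**i is included; `coefficients`
    holds one value per included term, in increasing order of exponent.
    """
    exponents = [i for i, flag in enumerate(trend_spec) if flag]
    return sum(c * t ** e for c, e in zip(coefficients, exponents))


# [1, 1, 0, 1] keeps t**0, t**1 and t**3, i.e. a + b*t + c*t**3.
value = trend_value(2, [1, 1, 0, 1], [1.0, 0.5, 0.25])
print(value)  # 1.0 + 0.5*2 + 0.25*8

# 'ct' is a constant plus a linear term.
linear = trend_value(3, TREND_STRINGS["ct"], [2.0, 1.0])
print(linear)  # 2.0 + 1.0*3
```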

function
skforecast.model_selection_statsmodels.grid_search_sarimax_statsmodels(y, param_grid, initial_train_size, steps, metric, exog=None, method='cv', allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Exhaustive search over specified parameter values for a SARIMAX model from statsmodels v0.12. Validation is done using time series cross-validation or backtesting.

Parameters
  • y (1D np.ndarray, pd.Series) Training time series values.
  • param_grid (dict) Dictionary with parameter names (str) as keys and lists of parameter settings to try as values. Allowed parameters in the grid are: order, seasonal_order and trend.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • exog (np.ndarray, pd.Series, pd.DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • method ({'cv', 'backtesting'}, default `'cv'`) Method used to estimate the metric for each parameter combination: 'cv' for time series cross-validation and 'backtesting' for simple backtesting. 'backtesting' is much faster since the model is fitted only once.
  • allow_incomplete_fold (bool, default `True`) The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
  • sarimax_kwargs (dict, default `{}`) Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
  • fit_kwargs (dict, default `{'disp':0}`) Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
  • verbose (bool, default `False`) Print number of folds used for cross-validation or backtesting.
  • return_best (bool) Refit the forecaster using the best found parameters on the whole data.
Returns
  • results (pandas.DataFrame) Metric value estimated for each combination of parameters.
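
An exhaustive search evaluates every combination of the values listed in param_grid. The sketch below shows only how a grid expands into individual candidates. `expand_param_grid` is a hypothetical helper and the grid values are arbitrary examples. In the real function, each candidate would then be scored by cross-validation or backtesting.

```python
from itertools import product


def expand_param_grid(param_grid):
    """Return one dict per combination of the listed parameter values."""
    keys = list(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Example grid restricted to the allowed keys: order, seasonal_order, trend.
param_grid = {
    "order": [(1, 0, 0), (2, 1, 0)],
    "trend": [None, "c"],
}

combos = expand_param_grid(param_grid)
for candidate in combos:
    print(candidate)  # each candidate would be evaluated and its metric recorded
```

The number of fits grows multiplicatively with each added parameter list, which is one reason the faster 'backtesting' method can be attractive for large grids.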