model_selection_statsmodels

function
skforecast.model_selection_statsmodels.model_selection_statsmodels.backtesting_sarimax(y, initial_train_size, steps, metric, refit=False, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Backtesting (validation) of a SARIMAX model from statsmodels v0.12. The model is trained on the first initial_train_size observations; then, in each iteration, the next steps predictions are made and evaluated. If refit is True, the model is re-fitted in each iteration before making predictions.

https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_forecasting.html
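The iteration scheme described above can be sketched without any forecasting library. The helper `backtesting_plan` below is hypothetical (it is not part of skforecast), and it assumes that a final window shorter than steps is still evaluated; it only shows which observations each iteration predicts and in which iterations a fit would take place.

```python
def backtesting_plan(n_obs, initial_train_size, steps, refit=False):
    """Hypothetical helper: list (test_start, test_end, fit_here) per iteration.

    test_end is exclusive. Assumption: a final window shorter than `steps`
    is kept rather than discarded.
    """
    if not 0 < initial_train_size < n_obs:
        raise ValueError("initial_train_size must be between 1 and n_obs - 1")
    if steps < 1:
        raise ValueError("steps must be a positive integer")

    plan = []
    test_start = initial_train_size
    while test_start < n_obs:
        test_end = min(test_start + steps, n_obs)
        # The first iteration always needs a fit; later ones only if refit=True.
        fit_here = refit or test_start == initial_train_size
        plan.append((test_start, test_end, fit_here))
        test_start = test_end
    return plan


plan_no_refit = backtesting_plan(n_obs=50, initial_train_size=30, steps=8)
plan_refit = backtesting_plan(n_obs=50, initial_train_size=30, steps=8, refit=True)

for test_start, test_end, fit_here in plan_no_refit:
    print(f"predict y[{test_start}:{test_end}]  fit in this iteration: {fit_here}")
```

With 50 observations, 30 of them in the initial train and steps=8, this gives three iterations, the last one covering only four observations.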

Parameters
  • y (pandas Series) Time series values.
  • initial_train_size (int) Number of samples used in the initial train.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • refit (bool, default False) Whether to re-fit the model in each iteration.
  • order (tuple) The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
  • seasonal_order (tuple) The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
  • trend (str {'n', 'c', 't', 'ct'} or iterable, default `None`) Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
  • alpha (float, default 0.05) The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • sarimax_kwargs (dict, default `{}`) Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
  • fit_kwargs (dict, default `{'disp':0}`) Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
  • verbose (bool, default `False`) Print number of folds used for backtesting.
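The iterable form of trend is easy to misread, so here is a small sketch of how a specification such as [1,1,0,1] maps to polynomial terms. Both helpers are hypothetical illustrations and are not part of skforecast or statsmodels; the coefficient values are made up for the example.

```python
def trend_exponents(trend):
    """Hypothetical helper: exponents of t included by a trend specification."""
    named = {"n": [], "c": [0], "t": [1], "ct": [0, 1]}
    if trend is None:
        return []
    if isinstance(trend, str):
        return named[trend]
    # Iterable form: position i is non-zero when the term t**i is included.
    return [power for power, flag in enumerate(trend) if flag]


def trend_value(trend, coefs, t):
    """Evaluate the trend polynomial at time t with one coefficient per term."""
    exponents = trend_exponents(trend)
    if len(coefs) != len(exponents):
        raise ValueError("one coefficient is needed per included term")
    return sum(coef * t ** power for coef, power in zip(coefs, exponents))


# [1, 1, 0, 1] keeps the constant, linear and cubic terms: a + b*t + c*t**3
exponents = trend_exponents([1, 1, 0, 1])
value = trend_value([1, 1, 0, 1], coefs=[2.0, 0.5, 0.1], t=2)
print(exponents, value)
```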
Returns
  • metric_value (np.ndarray, shape (1,)) Value of the metric.
  • test_predictions (pandas DataFrame) Predicted values and their estimated interval: column pred = predictions; column lower = lower bound of the interval; column upper = upper bound of the interval.
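For reference, the three accepted metric names correspond to standard error measures. The plain-Python functions below are a sketch of the usual definitions, not the library's own code; the percentage error is returned as a fraction (0.05 means 5%), which is the convention scikit-learn uses for its function of the same name, and the sample values are invented.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute differences between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the observed value, as a fraction."""
    return sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented values, used only to exercise the formulas.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 380.0]

mse = mean_squared_error(observed, predicted)
mae = mean_absolute_error(observed, predicted)
mape = mean_absolute_percentage_error(observed, predicted)
print(mse, mae, mape)
```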

function
skforecast.model_selection_statsmodels.model_selection_statsmodels.cv_sarimax(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Cross-validation of a SARIMAX model from statsmodels v0.12. The temporal order of the data is preserved and the training set grows in each iteration.
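The growing training set can be pictured with the following sketch. The helper `expanding_window_folds` is hypothetical, and the behaviour shown for allow_incomplete_fold (keeping or dropping a last fold with fewer than steps observations) is an assumption based on the parameter name, since the parameter is not described on this page.

```python
def expanding_window_folds(n_obs, initial_train_size, steps,
                           allow_incomplete_fold=True):
    """Hypothetical helper: (train_range, test_range) pairs, order preserved."""
    folds = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        is_incomplete = (test_end - train_end) < steps
        # Assumption: an incomplete last fold is dropped when not allowed.
        if is_incomplete and not allow_incomplete_fold:
            break
        folds.append((range(0, train_end), range(train_end, test_end)))
        # The observations just tested join the training set of the next fold.
        train_end = test_end
    return folds


folds_all = expanding_window_folds(50, 30, 8)
folds_complete = expanding_window_folds(50, 30, 8, allow_incomplete_fold=False)

for train_range, test_range in folds_all:
    print(f"train on {len(train_range)} observations, test on {len(test_range)}")
```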

Parameters
  • y (pandas Series) Time series values.
  • initial_train_size (int) Number of samples in the initial train split.
  • steps (int) Number of steps to predict.
  • metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}) Metric used to quantify the goodness of fit of the model.
  • order (tuple) The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
  • seasonal_order (tuple) The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
  • trend (str {'n', 'c', 't', 'ct'} or iterable, default `None`) Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
  • alpha (float, default 0.05) The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
  • exog (pandas Series, pandas DataFrame, default `None`) Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
  • sarimax_kwargs (dict, default `{}`) Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
  • fit_kwargs (dict, default `{'disp':0}`) Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
  • verbose (bool, default `False`) Print number of folds used for cross-validation.
Returns
  • cv_metrics (1D np.ndarray) Value of the metric for each partition.
  • predictions (pandas DataFrame) Predicted values and their estimated interval: column pred = predictions; column lower = lower bound of the interval; column upper = upper bound of the interval.

function
skforecast.model_selection_statsmodels.model_selection_statsmodels.grid_search_sarimax(y, param_grid, initial_train_size, steps, metric, exog=None, refit=False, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Exhaustive search over specified parameter values for a SARIMAX model from statsmodels v0.12. Validation is done using time series cross-validation or backtesting.

Parameters
Returns
  • results (pandas DataFrame) Metric value estimated for each combination of parameters.
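An exhaustive search evaluates every combination of the candidate values. The sketch below shows that expansion with the standard library only. The shape of param_grid used here (a dict whose keys are order, seasonal_order and trend, each mapped to a list of candidates) is an assumption, since this page does not document the parameter, and `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Assumed layout: one list of candidate values per SARIMAX argument.
param_grid = {
    "order": [(1, 0, 0), (2, 1, 1)],
    "seasonal_order": [(0, 0, 0, 0), (1, 0, 0, 12)],
    "trend": [None, "c"],
}


def expand_grid(grid):
    """Hypothetical helper: every combination of the candidate values."""
    keys = list(grid)
    return [dict(zip(keys, values))
            for values in product(*(grid[key] for key in keys))]


combinations = expand_grid(param_grid)
print(f"{len(combinations)} candidate models")
for params in combinations:
    print(params)
```

Each resulting dict would then be validated by backtesting or cross-validation, and the metric values collected into one row per combination.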