model_selection_statsmodels
skforecast.model_selection_statsmodels.model_selection_statsmodels.backtesting_sarimax(y, initial_train_size, steps, metric, refit=False, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Backtesting (validation) of a SARIMAX model from statsmodels v0.12. The model is trained on the first `initial_train_size` observations; then, in each iteration, the next `steps` predictions are evaluated. If `refit` is True, the model is re-fitted in each iteration before making predictions.
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_forecasting.html
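The iteration scheme described above can be sketched in plain Python. This is a minimal illustration, not skforecast's implementation: `backtest_folds` is a hypothetical helper, and it assumes that a final fold shorter than `steps` is still evaluated.

```python
def backtest_folds(n_obs, initial_train_size, steps, refit=False):
    """Return (fit_end, test_start, test_end) index triples, end-exclusive.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    folds = []
    test_start = initial_train_size
    while test_start < n_obs:
        # Assumption: a last fold with fewer than `steps` points is kept.
        test_end = min(test_start + steps, n_obs)
        # With refit=True the parameters are re-estimated on all data seen
        # so far; otherwise they come from the initial window only.
        fit_end = test_start if refit else initial_train_size
        folds.append((fit_end, test_start, test_end))
        test_start = test_end
    return folds


folds_fixed = backtest_folds(n_obs=20, initial_train_size=10, steps=4)
folds_refit = backtest_folds(n_obs=20, initial_train_size=10, steps=4, refit=True)

for fit_end, test_start, test_end in folds_refit:
    print(f"fit on [0, {fit_end}), predict [{test_start}, {test_end})")
```

With `refit=False`, every fold reuses the parameters estimated on the first 10 observations; with `refit=True`, the estimation window grows by one fold at a time.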
y (pandas Series): Time series values.
initial_train_size (int): Number of samples in the initial training set.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
refit (bool, default False): Whether to re-fit the model in each iteration.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n','c','t','ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
exog (pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to the SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions: pandas DataFrame. Values predicted and their estimated interval: column pred = predictions; column lower = lower bound of the interval; column upper = upper bound of the interval.
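To make the three accepted `metric` names concrete, the following sketch computes each one in plain Python. The `METRICS` table and the `score` helper are hypothetical and only illustrate the formulas; how the library itself resolves the metric name is not shown here.

```python
def mean_squared_error(y_true, y_pred):
    # Average of squared differences.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    # Average of absolute differences.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    # Average of absolute errors relative to the observed value.
    # Undefined when an observed value is zero.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical lookup table keyed by the metric names listed above.
METRICS = {
    "mean_squared_error": mean_squared_error,
    "mean_absolute_error": mean_absolute_error,
    "mean_absolute_percentage_error": mean_absolute_percentage_error,
}


def score(metric, y_true, y_pred):
    if metric not in METRICS:
        raise ValueError(f"Unknown metric: {metric!r}")
    return METRICS[metric](y_true, y_pred)


y_true = [100.0, 200.0, 400.0]
y_pred = [110.0, 180.0, 400.0]
mae = score("mean_absolute_error", y_true, y_pred)
mse = score("mean_squared_error", y_true, y_pred)
mape = score("mean_absolute_percentage_error", y_true, y_pred)
print(mae, mse, mape)
```

Lower values indicate a better fit for all three metrics; the percentage error is scale-free, which helps when comparing series of different magnitudes.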
skforecast.model_selection_statsmodels.model_selection_statsmodels.cv_sarimax(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Cross-validation of a SARIMAX model from statsmodels v0.12. The order of the data is maintained and the training set increases in each iteration.
y (pandas Series): Time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n','c','t','ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
exog (pandas Series, pandas DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cross-validation.
Value of the metric for each partition.
predictions: pandas DataFrame. Values predicted and their estimated interval: column pred = predictions; column lower = lower bound of the interval; column upper = upper bound of the interval.
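The expanding-window scheme, with one metric value per partition, can be sketched as follows. This is an illustration under stated assumptions, not the library's code: a naive last-value forecast stands in for SARIMAX, both helper functions are hypothetical, and `allow_incomplete_fold` is assumed to control whether a final partition with fewer than `steps` observations is kept (the parameter appears in the signature but is not described above).

```python
def expanding_window_splits(n_obs, initial_train_size, steps,
                            allow_incomplete_fold=True):
    """Yield (train_indices, test_indices) with a growing training set.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    splits = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        incomplete = (test_end - train_end) < steps
        # Assumed meaning of allow_incomplete_fold: drop a short last fold.
        if incomplete and not allow_incomplete_fold:
            break
        splits.append((range(0, train_end), range(train_end, test_end)))
        train_end = test_end  # the training set absorbs the evaluated fold
    return splits


def cv_naive(y, initial_train_size, steps, allow_incomplete_fold=True):
    """Mean absolute error per partition using a last-value forecast."""
    scores = []
    for train_idx, test_idx in expanding_window_splits(
        len(y), initial_train_size, steps, allow_incomplete_fold
    ):
        last_value = y[train_idx[-1]]  # stand-in for a fitted model's forecast
        errors = [abs(y[i] - last_value) for i in test_idx]
        scores.append(sum(errors) / len(errors))
    return scores


y = [float(v) for v in range(1, 12)]  # 1.0, 2.0, ..., 11.0
scores_all = cv_naive(y, initial_train_size=4, steps=3)
scores_complete = cv_naive(y, initial_train_size=4, steps=3,
                           allow_incomplete_fold=False)
print(scores_all)
print(scores_complete)
```

Because the splits never shuffle the data, every test observation comes strictly after the observations used for training, which is what preserving the order of the series requires.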
skforecast.model_selection_statsmodels.model_selection_statsmodels.grid_search_sarimax(y, param_grid, initial_train_size, steps, metric, exog=None, refit=False, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)

Exhaustive search over specified parameter values for a SARIMAX model from statsmodels v0.12. Validation is done using time series cross-validation or backtesting.
y (pandas Series): Time series values.
param_grid (dict): Dictionary with parameter names (str) as keys and lists of parameter settings to try as values. Allowed parameters in the grid are: order, seasonal_order and trend.
initial_train_size (int): Number of samples in the initial training set.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
refit (bool, default False): Whether to re-fit the model in each iteration.
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cv or backtesting.
Metric value estimated for each combination of parameters.
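An exhaustive search evaluates every combination of the lists in `param_grid`. The sketch below shows one way such a grid could be expanded with `itertools.product`; `expand_param_grid` is a hypothetical helper, and the grid values are made up for illustration.

```python
from itertools import product

# Keys the documentation lists as allowed in the grid.
ALLOWED_KEYS = {"order", "seasonal_order", "trend"}


def expand_param_grid(param_grid):
    """Return one dict per combination of the candidate values.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    unknown = set(param_grid) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"Parameters not allowed in the grid: {sorted(unknown)}")
    keys = list(param_grid)
    value_lists = [param_grid[key] for key in keys]
    # product() iterates the Cartesian product; the last key varies fastest.
    return [dict(zip(keys, values)) for values in product(*value_lists)]


param_grid = {
    "order": [(1, 0, 0), (2, 1, 1)],
    "seasonal_order": [(0, 0, 0, 0), (1, 0, 0, 12)],
    "trend": [None, "c"],
}

combinations = expand_param_grid(param_grid)
print(len(combinations))
for params in combinations[:2]:
    print(params)
```

The number of model fits grows multiplicatively with each list in the grid (and again with the number of folds when `refit` is True), so small candidate lists keep the search tractable.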