model_selection_statsmodels
skforecast.model_selection_statsmodels.model_selection_statsmodels.backtesting_sarimax(y, initial_train_size, steps, metric, refit=False, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Backtesting (validation) of a SARIMAX model from statsmodels v0.12. The model is trained using the first initial_train_size observations; then, in each iteration, steps predictions are made and evaluated. If refit is True, the model is re-fitted in each iteration before making predictions.
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_forecasting.html
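The effect of refit can be pictured with a toy model. The following is a minimal sketch, not the library's implementation: `toy_backtest` is a hypothetical helper, and a "predict the training mean" rule stands in for SARIMAX. It also assumes that re-fitting uses every observation seen so far and that a shorter final fold is still predicted; neither detail is stated on this page.

```python
from statistics import mean


def toy_backtest(y, initial_train_size, steps, refit=False):
    """Backtest a toy 'predict the training mean' model and return its predictions."""
    # Initial fit on the first initial_train_size observations.
    level = mean(y[:initial_train_size])
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        if refit and start > initial_train_size:
            # Assumption: re-fitting uses all observations seen so far.
            level = mean(y[:start])
        # Assumption: the last fold may contain fewer than `steps` values.
        horizon = min(steps, len(y) - start)
        predictions.extend([level] * horizon)
    return predictions


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
fixed = toy_backtest(y, initial_train_size=4, steps=2, refit=False)
refitted = toy_backtest(y, initial_train_size=4, steps=2, refit=True)
print(fixed)
print(refitted)
```

With refit=False the toy model keeps the level learned from the initial training window, while refit=True updates it before each block of steps predictions.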
y (pandas Series): Time series values.
initial_train_size (int): Number of samples used in the initial train.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
refit (bool, default False): Whether to re-fit the model in each iteration.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n','c','t','ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = 0.05 returns a 95% confidence interval.
exog (pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions (pandas DataFrame): Predicted values and their estimated interval. Column pred holds the predictions, column lower the lower bound of the interval, and column upper the upper bound of the interval.
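To make the relation between alpha and the lower and upper columns concrete, here is a small sketch that builds a symmetric interval around a single prediction. It assumes a Gaussian forecast error, which is an assumption of this example and not something this page states; `normal_interval` and `std_error` are hypothetical names used only for illustration.

```python
from statistics import NormalDist


def normal_interval(pred, std_error, alpha=0.05):
    """Return (lower, upper) bounds of a (1 - alpha) interval around pred."""
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return pred - z * std_error, pred + z * std_error


# alpha = 0.05 corresponds to a 95% interval.
lower, upper = normal_interval(pred=100.0, std_error=5.0, alpha=0.05)
print(round(lower, 2), round(upper, 2))
```

A smaller alpha gives a wider interval: alpha = 0.01 corresponds to a 99% interval.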
skforecast.model_selection_statsmodels.model_selection_statsmodels.cv_sarimax_statsmodels(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Cross-validation of a SARIMAX model from statsmodels v0.12. The order of the data is maintained and the training set increases in each iteration.
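The expanding training set can be sketched with plain index ranges. This is an illustrative sketch only: `expanding_window_splits` is a hypothetical helper, and it assumes that allow_incomplete_fold=True keeps a final test fold shorter than steps while False discards it. That reading follows from the parameter name and is not confirmed by this page.

```python
def expanding_window_splits(n_obs, initial_train_size, steps,
                            allow_incomplete_fold=True):
    """Return a list of (train_indices, test_indices) ranges in time order."""
    splits = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        is_incomplete = (test_end - train_end) < steps
        if is_incomplete and not allow_incomplete_fold:
            # Assumption: trailing observations are dropped in this case.
            break
        # The training set always starts at 0 and grows by the previous test fold.
        splits.append((range(0, train_end), range(train_end, test_end)))
        train_end = test_end
    return splits


splits = expanding_window_splits(n_obs=20, initial_train_size=10, steps=4)
for train_idx, test_idx in splits:
    print(f"train size {len(train_idx)}, test indices {list(test_idx)}")
```

Because observations are never shuffled, every test fold lies strictly after the data used to train on it.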
y (pandas Series): Time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n','c','t','ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = 0.05 returns a 95% confidence interval.
exog (pandas Series, pandas DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cross-validation.
Value of the metric for each partition.
predictions (pandas DataFrame): Predicted values and their estimated interval. Column pred holds the predictions, column lower the lower bound of the interval, and column upper the upper bound of the interval.
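As an illustration of "the metric for each partition", the sketch below computes the three metric options fold by fold. The functions are hand-written stand-ins using the usual textbook definitions, not the library's own code, and the fold values are made-up numbers chosen only to keep the arithmetic easy to follow.

```python
def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    # Returned as a fraction (0.1 means 10%); undefined if a true value is 0.
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative (true values, predictions) pairs, one per test fold.
folds = [
    ([10.0, 20.0], [12.0, 18.0]),
    ([40.0, 50.0], [44.0, 45.0]),
]

mse_per_fold = [mean_squared_error(t, p) for t, p in folds]
mae_per_fold = [mean_absolute_error(t, p) for t, p in folds]
mape_per_fold = [mean_absolute_percentage_error(t, p) for t, p in folds]
print(mse_per_fold, mae_per_fold, mape_per_fold)
```

Keeping one value per fold, instead of a single pooled number, shows whether the error is stable over time or concentrated in particular periods.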
skforecast.model_selection_statsmodels.model_selection_statsmodels.grid_search_sarimax(y, param_grid, initial_train_size, steps, metric, exog=None, refit=False, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Exhaustive search over specified parameter values for a SARIMAX model from statsmodels v0.12. Validation is done using time series cross-validation or backtesting.
y (pandas Series): Time series values.
param_grid (dict): Dictionary with parameter names (str) as keys and lists of parameter settings to try as values. Allowed parameters in the grid are: order, seasonal_order and trend.
initial_train_size (int): Number of samples used in the initial train.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
refit (bool, default False): Whether to re-fit the model in each iteration.
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cv or backtesting.
Metric value estimated for each combination of parameters.
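An exhaustive search evaluates every combination of the values listed in param_grid. The sketch below shows one way such a grid could be expanded; `expand_grid` is a hypothetical helper written for this example, and the grid values are arbitrary. It does not reproduce how the function itself iterates over the grid.

```python
from itertools import product

# Only the keys named above (order, seasonal_order, trend) are used.
param_grid = {
    "order": [(1, 0, 0), (1, 1, 1)],
    "seasonal_order": [(0, 0, 0, 0)],
    "trend": [None, "c"],
}


def expand_grid(grid):
    """Return one dict per combination of the listed parameter values."""
    keys = list(grid)
    value_lists = [grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


combinations = expand_grid(param_grid)
for params in combinations:
    # Each combination would be validated and its metric recorded.
    print(params)
```

The number of candidate models is the product of the list lengths (here 2 x 1 x 2 = 4), so the cost of the search grows quickly as more values are added.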