model_selection_statsmodels
skforecast.model_selection_statsmodels.backtesting_autoreg_statsmodels(y, lags, initial_train_size, steps, metric, exog=None, verbose=False)
Backtesting (validation) of the AutoReg model from statsmodels v0.12. The model is trained only once, using the first `initial_train_size` observations. In each iteration, `steps` predictions are evaluated. This evaluation is much faster than cross-validation because the model is never refitted.
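The train-once scheme can be sketched without any library. This is only an illustration, not the skforecast implementation: it assumes a hypothetical AR(1) model without intercept fitted by least squares (`fit_ar1` and `backtest_ar1` are illustrative names), and it assumes each fold forecasts from the last observed value before the fold.

```python
def fit_ar1(train):
    """Least-squares AR(1) coefficient without intercept: y[t] ~ phi * y[t-1]."""
    numerator = sum(cur * prev for cur, prev in zip(train[1:], train[:-1]))
    denominator = sum(prev * prev for prev in train[:-1])
    return numerator / denominator


def backtest_ar1(y, initial_train_size, steps):
    """Fit once on the initial split, then forecast `steps` values per fold."""
    phi = fit_ar1(y[:initial_train_size])  # the only training step
    predictions = []
    for start in range(initial_train_size, len(y), steps):
        horizon = min(steps, len(y) - start)  # the last fold may be shorter
        last = y[start - 1]  # assumption: forecast from the last observed value
        for _ in range(horizon):
            last = phi * last  # recursive multi-step forecast
            predictions.append(last)
    actual = y[initial_train_size:]
    mse = sum((a - p) ** 2 for a, p in zip(actual, predictions)) / len(actual)
    return mse, predictions


# Synthetic series that follows y[t] = 0.5 * y[t-1] exactly.
y = [100 * 0.5 ** t for t in range(12)]
mse, preds = backtest_ar1(y, initial_train_size=6, steps=2)
print(f"folds evaluated: 3, predictions: {len(preds)}, mse: {mse:.6f}")
```

Because the coefficient is estimated once, the cost of the loop is only the forecasting itself, which is where the speed advantage over refitting in every fold comes from.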
y (1D np.ndarray, pd.Series): Training time series values.
lags (int, list): The number of lags to include in the model if an integer, or the list of lag indices to include. For example, [1, 4] will only include lags 1 and 4, while lags=4 will include lags 1, 2, 3, and 4.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions (1D np.ndarray): Predicted values.
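The three accepted `metric` names match the names of scikit-learn metric functions. The sketch below defines them in plain Python to show what each one measures. Whether the library dispatches to scikit-learn internally is an assumption here, and the `METRICS` dictionary and the sample values are purely illustrative.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_percentage_error(y_true, y_pred):
    """Average absolute error relative to the true value, as a fraction."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical lookup from the metric name (a string) to its function.
METRICS = {
    "mean_squared_error": mean_squared_error,
    "mean_absolute_error": mean_absolute_error,
    "mean_absolute_percentage_error": mean_absolute_percentage_error,
}

y_true = [2.0, 4.0, 8.0]
y_pred = [1.0, 5.0, 10.0]
scores = {name: fn(y_true, y_pred) for name, fn in METRICS.items()}
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

Note that the percentage error is undefined when a true value is zero, so it is a poor choice for series that cross or touch zero.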
skforecast.model_selection_statsmodels.cv_autoreg_statsmodels(y, lags, initial_train_size, steps, metric, exog=None, allow_incomplete_fold=True, verbose=False)
Cross-validation of AutoReg model from statsmodels v0.12. The order of data is maintained and the training set increases in each iteration.
y (1D np.ndarray, pd.Series): Training time series values.
lags (int, list): The number of lags to include in the model if an integer, or the list of lag indices to include. For example, [1, 4] will only include lags 1 and 4, while lags=4 will include lags 1, 2, 3, and 4.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
verbose (bool, default `False`): Print number of folds used for cross-validation.
Value of the metric for each fold.
predictions (1D np.ndarray): Predicted values.
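How the training set grows from fold to fold, and what `allow_incomplete_fold` changes, can be sketched as index arithmetic. The helper below is illustrative only (`expanding_window_folds` is not part of the library), and it assumes that each new training set absorbs the previous test set.

```python
def expanding_window_folds(n_obs, initial_train_size, steps,
                           allow_incomplete_fold=True):
    """Return (train, test) half-open index pairs for an expanding window."""
    folds = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        is_incomplete = (test_end - train_end) < steps
        if is_incomplete and not allow_incomplete_fold:
            break  # discard the trailing observations
        folds.append(((0, train_end), (train_end, test_end)))
        train_end = test_end  # assumption: next fold trains on everything seen
    return folds


with_incomplete = expanding_window_folds(10, 4, 4, allow_incomplete_fold=True)
only_complete = expanding_window_folds(10, 4, 4, allow_incomplete_fold=False)

for train, test in with_incomplete:
    print(f"train [{train[0]}, {train[1]})  test [{test[0]}, {test[1]})")
```

With 10 observations, an initial split of 4 and `steps=4`, the second test fold holds only 2 observations, so it is kept or dropped depending on the flag.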
skforecast.model_selection_statsmodels.backtesting_sarimax_statsmodels(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Backtesting (validation) of the SARIMAX model from statsmodels v0.12. The model is trained only once, using the first `initial_train_size` observations. In each iteration, `steps` predictions are evaluated. This evaluation is much faster than cross-validation because the model is never refitted.
https://www.statsmodels.org/dev/examples/notebooks/generated/statespace_forecasting.html
y (1D np.ndarray, pd.Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n', 'c', 't', 'ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX constructor. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for backtesting.
Value of the metric.
test_predictions (2D np.ndarray): Predicted values and their estimated interval. Column 0 = predictions, column 1 = lower bound of the interval, column 2 = upper bound of the interval.
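The iterable form of `trend` is easy to misread, so here is a small worked example of the indicator list [1,1,0,1] from the parameter description. The function name, the coefficient values and the choice of time points are all hypothetical; which value of t the library assigns to the first observation is not stated in this page.

```python
def trend_polynomial(indicator, coefficients, t):
    """Evaluate the trend whose non-zero exponents are flagged in `indicator`.

    indicator[k] == 1 means the term t**k is included; coefficients are
    given in the same increasing-exponent order.
    """
    exponents = [power for power, flag in enumerate(indicator) if flag]
    if len(exponents) != len(coefficients):
        raise ValueError("one coefficient is required per included exponent")
    return sum(coef * t ** power for coef, power in zip(coefficients, exponents))


indicator = [1, 1, 0, 1]      # includes t**0, t**1 and t**3, skips t**2
a, b, c = 2.0, 0.5, 0.1       # illustrative coefficients
values = [trend_polynomial(indicator, [a, b, c], t) for t in range(4)]

# Same polynomial written out explicitly: a + b*t + c*t**3
explicit = [a + b * t + c * t ** 3 for t in range(4)]
print(values)
```

Under this reading, the string options are shorthand for short indicator lists: 'c' corresponds to [1], 't' to [0, 1] and 'ct' to [1, 1].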
skforecast.model_selection_statsmodels.cv_sarimax_statsmodels(y, initial_train_size, steps, metric, order=(1, 0, 0), seasonal_order=(0, 0, 0, 0), trend=None, alpha=0.05, exog=None, allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Cross-validation of SARIMAX model from statsmodels v0.12. The order of data is maintained and the training set increases in each iteration.
y (1D np.ndarray, pd.Series): Training time series values.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
order (tuple): The (p,d,q) order of the model for the number of AR parameters, differences, and MA parameters. d must be an integer indicating the integration order of the process, while p and q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. Default is an AR(1) model: (1,0,0).
seasonal_order (tuple): The (P,D,Q,s) order of the seasonal component of the model for the AR parameters, differences, MA parameters, and periodicity. D must be an integer indicating the integration order of the process, while P and Q may either be integers indicating the AR and MA orders (so that all lags up to those orders are included) or else iterables giving specific AR and/or MA lags to include. s is an integer giving the periodicity (number of periods in a season); it is often 4 for quarterly data or 12 for monthly data. Default is no seasonal effect.
trend (str {'n', 'c', 't', 'ct'}): Parameter controlling the deterministic trend polynomial A(t). Can be specified as a string where 'c' indicates a constant (i.e. a degree zero component of the trend polynomial), 't' indicates a linear trend with time, and 'ct' is both. Can also be specified as an iterable defining the non-zero polynomial exponents to include, in increasing order. For example, [1,1,0,1] denotes a + bt + ct^3. Default is to not include a trend component.
alpha (float, default 0.05): The significance level for the confidence interval. The default alpha = .05 returns a 95% confidence interval.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cross-validation.
Value of the metric for each fold.
predictions (2D np.ndarray): Predicted values and their estimated interval. Column 0 = predictions, column 1 = lower bound of the interval, column 2 = upper bound of the interval.
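The relationship between `alpha` and the three-column prediction layout can be illustrated with a short calculation. This sketch assumes Gaussian forecast errors, and the point forecasts and standard errors are made-up numbers; it does not reproduce how statsmodels derives the standard errors from the fitted state-space model.

```python
from statistics import NormalDist


def interval_rows(predictions, std_errors, alpha=0.05):
    """Build rows of [prediction, lower bound, upper bound].

    Assumes normally distributed forecast errors, so the half-width of the
    interval is the (1 - alpha/2) normal quantile times the standard error.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return [[p, p - z * s, p + z * s] for p, s in zip(predictions, std_errors)]


point_forecasts = [10.0, 12.0]   # hypothetical values
standard_errors = [1.0, 2.0]     # hypothetical values

rows_95 = interval_rows(point_forecasts, standard_errors, alpha=0.05)
rows_80 = interval_rows(point_forecasts, standard_errors, alpha=0.20)

for prediction, lower, upper in rows_95:
    print(f"prediction={prediction:.2f}  lower={lower:.2f}  upper={upper:.2f}")
```

A smaller `alpha` gives a wider interval: the coverage is 1 - alpha, so alpha = 0.05 corresponds to 95% and alpha = 0.20 to 80%.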
skforecast.model_selection_statsmodels.grid_search_sarimax_statsmodels(y, param_grid, initial_train_size, steps, metric, exog=None, method='cv', allow_incomplete_fold=True, sarimax_kwargs={}, fit_kwargs={'disp': 0}, verbose=False)
Exhaustive search over specified parameter values for a SARIMAX model from statsmodels v0.12. Validation is done using time series cross-validation or backtesting.
y (1D np.ndarray, pd.Series): Training time series values.
param_grid (dict): Dictionary with parameter names (str) as keys and lists of parameter settings to try as values. Allowed parameters in the grid are: order, seasonal_order and trend.
initial_train_size (int): Number of samples in the initial train split.
steps (int): Number of steps to predict.
metric ({'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error'}): Metric used to quantify the goodness of fit of the model.
exog (np.ndarray, pd.Series, pd.DataFrame, default `None`): Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].
method ({'cv', 'backtesting'}, default `'cv'`): Method used to estimate the metric for each parameter combination. 'cv' for time series cross-validation and 'backtesting' for simple backtesting. 'backtesting' is much faster since the model is fitted only once.
allow_incomplete_fold (bool, default `True`): The last test set is allowed to be incomplete if it does not reach steps observations. Otherwise, the latest observations are discarded.
sarimax_kwargs (dict, default `{}`): Additional keyword arguments passed to SARIMAX initialization. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html#statsmodels.tsa.statespace.sarimax.SARIMAX
fit_kwargs (dict, default `{'disp':0}`): Additional keyword arguments passed to SARIMAX fit. See more in https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.fit.html#statsmodels.tsa.statespace.sarimax.SARIMAX.fit
verbose (bool, default `False`): Print number of folds used for cv or backtesting.
return_best (bool): Refit the forecaster using the best found parameters on the whole data.
Metric value estimated for each combination of parameters.
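An exhaustive search evaluates every combination of the values listed in `param_grid`. The sketch below shows one way such a grid can be expanded into individual parameter sets using the standard library. `expand_param_grid` is an illustrative helper, not a function of this module, and the grid values are arbitrary examples.

```python
from itertools import product

ALLOWED_KEYS = {"order", "seasonal_order", "trend"}


def expand_param_grid(param_grid):
    """Return one dict per combination of the values listed in the grid."""
    unknown = set(param_grid) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"parameters not allowed in the grid: {sorted(unknown)}")
    keys = list(param_grid)
    value_lists = [param_grid[key] for key in keys]
    # Cartesian product: every value of each key paired with every other.
    return [dict(zip(keys, combo)) for combo in product(*value_lists)]


param_grid = {
    "order": [(1, 0, 0), (2, 1, 0)],
    "seasonal_order": [(0, 0, 0, 0)],
    "trend": [None, "c"],
}

combinations = expand_param_grid(param_grid)
for params in combinations:
    print(params)
```

The number of model evaluations is the product of the list lengths (here 2 x 1 x 2 = 4), so with method 'cv' the cost also scales with the number of folds per combination.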