Forecasting with statistical models¶
Statistical models such as ARIMA, SARIMAX, and Exponential Smoothing have served as the backbone of time series analysis for decades. While machine learning models typically frame forecasting as a tabular regression problem (mapping
Integrating these traditional methods into the forecasting workflow offers several key advantages:
Robustness with limited data: Statistical methods often outperform machine learning when historical data is scarce or when the series exhibits a highly stable, dominant seasonal pattern.
Interpretability and simplicity: With fewer parameters and well-understood linear relationships, these models are often easier to explain to stakeholders than "black-box" ensembles.
Rigorous benchmarking: They provide a critical baseline. Before deploying a complex Gradient Boosting Machine (GBM), it is essential to verify if the added complexity provides significant value over a simpler, traditional approach.
Probabilistic foundations: Many statistical models are inherently probabilistic, providing reliable confidence intervals and uncertainty quantification natively.
One of the primary challenges in the Python ecosystem is that statistical libraries often lack a unified API. Unlike the scikit-learn ecosystem, each library typically follows its own convention, making it difficult to switch between models or integrate them into a standardized pipeline. skforecast bridges this gap in two ways:
Native scikit-learn compatible implementations: provide efficient implementations of
Arima,Auto Arima,SarimaX,Ets,Auto Ets, andARARmodels that strictly follow the scikit-learn API.The
ForecasterStatsclass: This specialized class acts as a wrapper, allowing the use of one or multiple statistical models while leveraging the full power of the skforecast ecosystem including: backtesting, hyperparameter search, and exogenous variable integration. In addition to the native implementations, ForecasterStats is compatible with other libraries such as statsmodels, sktime and aeon.
💡 Tip
To learn more about modeling time series with statistical models, visit our examples:
The following sections focus on how to use ForecasterStats to train an ARIMA model and predict future values.
ForecasterStats¶
The ForecasterStats class allows for the training and validation of statistical models using the skforecast API. It is compatible with several statistical models from different libraries:
skforecast Arima, Auto Arima, Sarimax, Ets, Auto Ets, Arar
sktime ARIMA
Aeon ARIMA
⚠ Warning
Since version 0.19.0 of skforecast, the class ForecasterSarimax has been deprecated in favor of the new class ForecasterStats, which provides improved performance and broader compatibility with statistical models. Users are encouraged to transition to ForecasterStats for future projects.
Libraries and data¶
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics import mean_absolute_error
from skforecast.datasets import fetch_dataset
from skforecast.stats import Arima, Ets, Arar
from skforecast.recursive import ForecasterStats
from skforecast.model_selection import TimeSeriesFold, backtesting_stats, grid_search_stats
from skforecast.plot import set_dark_theme, plot_prediction_intervals
import warnings
set_dark_theme()
warnings.filterwarnings('once')
warnings.filterwarnings('ignore', category=DeprecationWarning)
# Download data
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=False)
data = data.loc[:'1990-01-01 00:00:00', 'Gasolinas']
data.name = 'y'
data.index.name = 'datetime'
data
╭──────────────────────────────── fuel_consumption ────────────────────────────────╮ │ Description: │ │ Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01. │ │ │ │ Source: │ │ Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and │ │ Corporación de Derecho Público tutelada por el Ministerio para la Transición │ │ Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas │ │ │ │ URL: │ │ https://raw.githubusercontent.com/skforecast/skforecast- │ │ datasets/main/data/consumos-combustibles-mensual.csv │ │ │ │ Shape: 644 rows x 5 columns │ ╰──────────────────────────────────────────────────────────────────────────────────╯
datetime
1969-01-01 166875.2129
1969-02-01 155466.8105
1969-03-01 184983.6699
1969-04-01 202319.8164
1969-05-01 206259.1523
...
1989-09-01 687649.2852
1989-10-01 669889.1602
1989-11-01 601413.8867
1989-12-01 663568.1055
1990-01-01 610241.2461
Freq: MS, Name: y, Length: 253, dtype: float64
# Split data in train-test partitions
# ======================================================================================
end_train = '1983-01-01 23:59:59'
print(
f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
f"(n={len(data.loc[:end_train])})"
)
print(
f"Test dates : {data.loc[end_train:].index.min()} --- {data.loc[:].index.max()} "
f"(n={len(data.loc[end_train:])})"
)
data_train = data.loc[:end_train]
data_test = data.loc[end_train:]
# Plot
# ======================================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
ax.set_title('Monthly fuel consumption in Spain')
ax.legend();
Train dates : 1969-01-01 00:00:00 --- 1983-01-01 00:00:00 (n=169) Test dates : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00 (n=84)
Training¶
An ARIMA model is used as estimator within a ForecasterStats object, allowing us to take advantage of the skforecast functionalities such as backtesting and hyperparameter tuning.
💡 Tip
Skforecast's ARIMA implementation is optimized for speed using Just-In-Time (JIT) compilation with Numba. Consequently, the initial fit of the model is slower due to the overhead of the compilation process; however, subsequent fits and predictions are significantly faster than those of other libraries.
# Create and fit ForecasterStats with an ARIMA estimator
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12),
)
forecaster.fit(y=data_train, suppress_warnings=True)
forecaster
ForecasterStats
General Information
- Estimators:
- skforecast.Arima: Arima(1,1,1)(1,1,1)[12]
- Window size: 1
- Series name: y
- Exogenous included: False
- Creation date: 2026-01-31 19:58:55
- Last fit date: 2026-01-31 19:58:56
- Skforecast version: 0.20.0
- Python version: 3.13.11
- Forecaster id: None
Exogenous Variables
-
None
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1983-01-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Estimator Parameters
- skforecast.Arima: {'order': (1, 1, 1), 'seasonal_order': (1, 1, 1), 'm': 12, 'include_mean': True, 'transform_pars': True, 'method': 'CSS-ML', 'n_cond': None, 'SSinit': 'Gardner1980', 'optim_method': 'BFGS', 'optim_kwargs': {'maxiter': 1000}, 'kappa': 1000000.0}
Fit Kwargs
-
None
The suppress_warnings=True parameter silences convergence warnings and other non-critical messages that may appear during model fitting, keeping the output clean.
Prediction¶
# Predictions
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
1983-02-01 414755.657086 1983-03-01 465102.493791 1983-04-01 504458.858737 Freq: MS, Name: pred, dtype: float64
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
ax.legend();
# Prediction error
# ==============================================================================
error_mae = mean_absolute_error(
y_true = data_test,
y_pred = predictions
)
print(f"Test error (mae): {error_mae}")
Test error (mae): 59203.058715250874
Prediction intervals¶
The method predict_interval enables the calculation of prediction intervals for the forecasted values. Users can specify the confidence level using:
alpha: The significance level (e.g.,alpha=0.05produces a 95% confidence interval)interval: The confidence level directly (e.g.,interval=[2.5, 97.5]for the same 95% interval)
# Prediction intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=steps, alpha=0.05)
predictions.head(5)
| pred | lower_bound | upper_bound | |
|---|---|---|---|
| 1983-02-01 | 414755.657086 | 383541.980897 | 445969.333274 |
| 1983-03-01 | 465102.493791 | 431519.674752 | 498685.312829 |
| 1983-04-01 | 504458.858737 | 469463.167098 | 539454.550376 |
| 1983-05-01 | 472054.363692 | 435480.362414 | 508628.364969 |
| 1983-06-01 | 494879.128788 | 456858.949999 | 532899.307577 |
# Plot prediction intervals
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "y",
title = "Prediction intervals in test data",
kwargs_fill_between = {'color': 'white', 'alpha': 0.3, 'zorder': 1},
ax = ax
)
Feature importances¶
For the ARIMA model, the method get_feature_importances returns the estimated coefficients of the model (AR, MA, and seasonal components). These coefficients indicate the weight assigned to each lag term in the model's equations.
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
| feature | importance | |
|---|---|---|
| 0 | sar1 | -0.138924 |
| 1 | ar1 | -0.308182 |
| 2 | sma1 | -0.558172 |
| 3 | ma1 | -0.558438 |
Backtesting¶
One advantage of encapsulating statistical models within the ForecasterStats class is that they can be evaluated using any of the backtesting strategies implemented in skforecast.
✏️ Note
Why do statistical models require refitting during backtesting?
Unlike machine learning models, statistical models like ARIMA maintain an internal state that depends on the sequence of observations. They can only generate predictions starting from the last observed time step — they cannot "jump" to an arbitrary point in the future without knowing all previous values.
During backtesting, when the validation window moves forward, the model must be refitted to incorporate the new observations and update its internal state. This is why refit=True is typically required.
Performance optimization: Because refitting is mandatory, skforecast's Numba-optimized backend becomes essential. It enables hundreds of refits during backtesting in a fraction of the time required by non-optimized libraries.
Exception: The skforecast.stats.Sarimax model implements an efficient state-space representation that allows updating predictions without full model refitting.
# Backtest forecaster
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12)
)
cv = TimeSeriesFold(
steps = 12, # predict 12 months per fold
initial_train_size = len(data_train),
refit = True,
fixed_train_size = False,
)
metric, predictions = backtesting_stats(
forecaster = forecaster,
y = data,
cv = cv,
metric = 'mean_absolute_error',
n_jobs = 'auto',
suppress_warnings = True,
verbose = True,
show_progress = True
)
Information of folds
--------------------
Number of observations used for initial training: 169
Number of observations used for backtesting: 84
Number of folds: 7
Number skipped folds: 0
Number of steps per fold: 12
Number of steps to exclude between last observed data (last window) and predictions (gap): 0
Fold: 0
Training: 1969-01-01 00:00:00 -- 1983-01-01 00:00:00 (n=169)
Validation: 1983-02-01 00:00:00 -- 1984-01-01 00:00:00 (n=12)
Fold: 1
Training: 1969-01-01 00:00:00 -- 1984-01-01 00:00:00 (n=181)
Validation: 1984-02-01 00:00:00 -- 1985-01-01 00:00:00 (n=12)
Fold: 2
Training: 1969-01-01 00:00:00 -- 1985-01-01 00:00:00 (n=193)
Validation: 1985-02-01 00:00:00 -- 1986-01-01 00:00:00 (n=12)
Fold: 3
Training: 1969-01-01 00:00:00 -- 1986-01-01 00:00:00 (n=205)
Validation: 1986-02-01 00:00:00 -- 1987-01-01 00:00:00 (n=12)
Fold: 4
Training: 1969-01-01 00:00:00 -- 1987-01-01 00:00:00 (n=217)
Validation: 1987-02-01 00:00:00 -- 1988-01-01 00:00:00 (n=12)
Fold: 5
Training: 1969-01-01 00:00:00 -- 1988-01-01 00:00:00 (n=229)
Validation: 1988-02-01 00:00:00 -- 1989-01-01 00:00:00 (n=12)
Fold: 6
Training: 1969-01-01 00:00:00 -- 1989-01-01 00:00:00 (n=241)
Validation: 1989-02-01 00:00:00 -- 1990-01-01 00:00:00 (n=12)
0%| | 0/7 [00:00<?, ?it/s]
# Backtest predictions
# ==============================================================================
predictions.head(4)
| fold | pred | |
|---|---|---|
| 1983-02-01 | 0 | 414755.657086 |
| 1983-03-01 | 0 | 465102.493791 |
| 1983-04-01 | 0 | 504458.858737 |
| 1983-05-01 | 0 | 472054.363692 |
# Bactesting metrics
# ==============================================================================
metric
| mean_absolute_error | |
|---|---|
| 0 | 25776.610245 |
# Plot backtest predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
data.loc[end_train:].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax)
ax.set_title('Backtest predictions with ARIMA model')
ax.legend();
The freeze_params argument¶
For models that perform automatic selection (AutoArima, AutoETS), re-running the selection procedure in every backtesting fold can be computationally expensive. The freeze_params argument in the backtesting_stats function controls this behavior:
freeze_params = True(default): The model parameters found during the first fit (e.g.,orderandseasonal_orderfor ARIMA) are reused in all subsequent refits. Only the coefficients are re-estimated with new data. This significantly reduces runtime.freeze_params = False: Automatic model selection is performed independently in each refit, allowing parameters to adapt across folds. This increases runtime but may capture structural changes in the data. Aparamscolumn is added to the output showing the parameters selected per fold.
Model tuning¶
To find the optimal hyperparameters for statistical models, two approaches are widely used:
Statistical Criteria: Information criteria, such as Akaike's Information Criterion (AIC) or Bayesian Information Criterion (BIC), apply different penalties to the maximum likelihood (log-likelihood) estimate to measure model fit. The advantage of these criteria is that they are computed solely on the training data, eliminating the need for out-of-sample predictions. Consequently, the optimization process is significantly accelerated. The well-known AutoArima and AutoETS algorithms use this approach.
Validation Techniques: Utilizing validation techniques, backtesting, is another effective strategy. Backtesting evaluates model performance using independent partitions of historical data to simulate real-world conditions. This provides a practical assessment of how hyperparameters perform across different time windows.
In the first approach, calculations are based solely on training data, eliminating the need for a separate data partition. This makes the optimization process extremely efficient. However, it is important to note that information criteria only measure the relative quality of models within the defined search space. A model with the lowest AIC could still be a poor fit in absolute terms. Therefore, the selected model should ideally undergo a backtesting phase. This phase calculates forecast error metrics (such as MAE, MSE, or MAPE) to validate performance on a meaningful, interpretable scale.
AutoArima¶
AutoArima is an algorithm designed to automate the selection of optimal hyperparameters for an ARIMA model. The algorithm systematically evaluates various combinations of non-seasonal parameters (
Skforecast's Arima class triggers the AutoArima functionality whenever the order or seasonal_order arguments are set to None. After the model is fitted, the optimal parameters can be accessed via the best_params_ attribute.
For all subsequent predictions, the model automatically utilizes the optimal configuration identified during the fitting process. This functionality is also available in the Ets when model is set to None or to 'ZZZ'.
# Skforecast Auto Arima
# ==============================================================================
auto_arima = Arima(
order = None, # Must be None to use AutoArima
seasonal_order = None, # Must be None to use AutoArima
start_p = 0,
start_q = 0,
max_p = 12,
max_q = 3,
max_P = 2,
max_Q = 2,
max_order = 5,
max_d = 2,
max_D = 1,
ic = "aic",
m = 12,
trace = True, # True for detailed information of the process
)
forecaster = ForecasterStats(estimator=auto_arima)
forecaster.fit(y=data_train, suppress_warnings=True)
Fitting models using approximations... ARIMA(p,d,q)(P,D,Q)[m] : aic ------------------------------------------------ ARIMA(0,1,0)(1,1,1)[12] : 3564.3887 ARIMA(0,1,0)(0,1,0)[12] : 3637.1363 ARIMA(1,1,0)(1,1,0)[12] : 3506.9117 ARIMA(0,1,1)(0,1,1)[12] : 3481.3432 ARIMA(0,1,1)(0,1,0)[12] : 3542.1014 ARIMA(0,1,1)(1,1,1)[12] : 3482.0043 ARIMA(0,1,1)(0,1,2)[12] : 3481.2735 ARIMA(0,1,1)(1,1,2)[12] : 3482.5762 ARIMA(0,1,0)(0,1,2)[12] : 3563.7959 ARIMA(1,1,1)(0,1,2)[12] : 3476.4900 ARIMA(1,1,1)(0,1,1)[12] : 3476.4887 ARIMA(1,1,1)(0,1,0)[12] : 3531.8215 ARIMA(1,1,1)(1,1,1)[12] : 3477.1082 ARIMA(1,1,1)(1,1,0)[12] : 3493.3277 ARIMA(1,1,1)(1,1,2)[12] : 3478.4277 ARIMA(1,1,0)(0,1,1)[12] : 3491.3027 ARIMA(2,1,1)(0,1,1)[12] : 3477.5383 ARIMA(1,1,2)(0,1,1)[12] : 3477.8529 ARIMA(0,1,0)(0,1,1)[12] : 3563.8825 ARIMA(0,1,2)(0,1,1)[12] : 3478.2811 ARIMA(2,1,0)(0,1,1)[12] : 3482.5359 ARIMA(2,1,2)(0,1,1)[12] : 3479.4579 Now re-fitting the best model(s) without approximations... ARIMA(1,1,1)(0,1,1)[12] : 3476.4887 Best model found: ARIMA(1,1,1)(0,1,1)[12] with aic: 3476.4886775085856
# Access the optimal parameters selected by AutoArima
# ==============================================================================
print(f"Best parameters: {forecaster.estimators_[0].best_params_}")
Best parameters: {'order': (1, 1, 1), 'seasonal_order': (0, 1, 1), 'm': 12}
# Predictions
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
# Prediction error
# ==============================================================================
error_mae = mean_absolute_error(
y_true = data_test,
y_pred = predictions
)
print(f"Test error (mae): {error_mae}")
Test error (mae): 53517.26243040556
⚠ Warning
When evaluating ARIMA-SARIMAX models, it is important to note that AIC assumes that all models are trained on the same data. Thus, using AIC to decide between different orders of differencing is technically invalid, since one data point is lost with each order of differencing. Therefore, the Auto Arima algorithm uses a unit root test to select the order of differencing, and only uses the AIC to select the order of the AR and MA components.
For a detailed explanation of Akaike's Information Criterion (AIC) see Rob J Hyndman's blog and AIC Myths and Misunderstandings by Anderson and Burnham.
Grid search with backtesting¶
Although less common in the context of statistical models, it is also possible to use grid search and random search to find optimal hyperparameters. It is crucial to conduct the search using a validation dataset, rather than the test dataset, to ensure an accurate and unbiased evaluation of model performance.
The training partition is split again to separate a validation partition, which is used for the hyperparameter search. The test partition remains untouched and is reserved exclusively for the final evaluation of the model's performance.
# Split Train-validation-test data
# ======================================================================================
end_train = '1979-01-01 23:59:59'
end_val = '1983-01-01 23:59:59'
print(
f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
f"(n={len(data.loc[:end_train])})"
)
print(
f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()} "
f"(n={len(data.loc[end_train:end_val])})"
)
print(
f"Test dates : {data.loc[end_val:].index.min()} --- {data.index.max()} "
f"(n={len(data.loc[end_val:])})"
)
Train dates : 1969-01-01 00:00:00 --- 1979-01-01 00:00:00 (n=121) Validation dates : 1979-02-01 00:00:00 --- 1983-01-01 00:00:00 (n=48) Test dates : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00 (n=84)
# Grid search hyperparameter
# ======================================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 0, 0)) # Placeholder replaced in the grid search
)
param_grid = {
'order': [(0, 1, 0), (0, 1, 1), (1, 1, 0), (1, 1, 1), (2, 1, 1)],
'seasonal_order': [(0, 0, 0,), (0, 1, 0), (1, 1, 1)],
'm': [1, 12]
}
cv = TimeSeriesFold(steps=12, initial_train_size=len(data.loc[:end_train]))
results_grid = grid_search_stats(
forecaster = forecaster,
y = data.loc[:end_val],
param_grid = param_grid,
cv = cv,
metric = 'mean_absolute_error',
return_best = True,
n_jobs = 'auto',
suppress_warnings = True,
verbose = False,
show_progress = True
)
results_grid.head(5)
Number of models compared: 30.
params grid: 0%| | 0/30 [00:00<?, ?it/s]
`Forecaster` refitted using the best-found parameters, and the whole data set:
Parameters: {'m': 12, 'order': (0, 1, 0), 'seasonal_order': (1, 1, 1)}
Backtesting metric: 17746.438118342612
| params | mean_absolute_error | m | order | seasonal_order | |
|---|---|---|---|---|---|
| 0 | {'m': 12, 'order': (0, 1, 0), 'seasonal_order'... | 17746.438118 | 12 | (0, 1, 0) | (1, 1, 1) |
| 1 | {'m': 12, 'order': (1, 1, 0), 'seasonal_order'... | 18100.683177 | 12 | (1, 1, 0) | (1, 1, 1) |
| 2 | {'m': 12, 'order': (1, 1, 1), 'seasonal_order'... | 18591.412656 | 12 | (1, 1, 1) | (1, 1, 1) |
| 3 | {'m': 12, 'order': (0, 1, 1), 'seasonal_order'... | 19638.353224 | 12 | (0, 1, 1) | (1, 1, 1) |
| 4 | {'m': 12, 'order': (2, 1, 1), 'seasonal_order'... | 21982.623994 | 12 | (2, 1, 1) | (1, 1, 1) |
Since return_best = True, the Forecaster object is updated with the most optimal configuration found and trained with the entire dataset.
# Predictions
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
# Prediction error
# ==============================================================================
error_mae = mean_absolute_error(
y_true = data_test,
y_pred = predictions
)
print(f"Test error (mae): {error_mae}")
Test error (mae): 53646.64174573329
Multiple estimators¶
ForecasterStats supports multiple statistical estimators, allowing users to train, predict, and compare several models within a single workflow. This guarantees that all models operate on the same data partitions and time indices, making comparisons fair and reproducible.
Furthermore, combining predictions from multiple models (ensembling) is a well-established technique that often improves forecasting accuracy compared to relying on a single model.
# Forecaster with multiple statistical models as estimators
# ==============================================================================
estimators = [
Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12),
Ets(m=12, model="MAM"),
Arar()
]
forecaster = ForecasterStats(estimator=estimators)
forecaster.fit(y=data_train, suppress_warnings=True)
forecaster
ForecasterStats
General Information
- Estimators:
- skforecast.Arima: Arima(1,1,1)(1,1,1)[12]
- skforecast.Ets: Ets(MAM)
- skforecast.Arar: Arar(lags=(1, 2, 12, 13))
- Window size: 1
- Series name: y
- Exogenous included: False
- Creation date: 2026-01-31 19:59:09
- Last fit date: 2026-01-31 19:59:13
- Skforecast version: 0.20.0
- Python version: 3.13.11
- Forecaster id: None
Exogenous Variables
-
None
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1983-01-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Estimator Parameters
- skforecast.Arima: {'order': (1, 1, 1), 'seasonal_order': (1, 1, 1), 'm': 12, 'include_mean': True, 'transform_pars': True, 'method': 'CSS-ML', 'n_cond': None, 'SSinit': 'Gardner1980', 'optim_method': 'BFGS', 'optim_kwargs': {'maxiter': 1000}, 'kappa': 1000000.0}
- skforecast.Ets: {'m': 12, 'model': 'MAM', 'damped': None, 'alpha': None, 'beta': None, 'gamma': None, 'phi': None, 'seasonal': True, 'trend': None, 'allow_multiplicative': True, 'allow_multiplicative_trend': False}
- skforecast.Arar: {'max_ar_depth': 26, 'max_lag': 40, 'safe': True}
Fit Kwargs
-
None
# Prediction with multiple estimators
# ==============================================================================
predictions = forecaster.predict(steps=len(data_test))
predictions.head(6)
| estimator_id | pred | |
|---|---|---|
| 1983-02-01 | skforecast.Arima | 414755.657086 |
| 1983-02-01 | skforecast.Ets | 408101.282195 |
| 1983-02-01 | skforecast.Arar | 409292.695102 |
| 1983-03-01 | skforecast.Arima | 465102.493791 |
| 1983-03-01 | skforecast.Ets | 468889.543146 |
| 1983-03-01 | skforecast.Arar | 474934.656818 |
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
(
predictions
.reset_index()
.pivot(index='index', columns='estimator_id', values='pred')
.plot(ax=ax)
)
ax.legend();
ForecasterStats assigns a unique identifier and a name to each model passed as an estimator. While the IDs remain constant to ensure consistent tracking, if AutoArima or AutoETS is triggered, the names are automatically updated after fitting to reflect the specific parameters of the selected estimators.
# Access estimators information
# ==============================================================================
print(f"Estimators : {forecaster.estimators_}")
print(f"Estimators id : {forecaster.estimator_ids}")
print(f"Estimators names : {forecaster.estimator_names_}")
Estimators : [Arima(1,1,1)(1,1,1)[12], Ets(model=MAM, m=12, damped=None, alpha=None, beta=None, gamma=None, phi=None, lambda_param=None, lambda_auto=False, bias_adjust=True, bounds='both', seasonal=True, trend=None, ic='aicc', allow_multiplicative=True, allow_multiplicative_trend=False), Arar(lags=(1, 2, 12, 13))] Estimators id : ['skforecast.Arima', 'skforecast.Ets', 'skforecast.Arar'] Estimators names : ['Arima(1,1,1)(1,1,1)[12]', 'Ets(MAM)', 'Arar(lags=(1, 2, 12, 13))']
A detailed summary of all estimators included in the forecaster can be accessed using the get_estimators_info() method.
# Information of the estimators included in the forecaster
# ==============================================================================
forecaster.get_estimators_info()
| id | name | type | supports_exog | supports_interval | params | |
|---|---|---|---|---|---|---|
| 0 | skforecast.Arima | Arima(1,1,1)(1,1,1)[12] | skforecast.stats._arima.Arima | True | True | {'order': (1, 1, 1), 'seasonal_order': (1, 1, ... |
| 1 | skforecast.Ets | Ets(MAM) | skforecast.stats._ets.Ets | False | True | {'m': 12, 'model': 'MAM', 'damped': None, 'alp... |
| 2 | skforecast.Arar | Arar(lags=(1, 2, 12, 13)) | skforecast.stats._arar.Arar | True | True | {'max_ar_depth': 26, 'max_lag': 40, 'safe': True} |
Other methods that allow for the access and manipulation of estimators include get_estimator(), remove_estimators(), and set_params(). Visit ForecasterStats API.
Exogenous variables¶
Exogenous variables can be incorporated into the model using the exog argument. The primary requirement for including an exogenous variable is that its values must be known for both the training period and the entire forecast horizon.
💡 Tip
To learn more about exogenous variables and how to correctly manage them with skforecast visit: Exogenous variables (features) user guide.
# Create calendar features to be used as exogenous variables
# ==============================================================================
data_exog = pd.DataFrame(
{'y': data.to_numpy(), 'month': data.index.month},
index = data.index
)
end_train = '1980-01-01 23:59:59'
data_exog_train = data_exog.loc[:end_train]
data_exog_test = data_exog.loc[end_train:]
data_exog.head()
| y | month | |
|---|---|---|
| datetime | ||
| 1969-01-01 | 166875.2129 | 1 |
| 1969-02-01 | 155466.8105 | 2 |
| 1969-03-01 | 184983.6699 | 3 |
| 1969-04-01 | 202319.8164 | 4 |
| 1969-05-01 | 206259.1523 | 5 |
# Create and fit ForecasterStats with exogenous variables
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12),
)
forecaster.fit(
y = data_exog_train['y'],
exog = data_exog_train['month'],
suppress_warnings = True
)
# Predict with exog
# ==============================================================================
predictions = forecaster.predict(
steps = 36,
exog = data_exog_test['month']
)
predictions.head(3)
1980-02-01 422825.805875 1980-03-01 486540.206317 1980-04-01 487971.238942 Freq: MS, Name: pred, dtype: float64
Prediction on training data (In-sample Predictions)¶
In-sample predictions are crucial for evaluating the accuracy and effectiveness of the model. By comparing the predicted values with the actual observed values in the training dataset, you can assess how well the model has learned the underlying patterns and trends in the data. This comparison helps in understanding the model's performance and identify areas where it may need improvement or adjustment. In essence, they act as a mirror, reflecting how the model interprets and reconstructs the historical data on which it was trained.
Predictions of the observations used to fit the model are stored in the fitted_values_ attribute of the Arima object.
# Create and fit ForecasterStats (skforecast)
# ==============================================================================
forecaster = ForecasterStats(
estimator=Arima(order=(1, 1, 1), seasonal_order=(1, 1, 1), m=12),
)
forecaster.fit(y=data_train, suppress_warnings=True)
# In-sample Predictions
# ==============================================================================
# Show only the first 5 values
forecaster.estimators_[0].fitted_values_[:5]
array([166778.86748562, 155441.28198886, 184949.38956579, 202282.72293861,
206228.03030826])