Backtesting SARIMAX and ARIMA models
SARIMAX (Seasonal Auto-Regressive Integrated Moving Average with eXogenous factors) is a generalization of the ARIMA model that can incorporate seasonality and exogenous variables. The model has seven hyperparameters that must be specified when training it:
p: Trend autoregression order.
d: Trend difference order.
q: Trend moving average order.
P: Seasonal autoregressive order.
D: Seasonal difference order.
Q: Seasonal moving average order.
m: The number of time steps for a single seasonal period.
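To make the two differencing orders concrete, here is a minimal sketch in plain Python, with no forecasting library involved. The `difference` helper and the toy series are made up for illustration and are not part of skforecast or statsmodels. A seasonal difference with lag m (D = 1) cancels a pattern that repeats every m steps, and a regular difference (d = 1) then removes the remaining linear trend.

```python
def difference(series, lag=1):
    """Hypothetical helper: return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Toy series (illustrative only): a linear trend plus a pattern repeating every m = 4 steps.
m = 4
seasonal_pattern = [0, 5, 0, -5]
series = [2 * t + seasonal_pattern[t % m] for t in range(12)]

# D = 1: one seasonal difference with lag m cancels the repeating pattern,
# leaving only the constant increase of the trend over m steps.
seasonally_differenced = difference(series, lag=m)

# d = 1: one regular difference (lag 1) removes what is left of the trend.
fully_differenced = difference(seasonally_differenced, lag=1)

print(seasonally_differenced)
print(fully_differenced)
```

Each differencing pass shortens the series by its lag, which is why high values of d, D, or m leave fewer observations for estimating the remaining parameters.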
The backtesting_sarimax function of the skforecast.model_selection_statsmodels module is a wrapper for backtesting SARIMAX models.
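The backtest later in this document passes the seven hyperparameters as two tuples, order = (p, d, q) and seasonal_order = (P, D, Q, m). The sketch below shows how the tuples unpack. `describe_model` is a hypothetical helper written for this illustration, and the second configuration uses made-up values; only the first one appears in this document.

```python
def describe_model(order, seasonal_order):
    """Hypothetical helper: return a readable label for a SARIMAX configuration."""
    p, d, q = order
    P, D, Q, m = seasonal_order
    if (P, D, Q) == (0, 0, 0):
        # No seasonal terms: the model is a plain ARIMA(p, d, q).
        return f"ARIMA({p}, {d}, {q})"
    return f"SARIMA({p}, {d}, {q})({P}, {D}, {Q})[{m}]"


# Configuration used in the backtest of this document.
label = describe_model(order=(12, 1, 1), seasonal_order=(0, 0, 0, 0))

# Illustrative seasonal configuration (values are an assumption, not from the source).
seasonal_label = describe_model(order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))

print(label)
print(seasonal_label)
```

With every seasonal term set to zero, the model is an ordinary ARIMA, which is how the same function can backtest both ARIMA and SARIMAX models.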
Libraries
In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.model_selection_statsmodels import backtesting_sarimax
Data
In [2]:
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data['y']
data = data.sort_index()
# Split data into train and backtest
# ==============================================================================
n_backtest = 36*3 # Last 9 years are used for backtest
data_train = data[:-n_backtest]
data_backtest = data[-n_backtest:]
Backtest
In [3]:
import warnings
warnings.filterwarnings('ignore')
In [4]:
metric, predictions_backtest = backtesting_sarimax(
    y = data,
    order = (12, 1, 1),
    seasonal_order = (0, 0, 0, 0),
    initial_train_size = len(data_train),
    fixed_train_size = False,
    steps = 7,
    metric = 'mean_absolute_error',
    refit = False,
    verbose = True,
    fit_kwargs = {'maxiter': 250, 'disp': 0},
)
Number of observations used for training: 96
Number of observations used for backtesting: 108
Number of folds: 16
Number of steps per fold: 7
Last fold only includes 3 observations.
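The fold summary above can be reproduced with simple arithmetic. The sketch below uses a hypothetical helper, `fold_boundaries`, which is not part of skforecast. It only assumes that the backtest period is cut into consecutive blocks of `steps` observations, with a shorter final block when the division is not exact. The total of 204 observations is the sum of the 96 training and 108 backtesting observations reported above.

```python
import math


def fold_boundaries(n_observations, initial_train_size, steps):
    """Hypothetical helper: return (start, end) positions of each backtest fold.

    Positions are zero-based and `end` is exclusive, as in Python slicing.
    """
    n_backtest = n_observations - initial_train_size
    n_folds = math.ceil(n_backtest / steps)
    folds = []
    for i in range(n_folds):
        start = initial_train_size + i * steps
        # The last fold is truncated at the end of the series.
        end = min(start + steps, n_observations)
        folds.append((start, end))
    return folds


folds = fold_boundaries(n_observations=204, initial_train_size=96, steps=7)
last_start, last_end = folds[-1]

print(len(folds))             # 16 folds
print(last_end - last_start)  # 3 observations in the last fold
```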
In [5]:
print(f"Error backtest: {metric}")
Error backtest: 0.055444446244429825
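The metric requested in the backtest, 'mean_absolute_error', is the average of the absolute differences between observed and predicted values. A minimal sketch of that definition follows. The function is written here for illustration and the numbers are toy values, not taken from the h2o data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy values for illustration only.
observed = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]

error = mean_absolute_error(observed, predicted)
print(error)  # (0.5 + 0.0 + 1.0) / 3 = 0.5
```

Because the error is expressed in the same units as the series, the value printed above can be compared directly with the magnitude of the predictions.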
In [6]:
predictions_backtest.head(4)
Out[6]:
| | predicted_mean | lower y | upper y |
|---|---|---|---|
| 1999-07-01 | 0.734647 | 0.650016 | 0.819278 |
| 1999-08-01 | 0.751779 | 0.660625 | 0.842933 |
| 1999-09-01 | 0.865059 | 0.767857 | 0.962262 |
| 1999-10-01 | 0.832461 | 0.730425 | 0.934497 |
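The `lower y` and `upper y` columns bound each prediction. As a quick check, the sketch below computes the interval widths using only the four rows shown above; the variable names are illustrative. In these rows the interval widens as the forecast moves further from the end of the training data, which is what one would typically expect from multi-step forecasts.

```python
# (date, predicted_mean, lower y, upper y), copied from the four rows shown above.
rows = [
    ("1999-07-01", 0.734647, 0.650016, 0.819278),
    ("1999-08-01", 0.751779, 0.660625, 0.842933),
    ("1999-09-01", 0.865059, 0.767857, 0.962262),
    ("1999-10-01", 0.832461, 0.730425, 0.934497),
]

# Width of each interval: upper bound minus lower bound.
widths = [round(upper - lower, 6) for _, _, lower, upper in rows]

# True when every interval is wider than the previous one.
widening = all(a < b for a, b in zip(widths, widths[1:]))

for (date, *_), width in zip(rows, widths):
    print(date, width)
print(widening)
```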