Multi-series forecasting¶
In univariate time series forecasting, a single time series is modeled as a linear or nonlinear combination of its lags. That is, the past values of the series are used to forecast its future. In multi-series forecasting, two or more time series are modeled together using a single model. Two strategies can be distinguished:
No multivariate time series
A single model is trained, but each time series remains independent of the others. In other words, the past values of one series are not used as predictors of other series. Why is it useful then to model everything together? Although the series do not depend on each other, they may follow the same intrinsic pattern regarding their past and future values. For example, in the same store, the sales of products A and B may not be related, but they follow the same dynamics, that of the store.
In order to predict the next n steps, the same strategy of recursive multi-step forecasting is applied. The only difference is that, the series' name for which to estimate the predictions needs to be indicated.
Multivariate time series
All series are modeled considering that each time series depends not only on its past values but also on the past values of the other series. The forecaster is expected not only to learn the information of each series separately but also to relate them. For example, the measurements made by all the sensors (flow, item_2, pressure...) installed on an industrial machine such as a compressor.
For a more detailed example visit: Multi-series-forecasting
  Note
ForecasterAutoregMultiSeries
class covers the use case of No Multivariate time series.
ForecasterAutoregMultivariate
will be released in a future version of Skforecast - stay tuned!
Libraries¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries
from skforecast.model_selection_multiseries import grid_search_forecaster_multiseries
Data¶
# Data download
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/' +
'data/simulated_items_sales.csv')
data = pd.read_csv(url, sep=',')
# Data preparation (aggregation at daily level)
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('D')
data = data.sort_index()
data.head()
# Split data into train-val-test
# ==============================================================================
end_train = '2014-07-15 23:59:00'
data_train = data.loc[:end_train, :].copy()
data_test = data.loc[end_train:, :].copy()
print(f"Train dates : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Test dates : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
Train dates : 2012-01-01 00:00:00 --- 2014-07-15 00:00:00 (n=927) Test dates : 2014-07-16 00:00:00 --- 2015-01-01 00:00:00 (n=170)
# Plot time series
# ==============================================================================
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(10, 6), sharex=True)
data_train['item_1'].plot(label='train', ax=axes[0])
data_test['item_1'].plot(label='test', ax=axes[0])
axes[0].set_xlabel('')
axes[0].set_ylabel('MW')
axes[0].set_title('Item 1')
axes[0].legend()
data_train['item_2'].plot(label='train', ax=axes[1])
data_test['item_2'].plot(label='test', ax=axes[1])
axes[1].set_xlabel('')
axes[1].set_ylabel('sales')
axes[1].set_title('Item 2')
data_train['item_3'].plot(label='train', ax=axes[2])
data_test['item_3'].plot(label='test', ax=axes[2])
axes[2].set_xlabel('')
axes[2].set_ylabel('sales')
axes[2].set_title('Item 3')
fig.tight_layout()
plt.show();
Train and predict ForecasterAutoregMultiSeries¶
# Create and fit forecaster multi series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
regressor = LinearRegression(),
lags = 24,
transformer_series = None,
transformer_exog = None
)
forecaster.fit(series=data_train)
forecaster
============================ ForecasterAutoregMultiSeries ============================ Regressor: LinearRegression() Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] Transformer for series: {'item_1': None, 'item_2': None, 'item_3': None} Transformer for exog: None Window size: 24 Series levels: ['item_1', 'item_2', 'item_3'] Included exogenous: False Type of exogenous variable: None Exogenous variables names: None Training range: [Timestamp('2012-01-01 00:00:00'), Timestamp('2014-07-15 00:00:00')] Training index type: DatetimeIndex Training index frequency: D Regressor parameters: {'copy_X': True, 'fit_intercept': True, 'n_jobs': None, 'normalize': 'deprecated', 'positive': False} Creation date: 2022-10-13 16:46:31 Last fit date: 2022-10-13 16:46:32 Skforecast version: 0.5.1 Python version: 3.9.13
Two methods can be use to predict the next n steps: predict()
or predict_interval()
. The argument level
is used to indicate for which series estimate predictions.
# Predict and predict_interval
# ==============================================================================
steps = 24
# Predictions for item_1
predictions_item_1 = forecaster.predict(steps=steps, level='item_1')
display(predictions_item_1.head(3))
# Interval predictions for item_1
predictions_temp = forecaster.predict_interval(steps=steps, level='item_1')
display(predictions_temp.head(3))
2014-07-16 25.497655 2014-07-17 24.867333 2014-07-18 24.281557 Freq: D, Name: pred, dtype: float64
pred | lower_bound | upper_bound | |
---|---|---|---|
2014-07-16 | 25.497655 | 23.220017 | 28.226023 |
2014-07-17 | 24.867333 | 22.141024 | 27.389717 |
2014-07-18 | 24.281557 | 21.688227 | 26.981138 |
Backtesting Multi Series¶
As in the predict
method, the level
at which backtesting is performed must be indicated.
# Backtesting Multi Series
# ==============================================================================
metric, predictions_item_1 = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = data,
level = 'item_1',
initial_train_size = len(data_train),
fixed_train_size = True,
steps = 24,
metric = 'mean_absolute_error',
refit = True,
verbose = False
)
print(f"Backtest error item_1: {metric}")
predictions_item_1.head(4)
Backtest error item_1: 1.3607708085816819
pred | |
---|---|
2014-07-16 | 25.497655 |
2014-07-17 | 24.867333 |
2014-07-18 | 24.281557 |
2014-07-19 | 23.515885 |
Hyperparameter tuning and lags selection Multi Series¶
Functions grid_search_forecaster_multiseries
and random_search_forecaster_multiseries
in the module model_selection_multiseries
allow for lag and hyperparameter optimization. The optimization is performed in the same way as in the other forecasters, see the user guide here, except for two arguments:
levels
: level(s) at which the forecaster is optimized, for example:If
levels = ['item_1', 'item_2']
(Same aslevels = None
), the function will search for the lags and hyperparameters that minimize the average error of the predictions of both time series. The resulting metric will be a weighted average of the optimization of both levels.If
levels = 'item_1'
(Same aslevels = ['item_1']
), the function will search for the lags and hyperparameters that minimize the error of theitem_1
predictions.
levels_weights
: weights given to each level during optimization, for example:- If
levels = ['item_1', 'item_2']
andlevels_weights = {'item_1': 0.5, 'item_2': 0.5}
(Same aslevels_weights = None
), both time series will have the same weight in the calculation of the resulting metric.
- If
The following example shows how to use grid_search_forecaster_multiseries
to find the best lags and model hyperparameters for both time series:
# Create and fit forecaster multi series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
regressor = Ridge(random_state=123),
lags = 24,
transformer_series = StandardScaler(),
transformer_exog = None
)
# Grid search Multi Series
# ==============================================================================
lags_grid = [24, 48]
param_grid = {'alpha': [0.01, 0.1, 1]}
levels = ['item_1', 'item_2', 'item_3']
results = grid_search_forecaster_multiseries(
forecaster = forecaster,
series = data,
lags_grid = lags_grid,
param_grid = param_grid,
steps = 24,
metric = 'mean_absolute_error',
initial_train_size = len(data_train),
fixed_train_size = True,
levels = levels,
exog = None,
refit = True,
return_best = False,
verbose = False
)
results
6 models compared for 3 level(s). Number of iterations: 18. Level weights for metric evaluation: {'item_1': 0.3333333333333333, 'item_2': 0.3333333333333333, 'item_3': 0.3333333333333333}
loop lags_grid: 100%|███████████████████████████████████████| 2/2 [00:08<00:00, 4.04s/it]
levels | lags | params | mean_absolute_error | alpha | |
---|---|---|---|---|---|
5 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 1} | 2.207648 | 1.00 |
4 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 0.1} | 2.207700 | 0.10 |
3 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 0.01} | 2.207706 | 0.01 |
2 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 1} | 2.335039 | 1.00 |
1 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 0.1} | 2.335149 | 0.10 |
0 | [item_1, item_2, item_3] | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'alpha': 0.01} | 2.335161 | 0.01 |
  Warning
Use multiple metrics in the hyperparameter tuning andbayesian_search_forecaster_multiseries
will be released in a future version of Skforecast
Stay tuned!
%%html
<style>
.jupyter-wrapper .jp-CodeCell .jp-Cell-inputWrapper .jp-InputPrompt {display: none;}
</style>