Global Forecasting Models: Independent multi-series forecasting

Univariate time series forecasting models a single time series as a linear or nonlinear combination of its lags, using past values of the series to predict its future. Global forecasting, in contrast, builds a single predictive model that considers all time series simultaneously. It attempts to capture the core patterns that govern the series, thereby mitigating the noise that each individual series might introduce. This approach is computationally efficient, easy to maintain, and can generalize more robustly across time series.

In independent multi-series forecasting, a single model is trained for all time series, but each time series remains independent of the others, meaning that past values of one series are not used as predictors of other series. However, modeling them together is useful because the series may follow the same intrinsic pattern regarding their past and future values. For instance, the sales of products A and B in the same store may not be related, but they follow the same dynamics, that of the store.

Internal Forecaster transformation of two time series and an exogenous variable into the matrices needed to train a machine learning model in a multi-series context.
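
The transformation in the diagram can be approximated with a few lines of pandas. This is a minimal sketch, not skforecast's internal code: the helper `build_long_training_matrix`, the toy data and the column name `level` are hypothetical. Each series contributes rows built only from its own lags, and an integer column identifies which series a row belongs to.

```python
import numpy as np
import pandas as pd


def build_long_training_matrix(series: pd.DataFrame, n_lags: int):
    """Stack per-series lag matrices into a single X, y pair (illustrative only)."""
    X_parts, y_parts = [], []
    for level_code, name in enumerate(series.columns):
        values = series[name].to_numpy()
        # Column k holds the value observed k steps before the target.
        lags = np.column_stack(
            [values[n_lags - k: len(values) - k] for k in range(1, n_lags + 1)]
        )
        X = pd.DataFrame(
            lags,
            columns=[f"lag_{k}" for k in range(1, n_lags + 1)],
            index=series.index[n_lags:],
        )
        X["level"] = level_code  # ordinal identifier of the series (hypothetical name)
        X_parts.append(X)
        y_parts.append(pd.Series(values[n_lags:], index=series.index[n_lags:]))
    return pd.concat(X_parts), pd.concat(y_parts)


# Toy data, made up for illustration
toy = pd.DataFrame(
    {
        "series_a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "series_b": [10.0, 20.0, 30.0, 40.0, 50.0],
    },
    index=pd.date_range("2020-01-01", periods=5, freq="D"),
)
X_toy, y_toy = build_long_training_matrix(toy, n_lags=2)
print(X_toy)
```

No row mixes values from different series, which is what makes the series independent even though one regressor is trained on all of them.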

To predict the next n steps, recursive multi-step forecasting is applied. The only difference from the single-series case is that the name of the series to predict must be indicated.

Diagram of recursive forecasting with multiple independent time series.
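
The recursive strategy can be sketched as a loop in which each prediction becomes the most recent lag for the next step. The function `recursive_forecast` and the stand-in `toy_model` below are hypothetical and only illustrate the idea. The series identifier is passed as an extra feature so that the same model can produce different forecasts for different series.

```python
import numpy as np


def recursive_forecast(predict_one_step, last_window, level_code, steps):
    """Forecast `steps` values for one series, feeding predictions back as lags."""
    window = list(last_window)  # oldest value first, most recent value last
    predictions = []
    for _ in range(steps):
        lags = window[::-1]  # lag_1 first, then lag_2, ...
        features = np.array(lags + [level_code], dtype=float)
        y_hat = float(predict_one_step(features))
        predictions.append(y_hat)
        window = window[1:] + [y_hat]  # slide the window one step forward
    return predictions


# Hypothetical stand-in for a fitted regressor:
# the mean of the two lags plus an offset that depends on the series.
def toy_model(features):
    lag_1, lag_2, level = features
    return 0.5 * (lag_1 + lag_2) + level


forecast_a = recursive_forecast(toy_model, last_window=[2.0, 4.0], level_code=0, steps=3)
forecast_b = recursive_forecast(toy_model, last_window=[2.0, 4.0], level_code=1, steps=3)
print(forecast_a)  # [3.0, 3.5, 3.25]
print(forecast_b)  # [4.0, 5.0, 5.5]
```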

Using the ForecasterAutoregMultiSeries and ForecasterAutoregMultiSeriesCustom classes, it is possible to easily build machine learning models for independent multi-series forecasting.

✎ Note

Skforecast offers additional approaches to create Global Forecasting Models:

💡 Tip

To learn more about global forecasting models visit our examples:

Libraries

In [1]:

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

from skforecast.datasets import fetch_dataset
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries
from skforecast.model_selection_multiseries import grid_search_forecaster_multiseries
from skforecast.model_selection_multiseries import bayesian_search_forecaster_multiseries

Data

In [2]:

# Data download
# ==============================================================================
data = fetch_dataset(name="items_sales")
data.head()

items_sales
-----------
Simulated time series for the sales of 3 different items.
Simulated data.
Shape of the dataset: (1097, 3)

Out[2]:

	item_1	item_2	item_3
date
2012-01-01	8.253175	21.047727	19.429739
2012-01-02	22.777826	26.578125	28.009863
2012-01-03	27.549099	31.751042	32.078922
2012-01-04	25.895533	24.567708	27.252276
2012-01-05	21.379238	18.191667	20.357737

In [3]:

# Split data into train-test
# ==============================================================================
end_train = '2014-07-15 23:59:00'
data_train = data.loc[:end_train, :].copy()
data_test  = data.loc[end_train:, :].copy()

print(
    f"Train dates : {data_train.index.min()} --- {data_train.index.max()}   "
    f"(n={len(data_train)})"
)
print(
    f"Test dates  : {data_test.index.min()} --- {data_test.index.max()}   "
    f"(n={len(data_test)})"
)

Train dates : 2012-01-01 00:00:00 --- 2014-07-15 00:00:00   (n=927)
Test dates  : 2014-07-16 00:00:00 --- 2015-01-01 00:00:00   (n=170)
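
The split above uses a cut-off at 23:59 rather than a bare date. Label-based slicing with `.loc` includes both endpoints, so a date-only cut-off would place the same day in both partitions. The following sketch, using a made-up five-day frame, shows the difference.

```python
import pandas as pd

# Toy daily data, made up for illustration
index = pd.date_range("2014-07-13", periods=5, freq="D")
toy = pd.DataFrame({"item_1": range(5)}, index=index)

# Cut-off between two daily timestamps: the partitions do not overlap
end_train = "2014-07-15 23:59:00"
toy_train = toy.loc[:end_train]
toy_test = toy.loc[end_train:]
shared = toy_train.index.intersection(toy_test.index)

# Date-only cut-off: 2014-07-15 appears in both partitions
overlap_train = toy.loc[:"2014-07-15"]
overlap_test = toy.loc["2014-07-15":]
shared_date_only = overlap_train.index.intersection(overlap_test.index)

print(len(toy_train), len(toy_test), len(shared))  # 3 2 0
print(len(shared_date_only))                       # 1
```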

In [4]:

# Plot time series
# ==============================================================================
fig, axes = plt.subplots(nrows=3, ncols=1, figsize=(9, 5), sharex=True)

data_train['item_1'].plot(label='train', ax=axes[0])
data_test['item_1'].plot(label='test', ax=axes[0])
axes[0].set_xlabel('')
axes[0].set_ylabel('sales')
axes[0].set_title('Item 1')
axes[0].legend()

data_train['item_2'].plot(label='train', ax=axes[1])
data_test['item_2'].plot(label='test', ax=axes[1])
axes[1].set_xlabel('')
axes[1].set_ylabel('sales')
axes[1].set_title('Item 2')

data_train['item_3'].plot(label='train', ax=axes[2])
data_test['item_3'].plot(label='test', ax=axes[2])
axes[2].set_xlabel('')
axes[2].set_ylabel('sales')
axes[2].set_title('Item 3')

fig.tight_layout()
plt.show();

Train and predict ForecasterAutoregMultiSeries

In [5]:

# Create and fit a Forecaster Multi-Series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor          = RandomForestRegressor(random_state=123),
                 lags               = 24,
                 encoding           = 'ordinal',
                 transformer_series = StandardScaler(),
                 transformer_exog   = None,
                 weight_func        = None,
                 series_weights     = None,
                 differentiation    = None,
                 dropna_from_series = False,
                 fit_kwargs         = None,
                 forecaster_id      = None
             )

forecaster.fit(series=data_train)
forecaster

Out[5]:

============================ 
ForecasterAutoregMultiSeries 
============================ 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Transformer for series: StandardScaler() 
Transformer for exog: None 
Series encoding: ordinal 
Window size: 24 
Series levels (names): ['item_1', 'item_2', 'item_3'] 
Series weights: None 
Weight function included: False 
Differentiation order: None 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: ["'item_1': ['2012-01-01', '2014-07-15']", "'item_2': ['2012-01-01', '2014-07-15']", "'item_3': ['2012-01-01', '2014-07-15']"] 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: bootstrap: True, ccp_alpha: 0.0, criterion: squared_error, max_depth: None, max_features: 1.0, ... 
fit_kwargs: {} 
Creation date: 2024-05-20 14:52:33 
Last fit date: 2024-05-20 14:52:38 
Skforecast version: 0.12.0 
Python version: 3.11.5 
Forecaster id: None
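
When a single transformer such as StandardScaler() is passed as transformer_series, the intent is that each series is scaled with its own fitted copy, so series with very different magnitudes end up on a comparable scale. The sketch below reproduces that idea with scikit-learn alone. It is an approximation under that assumption, not the forecaster's internal code, and the toy data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.base import clone
from sklearn.preprocessing import StandardScaler

# Two toy series with the same shape but very different magnitudes
toy = pd.DataFrame({"item_a": [1.0, 2.0, 3.0], "item_b": [100.0, 200.0, 300.0]})

template = StandardScaler()
scalers, scaled_columns = {}, {}
for name in toy.columns:
    column = toy[name].to_numpy().reshape(-1, 1)
    scalers[name] = clone(template).fit(column)  # one fitted copy per series
    scaled_columns[name] = scalers[name].transform(column).ravel()
scaled = pd.DataFrame(scaled_columns)

# Predictions made on the scaled values are mapped back with the matching scaler
restored = pd.DataFrame(
    {
        name: scalers[name]
        .inverse_transform(scaled[name].to_numpy().reshape(-1, 1))
        .ravel()
        for name in scaled.columns
    }
)
print(scaled)
```

After scaling, both toy columns contain the same values, which is why lag values in the training matrix can look nothing like the original sales figures.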

Two methods can be used to predict the next n steps: predict() or predict_interval(). The levels argument indicates the series for which predictions are estimated. If None, all series are predicted.

In [6]:

# Predict and predict_interval
# ==============================================================================
steps = 24

# Predictions for item_1
predictions_item_1 = forecaster.predict(steps=steps, levels='item_1')
display(predictions_item_1.head(3))

# Interval predictions for item_1 and item_2
predictions_intervals = forecaster.predict_interval(steps=steps, levels=['item_1', 'item_2'])
display(predictions_intervals.head(3))

	item_1
2014-07-16	25.727855
2014-07-17	25.846049
2014-07-18	25.605574

	item_1	item_1_lower_bound	item_1_upper_bound	item_2	item_2_lower_bound	item_2_upper_bound
2014-07-16	25.727855	24.987747	26.574986	11.236102	9.801042	13.136002
2014-07-17	25.846049	24.870217	26.658201	10.930129	9.388806	13.134655
2014-07-18	25.605574	24.442493	26.235981	11.445854	9.573716	13.466252

Backtesting Multi Series

As in the predict method, the levels at which backtesting is performed must be indicated. The argument can also be set to None to perform backtesting at all levels.

In [7]:

# Backtesting Multi-Series
# ==============================================================================
metrics_levels, backtest_predictions = backtesting_forecaster_multiseries(
    forecaster            = forecaster,
    series                = data,
    exog                  = None,
    levels                = None,
    steps                 = 24,
    metric                = 'mean_absolute_error',
    initial_train_size    = len(data_train),
    fixed_train_size      = True,
    gap                   = 0,
    allow_incomplete_fold = True,
    refit                 = True,
    n_jobs                = 'auto',
    verbose               = False,
    show_progress         = True,
    suppress_warnings     = False
)

print("Backtest metrics")
display(metrics_levels)
print("")
print("Backtest predictions")
backtest_predictions.head(4)

Backtest metrics

	levels	mean_absolute_error
0	item_1	1.244997
1	item_2	2.449936
2	item_3	3.219898

Backtest predictions

Out[7]:

	item_1	item_2	item_3
2014-07-16	25.727855	11.236102	10.598939
2014-07-17	25.846049	10.930129	11.898157
2014-07-18	25.605574	11.445854	11.708411
2014-07-19	24.156949	11.298911	12.194366
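
To see how the backtesting arguments translate into folds, the following sketch enumerates the train and test windows for the sizes used above (1097 observations, 927 in the initial training set, 24 steps per fold). It is a simplified reading of steps, fixed_train_size and allow_incomplete_fold with gap = 0 and refit = True. The helper `backtest_folds` is hypothetical and is not the library's implementation.

```python
def backtest_folds(n_obs, initial_train_size, steps, allow_incomplete_fold=True):
    """Return (train_start, train_end, test_start, test_end) positions per fold.

    End positions are exclusive. The training window keeps a fixed size and
    rolls forward, as with fixed_train_size=True and refit=True.
    """
    folds = []
    test_start = initial_train_size
    while test_start < n_obs:
        test_end = min(test_start + steps, n_obs)
        if test_end - test_start < steps and not allow_incomplete_fold:
            break
        train_start = test_start - initial_train_size
        folds.append((train_start, test_start, test_start, test_end))
        test_start = test_end
    return folds


folds = backtest_folds(n_obs=1097, initial_train_size=927, steps=24)
print(len(folds))  # 8
print(folds[0])    # (0, 927, 927, 951)
print(folds[-1])   # (168, 1095, 1095, 1097)
```

The 170 test observations do not divide evenly by 24, so the last fold contains only 2 steps. With allow_incomplete_fold = False, this sketch would drop it and return 7 folds.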

Hyperparameter tuning and lags selection Multi Series

The grid_search_forecaster_multiseries, random_search_forecaster_multiseries and bayesian_search_forecaster_multiseries functions in the model_selection_multiseries module allow for lags and hyperparameter optimization. The search uses backtesting for validation, as in other Forecasters (see the corresponding user guide). The only difference is the levels argument:

levels: level(s) at which the forecaster is optimized, for example:
- If levels = ['item_1', 'item_2', 'item_3'] (Same as levels = None), the function will search for the lags and hyperparameters that minimize the average error of the predictions of all the time series. The resulting metric will be the average of all levels.
- If levels = 'item_1' (Same as levels = ['item_1']), the function will search for the lags and hyperparameters that minimize the error of the item_1 predictions. The resulting metric will be the one calculated for item_1.
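
The two cases above can be summarized in a short sketch. The helpers `normalize_levels` and `aggregate_metric` and the per-series errors are hypothetical. They only illustrate how the choice of levels changes the value that the search minimizes.

```python
def normalize_levels(levels, all_levels):
    """Map None, a single name, or a list of names to a list of series names."""
    if levels is None:
        return list(all_levels)
    if isinstance(levels, str):
        return [levels]
    return list(levels)


def aggregate_metric(metric_per_level, levels):
    """Average the per-series error over the selected levels."""
    selected = [metric_per_level[name] for name in levels]
    return sum(selected) / len(selected)


all_levels = ["item_1", "item_2", "item_3"]
# Made-up per-series errors for one candidate configuration
errors = {"item_1": 1.0, "item_2": 2.0, "item_3": 3.0}

score_all = aggregate_metric(errors, normalize_levels(None, all_levels))
score_item_1 = aggregate_metric(errors, normalize_levels("item_1", all_levels))
print(score_all)     # 2.0
print(score_item_1)  # 1.0
```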

The following example shows how to use grid_search_forecaster_multiseries to find the best lags and model hyperparameters for all time series (all levels).

In [8]:

# Create Forecaster Multi-Series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor          = RandomForestRegressor(random_state=123),
                 lags               = 24,
                 encoding           = 'ordinal',
                 transformer_series = StandardScaler()
             )

In [9]:

# Grid search Multi-Series
# ==============================================================================
lags_grid = [24, 48]
param_grid = {
    'n_estimators': [10, 20],
    'max_depth': [3, 7]
}

levels = ['item_1', 'item_2', 'item_3']

results = grid_search_forecaster_multiseries(
              forecaster         = forecaster,
              series             = data,
              exog               = None,
              levels             = levels, # Same as levels=None
              lags_grid          = lags_grid,
              param_grid         = param_grid,
              steps              = 24,
              metric             = 'mean_absolute_error',
              initial_train_size = len(data_train),
              refit              = False,
              fixed_train_size   = False,
              return_best        = False,
              n_jobs             = 'auto',
              verbose            = False,
              show_progress      = True
          )

results

8 models compared for 3 level(s). Number of iterations: 8.

Out[9]:

	levels	lags	lags_label	params	mean_absolute_error	max_depth	n_estimators
7	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 20}	2.339379	7	20
3	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 20}	2.358138	7	20
6	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 10}	2.392723	7	10
2	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 10}	2.402785	7	10
5	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 20}	2.464477	3	20
1	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 20}	2.532149	3	20
4	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 10}	2.588693	3	10
0	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 10}	2.706529	3	10
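
The eight rows in the table correspond to every combination of the two lag configurations and the four hyperparameter settings. A minimal sketch of that enumeration, using only the standard library, could look as follows. The order in which the search function evaluates candidates is not assumed here.

```python
from itertools import product

lags_grid = [24, 48]
param_grid = {"n_estimators": [10, 20], "max_depth": [3, 7]}

# Expand the hyperparameter grid into a list of dictionaries
param_names = list(param_grid)
param_combinations = [
    dict(zip(param_names, values)) for values in product(*param_grid.values())
]

# Pair every lag configuration with every hyperparameter combination
candidates = [(lags, params) for lags in lags_grid for params in param_combinations]

print(len(param_combinations))  # 4
print(len(candidates))          # 8
```

The grid grows multiplicatively with every added value, which is one reason to consider random or Bayesian search for larger search spaces.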

It is also possible to perform Bayesian optimization with Optuna using the bayesian_search_forecaster_multiseries function. For more information about this type of optimization, see the corresponding user guide.

In [10]:

# Bayesian search hyperparameters and lags with Optuna
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 24,
                 encoding  = 'ordinal'
             )

levels = ['item_1', 'item_2', 'item_3']

# Search space
def search_space(trial):
    search_space  = {
        'lags'             : trial.suggest_categorical('lags', [24, 48]),
        'n_estimators'     : trial.suggest_int('n_estimators', 10, 20),
        'min_samples_leaf' : trial.suggest_int('min_samples_leaf', 1, 10),
        'ccp_alpha'        : trial.suggest_float('ccp_alpha', 0., 1.),
        'max_features'     : trial.suggest_categorical('max_features', ['log2', 'sqrt'])
    }

    return search_space

results, best_trial = bayesian_search_forecaster_multiseries(
    forecaster            = forecaster,
    series                = data,
    exog                  = None,
    levels                = levels, # Same as levels=None
    search_space          = search_space,
    steps                 = 24,
    metric                = 'mean_absolute_error',
    refit                 = False,
    initial_train_size    = len(data_train),
    fixed_train_size      = False,
    n_trials              = 5,
    random_state          = 123,
    return_best           = False,
    n_jobs                = 'auto',
    verbose               = False,
    show_progress         = True,
    suppress_warnings     = False,
    engine                = 'optuna',
    kwargs_create_study   = {},
    kwargs_study_optimize = {}
)

results.head(4)

Out[10]:

	levels	lags	params	mean_absolute_error	n_estimators	min_samples_leaf	ccp_alpha	max_features
3	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 16, 'min_samples_leaf': 8, 'c...	3.007254	16	8	0.322959	log2
4	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 11, 'min_samples_leaf': 5, 'c...	3.070554	11	5	0.430863	log2
2	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 12, 'min_samples_leaf': 2, 'c...	3.070939	12	2	0.531551	sqrt
1	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 14, 'min_samples_leaf': 4, 'c...	3.088771	14	4	0.729050	log2

best_trial contains the details of the trial that achieved the best results. See the Optuna Study class documentation for more information.

In [11]:

# Optuna best trial in the study
# ==============================================================================
best_trial

Out[11]:

FrozenTrial(number=3, state=1, values=[3.007253701973608], datetime_start=datetime.datetime(2024, 5, 20, 14, 54, 4, 206400), datetime_complete=datetime.datetime(2024, 5, 20, 14, 54, 4, 441404), params={'lags': 24, 'n_estimators': 16, 'min_samples_leaf': 8, 'ccp_alpha': 0.3229589138531782, 'max_features': 'log2'}, user_attrs={}, system_attrs={}, intermediate_values={}, distributions={'lags': CategoricalDistribution(choices=(24, 48)), 'n_estimators': IntDistribution(high=20, log=False, low=10, step=1), 'min_samples_leaf': IntDistribution(high=10, log=False, low=1, step=1), 'ccp_alpha': FloatDistribution(high=1.0, log=False, low=0.0, step=None), 'max_features': CategoricalDistribution(choices=('log2', 'sqrt'))}, trial_id=3, value=None)
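
The params attribute of the trial mixes the number of lags with the regressor hyperparameters, because both were declared in the same search space. One possible way to reuse them by hand is sketched below, with the values copied from the output above. The variable names are illustrative, and setting return_best = True is the usual alternative to doing this manually.

```python
from sklearn.ensemble import RandomForestRegressor

# Values copied from best_trial.params in the output above
best_params = {
    "lags": 24,
    "n_estimators": 16,
    "min_samples_leaf": 8,
    "ccp_alpha": 0.3229589138531782,
    "max_features": "log2",
}

# 'lags' configures the forecaster; the remaining keys configure the regressor
best_lags = best_params["lags"]
regressor_params = {key: value for key, value in best_params.items() if key != "lags"}

regressor = RandomForestRegressor(random_state=123, **regressor_params)
print(best_lags)
print(regressor_params)
```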

Exogenous variables in multi-series

Exogenous variables are predictors that are independent of the model being used for forecasting, and their future values must be known in order to include them in the prediction process.

✎ Note

Starting from version 0.12.0, the ForecasterAutoregMultiSeries allows the use of different exogenous variables for each series. See Global Forecasting Models: Time series with different lengths and different exogenous variables for more information.

💡 Tip

To learn more about exogenous variables in skforecast visit the exogenous variables user guide.

In [12]:

# Generate exogenous variable month
# ==============================================================================
data_exog = data.copy()
data_exog['month'] = data_exog.index.month

# Split data into train-test
# ==============================================================================
end_train = '2014-07-15 23:59:00'
data_exog_train = data_exog.loc[:end_train, :].copy()
data_exog_test  = data_exog.loc[end_train:, :].copy()

data_exog_train.head(3)

Out[12]:

	item_1	item_2	item_3	month
date
2012-01-01	8.253175	21.047727	19.429739	1
2012-01-02	22.777826	26.578125	28.009863	1
2012-01-03	27.549099	31.751042	32.078922	1

In [13]:

# Create and fit forecaster Multi-Series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 24,
                 encoding  = 'ordinal'
             )

forecaster.fit(
    series = data_exog_train[['item_1', 'item_2', 'item_3']], 
    exog   = data_exog_train[['month']]
)
forecaster

Out[13]:

============================ 
ForecasterAutoregMultiSeries 
============================ 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Transformer for series: StandardScaler() 
Transformer for exog: None 
Series encoding: ordinal 
Window size: 24 
Series levels (names): ['item_1', 'item_2', 'item_3'] 
Series weights: None 
Weight function included: False 
Differentiation order: None 
Exogenous included: True 
Type of exogenous variable: <class 'pandas.core.frame.DataFrame'> 
Exogenous variables names: ['month'] 
Training range: ["'item_1': ['2012-01-01', '2014-07-15']", "'item_2': ['2012-01-01', '2014-07-15']", "'item_3': ['2012-01-01', '2014-07-15']"] 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: bootstrap: True, ccp_alpha: 0.0, criterion: squared_error, max_depth: None, max_features: 1.0, ... 
fit_kwargs: {} 
Creation date: 2024-05-20 14:54:04 
Last fit date: 2024-05-20 14:54:09 
Skforecast version: 0.12.0 
Python version: 3.11.5 
Forecaster id: None

If the Forecaster has been trained using exogenous variables, their values for the forecast horizon must be provided during the prediction phase.

In [14]:

# Predict with exogenous variables
# ==============================================================================
predictions = forecaster.predict(steps=24, exog=data_exog_test[['month']])
predictions.head(3)

Out[14]:

	item_1	item_2	item_3
2014-07-16	25.793280	11.110627	10.682699
2014-07-17	25.846751	11.049392	12.089319
2014-07-18	25.552653	11.316862	12.093196
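
Calendar features such as month are convenient exogenous variables because their future values are always known. When no test set is available, they can be generated for the forecast horizon directly from the dates. A small sketch, assuming the training data ends on 2014-07-15 and 24 daily steps are predicted (the variable names are illustrative):

```python
import pandas as pd

last_train_date = pd.Timestamp("2014-07-15")
steps = 24

# The exogenous index must start right after the last training observation
future_index = pd.date_range(
    start=last_train_date + pd.Timedelta(days=1), periods=steps, freq="D"
)
exog_future = pd.DataFrame({"month": future_index.month.to_numpy()}, index=future_index)

print(exog_future.head(3))
```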

As mentioned earlier, the month exogenous variable is replicated for each of the series. This can be easily demonstrated using the create_train_X_y method, which returns the matrices (X_train and y_train) used in the fit method.

In [15]:

# X_train matrix
# ==============================================================================
X_train = forecaster.create_train_X_y(
    series = data_exog_train[['item_1', 'item_2', 'item_3']], 
    exog   = data_exog_train[['month']]
)[0]

In [16]:

# X_train slice for item_1
# ==============================================================================
X_train.loc[X_train['_level_skforecast'] == 0].head(3)

Out[16]:

	lag_1	lag_2	lag_3	lag_4	lag_5	lag_6	lag_7	lag_8	lag_9	lag_10	...	lag_17	lag_18	lag_19	lag_20	lag_21	lag_22	lag_23	lag_24	_level_skforecast	month
date
2012-01-25	2.163111	0.587328	-0.656056	0.010719	0.602052	0.896105	2.641973	1.623623	-0.877145	-1.365855	...	-0.939251	-0.757959	-0.534430	-0.428047	1.334476	1.979794	0.117764	-5.550607	0	1
2012-01-26	2.447474	2.163111	0.587328	-0.656056	0.010719	0.602052	0.896105	2.641973	1.623623	-0.877145	...	-0.963902	-0.939251	-0.757959	-0.534430	-0.428047	1.334476	1.979794	0.117764	0	1
2012-01-27	0.558968	2.447474	2.163111	0.587328	-0.656056	0.010719	0.602052	0.896105	2.641973	1.623623	...	-0.334016	-0.963902	-0.939251	-0.757959	-0.534430	-0.428047	1.334476	1.979794	0	1

3 rows × 26 columns

In [17]:

# X_train slice for item_2
# ==============================================================================
X_train.loc[X_train['_level_skforecast'] == 1].head(3)

Out[17]:

	lag_1	lag_2	lag_3	lag_4	lag_5	lag_6	lag_7	lag_8	lag_9	lag_10	...	lag_17	lag_18	lag_19	lag_20	lag_21	lag_22	lag_23	lag_24	_level_skforecast	month
date
2012-01-25	2.050924	1.782460	1.054339	0.654039	0.551985	0.569480	2.471635	2.327927	0.567397	0.135856	...	1.535865	0.618424	0.278939	0.354751	1.629588	3.065837	2.031555	0.925797	1	1
2012-01-26	2.038219	2.050924	1.782460	1.054339	0.654039	0.551985	0.569480	2.471635	2.327927	0.567397	...	0.761091	1.535865	0.618424	0.278939	0.354751	1.629588	3.065837	2.031555	1	1
2012-01-27	0.668201	2.038219	2.050924	1.782460	1.054339	0.654039	0.551985	0.569480	2.471635	2.327927	...	0.548653	0.761091	1.535865	0.618424	0.278939	0.354751	1.629588	3.065837	1	1

3 rows × 26 columns
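
Both slices share the same dates and the same month values, while the lags and the series identifier differ. The replication can be mimicked with a join on the date index, as in the sketch below. The lag values are made up and only one lag is used to keep the example short.

```python
import pandas as pd

dates = pd.date_range("2012-01-30", periods=3, freq="D")
exog = pd.DataFrame({"month": dates.month.to_numpy()}, index=dates)

# Made-up lag blocks, one per series, keyed by the ordinal series code
lag_blocks = {
    0: pd.DataFrame({"lag_1": [0.1, 0.2, 0.3]}, index=dates),
    1: pd.DataFrame({"lag_1": [1.1, 1.2, 1.3]}, index=dates),
}

blocks = []
for level_code, lags in lag_blocks.items():
    block = lags.copy()
    block["_level_skforecast"] = level_code
    # The same exogenous rows are attached to every series by date
    blocks.append(block.join(exog))

X_toy = pd.concat(blocks)
print(X_toy)
```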

To use exogenous variables in backtesting or hyperparameter tuning, they must be specified with the exog argument.

In [18]:

# Backtesting Multi-Series with exog
# ==============================================================================
metrics_levels, backtest_predictions = backtesting_forecaster_multiseries(
    forecaster            = forecaster,
    series                = data_exog[['item_1', 'item_2', 'item_3']],
    exog                  = data_exog[['month']],
    levels                = None,
    steps                 = 24,
    metric                = 'mean_absolute_error',
    initial_train_size    = len(data_exog_train),
    fixed_train_size      = True,
    gap                   = 0,
    allow_incomplete_fold = True,
    refit                 = True,
    n_jobs                = 'auto',
    verbose               = False,
    show_progress         = True,
    suppress_warnings     = False
)

print("Backtest metrics")
display(metrics_levels)
print("")
print("Backtest predictions with exogenous variables")
backtest_predictions.head(4)

Backtest metrics

	levels	mean_absolute_error
0	item_1	1.256685
1	item_2	2.408477
2	item_3	3.224960

Backtest predictions with exogenous variables

Out[18]:

	item_1	item_2	item_3
2014-07-16	25.793280	11.110627	10.682699
2014-07-17	25.846751	11.049392	12.089319
2014-07-18	25.552653	11.316862	12.093196
2014-07-19	24.144075	11.357579	12.357503
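
The backtesting call above combines `steps`, `initial_train_size`, `fixed_train_size` and `allow_incomplete_fold`. As a rough illustration of how these arguments interact, the sketch below enumerates train/test index ranges for a refit-every-fold scheme with a gap of 0. The function `backtest_folds` and the sizes used are hypothetical; this is a simplified model of the idea, not skforecast's internal fold logic.

```python
def backtest_folds(n_obs, initial_train_size, steps,
                   fixed_train_size=True, allow_incomplete_fold=True):
    """
    Return a list of ((train_start, train_end), (test_start, test_end))
    half-open index ranges. Simplified: refit in every fold, gap of 0.
    """
    folds = []
    train_end = initial_train_size
    while train_end < n_obs:
        test_end = min(train_end + steps, n_obs)
        if test_end - train_end < steps and not allow_incomplete_fold:
            break  # drop a last fold shorter than `steps`
        # Fixed size: the training window rolls forward; otherwise it expands
        train_start = train_end - initial_train_size if fixed_train_size else 0
        folds.append(((train_start, train_end), (train_end, test_end)))
        train_end += steps
    return folds


# Hypothetical sizes, not the ones of the dataset used in this guide
for train, test in backtest_folds(n_obs=100, initial_train_size=70, steps=12):
    print(f"train={train}  test={test}")
```

With `fixed_train_size=False` the training window would start at 0 in every fold, and with `allow_incomplete_fold=False` the last, shorter fold would be skipped.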

Scikit-learn transformers in multi-series¶

By default, the ForecasterAutoregMultiSeries class uses the scikit-learn StandardScaler transformer to scale the data. This transformer is applied to all series. However, it is possible to use different transformers for each series or not to apply any transformation at all:

If transformer_series is a transformer, the same transformation is applied to all series.
If transformer_series is a dict, a different transformation can be set for each series. Series not present in the dict are not transformed (a warning is raised, see the output below).

Learn more about using scikit-learn transformers with skforecast.

In [19]:

# Transformers in Multi-Series
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor          = RandomForestRegressor(random_state=123),
                 lags               = 24,
                 encoding           = 'ordinal',
                 transformer_series = {'item_1': StandardScaler(), 'item_2': StandardScaler()},
                 transformer_exog   = None
             )

forecaster.fit(series=data_train)
forecaster

c:\Users\jaesc2\Miniconda3\envs\skforecast_py11\Lib\site-packages\skforecast\utils\utils.py:233: IgnoredArgumentWarning: {'item_3'} not present in `transformer_series`. No transformation is applied to these series. 
 You can suppress this warning using: warnings.simplefilter('ignore', category=IgnoredArgumentWarning)
  warnings.warn(

Out[19]:

============================ 
ForecasterAutoregMultiSeries 
============================ 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Transformer for series: {'item_1': StandardScaler(), 'item_2': StandardScaler()} 
Transformer for exog: None 
Series encoding: ordinal 
Window size: 24 
Series levels (names): ['item_1', 'item_2', 'item_3'] 
Series weights: None 
Weight function included: False 
Differentiation order: None 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: ["'item_1': ['2012-01-01', '2014-07-15']", "'item_2': ['2012-01-01', '2014-07-15']", "'item_3': ['2012-01-01', '2014-07-15']"] 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: bootstrap: True, ccp_alpha: 0.0, criterion: squared_error, max_depth: None, max_features: 1.0, ... 
fit_kwargs: {} 
Creation date: 2024-05-20 14:54:20 
Last fit date: 2024-05-20 14:54:24 
Skforecast version: 0.12.0 
Python version: 3.11.5 
Forecaster id: None
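
To make the effect of a `transformer_series` dict more concrete, the following sketch mimics it outside skforecast: each series listed in the dict is scaled with its own fitted transformer, series missing from the dict are left untouched, and values can be mapped back with the matching inverse. The toy data and the `Standardizer` class (a plain NumPy stand-in for `StandardScaler`) are assumptions made for illustration, not skforecast internals.

```python
import numpy as np
import pandas as pd


class Standardizer:
    """Hypothetical stand-in for StandardScaler (zero mean, unit variance)."""

    def fit(self, values):
        self.mean_ = float(np.mean(values))
        self.scale_ = float(np.std(values))  # population standard deviation
        return self

    def transform(self, values):
        return (values - self.mean_) / self.scale_

    def inverse_transform(self, values):
        return values * self.scale_ + self.mean_


# Toy data: three series on very different scales
series = pd.DataFrame({
    'item_1': [20.0, 22.0, 24.0, 26.0],
    'item_2': [8.0, 10.0, 12.0, 14.0],
    'item_3': [100.0, 300.0, 200.0, 400.0],
})

# One transformer per series; item_3 is deliberately left out
transformer_series = {'item_1': Standardizer(), 'item_2': Standardizer()}

scaled = series.copy()
for name in series.columns:
    transformer = transformer_series.get(name)
    if transformer is not None:
        values = series[name].to_numpy()
        scaled[name] = transformer.fit(values).transform(values)

# The same fitted transformer maps values back to the original scale
restored = transformer_series['item_1'].inverse_transform(scaled['item_1'].to_numpy())
```

Keeping one fitted transformer per series is what makes it possible to return each series' predictions to its own original scale.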

Series with different lengths and different exogenous variables¶

Starting from version 0.12.0, the classes ForecasterAutoregMultiSeries and ForecasterAutoregMultiSeriesCustom can model time series of different lengths, each with its own exogenous variables, in a single forecaster. Several scenarios are possible:

If series is a pandas DataFrame and exog is a pandas Series or DataFrame, each exog is duplicated for each series. exog must have the same index as series (type, length and frequency).
If series is a pandas DataFrame and exog is a dict of pandas Series or DataFrames, each key in exog must be a column in series and the values are the exog for each series. exog must have the same index as series (type, length and frequency).
If series is a dict of pandas Series, exog must be a dict of pandas Series or DataFrames. The keys in series and exog must be the same. All series and exog must have a pandas DatetimeIndex with the same frequency.

Series type	Exog type	Requirements
`DataFrame`	`Series` or `DataFrame`	Same index (type, length and frequency)
`DataFrame`	`dict`	Same index (type, length and frequency)
`dict`	`dict`	Both `pandas DatetimeIndex` (same frequency)

In [20]:

# Series and exog as DataFrames 
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 4,
                 encoding  = 'ordinal'
             )

X, y = forecaster.create_train_X_y(
    series = data_exog_train[['item_1', 'item_2', 'item_3']], 
    exog   = data_exog_train[['month']]
)
X.head(3)

Out[20]:

	lag_1	lag_2	lag_3	lag_4	_level_skforecast	month
date
2012-01-05	1.334476	1.979794	0.117764	-5.550607	0	1
2012-01-06	-0.428047	1.334476	1.979794	0.117764	0	1
2012-01-07	-0.534430	-0.428047	1.334476	1.979794	0	1

When exog is a dictionary of pandas Series or DataFrames, different exogenous variables can be used for each series or the same exogenous variable can have different values for each series.

In [21]:

# Illustrative example of different values for the same exogenous variable
# ==============================================================================
exog_1_item_1_train = pd.Series([1]*len(data_exog_train), name='exog_1', index=data_exog_train.index)
exog_1_item_2_train = pd.Series([10]*len(data_exog_train), name='exog_1', index=data_exog_train.index)
exog_1_item_3_train = pd.Series([100]*len(data_exog_train), name='exog_1', index=data_exog_train.index)

exog_1_item_1_test = pd.Series([1]*len(data_exog_test), name='exog_1', index=data_exog_test.index)
exog_1_item_2_test = pd.Series([10]*len(data_exog_test), name='exog_1', index=data_exog_test.index)
exog_1_item_3_test = pd.Series([100]*len(data_exog_test), name='exog_1', index=data_exog_test.index)

In [22]:

# Series as DataFrame and exog as dict
# ==============================================================================
exog_train_as_dict = {
    'item_1': exog_1_item_1_train,
    'item_2': exog_1_item_2_train,
    'item_3': exog_1_item_3_train
}

forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 4,
                 encoding  = 'ordinal'
             )

X, y = forecaster.create_train_X_y(
    series = data_exog_train[['item_1', 'item_2', 'item_3']], 
    exog   = exog_train_as_dict
)

display(X.head(3))
print("")
print("Column `exog_1` has different values for each item (_level_skforecast id)")
X['exog_1'].value_counts()

	lag_1	lag_2	lag_3	lag_4	_level_skforecast	exog_1
date
2012-01-05	1.334476	1.979794	0.117764	-5.550607	0	1
2012-01-06	-0.428047	1.334476	1.979794	0.117764	0	1
2012-01-07	-0.534430	-0.428047	1.334476	1.979794	0	1

Column `exog_1` has different values for each item (_level_skforecast id)

Out[22]:

exog_1
1      923
10     923
100    923
Name: count, dtype: int64

In [23]:

# Predict with series as DataFrame and exog as dict
# ==============================================================================
forecaster.fit(
    series = data_exog_train[['item_1', 'item_2', 'item_3']], 
    exog   = exog_train_as_dict
)

exog_pred_as_dict = {
    'item_1': exog_1_item_1_test,
    'item_2': exog_1_item_2_test,
    'item_3': exog_1_item_3_test
}

predictions = forecaster.predict(steps=24, exog=exog_pred_as_dict)
predictions.head(3)

Out[23]:

	item_1	item_2	item_3
2014-07-16	25.697571	11.326645	13.139564
2014-07-17	25.071002	10.679352	11.981520
2014-07-18	24.614740	11.483302	13.358778

💡 Tip

When using series with different lengths and different exogenous variables, it is recommended to use series and exog as dictionaries. This way, it is easier to manage the data and avoid errors.

Visit Global Forecasting Models: Time series with different lengths and different exogenous variables for more information.
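
As a minimal sketch of the dictionary layout recommended in the tip, the snippet below assembles a `series` dict and an `exog` dict for two series of different lengths and checks the requirements stated earlier in this section for dict inputs (matching keys, pandas DatetimeIndex, same frequency). All names and values (`series_dict`, `exog_dict`, `promo`, the dates) are hypothetical, and the checks are an illustration rather than the validation skforecast performs.

```python
import pandas as pd

# Hypothetical series of different lengths, all with daily frequency
series_dict = {
    'item_1': pd.Series(
        range(10), index=pd.date_range('2024-01-01', periods=10, freq='D'),
        name='item_1', dtype=float
    ),
    'item_2': pd.Series(
        range(6), index=pd.date_range('2024-01-05', periods=6, freq='D'),
        name='item_2', dtype=float
    ),
}

# Each series gets its own exogenous variables, aligned to its own index
idx_1 = series_dict['item_1'].index
idx_2 = series_dict['item_2'].index
exog_dict = {
    'item_1': pd.DataFrame({'month': idx_1.month.to_numpy()}, index=idx_1),
    'item_2': pd.DataFrame(
        {'month': idx_2.month.to_numpy(), 'promo': 1}, index=idx_2
    ),
}

# Sanity checks mirroring the requirements for dict inputs
same_keys = set(series_dict) == set(exog_dict)
all_datetime = all(
    isinstance(s.index, pd.DatetimeIndex) for s in series_dict.values()
)
frequencies = {s.index.freqstr for s in series_dict.values()}
frequencies |= {e.index.freqstr for e in exog_dict.values()}

print(same_keys, all_datetime, frequencies)
```

Because every entry carries its own index, series that start or end on different dates do not need to be padded to a common length.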

Series Encoding in multi-series¶

When creating the training matrices, the ForecasterAutoregMultiSeries class encodes the series names to identify to which series the observations belong. Different encoding methods can be used:

'ordinal_category' (default): a single column (_level_skforecast) is created with integer values from 0 to n_series - 1. The column is then converted to the pandas category dtype so that it can be used as a categorical variable.
'ordinal': a single column (_level_skforecast) is created with integer values from 0 to n_series - 1.
'onehot': a binary column is created for each series.

In [24]:

# Ordinal_category encoding
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 3,
                 encoding  = 'ordinal_category'
             )

X, y = forecaster.create_train_X_y(series=data_train)

display(X.head(3))
print("")
print(X.dtypes)
print("")
print(X['_level_skforecast'].value_counts())

	lag_1	lag_2	lag_3	_level_skforecast
date
2012-01-04	1.979794	0.117764	-5.550607	0
2012-01-05	1.334476	1.979794	0.117764	0
2012-01-06	-0.428047	1.334476	1.979794	0

lag_1                 float64
lag_2                 float64
lag_3                 float64
_level_skforecast    category
dtype: object

_level_skforecast
0    924
1    924
2    924
Name: count, dtype: int64

In [25]:

# Ordinal encoding
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 3,
                 encoding  = 'ordinal'
             )

X, y = forecaster.create_train_X_y(series=data_train)

display(X.head(3))
print("")
print(X.dtypes)
print("")
print(X['_level_skforecast'].value_counts())

	lag_1	lag_2	lag_3	_level_skforecast
date
2012-01-04	1.979794	0.117764	-5.550607	0
2012-01-05	1.334476	1.979794	0.117764	0
2012-01-06	-0.428047	1.334476	1.979794	0

lag_1                float64
lag_2                float64
lag_3                float64
_level_skforecast      int32
dtype: object

_level_skforecast
0    924
1    924
2    924
Name: count, dtype: int64

In [26]:

# Onehot encoding (one column per series)
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 3,
                 encoding  = 'onehot'
             )

X, y = forecaster.create_train_X_y(series=data_train)

display(X.head(3))
print("")
print(X.dtypes)
print("")
print(X['item_1'].value_counts())

	lag_1	lag_2	lag_3	item_1	item_2	item_3
date
2012-01-04	1.979794	0.117764	-5.550607	1	0	0
2012-01-05	1.334476	1.979794	0.117764	1	0	0
2012-01-06	-0.428047	1.334476	1.979794	1	0	0

lag_1     float64
lag_2     float64
lag_3     float64
item_1      int32
item_2      int32
item_3      int32
dtype: object

item_1
0    1848
1     924
Name: count, dtype: int64

Weights in multi-series¶

Weights control the influence that each observation has on the training of the model. ForecasterAutoregMultiSeries accepts two types of weights:

series_weights controls the relative importance of each series. If a series has twice as much weight as the others, the observations of that series influence the training twice as much. The higher the weight of a series relative to the others, the more the model will focus on trying to learn that series.
weight_func controls the relative importance of each observation according to its index value. For example, a function that assigns a lower weight to certain dates.

If the two types of weights are indicated, they are multiplied to create the final weights. The resulting sample_weight cannot have negative values.

Weights in multi-series.

series_weights is a dict of the form {'series_column_name': float}. If a series is used during fit and is not present in series_weights, it will have a weight of 1.
weight_func is a function that defines the individual weights of each sample based on the index.
- If it is a callable, the same function will apply to all series.
- If it is a dict of the form {'series_column_name': callable}, a different function can be used for each series. A weight of 1 is given to all series not present in weight_func.
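
The way the two kinds of weights combine can be sketched with NumPy: the per-series weight is multiplied by the per-observation weight returned by the weight function. The function name `january_2013_zero`, the short index and the weight values are hypothetical; the snippet only illustrates the multiplication described above, not skforecast's own code.

```python
import numpy as np
import pandas as pd


def january_2013_zero(index):
    """Weight 0 for dates in January 2013, 1 otherwise."""
    return np.where((index >= '2013-01-01') & (index <= '2013-01-31'), 0, 1)


# Short hypothetical index spanning the start of the zero-weight period
index = pd.date_range('2012-12-30', periods=5, freq='D')
series_weights = {'item_1': 1.0, 'item_2': 2.0}

# Final weight of each observation = series weight * weight_func(index)
sample_weight = {
    name: weight * january_2013_zero(index)
    for name, weight in series_weights.items()
}

for name, weights in sample_weight.items():
    print(name, weights)
```

Observations with a final weight of 0 do not contribute to the fit, while those of `item_2` outside January 2013 count twice as much as those of `item_1`.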

In [27]:

# Weights in Multi-Series
# ==============================================================================
def custom_weights(index):
    """
    Return 0 if index is between '2013-01-01' and '2013-01-31', 1 otherwise.
    """
    weights = np.where(
                  (index >= '2013-01-01') & (index <= '2013-01-31'),
                   0,
                   1
              )
    
    return weights

forecaster = ForecasterAutoregMultiSeries(
                 regressor          = RandomForestRegressor(random_state=123),
                 lags               = 24,
                 encoding           = 'ordinal',
                 transformer_series = StandardScaler(),
                 transformer_exog   = None,
                 weight_func        = custom_weights,
                 series_weights     = {'item_1': 1., 'item_2': 2., 'item_3': 1.} # Same as {'item_2': 2.}
             )

forecaster.fit(series=data_train)
forecaster.predict(steps=24).head(3)

Out[27]:

	item_1	item_2	item_3
2014-07-16	25.944148	11.454737	11.069946
2014-07-17	25.790280	11.181609	11.958527
2014-07-18	25.531977	11.185339	12.064547

⚠ Warning

The weight_func and series_weights arguments will be ignored if the regressor does not accept sample_weight in its fit method.

The source code of the weight_func added to the forecaster is stored in the attribute source_code_weight_func. If weight_func is a dict, it will be a dict of the form {'series_column_name': source_code_weight_func}.

In [28]:

# Source code weight function
# ==============================================================================
print(forecaster.source_code_weight_func)

def custom_weights(index):
    """
    Return 0 if index is between '2013-01-01' and '2013-01-31', 1 otherwise.
    """
    weights = np.where(
                  (index >= '2013-01-01') & (index <= '2013-01-31'),
                   0,
                   1
              )
    
    return weights

Differentiation¶

Time series differentiation involves computing the differences between consecutive observations in the time series. When it comes to training forecasting models, differentiation offers the advantage of focusing on relative rates of change rather than directly attempting to model the absolute values. Once the predictions have been estimated, this transformation can be easily reversed to restore the values to their original scale.
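
A small numeric sketch of first-order differentiation and its reversal, using NumPy only. The values of `y` and `pred_diff` are made up, and `pred_diff` stands in for whatever a model trained on the differenced series would predict.

```python
import numpy as np

# Hypothetical series in its original scale
y = np.array([10.0, 12.0, 15.0, 14.0, 18.0])

# First-order differences: y[t] - y[t-1] (one observation is lost)
y_diff = np.diff(y, n=1)

# The original values are recovered from the first value plus the cumulative sum
y_recovered = y[0] + np.cumsum(y_diff)

# Made-up predictions in the differenced scale for the next three steps
pred_diff = np.array([1.0, -2.0, 0.5])

# Reverse the transformation: last observed value plus cumulative predicted changes
pred = y[-1] + np.cumsum(pred_diff)

print(y_diff)
print(pred)
```

The reversal needs the last observed value of the original series as an anchor, since the differenced series no longer contains the level.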

💡 Tip

To learn more about modeling time series differentiation, visit our example: Modelling time series trend with tree based models.

In [29]:

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor       = RandomForestRegressor(random_state=123),
                 lags            = 24,
                 differentiation = 1
             )

forecaster.fit(series=data_train)
forecaster

Out[29]:

============================ 
ForecasterAutoregMultiSeries 
============================ 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Transformer for series: StandardScaler() 
Transformer for exog: None 
Series encoding: ordinal_category 
Window size: 24 
Series levels (names): ['item_1', 'item_2', 'item_3'] 
Series weights: None 
Weight function included: False 
Differentiation order: 1 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: ["'item_1': ['2012-01-01', '2014-07-15']", "'item_2': ['2012-01-01', '2014-07-15']", "'item_3': ['2012-01-01', '2014-07-15']"] 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: bootstrap: True, ccp_alpha: 0.0, criterion: squared_error, max_depth: None, max_features: 1.0, ... 
fit_kwargs: {} 
Creation date: 2024-05-20 14:54:30 
Last fit date: 2024-05-20 14:54:36 
Skforecast version: 0.12.0 
Python version: 3.11.5 
Forecaster id: None

In [30]:

# Predict
# ==============================================================================
predictions = forecaster.predict(steps=24)
predictions.head(3)

Out[30]:

	item_1	item_2	item_3
2014-07-16	26.593536	10.191353	11.010808
2014-07-17	26.380795	9.726819	10.982733
2014-07-18	26.461854	9.983902	12.822251

Feature selection in multi-series¶

Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.

Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. Visit Global Forecasting Models: Feature Selection for more information.

Compare multiple metrics¶

All four functions (backtesting_forecaster_multiseries, grid_search_forecaster_multiseries, random_search_forecaster_multiseries, and bayesian_search_forecaster_multiseries) can calculate multiple metrics, including custom metrics, for each forecaster configuration when a list is passed to the metric argument.

The best model is selected based on the first metric in the list.
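
The consequence of ranking by the first metric can be seen in a toy comparison. The sketch below scores two made-up candidate configurations with two metrics and sorts by the first one; the helper functions `mae` and `mse`, the data and the configuration names are all hypothetical, and the code is not the search routine used by skforecast.

```python
import numpy as np
import pandas as pd


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical validation target and predictions of two configurations
y_true = pd.Series([10.0, 12.0, 14.0, 16.0])
candidates = {
    'config_a': pd.Series([11.0, 13.0, 15.0, 17.0]),  # constant error of 1
    'config_b': pd.Series([10.0, 12.0, 14.0, 19.0]),  # a single large error
}

metrics = [mae, mse]  # the first metric decides the ranking

results = pd.DataFrame([
    {'config': name, **{m.__name__: m(y_true, y_pred) for m in metrics}}
    for name, y_pred in candidates.items()
]).sort_values(metrics[0].__name__).reset_index(drop=True)

print(results)
```

Here `config_b` ranks first because its mean absolute error is lower, even though its mean squared error is higher, so the order of the metrics in the list matters.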

In [31]:

# Grid search Multi-Series with multiple metrics
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 24,
                 encoding  = 'ordinal'
             )

def custom_metric(y_true, y_pred):
    """
    Calculate the mean absolute error using only the predicted values of the last
    3 months of the year.
    """
    mask = y_true.index.month.isin([10, 11, 12])
    metric = mean_absolute_error(y_true[mask], y_pred[mask])
    
    return metric

lags_grid = [24, 48]
param_grid = {
    'n_estimators': [10, 20],
    'max_depth': [3, 7]
}

results = grid_search_forecaster_multiseries(
              forecaster         = forecaster,
              series             = data,
              lags_grid          = lags_grid,
              param_grid         = param_grid,
              steps              = 24,
              metric             = [mean_absolute_error, custom_metric, 'mean_squared_error'],
              initial_train_size = len(data_train),
              fixed_train_size   = True,
              levels             = None,
              exog               = None,
              refit              = True,
              return_best        = False,
              n_jobs             = 'auto',
              verbose            = False,
              show_progress      = True
          )

results

8 models compared for 3 level(s). Number of iterations: 8.

Out[31]:

	levels	lags	lags_label	params	mean_absolute_error	custom_metric	mean_squared_error	max_depth	n_estimators
3	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 20}	2.346023	2.441957	10.357276	7	20
7	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 20}	2.368374	2.502645	10.411020	7	20
2	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 10}	2.381166	2.486796	10.616470	7	10
6	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 7, 'n_estimators': 10}	2.381476	2.499558	10.586728	7	10
1	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 20}	2.460152	2.555881	11.323336	3	20
4	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 10}	2.490342	2.622111	11.340823	3	10
5	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 20}	2.490899	2.597174	11.130521	3	20
0	[item_1, item_2, item_3]	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_depth': 3, 'n_estimators': 10}	2.561160	2.688161	12.524154	3	10