Using forecaster models in production¶
When a trained model is used in production, predictions must be generated on a regular schedule, for example, every Monday. By default, the predict method of a trained forecaster generates predictions starting right after the last training observation. The model could therefore be retrained weekly, just before the first prediction is needed, and its predict method called. However, this approach is often impractical because model training is expensive, the training data history is unavailable, or predictions are needed at high frequency.
In such scenarios, the model must be able to predict at any time, even if it has not been trained recently. Fortunately, the predict method of every skforecast forecaster accepts a last_window argument. This argument allows providing only the past values needed to create the autoregressive predictors (lags or custom predictors), so predictions can be made without retraining the model. This feature is particularly useful when the model cannot be retrained regularly or when predictions are needed at high frequency.
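To make the idea of "past values needed to create the autoregressive predictors" concrete, the following minimal sketch builds the lag vector for the next time step from a window of recent values. It is plain Python for illustration only, not skforecast's internal implementation, and the helper name build_lag_predictors is hypothetical.

```python
def build_lag_predictors(last_window, lags):
    """Return [y(t-lag) for lag in lags], the predictors for the next step.

    Hypothetical helper for illustration; last_window is ordered from
    oldest to newest, so last_window[-1] is lag 1.
    """
    return [last_window[-lag] for lag in lags]


# Five most recent monthly values (oldest first) and lags 1 to 5.
window = [1.134432, 1.181011, 1.216037, 1.257238, 1.170690]
predictors = build_lag_predictors(window, lags=[1, 2, 3, 4, 5])
print(predictors)  # most recent value first
```

Only these few values are required to predict the next step, which is why the full training history does not need to be available at prediction time.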
Libraries¶
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.utils import save_forecaster
from skforecast.utils import load_forecaster
from skforecast.datasets import fetch_dataset
Data¶
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()
date | y |
---|---|
2004-09-01 | 1.134432 |
2004-10-01 | 1.181011 |
2004-11-01 | 1.216037 |
2004-12-01 | 1.257238 |
2005-01-01 | 1.170690 |
Predicting with last window¶
# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=123),
    lags = 5,
    forecaster_id = 'forecasting_series_y'
)
forecaster.fit(y=data_train['y'])
# Predict
# ==============================================================================
forecaster.predict(steps=3)
2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64
As expected, predictions follow directly from the end of training data.
When the last_window argument is provided, the forecaster uses this data to generate the necessary lags as predictors and starts the prediction immediately after it.
# Predict with last_window
# ==============================================================================
last_window = data['y'].tail(5)
forecaster.predict(steps=3, last_window=last_window)
2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64
Since the provided last_window contains values from 2008-02-01 to 2008-06-01, the forecaster can create the needed lags and predict the next 3 steps, starting at 2008-07-01.
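The way a window of recent values drives a multi-step prediction can be sketched in a few lines. The example below is a simplified illustration of recursive forecasting, in which each prediction is appended to the history and becomes a lag for the next step. The function names and the toy mean_of_lags model are assumptions made for this sketch; they are not part of skforecast, and the fitted regressor would take the place of the toy model.

```python
def recursive_forecast(predict_one_step, last_window, lags, steps):
    """Predict `steps` values recursively, starting right after last_window.

    predict_one_step: callable mapping a list of lag values to one prediction
    (a stand-in for a fitted regressor; hypothetical for this sketch).
    """
    history = list(last_window)  # copy so the caller's data is not modified
    predictions = []
    for _ in range(steps):
        predictors = [history[-lag] for lag in lags]  # lag 1 is the newest value
        y_hat = predict_one_step(predictors)
        predictions.append(y_hat)
        history.append(y_hat)  # the prediction becomes lag 1 of the next step
    return predictions


def mean_of_lags(predictors):
    """Toy model: predict the average of the lag values."""
    return sum(predictors) / len(predictors)


window = [1.0, 2.0, 3.0]
forecast = recursive_forecast(mean_of_lags, window, lags=[1, 2, 3], steps=3)
print(forecast)
```

Nothing in the loop depends on when the model was trained, only on the values in the window, which is what makes prediction from an arbitrary last window possible.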
⚠ Warning
When using the last_window argument, its length must cover the maximum lag (or custom predictor window) used by the forecaster. For instance, if the forecaster employs lags 1, 24, and 48, last_window must include the most recent 48 values of the series. Failing to include the required number of past observations may result in an error or incorrect predictions.
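A defensive check before calling predict can catch a window that is too short. The sketch below is one possible way to do this in plain Python; the helper name check_last_window is hypothetical, and it assumes the lags are known as a list of integers.

```python
def check_last_window(last_window, lags):
    """Return the most recent max(lags) values, or raise if there are too few.

    Hypothetical helper for illustration; last_window is ordered oldest first.
    """
    required = max(lags)
    if len(last_window) < required:
        raise ValueError(
            f"last_window has {len(last_window)} values, "
            f"but the largest lag is {required}."
        )
    return list(last_window)[-required:]


# With lags 1, 24 and 48, at least 48 past values are needed.
recent_values = list(range(60))
trimmed = check_last_window(recent_values, lags=[1, 24, 48])
print(len(trimmed))
```

With a pandas Series and a fitted forecaster, the same idea would compare len(last_window) against the forecaster's window_size attribute and keep last_window.tail(window_size).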
Real Use Case¶
The main advantage of the last_window argument is that it allows predicting at any time, even if the forecaster has not been trained recently.
Imagine a use case where a model is trained, stored, and 1 year later the company wants to use it to make some predictions.
Data¶
A gap is created between the end of the training data and the last window data to simulate this behavior.
# Split data
# ==============================================================================
data_train = data.loc[:'2005-01-01'].copy()
data_last_window = data.loc['2005-08-01':'2005-12-01'].copy()
print(
    f"Train dates : {data_train.index.min()} --- {data_train.index.max()}"
    f" (n={len(data_train)})"
)
print(
    f"Last window dates : {data_last_window.index.min()} --- {data_last_window.index.max()}"
    f" (n={len(data_last_window)})"
)
Train dates : 1991-07-01 00:00:00 --- 2005-01-01 00:00:00 (n=163)
Last window dates : 2005-08-01 00:00:00 --- 2005-12-01 00:00:00 (n=5)
# Plot time series partition
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train['y'].plot(label='train', ax=ax)
data_last_window['y'].plot(label='last window', ax=ax)
ax.set_xlabel('')
ax.legend();
Forecaster initial train¶
# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=123),
    lags = 5,
    forecaster_id = 'forecasting_series_y'
)
forecaster.fit(y=data_train['y'])
The window_size attribute gives the number of past values needed to make predictions. The last_window attribute holds the window stored in the forecaster during training, which is used to predict immediately after the training data.
# Forecaster Attributes
# ==============================================================================
print('Window size:', forecaster.window_size)
print("Forecaster last window:")
print(forecaster.last_window)
Window size: 5
Forecaster last window:
date
2004-09-01    1.134432
2004-10-01    1.181011
2004-11-01    1.216037
2004-12-01    1.257238
2005-01-01    1.170690
Freq: MS, Name: y, dtype: float64
The model is saved for future use.
✎ Note
Learn more about how to Save and load forecasters.
# Save Forecaster
# ==============================================================================
save_forecaster(forecaster, file_name='forecaster_001.py', verbose=False)
Future predictions¶
The model is loaded to make new predictions.
💡 Tip
Since the Forecaster has already been trained, there is no need to re-fit the model.
# Load Forecaster
# ==============================================================================
forecaster_loaded = load_forecaster('forecaster_001.py', verbose=True)
=================
ForecasterAutoreg
=================
Regressor: RandomForestRegressor(random_state=123)
Lags: [1 2 3 4 5]
Transformer for y: None
Transformer for exog: None
Window size: 5
Weight function included: False
Differentiation order: None
Exogenous included: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2005-01-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False}
fit_kwargs: {}
Creation date: 2024-07-29 17:07:50
Last fit date: 2024-07-29 17:07:50
Skforecast version: 0.11.0
Python version: 3.10.11
Forecaster id: forecasting_series_y
The forecaster's training range ends at '2005-01-01'. Using a last_window, the forecaster will be able to make predictions for '2006-01-01', 1 year later, without having to re-fit the model.
# 1 year later last window
# ==============================================================================
data_last_window
date | y |
---|---|
2005-08-01 | 1.006497 |
2005-09-01 | 1.094736 |
2005-10-01 | 1.027043 |
2005-11-01 | 1.149232 |
2005-12-01 | 1.160712 |
# Predict with last_window
# ==============================================================================
forecaster_loaded.predict(steps=3, last_window=data_last_window['y'])
2006-01-01    0.979303
2006-02-01    0.760421
2006-03-01    0.634806
Freq: MS, Name: pred, dtype: float64
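The dates of the predictions are determined by the end of the last window, not by the end of the training data. As a rough illustration, the sketch below computes the month-start dates that follow a given date using only the standard library. It assumes a month-start ('MS') frequency as in this example, and the helper name next_month_starts is hypothetical.

```python
from datetime import date


def next_month_starts(last_date, steps):
    """Return the `steps` month-start dates that follow last_date.

    Hypothetical helper for illustration; assumes a monthly ('MS') series.
    """
    dates = []
    year, month = last_date.year, last_date.month
    for _ in range(steps):
        month += 1
        if month > 12:  # roll over into the next year
            month = 1
            year += 1
        dates.append(date(year, month, 1))
    return dates


# The last window ends on 2005-12-01, so the first prediction is for 2006-01-01.
prediction_dates = next_month_starts(date(2005, 12, 1), steps=3)
print(prediction_dates)
```

This is why a forecaster trained on data up to 2005-01-01 produces predictions starting in 2006 when given a window that ends in December 2005.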