Using forecaster models in production
When using a trained model in production, regular predictions need to be generated, for example, on a weekly basis every Monday. By default, the `predict` method of a trained forecaster generates predictions starting right after the last training observation. Therefore, the model could be retrained weekly, just before the first prediction is needed, and its `predict` method called. However, this approach may not be practical for reasons such as: expensive model training, unavailability of the training data history, or high prediction frequency.
In such scenarios, the model must be able to predict at any time, even if it has not been recently trained. Fortunately, every model generated with skforecast has a `last_window` argument in its `predict` method. This argument accepts just the past values needed to build the autoregressive predictors (lags or custom predictors), so predictions can be generated without retraining the model. This feature is particularly useful when there are limitations in retraining the model regularly or when dealing with high-frequency predictions.
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
# Download data
# ==============================================================================
url = (
'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()
| date | y |
|---|---|
| 2004-09-01 | 1.134432 |
| 2004-10-01 | 1.181011 |
| 2004-11-01 | 1.216037 |
| 2004-12-01 | 1.257238 |
| 2005-01-01 | 1.170690 |
# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 5,
forecaster_id = 'forecasting_series_y'
)
forecaster.fit(y=data_train['y'])
# Predict
# ==============================================================================
forecaster.predict(steps=3)
2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64
As expected, the predictions start right after the end of the training data.

When the `last_window` argument is provided, the forecaster uses this data to generate the lags needed as predictors and starts the prediction immediately afterwards.
# Predict
# ==============================================================================
last_window = data['y'].tail(5)
forecaster.predict(steps=3, last_window=last_window)
2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64
Since the provided `last_window` contains values from 2008-02-01 to 2008-06-01, the forecaster can create the required lags and predict the next 3 steps.
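The alignment of the prediction index with `last_window` can be reasoned about with plain pandas. The following sketch (not part of skforecast's internals, just an illustration using the same dates as the example above) shows that predictions start at the period immediately after the last date in `last_window`:

```python
import pandas as pd

# Index of the last_window used above: five monthly ('MS') observations.
last_window_index = pd.date_range('2008-02-01', '2008-06-01', freq='MS')

# The forecast index starts one period after the last observed date.
steps = 3
pred_index = pd.date_range(
    start=last_window_index[-1] + pd.offsets.MonthBegin(1),
    periods=steps,
    freq='MS'
)
print(pred_index)
# First predicted date: 2008-07-01, matching the output above.
```

This is why providing a `last_window` with a proper `DatetimeIndex` matters: the forecaster derives the prediction dates from it.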
  Warning
When using the `last_window` argument, it is crucial that its length is sufficient to cover the maximum lag (or custom predictor) used by the forecaster. For example, if the forecaster uses lags 1, 24, and 48, `last_window` must contain the most recent 48 values of the series. Providing fewer past observations may result in an error or incorrect predictions.
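To make the length requirement concrete, here is a minimal plain-Python sketch (a simplified illustration, not skforecast's actual implementation) of how lagged predictors are built from a window of past values. Lag *k* means "the value *k* steps before the prediction point", which is why the window must contain at least `max(lags)` observations:

```python
def build_lag_features(last_window, lags):
    """Build the lagged predictors for the next step from past values.

    last_window : list of past values ordered oldest -> newest.
    lags        : list of positive lag offsets, e.g. [1, 24, 48].
    """
    if len(last_window) < max(lags):
        raise ValueError(
            f"last_window must contain at least {max(lags)} values, "
            f"got {len(last_window)}."
        )
    # Lag k is the k-th value counting back from the end of the window.
    return [last_window[-k] for k in lags]

# A window of exactly 48 observations is enough for lags 1, 24 and 48.
window = list(range(1, 49))  # values 1, 2, ..., 48
print(build_lag_features(window, [1, 24, 48]))  # -> [48, 25, 1]
```

A shorter window (e.g. only 47 values for lag 48) raises a `ValueError` in this sketch; skforecast performs an equivalent length check when `last_window` is passed to `predict`.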