Use forecaster in production
A trained model may be deployed in production to generate predictions regularly. Suppose predictions have to be generated on a weekly basis, for example, every Monday. By default, when the `predict` method of a trained forecaster object is called, predictions start right after the last training observation. Therefore, the model could be retrained weekly, just before the first prediction is needed, and its `predict` method called afterwards. This strategy, although simple, may not be feasible for several reasons:
Model training is very expensive and cannot be run that often.
The history with which the model was trained is no longer available.
The prediction frequency is so high that there is no time to train the model between predictions.
In these scenarios, the model must be able to predict at any time, even if it has not been recently trained.
Every forecaster generated with skforecast has the `last_window` argument in its `predict` method. With this argument, it is possible to provide only the past values needed to create the autoregressive predictors (lags) and thus generate the predictions without retraining the model.
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()
| date       | y        |
|------------|----------|
| 2004-09-01 | 1.134432 |
| 2004-10-01 | 1.181011 |
| 2004-11-01 | 1.216037 |
| 2004-12-01 | 1.257238 |
| 2005-01-01 | 1.170690 |
# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=123),
    lags = 5
)
forecaster.fit(y=data_train['y'])
# Predict
# ==============================================================================
forecaster.predict(steps=3)
2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64
As expected, predictions follow directly from the end of training data.
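The dates of these predictions can be reproduced with pandas alone. The minimal sketch below assumes that the forecast index is obtained by stepping forward from the last timestamp of the window at the frequency of the series; `forecast_index` is a hypothetical helper written for this illustration and is not part of skforecast.

```python
import pandas as pd


def forecast_index(last_timestamp, steps, freq):
    """Return the timestamps of the `steps` periods that follow `last_timestamp`.

    Hypothetical helper for illustration only; it is not part of skforecast.
    """
    offset = pd.tseries.frequencies.to_offset(freq)
    first_step = pd.Timestamp(last_timestamp) + offset
    return pd.date_range(start=first_step, periods=steps, freq=offset)


# The last training observation in this example is 2005-01-01 (month-start frequency).
idx = forecast_index("2005-01-01", steps=3, freq="MS")
print(list(idx.strftime("%Y-%m-%d")))
# ['2005-02-01', '2005-03-01', '2005-04-01']
```

The same reasoning applies to any window of past values: the first predicted timestamp is one period after the last timestamp in that window.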
When `last_window` is provided, the forecaster uses this data to generate the lags needed as predictors and starts the prediction afterward.
# Predict
# ==============================================================================
forecaster.predict(steps=3, last_window=data['y'].tail(5))
2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64
Since the provided `last_window` contains values from 2008-02-01 to 2008-06-01, the forecaster can create the needed lags and predict the next 3 steps.
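The idea of predicting from a window of past values can be sketched without skforecast. The following simplified illustration of recursive multi-step forecasting is not the library's actual implementation: `recursive_forecast` and `toy_model` are hypothetical names, and the toy model (the mean of the lag values) merely stands in for a fitted regressor.

```python
import numpy as np


def recursive_forecast(predict_one, last_window, lags, steps):
    """Recursive multi-step forecast from a window of past values.

    predict_one: callable mapping a 1-D array of lag values
                 [y(t-1), y(t-2), ..., y(t-lags)] to a single prediction.
    last_window: most recent observations, oldest first; at least `lags` values.
    """
    history = [float(v) for v in last_window]
    if len(history) < lags:
        raise ValueError(f"last_window must contain at least {lags} values")

    predictions = []
    for _ in range(steps):
        # Most recent value first: lag 1, lag 2, ..., lag `lags`.
        lag_values = np.array(history[-lags:][::-1])
        y_hat = float(predict_one(lag_values))
        predictions.append(y_hat)
        # The prediction is treated as the newest observation for the next step.
        history.append(y_hat)
    return predictions


# Toy stand-in for a fitted regressor (an assumption made for this example).
def toy_model(lag_values):
    return lag_values.mean()


preds = recursive_forecast(toy_model, last_window=[1.0, 2.0, 3.0, 4.0, 5.0], lags=5, steps=3)
print(preds)
```

Only the window is needed at prediction time, which is why the training history does not have to be available when the model is used in production.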
  Warning
It is important to note that the length of `last_window` must be enough to include the maximum lag used by the forecaster. For example, if the forecaster uses lags 1, 24 and 48, `last_window` must include the last 48 values of the series.
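A small guard can make this requirement explicit before calling `predict`. The sketch below uses only pandas; `select_last_window` is a hypothetical helper (not part of skforecast), and the series `y` is synthetic data created for the example.

```python
import pandas as pd


def select_last_window(series, lags):
    """Return the tail of `series` long enough to build every lag in `lags`.

    Hypothetical helper for illustration only; it is not part of skforecast.
    """
    max_lag = max(lags)
    if len(series) < max_lag:
        raise ValueError(
            f"series has {len(series)} values but the maximum lag is {max_lag}"
        )
    return series.iloc[-max_lag:]


# Synthetic monthly series used only for this example.
y = pd.Series(
    range(100),
    index=pd.date_range("2000-01-01", periods=100, freq="MS"),
    dtype=float,
)

window = select_last_window(y, lags=[1, 24, 48])
print(len(window))
# 48
```

The window length depends on the largest lag, not on the number of lags or their sum.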