Using forecaster models in production
When using a trained model in production, regular predictions need to be generated, for example, on a weekly basis every Monday. By default, the `predict` method of a trained forecaster generates predictions starting right after the last training observation. Therefore, the model could be retrained weekly, just before the first prediction is needed, and its `predict` method called. However, this approach may not be practical for reasons such as: expensive model training, unavailability of the training data history, or high prediction frequency.
In such scenarios, the model must be able to predict at any time, even if it has not been recently trained. Fortunately, every model generated with skforecast has a `last_window` argument in its `predict` method. This argument accepts just the past values needed to build the autoregressive predictors (lags or custom predictors), so predictions can be generated without retraining the model. This feature is particularly useful when there are limitations in retraining the model regularly or when dealing with high-frequency predictions.
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
# Download data
# ==============================================================================
url = (
'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()
| date | y |
|---|---|
| 2004-09-01 | 1.134432 |
| 2004-10-01 | 1.181011 |
| 2004-11-01 | 1.216037 |
| 2004-12-01 | 1.257238 |
| 2005-01-01 | 1.170690 |
# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 5,
forecaster_id = 'forecasting_series_y'
)
forecaster.fit(y=data_train['y'])
# Predict
# ==============================================================================
forecaster.predict(steps=3)
2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64
As expected, the predictions start right after the end of the training data.

When the `last_window` argument is provided, the forecaster uses this data to generate the lags needed as predictors and starts the prediction immediately afterwards.
# Predict
# ==============================================================================
last_window = data['y'].tail(5)
forecaster.predict(steps=3, last_window=last_window)
2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64
Since the provided `last_window` contains values from 2008-02-01 to 2008-06-01, the forecaster can create the required lags and predict the next 3 steps.
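The alignment of the prediction index with `last_window` can be reasoned about with plain pandas. The following sketch (not part of skforecast's internals, just an illustration using the same dates as the example above) shows that predictions start at the period immediately after the last date in `last_window`:

```python
import pandas as pd

# Index of the last_window used above: five monthly ('MS') observations.
last_window_index = pd.date_range('2008-02-01', '2008-06-01', freq='MS')

# The forecast index starts one period after the last observed date.
steps = 3
pred_index = pd.date_range(
    start=last_window_index[-1] + pd.offsets.MonthBegin(1),
    periods=steps,
    freq='MS'
)
print(pred_index)
# First predicted date: 2008-07-01, matching the output above.
```

This is why providing a `last_window` with a proper `DatetimeIndex` matters: the forecaster derives the prediction dates from it.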
  Warning
When using the `last_window` argument, it is crucial that its length is sufficient to cover the maximum lag (or custom predictor) used by the forecaster. For example, if the forecaster uses lags 1, 24, and 48, `last_window` must contain the most recent 48 values of the series. Providing fewer past observations may result in an error or incorrect predictions.
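To make the length requirement concrete, here is a minimal plain-Python sketch (a simplified illustration, not skforecast's actual implementation) of how lagged predictors are built from a window of past values. Lag *k* means "the value *k* steps before the prediction point", which is why the window must contain at least `max(lags)` observations:

```python
def build_lag_features(last_window, lags):
    """Build the lagged predictors for the next step from past values.

    last_window : list of past values ordered oldest -> newest.
    lags        : list of positive lag offsets, e.g. [1, 24, 48].
    """
    if len(last_window) < max(lags):
        raise ValueError(
            f"last_window must contain at least {max(lags)} values, "
            f"got {len(last_window)}."
        )
    # Lag k is the k-th value counting back from the end of the window.
    return [last_window[-k] for k in lags]

# A window of exactly 48 observations is enough for lags 1, 24 and 48.
window = list(range(1, 49))  # values 1, 2, ..., 48
print(build_lag_features(window, [1, 24, 48]))  # -> [48, 25, 1]
```

A shorter window (e.g. only 47 values for lag 48) raises a `ValueError` in this sketch; skforecast performs an equivalent length check when `last_window` is passed to `predict`.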