Using forecaster models in production

In production, a trained model typically has to generate predictions on a regular schedule, for example every Monday. By default, the predict method of a trained forecaster generates predictions starting right after the last training observation. One option is therefore to retrain the model each week, just before the first prediction is needed, and then call its predict method. However, this approach is often impractical because model training is expensive, the training data history is unavailable, or predictions are needed at high frequency.

In such scenarios, the model must be able to predict at any time, even if it has not been trained recently. Every forecaster in skforecast accepts a last_window argument in its predict method. This argument supplies only the past values needed to create the autoregressive predictors (lags or custom predictors), so predictions can be generated without retraining the model.

Libraries

In [1]:

# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.utils import save_forecaster
from skforecast.utils import load_forecaster

Data

In [2]:

# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
    'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])

# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')
data_train = data.loc[:'2005-01-01']
data_train.tail()

Out[2]:

	y
date
2004-09-01	1.134432
2004-10-01	1.181011
2004-11-01	1.216037
2004-12-01	1.257238
2005-01-01	1.170690

Predicting with last window

In [3]:

# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor     = RandomForestRegressor(random_state=123),
                 lags          = 5,
                 forecaster_id = 'forecasting_series_y'
             )

forecaster.fit(y=data_train['y'])

In [4]:

# Predict
# ==============================================================================
forecaster.predict(steps=3)

Out[4]:

2005-02-01    0.927480
2005-03-01    0.756215
2005-04-01    0.692595
Freq: MS, Name: pred, dtype: float64

As expected, predictions follow directly from the end of training data.

When the last_window argument is provided, the forecaster uses this data to generate the necessary lags as predictors and starts the prediction thereafter.

In [5]:

# Predict with last_window
# ==============================================================================
last_window = data['y'].tail(5)
forecaster.predict(steps=3, last_window=last_window)

Out[5]:

2008-07-01    0.803853
2008-08-01    0.870858
2008-09-01    0.905003
Freq: MS, Name: pred, dtype: float64

Since the provided last_window contains values from 2008-02-01 to 2008-06-01, the forecaster can create the needed lags and predict the next 3 steps.
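
The way a window of past values turns into several future predictions can be illustrated with a minimal sketch of recursive multi-step forecasting, the general idea behind autoregressive forecasters. This is not skforecast's implementation: `recursive_forecast` and `mean_of_lags` are hypothetical names, and the "model" is a toy function standing in for a fitted regressor.

```python
from collections import deque


def recursive_forecast(predict_one, last_window, n_lags, steps):
    """Predict `steps` values, feeding each prediction back in as the newest lag."""
    if len(last_window) < n_lags:
        raise ValueError(
            f"last_window has {len(last_window)} values, at least {n_lags} are needed"
        )
    # Keep only the most recent n_lags values; older ones are never used
    window = deque(last_window[-n_lags:], maxlen=n_lags)
    predictions = []
    for _ in range(steps):
        # Order the lags from most recent (lag 1) to oldest (lag n_lags)
        lags = list(window)[::-1]
        y_hat = predict_one(lags)
        predictions.append(y_hat)
        # The prediction becomes the newest observation; the oldest one drops out
        window.append(y_hat)
    return predictions


def mean_of_lags(lags):
    """Toy stand-in for a fitted regressor (assumption for illustration only)."""
    return sum(lags) / len(lags)


last_window = [1.0, 2.0, 3.0]
preds = recursive_forecast(mean_of_lags, last_window, n_lags=3, steps=2)
print(preds)
```

Only the window is needed at prediction time, which is why the training history does not have to be available when last_window is supplied.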

Warning

The length of last_window must be sufficient to include the maximum lag (or custom predictor) used by the forecaster. For example, if the forecaster uses lags 1, 24, and 48, last_window must include the most recent 48 values of the series. Failing to include the required number of past observations may result in an error or incorrect predictions.
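
A simple guard before calling predict can catch a window that is too short. The following is a minimal sketch under the assumption that the lags are known as a list of integers; `check_last_window` is a hypothetical helper, not part of skforecast.

```python
def check_last_window(last_window, lags):
    """Return the required window length, or raise if last_window is too short."""
    max_lag = max(lags)
    if len(last_window) < max_lag:
        raise ValueError(
            f"last_window has {len(last_window)} values "
            f"but the largest lag is {max_lag}"
        )
    return max_lag


lags = [1, 24, 48]
window_ok = list(range(48))     # 48 past values: enough for lag 48
window_short = list(range(24))  # 24 past values: lag 48 cannot be built

required = check_last_window(window_ok, lags)

try:
    check_last_window(window_short, lags)
    too_short_detected = False
except ValueError:
    too_short_detected = True

print(required, too_short_detected)
```

With a fitted forecaster, the window_size attribute gives the required length directly, so it could be compared against the length of last_window instead of recomputing the maximum lag.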

Real use case

The main advantage of using the last_window argument is that it can be used to predict at any time, even if the Forecaster has not been trained recently.

Imagine a use case where a model is trained, stored, and 1 year later the company wants to use it to make some predictions.

Data

A gap is created between the end of the training data and the last window data to simulate this behavior.

In [6]:

# Split data
# ==============================================================================
data_train = data.loc[:'2005-01-01'].copy()
data_last_window = data.loc['2005-08-01':'2005-12-01'].copy()

print(
    f"Train dates       : {data_train.index.min()} --- {data_train.index.max()}"
    f"  (n={len(data_train)})"
)
print(
    f"Last window dates : {data_last_window.index.min()} --- {data_last_window.index.max()}"
    f"  (n={len(data_last_window)})"
)

Train dates       : 1991-07-01 00:00:00 --- 2005-01-01 00:00:00  (n=163)
Last window dates : 2005-08-01 00:00:00 --- 2005-12-01 00:00:00  (n=5)

In [7]:

# Plot time series partition
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 3))
data_train['y'].plot(label='train', ax=ax)
data_last_window['y'].plot(label='last window', ax=ax)
ax.set_xlabel('')
ax.legend();

Forecaster initial train

In [8]:

# Train forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor     = RandomForestRegressor(random_state=123),
                 lags          = 5,
                 forecaster_id = 'forecasting_series_y'
             )

forecaster.fit(y=data_train['y'])

The window_size attribute gives the number of past observations needed to make predictions, and the last_window attribute holds the window stored in the forecaster during training, which is used to predict immediately after the training data.

In [9]:

# Forecaster Attributes
# ==============================================================================
print('Window size:', forecaster.window_size)
print("Forecaster last window:")
print(forecaster.last_window)

Window size: 5
Forecaster last window:
date
2004-09-01    1.134432
2004-10-01    1.181011
2004-11-01    1.216037
2004-12-01    1.257238
2005-01-01    1.170690
Freq: MS, Name: y, dtype: float64

The model is saved for future use.

Note

Learn more about how to Save and load forecasters.

In [10]:

# Save Forecaster
# ==============================================================================
save_forecaster(forecaster, file_name='forecaster_001.py', verbose=False)

Future predictions

The model is loaded to make new predictions.

Tip

Since the Forecaster has already been trained, there is no need to re-fit the model.

In [11]:

# Load Forecaster
# ==============================================================================
forecaster_loaded = load_forecaster('forecaster_001.py', verbose=True)

================= 
ForecasterAutoreg 
================= 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [1 2 3 4 5] 
Transformer for y: None 
Transformer for exog: None 
Window size: 5 
Weight function included: False 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2005-01-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
fit_kwargs: {} 
Creation date: 2023-07-28 21:16:26 
Last fit date: 2023-07-28 21:16:26 
Skforecast version: 0.9.1 
Python version: 3.10.11 
Forecaster id: forecasting_series_y

The forecaster's training range ends at '2005-01-01'. Using a last_window, the forecaster will be able to make predictions for '2006-01-01', 1 year later, without having to re-fit the model.
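
The first forecast lands on 2006-01-01 because predictions begin one frequency step after the last date in last_window. The following standard-library sketch shows that arithmetic under the assumption of month-start ('MS') data; `next_month_starts` is a hypothetical helper used only for illustration.

```python
from datetime import date


def next_month_starts(last_date, steps):
    """Return the first day of each of the `steps` months following last_date."""
    year, month = last_date.year, last_date.month
    dates = []
    for _ in range(steps):
        month += 1
        if month > 12:
            # Roll over from December to January of the next year
            month = 1
            year += 1
        dates.append(date(year, month, 1))
    return dates


# The last window used in this example ends on 2005-12-01
forecast_dates = next_month_starts(date(2005, 12, 1), steps=3)
print(forecast_dates)
```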

In [12]:

# 1 year later last window
# ==============================================================================
data_last_window

Out[12]:

	y
date
2005-08-01	1.006497
2005-09-01	1.094736
2005-10-01	1.027043
2005-11-01	1.149232
2005-12-01	1.160712

In [13]:

# Predict with last_window
# ==============================================================================
forecaster_loaded.predict(steps=3, last_window=data_last_window['y'])

Out[13]:

2006-01-01    0.979303
2006-02-01    0.760421
2006-03-01    0.634806
Freq: MS, Name: pred, dtype: float64
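
In a scheduled job, the steps above (take the most recent observations, pass them as last_window, predict) could be wrapped in one function. The sketch below is a hedged illustration: `predict_from_latest` and `StubForecaster` are hypothetical names, and the stub only mimics the two members used in this guide (the window_size attribute and predict with steps and last_window) so that the example runs without a trained model.

```python
def predict_from_latest(forecaster, history, steps):
    """Predict `steps` ahead using only the newest `window_size` observations."""
    n = forecaster.window_size
    if len(history) < n:
        raise ValueError(f"need at least {n} observations, got {len(history)}")
    # pandas objects are sliced by position with .iloc; plain sequences with [-n:]
    last_window = history.iloc[-n:] if hasattr(history, "iloc") else history[-n:]
    return forecaster.predict(steps=steps, last_window=last_window)


class StubForecaster:
    """Stand-in for a loaded forecaster (assumption: not a real model)."""

    window_size = 5

    def predict(self, steps, last_window):
        # Repeat the last observed value; a real forecaster would use its regressor
        return [last_window[-1]] * steps


history = [1.0, 1.1, 0.9, 1.2, 1.3, 1.25, 1.4]
result = predict_from_latest(StubForecaster(), history, steps=3)
print(result)
```

With the objects from this guide, the same function could presumably be called with the loaded forecaster and the latest available series in place of the stub and the list.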
