Understanding the forecaster attributes

While a forecaster is created and trained, the object stores information in its attributes that can be useful to the user. This guide explores the main attributes of a ForecasterAutoreg; the same ideas apply to the other skforecast forecasters.

Create and train a forecaster

To create and train a forecaster, at least the regressor and lags arguments must be specified.

In [1]:

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.ensemble import RandomForestRegressor

In [2]:

# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])

# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y-%m-%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 5
             )

forecaster.fit(y=data['y'])
forecaster

Out[2]:

================= 
ForecasterAutoreg 
================= 
Regressor: RandomForestRegressor(random_state=123) 
Lags: [1 2 3 4 5] 
Transformer for y: None 
Transformer for exog: None 
Window size: 5 
Weight function included: False 
Differentiation order: None 
Exogenous included: False 
Type of exogenous variable: None 
Exogenous variables names: None 
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2008-06-01 00:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: MS 
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False} 
fit_kwargs: {} 
Creation date: 2024-05-27 12:14:21 
Last fit date: 2024-05-27 12:14:21 
Skforecast version: 0.12.1 
Python version: 3.11.5 
Forecaster id: None

In [3]:

# List of attributes
# ==============================================================================
for attribute, value in forecaster.__dict__.items():
    print(attribute)

regressor
transformer_y
transformer_exog
weight_func
source_code_weight_func
differentiation
differentiator
last_window
index_type
index_freq
training_range
included_exog
exog_type
exog_dtypes
exog_col_names
X_train_col_names
in_sample_residuals
out_sample_residuals
in_sample_residuals_by_bin
out_sample_residuals_by_bin
fitted
creation_date
fit_date
skforecast_version
python_version
forecaster_id
lags
max_lag
window_size
window_size_diff
binner_kwargs
binner
binner_intervals
fit_kwargs
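The loop above iterates over forecaster.__dict__, the instance's attribute dictionary, which is the same object returned by vars(forecaster). The sketch below uses a hypothetical TinyForecaster class (not part of skforecast) to show how the same idiom can also report each attribute's type. This makes it easy to spot attributes that are still None because they are only populated by fit.

```python
class TinyForecaster:
    """Hypothetical stand-in for a forecaster; not a skforecast class."""

    def __init__(self, lags):
        self.lags = list(range(1, lags + 1))
        self.window_size = max(self.lags)
        self.last_window = None  # would only be populated after fitting
        self.fitted = False


def describe_attributes(obj):
    """Return (name, type name) pairs for every instance attribute of obj."""
    # vars(obj) returns the same dict object as obj.__dict__
    return [(name, type(value).__name__) for name, value in vars(obj).items()]


tiny = TinyForecaster(lags=5)
for name, type_name in describe_attributes(tiny):
    print(f"{name:<12} {type_name}")
```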

Regressor

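Skforecast is a Python library that facilitates using scikit-learn regressors as multi-step forecasters; it also works with any regressor compatible with the scikit-learn API. The regressor attribute stores the regressor used by the forecaster, which is the trained estimator once fit has been called.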

In [4]:

# Forecaster regressor
# ==============================================================================
forecaster.regressor

Out[4]:

RandomForestRegressor(random_state=123)


In [5]:

# Show regressor parameters
# ==============================================================================
forecaster.regressor.get_params(deep=True)

Out[5]:

{'bootstrap': True,
 'ccp_alpha': 0.0,
 'criterion': 'squared_error',
 'max_depth': None,
 'max_features': 1.0,
 'max_leaf_nodes': None,
 'max_samples': None,
 'min_impurity_decrease': 0.0,
 'min_samples_leaf': 1,
 'min_samples_split': 2,
 'min_weight_fraction_leaf': 0.0,
 'monotonic_cst': None,
 'n_estimators': 100,
 'n_jobs': None,
 'oob_score': False,
 'random_state': 123,
 'verbose': 0,
 'warm_start': False}
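"Compatible with the scikit-learn API" roughly means that the object follows the scikit-learn estimator conventions. It should have a fit(X, y) method that returns self, a predict(X) method, and get_params/set_params. The sketch below is a hypothetical, deliberately trivial regressor written with NumPy only. It is meant to show the shape of that interface and could, in principle, be passed as the regressor. Real estimators usually inherit from sklearn.base.BaseEstimator and RegressorMixin rather than hand-writing get_params.

```python
import numpy as np


class MeanRegressor:
    """Hypothetical regressor that always predicts the training mean of y.

    It implements only the minimal scikit-learn style interface:
    get_params, set_params, fit and predict.
    """

    def __init__(self, shift=0.0):
        self.shift = shift  # constant added to every prediction

    def get_params(self, deep=True):
        return {"shift": self.shift}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # The trailing underscore marks attributes learned during fit
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        n_rows = np.asarray(X).shape[0]
        return np.full(n_rows, self.mean_ + self.shift)


X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
model = MeanRegressor().fit(X, y)
print(model.get_params(deep=True))  # {'shift': 0.0}
print(model.predict(X[:2]))         # [3.5 3.5]
```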

✎ Note

In the forecasters that follow a direct strategy, one instance of the regressor is trained for each step. All of them are stored in self.regressors_.
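To illustrate the direct strategy, the sketch below trains one model per forecast step on the same lagged predictors. The models are kept in a dict keyed by step, in the spirit of regressors_. Ordinary least squares via numpy.linalg.lstsq stands in for the regressor. The helper names are hypothetical, and this is not skforecast's implementation.

```python
import numpy as np


def make_lags(y, n_lags):
    """Rows of [y(t-1), ..., y(t-n_lags)] for every t >= n_lags."""
    return np.column_stack(
        [y[n_lags - lag: len(y) - lag] for lag in range(1, n_lags + 1)]
    )


def fit_direct(y, n_lags, steps):
    """Fit one least-squares model per step; returns {step: coefficients}."""
    X_all = make_lags(y, n_lags)  # row i has the lags of target y[n_lags + i]
    regressors = {}
    for step in range(1, steps + 1):
        # Step h keeps the same predictors but targets h-1 positions later,
        # so the last h-1 rows have no target and are dropped.
        X = X_all[: len(X_all) - (step - 1)]
        target = y[n_lags + step - 1:]
        X = np.column_stack([np.ones(len(X)), X])  # intercept column
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        regressors[step] = coef
    return regressors


def predict_direct(regressors, last_window):
    """Each step is predicted by its own model from the same last window."""
    # Lag 1 is the most recent value, so the window is reversed
    x = np.concatenate([[1.0], last_window[::-1]])
    return np.array([x @ regressors[step] for step in sorted(regressors)])


t = np.arange(40)
y = np.sin(0.5 * t)  # toy noise-free series
models = fit_direct(y, n_lags=2, steps=3)
forecast = predict_direct(models, y[-2:])
print(np.allclose(forecast, np.sin(0.5 * np.arange(40, 43))))  # True
```

A noise-free sinusoid satisfies an exact linear recurrence in its previous two values, so each per-step model can reproduce it, which makes the toy result easy to check.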

Lags

Lags used as predictors. Indexing starts at 1, so lag 1 corresponds to the value at t-1.

In [6]:

# Forecaster lags
# ==============================================================================
forecaster.lags

Out[6]:

array([1, 2, 3, 4, 5])
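To make the lag indexing concrete, the sketch below builds a predictor matrix for lags 1 to 5 with NumPy. Each row holds y(t-1), ..., y(t-5), and the target is y(t). The helper build_lag_matrix is hypothetical, and skforecast's own column layout may differ. In skforecast, the real training matrix should be inspectable with forecaster.create_train_X_y(y=data['y']).

```python
import numpy as np


def build_lag_matrix(y, lags):
    """Hypothetical helper: return (X, target) for the given lags.

    Column j of X holds y(t - lags[j]); rows start at t = max(lags),
    the first position where every lag is available.
    """
    y = np.asarray(y, dtype=float)
    max_lag = max(lags)
    X = np.column_stack([y[max_lag - lag: len(y) - lag] for lag in lags])
    target = y[max_lag:]
    return X, target


y = np.arange(10, 18, dtype=float)  # toy series: 10, 11, ..., 17
X, target = build_lag_matrix(y, lags=[1, 2, 3, 4, 5])
# First row holds lags 1..5 of t=5: [14, 13, 12, 11, 10], with target 15
print(X)
print(target)
```

With 8 observations and a maximum lag of 5, only 8 - 5 = 3 rows can be built, because the first 5 values are consumed as lags.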

Last window

The last window of data the forecaster saw during training. It stores the values needed to predict the first step immediately after the end of the training data.

In [7]:

# Forecaster last window
# ==============================================================================
forecaster.last_window

Out[7]:

datetime
2008-02-01    0.761822
2008-03-01    0.649435
2008-04-01    0.827887
2008-05-01    0.816255
2008-06-01    0.762137
Freq: MS, Name: y, dtype: float64
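ForecasterAutoreg uses a recursive strategy: each prediction is fed back as the newest lag for the next step. This is why the last 5 observed values are enough to start forecasting. The sketch below reproduces that loop with the last_window values printed above. It uses a hypothetical linear model whose intercept and coefficients are made up for illustration. With the real forecaster, forecaster.predict(steps=...) performs this loop internally, using the trained regressor instead.

```python
import numpy as np

# Values of forecaster.last_window shown above (2008-02-01 to 2008-06-01)
last_window = np.array([0.761822, 0.649435, 0.827887, 0.816255, 0.762137])

# Hypothetical linear model: prediction = intercept + coef @ [lag_1, ..., lag_5]
intercept = 0.05
coef = np.array([0.4, 0.2, 0.1, 0.1, 0.1])


def predict_recursive(last_window, steps, intercept, coef):
    """Predict `steps` values, feeding each prediction back as lag 1."""
    window = list(last_window)
    predictions = []
    for _ in range(steps):
        lags = np.array(window[::-1][: len(coef)])  # lag 1 = most recent value
        pred = intercept + coef @ lags
        predictions.append(pred)
        window.append(pred)  # the prediction becomes the newest observation
    return np.array(predictions)


print(predict_recursive(last_window, steps=3, intercept=intercept, coef=coef))
```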

💡 Tip

Learn how to put your forecasters into production and make the most of last_window in the user guide Using forecasting models in production.

Window size

The size of the data window needed to create the predictors. It is equal to forecaster.max_lag.
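A quick worked check ties the window size to the other numbers in this notebook. The training range 1991-07-01 to 2008-06-01 spans 204 monthly observations. The first window_size = 5 of them serve only as lags, so 204 - 5 = 199 rows remain to train on. This matches the 199 in-sample residuals reported for this forecaster. The snippet is plain Python arithmetic, not a skforecast call.

```python
from datetime import date


def months_between(start, end):
    """Number of month-start observations from start to end, inclusive."""
    return (end.year - start.year) * 12 + (end.month - start.month) + 1


n_obs = months_between(date(1991, 7, 1), date(2008, 6, 1))
window_size = 5  # equal to max_lag when lags=5
n_train_rows = n_obs - window_size
print(n_obs, n_train_rows)  # 204 199
```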

In [8]:

# Forecaster window size
# ==============================================================================
forecaster.window_size

Out[8]:

5

In-sample residuals

Residuals of the model when predicting the training data. At most 1000 values are stored. If transformer_y is not None, the residuals are stored in the transformed scale.

✎ Note

In the forecasters that follow a direct strategy, and in the global models described in Global Forecasting Models: Independent multi-series forecasting, this attribute is a dict containing the residuals of each regressor/series.

In [9]:

# Forecaster in-sample residuals
# ==============================================================================
print("Length:", len(forecaster.in_sample_residuals))
forecaster.in_sample_residuals[:5]

Length: 199

Out[9]:

array([-0.10463712, -0.03318622, -0.00291908, -0.02020715,  0.01054163])
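Conceptually, in-sample residuals are the differences between the observed training targets and the model's predictions for those same rows. The sketch below computes them for toy data and caps the stored array at 1000 values. Drawing those 1000 values at random without replacement, with a fixed seed, is an assumption about how such a cap can be applied. The helper name is hypothetical.

```python
import numpy as np


def in_sample_residuals(y_true, y_pred, max_stored=1000, seed=123):
    """Residuals y_true - y_pred, randomly subsampled if there are too many."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    if len(residuals) > max_stored:
        rng = np.random.default_rng(seed)
        residuals = rng.choice(residuals, size=max_stored, replace=False)
    return residuals


rng = np.random.default_rng(0)
y_true = rng.normal(size=1500)
y_pred = y_true + rng.normal(scale=0.1, size=1500)  # toy predictions
res = in_sample_residuals(y_true, y_pred)
print(len(res))  # 1000
```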

Out-of-sample residuals

Residuals of the model when predicting data not used in training. At most 1000 values are stored. If transformer_y is not None, the residuals are assumed to be in the transformed scale. Use the set_out_sample_residuals method to set them.

As no values have been added, the parameter is None.
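Out-of-sample residuals are obtained by comparing predictions with observations the model did not see during training, for example in a backtest. The sketch below uses a naive last-value forecast as a hypothetical stand-in for the model and omits the 1000-value cap for brevity. An array like the one it returns is the kind of input set_out_sample_residuals is meant to receive. Check the skforecast API reference for that method's exact signature and options.

```python
import numpy as np


def naive_out_of_sample_residuals(y, n_train):
    """Residuals of a one-step naive forecast on the data after n_train.

    The 'model' simply predicts the previous observation; it stands in
    for any forecaster evaluated on data it was not trained on.
    """
    y = np.asarray(y, dtype=float)
    y_test = y[n_train:]
    y_pred = y[n_train - 1: -1]  # previous value used as the forecast
    return y_test - y_pred


y = np.array([1.0, 2.0, 4.0, 7.0, 11.0, 16.0])
res = naive_out_of_sample_residuals(y, n_train=4)
print(res)  # [4. 5.]
```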

In [10]:

# Forecaster out-of-sample residuals
# ==============================================================================
forecaster.out_sample_residuals