Understanding the forecaster attributes¶
During the process of creating and training a forecaster, the object stores a lot of information in its attributes that can be useful to the user. We will explore the main attributes included in a ForecasterAutoreg, but the same ideas apply to the other skforecast forecasters.
Create and train a forecaster¶
To create and train a forecaster, at least the regressor and lags must be specified.
# Libraries
# ==============================================================================
import pandas as pd
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.ensemble import RandomForestRegressor
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y-%m-%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 5
)
forecaster.fit(y=data['y'])
forecaster
================= ForecasterAutoreg =================
Regressor: RandomForestRegressor(random_state=123)
Lags: [1 2 3 4 5]
Transformer for y: None
Transformer for exog: None
Window size: 5
Weight function included: False
Differentiation order: None
Exogenous included: False
Type of exogenous variable: None
Exogenous variables names: None
Training range: [Timestamp('1991-07-01 00:00:00'), Timestamp('2008-06-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False}
fit_kwargs: {}
Creation date: 2024-05-05 12:08:01
Last fit date: 2024-05-05 12:08:01
Skforecast version: 0.12.0
Python version: 3.11.5
Forecaster id: None
# List of attributes
# ==============================================================================
for attribute, value in forecaster.__dict__.items():
    print(attribute)
regressor transformer_y transformer_exog weight_func source_code_weight_func differentiation differentiator last_window index_type index_freq training_range included_exog exog_type exog_dtypes exog_col_names X_train_col_names in_sample_residuals out_sample_residuals in_sample_residuals_by_bin out_sample_residuals_by_bin fitted creation_date fit_date skforecast_version python_version forecaster_id lags max_lag window_size window_size_diff binner_kwargs binner binner_intervals fit_kwargs
Regressor¶
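Skforecast is a Python library that facilitates using scikit-learn regressors as multi-step forecasters and also works with any regressor compatible with the scikit-learn API. The regressor used by the forecaster is stored in the regressor attribute; once the forecaster has been fitted, it holds the trained model.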
# Forecaster regressor
# ==============================================================================
forecaster.regressor
RandomForestRegressor(random_state=123)
# Show regressor parameters
# ==============================================================================
forecaster.regressor.get_params(deep=True)
{'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 1.0, 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'monotonic_cst': None, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False}
Note
In forecasters that follow a direct strategy, one instance of the regressor is trained for each step. All of them are stored in self.regressors_.
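To make the direct strategy concrete, here is a minimal sketch that is not skforecast's implementation. It assumes a plain least-squares linear model (numpy.linalg.lstsq) as a stand-in for the regressor, and the helpers make_lag_matrix, fit_direct and predict_direct are hypothetical. For each step h, a separate model learns to predict y[t + h - 1] from the same lag features, and all models are kept in a dict keyed by step, similar in spirit to regressors_.

```python
import numpy as np


def make_lag_matrix(y, max_lag):
    """Hypothetical helper: row i holds y[t-1], ..., y[t-max_lag] for t = max_lag + i."""
    return np.column_stack(
        [y[max_lag - lag:len(y) - lag] for lag in range(1, max_lag + 1)]
    )


def fit_direct(y, max_lag, steps):
    """Fit one least-squares model per forecast step (direct strategy)."""
    y = np.asarray(y, dtype=float)
    X = make_lag_matrix(y, max_lag)
    X = np.column_stack([np.ones(len(X)), X])  # add an intercept column
    regressors = {}
    for step in range(1, steps + 1):
        # Target for `step` at time t is y[t + step - 1]; the last rows have no target
        target = y[max_lag + step - 1:]
        # rcond treats near-zero singular values as zero (lag columns can be collinear)
        coef, *_ = np.linalg.lstsq(X[:len(target)], target, rcond=1e-10)
        regressors[step] = coef  # one model per step
    return regressors


def predict_direct(regressors, y, max_lag):
    """Predict every step from the same features, each step with its own model."""
    y = np.asarray(y, dtype=float)
    # Features at the forecast origin: intercept, y[-1], ..., y[-max_lag]
    x = np.concatenate([[1.0], y[::-1][:max_lag]])
    return np.array([x @ regressors[step] for step in sorted(regressors)])


y = np.arange(30.0)  # toy linear series 0, 1, ..., 29
regressors = fit_direct(y, max_lag=5, steps=3)
print(len(regressors))                                 # 3
print(np.round(predict_direct(regressors, y, 5), 6))   # [30. 31. 32.]
```

Because each step has its own model, fitting cost grows with the number of steps. In exchange, no prediction is fed back as an input.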
Lags¶
Lags used as predictors. Indexing starts at 1, so lag 1 is the value at t-1, the observation immediately before the one being predicted.
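The sketch below shows what "lag 1 is t-1" means in practice, using pandas.Series.shift on a toy series. The values and the lag_k column names are chosen for illustration only, and this is not how skforecast builds its training matrix internally. Each column lag_k holds the value observed k steps before the row's timestamp. Rows without a complete set of lags are dropped.

```python
import pandas as pd

# Toy monthly series (illustrative values, not the h2o data)
y = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0],
    index=pd.date_range('2000-01-01', periods=7, freq='MS'),
    name='y',
)
lags = [1, 2, 3, 4, 5]

# Column lag_k contains y shifted k steps forward in time, i.e. the value at t-k
X = pd.concat({f'lag_{k}': y.shift(k) for k in lags}, axis=1).dropna()
target = y.loc[X.index]  # value at t, the quantity being predicted

print(X)
print(target)
```

For the row dated 2000-06-01, lag_1 is 14.0 (May) and lag_5 is 10.0 (January), while the target is 15.0.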
# Forecaster lags
# ==============================================================================
forecaster.lags
array([1, 2, 3, 4, 5])
Last window¶
The last window of data the forecaster saw during training. It stores the values needed to predict the next step immediately after the training data.
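The role of the last window in a recursive forecaster can be sketched as follows. Here the last window is simply the final window_size observations, and each prediction is appended to it so that it becomes lag 1 for the next step. The functions toy_model and recursive_forecast are hypothetical stand-ins for the regressor and the forecaster's predict logic, not skforecast code.

```python
import numpy as np
import pandas as pd

y = pd.Series(
    np.arange(1.0, 11.0),
    index=pd.date_range('2000-01-01', periods=10, freq='MS'),
    name='y',
)
window_size = 5

# The last `window_size` observations: everything needed to predict the next step
last_window = y.iloc[-window_size:]


def toy_model(lag_values):
    """Hypothetical stand-in for regressor.predict: average of the lag values."""
    return float(np.mean(lag_values))


def recursive_forecast(last_window, steps):
    """Predict `steps` values, feeding each prediction back as the newest lag."""
    size = len(last_window)
    window = list(last_window.to_numpy())
    predictions = []
    for _ in range(steps):
        # lag_1 is the most recent value, so read the end of the window backwards
        lag_values = window[-size:][::-1]
        pred = toy_model(lag_values)
        predictions.append(pred)
        window.append(pred)  # the prediction becomes lag_1 for the next step
    return predictions


print(last_window.index[0].date(), last_window.index[-1].date())  # 2000-06-01 2000-10-01
print(recursive_forecast(last_window, steps=2))
```

With the values 6 to 10 in the window, the first prediction is 8.0. The second prediction then averages 8.0, 10.0, 9.0, 8.0 and 7.0.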
# Forecaster last window
# ==============================================================================
forecaster.last_window
datetime
2008-02-01    0.761822
2008-03-01    0.649435
2008-04-01    0.827887
2008-05-01    0.816255
2008-06-01    0.762137
Freq: MS, Name: y, dtype: float64
Tip
Learn how to get your forecasters into production and get the most out of them with last_window in Using forecasting models in production.
Window size¶
The size of the data window needed to create the predictors. It is equal to forecaster.max_lag.
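As a worked check, the window size can be tied to the training output above. This assumes no differentiation is applied, which matches "Differentiation order: None". The window is set by the largest lag, even when the lags are not contiguous. The first window_size observations lack a complete set of lags, so the number of training rows is the number of observations minus the window size.

```python
import pandas as pd

# Window size is driven by the largest lag, even for non-contiguous lags
print(max([1, 12]))  # 12

lags = [1, 2, 3, 4, 5]
window_size = max(lags)

# Observations in the training range shown in the forecaster summary
n_obs = len(pd.date_range('1991-07-01', '2008-06-01', freq='MS'))

# The first `window_size` observations cannot be training rows (incomplete lags)
n_train_rows = n_obs - window_size
print(n_obs, window_size, n_train_rows)  # 204 5 199
```

The 199 training rows match the length of the in-sample residuals reported later on this page.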
# Forecaster window size
# ==============================================================================
forecaster.window_size
5
In-sample residuals¶
Residuals of the model when predicting the training data. At most 1000 values are stored. If transformer_y is not None, the residuals are stored in the transformed scale.
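A minimal sketch of how in-sample residuals and the 1000-value cap could be computed is shown below. Residuals are taken as observed minus predicted on the training data. The cap uses random subsampling without replacement, which is an assumption about the rule, not a description of skforecast's code. The helper store_residuals and its seed parameter are hypothetical.

```python
import numpy as np


def store_residuals(y_true, y_pred, max_size=1000, seed=123):
    """Hypothetical helper: compute residuals and keep at most `max_size` of them."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    if len(residuals) > max_size:
        # Assumed rule: random subsample without replacement
        rng = np.random.default_rng(seed)
        residuals = rng.choice(residuals, size=max_size, replace=False)
    return residuals


rng = np.random.default_rng(0)
y_true = rng.normal(size=1500)
y_pred = y_true + rng.normal(scale=0.1, size=1500)

print(len(store_residuals(y_true, y_pred)))              # 1000
print(len(store_residuals(y_true[:199], y_pred[:199])))  # 199
```

With the 199 training rows of this example, the cap is never reached, so all residuals are kept.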
Note
In forecasters that follow a direct strategy, and in independent multi-series forecasting, this attribute is a dict containing the residuals for each regressor/series.
# Forecaster in-sample residuals
# ==============================================================================
print("Length:", len(forecaster.in_sample_residuals))
forecaster.in_sample_residuals[:5]
Length: 199
array([-0.10463712, -0.03318622, -0.00291908, -0.02020715, 0.01054163])
Out-of-sample residuals¶
Residuals of the model when predicting data not used for training. At most 1000 values are stored. If transformer_y is not None, the residuals are assumed to be in the transformed scale. Use the set_out_sample_residuals method to set their values.
As no values have been added yet, the attribute is None.
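To show where out-of-sample residuals come from, the following sketch splits a toy series in time and fits a model only on the first part. It then measures the errors on the hold-out. The one-step linear model built with numpy.polyfit is a hypothetical stand-in for a forecaster. In practice, the predictions would come from the forecaster itself, for example through backtesting.

```python
import numpy as np

# Toy series split in time: the model never sees the hold-out part during fitting
y = np.sin(np.arange(120) / 6.0)
y_train, y_test = y[:100], y[100:]

# Hypothetical one-step model: linear fit of y[t] on y[t-1], training data only
coef = np.polyfit(y_train[:-1], y_train[1:], deg=1)

# One-step-ahead predictions over the hold-out, each using the previous observed value
previous = np.concatenate([[y_train[-1]], y_test[:-1]])
predictions = np.polyval(coef, previous)

# Out-of-sample residuals: observed minus predicted on data not used for training
out_sample_residuals = y_test - predictions
print(out_sample_residuals.shape)  # (20,)
```

An array like this is the kind of input the set_out_sample_residuals method expects. Check the method's docstring for its exact parameters before calling it.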
# Forecaster out-of-sample residuals
# ==============================================================================
forecaster.out_sample_residuals