Recursive multi-step forecasting with exogenous variables¶
All forecasters can include exogenous variables as predictors, as long as their future values are known. These variables must also be provided when predicting.
  Note
When using exogenous variables, their values should be aligned so that y[i] is regressed on exog[i].
  Warning
Including exogenous variables in a forecasting model assumes that all exogenous inputs are known into the future. Do not include exogenous variables as predictors if their future values will not be known at prediction time.
Libraries¶
In [9]:
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
Data¶
In [10]:
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o_exog.csv')
data = pd.read_csv(url, sep=',', header=0, names=['datetime', 'y', 'exog_1', 'exog_2'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()
# Plot
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data.plot(ax=ax);
In [11]:
# Split train-test
# ==============================================================================
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
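The positional split above keeps `y` and the exogenous columns aligned because they live in the same DataFrame. When they come from separate objects, it may help to check the alignment explicitly. The following is a minimal sketch on synthetic data; `check_aligned` is a hypothetical helper, not part of skforecast.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data standing in for the real series (values are made up).
index = pd.date_range("2000-01-01", periods=48, freq="MS")
rng = np.random.default_rng(123)
y = pd.Series(rng.normal(size=48), index=index, name="y")
exog = pd.DataFrame(
    {"exog_1": rng.normal(size=48), "exog_2": rng.normal(size=48)},
    index=index,
)


def check_aligned(y, exog):
    """Hypothetical helper: True if y and exog share exactly the same index."""
    return len(y) == len(exog) and y.index.equals(exog.index)


# Positional split: the last `steps` observations form the test set.
steps = 12
y_train, y_test = y.iloc[:-steps], y.iloc[-steps:]
exog_train, exog_test = exog.iloc[:-steps], exog.iloc[-steps:]

aligned_train = check_aligned(y_train, exog_train)
aligned_test = check_aligned(y_test, exog_test)

# A shifted exog (one row lost after dropna) no longer matches y row by row.
aligned_shifted = check_aligned(y_train, exog_train.shift(1).dropna())

print(aligned_train, aligned_test, aligned_shifted)
```

Running a check like this before fitting can surface an off-by-one shift that would otherwise silently pair each `y[i]` with the wrong exogenous row.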
Train forecaster¶
In [12]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=123),
    lags = 15
)
forecaster.fit(
    y = data_train['y'],
    exog = data_train[['exog_1', 'exog_2']]
)
forecaster
Out[12]:
=================
ForecasterAutoreg
=================
Regressor: RandomForestRegressor(random_state=123)
Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Window size: 15
Included exogenous: True
Type of exogenous variable: <class 'pandas.core.frame.DataFrame'>
Exogenous variables names: ['exog_1', 'exog_2']
Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS
Regressor parameters: {'bootstrap': True, 'ccp_alpha': 0.0, 'criterion': 'squared_error', 'max_depth': None, 'max_features': 'auto', 'max_leaf_nodes': None, 'max_samples': None, 'min_impurity_decrease': 0.0, 'min_samples_leaf': 1, 'min_samples_split': 2, 'min_weight_fraction_leaf': 0.0, 'n_estimators': 100, 'n_jobs': None, 'oob_score': False, 'random_state': 123, 'verbose': 0, 'warm_start': False}
Creation date: 2022-03-11 23:37:17
Last fit date: 2022-03-11 23:37:17
Skforecast version: 0.4.3
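To make the alignment rule concrete, the sketch below builds a training matrix in which each target `y[i]` is paired with its preceding lags and with `exog[i]`. It is a simplified illustration with toy data, not skforecast's actual implementation; `build_train_matrix` is a hypothetical name.

```python
import numpy as np


def build_train_matrix(y, exog, n_lags):
    """Illustrative only: stack the lags of y with the exog row of the same step."""
    y = np.asarray(y, dtype=float)
    exog = np.asarray(exog, dtype=float)
    rows, targets = [], []
    for i in range(n_lags, len(y)):
        lags = y[i - n_lags:i][::-1]                  # lag_1, lag_2, ..., lag_n
        rows.append(np.concatenate([lags, exog[i]]))  # exog[i] is paired with y[i]
        targets.append(y[i])
    return np.array(rows), np.array(targets)


# Toy data: y = 0, 1, ..., 9 and two exogenous columns derived from it.
y = np.arange(10, dtype=float)
exog = np.column_stack([y * 10, y * 100])

X_train, y_target = build_train_matrix(y, exog, n_lags=3)
print(X_train.shape)
print(X_train[0], y_target[0])
```

The first `n_lags` observations cannot be used as targets because they lack a full set of lags, so the matrix has `len(y) - n_lags` rows and `n_lags + n_exog` columns.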
Prediction¶
If the forecaster has been trained with exogenous variables, their values for the forecast horizon must also be provided when predicting.
In [13]:
# Predict
# ==============================================================================
steps = 36
predictions = forecaster.predict(
    steps = steps,
    exog = data_test[['exog_1', 'exog_2']]
)
# Add datetime index to predictions
predictions = pd.Series(data=predictions, index=data_test.index)
predictions.head(3)
Out[13]:
datetime
2005-07-01    0.908832
2005-08-01    0.953925
2005-09-01    1.100887
Freq: MS, Name: pred, dtype: float64
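The reason future exogenous values are required becomes clearer when the recursive strategy is written out: each step combines the most recent lags (observed or previously predicted) with the exogenous row for that step. The sketch below is a generic illustration under stated assumptions, not skforecast's code; `recursive_forecast` and `toy_model` are hypothetical names, and the toy model stands in for a fitted regressor.

```python
import numpy as np


def recursive_forecast(predict_one, last_window, future_exog):
    """Forecast len(future_exog) steps, feeding each prediction back as a lag.

    predict_one: function mapping a 1-D feature vector to one prediction.
    last_window: most recent observed values, oldest first.
    future_exog: array-like of shape (steps, n_exog) with known future values.
    """
    window = [float(v) for v in last_window]
    n_lags = len(window)
    predictions = []
    for exog_row in np.asarray(future_exog, dtype=float):
        lags = window[-n_lags:][::-1]            # lag_1 first
        features = np.concatenate([lags, exog_row])
        y_hat = float(predict_one(features))
        predictions.append(y_hat)
        window.append(y_hat)                     # the prediction becomes the next lag_1
    return np.array(predictions)


def toy_model(features):
    """Assumed model for illustration: y_t = 0.5 * lag_1 + exog_t."""
    return 0.5 * features[0] + features[-1]


preds = recursive_forecast(
    toy_model,
    last_window=[1.0, 4.0],
    future_exog=[[1.0], [1.0], [1.0]],
)
print(preds)
```

Because the loop consumes one exogenous row per step, the forecast horizon is limited by how far into the future the exogenous values are known.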
In [14]:
# Plot predictions
# ==============================================================================
fig, ax=plt.subplots(figsize=(9, 4))
data_train['y'].plot(ax=ax, label='train')
data_test['y'].plot(ax=ax, label='test')
predictions.plot(ax=ax, label='predictions')
ax.legend();
In [15]:
# Prediction error
# ==============================================================================
error_mse = mean_squared_error(
    y_true = data_test['y'],
    y_pred = predictions
)
print(f"Test error (mse): {error_mse}")
Test error (mse): 0.004022228812838391
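For reference, the mean squared error is the average of the squared differences between observed and predicted values. The sketch below computes it by hand with NumPy on made-up numbers, together with its square root (RMSE), which is expressed in the same units as the series and may be easier to interpret. The helper name `mse` is an illustrative choice.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of the squared differences between observed and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


# Made-up values, only to illustrate the arithmetic.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.5, 2.0, 2.0]

error_mse = mse(y_true, y_pred)    # (0.25 + 0.0 + 1.0) / 3
error_rmse = error_mse ** 0.5      # same units as the series
print(error_mse, error_rmse)
```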
Feature importance¶
In [16]:
forecaster.get_feature_importance()
Out[16]:
| | feature | importance |
|---|---|---|
| 0 | lag_1 | 0.013354 |
| 1 | lag_2 | 0.061120 |
| 2 | lag_3 | 0.009086 |
| 3 | lag_4 | 0.002721 |
| 4 | lag_5 | 0.002478 |
| 5 | lag_6 | 0.003155 |
| 6 | lag_7 | 0.002179 |
| 7 | lag_8 | 0.008154 |
| 8 | lag_9 | 0.010319 |
| 9 | lag_10 | 0.020587 |
| 10 | lag_11 | 0.007036 |
| 11 | lag_12 | 0.773389 |
| 12 | lag_13 | 0.004583 |
| 13 | lag_14 | 0.018127 |
| 14 | lag_15 | 0.008732 |
| 15 | exog_1 | 0.010364 |
| 16 | exog_2 | 0.044616 |
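Sorting the importances makes the ranking easier to read. The sketch below rebuilds the table above by hand (the values are copied from that output rather than obtained from a fitted forecaster) and sorts it in descending order.

```python
import pandas as pd

# Values copied from the feature importance output above.
importance = pd.DataFrame({
    "feature": [f"lag_{i}" for i in range(1, 16)] + ["exog_1", "exog_2"],
    "importance": [
        0.013354, 0.061120, 0.009086, 0.002721, 0.002478,
        0.003155, 0.002179, 0.008154, 0.010319, 0.020587,
        0.007036, 0.773389, 0.004583, 0.018127, 0.008732,
        0.010364, 0.044616,
    ],
})

ranked = importance.sort_values("importance", ascending=False).reset_index(drop=True)
top_3 = ranked.head(3)
total = ranked["importance"].sum()
print(top_3)
```

In this ranking `lag_12` dominates, which is consistent with a yearly pattern in monthly data, followed by `lag_2` and `exog_2`.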