Forecasting with XGBoost¶
XGBoost, the acronym for Extreme Gradient Boosting, is a very efficient implementation of the stochastic gradient boosting algorithm that has become a benchmark in machine learning. Besides its API, the XGBoost library includes the XGBRegressor class which follows the scikit-learn API and, therefore it is compatible with skforecast.
Libraries¶
In [1]:
Copied!
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from xgboost import XGBRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
Data¶
In [2]:
Copied!
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o_exog.csv')
data = pd.read_csv(url, sep=',', header=0, names=['date', 'y', 'exog_1', 'exog_2'])
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o_exog.csv')
data = pd.read_csv(url, sep=',', header=0, names=['date', 'y', 'exog_1', 'exog_2'])
# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
Create and train forecaster¶
In [3]:
Copied!
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = XGBRegressor(),
lags = 8
)
forecaster.fit(y=data['y'], exog=data[['exog_1', 'exog_2']])
forecaster
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = XGBRegressor(),
lags = 8
)
forecaster.fit(y=data['y'], exog=data[['exog_1', 'exog_2']])
forecaster
Out[3]:
================= ForecasterAutoreg ================= Regressor: XGBRegressor(base_score=0.5, booster='gbtree', callbacks=None, colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, early_stopping_rounds=None, enable_categorical=False, eval_metric=None, gamma=0, gpu_id=-1, grow_policy='depthwise', importance_type=None, interaction_constraints='', learning_rate=0.300000012, max_bin=256, max_cat_to_onehot=4, max_delta_step=0, max_depth=6, max_leaves=0, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=0, num_parallel_tree=1, predictor='auto', random_state=0, reg_alpha=0, reg_lambda=1, ...) Lags: [1 2 3 4 5 6 7 8] Transformer for y: None Transformer for exog: None Window size: 8 Included exogenous: True Type of exogenous variable: <class 'pandas.core.frame.DataFrame'> Exogenous variables names: ['exog_1', 'exog_2'] Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2008-06-01 00:00:00')] Training index type: DatetimeIndex Training index frequency: MS Regressor parameters: {'objective': 'reg:squarederror', 'base_score': 0.5, 'booster': 'gbtree', 'callbacks': None, 'colsample_bylevel': 1, 'colsample_bynode': 1, 'colsample_bytree': 1, 'early_stopping_rounds': None, 'enable_categorical': False, 'eval_metric': None, 'gamma': 0, 'gpu_id': -1, 'grow_policy': 'depthwise', 'importance_type': None, 'interaction_constraints': '', 'learning_rate': 0.300000012, 'max_bin': 256, 'max_cat_to_onehot': 4, 'max_delta_step': 0, 'max_depth': 6, 'max_leaves': 0, 'min_child_weight': 1, 'missing': nan, 'monotone_constraints': '()', 'n_estimators': 100, 'n_jobs': 0, 'num_parallel_tree': 1, 'predictor': 'auto', 'random_state': 0, 'reg_alpha': 0, 'reg_lambda': 1, 'sampling_method': 'uniform', 'scale_pos_weight': 1, 'subsample': 1, 'tree_method': 'exact', 'validate_parameters': 1, 'verbosity': None} Creation date: 2022-09-24 08:33:22 Last fit date: 2022-09-24 08:33:23 Skforecast version: 0.5.0 Python version: 3.9.13
Prediction¶
In [4]:
Copied!
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
Out[4]:
2008-07-01 0.700701 2008-08-01 0.829139 2008-09-01 0.983677 2008-10-01 1.098782 2008-11-01 1.078021 2008-12-01 1.206761 2009-01-01 1.149827 2009-02-01 1.049927 2009-03-01 0.947129 2009-04-01 0.700440 Freq: MS, Name: pred, dtype: float64
Feature importance¶
In [5]:
Copied!
# Predictors importance
# ==============================================================================
forecaster.get_feature_importance()
# Predictors importance
# ==============================================================================
forecaster.get_feature_importance()
Out[5]:
feature | importance | |
---|---|---|
0 | lag_1 | 0.358967 |
1 | lag_2 | 0.093567 |
2 | lag_3 | 0.016729 |
3 | lag_4 | 0.044611 |
4 | lag_5 | 0.054733 |
5 | lag_6 | 0.009510 |
6 | lag_7 | 0.097179 |
7 | lag_8 | 0.027964 |
8 | exog_1 | 0.186294 |
9 | exog_2 | 0.110447 |
In [6]:
Copied!
%%html
<style>
.jupyter-wrapper .jp-CodeCell .jp-Cell-inputWrapper .jp-InputPrompt {display: none;}
</style>
%%html