Forecasting with XGBoost, LightGBM and other Gradient Boosting models¶
Gradient boosting models have gained popularity in the machine learning community due to their ability to achieve excellent results in a wide range of use cases, including both regression and classification. Although these models have traditionally been less common in forecasting, recent research has shown that they can be highly effective in this domain. Some of the key advantages of using gradient boosting models for forecasting include:
- The ease with which exogenous variables, in addition to autoregressive variables, can be incorporated into the model.
- The ability to capture non-linear relationships between variables.
- High scalability, which enables the models to handle large volumes of data.
There are several popular implementations of gradient boosting in Python, with four of the most popular being XGBoost, LightGBM, scikit-learn HistGradientBoostingRegressor and CatBoost. All of these libraries follow the scikit-learn API, making them compatible with skforecast.
✎ Note
All of the gradient boosting libraries mentioned above - XGBoost, LightGBM, HistGradientBoostingRegressor, and CatBoost - can handle categorical features natively, but each requires a specific encoding technique that may not be entirely intuitive. Detailed information can be found in the categorical features user guide and in two related use cases.
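As a brief, self-contained sketch (not part of the original example), the snippet below shows one way a categorical exogenous feature could be passed to a LightGBM-based forecaster: the column is cast to the pandas category dtype and LightGBM's own categorical_feature option is forwarded through fit_kwargs. The series, exogenous DataFrame, and month feature are hypothetical and used only for illustration.

# Sketch: categorical exogenous feature with LightGBM (illustrative, not in the original)
# ==============================================================================
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Hypothetical series and exogenous DataFrame used only for this illustration.
idx  = pd.date_range('2000-01-01', periods=60, freq='MS')
y    = pd.Series(range(60), index=idx, name='y', dtype=float)
exog = pd.DataFrame({'month': idx.month}, index=idx)
exog['month'] = exog['month'].astype('category')  # integer categories, handled natively by LightGBM

forecaster_cat = ForecasterAutoreg(
                     regressor  = LGBMRegressor(random_state=123, verbose=-1),
                     lags       = 8,
                     fit_kwargs = {'categorical_feature': 'auto'}  # let LightGBM detect category dtype
                 )
forecaster_cat.fit(y=y, exog=exog)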
💡 Tip
Tree-based models, including decision trees, random forests, and gradient boosting machines (GBMs), have limitations when it comes to extrapolation, i.e. making predictions or estimates beyond the range of observed data. This limitation becomes particularly critical when forecasting time series data with a trend: because these models cannot predict values beyond the range observed during training, their predictions will deviate from the underlying trend.

Several strategies have been proposed to address this challenge, one of the most common being differentiation. Differentiation involves computing the differences between consecutive observations in the time series: instead of modeling the absolute values directly, the focus shifts to modeling the change from one observation to the next. Skforecast, version 0.10.0 or higher, introduces a differentiation parameter within its forecaster classes to indicate that a differentiation process must be applied before training the model. The entire process is automated and its effects are seamlessly reversed during the prediction phase, ensuring that the resulting forecast values are in the original scale of the time series.
For more details on this topic, visit: Modelling time series trend with tree based models.
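As a brief, self-contained sketch (not part of the original example), the snippet below illustrates how the differentiation parameter could be used with a synthetic trended series: first-order differencing is applied before fitting and automatically reverted when predicting.

# Sketch: first-order differencing via the `differentiation` argument (illustrative)
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

# Hypothetical upward-trending series used only for this illustration.
rng = np.random.default_rng(123)
y_trend = pd.Series(
              np.arange(120, dtype=float) + rng.normal(scale=2, size=120),
              index = pd.date_range('2000-01-01', periods=120, freq='MS'),
              name  = 'y'
          )

forecaster_diff = ForecasterAutoreg(
                      regressor       = LGBMRegressor(random_state=123, verbose=-1),
                      lags            = 12,
                      differentiation = 1  # model changes between consecutive observations
                  )
forecaster_diff.fit(y=y_trend)
predictions = forecaster_diff.predict(steps=12)  # returned in the original scale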
Libraries¶
# Libraries
# ==============================================================================
import pandas as pd
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.datasets import fetch_dataset
Data¶
# Download data
# ==============================================================================
data = fetch_dataset("h2o_exog")
h2o_exog
--------
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Two additional variables (exog_1, exog_2) are simulated.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (195, 3)
# Data preprocessing
# ==============================================================================
data.index.name = 'date'
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
Forecaster LightGBM¶
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor       = LGBMRegressor(random_state=123, verbose=-1),
                 lags            = 8,
                 differentiation = None
             )
forecaster.fit(y=data_train['y'], exog=data_train[['exog_1', 'exog_2']])
forecaster
=================
ForecasterAutoreg
=================
Regressor: LGBMRegressor(random_state=123, verbose=-1)
Lags: [1 2 3 4 5 6 7 8]
Transformer for y: None
Transformer for exog: None
Window size: 8
Weight function included: False
Differentiation order: None
Exogenous included: True
Exogenous variables names: ['exog_1', 'exog_2']
Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS
Regressor parameters: {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 123, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}
fit_kwargs: {}
Creation date: 2024-08-09 11:11:32
Last fit date: 2024-08-09 11:11:32
Skforecast version: 0.13.0
Python version: 3.12.4
Forecaster id: None
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
2005-07-01    0.939158
2005-08-01    0.931943
2005-09-01    1.072937
2005-10-01    1.090429
2005-11-01    1.087492
2005-12-01    1.170073
2006-01-01    0.964073
2006-02-01    0.760841
2006-03-01    0.829831
2006-04-01    0.800095
Freq: MS, Name: pred, dtype: float64
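Not part of the original example, but a quick sanity check is to compare these predictions with the corresponding portion of the test set. The sketch below assumes scikit-learn is available and reuses the forecaster and data defined above.

# Sketch: error of the 10-step predictions against the test set (illustrative)
# ==============================================================================
from sklearn.metrics import mean_absolute_error

predictions = forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
mae = mean_absolute_error(y_true=data_test['y'].iloc[:10], y_pred=predictions)
print(f"Test MAE over the first 10 steps: {mae:.4f}")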
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
|   | feature | importance |
|---|---------|------------|
| 9 | exog_2  | 127 |
| 1 | lag_2   | 91 |
| 0 | lag_1   | 61 |
| 5 | lag_6   | 49 |
| 8 | exog_1  | 43 |
| 3 | lag_4   | 38 |
| 4 | lag_5   | 35 |
| 7 | lag_8   | 26 |
| 6 | lag_7   | 25 |
| 2 | lag_3   | 14 |
Forecaster XGBoost¶
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor       = XGBRegressor(random_state=123, enable_categorical=True),
                 lags            = 8,
                 differentiation = None
             )
forecaster.fit(y=data_train['y'], exog=data_train[['exog_1', 'exog_2']])
forecaster
=================
ForecasterAutoreg
=================
Regressor: XGBRegressor(base_score=None, booster=None, callbacks=None, colsample_bylevel=None, colsample_bynode=None, colsample_bytree=None, device=None, early_stopping_rounds=None, enable_categorical=True, eval_metric=None, feature_types=None, gamma=None, grow_policy=None, importance_type=None, interaction_constraints=None, learning_rate=None, max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None, max_delta_step=None, max_depth=None, max_leaves=None, min_child_weight=None, missing=nan, monotone_constraints=None, multi_strategy=None, n_estimators=None, n_jobs=None, num_parallel_tree=None, random_state=123, ...)
Lags: [1 2 3 4 5 6 7 8]
Transformer for y: None
Transformer for exog: None
Window size: 8
Weight function included: False
Differentiation order: None
Exogenous included: True
Exogenous variables names: ['exog_1', 'exog_2']
Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')]
Training index type: DatetimeIndex
Training index frequency: MS
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': None, 'booster': None, 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None, 'device': None, 'early_stopping_rounds': None, 'enable_categorical': True, 'eval_metric': None, 'feature_types': None, 'gamma': None, 'grow_policy': None, 'importance_type': None, 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None, 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'multi_strategy': None, 'n_estimators': None, 'n_jobs': None, 'num_parallel_tree': None, 'random_state': 123, 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None}
fit_kwargs: {}
Creation date: 2024-08-09 11:11:32
Last fit date: 2024-08-09 11:11:32
Skforecast version: 0.13.0
Python version: 3.12.4
Forecaster id: None
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
2005-07-01    0.879593
2005-08-01    0.988315
2005-09-01    1.069755
2005-10-01    1.107012
2005-11-01    1.114354
2005-12-01    1.212606
2006-01-01    0.980519
2006-02-01    0.682377
2006-03-01    0.743607
2006-04-01    0.738725
Freq: MS, Name: pred, dtype: float64
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
|   | feature | importance |
|---|---------|------------|
| 0 | lag_1   | 0.288422 |
| 9 | exog_2  | 0.266819 |
| 1 | lag_2   | 0.118983 |
| 6 | lag_7   | 0.101393 |
| 4 | lag_5   | 0.082091 |
| 7 | lag_8   | 0.061150 |
| 8 | exog_1  | 0.050711 |
| 3 | lag_4   | 0.015282 |
| 5 | lag_6   | 0.011015 |
| 2 | lag_3   | 0.004135 |
⚠ Warning
Starting with version 0.13.0, the ForecasterAutoregMultiSeries class defaults to using encoding='ordinal_category' for encoding time series identifiers. This approach creates a new column (_level_skforecast) of type pandas category. Consequently, the regressors must be able to handle categorical variables. If the regressors do not support categorical variables, the user should set the encoding to 'ordinal' or 'onehot' for compatibility (a minimal sketch of this option is shown after the list below).
Some examples of regressors that support categorical variables and how to enable them are:

- HistGradientBoostingRegressor: HistGradientBoostingRegressor(categorical_features="from_dtype")
- LightGBM: LGBMRegressor does not allow configuration of categorical features during initialization, but rather in its fit method. Therefore, use fit_kwargs = {'categorical_feature': 'auto'}. This is the default behavior of LGBMRegressor if no indication is given.
- XGBoost: XGBRegressor(enable_categorical=True)
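A minimal sketch of the alternative encoding mentioned above, assuming a hypothetical two-series DataFrame and a regressor (RandomForestRegressor) that cannot handle pandas category columns, might look as follows; it is illustrative only and not part of the original example.

# Sketch: multi-series forecaster with a regressor lacking categorical support (illustrative)
# ==============================================================================
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries

# Hypothetical DataFrame where each column is a different time series.
idx    = pd.date_range('2000-01-01', periods=60, freq='MS')
series = pd.DataFrame({'series_1': range(60), 'series_2': range(60, 120)},
                      index=idx, dtype=float)

forecaster_ms = ForecasterAutoregMultiSeries(
                    regressor = RandomForestRegressor(random_state=123),
                    lags      = 8,
                    encoding  = 'ordinal'  # avoid the category column created by 'ordinal_category'
                )
forecaster_ms.fit(series=series)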