Forecasting with XGBoost, LightGBM and other Gradient Boosting models¶
Gradient boosting models have gained popularity in the machine learning community due to their ability to achieve excellent results in a wide range of use cases, including both regression and classification. Although these models have traditionally been less common in forecasting, recent research has shown that they can be highly effective in this domain. Some of the key advantages of using gradient boosting models for forecasting include:
- The ease with which exogenous variables, in addition to autoregressive variables, can be incorporated into the model.
- The ability to capture non-linear relationships between variables.
- High scalability, which enables the models to handle large volumes of data.
There are several popular implementations of gradient boosting in Python, four of the most widely used being XGBoost, LightGBM, scikit-learn's HistGradientBoostingRegressor and CatBoost. All of these libraries follow the scikit-learn API, which makes them compatible with skforecast.
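As a minimal sketch of this interchangeability (the lag value and random seed are illustrative, not taken from the example below), any of these regressors can be passed directly to a skforecast forecaster:
# Any scikit-learn-compatible gradient boosting regressor can be used as the
# underlying regressor of a skforecast forecaster. HistGradientBoostingRegressor
# is shown here; XGBoost and LightGBM are used in the sections below.
# ==============================================================================
from sklearn.ensemble import HistGradientBoostingRegressor
from skforecast.recursive import ForecasterRecursive

forecaster = ForecasterRecursive(
                 regressor = HistGradientBoostingRegressor(random_state=123),
                 lags      = 8
             )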
✎ Note
All of the gradient boosting libraries mentioned above - XGBoost, LightGBM, HistGradientBoostingRegressor, and CatBoost - can handle categorical features natively, but they require specific encoding techniques that may not be entirely intuitive. Detailed information can be found in the categorical features user guide and in the related use cases.
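As a hedged sketch of one such encoding pattern with LightGBM (the exogenous column month is hypothetical and not part of the dataset used in this document), categorical columns can be ordinal-encoded through transformer_exog and then declared in fit_kwargs so that LightGBM handles them natively:
# Hedged sketch: ordinal-encode a hypothetical categorical exogenous column
# ('month') and tell LightGBM to treat it as categorical during fitting.
# ==============================================================================
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer
from skforecast.recursive import ForecasterRecursive

transformer_exog = make_column_transformer(
                       (
                           OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1),
                           ['month']
                       ),
                       remainder = 'passthrough',
                       verbose_feature_names_out = False
                   ).set_output(transform='pandas')

forecaster = ForecasterRecursive(
                 regressor        = LGBMRegressor(random_state=123, verbose=-1),
                 lags             = 8,
                 transformer_exog = transformer_exog,
                 fit_kwargs       = {'categorical_feature': ['month']}
             )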
💡 Tip
Tree-based models, including decision trees, random forests and gradient boosting machines (GBMs), have limitations when it comes to extrapolation, i.e. making predictions beyond the range of the observed data. This limitation becomes particularly critical when forecasting time series with a trend: because these models cannot predict values outside the range seen during training, their forecasts will deviate from the underlying trend.
Several strategies have been proposed to address this challenge, one of the most common being differentiation. Differentiation involves computing the differences between consecutive observations in the time series; instead of modeling the absolute values directly, the focus shifts to modeling the change from one observation to the next. Skforecast provides the differentiation parameter in its forecaster classes to indicate that differencing must be applied before training the model. The entire process is automated and its effects are seamlessly reversed during the prediction phase, so the resulting forecasts are on the original scale of the time series.
For more details on this topic, visit: Modelling time series trend with tree based models.
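As an illustrative sketch of how this is specified (the regressor and lag choices are arbitrary), first-order differencing is requested with differentiation=1:
# Illustrative sketch: first-order differencing is applied internally before
# training and automatically reversed at prediction time, so forecasts are
# returned on the original scale of the series.
# ==============================================================================
from lightgbm import LGBMRegressor
from skforecast.recursive import ForecasterRecursive

forecaster = ForecasterRecursive(
                 regressor       = LGBMRegressor(random_state=123, verbose=-1),
                 lags            = 8,
                 differentiation = 1
             )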
Libraries and data¶
# Libraries
# ==============================================================================
import pandas as pd
from xgboost import XGBRegressor
from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.preprocessing import RollingFeatures
from skforecast.recursive import ForecasterRecursive
# Download data
# ==============================================================================
data = fetch_dataset("h2o_exog")
h2o_exog
--------
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Two additional variables (exog_1, exog_2) are simulated.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (195, 3)
# Data preprocessing
# ==============================================================================
data.index.name = 'date'
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
Forecaster LightGBM¶
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(random_state=123, verbose=-1),
lags = 8,
window_features = RollingFeatures(stats=['mean'], window_sizes=[7]),
differentiation = None
)
forecaster.fit(y=data_train['y'], exog=data_train[['exog_1', 'exog_2']])
forecaster
ForecasterRecursive
General Information
- Regressor: LGBMRegressor
- Lags: [1 2 3 4 5 6 7 8]
- Window features: ['roll_mean_7']
- Window size: 8
- Exogenous included: True
- Weight function included: False
- Differentiation order: None
- Creation date: 2024-11-10 16:52:29
- Last fit date: 2024-11-10 16:52:29
- Skforecast version: 0.14.0
- Python version: 3.11.10
- Forecaster id: None
Exogenous Variables
- exog_1, exog_2
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Regressor Parameters
- {'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 123, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}
Fit Kwargs
- {}
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
2005-07-01    0.941524
2005-08-01    0.927756
2005-09-01    1.065236
2005-10-01    1.088430
2005-11-01    1.083675
2005-12-01    1.168836
2006-01-01    0.967025
2006-02-01    0.751336
2006-03-01    0.816116
2006-04-01    0.801357
Freq: MS, Name: pred, dtype: float64
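As an illustrative follow-up (the error metric is our own choice, not part of the original example), the forecaster can also predict the full 36-step test horizon and be scored against the test set:
# Prediction error on the test set (illustrative; metric choice is ours)
# ==============================================================================
from sklearn.metrics import mean_absolute_error

predictions = forecaster.predict(steps=steps, exog=data_test[['exog_1', 'exog_2']])
mae = mean_absolute_error(y_true=data_test['y'], y_pred=predictions)
print(f"Test MAE: {mae:.4f}")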
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
|    | feature     | importance |
|----|-------------|------------|
| 10 | exog_2      | 121 |
| 1  | lag_2       | 94 |
| 0  | lag_1       | 59 |
| 9  | exog_1      | 43 |
| 5  | lag_6       | 42 |
| 3  | lag_4       | 41 |
| 4  | lag_5       | 37 |
| 7  | lag_8       | 25 |
| 6  | lag_7       | 22 |
| 2  | lag_3       | 14 |
| 8  | roll_mean_7 | 11 |
Forecaster XGBoost¶
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = XGBRegressor(random_state=123, enable_categorical=True),
lags = 8,
window_features = RollingFeatures(stats=['mean'], window_sizes=[7]),
differentiation = None
)
forecaster.fit(y=data_train['y'], exog=data_train[['exog_1', 'exog_2']])
forecaster
ForecasterRecursive
General Information
- Regressor: XGBRegressor
- Lags: [1 2 3 4 5 6 7 8]
- Window features: ['roll_mean_7']
- Window size: 8
- Exogenous included: True
- Weight function included: False
- Differentiation order: None
- Creation date: 2024-11-10 16:52:29
- Last fit date: 2024-11-10 16:52:30
- Skforecast version: 0.14.0
- Python version: 3.11.10
- Forecaster id: None
Exogenous Variables
- exog_1, exog_2
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1992-04-01 00:00:00'), Timestamp('2005-06-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Regressor Parameters
- {'objective': 'reg:squarederror', 'base_score': None, 'booster': None, 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None, 'device': None, 'early_stopping_rounds': None, 'enable_categorical': True, 'eval_metric': None, 'feature_types': None, 'gamma': None, 'grow_policy': None, 'importance_type': None, 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None, 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'multi_strategy': None, 'n_estimators': None, 'n_jobs': None, 'num_parallel_tree': None, 'random_state': 123, 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None}
Fit Kwargs
- {}
# Predict
# ==============================================================================
forecaster.predict(steps=10, exog=data_test[['exog_1', 'exog_2']])
2005-07-01    0.881743
2005-08-01    0.985714
2005-09-01    1.070262
2005-10-01    1.099444
2005-11-01    1.116030
2005-12-01    1.206317
2006-01-01    0.977022
2006-02-01    0.679524
2006-03-01    0.740902
2006-04-01    0.742273
Freq: MS, Name: pred, dtype: float64
# Feature importances
# ==============================================================================
forecaster.get_feature_importances()
|    | feature     | importance |
|----|-------------|------------|
| 0  | lag_1       | 0.299734 |
| 10 | exog_2      | 0.255852 |
| 1  | lag_2       | 0.105777 |
| 6  | lag_7       | 0.100981 |
| 4  | lag_5       | 0.083807 |
| 7  | lag_8       | 0.063430 |
| 9  | exog_1      | 0.055471 |
| 3  | lag_4       | 0.017550 |
| 5  | lag_6       | 0.010896 |
| 2  | lag_3       | 0.004139 |
| 8  | roll_mean_7 | 0.002362 |