Avoid Negative Predictions in Forecasting
When training a forecasting model, even if none of the training observations are negative, some predictions may turn out to be negative. This is quite common when trying to predict attendance, sales, or rainfall, among other things. To avoid this, it is possible to use the following approaches:
Use a log(x + K) transformation so that the transformed time series is always well defined. K is a positive constant (often 1) chosen so that x + K > 0, since $\log(x)$ is undefined at 0. Despite the simplicity of this approach, one must be aware that some caveats may arise depending on the metric used to optimize the models (a minimal sketch of the transform follows this list).
Use a different link function in the models. The link function provides the relationship between the linear predictor and the expected value of the response variable. Several machine learning algorithms support objective functions that account for this situation. For example, when predicting the number of visitors, one can use the count:poisson objective in XGBoost (or poisson in LightGBM), as well as gamma regression if the target is always strictly positive.
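As a minimal sketch of the first approach (assuming K = 1, which matches the NumPy log1p/expm1 pair used later in this tutorial):

# Sketch: log(x + K) transform and its inverse, assuming K = 1
# ==============================================================================
import numpy as np

y = np.array([0.0, 1.0, 10.0, 100.0])
K = 1
y_log = np.log(y + K)        # defined everywhere because y + K > 0
y_back = np.exp(y_log) - K   # the inverse transform recovers the original values
assert np.allclose(y, y_back)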
The following tutorial will explore both possibilities.
Libraries and data
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import warnings
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
from sklearn.linear_model import Ridge
from sklearn.linear_model import GammaRegressor
from sklearn.preprocessing import FunctionTransformer
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster
# Downloading data
# ==============================================================================
data = fetch_dataset("bike_sharing")
data = data[['users']].iloc[:1000].copy()
data
bike_sharing
------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17544, 11)
| date_time | users |
|---|---|
| 2011-01-01 00:00:00 | 16.0 |
| 2011-01-01 01:00:00 | 40.0 |
| 2011-01-01 02:00:00 | 32.0 |
| 2011-01-01 03:00:00 | 13.0 |
| 2011-01-01 04:00:00 | 1.0 |
| ... | ... |
| 2011-02-11 11:00:00 | 64.0 |
| 2011-02-11 12:00:00 | 71.0 |
| 2011-02-11 13:00:00 | 110.0 |
| 2011-02-11 14:00:00 | 84.0 |
| 2011-02-11 15:00:00 | 74.0 |

1000 rows × 1 columns
# Split train-test
# ==============================================================================
end_train = '2011-01-31 23:59:00'
data_train = data.loc[: end_train, :]
data_test = data.loc[end_train:, :]
print(f"Dates train: {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
Dates train: 2011-01-01 00:00:00 --- 2011-01-31 23:00:00 (n=744)
Dates test : 2011-02-01 00:00:00 --- 2011-02-11 15:00:00 (n=256)
# Plot time series
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_train['users'].plot(ax=ax, label='train')
data_test['users'].plot(ax=ax, label='test')
ax.set_title('Number of users')
ax.set_xlabel('')
ax.legend();
Forecaster
# Create a forecaster and train it
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = Ridge(random_state=123),
lags = 24
)
# Backtesting predictions on test data
# ==============================================================================
cv = TimeSeriesFold(
steps = 24,
initial_train_size = len(data_train),
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
cv = cv,
metric = 'mean_squared_error',
verbose = False,
show_progress = True
)
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test['users'].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.set_xlabel('')
ax.legend();
The graph above shows that some predictions are negative.
# Negative predictions
# ==============================================================================
predictions[predictions.pred < 0]
| | pred |
|---|---|
| 2011-02-01 00:00:00 | -23.337236 |
| 2011-02-01 01:00:00 | -17.691405 |
| 2011-02-02 00:00:00 | -5.243456 |
| 2011-02-02 01:00:00 | -19.363139 |
| 2011-02-03 01:00:00 | -6.441943 |
| 2011-02-04 01:00:00 | -10.579940 |
| 2011-02-05 00:00:00 | -6.026119 |
| 2011-02-05 01:00:00 | -21.396841 |
| 2011-02-07 04:00:00 | -3.412043 |
| 2011-02-07 05:00:00 | -3.701964 |
| 2011-02-08 00:00:00 | -17.045913 |
| 2011-02-08 01:00:00 | -13.233004 |
| 2011-02-09 00:00:00 | -25.315228 |
| 2011-02-09 01:00:00 | -24.743686 |
| 2011-02-10 01:00:00 | -5.704407 |
| 2011-02-11 01:00:00 | -11.758940 |
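A quick post-hoc workaround, not used in the rest of this tutorial, is to clip the predictions at zero. This removes impossible values but does not address the underlying model:

# Post-hoc workaround (illustrative only): clip negative predictions to zero
# ==============================================================================
predictions_clipped = predictions['pred'].clip(lower=0)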
Modeling time series in logarithmic scale
# Transform data into a logarithmic scale
# =============================================================================
data_log = np.log1p(data)
data_train_log = np.log1p(data_train)
data_test_log = np.log1p(data_test)
# Create a forecaster and train it
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = Ridge(random_state=123),
lags = 24,
)
# Backtesting predictions on test data
# ==============================================================================
# Since the data has been transformed outside the forecaster, the predictions
# and metric are not in the original scale. They need to be transformed back.
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data_log['users'],
cv = cv,
metric = 'mean_squared_error',
verbose = False,
show_progress = True
)
# Revert the transformation
# ==============================================================================
predictions = np.expm1(predictions)
predictions.head(4)
| | pred |
|---|---|
| 2011-02-01 00:00:00 | 7.936823 |
| 2011-02-01 01:00:00 | 4.210682 |
| 2011-02-01 02:00:00 | 3.009993 |
| 2011-02-01 03:00:00 | 3.083922 |
# Backtesting metric (test data)
# ==============================================================================
# The error metric is calculated once the transformation is reversed.
metric = mean_squared_error(y_true=data_test['users'], y_pred=predictions)
print(f"Backtesting metric (test data): {metric}")
Backtesting metric (test data): 1991.9332571760026
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test['users'].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.legend();
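One caveat of modeling in log scale, hinted at in the introduction: if the model predicts the mean in log scale, the back-transformed value tends to underestimate the mean in the original scale (Jensen's inequality). A minimal numeric illustration with toy values, not from this dataset:

# Caveat sketch (toy values): the inverse transform of the log-scale mean
# is smaller than the original-scale mean (Jensen's inequality)
# ==============================================================================
y = np.array([1.0, 10.0, 100.0])
print(np.expm1(np.log1p(y).mean()))  # ~12.05
print(y.mean())                      # 37.0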
Include a logarithmic transformer as part of the forecaster
Using scikit-learn's FunctionTransformer, it is possible to include custom transformations in the forecaster object, for example, a logarithmic transformation. If the FunctionTransformer has an inverse function, the output of the predict method is automatically transformed back to the original scale.
# Create a custom transformer
# =============================================================================
transformer_y = FunctionTransformer(func=np.log1p, inverse_func=np.expm1, validate=True)
# Create a forecaster and train it
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = Ridge(random_state=123),
lags = 24,
transformer_y = transformer_y,
transformer_exog = None
)
forecaster.fit(data['users'])
# Backtesting predictions on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
cv = cv,
metric = 'mean_squared_error',
verbose = False,
show_progress = True
)
# Since the transformation is included in the forecaster, predictions are
# automatically transformed back into the original scale. And the metric is
# calculated in the original scale.
metric
/home/ubuntu/anaconda3/envs/skforecast_14_py12/lib/python3.12/site-packages/sklearn/base.py:493: UserWarning: X does not have valid feature names, but FunctionTransformer was fitted with feature names
  warnings.warn(
| | mean_squared_error |
|---|---|
| 0 | 1991.933257 |
⚠ Warning

FunctionTransformer raises a UserWarning when setting validate=True. To hide this warning, use warnings.filterwarnings("ignore", message="X does not have valid feature names") before fitting or backtesting the forecaster.
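For example, the filter suggested above can be applied right after the imports:

# Optionally silence the FunctionTransformer feature-names warning
# ==============================================================================
warnings.filterwarnings("ignore", message="X does not have valid feature names")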
# Backtest predictions
# ==============================================================================
predictions.head()
| | pred |
|---|---|
| 2011-02-01 00:00:00 | 7.936823 |
| 2011-02-01 01:00:00 | 4.210682 |
| 2011-02-01 02:00:00 | 3.009993 |
| 2011-02-01 03:00:00 | 3.083922 |
| 2011-02-01 04:00:00 | 3.865169 |
The same results are obtained as when the data are log-transformed outside the forecaster. However, by including the transformation in the forecaster, all inverse transformations are handled automatically.
Usage of link functions
Next, a forecaster with a linear model that uses a non-identity link function, scikit-learn's GammaRegressor, is evaluated. By default, GammaRegressor uses a log link, so the predicted mean is the exponential of the linear predictor and is therefore always strictly positive.
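As a minimal sketch with toy data (the values below are illustrative, not from this dataset), the log link maps any value of the linear predictor to a strictly positive prediction:

# Sketch (toy data): the log link keeps predictions strictly positive
# ==============================================================================
X_toy = np.arange(20).reshape(-1, 1)
y_toy = 50 * np.exp(-0.2 * X_toy.ravel()) + 1  # strictly positive target
model = GammaRegressor().fit(X_toy, y_toy)
print(model.predict([[100]]))  # far extrapolation approaches 0 but never crosses it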
# Create a forecaster and train it
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = GammaRegressor(alpha=1, max_iter=100000),
lags = 20,
)
# Backtesting predictions on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
cv = cv,
metric = 'mean_squared_error',
verbose = False,
show_progress = True
)
metric
| | mean_squared_error |
|---|---|
| 0 | 3509.708608 |
# Backtest predictions
# ==============================================================================
predictions.head()
| | pred |
|---|---|
| 2011-02-01 00:00:00 | 6.685792 |
| 2011-02-01 01:00:00 | 10.238365 |
| 2011-02-01 02:00:00 | 11.950816 |
| 2011-02-01 03:00:00 | 9.194242 |
| 2011-02-01 04:00:00 | 6.372484 |
In this case the gamma regressor performed worse (higher mean squared error), but it produces no negative predictions:
# Negative predictions
# ======================================================================================
predictions[predictions.pred < 0]
| | pred |
|---|---|

(no rows: there are no negative predictions)
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test['users'].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.legend();
Usage of XGBoost objective functions
Now an XGBoost model is trained with the objective function set to reg:gamma, so that the output is the mean of a gamma distribution, which is defined for positive values only.
# Create forecaster and train
# ==============================================================================
params = {
'tree_method': 'hist',
'objective': 'reg:gamma'
}
forecaster = ForecasterRecursive(
regressor = XGBRegressor(**params),
lags = 24,
)
# Backtesting predictions on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
cv = cv,
metric = 'mean_squared_error',
verbose = False,
show_progress = True
)
metric
| | mean_squared_error |
|---|---|
| 0 | 1915.458228 |
# Negative predictions
# ==============================================================================
predictions[predictions.pred < 0]

| | pred |
|---|---|

(no rows: there are no negative predictions)
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test['users'].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.legend();
The forecaster's performance is better than using a Ridge regressor with a log-transformer, and no negative predictions are made.
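As noted in the introduction, the count:poisson objective is another option for count targets such as the number of users. A hypothetical configuration, not run in this tutorial, would be:

# Hypothetical alternative: Poisson objective for count data (not run here)
# ==============================================================================
forecaster = ForecasterRecursive(
                 regressor = XGBRegressor(tree_method='hist', objective='count:poisson'),
                 lags      = 24,
             )

LightGBM offers analogous poisson and gamma objectives through its objective parameter.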