Classification forecasting for categorical time series¶

Recursive multi-step forecasting is a technique used to predict future values in a time series by using previous predictions as inputs for subsequent forecasts.

In the context of categorical time series, where the values belong to discrete categories rather than continuous values, the model is trained to predict the probability distribution over the possible categories for each future time step. The most likely category for the current time step is then used as input for predicting the next time step, and this process continues recursively.

Although the process is conceptually very similar to that used for forecasting continuous time series, many challenges arise due the need to encode and decode categorical variables, as well as allowing the underlying machine learning model to handle categorical data effectively.

Skforecast provides the class ForecasterRecursiveClassifier to facilitate recursive multi-step forecasting for categorical time series, automatically managing the encoding and decoding of categorical variables, and integrating seamlessly with various machine learning models that support classification tasks.

Libraries and data¶

In [ ]:

Copied!





# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap, BoundaryNorm

from lightgbm import LGBMClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.calibration import CalibratedClassifierCV, CalibrationDisplay, calibration_curve
from skforecast.datasets import fetch_dataset
from skforecast.preprocessing import RollingFeaturesClassification
from skforecast.recursive import ForecasterRecursiveClassifier
from skforecast.model_selection import (
    TimeSeriesFold, OneStepAheadFold, backtesting_forecaster, bayesian_search_forecaster
)
from skforecast.plot import set_dark_theme
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
from matplotlib.colors import ListedColormap, BoundaryNorm

from lightgbm import LGBMClassifier
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.calibration import CalibratedClassifierCV, CalibrationDisplay, calibration_curve
from skforecast.datasets import fetch_dataset
from skforecast.preprocessing import RollingFeaturesClassification
from skforecast.recursive import ForecasterRecursiveClassifier
from skforecast.model_selection import (
    TimeSeriesFold, OneStepAheadFold, backtesting_forecaster, bayesian_search_forecaster
)
from skforecast.plot import set_dark_theme

In [2]:

Copied!





# Data download
# ==============================================================================
data = fetch_dataset(name='vic_electricity_classification')
data
# Data download
# ==============================================================================
data = fetch_dataset(name='vic_electricity_classification')
data

╭──────────────────────── vic_electricity_classification ────────────────────────╮
│ Description:                                                                   │
│ Hourly electricity demand for Victoria, Australia classified into three        │
│ categories: 'low', 'medium' and 'high' according to the 20th and 80th          │
│ percentiles. The dataset also includes temperature, holiday indicator and hour │
│ of the day as features.                                                        │
│                                                                                │
│ Source:                                                                        │
│ O'Hara-Wild M, Hyndman R, Wang E, Godahewa R (2022).tsibbledata: Diverse       │
│ Datasets for 'tsibble'. https://tsibbledata.tidyverts.org/,                    │
│ https://github.com/tidyverts/tsibbledata/.                                     │
│ https://tsibbledata.tidyverts.org/reference/vic_elec.html                      │
│                                                                                │
│ URL:                                                                           │
│ https://raw.githubusercontent.com/skforecast/skforecast-                       │
│ datasets/main/data/vic_electricity_classification.csv                          │
│                                                                                │
│ Shape: 8736 rows x 5 columns                                                   │
╰────────────────────────────────────────────────────────────────────────────────╯

Out[2]:

	Demand	Temperature	Holiday	Hour_of_day	Demand_raw
Time
2014-01-01 01:00:00	low	24.80	1.0	1	3730.065980
2014-01-01 02:00:00	medium	25.90	1.0	2	3858.473927
2014-01-01 03:00:00	medium	25.30	1.0	3	3851.415438
2014-01-01 04:00:00	medium	23.65	1.0	4	3838.972926
2014-01-01 05:00:00	medium	20.70	1.0	5	3860.540970
...	...	...	...	...	...
2014-12-30 20:00:00	low	12.10	0.0	20	3527.232855
2014-12-30 21:00:00	medium	12.30	0.0	21	3846.439766
2014-12-30 22:00:00	medium	14.05	0.0	22	3961.529675
2014-12-30 23:00:00	medium	16.60	0.0	23	4059.288011
2014-12-31 00:00:00	medium	17.70	0.0	0	4071.800790

8736 rows × 5 columns

In [3]:

Copied!





# Distribution of classes in the series
# ==============================================================================
print("Distribution of classes:")
data['Demand'].value_counts(normalize=True).round(2)
# Distribution of classes in the series
# ==============================================================================
print("Distribution of classes:")
data['Demand'].value_counts(normalize=True).round(2)

Distribution of classes:

Out[3]:

Demand
medium    0.6
low       0.2
high      0.2
Name: proportion, dtype: float64

In [4]:

Copied!





# Split train-test
# ==============================================================================
end_train = '2014-10-31 23:59:00'
data_train = data[:end_train].copy()
data_test  = data[end_train:].copy()

print(
    f"Train dates : {data_train.index.min()} --- "
    f"{data_train.index.max()}  (n={len(data_train)})"
)
print(
    f"Test dates  : {data_test.index.min()} --- "
    f"{data_test.index.max()}  (n={len(data_test)})"
)
# Split train-test
# ==============================================================================
end_train = '2014-10-31 23:59:00'
data_train = data[:end_train].copy()
data_test  = data[end_train:].copy()

print(
    f"Train dates : {data_train.index.min()} --- "
    f"{data_train.index.max()}  (n={len(data_train)})"
)
print(
    f"Test dates  : {data_test.index.min()} --- "
    f"{data_test.index.max()}  (n={len(data_test)})"
)

Train dates : 2014-01-01 01:00:00 --- 2014-10-31 23:00:00  (n=7295)
Test dates  : 2014-11-01 00:00:00 --- 2014-12-31 00:00:00  (n=1441)

Plotting categorical time series¶

Visualizing categorical time series can be challenging due to the discrete nature of the data. Most of the standard plotting libraries require numerical inputs to create plots which means that categorical data needs to be encoded into numerical format for visualization purposes.

In [5]:

Copied!





# Data preprocessing
# ==============================================================================
# Encode categorical variable as numerical for plotting
encode_mapping = {'low': 0, 'medium': 1, 'high': 2}
print(f"Encoding map: {encode_mapping}")
data['encoded_value'] = data['Demand'].map(encode_mapping).astype(float)
data_train['encoded_value'] = data_train['Demand'].map(encode_mapping).astype(float)
data_test['encoded_value'] = data_test['Demand'].map(encode_mapping).astype(float)
data.head(3)
# Data preprocessing
# ==============================================================================
# Encode categorical variable as numerical for plotting
encode_mapping = {'low': 0, 'medium': 1, 'high': 2}
print(f"Encoding map: {encode_mapping}")
data['encoded_value'] = data['Demand'].map(encode_mapping).astype(float)
data_train['encoded_value'] = data_train['Demand'].map(encode_mapping).astype(float)
data_test['encoded_value'] = data_test['Demand'].map(encode_mapping).astype(float)
data.head(3)

Encoding map: {'low': 0, 'medium': 1, 'high': 2}

Out[5]:

	Demand	Temperature	Holiday	Hour_of_day	Demand_raw	encoded_value
Time
2014-01-01 01:00:00	low	24.8	1.0	1	3730.065980	0.0
2014-01-01 02:00:00	medium	25.9	1.0	2	3858.473927	1.0
2014-01-01 03:00:00	medium	25.3	1.0	3	3851.415438	1.0

In [6]:

Copied!





# Plot: line plot
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(8, 3))
data_train['encoded_value'].plot(ax=ax, label='train')
data_test['encoded_value'].plot(ax=ax, label='test')
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.legend();
# Plot: line plot
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(8, 3))
data_train['encoded_value'].plot(ax=ax, label='train')
data_test['encoded_value'].plot(ax=ax, label='test')
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.legend();

No description has been provided for this image

In [7]:

Copied!





# Plot: 1-row heatmap-style scatter
# ==================================================================================================
colors = ["skyblue", "gold", "tomato"]
cmap = ListedColormap(colors)
norm = BoundaryNorm([0, 1, 2, 3], cmap.N)
stripe_height = 20
state_array = np.tile(data["encoded_value"].values, (stripe_height, 1))
train_mask = data.index <= pd.to_datetime(end_train)
train_indices = np.where(train_mask)[0]
test_indices = np.where(~train_mask)[0]

fig, ax = plt.subplots(figsize=(8, 2.5))
ax.imshow(
    state_array,
    aspect="auto",
    cmap=cmap,
    norm=norm,
    extent=[0, len(data) - 1, 0, stripe_height],
)

ax.hlines(y=30, xmin=train_indices[0], xmax=train_indices[-1], color="#30a2da", linewidth=4)
ax.hlines(y=30, xmin=test_indices[0], xmax=test_indices[-1], color="#fc4f30", linewidth=4)
ax.set_yticks([])
ax.set_frame_on(False)
tick_indices = np.linspace(0, len(data) - 1, 10, dtype=int)
ax.set_xticks(tick_indices)
ax.set_xticklabels([data.index[i].strftime("%Y-%m") for i in tick_indices], rotation=0, ha="center")
ax.set_xlabel("Date")
ax.set_title("Categorical Daily States Over Time", pad=8)

cbar = fig.colorbar(
    plt.cm.ScalarMappable(cmap=cmap, norm=norm),
    ax=ax,
    orientation="horizontal",
    ticks=[0.5, 1.5, 2.5],
    pad=0.3,       # space between plot and colorbar
    fraction=0.05,  # thickness
    aspect=20       # length-to-height ratio
)
cbar.outline.set_visible(False)
cbar.ax.set_xticklabels(["low", "medium", "high"], fontsize=8)
cbar.set_label("", labelpad=2, fontsize=9)

legend_elements = [
    Patch(facecolor="#30a2da", label="Train"),
    Patch(facecolor="#fc4f30", label="Test"),
]
ax.legend(handles=legend_elements, loc="upper right", frameon=False, ncol=2, borderaxespad=0.5);
# Plot: 1-row heatmap-style scatter
# ==================================================================================================
colors = ["skyblue", "gold", "tomato"]
cmap = ListedColormap(colors)
norm = BoundaryNorm([0, 1, 2, 3], cmap.N)
stripe_height = 20
state_array = np.tile(data["encoded_value"].values, (stripe_height, 1))
train_mask = data.index <= pd.to_datetime(end_train)
train_indices = np.where(train_mask)[0]
test_indices = np.where(~train_mask)[0]

fig, ax = plt.subplots(figsize=(8, 2.5))
ax.imshow(
    state_array,
    aspect="auto",
    cmap=cmap,
    norm=norm,
    extent=[0, len(data) - 1, 0, stripe_height],
)

ax.hlines(y=30, xmin=train_indices[0], xmax=train_indices[-1], color="#30a2da", linewidth=4)
ax.hlines(y=30, xmin=test_indices[0], xmax=test_indices[-1], color="#fc4f30", linewidth=4)
ax.set_yticks([])
ax.set_frame_on(False)
tick_indices = np.linspace(0, len(data) - 1, 10, dtype=int)
ax.set_xticks(tick_indices)
ax.set_xticklabels([data.index[i].strftime("%Y-%m") for i in tick_indices], rotation=0, ha="center")
ax.set_xlabel("Date")
ax.set_title("Categorical Daily States Over Time", pad=8)

cbar = fig.colorbar(
    plt.cm.ScalarMappable(cmap=cmap, norm=norm),
    ax=ax,
    orientation="horizontal",
    ticks=[0.5, 1.5, 2.5],
    pad=0.3,       # space between plot and colorbar
    fraction=0.05,  # thickness
    aspect=20       # length-to-height ratio
)
cbar.outline.set_visible(False)
cbar.ax.set_xticklabels(["low", "medium", "high"], fontsize=8)
cbar.set_label("", labelpad=2, fontsize=9)

legend_elements = [
    Patch(facecolor="#30a2da", label="Train"),
    Patch(facecolor="#fc4f30", label="Test"),
]
ax.legend(handles=legend_elements, loc="upper right", frameon=False, ncol=2, borderaxespad=0.5);

In [ ]:

Copied!





# Plot: states timeline
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
color_map = {"low": "skyblue", "medium": "gold", "high": "tomato"}
for state, color in color_map.items():
    mask = data["Demand"] == state
    ax.plot(
        data.index[mask], [state] * mask.sum(), '|', color=color, markersize=8, label=state
    )
ax.set_yticks(y_ticks)
ax.set_title("Categorical States Timeline")
ax.set_xlabel("Date")
ax.legend()
plt.tight_layout()
plt.show()
# Plot: states timeline
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
color_map = {"low": "skyblue", "medium": "gold", "high": "tomato"}
for state, color in color_map.items():
    mask = data["Demand"] == state
    ax.plot(
        data.index[mask], [state] * mask.sum(), '|', color=color, markersize=8, label=state
    )
ax.set_yticks(y_ticks)
ax.set_title("Categorical States Timeline")
ax.set_xlabel("Date")
ax.legend()
plt.tight_layout()
plt.show()

Create and train forecaster¶

When working with categorical time series, a common challenge arises: not only is the target variable categorical, but the lagged values used as predictors are also categorical, as they represent previous observations of the target variable.

Most classification algorithms can handle categorical target variables; however, many cannot directly process categorical predictors. In such cases, it is necessary to encode the categorical features into a numerical format prior to training the model.

The features_encoding parameter in the ForecasterRecursiveClassifier class specifies how the categorical features derived from the target series (e.g., lagged values and window features) are encoded before fitting the underlying machine learning model. The parameter accepts the following options:

auto: Use a categorical data type if the classifier supports native categorical features (such as LightGBM, CatBoost, XGBoost, or HistGradientBoostingClassifier). Otherwise, fallback to numerical encoding.
categorical: Enforce the use of a categorical data type. This requires a classifier capable of handling native categorical features.
ordinal: Apply ordinal encoding (0, 1, 2, …). In this case, the classifier interprets the encoded values as numeric, implicitly assuming an ordinal relationship among the categories (for example, 'low' < 'medium' < 'high').

Proper encoding of categorical features is crucial to ensure that the model can accurately interpret and leverage the information contained in previous observations of the target series.

💡 Tip

The ForecasterRecursiveClassifier class allows the inclusion of window features through the RollingFeaturesClassification parameter. These features can capture patterns over a specified number of previous time steps, potentially enhancing the model's predictive capabilities.

The available statistics are:

proportion: The relative frequency or ratio of each class within the window.
mode: The most frequent class (value) observed in the window.
entropy: A measure of the diversity or randomness of the classes.
n_changes: The number of times the classes changes within the window.
n_unique: The count of unique classes present in the window.

To learn more about window features visit the Window and custom features user guide.

In [9]:

Copied!





# Create and fit forecaster with rolling window features
# ==============================================================================
window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=72)
forecaster = ForecasterRecursiveClassifier(
                 estimator         = LGBMClassifier(random_state=123, verbose=-1),
                 lags              = 24,
                 window_features   = window_features,
                 features_encoding = 'auto', 
             )

forecaster.fit(
    y    = data_train['Demand'], 
    exog = data_train[['Temperature', 'Holiday', 'Hour_of_day']]
)
forecaster
# Create and fit forecaster with rolling window features
# ==============================================================================
window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=72)
forecaster = ForecasterRecursiveClassifier(
                 estimator         = LGBMClassifier(random_state=123, verbose=-1),
                 lags              = 24,
                 window_features   = window_features,
                 features_encoding = 'auto', 
             )

forecaster.fit(
    y    = data_train['Demand'], 
    exog = data_train[['Temperature', 'Holiday', 'Hour_of_day']]
)
forecaster

Out[9]:

ForecasterRecursiveClassifier

General Information

Estimator: LGBMClassifier
Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Window features: ['roll_proportion_72']
Window size: 72
Series name: Demand
Exogenous included: True
Weight function included: False
Creation date: 2025-11-28 19:39:52
Last fit date: 2025-11-28 19:39:55
Skforecast version: 0.19.0
Python version: 3.12.11
Forecaster id: None

Classification Information

Classes: ['high', 'low', 'medium']
Class encoding: {'high': 0, 'low': 1, 'medium': 2}

Exogenous Variables

Exogenous names: Temperature, Holiday, Hour_of_day
Transformer for exog: None

Training Information

Training range: [Timestamp('2014-01-01 01:00:00'), Timestamp('2014-10-31 23:00:00')]
Training index type: DatetimeIndex
Training index frequency:

Estimator Parameters

{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 123, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}

Fit Kwargs

{}

🛈 API Reference 🗎 User Guide

Prediction¶

In [10]:

Copied!





# Predict
# ==============================================================================
predictions = forecaster.predict(
                  steps = 24, 
                  exog  = data_test[['Temperature', 'Holiday', 'Hour_of_day']]
              )
predictions
# Predict
# ==============================================================================
predictions = forecaster.predict(
                  steps = 24, 
                  exog  = data_test[['Temperature', 'Holiday', 'Hour_of_day']]
              )
predictions

Out[10]:

2014-11-01 00:00:00    medium
2014-11-01 01:00:00    medium
2014-11-01 02:00:00    medium
2014-11-01 03:00:00    medium
2014-11-01 04:00:00    medium
2014-11-01 05:00:00    medium
2014-11-01 06:00:00    medium
2014-11-01 07:00:00    medium
2014-11-01 08:00:00    medium
2014-11-01 09:00:00      high
2014-11-01 10:00:00      high
2014-11-01 11:00:00    medium
2014-11-01 12:00:00    medium
2014-11-01 13:00:00    medium
2014-11-01 14:00:00    medium
2014-11-01 15:00:00    medium
2014-11-01 16:00:00    medium
2014-11-01 17:00:00       low
2014-11-01 18:00:00       low
2014-11-01 19:00:00       low
2014-11-01 20:00:00       low
2014-11-01 21:00:00    medium
2014-11-01 22:00:00    medium
2014-11-01 23:00:00    medium
Freq: h, Name: pred, dtype: object

In [11]:

Copied!





# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test.loc[predictions.index, 'encoded_value'].plot(ax=ax, label='test')
(
    predictions
    .to_frame()
    .assign(encoded_value=predictions.map(encode_mapping))['encoded_value']
    .plot(ax=ax, label='predictions')
)
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.set_ylabel("State")
ax.legend();
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data_test.loc[predictions.index, 'encoded_value'].plot(ax=ax, label='test')
(
    predictions
    .to_frame()
    .assign(encoded_value=predictions.map(encode_mapping))['encoded_value']
    .plot(ax=ax, label='predictions')
)
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.set_ylabel("State")
ax.legend();

In [12]:

Copied!





# Prediction metrics
# ==============================================================================
error_mse = accuracy_score(
                y_true = data_test.loc[predictions.index, 'Demand'],
                y_pred = predictions
            )
            
print(f"Test error (accuracy): {error_mse}")
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions
    )
)
# Prediction metrics
# ==============================================================================
error_mse = accuracy_score(
                y_true = data_test.loc[predictions.index, 'Demand'],
                y_pred = predictions
            )
            
print(f"Test error (accuracy): {error_mse}")
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions
    )
)

Test error (accuracy): 0.7083333333333334
Classification Report 
---------------------
              precision    recall  f1-score   support

        high       0.00      0.00      0.00         0
         low       1.00      0.44      0.62         9
      medium       0.72      0.87      0.79        15

    accuracy                           0.71        24
   macro avg       0.57      0.44      0.47        24
weighted avg       0.83      0.71      0.72        24

c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\sklearn\metrics\_classification.py:1565: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\sklearn\metrics\_classification.py:1565: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))
c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\sklearn\metrics\_classification.py:1565: UndefinedMetricWarning: Recall is ill-defined and being set to 0.0 in labels with no true samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))

Predict probabilities¶

ForecasterRecursiveClassifier includes the method predict_proba which allows to obtain the predicted probabilities for each category at each forecasted time step. Internally, this method calls the predict_proba method of the underlying classifier used inside the forecaster.

⚠️ Warning

It is important to note that these probabilities come from the classifier’s predict_proba method. Their calibration and reliability depend on the specific model used. For example, in scikit-learn, predict_proba returns scores that approximate probabilities, but these are not always well-calibrated or truly probabilistic. See section on Probability Calibration for more details.

In [13]:

Copied!





# Predict probabilities
# ==============================================================================
predictions = forecaster.predict_proba(
                  steps = 24, 
                  exog  = data_test[['Temperature', 'Holiday', 'Hour_of_day']]
              )
predictions
# Predict probabilities
# ==============================================================================
predictions = forecaster.predict_proba(
                  steps = 24, 
                  exog  = data_test[['Temperature', 'Holiday', 'Hour_of_day']]
              )
predictions

Out[13]:

	high_proba	low_proba	medium_proba
2014-11-01 00:00:00	0.004573	0.002774	0.992653
2014-11-01 01:00:00	0.004353	0.007518	0.988129
2014-11-01 02:00:00	0.002101	0.003061	0.994838
2014-11-01 03:00:00	0.001198	0.003052	0.995751
2014-11-01 04:00:00	0.005935	0.001566	0.992499
2014-11-01 05:00:00	0.001352	0.000036	0.998612
2014-11-01 06:00:00	0.003241	0.000080	0.996678
2014-11-01 07:00:00	0.073888	0.000195	0.925917
2014-11-01 08:00:00	0.451197	0.000588	0.548215
2014-11-01 09:00:00	0.540362	0.001000	0.458639
2014-11-01 10:00:00	0.770847	0.000463	0.228690
2014-11-01 11:00:00	0.079652	0.000732	0.919616
2014-11-01 12:00:00	0.000957	0.001410	0.997633
2014-11-01 13:00:00	0.000998	0.000735	0.998268
2014-11-01 14:00:00	0.000491	0.000104	0.999404
2014-11-01 15:00:00	0.000503	0.001229	0.998267
2014-11-01 16:00:00	0.001892	0.052383	0.945725
2014-11-01 17:00:00	0.000693	0.940728	0.058580
2014-11-01 18:00:00	0.000101	0.989826	0.010072
2014-11-01 19:00:00	0.000205	0.968296	0.031499
2014-11-01 20:00:00	0.000332	0.588383	0.411285
2014-11-01 21:00:00	0.000354	0.276246	0.723400
2014-11-01 22:00:00	0.110881	0.004136	0.884984
2014-11-01 23:00:00	0.097221	0.001579	0.901201

Feature importances¶

In [14]:

Copied!

# Native feature importance of the estimator used in the forecaster
# ==============================================================================
forecaster.get_feature_importances()
# Native feature importance of the estimator used in the forecaster
# ==============================================================================
forecaster.get_feature_importances()

Out[14]:

	feature	importance
27	Temperature	2030
29	Hour_of_day	1168
24	roll_proportion_72_class_0	747
25	roll_proportion_72_class_1	701
26	roll_proportion_72_class_2	648
0	lag_1	318
23	lag_24	273
22	lag_23	235
4	lag_5	189
2	lag_3	168
6	lag_7	167
5	lag_6	167
9	lag_10	164
1	lag_2	163
13	lag_14	154
21	lag_22	145
3	lag_4	136
8	lag_9	134
10	lag_11	133
12	lag_13	127
14	lag_15	124
16	lag_17	123
7	lag_8	119
15	lag_16	118
20	lag_21	115
19	lag_20	112
18	lag_19	109
17	lag_18	101
11	lag_12	76
28	Holiday	36

Backtesting¶

The backtesting process for classification forecasters follows the same principles as that of regression forecasters. The key difference lies in the evaluation metrics used to assess model performance. Instead of using metrics like Mean Squared Error (MSE) or Mean Absolute Error (MAE), classification forecasters utilize metrics such as accuracy, precision, ..., to evaluate how well the model predicts categorical outcomes.

The returned predictions will include the most likely category for each forecasted time step in the column pred as well as the predicted probabilities for each category in separate columns.

In [15]:

Copied!





# Backtesting forecaster
# ==============================================================================
window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=72)
forecaster = ForecasterRecursiveClassifier(
                 estimator         = HistGradientBoostingClassifier(random_state=123),
                 lags              = 24,
                 window_features   = window_features,
                 features_encoding = 'auto',
             )

cv = TimeSeriesFold(steps = 24, initial_train_size = end_train)

metric, predictions = backtesting_forecaster(
                          forecaster    = forecaster,
                          y             = data['Demand'],
                          exog          = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv            = cv,
                          metric        = 'accuracy_score',
                          n_jobs        = 'auto',
                          verbose       = False,
                          show_progress = True
                      )
# Backtesting forecaster
# ==============================================================================
window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=72)
forecaster = ForecasterRecursiveClassifier(
                 estimator         = HistGradientBoostingClassifier(random_state=123),
                 lags              = 24,
                 window_features   = window_features,
                 features_encoding = 'auto',
             )

cv = TimeSeriesFold(steps = 24, initial_train_size = end_train)

metric, predictions = backtesting_forecaster(
                          forecaster    = forecaster,
                          y             = data['Demand'],
                          exog          = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv            = cv,
                          metric        = 'accuracy_score',
                          n_jobs        = 'auto',
                          verbose       = False,
                          show_progress = True
                      )

  0%|          | 0/61 [00:00<?, ?it/s]

In [16]:

Copied!

display(metric)
display(predictions)
display(metric)
display(predictions)

	accuracy_score
0	0.843858

	fold	pred	high_proba	low_proba	medium_proba
2014-11-01 00:00:00	0	medium	0.001643	0.001311	0.997046
2014-11-01 01:00:00	0	medium	0.002286	0.004299	0.993415
2014-11-01 02:00:00	0	medium	0.000829	0.001231	0.997940
2014-11-01 03:00:00	0	medium	0.000601	0.002429	0.996970
2014-11-01 04:00:00	0	medium	0.001399	0.000247	0.998354
...	...	...	...	...	...
2014-12-30 20:00:00	59	low	0.000042	0.746859	0.253099
2014-12-30 21:00:00	59	low	0.000015	0.990844	0.009141
2014-12-30 22:00:00	59	medium	0.000097	0.460920	0.538982
2014-12-30 23:00:00	59	medium	0.000414	0.009459	0.990127
2014-12-31 00:00:00	60	medium	0.001948	0.002009	0.996042

1441 rows × 5 columns

In [17]:

Copied!





# Summary of classification metrics on test set
# ==============================================================================
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions['pred']
    )
)
# Summary of classification metrics on test set
# ==============================================================================
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions['pred']
    )
)

Classification Report 
---------------------
              precision    recall  f1-score   support

        high       0.50      0.36      0.42       107
         low       0.84      0.83      0.84       364
      medium       0.87      0.90      0.89       970

    accuracy                           0.84      1441
   macro avg       0.74      0.70      0.71      1441
weighted avg       0.84      0.84      0.84      1441

In [18]:

Copied!





# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data.loc[predictions.index, 'encoded_value'].plot(ax=ax, label='test')
predictions.assign(encoded_value=predictions['pred'].map(encode_mapping))['encoded_value'].plot(ax=ax, label='predictions')
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.set_ylabel("State")
ax.legend();
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(7, 3))
data.loc[predictions.index, 'encoded_value'].plot(ax=ax, label='test')
predictions.assign(encoded_value=predictions['pred'].map(encode_mapping))['encoded_value'].plot(ax=ax, label='predictions')
y_ticks = list(encode_mapping.values())
y_labels = list(encode_mapping.keys())
ax.set_yticks(y_ticks)
ax.set_yticklabels(y_labels)
ax.set_title("Categorical Time Series (States)")
ax.set_xlabel("Date")
ax.set_ylabel("State")
ax.legend();

Custom metrics during backtesting¶

The backtesting_forecaster allows the use of custom metrics to evaluate forecaster performance.

This is particularly useful for classification tasks, as it enables metrics to be tailored to the specific problem type (e.g., binary, multi-class, or multi-label). Common examples include f1_score, roc_auc_score, and precision_score.

In [19]:

Copied!





# Custom f1-score for multi-class classification
# ==============================================================================
from sklearn.metrics import f1_score

def custom_f1_score(y_true, y_pred):
    """
    Custom f1-score metric, it must have `y_true` and `y_pred` as parameters.
    """
    return f1_score(y_true, y_pred, average='weighted')
# Custom f1-score for multi-class classification
# ==============================================================================
from sklearn.metrics import f1_score

def custom_f1_score(y_true, y_pred):
    """
    Custom f1-score metric, it must have `y_true` and `y_pred` as parameters.
    """
    return f1_score(y_true, y_pred, average='weighted')

Hyperparameter tuning¶

Hyperparameter tuning for ForecasterRecursiveClassifier can be performed using the functions available in the skforecast.model_selection module, such as grid_search_forecaster, random_search_forecaster and bayesian_search_forecaster. These functions allow for systematic exploration of hyperparameter combinations to identify the best configuration for the forecaster based on specified evaluation metrics suitable for classification tasks.

When performing hyperparameter tuning, it is important to use a train-validation-test split approach. This involves dividing the dataset into three distinct subsets:

Training set: Used to fit the model.
Validation set: Used to tune the hyperparameters.
Test set: Used to evaluate the model's performance.

This approach helps to prevent overfitting and ensures that the model generalizes well to unseen data.

In [20]:

Copied!





# Train-validation-test split
# ==============================================================================
end_train = '2014-08-31 23:59:00'
end_val = '2014-10-31 23:59:00'
print(
    f"Train dates      : {data.index.min()} --- {data.loc[:end_train].index.max()}"
    f"  (n={len(data.loc[:end_train])})"
)
print(
    f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()}"
    f"  (n={len(data.loc[end_train:end_val])})"
)
print(
    f"Test dates       : {data.loc[end_val:].index.min()} --- {data.index.max()}"
    f"  (n={len(data.loc[end_val:])})"
)
print("")
# Train-validation-test split
# ==============================================================================
end_train = '2014-08-31 23:59:00'
end_val = '2014-10-31 23:59:00'
print(
    f"Train dates      : {data.index.min()} --- {data.loc[:end_train].index.max()}"
    f"  (n={len(data.loc[:end_train])})"
)
print(
    f"Validation dates : {data.loc[end_train:].index.min()} --- {data.loc[:end_val].index.max()}"
    f"  (n={len(data.loc[end_train:end_val])})"
)
print(
    f"Test dates       : {data.loc[end_val:].index.min()} --- {data.index.max()}"
    f"  (n={len(data.loc[end_val:])})"
)
print("")

Train dates      : 2014-01-01 01:00:00 --- 2014-08-31 23:00:00  (n=5831)
Validation dates : 2014-09-01 00:00:00 --- 2014-10-31 23:00:00  (n=1464)
Test dates       : 2014-11-01 00:00:00 --- 2014-12-31 00:00:00  (n=1441)

In [21]:

Copied!





# Bayesian search hyperparameters and lags with Optuna
# ==============================================================================
estimator = HistGradientBoostingClassifier(
                early_stopping= True,
                random_state=123,
                verbose=0
            )
forecaster = ForecasterRecursiveClassifier(
                 estimator         = estimator,
                 lags              = 14,  # Placeholder, the value will be overwritten
                 window_features   = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
                 features_encoding = 'auto',
             )

# Lags used as predictors
lags_grid = [24, (1, 2, 3, 23, 24, 25, 47, 48, 49)]

# Search space
def search_space(trial):
    search_space  = {
        'lags': trial.suggest_categorical('lags', lags_grid),
        'max_iter': trial.suggest_int('max_iter', 100, 500, step=100),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5),
        'l2_regularization': trial.suggest_float('l2_regularization', 0.0, 1.0)
    } 
    return search_space

# Folds
cv_search = OneStepAheadFold(initial_train_size = end_train)

results, best_trial = bayesian_search_forecaster(
                          forecaster          = forecaster,
                          y                   = data.loc[:end_val, 'Demand'],
                          exog                = data.loc[:end_val, ['Temperature', 'Holiday', 'Hour_of_day']],
                          search_space        = search_space,
                          cv                  = cv_search,
                          metric              = 'accuracy_score',
                          kwargs_create_study = {'direction': 'maximize'},
                          n_trials            = 20,
                          random_state        = 123,
                          return_best         = True,
                          n_jobs              = 'auto',
                          verbose             = False,
                          show_progress       = True,
                      )
results.head(4)
# Bayesian search hyperparameters and lags with Optuna
# ==============================================================================
estimator = HistGradientBoostingClassifier(
                early_stopping= True,
                random_state=123,
                verbose=0
            )
forecaster = ForecasterRecursiveClassifier(
                 estimator         = estimator,
                 lags              = 14,  # Placeholder, the value will be overwritten
                 window_features   = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
                 features_encoding = 'auto',
             )

# Lags used as predictors
lags_grid = [24, (1, 2, 3, 23, 24, 25, 47, 48, 49)]

# Search space
def search_space(trial):
    search_space  = {
        'lags': trial.suggest_categorical('lags', lags_grid),
        'max_iter': trial.suggest_int('max_iter', 100, 500, step=100),
        'max_depth': trial.suggest_int('max_depth', 3, 10),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.5),
        'l2_regularization': trial.suggest_float('l2_regularization', 0.0, 1.0)
    } 
    return search_space

# Folds
cv_search = OneStepAheadFold(initial_train_size = end_train)

results, best_trial = bayesian_search_forecaster(
                          forecaster          = forecaster,
                          y                   = data.loc[:end_val, 'Demand'],
                          exog                = data.loc[:end_val, ['Temperature', 'Holiday', 'Hour_of_day']],
                          search_space        = search_space,
                          cv                  = cv_search,
                          metric              = 'accuracy_score',
                          kwargs_create_study = {'direction': 'maximize'},
                          n_trials            = 20,
                          random_state        = 123,
                          return_best         = True,
                          n_jobs              = 'auto',
                          verbose             = False,
                          show_progress       = True,
                      )
results.head(4)

╭─────────────────────────── OneStepAheadValidationWarning ────────────────────────────╮
│ One-step-ahead predictions are used for faster model comparison, but they may not    │
│ fully represent multi-step prediction performance. It is recommended to backtest the │
│ final model for a more accurate multi-step performance estimate.                     │
│                                                                                      │
│ Category : skforecast.exceptions.OneStepAheadValidationWarning                       │
│ Location :                                                                           │
│ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\model_s │
│ election\_utils.py:607                                                               │
│ Suppress : warnings.simplefilter('ignore', category=OneStepAheadValidationWarning)   │
╰──────────────────────────────────────────────────────────────────────────────────────╯

  0%|          | 0/20 [00:00<?, ?it/s]

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
  Parameters: {'max_iter': 200, 'max_depth': 8, 'learning_rate': 0.06995250096123734, 'l2_regularization': 0.3790013234407905}
  One-step-ahead metric: 0.9050546448087432

Out[21]:

	lags	params	accuracy_score	max_iter	max_depth	learning_rate	l2_regularization
0	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_iter': 200, 'max_depth': 8, 'learning_ra...	0.905055	200.0	8.0	0.069953	0.379001
1	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_iter': 200, 'max_depth': 8, 'learning_ra...	0.903689	200.0	8.0	0.099421	0.175452
2	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_iter': 400, 'max_depth': 7, 'learning_ra...	0.902322	400.0	7.0	0.059226	0.412988
3	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'max_iter': 200, 'max_depth': 9, 'learning_ra...	0.901639	200.0	9.0	0.096356	0.215461

In [22]:

Copied!





# Best model
# ==============================================================================
best_params = results.loc[0, 'params']
best_lags   = results.loc[0, 'lags']

forecaster = ForecasterRecursiveClassifier(
    estimator       = HistGradientBoostingClassifier(random_state=123, **best_params),
    lags            = best_lags,
    window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
    features_encoding = 'auto',
)
# Best model
# ==============================================================================
best_params = results.loc[0, 'params']
best_lags   = results.loc[0, 'lags']

forecaster = ForecasterRecursiveClassifier(
    estimator       = HistGradientBoostingClassifier(random_state=123, **best_params),
    lags            = best_lags,
    window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
    features_encoding = 'auto',
)

In [23]:

Copied!





# Backtest final model on test data
# ==============================================================================
cv = TimeSeriesFold(steps = 24, initial_train_size = end_val)
metric, predictions = backtesting_forecaster(
                          forecaster = forecaster,
                          y          = data['Demand'],
                          exog       = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv         = cv,
                          metric     = 'accuracy_score'
                      )
display(metric)
predictions
# Backtest final model on test data
# ==============================================================================
cv = TimeSeriesFold(steps = 24, initial_train_size = end_val)
metric, predictions = backtesting_forecaster(
                          forecaster = forecaster,
                          y          = data['Demand'],
                          exog       = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv         = cv,
                          metric     = 'accuracy_score'
                      )
display(metric)
predictions

  0%|          | 0/61 [00:00<?, ?it/s]

	accuracy_score
0	0.840389

Out[23]:

	fold	pred	high_proba	low_proba	medium_proba
2014-11-01 00:00:00	0	medium	0.001048	0.001395	0.997558
2014-11-01 01:00:00	0	medium	0.000473	0.002579	0.996947
2014-11-01 02:00:00	0	medium	0.000147	0.001187	0.998666
2014-11-01 03:00:00	0	medium	0.000296	0.003188	0.996516
2014-11-01 04:00:00	0	medium	0.000237	0.000265	0.999497
...	...	...	...	...	...
2014-12-30 20:00:00	59	low	0.000195	0.697114	0.302691
2014-12-30 21:00:00	59	low	0.000020	0.994279	0.005701
2014-12-30 22:00:00	59	medium	0.000138	0.375040	0.624822
2014-12-30 23:00:00	59	medium	0.000092	0.005908	0.994000
2014-12-31 00:00:00	60	medium	0.000738	0.002541	0.996721

1441 rows × 5 columns

In [24]:

Copied!





# Summary of classification metrics on test set
# ==============================================================================
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions['pred']
    )
)
# Summary of classification metrics on test set
# ==============================================================================
print("Classification Report \n---------------------")
print(
    classification_report(
        data.loc[predictions.index, 'Demand'],
        predictions['pred']
    )
)

Classification Report 
---------------------
              precision    recall  f1-score   support

        high       0.46      0.44      0.45       107
         low       0.86      0.81      0.84       364
      medium       0.87      0.89      0.88       970

    accuracy                           0.84      1441
   macro avg       0.73      0.72      0.72      1441
weighted avg       0.84      0.84      0.84      1441

Probability calibration¶

Well calibrated classifiers are probabilistic classifiers for which the output of the predict_proba method can be directly interpreted as a confidence level. For instance, a well calibrated (binary) classifier should classify the samples such that among the samples to which it gave a predict_proba value close to, say, 0.8, approximately 80% actually belong to the positive class.

Before we show how to re-calibrate a classifier, we first need a way to detect how good a classifier is calibrated.

Calibration curves¶

The following definition of calibration curves is taken from scikit-learn documentation:

"Calibration curves, also referred to as reliability diagrams, compare how well the probabilistic predictions of a binary classifier are calibrated. It plots the frequency of the positive label (to be more precise, an estimation of the conditional event probability) on the y-axis against the predicted probability predict_proba of a model on the x-axis. The tricky part is to get values for the y-axis. In scikit-learn, this is accomplished by binning the predictions such that the x-axis represents the average predicted probability in each bin. The y-axis is then the fraction of positives given the predictions of that bin, i.e. the proportion of samples whose class is the positive class (in each bin)."

Since calibration curves are defined for binary classifiers, in the case of multi-class classification problems, one-vs-rest calibration curves are used. This means that for each class, the samples belonging to that class are considered as positive samples, while all other samples are treated as negative samples.

In [25]:

Copied!





# Calibration plots: one-vs-rest
# ==============================================================================
for level, code in {'low': 0, 'medium': 1, 'high': 2}.items():

    fig, ax = plt.subplots(figsize=(6, 2.5))

    y_prob = predictions.loc[:, f'{level}_proba'].rename(code)
    y_true = (data.loc[predictions.index, 'encoded_value'] == code).astype(int)
    curve_prob_true, curve_prob_pred = calibration_curve(
        y_true=y_true,
        y_prob=y_prob,
        n_bins=10
    )
    
    disp = CalibrationDisplay(
        prob_true=curve_prob_true, prob_pred=curve_prob_pred, y_prob=y_prob
    )
    disp.plot(name=level, ax=ax, color='C' + str(code))

    ax.set_title(f"{level}")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.get_lines()[0].set_color('white')
# Calibration plots: one-vs-rest
# ==============================================================================
for level, code in {'low': 0, 'medium': 1, 'high': 2}.items():

    fig, ax = plt.subplots(figsize=(6, 2.5))

    y_prob = predictions.loc[:, f'{level}_proba'].rename(code)
    y_true = (data.loc[predictions.index, 'encoded_value'] == code).astype(int)
    curve_prob_true, curve_prob_pred = calibration_curve(
        y_true=y_true,
        y_prob=y_prob,
        n_bins=10
    )
    
    disp = CalibrationDisplay(
        prob_true=curve_prob_true, prob_pred=curve_prob_pred, y_prob=y_prob
    )
    disp.plot(name=level, ax=ax, color='C' + str(code))

    ax.set_title(f"{level}")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.get_lines()[0].set_color('white')

The calibrations curves show that the classifier is not well calibrated, as the curves deviate significantly from the diagonal line representing perfect calibration. For example, for the class 'low', when the classifier predicts a probability of 0.8, the actual frequency of the positive class is only around 0.5, indicating overconfidence in its predictions for that class.

Calibration¶

The calibration of a classification model refers to the process of adjusting the predicted probabilities so that they align with the actual observed outcomes. In other words, calibration corrects the model when it tends to systematically overestimate or underestimate probabilities.

A model is said to be perfectly calibrated if, for any predicted probability $p \in [0, 1]$, the proportion of correctly classified observations among those predicted with probability $p$ is exactly $p$. Formally:

$$ P(\hat{Y} = Y \mid \hat{P} = p) = p, \quad \forall p \in [0, 1] $$

where:

$\hat{Y}$ is the predicted class,
$Y$ is the true class, and
$\hat{P}$ is the predicted probability of the positive (or target) class.

For example, if we look at all cases where the model predicts a probability of $\hat{P} = 0.8$, we would expect that about 80% of those cases truly belong to the positive class.

When a model is perfectly calibrated, plotting the observed frequencies against the predicted probabilities produces a diagonal line (the identity function) across the range $[0, 1]$:

$$ \text{Observed frequency} = \text{Predicted probability} $$

The CalibratedClassifierCV class in sklearn is used to calibrate a classifier. Since skforecast is compatible with scikit-learn, CalibratedClassifierCV can be used directly into ForecasterRecursiveClassifier.

In [26]:

Copied!





# Forecaster with calibrated classifier
# ==============================================================================
estimator = CalibratedClassifierCV(
                estimator = HistGradientBoostingClassifier(
                                **best_params,
                                random_state=123
                            ),
                method   = 'sigmoid',
                ensemble = False,  # False to speed up prediction
                cv       = 3,
            )

forecaster = ForecasterRecursiveClassifier(
    estimator       = estimator,
    lags            = best_lags,
    window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
    features_encoding= 'auto',
)


cv = TimeSeriesFold(steps = 24, initial_train_size = end_val)
metric, predictions = backtesting_forecaster(
                          forecaster = forecaster,
                          y          = data['Demand'],
                          exog       = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv         = cv,
                          metric     = 'accuracy_score'
                      )
display(metric)
predictions
# Forecaster with calibrated classifier
# ==============================================================================
estimator = CalibratedClassifierCV(
                estimator = HistGradientBoostingClassifier(
                                **best_params,
                                random_state=123
                            ),
                method   = 'sigmoid',
                ensemble = False,  # False to speed up prediction
                cv       = 3,
            )

forecaster = ForecasterRecursiveClassifier(
    estimator       = estimator,
    lags            = best_lags,
    window_features = RollingFeaturesClassification(stats=['proportion'], window_sizes=[72]),
    features_encoding= 'auto',
)


cv = TimeSeriesFold(steps = 24, initial_train_size = end_val)
metric, predictions = backtesting_forecaster(
                          forecaster = forecaster,
                          y          = data['Demand'],
                          exog       = data[['Temperature', 'Holiday', 'Hour_of_day']],
                          cv         = cv,
                          metric     = 'accuracy_score'
                      )
display(metric)
predictions

  0%|          | 0/61 [00:00<?, ?it/s]

	accuracy_score
0	0.843164

Out[26]:

	fold	pred	high_proba	low_proba	medium_proba
2014-11-01 00:00:00	0	medium	0.022222	0.022406	0.955372
2014-11-01 01:00:00	0	medium	0.020423	0.065693	0.913884
2014-11-01 02:00:00	0	medium	0.009767	0.044904	0.945330
2014-11-01 03:00:00	0	medium	0.018259	0.100651	0.881091
2014-11-01 04:00:00	0	medium	0.014561	0.012693	0.972746
...	...	...	...	...	...
2014-12-30 20:00:00	59	low	0.001329	0.627120	0.371551
2014-12-30 21:00:00	59	low	0.001431	0.913995	0.084574
2014-12-30 22:00:00	59	medium	0.000931	0.426978	0.572091
2014-12-30 23:00:00	59	medium	0.003846	0.098475	0.897678
2014-12-31 00:00:00	60	medium	0.027835	0.060985	0.911179

1441 rows × 5 columns

In [27]:

Copied!





# Calibration plots: one-vs-rest
# ==============================================================================
for level, code in {'low': 0, 'medium': 1, 'high': 2}.items():
    
    fig, ax = plt.subplots(figsize=(6, 2.5))

    y_prob = predictions.loc[:, f'{level}_proba'].rename(code)
    y_true = (data.loc[predictions.index, 'encoded_value'] == code).astype(int)
    prob_true, prob_pred = calibration_curve(
        y_true=y_true,
        y_prob=y_prob,
        n_bins=10
    )

    disp = CalibrationDisplay(prob_true=prob_true, prob_pred=prob_pred, y_prob=y_prob)
    disp.plot(name=level, ax=ax, color='C' + str(code))

    ax.set_title(f"{level}")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.get_lines()[0].set_color('white')
# Calibration plots: one-vs-rest
# ==============================================================================
for level, code in {'low': 0, 'medium': 1, 'high': 2}.items():
    
    fig, ax = plt.subplots(figsize=(6, 2.5))

    y_prob = predictions.loc[:, f'{level}_proba'].rename(code)
    y_true = (data.loc[predictions.index, 'encoded_value'] == code).astype(int)
    prob_true, prob_pred = calibration_curve(
        y_true=y_true,
        y_prob=y_prob,
        n_bins=10
    )

    disp = CalibrationDisplay(prob_true=prob_true, prob_pred=prob_pred, y_prob=y_prob)
    disp.plot(name=level, ax=ax, color='C' + str(code))

    ax.set_title(f"{level}")
    ax.set_xlabel("Mean predicted probability")
    ax.set_ylabel("Fraction of positives")
    ax.get_lines()[0].set_color('white')

After applying calibration techniques, the calibration curves are much closer to the diagonal line, indicating that the predicted probabilities are now more aligned with the actual observed frequencies. For instance, for the class 'low', when the classifier predicts a probability of 0.8, the actual frequency of the positive class is now around 0.75, showing a significant improvement compared to before calibration.