Drift detection¶
In the context of machine learning, data drift refers to a change in the statistical properties of the input data over time compared to the data on which the model was originally trained. This can cause the model’s performance to deteriorate, since it may no longer generalize well to new, unseen data.
There are several types of data drift, including:
- Covariate Drift (Feature Drift): The distribution of the input features changes, but the relationship between features and target remains the same. Example: A model was trained when a feature had values in a certain range. Over time, if that feature shifts to a different range, covariate drift occurs.
- Prior Probability Drift (Label Drift): The distribution of the target variable changes. Example: A model trained to predict energy consumption during a season may fail if seasonal patterns change due to external factors.
- Concept Drift: The relationship between input features and the target variable changes. Example: A model predicting energy consumption from weather data might fail if new technologies or behaviors alter how weather affects energy usage.
Detecting and addressing data drift is crucial for maintaining model performance in production. Common strategies include:
- Monitoring the input data during the prediction phase (a minimal sketch of this idea is shown after this list).
- Monitoring model performance (accuracy, precision, recall, etc.) over time.
- Periodically retraining the model with new data to adapt to changes.
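For example, the first strategy can be implemented with a simple two-sample test. The sketch below compares the distribution of a feature observed in production against its training distribution using scipy's Kolmogorov-Smirnov test; the feature values, the simulated shift, and the 0.05 threshold are arbitrary choices made for this illustration and are not part of skforecast.
# Example: monitoring an input feature with a two-sample KS test (illustrative only)
# ==============================================================================
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(123)
feature_train = rng.normal(loc=10, scale=2, size=1000)  # distribution seen during training
feature_prod = rng.normal(loc=13, scale=2, size=200)    # shifted distribution in production

result = ks_2samp(feature_train, feature_prod)
if result.pvalue < 0.05:
    print(f"Possible covariate drift (KS statistic={result.statistic:.3f}, p-value={result.pvalue:.4f})")
else:
    print("No significant change in the feature distribution detected")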
RangeDriftDetector
Skforecast provides the class RangeDriftDetector to detect covariate drift in both single and multiple time series, as well as in exogenous variables.
The detector checks whether the input data (lags and exogenous variables) used to predict new values fall within the range of the data used to train the model.
Its API follows the same design as the forecasters:
- The data used to train a forecaster can also be used to fit the RangeDriftDetector.
- The data passed for prediction can be used to check for drift.
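A minimal sketch of this parallel workflow with toy data is shown below. The LinearRegression regressor, the toy values, and the choice of 3 lags are arbitrary; calling predict without exog assumes that, as in the forecasters, exogenous variables are optional.
# Parallel between a forecaster and the RangeDriftDetector (sketch with toy data)
# ==============================================================================
import pandas as pd
from sklearn.linear_model import LinearRegression
from skforecast.recursive import ForecasterRecursive
from skforecast.drift_detection import RangeDriftDetector

y_train = pd.Series(
    [10.0, 11.0, 12.0, 11.5, 10.5, 11.2, 12.1, 10.8, 11.7, 12.3],
    index=pd.date_range("2020-01-01", periods=10),
    name="y",
)

forecaster = ForecasterRecursive(regressor=LinearRegression(), lags=3)
detector = RangeDriftDetector()
forecaster.fit(y=y_train)  # train the forecaster
detector.fit(y=y_train)    # fit the detector on the same data

last_window = pd.Series(
    [11.0, 25.0, 12.0],    # 25.0 is outside the training range
    index=pd.date_range("2020-01-08", periods=3),
    name="y",
)
flag, series_oor, exog_oor = detector.predict(last_window=last_window)  # check inputs before predicting
predictions = forecaster.predict(steps=3, last_window=last_window)
print(flag, series_oor, exog_oor)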
✎ Note
This module is under active development; we expect to add more features and improvements in future releases.
Libraries¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import HistGradientBoostingRegressor
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive
from skforecast.drift_detection import RangeDriftDetector
Detecting out-of-range values in a single series¶
The RangeDriftDetector checks whether the values of a time series remain consistent with the data seen during training.
For numeric variables, it verifies that each new value falls within the minimum and maximum range of the training data. Values outside this range are flagged as potential drift.
For categorical variables, it checks whether each new category was observed during training. Unseen categories are flagged as potential drift.
This mechanism allows you to quickly identify when the model is receiving inputs that differ from those it was trained on, helping you decide whether to retrain the model or adjust preprocessing.
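Conceptually, the check applied to each variable can be summarized as in the sketch below. This is only an illustration of the rule, not skforecast's actual implementation:
# Conceptual illustration of the range/membership check (not skforecast's actual code)
# ==============================================================================
import pandas as pd

def is_out_of_range(new_values: pd.Series, train_values: pd.Series) -> bool:
    """Return True if `new_values` contains data outside what was seen in `train_values`."""
    if pd.api.types.is_numeric_dtype(train_values):
        low, high = train_values.min(), train_values.max()
        return bool(((new_values < low) | (new_values > high)).any())
    else:
        seen = set(train_values.unique())
        return bool((~new_values.isin(seen)).any())

train = pd.Series([5.0, 7.2, 9.9])
new = pd.Series([6.1, 15.0])
print(is_out_of_range(new, train))  # True, 15.0 is above the training maximum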
# Simulated data
# ==============================================================================
rng = np.random.default_rng(123)
y_train = pd.Series(
    rng.normal(loc=10, scale=2, size=100),
    index=pd.date_range(start="2020-01-01", periods=100),
    name="y",
)
exog_train = pd.DataFrame(
    {
        "exog_1": rng.normal(loc=10, scale=2, size=100),
        "exog_2": rng.choice(["A", "B", "C", "D", "E"], size=100),
    },
    index=y_train.index,
)
display(y_train.head())
display(exog_train.head())
2020-01-01     8.021757
2020-01-02     9.264427
2020-01-03    12.575851
2020-01-04    10.387949
2020-01-05    11.840462
Freq: D, Name: y, dtype: float64
|  | exog_1 | exog_2 |
|---|---|---|
| 2020-01-01 | 8.968465 | B |
| 2020-01-02 | 13.316227 | B |
| 2020-01-03 | 9.405475 | A |
| 2020-01-04 | 7.233246 | A |
| 2020-01-05 | 9.437591 | A |
# Train RangeDriftDetector
# ==============================================================================
detector = RangeDriftDetector()
detector.fit(y=y_train, exog=exog_train)
detector
RangeDriftDetector
General Information
- Fitted series: y
- Fitted exogenous: exog_1, exog_2
- Series-specific exogenous: False
- Is fitted: True
Series value ranges
- {'y': (5.5850578036003915, 14.579819894629157)}
Exogenous value ranges
- {'exog_1': (4.5430286262543085, 14.531041199734418), 'exog_2': {'D', 'E', 'B', 'C', 'A'}}
Let's assume the model is deployed in production and new data is being used to forecast future values. We simulate covariate drift in the target series and in the exogenous variables to illustrate how the RangeDriftDetector class detects it.
# Prediction with drifted data
# ==============================================================================
last_window = pd.Series(
    [6.6, 7.5, 100, 9.3, 10.2], name="y"
)  # Value 100 is out of range
exog_predict = pd.DataFrame(
    {
        "exog_1": [8, 9, 10, 70, 12],         # Value 70 is out of range
        "exog_2": ["A", "B", "C", "D", "W"],  # Value 'W' is out of range
    }
)
flag_out_of_range, series_out_of_range, exog_out_of_range = detector.predict(
    last_window       = last_window,
    exog              = exog_predict,
    verbose           = True,
    suppress_warnings = False
)
print("Out of range detected :", flag_out_of_range)
print("Series out of range :", series_out_of_range)
print("Exogenous out of range :", exog_out_of_range)
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'y' has values outside the range seen during training [5.58506, 14.57982]. This may │ │ affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'exog_1' has values outside the range seen during training [4.54303, 14.53104]. This │ │ may affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'exog_2' has values not seen during training. Seen values: {'D', 'E', 'B', 'C', │ │ 'A'}. This may affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────── Out-of-range summary ──────────────────────────────╮ │ Series: │ │ 'y' has values outside the observed range [5.58506, 14.57982]. │ │ │ │ Exogenous Variables: │ │ 'exog_1' has values outside the observed range [4.54303, 14.53104]. │ │ 'exog_2' has values not seen during training. Seen values: {'D', 'E', 'B', 'C', │ │ 'A'}. │ ╰─────────────────────────────────────────────────────────────────────────────────╯
Out of range detected : True
Series out of range : ['y']
Exogenous out of range : ['exog_1', 'exog_2']
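As the warning panels indicate, these warnings can be silenced without disabling the check itself, either through the suppress_warnings argument of predict or by filtering the FeatureOutOfRangeWarning category. A short sketch reusing the detector, last_window and exog_predict defined above:
# Silencing FeatureOutOfRangeWarning (the drift check still runs and returns its results)
# ==============================================================================
import warnings
from skforecast.exceptions import FeatureOutOfRangeWarning

# Option 1: suppress warnings for a single call
flag_out_of_range, series_out_of_range, exog_out_of_range = detector.predict(
    last_window       = last_window,
    exog              = exog_predict,
    suppress_warnings = True
)

# Option 2: filter the warning category globally, as suggested in the warning message
warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning)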
Detecting out-of-range values in multiple series¶
The same process applies when modeling multiple time series.
- For each series, the RangeDriftDetector checks whether the new values remain within the range of the training data.
- If exogenous variables are included, they are checked grouped by series, ensuring that drift is detected in the correct context.
This allows you to monitor drift at the per-series level, making it easier to spot issues in specific series without being misled by aggregated results.
# Simulated data - Multiple time series
# ==============================================================================
idx = pd.MultiIndex.from_product(
    [
        ["series_1", "series_2", "series_3"],
        pd.date_range(start="2020-01-01", periods=3),
    ],
    names=["series_id", "datetime"],
)
series_train = pd.DataFrame(
    {"values": [1, 2, 3, 10, 20, 30, 100, 200, 300]}, index=idx
)
exog_train = pd.DataFrame(
    {
        "exog_1": [5.0, 6.0, 7.0, 15.0, 25.0, 35.0, 150.0, 250.0, 350.0],
        "exog_2": ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
    },
    index=idx,
)
display(series_train)
display(exog_train)
|  |  | values |
|---|---|---|
| series_id | datetime |  |
| series_1 | 2020-01-01 | 1 |
|  | 2020-01-02 | 2 |
|  | 2020-01-03 | 3 |
| series_2 | 2020-01-01 | 10 |
|  | 2020-01-02 | 20 |
|  | 2020-01-03 | 30 |
| series_3 | 2020-01-01 | 100 |
|  | 2020-01-02 | 200 |
|  | 2020-01-03 | 300 |
|  |  | exog_1 | exog_2 |
|---|---|---|---|
| series_id | datetime |  |  |
| series_1 | 2020-01-01 | 5.0 | A |
|  | 2020-01-02 | 6.0 | B |
|  | 2020-01-03 | 7.0 | C |
| series_2 | 2020-01-01 | 15.0 | D |
|  | 2020-01-02 | 25.0 | E |
|  | 2020-01-03 | 35.0 | F |
| series_3 | 2020-01-01 | 150.0 | G |
|  | 2020-01-02 | 250.0 | H |
|  | 2020-01-03 | 350.0 | I |
# Train RangeDriftDetector - Multiple time series
# ==============================================================================
detector = RangeDriftDetector()
detector.fit(series=series_train, exog=exog_train)
detector
RangeDriftDetector
General Information
- Fitted series: series_1, series_2, series_3
- Fitted exogenous: exog_1, exog_2
- Series-specific exogenous: True
- Is fitted: True
Series value ranges
- {'series_1': (1.0, 3.0), 'series_2': (10.0, 30.0), 'series_3': (100.0, 300.0)}
Exogenous value ranges
- {'series_1': {'exog_1': (5.0, 7.0), 'exog_2': {'A', 'C', 'B'}}, 'series_2': {'exog_1': (15.0, 35.0), 'exog_2': {'E', 'D', 'F'}}, 'series_3': {'exog_1': (150.0, 350.0), 'exog_2': {'I', 'H', 'G'}}}
# Prediction with drifted data - Multiple time series
# ==============================================================================
last_window = pd.DataFrame(
    {
        "series_1": np.array([1.5, 2.3]),
        "series_2": np.array([100, 20]),  # Value 100 is out of range
        "series_3": np.array([110, 200]),
    },
    index=pd.date_range(start="2020-01-02", periods=2),
)
idx = pd.MultiIndex.from_product(
    [
        ["series_1", "series_2", "series_3"],
        pd.date_range(start="2020-01-04", periods=2),
    ],
    names=["series_id", "datetime"],
)
exog_predict = pd.DataFrame(
    {
        "exog_1": [5.0, 6.1, 10, 70, 220, 290],
        "exog_2": ["A", "B", "D", "F", "W", "E"],
    },
    index=idx,
)
display(last_window)
display(exog_predict)
|  | series_1 | series_2 | series_3 |
|---|---|---|---|
| 2020-01-02 | 1.5 | 100 | 110 |
| 2020-01-03 | 2.3 | 20 | 200 |
|  |  | exog_1 | exog_2 |
|---|---|---|---|
| series_id | datetime |  |  |
| series_1 | 2020-01-04 | 5.0 | A |
|  | 2020-01-05 | 6.1 | B |
| series_2 | 2020-01-04 | 10.0 | D |
|  | 2020-01-05 | 70.0 | F |
| series_3 | 2020-01-04 | 220.0 | W |
|  | 2020-01-05 | 290.0 | E |
# Prediction with drifted data - Multiple time series
# ==============================================================================
flag_out_of_range, series_out_of_range, exog_out_of_range = detector.predict(
    last_window       = last_window,
    exog              = exog_predict,
    verbose           = True,
    suppress_warnings = False
)
print("Out of range detected :", flag_out_of_range)
print("Series out of range :", series_out_of_range)
print("Exogenous out of range :", exog_out_of_range)
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'series_2' has values outside the range seen during training [10.00000, 30.00000]. │ │ This may affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'series_2': 'exog_1' has values outside the range seen during training [15.00000, │ │ 35.00000]. This may affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────── FeatureOutOfRangeWarning ──────────────────────────────╮ │ 'series_3': 'exog_2' has values not seen during training. Seen values: {'I', 'H', │ │ 'G'}. This may affect the accuracy of the predictions. │ │ │ │ Category : skforecast.exceptions.FeatureOutOfRangeWarning │ │ Location : │ │ c:\Users\jaesc2\Miniconda3\envs\skforecast_py12\Lib\site-packages\skforecast\drift_d │ │ etection\_range_drift.py:283 │ │ Suppress : warnings.simplefilter('ignore', category=FeatureOutOfRangeWarning) │ ╰──────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────── Out-of-range summary ──────────────────────────────╮ │ Series: │ │ 'series_2' has values outside the observed range [10.00000, 30.00000]. │ │ │ │ Exogenous Variables: │ │ 'series_2': 'exog_1' has values outside the observed range [15.00000, 35.00000]. │ │ 'series_3': 'exog_2' has values not seen during training. Seen values: {'I', │ │ 'H', 'G'}. │ ╰──────────────────────────────────────────────────────────────────────────────────╯
Out of range detected : True
Series out of range : ['series_2']
Exogenous out of range : {'series_2': ['exog_1'], 'series_3': ['exog_2']}
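The returned objects can also be used programmatically, for example to decide which series or inputs need attention. A possible sketch reusing the results from the previous cell (the follow-up actions are placeholders):
# Acting on the per-series drift results (sketch)
# ==============================================================================
if flag_out_of_range:
    for series_name in series_out_of_range:
        print(f"Target values of '{series_name}' are outside the training range: consider retraining.")
    for series_name, exog_names in exog_out_of_range.items():
        print(f"Exogenous variables {exog_names} of '{series_name}' are outside the training range: review these inputs.")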
Combining RangeDriftDetector with Forecasters¶
When deploying a forecaster in production, it is good practice to pair it with a drift detector. This ensures that both are trained on the same dataset, allowing the drift detector to verify the input data before the forecaster makes predictions.
# Data
# ==============================================================================
data = fetch_dataset(name='h2o_exog')
data.index.name = 'datetime'
data.head(3)
╭─────────────────────────────────── h2o_exog ────────────────────────────────────╮ │ Description: │ │ Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health │ │ system had between 1991 and 2008. Two additional variables (exog_1, exog_2) are │ │ simulated. │ │ │ │ Source: │ │ Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd │ │ Edition). http://pkg.robjhyndman.com/fpp3package/, │ │ https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3. │ │ │ │ URL: │ │ https://raw.githubusercontent.com/skforecast/skforecast- │ │ datasets/main/data/h2o_exog.csv │ │ │ │ Shape: 195 rows x 3 columns │ ╰─────────────────────────────────────────────────────────────────────────────────╯
|  | y | exog_1 | exog_2 |
|---|---|---|---|
| datetime |  |  |  |
| 1992-04-01 | 0.379808 | 0.958792 | 1.166029 |
| 1992-05-01 | 0.361801 | 0.951993 | 1.117859 |
| 1992-06-01 | 0.410534 | 0.952955 | 1.067942 |
# Train Forecaster and RangeDriftDetector
# ==============================================================================
steps = 36
data_train = data.iloc[:-steps, :]
data_test = data.iloc[-steps:, :]
forecaster = ForecasterRecursive(
    regressor = HistGradientBoostingRegressor(random_state=123),
    lags      = 15
)
detector = RangeDriftDetector()
forecaster.fit(
    y    = data_train['y'],
    exog = data_train[['exog_1', 'exog_2']]
)
detector.fit(
    series = data_train['y'],
    exog   = data_train[['exog_1', 'exog_2']]
)
If you use the last_window stored in the Forecaster, drift detection is unnecessary because it corresponds to the final window of the training data. In production environments, however, you may supply an external last_window from a different time period. In that case, drift detection is recommended.
In the example below, the external last_window is identical to the final training window, so no drift will be detected.
# Last window (same as forecaster.last_window_)
# ==============================================================================
last_window = data_train['y'].iloc[-forecaster.max_lag:]
last_window
datetime
2004-04-01    0.739986
2004-05-01    0.795129
2004-06-01    0.856803
2004-07-01    1.001593
2004-08-01    0.994864
2004-09-01    1.134432
2004-10-01    1.181011
2004-11-01    1.216037
2004-12-01    1.257238
2005-01-01    1.170690
2005-02-01    0.597639
2005-03-01    0.652590
2005-04-01    0.670505
2005-05-01    0.695248
2005-06-01    0.842263
Freq: MS, Name: y, dtype: float64
# Check data with RangeDriftDetector and predict with Forecaster
# ==============================================================================
detector.predict(
    last_window       = last_window,
    exog              = data_test[['exog_1', 'exog_2']],
    verbose           = True,
    suppress_warnings = False
)
predictions = forecaster.predict(
    steps       = 36,
    last_window = last_window,
    exog        = data_test[['exog_1', 'exog_2']]
)
╭───────────────── Out-of-range summary ─────────────────╮ │ Series: │ │ No series with out-of-range values found. │ │ │ │ Exogenous Variables: │ │ No exogenous variables with out-of-range values found. │ ╰────────────────────────────────────────────────────────╯
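Putting both pieces together, a production pipeline can run the drift check first and decide what to do with the forecast depending on the result. The sketch below reuses the forecaster and detector fitted above; the alerting step is only a placeholder:
# Sketch: guard the forecaster's predictions with the drift detector
# ==============================================================================
flag_out_of_range, series_out_of_range, exog_out_of_range = detector.predict(
    last_window       = last_window,
    exog              = data_test[['exog_1', 'exog_2']],
    suppress_warnings = True
)
predictions = forecaster.predict(
    steps       = 36,
    last_window = last_window,
    exog        = data_test[['exog_1', 'exog_2']]
)
if flag_out_of_range:
    # Placeholder: log the event, raise an alert, or schedule retraining here
    print("Predictions were generated from inputs outside the training range:")
    print("  Series out of range :", series_out_of_range)
    print("  Exog out of range   :", exog_out_of_range)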