Feature selection¶
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.
Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:
Recursive feature elimination
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features, and the importance of each feature is obtained either through a specific attribute (such as coef_ or feature_importances_) or through a callable. Then, the least important features are pruned from the current set of features. This procedure is repeated recursively on the pruned set until the desired number of features to select is eventually reached. RFECV performs RFE in a cross-validation loop to find the optimal number of features.
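As an illustration (using a synthetic dataset, not part of the skforecast example below), a minimal sketch of how RFECV is typically used with scikit-learn:

# RFECV on synthetic data (illustrative sketch only)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFECV
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=500, n_features=20, n_informative=5, random_state=123)

# Features are removed one at a time (step=1) and cross-validation decides
# how many features to keep.
selector = RFECV(estimator=Ridge(), step=1, cv=5, min_features_to_select=1)
selector.fit(X, y)
print("Optimal number of features:", selector.n_features_)
print("Selected feature mask     :", selector.support_)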
Sequential Feature Selection
Sequential Feature Selection (SFS) can be either forward or backward, with the direction parameter controlling which variant is used.

Forward-SFS is a greedy procedure that iteratively finds the best new feature to add to the set of selected features. It starts with zero features and finds the one that maximizes a cross-validated score when an estimator is trained on that single feature. Once this first feature is selected, the procedure is repeated, adding one new feature to the set of selected features. The procedure stops when the desired number of selected features is reached, as determined by the n_features_to_select parameter.

Backward-SFS follows the same idea, but works in the opposite direction. Instead of starting with no features and greedily adding features, it starts with all features and greedily removes features from the set.
In general, forward and backward selection do not produce equivalent results. Also, one can be much faster than the other depending on the requested number of selected features: if we have 10 features and ask for 7 selected features, forward selection would need to perform 7 iterations while backward selection would only need to perform 3.
SFS differs from RFE and SelectFromModel in that it does not require the underlying model to expose a coef_ or feature_importances_ attribute. However, it may be slower than the other approaches because more models have to be evaluated. For example, in backward selection, the iteration going from $m$ features to $m - 1$ features using k-fold cross-validation requires fitting $m \times k$ models.
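A minimal scikit-learn sketch of forward SFS on synthetic data (illustrative only), using an estimator that exposes neither coef_ nor feature_importances_:

# Forward Sequential Feature Selection (illustrative sketch, synthetic data)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsRegressor

X, y = make_regression(n_samples=500, n_features=10, n_informative=4, random_state=123)

# KNeighborsRegressor has no coef_ or feature_importances_, but SFS can still
# be used because it relies only on cross-validated scores.
selector = SequentialFeatureSelector(
    estimator            = KNeighborsRegressor(),
    n_features_to_select = 4,
    direction            = "forward",
    cv                   = 3,
)
selector.fit(X, y)
print("Selected feature mask:", selector.get_support())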
Feature selection based on threshold (SelectFromModel)
SelectFromModel can be used along with any estimator that has a coef_ or feature_importances_ attribute after fitting. Features are considered unimportant and removed if the corresponding coef_ or feature_importances_ values are below the given threshold parameter. In addition to specifying the threshold numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are 'mean', 'median' and float multiples of these, such as '0.1*mean'.
This method is very fast compared to the others because the estimator only needs to be fitted once; no additional models are trained to evaluate candidate feature subsets. However, it does not evaluate the impact of feature removal on the model. It is often used for an initial selection before applying another, more computationally expensive feature selection method.
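A minimal scikit-learn sketch of SelectFromModel on synthetic data (illustrative only):

# SelectFromModel (illustrative sketch, synthetic data)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=500, n_features=20, n_informative=5, random_state=123)

# Features whose feature_importances_ fall below the mean importance are dropped.
selector = SelectFromModel(estimator=RandomForestRegressor(random_state=123), threshold="mean")
selector.fit(X, y)
print("Number of selected features:", selector.get_support().sum())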
💡 Tip
Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.
For example, in this use case, the model is an LGBMRegressor with 900 trees and a maximum depth of 7. However, to find the best subset of features, only 100 trees and a maximum depth of 5 are used.
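A sketch of this pattern (the values match the use case described above):

# Light configuration for feature selection vs. full configuration for training
# ==============================================================================
from lightgbm import LGBMRegressor

# Simple model, used only while searching for the best subset of features
regressor_selection = LGBMRegressor(n_estimators=100, max_depth=5, verbose=-1)

# Full model, used to train the final forecaster once the features are selected
regressor_final = LGBMRegressor(n_estimators=900, max_depth=7, verbose=-1)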
⚠ Warning
The current version of skforecast only supports feature selection for forecasters of type ForecasterAutoreg and ForecasterAutoregCustom for univariate forecasts, and ForecasterAutoregMultiSeries and ForecasterAutoregMultiSeriesCustom for global forecasting models.
Feature selection with skforecast¶
The select_features and select_features_multiseries functions can be used to select the best subset of features (autoregressive and exogenous variables). These functions are compatible with the feature selection methods implemented in the scikit-learn library. The available parameters are:
forecaster: Forecaster of type ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiSeries or ForecasterAutoregMultiSeriesCustom.

selector: Feature selector from sklearn.feature_selection. For example, RFE or RFECV.

y or series: Target time series to which the feature selection will be applied.

exog: Exogenous variables.

select_only: Decide what type of features to include in the selection process.
    If 'autoreg', only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output (selected_exog).
    If 'exog', only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the output (selected_autoreg).
    If None, all features are evaluated by the selector.

force_inclusion: Features to force include in the final list of selected features.
    If list, list of feature names to force include.
    If str, regular expression to identify features to force include. For example, if force_inclusion="^sun_", all features that begin with "sun_" will be included in the final list of selected features.

subsample: Proportion of records to use for feature selection.

random_state: Sets a seed for the random subsample so that the subsampling process is always deterministic.

verbose: Print information about the feature selection process.
These functions return two lists:

selected_autoreg: List of selected autoregressive features.
selected_exog: List of selected exogenous features.
Libraries¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.feature_selection import RFECV
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import ShuffleSplit
from skforecast.datasets import fetch_dataset
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import select_features
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries
from skforecast.model_selection_multiseries import select_features_multiseries
Data¶
# Download data
# ==============================================================================
data = fetch_dataset(name="bike_sharing_extended_features")
data.head(3)
bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, the dataset was enriched by introducing supplementary features. Addition includes calendar-based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, incorporation of rolling temperature averages, and the creation of polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T, Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)
users | weather | month_sin | month_cos | week_of_year_sin | week_of_year_cos | week_day_sin | week_day_cos | hour_day_sin | hour_day_cos | ... | temp_roll_mean_1_day | temp_roll_mean_7_day | temp_roll_max_1_day | temp_roll_min_1_day | temp_roll_max_7_day | temp_roll_min_7_day | holiday_previous_day | holiday_next_day | temp | holiday | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date_time | |||||||||||||||||||||
2011-01-08 00:00:00 | 25.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.258819 | 0.965926 | ... | 8.063334 | 10.127976 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
2011-01-08 01:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.500000 | 0.866025 | ... | 8.029166 | 10.113334 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
2011-01-08 02:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.707107 | 0.707107 | ... | 7.995000 | 10.103572 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
3 rows × 90 columns
# Data selection (reduce data size to speed up the example)
# ==============================================================================
data = data.drop(columns="weather")
data = data.loc["2012-01-01 00:00:00":]
Create forecaster¶
A forecasting model is created to predict the number of users using the last 48 values (last two days) and the exogenous features available in the dataset.
# Create forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = LGBMRegressor(
n_estimators = 900,
random_state = 15926,
max_depth = 7,
verbose = -1
),
lags = 48,
)
Feature selection with Recursive Feature Elimination (RFECV)¶
Selection of autoregressive and exogenous features¶
By default, the select_features function selects the best subset of autoregressive and exogenous features.
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = None,
force_inclusion = None,
subsample = 0.5,
random_state = 123,
verbose = True,
)
Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 136
    Autoreg (n=48)
    Exog    (n=88)
Number of features selected: 39
    Autoreg (n=27) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48]
    Exog    (n=12) : ['week_of_year_sin', 'hour_day_sin', 'hour_day_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'temp_roll_mean_1_day', 'temp']
Then, the Forecaster model is trained with the selected features.
# Train forecaster with selected features
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = LGBMRegressor(
n_estimators = 900,
random_state = 15926,
max_depth = 7,
verbose = -1
),
lags = selected_autoreg,
)
forecaster.fit(y=data["users"], exog=data[selected_exog])
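As a usage sketch (not part of the original example), the fit/predict cycle could instead reserve the last 24 hours of data so that the exogenous values needed for the prediction horizon are known:

# Predict with the selected features (illustrative sketch)
# ==============================================================================
# Refit on all but the last 24 hours and predict them, so that future values
# of the selected exogenous features are available.
data_train = data.iloc[:-24]
data_test  = data.iloc[-24:]
forecaster.fit(y=data_train["users"], exog=data_train[selected_exog])
predictions = forecaster.predict(steps=24, exog=data_test[selected_exog])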
Selection on a subset of features¶
If select_only = 'autoreg', only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output (selected_exog).

If select_only = 'exog', exogenous features are evaluated by the selector in the absence of autoregressive features. All autoregressive features are included in the output (selected_autoreg).
# Feature selection (only autoregressive) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'autoreg',
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 27 Autoreg (n=27) Exog (n=88) Number of features selected: 26 Autoreg (n=26) : [1, 2, 3, 4, 5, 6, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48] Exog (n=88) : ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos', 'sunset_hour_sin', 'sunset_hour_cos', 'poly_month_sin__month_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_sin', 'poly_month_sin__sunrise_hour_cos', 'poly_month_sin__sunset_hour_sin', 'poly_month_sin__sunset_hour_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly_month_cos__hour_day_cos', 'poly_month_cos__sunrise_hour_sin', 'poly_month_cos__sunrise_hour_cos', 'poly_month_cos__sunset_hour_sin', 'poly_month_cos__sunset_hour_cos', 'poly_week_of_year_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_of_year_cos__sunrise_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_sin', 'poly_week_of_year_cos__sunset_hour_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunrise_hour_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_sin__sunset_hour_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_sin', 'poly_week_day_cos__sunrise_hour_cos', 'poly_week_day_cos__sunset_hour_sin', 'poly_week_day_cos__sunset_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunrise_hour_sin', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_hour_day_cos__sunset_hour_cos', 'poly_sunrise_hour_sin__sunrise_hour_cos', 'poly_sunrise_hour_sin__sunset_hour_sin', 'poly_sunrise_hour_sin__sunset_hour_cos', 'poly_sunrise_hour_cos__sunset_hour_sin', 'poly_sunrise_hour_cos__sunset_hour_cos', 'poly_sunset_hour_sin__sunset_hour_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'holiday_previous_day', 'holiday_next_day', 'temp', 'holiday']
# Check all exogenous features are selected
# ==============================================================================
len(selected_exog) == data.drop(columns="users").shape[1]
True
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 88 Autoreg (n=27) Exog (n=88) Number of features selected: 66 Autoreg (n=27) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48] Exog (n=66) : ['week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_cos', 'poly_month_sin__sunset_hour_sin', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly_month_cos__hour_day_cos', 'poly_week_of_year_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_of_year_cos__sunrise_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_sin', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunrise_hour_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_sin__sunset_hour_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_sin', 'poly_week_day_cos__sunrise_hour_cos', 'poly_week_day_cos__sunset_hour_sin', 'poly_week_day_cos__sunset_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunrise_hour_sin', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_hour_day_cos__sunset_hour_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'holiday_previous_day', 'temp', 'holiday']
# Check all autoregressive features are selected
# ==============================================================================
len(selected_autoreg) == len(forecaster.lags)
True
Force selection of specific features¶
The force_inclusion argument can be used to force the selection of certain features. To illustrate this, a non-informative feature, noise, is added to the data set. This feature contains no information about the target variable and therefore should not be selected by the feature selector. However, if we force the inclusion of this feature, it will be included in the final list of selected features.
# Add non-informative feature
# ==============================================================================
data['noise'] = np.random.normal(size=len(data))
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=10, n_jobs=-1
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
force_inclusion = ["noise"],
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 8712
Total number of records used for feature selection: 4356
Number of features available: 89
    Autoreg (n=27)
    Exog    (n=89)
Number of features selected: 18
    Autoreg (n=27) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48]
    Exog    (n=19) : ['week_of_year_sin', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp', 'noise']
# Check if "noise" is in selected_exog
# ==============================================================================
"noise" in selected_exog
True
Feature selection with Sequential Feature Selection (SFS)¶
Sequential Feature Selection is a robust method for selecting features, but it is computationally expensive. When the data set is very large, one way to reduce the computational cost is to evaluate each candidate model on a single validation split instead of with cross-validation (the default).
# Feature selection (only exog) with scikit-learn SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = SequentialFeatureSelector(
estimator = forecaster.regressor,
n_features_to_select = 25,
direction = "forward",
cv = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
scoring = "neg_mean_absolute_error",
n_jobs = -1,
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
subsample = 0.2,
random_state = 123,
verbose = True,
)
Recursive feature elimination (SequentialFeatureSelector) --------------------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 89 Autoreg (n=27) Exog (n=89) Number of features selected: 25 Autoreg (n=27) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48] Exog (n=25) : ['week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunset_hour_cos', 'poly_month_sin__week_day_cos', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__sunrise_hour_cos', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_cos__sunrise_hour_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_sunrise_hour_sin__sunset_hour_cos', 'poly_sunset_hour_sin__sunset_hour_cos', 'temp_roll_min_1_day', 'holiday_previous_day', 'holiday_next_day', 'temp', 'holiday']
Combination of feature selection methods¶
Combining feature selection methods can help speed up the process. An effective approach is to first use SelectFromModel to eliminate the less important features, and then use SequentialFeatureSelector to determine the best subset of features from this reduced list. This two-step method often improves efficiency by focusing the more expensive search on the most important features.
# Feature selection (autoregressive and exog) with SelectFromModel + SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
# Step 1: Select the 70% most important features with SelectFromModel
selector_1 = SelectFromModel(
estimator = regressor,
max_features = int(data.shape[1] * 0.7),
threshold = -np.inf
)
selected_autoreg_1, selected_exog_1 = select_features(
forecaster = forecaster,
selector = selector_1,
y = data["users"],
exog = data.drop(columns="users"),
select_only = None,
subsample = 0.2,
verbose = True,
)
print("")
# Step 2: Select the 25 most important features with SequentialFeatureSelector
forecaster.set_lags(lags=selected_autoreg_1)
selector_2 = SequentialFeatureSelector(
estimator = regressor,
n_features_to_select = 25,
direction = "forward",
cv = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
scoring = "neg_mean_absolute_error",
n_jobs = -1,
)
selected_autoreg, selected_exog = select_features(
forecaster = forecaster,
selector = selector_2,
y = data["users"],
exog = data[selected_exog_1],
select_only = None,
subsample = 0.2,
verbose = True,
)
Recursive feature elimination (SelectFromModel) ----------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 116 Autoreg (n=27) Exog (n=89) Number of features selected: 62 Autoreg (n=27) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 21, 22, 23, 24, 25, 26, 29, 30, 32, 35, 36, 42, 48] Exog (n=35) : ['week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_cos__week_of_year_sin', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_hour_day_cos__sunset_hour_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp', 'noise'] Recursive feature elimination (SequentialFeatureSelector) --------------------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 62 Autoreg (n=27) Exog (n=35) Number of features selected: 25 Autoreg (n=9) : [1, 3, 4, 9, 10, 24, 26, 42, 48] Exog (n=16) : ['week_day_sin', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_cos', 'poly_month_cos__week_of_year_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunset_hour_cos', 'temp_roll_min_1_day']
Feature Selection in Global Forecasting Models¶
As with univariate forecasting models, feature selection can be applied to global forecasting models (multi-series). In this case, the select_features_multiseries function is used. This function has the same parameters as select_features, but the y parameter is replaced by series.
forecaster: Forecaster of type ForecasterAutoreg, ForecasterAutoregCustom, ForecasterAutoregMultiSeries or ForecasterAutoregMultiSeriesCustom.

selector: Feature selector from sklearn.feature_selection. For example, RFE or RFECV.

series: Target time series to which the feature selection will be applied.

exog: Exogenous variables.

select_only: Decide what type of features to include in the selection process.
    If 'autoreg', only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output (selected_exog).
    If 'exog', only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the output (selected_autoreg).
    If None, all features are evaluated by the selector.

force_inclusion: Features to force include in the final list of selected features.
    If list, list of feature names to force include.
    If str, regular expression to identify features to force include. For example, if force_inclusion="^sun_", all features that begin with "sun_" will be included in the final list of selected features.

subsample: Proportion of records to use for feature selection.

random_state: Sets a seed for the random subsample so that the subsampling process is always deterministic.

verbose: Print information about the feature selection process.
# Data
# ==============================================================================
data = fetch_dataset(name="items_sales")
items_sales
-----------
Simulated time series for the sales of 3 different items. Simulated data.
Shape of the dataset: (1097, 3)
# Create exogenous features based on the calendar
# ==============================================================================
data["month"] = data.index.month
data["day_of_week"] = data.index.dayofweek
data["day_of_month"] = data.index.day
data["week_of_year"] = data.index.isocalendar().week
data["quarter"] = data.index.quarter
data["is_month_start"] = data.index.is_month_start.astype(int)
data["is_month_end"] = data.index.is_month_end.astype(int)
data["is_quarter_start"] = data.index.is_quarter_start.astype(int)
data["is_quarter_end"] = data.index.is_quarter_end.astype(int)
data["is_year_start"] = data.index.is_year_start.astype(int)
data["is_year_end"] = data.index.is_year_end.astype(int)
data.head()
item_1 | item_2 | item_3 | month | day_of_week | day_of_month | week_of_year | quarter | is_month_start | is_month_end | is_quarter_start | is_quarter_end | is_year_start | is_year_end | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | ||||||||||||||
2012-01-01 | 8.253175 | 21.047727 | 19.429739 | 1 | 6 | 1 | 52 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
2012-01-02 | 22.777826 | 26.578125 | 28.009863 | 1 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-03 | 27.549099 | 31.751042 | 32.078922 | 1 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-04 | 25.895533 | 24.567708 | 27.252276 | 1 | 2 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-05 | 21.379238 | 18.191667 | 20.357737 | 1 | 3 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
# Create forecaster
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
regressor = LGBMRegressor(n_estimators=900, random_state=159, max_depth=7, verbose=-1),
lags = 24,
)
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
series_columns = ["item_1", "item_2", "item_3"]
exog_columns = [col for col in data.columns if col not in series_columns]
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_autoreg, selected_exog = select_features_multiseries(
forecaster = forecaster,
selector = selector,
series = data[series_columns],
exog = data[exog_columns],
select_only = None,
force_inclusion = None,
subsample = 0.5,
random_state = 123,
verbose = True,
)
Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 3219
Total number of records used for feature selection: 4827
Number of features available: 35
    Autoreg (n=24)
    Exog    (n=11)
Number of features selected: 29
    Autoreg (n=24) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24]
    Exog    (n=5)  : ['month', 'day_of_week', 'day_of_month', 'week_of_year', 'is_quarter_start']
Once the best subset of features has been selected, the global forecasting model is trained with the selected features.
# Train forecaster with selected features
# ==============================================================================
forecaster.set_lags(lags=selected_autoreg)
forecaster.fit(series=data[series_columns], exog=data[selected_exog])
forecaster
============================ ForecasterAutoregMultiSeries ============================ Regressor: LGBMRegressor(max_depth=7, n_estimators=900, random_state=159, verbose=-1) Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] Transformer for series: StandardScaler() Transformer for exog: None Series encoding: ordinal_category Window size: 24 Series levels (names): ['item_1', 'item_2', 'item_3'] Series weights: None Weight function included: False Differentiation order: None Exogenous included: True Type of exogenous variable: <class 'pandas.core.frame.DataFrame'> Exogenous variables names: ['month', 'day_of_week', 'day_of_month', 'week_of_year', 'is_quarter_start'] Training range: ["'item_1': ['2012-01-01', '2015-01-01']", "'item_2': ['2012-01-01', '2015-01-01']", "'item_3': ['2012-01-01', '2015-01-01']"] Training index type: DatetimeIndex Training index frequency: D Regressor parameters: boosting_type: gbdt, class_weight: None, colsample_bytree: 1.0, importance_type: split, learning_rate: 0.1, ... fit_kwargs: {} Creation date: 2024-05-17 15:35:05 Last fit date: 2024-05-17 15:35:10 Skforecast version: 0.12.0 Python version: 3.11.8 Forecaster id: None