Feature selection¶
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.
Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:
Recursive feature elimination
Given an external estimator that assigns weights to features (e.g., the coefficients of a linear model), the goal of recursive feature elimination (RFE) is to select features by recursively considering smaller and smaller sets of features. First, the estimator is trained on the initial set of features, and the importance of each feature is obtained either from a specific attribute (such as coef_ or feature_importances_) or from a callable. Then, the least important features are pruned from the current set of features. This procedure is repeated recursively on the pruned set until the desired number of features to select is eventually reached. RFECV performs RFE in a cross-validation loop to find the optimal number of features.
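As a minimal illustration of the mechanism, RFE can be applied directly with scikit-learn. The sketch below uses synthetic data and a Ridge estimator, both chosen arbitrarily for the example; none of these names belong to the use case developed later in this guide.

# Minimal RFE sketch on synthetic data (illustrative only)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=20, n_informative=5, random_state=0)
selector = RFE(estimator=Ridge(), n_features_to_select=5, step=1)
selector.fit(X, y)
print(selector.support_)   # boolean mask of the selected features
print(selector.ranking_)   # ranking of all features (1 = selected)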
Sequential Feature Selection
Sequential Feature Selection (SFS) can be either forward or backward, with the direction parameter controlling which variant is used.

Forward-SFS is a greedy procedure that iteratively finds the best new feature to add to the set of selected features. It starts with zero features and finds the one that maximizes a cross-validated score when an estimator is trained on that single feature. Once this first feature is selected, the procedure is repeated, adding one new feature to the set of selected features. The procedure stops when the desired number of selected features is reached, as determined by the n_features_to_select parameter.

Backward-SFS follows the same idea but works in the opposite direction. Instead of starting with no features and greedily adding features, it starts with all features and greedily removes features from the set.
In general, forward and backward selection do not produce equivalent results. Also, one can be much faster than the other depending on the requested number of selected features: if we have 10 features and ask for 7 selected features, forward selection would need to perform 7 iterations while backward selection would only need to perform 3.
Unlike RFE and SelectFromModel, SFS does not require the underlying model to expose a coef_ or feature_importances_ attribute. However, it may be slower than the other approaches because more models have to be evaluated. For example, in backward selection, the iteration going from $m$ features to $m - 1$ features using k-fold cross-validation requires fitting and evaluating $m \times k$ models.
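To make the procedure concrete, the sketch below runs a forward SFS on synthetic data. It is an illustrative example only (the Ridge estimator and all values are arbitrary choices, not part of the use case below).

# Minimal SequentialFeatureSelector sketch (illustrative only)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import Ridge

X, y = make_regression(n_samples=200, n_features=10, n_informative=4, random_state=0)
selector = SequentialFeatureSelector(
    estimator=Ridge(), n_features_to_select=4, direction="forward", cv=3
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the 4 selected features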
Feature selection based on threshold (SelectFromModel)
SelectFromModel can be used along with any estimator that has a coef_ or feature_importances_ attribute after fitting. Features are considered unimportant and removed if the corresponding coef_ or feature_importances_ values are below the given threshold parameter. In addition to specifying the threshold numerically, there are built-in heuristics for finding a threshold using a string argument. Available heuristics are 'mean', 'median' and float multiples of these, such as '0.1*mean'.
This method is very fast compared to the others because it does not require any additional model training. However, it does not evaluate the impact of feature removal on the model. It is often used for an initial selection before applying another more computationally expensive feature selection method.
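The sketch below shows the idea with a tree-based estimator on synthetic data. It is an illustrative example only; the estimator and the threshold string are arbitrary choices.

# Minimal SelectFromModel sketch (illustrative only)
# ==============================================================================
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel

X, y = make_regression(n_samples=200, n_features=15, n_informative=5, random_state=0)
selector = SelectFromModel(
    estimator = RandomForestRegressor(n_estimators=50, random_state=0),
    threshold = "0.5*mean"  # keep features whose importance exceeds half the mean importance
)
selector.fit(X, y)
print(selector.get_support(indices=True))  # indices of the retained features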
💡 Tip
Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.
For example, in this use case, the model is an LGBMRegressor with 900 trees and a maximum depth of 7. However, to find the best subset of features, only 100 trees and a maximum depth of 5 are used.
Feature selection with skforecast¶
The select_features and select_features_multiseries functions can be used to select the best subset of features (autoregressive and exogenous variables). These functions are compatible with the feature selection methods implemented in the scikit-learn library. The available parameters are:
- forecaster: Forecaster of type ForecasterRecursive, ForecasterDirect, ForecasterRecursiveMultiSeries or ForecasterDirectMultiVariate.
- selector: Feature selector from sklearn.feature_selection. For example, RFE or RFECV.
- y or series: Target time series to which the feature selection will be applied.
- exog: Exogenous variables.
- select_only: Decide what type of features to include in the selection process.
    - If 'autoreg', only autoregressive features (lags and window features) are evaluated by the selector. All exogenous features are included in the output selected_exog.
    - If 'exog', only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the outputs selected_lags and selected_window_features.
    - If None, all features are evaluated by the selector.
- force_inclusion: Features to force include in the final list of selected features.
    - If list, list of feature names to force include.
    - If str, regular expression to identify features to force include. For example, if force_inclusion="^sun_", all features that begin with "sun_" will be included in the final list of selected features.
- subsample: Proportion of records to use for feature selection.
- random_state: Sets a seed for the random subsample so that the subsampling process is always deterministic.
- verbose: Print information about the feature selection process.
These functions return three lists:

- selected_lags: List of selected lags.
- selected_window_features: List of selected window features.
- selected_exog: List of selected exogenous features.
Libraries and data¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from sklearn.feature_selection import RFECV
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import ShuffleSplit
from skforecast.datasets import fetch_dataset
from skforecast.preprocessing import RollingFeatures
from skforecast.recursive import ForecasterRecursive
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.feature_selection import select_features
from skforecast.feature_selection import select_features_multiseries
# Download data
# ==============================================================================
data = fetch_dataset(name="bike_sharing_extended_features")
data.head(3)
bike_sharing_extended_features ------------------------------ Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, the dataset was enriched by introducing supplementary features. Addition includes calendar- based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, incorporation of rolling temperature averages, and the creation of polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine functions to ensure accurate representation. Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894. Shape of the dataset: (17352, 90)
users | weather | month_sin | month_cos | week_of_year_sin | week_of_year_cos | week_day_sin | week_day_cos | hour_day_sin | hour_day_cos | ... | temp_roll_mean_1_day | temp_roll_mean_7_day | temp_roll_max_1_day | temp_roll_min_1_day | temp_roll_max_7_day | temp_roll_min_7_day | holiday_previous_day | holiday_next_day | temp | holiday | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date_time | |||||||||||||||||||||
2011-01-08 00:00:00 | 25.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.258819 | 0.965926 | ... | 8.063334 | 10.127976 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
2011-01-08 01:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.500000 | 0.866025 | ... | 8.029166 | 10.113334 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
2011-01-08 02:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.707107 | 0.707107 | ... | 7.995000 | 10.103572 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
3 rows × 90 columns
# Data selection (reduce data size to speed up the example)
# ==============================================================================
data = data.drop(columns="weather")
data = data.loc["2012-01-01 00:00:00":]
Create forecaster¶
A forecasting model is created to predict the number of users using the last 48 values (last two days) and the exogenous features available in the dataset.
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(
stats = ['mean', 'mean', 'sum'],
window_sizes = [24, 48, 24]
)
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(
n_estimators = 900,
random_state = 15926,
max_depth = 7,
verbose = -1
),
lags = 48,
window_features = window_features
)
Feature selection with Recursive Feature Elimination (RFECV)¶
Selection of autoregressive and exogenous features¶
By default, the select_features function selects the best subset of autoregressive and exogenous features.
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = None,
force_inclusion = None,
subsample = 0.5,
random_state = 123,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 139 Lags (n=48) Window features (n=3) Exog (n=88) Number of features selected: 52 Lags (n=31) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 40, 42, 44, 47, 48] Window features (n=1) : ['roll_mean_24'] Exog (n=20) : ['hour_day_sin', 'hour_day_cos', 'poly_month_cos__week_of_year_sin', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'temp_roll_mean_1_day', 'temp']
Then, the Forecaster model is trained with the selected features. Since the window features are generated with the RollingFeatures class, the selected window features must be included manually by creating a new object.
# Train forecaster with selected features
# ==============================================================================
new_window_features = RollingFeatures(
stats = ['mean'],
window_sizes = 24
)
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(
n_estimators = 900,
random_state = 15926,
max_depth = 7,
verbose = -1
),
lags = selected_lags,
window_features = new_window_features
)
forecaster.fit(y=data["users"], exog=data[selected_exog])
Selection on a subset of features¶
- If select_only = 'autoreg', only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output selected_exog.
- If select_only = 'exog', only exogenous features are evaluated by the selector in the absence of autoregressive features. All autoregressive features are included in the outputs selected_lags and selected_window_features.
# Feature selection (only autoregressive) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'autoreg',
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 120 Lags (n=31) Window features (n=1) Exog (n=88) Number of features selected: 30 Lags (n=30) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 42, 44, 47, 48] Window features (n=0) : [] Exog (n=88) : ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos', 'sunset_hour_sin', 'sunset_hour_cos', 'poly_month_sin__month_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_sin', 'poly_month_sin__sunrise_hour_cos', 'poly_month_sin__sunset_hour_sin', 'poly_month_sin__sunset_hour_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly_month_cos__hour_day_cos', 'poly_month_cos__sunrise_hour_sin', 'poly_month_cos__sunrise_hour_cos', 'poly_month_cos__sunset_hour_sin', 'poly_month_cos__sunset_hour_cos', 'poly_week_of_year_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_of_year_cos__sunrise_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_sin', 'poly_week_of_year_cos__sunset_hour_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunrise_hour_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_sin__sunset_hour_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_sin', 'poly_week_day_cos__sunrise_hour_cos', 'poly_week_day_cos__sunset_hour_sin', 'poly_week_day_cos__sunset_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunrise_hour_sin', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_hour_day_cos__sunset_hour_cos', 'poly_sunrise_hour_sin__sunrise_hour_cos', 'poly_sunrise_hour_sin__sunset_hour_sin', 'poly_sunrise_hour_sin__sunset_hour_cos', 'poly_sunrise_hour_cos__sunset_hour_sin', 'poly_sunrise_hour_cos__sunset_hour_cos', 'poly_sunset_hour_sin__sunset_hour_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'holiday_previous_day', 'holiday_next_day', 'temp', 'holiday']
# Check all exogenous features are selected
# ==============================================================================
len(selected_exog) == data.drop(columns="users").shape[1]
True
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 120 Lags (n=31) Window features (n=1) Exog (n=88) Number of features selected: 62 Lags (n=31) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 40, 42, 44, 47, 48] Window features (n=1) : ['roll_mean_24'] Exog (n=62) : ['week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly_month_cos__hour_day_cos', 'poly_week_of_year_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_sin', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunrise_hour_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_sin__sunset_hour_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_sin', 'poly_week_day_cos__sunrise_hour_cos', 'poly_week_day_cos__sunset_hour_sin', 'poly_week_day_cos__sunset_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunrise_hour_sin', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'temp', 'holiday']
# Check all autoregressive features are selected
# ==============================================================================
print("Same lags :", len(selected_lags) == len(forecaster.lags))
print("Same window features :", len(selected_window_features) == len(forecaster.window_features))
Same lags : True Same window features : True
Force selection of specific features¶
The force_inclusion argument can be used to force the selection of certain features. To illustrate this, a non-informative feature, noise, is added to the dataset. This feature contains no information about the target variable and therefore should not be selected by the feature selector. However, if we force its inclusion, it will be included in the final list of selected features.
# Add non-informative feature
# ==============================================================================
data['noise'] = np.random.normal(size=len(data))
# Feature selection (only exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=10, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
force_inclusion = ["noise"],
subsample = 0.5,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 4356 Number of features available: 121 Lags (n=31) Window features (n=1) Exog (n=89) Number of features selected: 75 Lags (n=31) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 40, 42, 44, 47, 48] Window features (n=1) : ['roll_mean_24'] Exog (n=75) : ['month_sin', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__month_cos', 'poly_month_sin__week_of_year_sin', 'poly_month_sin__week_of_year_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_month_sin__hour_day_sin', 'poly_month_sin__hour_day_cos', 'poly_month_sin__sunrise_hour_sin', 'poly_month_sin__sunrise_hour_cos', 'poly_month_sin__sunset_hour_sin', 'poly_month_sin__sunset_hour_cos', 'poly_month_cos__week_of_year_sin', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__week_day_sin', 'poly_month_cos__week_day_cos', 'poly_month_cos__hour_day_sin', 'poly_month_cos__hour_day_cos', 'poly_month_cos__sunset_hour_sin', 'poly_week_of_year_sin__week_of_year_cos', 'poly_week_of_year_sin__week_day_sin', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_sin', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_of_year_cos__sunrise_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_sin', 'poly_week_of_year_cos__sunset_hour_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunrise_hour_cos', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_sin__sunset_hour_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_week_day_cos__sunrise_hour_sin', 'poly_week_day_cos__sunrise_hour_cos', 'poly_week_day_cos__sunset_hour_sin', 'poly_week_day_cos__sunset_hour_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunrise_hour_sin', 'poly_hour_day_cos__sunrise_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_hour_day_cos__sunset_hour_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'holiday_previous_day', 'holiday_next_day', 'temp', 'holiday', 'noise']
# Check if "noise" is in selected_exog
# ==============================================================================
"noise" in selected_exog
True
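As described above, force_inclusion also accepts a regular expression. The following sketch (not executed in this guide) would force every exogenous feature whose name starts with "sun_" to remain in the selection, reusing the same forecaster and selector as the previous example.

# Force inclusion of features by regular expression (illustrative sketch)
# ==============================================================================
selected_lags, selected_window_features, selected_exog = select_features(
    forecaster      = forecaster,
    selector        = selector,
    y               = data["users"],
    exog            = data.drop(columns="users"),
    select_only     = 'exog',
    force_inclusion = "^sun_",  # regex: force all features starting with "sun_"
    subsample       = 0.5,
    verbose         = False,
)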
Feature selection with Sequential Feature Selection (SFS)¶
Sequential Feature Selection is a robust method for selecting features, but it is computationally expensive. When the data set is very large, one way to reduce the computational cost is to use a single validation split to evaluate each candidate model instead of cross-validation (default).
# Feature selection (only exog) with scikit-learn SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = SequentialFeatureSelector(
    estimator            = regressor,
n_features_to_select = 25,
direction = "forward",
cv = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
scoring = "neg_mean_absolute_error",
n_jobs = -1,
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data["users"],
exog = data.drop(columns="users"),
select_only = 'exog',
subsample = 0.2,
random_state = 123,
verbose = True,
)
Recursive feature elimination (SequentialFeatureSelector) --------------------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 121 Lags (n=31) Window features (n=1) Exog (n=89) Number of features selected: 25 Lags (n=31) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 40, 42, 44, 47, 48] Window features (n=1) : ['roll_mean_24'] Exog (n=25) : ['week_of_year_sin', 'week_day_sin', 'hour_day_sin', 'hour_day_cos', 'sunset_hour_sin', 'poly_month_sin__week_day_cos', 'poly_month_cos__week_of_year_cos', 'poly_month_cos__sunset_hour_cos', 'poly_week_of_year_sin__sunset_hour_sin', 'poly_week_of_year_cos__sunrise_hour_cos', 'poly_week_of_year_cos__sunset_hour_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__sunrise_hour_sin', 'poly_week_day_sin__sunset_hour_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_cos__sunset_hour_sin', 'poly_sunrise_hour_sin__sunset_hour_sin', 'poly_sunrise_hour_cos__sunset_hour_sin', 'poly_sunrise_hour_cos__sunset_hour_cos', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'holiday_next_day', 'temp', 'holiday']
Combination of feature selection methods¶
Combining feature selection methods can help speed up the process. An effective approach is to first use SelectFromModel to eliminate the less important features, and then use SequentialFeatureSelector to determine the best subset of features from this reduced list. This two-step method often improves efficiency by focusing on the most important features.
# Feature selection (autoregressive and exog) with SelectFromModel + SequentialFeatureSelector
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
# Step 1: Select the 70% most important features with SelectFromModel
selector_1 = SelectFromModel(
estimator = regressor,
max_features = int(data.shape[1] * 0.7),
threshold = -np.inf
)
selected_lags_1, selected_window_features_1, selected_exog_1 = select_features(
forecaster = forecaster,
selector = selector_1,
y = data["users"],
exog = data.drop(columns="users"),
select_only = None,
subsample = 0.2,
verbose = True,
)
Recursive feature elimination (SelectFromModel) ----------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 121 Lags (n=31) Window features (n=1) Exog (n=89) Number of features selected: 62 Lags (n=31) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16, 19, 22, 23, 24, 25, 26, 28, 30, 32, 34, 35, 36, 40, 42, 44, 47, 48] Window features (n=1) : ['roll_mean_24'] Exog (n=30) : ['week_of_year_cos', 'hour_day_sin', 'hour_day_cos', 'poly_month_sin__week_day_sin', 'poly_month_sin__week_day_cos', 'poly_week_of_year_sin__week_day_cos', 'poly_week_of_year_sin__hour_day_sin', 'poly_week_of_year_sin__hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_sin__sunset_hour_cos', 'poly_week_of_year_cos__week_day_sin', 'poly_week_of_year_cos__week_day_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_week_day_cos__hour_day_sin', 'poly_week_day_cos__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_sin', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_sin', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp', 'noise']
# Step 2: Select the 25 most important features with SequentialFeatureSelector
window_features_1 = RollingFeatures(
stats = ['mean'],
window_sizes = 24
)
forecaster.set_lags(lags=selected_lags_1)
forecaster.set_window_features(window_features=window_features_1)
selector_2 = SequentialFeatureSelector(
estimator = regressor,
n_features_to_select = 25,
direction = "forward",
cv = ShuffleSplit(n_splits=1, test_size=0.3, random_state=951),
scoring = "neg_mean_absolute_error",
n_jobs = -1,
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector_2,
y = data["users"],
exog = data[selected_exog_1],
select_only = None,
subsample = 0.2,
verbose = True,
)
Recursive feature elimination (SequentialFeatureSelector) --------------------------------------------------------- Total number of records available: 8712 Total number of records used for feature selection: 1742 Number of features available: 62 Lags (n=31) Window features (n=1) Exog (n=30) Number of features selected: 25 Lags (n=12) : [1, 6, 7, 8, 11, 16, 24, 25, 28, 30, 32, 47] Window features (n=0) : [] Exog (n=13) : ['hour_day_sin', 'hour_day_cos', 'poly_week_of_year_sin__sunrise_hour_cos', 'poly_week_of_year_cos__hour_day_sin', 'poly_week_of_year_cos__hour_day_cos', 'poly_week_day_sin__week_day_cos', 'poly_week_day_sin__hour_day_sin', 'poly_week_day_sin__hour_day_cos', 'poly_hour_day_sin__hour_day_cos', 'poly_hour_day_sin__sunrise_hour_cos', 'poly_hour_day_sin__sunset_hour_cos', 'poly_hour_day_cos__sunset_hour_sin', 'temp']
Feature Selection in Global Forecasting Models¶
As with univariate forecasting models, feature selection can be applied to global forecasting models (multi-series). In this case, the select_features_multiseries function is used. This function has the same parameters as select_features, but the y parameter is replaced by series.
- forecaster: Forecaster of type ForecasterRecursiveMultiSeries or ForecasterDirectMultiVariate.
- selector: Feature selector from sklearn.feature_selection. For example, RFE or RFECV.
- series: Target time series to which the feature selection will be applied.
- exog: Exogenous variables.
- select_only: Decide what type of features to include in the selection process.
    - If 'autoreg', only autoregressive features (lags or custom predictors) are evaluated by the selector. All exogenous features are included in the output selected_exog.
    - If 'exog', only exogenous features are evaluated without the presence of autoregressive features. All autoregressive features are included in the outputs selected_lags and selected_window_features.
    - If None, all features are evaluated by the selector.
- force_inclusion: Features to force include in the final list of selected features.
    - If list, list of feature names to force include.
    - If str, regular expression to identify features to force include. For example, if force_inclusion="^sun_", all features that begin with "sun_" will be included in the final list of selected features.
- subsample: Proportion of records to use for feature selection.
- random_state: Sets a seed for the random subsample so that the subsampling process is always deterministic.
- verbose: Print information about the feature selection process.
# Data
# ==============================================================================
data = fetch_dataset(name="items_sales")
items_sales ----------- Simulated time series for the sales of 3 different items. Simulated data. Shape of the dataset: (1097, 3)
# Create exogenous features based on the calendar
# ==============================================================================
data["month"] = data.index.month
data["day_of_week"] = data.index.dayofweek
data["day_of_month"] = data.index.day
data["week_of_year"] = data.index.isocalendar().week
data["quarter"] = data.index.quarter
data["is_month_start"] = data.index.is_month_start.astype(int)
data["is_month_end"] = data.index.is_month_end.astype(int)
data["is_quarter_start"] = data.index.is_quarter_start.astype(int)
data["is_quarter_end"] = data.index.is_quarter_end.astype(int)
data["is_year_start"] = data.index.is_year_start.astype(int)
data["is_year_end"] = data.index.is_year_end.astype(int)
data.head()
item_1 | item_2 | item_3 | month | day_of_week | day_of_month | week_of_year | quarter | is_month_start | is_month_end | is_quarter_start | is_quarter_end | is_year_start | is_year_end | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | ||||||||||||||
2012-01-01 | 8.253175 | 21.047727 | 19.429739 | 1 | 6 | 1 | 52 | 1 | 1 | 0 | 1 | 0 | 1 | 0 |
2012-01-02 | 22.777826 | 26.578125 | 28.009863 | 1 | 0 | 2 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-03 | 27.549099 | 31.751042 | 32.078922 | 1 | 1 | 3 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-04 | 25.895533 | 24.567708 | 27.252276 | 1 | 2 | 4 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2012-01-05 | 21.379238 | 18.191667 | 20.357737 | 1 | 3 | 5 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
# Create forecaster
# ==============================================================================
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(n_estimators=900, random_state=159, max_depth=7, verbose=-1),
lags = 24,
window_features = RollingFeatures(stats=['mean', 'mean', 'mean'], window_sizes=[24, 48, 72])
)
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
series_columns = ["item_1", "item_2", "item_3"]
exog_columns = [col for col in data.columns if col not in series_columns]
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features_multiseries(
forecaster = forecaster,
selector = selector,
series = data[series_columns],
exog = data[exog_columns],
select_only = None,
force_inclusion = None,
subsample = 0.5,
random_state = 123,
verbose = True,
)
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 3075 Total number of records used for feature selection: 1537 Number of features available: 38 Lags (n=24) Window features (n=3) Exog (n=11) Number of features selected: 28 Lags (n=24) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24] Window features (n=2) : ['roll_mean_24', 'roll_mean_72'] Exog (n=2) : ['day_of_week', 'week_of_year']
Once the best subset of features has been selected, the global forecasting model is trained with the selected features.
# Train forecaster with selected features
# ==============================================================================
new_window_features = RollingFeatures(stats=['mean', 'mean'], window_sizes=[24, 72])
forecaster.set_lags(lags=selected_lags)
forecaster.set_window_features(window_features=new_window_features)
forecaster.fit(series=data[series_columns], exog=data[selected_exog])
forecaster
ForecasterRecursiveMultiSeries
General Information
- Regressor: LGBMRegressor
- Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
- Window features: ['roll_mean_24', 'roll_mean_72']
- Window size: 72
- Series encoding: ordinal
- Exogenous included: True
- Weight function included: False
- Series weights: None
- Differentiation order: None
- Creation date: 2024-11-10 16:27:40
- Last fit date: 2024-11-10 16:28:51
- Skforecast version: 0.14.0
- Python version: 3.11.10
- Forecaster id: None
Exogenous Variables
-
day_of_week, week_of_year
Data Transformations
- Transformer for series: None
- Transformer for exog: None
Training Information
- Series names (levels): item_1, item_2, item_3
- Training range: 'item_1': ['2012-01-01', '2015-01-01'], 'item_2': ['2012-01-01', '2015-01-01'], 'item_3': ['2012-01-01', '2015-01-01']
- Training index type: DatetimeIndex
- Training index frequency: D
Regressor Parameters
-
{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': 7, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 900, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 159, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}
Fit Kwargs
-
{}