This class turns any estimator compatible with the scikit-learn API into an
autoregressive multivariate direct multi-step forecaster. A separate model
is created for each forecast time step. See documentation for more details.
Parameters:
Name
Type
Description
Default
estimator
estimator or pipeline compatible with the scikit-learn API
An instance of an estimator or pipeline compatible with the scikit-learn API.
None
steps
int
Maximum number of future steps the forecaster will predict when using
method predict(). Since a different model is created for each step,
this value must be defined before training.
required
lags
int, list, numpy ndarray, range, dict
Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
int: include lags from 1 to lags (included).
list, 1d numpy ndarray or range: include only lags present in
lags, all elements must be int.
dict: create different lags for each series. {'series_column_name': lags}.
None
window_features
object, list
Instance or list of instances used to create window features. Window features
are created from the original time series and are included as predictors.
None
transformer_series
transformer, dict
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. The transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
an inverse_transform method.
If single transformer: it is cloned and applied to all series.
If dict of transformers: a different transformer can be used for each series.
`sklearn.preprocessing.StandardScaler`
transformer_exog
transformer
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
None
weight_func
Callable
Function that defines the individual weights for each sample based on the
index. For example, a function that assigns a lower weight to certain dates.
Ignored if estimator does not have the argument sample_weight in its fit
method. The resulting sample_weight cannot have negative values.
None
differentiation
int
Order of differencing applied to the time series before training the forecaster.
If None, no differencing is applied. The order of differentiation is the number
of times the differencing operation is applied to a time series. Differencing
involves computing the differences between consecutive data points in the series.
Before returning a prediction, the differencing operation is reversed.
None
binner_kwargs
dict
Additional arguments to pass to the QuantileBinner used to discretize
the residuals into k bins according to the predicted values associated
with each residual. Available arguments are: n_bins, method, subsample,
random_state and dtype. Argument method is passed internally to the
function numpy.percentile.
New in version 0.15.0
None
n_jobs
int, 'auto'
The number of jobs to run in parallel. If -1, then the number of jobs is
set to the number of cores. If 'auto', n_jobs is set using the function
skforecast.utils.select_n_jobs_fit_forecaster.
'auto'
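To make the weight_func description more concrete, here is a minimal sketch of a function that receives the training index and returns one non-negative weight per sample. The name `custom_weights` and the date range are hypothetical and chosen only for illustration. How the forecaster calls the function internally is not shown here.

```python
import numpy as np
import pandas as pd


def custom_weights(index: pd.DatetimeIndex) -> np.ndarray:
    """Hypothetical weight function: give zero weight to one date range."""
    in_range = (index >= "2020-01-10") & (index <= "2020-01-15")
    # Weight 0 inside the range and 1 elsewhere, so no weight is negative.
    return np.where(in_range, 0.0, 1.0)


index = pd.date_range("2020-01-01", periods=20, freq="D")
weights = custom_weights(index)
```

A function of this shape would be passed as `weight_func=custom_weights`. According to the description above, it is ignored when the estimator's fit method has no sample_weight argument.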
Attributes:
Name
Type
Description
estimator
estimator or pipeline compatible with the scikit-learn API
An instance of an estimator or pipeline compatible with the scikit-learn API.
An instance of this estimator is trained for each step. All of them
are stored in self.estimators_.
steps
numpy ndarray
Future steps the forecaster will predict when using method predict().
Since a different model is created for each step, this value should be
defined before training.
window_size
int
The window size needed to create the predictors. It is calculated as the
maximum value between max_lag and max_size_window_features. If
differentiation is used, window_size is increased by n units equal to
the order of differentiation so that predictors can be generated correctly.
transformer_series
transformer, dict
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. The transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
an inverse_transform method.
If single transformer: it is cloned and applied to all series.
If dict of transformers: a different transformer can be used for each series.
transformer_exog
transformer
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
weight_func
Callable
Function that defines the individual weights for each sample based on the
index. For example, a function that assigns a lower weight to certain dates.
Ignored if estimator does not have the argument sample_weight in its
fit method. The resulting sample_weight cannot have negative values.
last_window_
pandas DataFrame
This window represents the most recent data observed by the predictor
during its training phase. It contains the values needed to predict the
next step immediately after the training data. These values are stored
in the original scale of the time series before undergoing any transformations
or differentiation. When the differentiation parameter is specified,
last_window_ is extended by as many values as the order of differentiation.
For example, if lags = 7 and differentiation = 1,
last_window_ will have 8 values.
exog_dtypes_in_
dict
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
exog_dtypes_out_
dict
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.
X_train_exog_names_out_
list
Names of the exogenous variables included in the matrix X_train created
internally for training. It can be different from exog_names_in_ if
some exogenous variables are transformed during the training process.
in_sample_residuals_
dict
Residuals of the model when predicting training data. Only stored up
to 10_000 values per series in the form {series: residuals}. If
transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
in_sample_residuals_by_bin_
dict
In-sample residuals binned according to the predicted value each residual
is associated with. The number of residuals stored per bin is limited to
10_000 // self.binner.n_bins_ per series in the form {series: residuals}.
If transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
New in version 0.15.0
out_sample_residuals_
dict
Residuals of the model when predicting non-training data. Only stored up
to 10_000 values per series in the form {series: residuals}. Use
set_out_sample_residuals() method to set values. If transformer_series
is not None, residuals are stored in the transformed scale. If
differentiation is not None, residuals are stored after differentiation.
out_sample_residuals_by_bin_
dict
Out-of-sample residuals binned according to the predicted value each residual
is associated with. The number of residuals stored per bin is limited to
10_000 // self.binner.n_bins_ per series in the form {series: residuals}.
If transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
New in version 0.15.0
binner
dict
Dictionary of skforecast.preprocessing.QuantileBinner used to discretize
residuals of each series into k bins according to the predicted values
associated with each residual. In the form {series: binner}.
New in version 0.15.0
binner_intervals_
dict
Intervals used to discretize residuals into k bins according to the predicted
values associated with each residual. In the form {series: binner_intervals_}.
New in version 0.15.0
n_jobs
int
The number of jobs to run in parallel. If -1, then the number of jobs is
set to the number of cores. If 'auto', n_jobs is set using the function
skforecast.utils.select_n_jobs_fit_forecaster.
Not used, present here for API consistency by convention.
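The differencing behaviour and the window size arithmetic described above can be sketched with plain NumPy. This is only an illustration of the idea. It does not use the library's own differentiator, and the sample values are made up.

```python
import numpy as np

y = np.array([10.0, 12.0, 15.0, 14.0, 18.0])

# First-order differencing: differences between consecutive data points.
y_diff = np.diff(y, n=1)

# Reversing the operation needs the first original value as a starting point.
y_restored = np.concatenate(([y[0]], y[0] + np.cumsum(y_diff)))

# The window grows by the order of differentiation, as in the
# last_window_ example (lags = 7, differentiation = 1).
max_lag, differentiation = 7, 1
window_size = max_lag + differentiation
```

One value is lost per differencing pass, which is why the window needs one extra observation for each order of differentiation.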
Notes
A separate model is created for each forecasting time step. It is important to
note that all models share the same parameter and hyperparameter configuration.
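As a rough sketch of the direct strategy mentioned in this note, the example below fits one model per step on the same predictors, using ordinary least squares from NumPy in place of a scikit-learn estimator. The function `fit_direct_models` and the simulated series are assumptions made for illustration and are not part of the library.

```python
import numpy as np

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=60))  # simulated series

n_lags, steps = 3, 2


def fit_direct_models(y: np.ndarray, n_lags: int, steps: int) -> dict:
    """Fit one least-squares model per forecast step (illustrative only)."""
    n_rows = len(y) - n_lags - (steps - 1)
    # Row i holds y[i], ..., y[i + n_lags - 1], oldest value first.
    X = np.column_stack([y[j : j + n_rows] for j in range(n_lags)])
    X = np.column_stack([np.ones(n_rows), X])  # intercept column
    models = {}
    for step in range(1, steps + 1):
        # Target for this step: the value `step` positions after the window.
        start = n_lags + step - 1
        target = y[start : start + n_rows]
        coef, *_ = np.linalg.lstsq(X, target, rcond=None)
        models[step] = coef
    return models


models = fit_direct_models(y, n_lags, steps)
last_window = np.concatenate(([1.0], y[-n_lags:]))
predictions = {step: float(last_window @ coef) for step, coef in models.items()}
```

Every model has the same form and configuration. Only the target changes from step to step.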
def __init__(
    self,
    level: str,
    steps: int,
    estimator: object = None,
    lags: int | list[int] | np.ndarray[int] | range[int] | dict[str, int | list] | None = None,
    window_features: object | list[object] | None = None,
    transformer_series: object | dict[str, object] | None = StandardScaler(),
    transformer_exog: object | None = None,
    weight_func: Callable | None = None,
    differentiation: int | None = None,
    fit_kwargs: dict[str, object] | None = None,
    binner_kwargs: dict[str, object] | None = None,
    n_jobs: int | str = 'auto',
    forecaster_id: str | int | None = None,
    regressor: object = None
) -> None:

    self.estimator = copy(initialize_estimator(estimator, regressor))
    self.level = level
    self.lags_ = None
    self.transformer_series = transformer_series
    self.transformer_series_ = None
    self.transformer_exog = transformer_exog
    self.weight_func = weight_func
    self.source_code_weight_func = None
    self.differentiation = differentiation
    self.differentiation_max = None
    self.differentiator = None
    self.differentiator_ = None
    self.last_window_ = None
    self.index_type_ = None
    self.index_freq_ = None
    self.training_range_ = None
    self.series_names_in_ = None
    self.exog_in_ = False
    self.exog_names_in_ = None
    self.exog_type_in_ = None
    self.exog_dtypes_in_ = None
    self.exog_dtypes_out_ = None
    self.X_train_series_names_in_ = None
    self.X_train_window_features_names_out_ = None
    self.X_train_exog_names_out_ = None
    self.X_train_direct_exog_names_out_ = None
    self.X_train_features_names_out_ = None
    self.in_sample_residuals_ = None
    self.out_sample_residuals_ = None
    self.in_sample_residuals_by_bin_ = None
    self.out_sample_residuals_by_bin_ = None
    self.creation_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.is_fitted = False
    self.fit_date = None
    self.skforecast_version = __version__
    self.python_version = sys.version.split(" ")[0]
    self.forecaster_id = forecaster_id
    self._probabilistic_mode = "binned"
    self.dropna_from_series = False  # Ignored in this forecaster
    self.encoding = None  # Ignored in this forecaster

    if not isinstance(level, str):
        raise TypeError(
            f"`level` argument must be a str. Got {type(level)}."
        )

    if not isinstance(steps, int):
        raise TypeError(
            f"`steps` argument must be an int greater than or equal to 1. "
            f"Got {type(steps)}."
        )

    if steps < 1:
        raise ValueError(
            f"`steps` argument must be greater than or equal to 1. Got {steps}."
        )

    self.steps = np.arange(steps) + 1
    self.max_step = steps

    self.estimators_ = {step: clone(self.estimator) for step in self.steps}

    if isinstance(lags, dict):
        self.lags = {}
        self.lags_names = {}
        list_max_lags = []
        for key in lags:
            if lags[key] is None:
                self.lags[key] = None
                self.lags_names[key] = None
            else:
                self.lags[key], lags_names, max_lag = initialize_lags(
                    forecaster_name=type(self).__name__,
                    lags=lags[key]
                )
                self.lags_names[key] = (
                    [f'{key}_{lag}' for lag in lags_names]
                    if lags_names is not None
                    else None
                )
                if max_lag is not None:
                    list_max_lags.append(max_lag)

        self.max_lag = max(list_max_lags) if len(list_max_lags) != 0 else None
    else:
        self.lags, self.lags_names, self.max_lag = initialize_lags(
            forecaster_name=type(self).__name__,
            lags=lags
        )

    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    if self.window_features is None and (self.lags is None or self.max_lag is None):
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features]
         if ws is not None]
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ]

    self.weight_func, self.source_code_weight_func, _ = initialize_weights(
        forecaster_name=type(self).__name__,
        estimator=estimator,
        weight_func=weight_func,
        series_weights=None
    )

    if differentiation is not None:
        if not isinstance(differentiation, int) or differentiation < 1:
            raise ValueError(
                f"Argument `differentiation` must be an integer equal to or "
                f"greater than 1. Got {differentiation}."
            )
        self.differentiation = differentiation
        self.differentiation_max = differentiation
        self.window_size += differentiation
        self.differentiator = TimeSeriesDifferentiator(
            order=differentiation, window_size=self.window_size
        )

    self.fit_kwargs = check_select_fit_kwargs(
        estimator=estimator,
        fit_kwargs=fit_kwargs
    )

    self.binner = {}
    self.binner_intervals_ = {}
    self.binner_kwargs = binner_kwargs
    if binner_kwargs is None:
        self.binner_kwargs = {
            'n_bins': 10, 'method': 'linear', 'subsample': 200000,
            'random_state': 789654, 'dtype': np.float64
        }

    if n_jobs == 'auto':
        self.n_jobs = select_n_jobs_fit_forecaster(
            forecaster_name=type(self).__name__,
            estimator=self.estimator
        )
    else:
        if not isinstance(n_jobs, int):
            raise TypeError(
                f"`n_jobs` must be an integer or `'auto'`. Got {type(n_jobs)}."
            )
        self.n_jobs = n_jobs if n_jobs > 0 else cpu_count()

    self.__skforecast_tags__ = {
        "library": "skforecast",
        "forecaster_name": "ForecasterDirectMultiVariate",
        "forecaster_task": "regression",
        "forecasting_scope": "global",  # single-series | global
        "forecasting_strategy": "direct",  # recursive | direct | deep_learning
        "index_types_supported": ["pandas.RangeIndex", "pandas.DatetimeIndex"],
        "requires_index_frequency": True,
        "allowed_input_types_series": ["pandas.DataFrame"],
        "supports_exog": True,
        "allowed_input_types_exog": ["pandas.Series", "pandas.DataFrame"],
        "handles_missing_values_series": False,
        "handles_missing_values_exog": True,
        "supports_lags": True,
        "supports_window_features": True,
        "supports_transformer_series": True,
        "supports_transformer_exog": True,
        "supports_weight_func": True,
        "supports_series_weights": False,
        "supports_differentiation": True,
        "prediction_types": ["point", "interval", "bootstrapping", "quantiles", "distribution"],
        "supports_probabilistic": True,
        "probabilistic_methods": ["bootstrapping", "conformal"],
        "handles_binned_residuals": True
    }
Create data_to_return_dict based on series names and lags configuration.
The dictionary contains the information to decide what data to return in
the _create_lags method.
def _create_data_to_return_dict(
    self,
    series_names_in_: list[str]
) -> tuple[dict[str, str], list[str]]:
    """
    Create `data_to_return_dict` based on series names and lags configuration.
    The dictionary contains the information to decide what data to return in
    the `_create_lags` method.

    Parameters
    ----------
    series_names_in_ : list
        Names of the series used during training.

    Returns
    -------
    data_to_return_dict : dict
        Dictionary with the information to decide what data to return in
        the `_create_lags` method. Options are 'X', 'y' or 'both'.
    X_train_series_names_in_ : list
        Names of the series added to `X_train` when creating the training
        matrices with `_create_train_X_y` method. It is a subset of
        `series_names_in_`.

    """

    if isinstance(self.lags, dict):
        lags_keys = list(self.lags.keys())
        if set(lags_keys) != set(series_names_in_):  # Set to avoid order
            raise ValueError(
                f"When `lags` parameter is a `dict`, its keys must be the "
                f"same as `series` column names. If don't want to include lags, "
                "add '{column: None}' to the lags dict.\n"
                f" Lags keys : {lags_keys}.\n"
                f" `series` columns : {series_names_in_}."
            )
        self.lags_ = copy(self.lags)
    else:
        self.lags_ = {series: self.lags for series in series_names_in_}
        if self.lags is not None:
            # Defined `lags_names` here to avoid overwriting when fit and then create_train_X_y
            lags_names = [f'lag_{i}' for i in self.lags]
            self.lags_names = {
                series: [f'{series}_{lag}' for lag in lags_names]
                for series in series_names_in_
            }
        else:
            self.lags_names = {series: None for series in series_names_in_}

    X_train_series_names_in_ = series_names_in_
    if self.lags is None:
        data_to_return_dict = {self.level: 'y'}
    else:
        # If col is not level and has lags, create 'X' if no lags don't include
        # If col is level, create 'both' (`X` and `y`)
        data_to_return_dict = {
            col: ('both' if col == self.level else 'X')
            for col in series_names_in_
            if col == self.level or self.lags_.get(col) is not None
        }

        # Adjust 'level' in case self.lags_[level] is None
        if self.lags_.get(self.level) is None:
            data_to_return_dict[self.level] = 'y'

        if self.window_features is None:
            # X_train_series_names_in_ include series that will be added to X_train
            X_train_series_names_in_ = [
                col for col in data_to_return_dict.keys()
                if data_to_return_dict[col] in ['X', 'both']
            ]

    return data_to_return_dict, X_train_series_names_in_
def _create_lags(
    self,
    y: np.ndarray,
    lags: np.ndarray,
    data_to_return: str | None = 'both'
) -> tuple[np.ndarray | None, np.ndarray | None]:
    """
    Create the lagged values and their target variable from a time series.

    Note that the returned matrix `X_data` contains the lag 1 in the first
    column, the lag 2 in the second column and so on.

    Parameters
    ----------
    y : numpy ndarray
        Training time series values.
    lags : numpy ndarray
        Lags to create.
    data_to_return : str, default 'both'
        Specifies which data to return. Options are 'X', 'y', 'both' or None.

    Returns
    -------
    X_data : numpy ndarray, None
        Lagged values (predictors).
    y_data : numpy ndarray, None
        Values of the time series related to each row of `X_data`.

    Notes
    -----
    Returned matrices are views into the original `y` so care must be taken
    when modifying them.

    """

    X_data = None
    y_data = None
    if data_to_return is not None:

        n_rows = len(y) - self.window_size - (self.max_step - 1)
        windows = np.lib.stride_tricks.sliding_window_view(
            y, self.window_size + self.max_step
        )

        if data_to_return != 'y':
            # If `data_to_return` is not 'y', it means is 'X' or 'both', X_data is created
            lag_indices = [self.window_size - lag for lag in lags]
            X_data = windows[:n_rows, lag_indices]

        if data_to_return != 'X':
            # If `data_to_return` is not 'X', it means is 'y' or 'both', y_data is created
            y_data = windows[:n_rows, self.window_size : self.window_size + self.max_step]

    return X_data, y_data
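The indexing used by `_create_lags` is easier to follow on a small array. The sketch below repeats the same sliding-window logic outside the class, with `window_size` and `max_step` as plain variables and a toy series. These values are assumptions for illustration only.

```python
import numpy as np

y = np.arange(10, dtype=float)  # toy series: 0., 1., ..., 9.
window_size, max_step = 3, 2
lags = np.array([1, 2, 3])

n_rows = len(y) - window_size - (max_step - 1)
# Each window holds `window_size` past values followed by `max_step` targets.
windows = np.lib.stride_tricks.sliding_window_view(y, window_size + max_step)

# Lag 1 is the value just before the first target, so it sits at
# position window_size - 1 inside each window.
lag_indices = [window_size - lag for lag in lags]
X_data = windows[:n_rows, lag_indices]
y_data = windows[:n_rows, window_size : window_size + max_step]
```

Here the first row of `X_data` is `[2., 1., 0.]` (lag 1 first) and the first row of `y_data` is `[3., 4.]`, one target per step.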
def _create_window_features(
    self,
    y: pd.Series,
    train_index: pd.Index,
    X_as_pandas: bool = False,
) -> tuple[list[np.ndarray | pd.DataFrame], list[str]]:
    """
    Create window features from a time series.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    train_index : pandas Index
        Index of the training data. It is used to create the pandas DataFrame
        `X_train_window_features` when `X_as_pandas` is `True`.
    X_as_pandas : bool, default False
        If `True`, the returned matrix `X_train_window_features` is a
        pandas DataFrame.

    Returns
    -------
    X_train_window_features : list
        List of numpy ndarrays or pandas DataFrames with the window features.
    X_train_window_features_names_out_ : list
        Names of the window features.

    """

    len_train_index = len(train_index)
    X_train_window_features = []
    X_train_window_features_names_out_ = []
    for wf in self.window_features:
        X_train_wf = wf.transform_batch(y)
        if not isinstance(X_train_wf, pd.DataFrame):
            raise TypeError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a pandas DataFrame."
            )
        X_train_wf = X_train_wf.iloc[-len_train_index:]
        if not len(X_train_wf) == len_train_index:
            raise ValueError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a DataFrame with the same number of rows as "
                f"the input time series - (`window_size` + (`steps` - 1)): {len_train_index}."
            )
        X_train_wf.index = train_index

        X_train_wf.columns = [f'{y.name}_{col}' for col in X_train_wf.columns]
        X_train_window_features_names_out_.extend(X_train_wf.columns)
        if not X_as_pandas:
            X_train_wf = X_train_wf.to_numpy()
        X_train_window_features.append(X_train_wf)

    return X_train_window_features, X_train_window_features_names_out_
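The method above only requires that each window feature object expose a `transform_batch` method returning a pandas DataFrame. As a minimal sketch, a custom object could look like the class below. `RollingMeanSketch` is a hypothetical name, not a class provided by the library, and a real window feature may need further attributes that this listing does not show.

```python
import pandas as pd


class RollingMeanSketch:
    """Hypothetical window feature; not a skforecast class."""

    def __init__(self, window_size: int):
        self.window_size = window_size

    def transform_batch(self, y: pd.Series) -> pd.DataFrame:
        # shift(1) keeps the current value out of its own feature. The first
        # `window_size` rows have no complete window and are dropped.
        roll = y.shift(1).rolling(self.window_size).mean()
        return roll.iloc[self.window_size:].to_frame(f"roll_mean_{self.window_size}")


y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], name="series_1")
features = RollingMeanSketch(window_size=3).transform_batch(y)

# Same renaming as in the method above: prefix columns with the series name.
features.columns = [f"{y.name}_{col}" for col in features.columns]
```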
Create training matrices from multiple time series and exogenous
variables. The resulting matrices contain the target variable and predictors
needed to train all the estimators (one per step).
Parameters:
Name
Type
Description
Default
series
pandas DataFrame
Training time series.
required
exog
pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as series and their indexes must be aligned.
None
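Returns:
Name
Type
Description
X_train
pandas DataFrame
Training values (predictors) for each step. Note that the index
corresponds to that of the last step. It is updated for the corresponding
step in the filter_train_X_y_for_step method.
y_train
dict
Values of the time series related to each row of X_train for each
step in the form {step: y_step_[i]}.
series_names_in_
list
Names of the series used during training.
X_train_series_names_in_
list
Names of the series added to X_train when creating the training
matrices with _create_train_X_y method. It is a subset of series_names_in_.
exog_names_in_
list
Names of the exogenous variables included in the training matrices.
X_train_exog_names_out_
list
Names of the exogenous variables included in the matrix X_train created
internally for training. It can be different from exog_names_in_ if
some exogenous variables are transformed during the training process.
X_train_features_names_out_
list
Names of the columns of the matrix created internally for training.
exog_dtypes_in_
dict
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
exog_dtypes_out_
dict
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.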
Source code in skforecast/direct/_forecaster_direct_multivariate.py
def _create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None
) -> tuple[
    pd.DataFrame,
    dict[int, pd.Series],
    list[str],
    list[str],
    list[str],
    list[str],
    list[str],
    dict[str, type],
    dict[str, type]
]:
    """
    Create training matrices from multiple time series and exogenous
    variables. The resulting matrices contain the target variable and predictors
    needed to train all the estimators (one per step).

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.

    Returns
    -------
    X_train : pandas DataFrame
        Training values (predictors) for each step. Note that the index
        corresponds to that of the last step. It is updated for the corresponding
        step in the filter_train_X_y_for_step method.
    y_train : dict
        Values of the time series related to each row of `X_train` for each
        step in the form {step: y_step_[i]}.
    series_names_in_ : list
        Names of the series used during training.
    X_train_series_names_in_ : list
        Names of the series added to `X_train` when creating the training
        matrices with `_create_train_X_y` method. It is a subset of
        `series_names_in_`.
    exog_names_in_ : list
        Names of the exogenous variables included in the training matrices.
    X_train_exog_names_out_ : list
        Names of the exogenous variables included in the matrix `X_train` created
        internally for training. It can be different from `exog_names_in_` if
        some exogenous variables are transformed during the training process.
    X_train_features_names_out_ : list
        Names of the columns of the matrix created internally for training.
    exog_dtypes_in_ : dict
        Type of each exogenous variable/s used in training before the transformation
        applied by `transformer_exog`. If `transformer_exog` is not used, it
        is equal to `exog_dtypes_out_`.
    exog_dtypes_out_ : dict
        Type of each exogenous variable/s used in training after the transformation
        applied by `transformer_exog`. If `transformer_exog` is not used, it
        is equal to `exog_dtypes_in_`.

    """

    if not isinstance(series, pd.DataFrame):
        raise TypeError(
            f"`series` must be a pandas DataFrame. Got {type(series)}."
        )

    if len(series) < self.window_size + self.max_step:
        raise ValueError(
            f"Minimum length of `series` for training this forecaster is "
            f"{self.window_size + self.max_step}. Reduce the number of "
            f"predicted steps, {self.max_step}, or the maximum "
            f"window_size, {self.window_size}, if no more data is available.\n"
            f" Length `series`: {len(series)}.\n"
            f" Max step : {self.max_step}.\n"
            f" Max window size: {self.window_size}.\n"
            f" Lags window size: {self.max_lag}.\n"
            f" Window features window size: {self.max_size_window_features}."
        )

    _, series_index = check_extract_values_and_index(
        data=series, data_label="`series`", return_values=False
    )
    series_names_in_ = list(series.columns)

    if self.level not in series_names_in_:
        raise ValueError(
            f"One of the `series` columns must be named as the `level` of the forecaster.\n"
            f" Forecaster `level` : {self.level}.\n"
            f" `series` columns : {series_names_in_}."
        )

    data_to_return_dict, X_train_series_names_in_ = (
        self._create_data_to_return_dict(series_names_in_=series_names_in_)
    )

    series_to_create_autoreg_features_and_y = [
        col for col in series_names_in_
        if col in X_train_series_names_in_ + [self.level]
    ]

    fit_transformer = False
    if not self.is_fitted:
        fit_transformer = True
        self.transformer_series_ = initialize_transformer_series(
            forecaster_name=type(self).__name__,
            series_names_in_=series_to_create_autoreg_features_and_y,
            transformer_series=self.transformer_series
        )

    if self.differentiation is None:
        self.differentiator_ = {
            serie: None for serie in series_to_create_autoreg_features_and_y
        }
    else:
        if not self.is_fitted:
            self.differentiator_ = {
                serie: copy(self.differentiator)
                for serie in series_to_create_autoreg_features_and_y
            }

    exog_names_in_ = None
    exog_dtypes_in_ = None
    exog_dtypes_out_ = None
    X_as_pandas = False
    if exog is not None:
        check_exog(exog=exog, allow_nan=True)
        exog = input_to_frame(data=exog, input_name='exog')

        _, exog_index = check_extract_values_and_index(
            data=exog, data_label='`exog`', ignore_freq=True, return_values=False
        )
        series_index_no_ws = series_index[self.window_size:]
        len_series = len(series)
        len_series_no_ws = len_series - self.window_size
        len_exog = len(exog)
        if not len_exog == len_series and not len_exog == len_series_no_ws:
            raise ValueError(
                f"Length of `exog` must be equal to the length of `series` (if "
                f"index is fully aligned) or length of `series` - `window_size` "
                f"(if `exog` starts after the first `window_size` values).\n"
                f" `exog` : ({exog_index[0]} -- {exog_index[-1]}) (n={len_exog})\n"
                f" `series` : ({series_index[0]} -- {series_index[-1]}) (n={len_series})\n"
                f" `series` - `window_size` : ({series_index_no_ws[0]} -- {series_index_no_ws[-1]}) (n={len_series_no_ws})"
            )

        exog_names_in_ = exog.columns.to_list()
        if len(set(exog_names_in_) - set(series_names_in_)) != len(exog_names_in_):
            raise ValueError(
                f"`exog` cannot contain a column named the same as one of "
                f"the series (column names of series).\n"
                f" `series` columns : {series_names_in_}.\n"
                f" `exog` columns : {exog_names_in_}."
            )

        # NOTE: Need here for filter_train_X_y_for_step to work without fitting
        self.exog_in_ = True
        exog_dtypes_in_ = get_exog_dtypes(exog=exog)

        exog = transform_dataframe(
            df=exog,
            transformer=self.transformer_exog,
            fit=fit_transformer,
            inverse_transform=False
        )

        check_exog_dtypes(exog, call_check_exog=True)
        exog_dtypes_out_ = get_exog_dtypes(exog=exog)
        X_as_pandas = any(
            not pd.api.types.is_numeric_dtype(dtype) or pd.api.types.is_bool_dtype(dtype)
            for dtype in set(exog.dtypes)
        )

        if len_exog == len_series:
            if not (exog_index == series_index).all():
                raise ValueError(
                    "When `exog` has the same length as `series`, the index "
                    "of `exog` must be aligned with the index of `series` "
                    "to ensure the correct alignment of values."
                )
            # The first `self.window_size` positions have to be removed from
            # exog since they are not in X_train.
            exog = exog.iloc[self.window_size:, ]
        else:
            if not (exog_index == series_index_no_ws).all():
                raise ValueError(
                    "When `exog` doesn't contain the first `window_size` "
                    "observations, the index of `exog` must be aligned with "
                    "the index of `series` minus the first `window_size` "
                    "observations to ensure the correct alignment of values."
                )

    X_train_autoreg = []
    X_train_window_features_names_out_ = [] if self.window_features is not None else None
    X_train_features_names_out_ = []
    train_index = series_index[self.window_size + (self.max_step - 1):]
    for col in series_to_create_autoreg_features_and_y:
        y_values = series[col].to_numpy(copy=True).ravel()

        if np.isnan(y_values).any():
            raise ValueError(f"Column '{col}' has missing values.")

        y_values = transform_numpy(
            array=y_values,
            transformer=self.transformer_series_[col],
            fit=fit_transformer,
            inverse_transform=False
        )

        if self.differentiation is not None:
            if not self.is_fitted:
                y_values = self.differentiator_[col].fit_transform(y_values)
            else:
                differentiator = copy(self.differentiator_[col])
                y_values = differentiator.fit_transform(y_values)

        X_train_autoreg_col = []
        X_train_lags, y_train_values = self._create_lags(
            y=y_values,
            lags=self.lags_[col],
            data_to_return=data_to_return_dict.get(col, None)
        )
        if X_train_lags is not None:
            X_train_autoreg_col.append(X_train_lags)
            X_train_features_names_out_.extend(self.lags_names[col])

        if col == self.level:
            y_train = y_train_values

        if self.window_features is not None:
            n_diff = 0 if self.differentiation is None else self.differentiation
            end_wf = None if self.max_step == 1 else -(self.max_step - 1)
            y_window_features = pd.Series(
                y_values[n_diff:end_wf], index=series_index[n_diff:end_wf], name=col
            )
            X_train_window_features, X_train_wf_names_out_ = (
                self._create_window_features(
                    y=y_window_features, X_as_pandas=False, train_index=train_index
                )
            )
            X_train_autoreg_col.extend(X_train_window_features)
            X_train_window_features_names_out_.extend(X_train_wf_names_out_)
            X_train_features_names_out_.extend(X_train_wf_names_out_)

        if X_train_autoreg_col:
            if len(X_train_autoreg_col) == 1:
                X_train_autoreg_col = X_train_autoreg_col[0]
            else:
                X_train_autoreg_col = np.concatenate(X_train_autoreg_col, axis=1)

            X_train_autoreg.append(X_train_autoreg_col)

    X_train = []
    len_train_index = len(train_index)
    if X_as_pandas:
        if len(X_train_autoreg) == 1:
            X_train_autoreg = X_train_autoreg[0]
        else:
            X_train_autoreg = np.concatenate(X_train_autoreg, axis=1)
        X_train_autoreg = pd.DataFrame(
            data=X_train_autoreg,
            columns=X_train_features_names_out_,
            index=train_index
        )
        X_train.append(X_train_autoreg)
    else:
        X_train.extend(X_train_autoreg)

    # NOTE: Need here for filter_train_X_y_for_step to work without fitting
    self.X_train_window_features_names_out_ = X_train_window_features_names_out_

    X_train_exog_names_out_ = None
    if exog is not None:
        X_train_exog_names_out_ = exog.columns.to_list()
        if X_as_pandas:
            exog_direct, X_train_direct_exog_names_out_ = exog_to_direct(
                exog=exog, steps=self.max_step
            )
            exog_direct.index = train_index
        else:
            exog_direct, X_train_direct_exog_names_out_ = exog_to_direct_numpy(
                exog=exog, steps=self.max_step
            )

        # NOTE: Need here for filter_train_X_y_for_step to work without fitting
        self.X_train_direct_exog_names_out_ = X_train_direct_exog_names_out_

        X_train_features_names_out_.extend(self.X_train_direct_exog_names_out_)
        X_train.append(exog_direct)

    if len(X_train) == 1:
        X_train = X_train[0]
    else:
        if X_as_pandas:
            X_train = pd.concat(X_train, axis=1)
        else:
            X_train = np.concatenate(X_train, axis=1)

    if X_as_pandas:
        X_train.index = train_index
    else:
        X_train = pd.DataFrame(
            data=X_train,
            index=train_index,
            columns=X_train_features_names_out_
        )

    y_train = {
        step: pd.Series(
            data=y_train[:, step - 1],
            index=series_index[self.window_size + step - 1:][:len_train_index],
            name=f"{self.level}_step_{step}"
        )
        for step in self.steps
    }

    return (
        X_train,
        y_train,
        series_names_in_,
        X_train_series_names_in_,
        exog_names_in_,
        X_train_exog_names_out_,
        X_train_features_names_out_,
        exog_dtypes_in_,
        exog_dtypes_out_
    )
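In the method above, exogenous variables are expanded so that each step gets its own block of columns. The listing does not show how `exog_to_direct` and `exog_to_direct_numpy` are implemented, so the function below is only a guess at the general idea: `exog_to_direct_sketch` is a hypothetical helper that assumes the block for step s is the exogenous data shifted s - 1 positions forward in time.

```python
import numpy as np


def exog_to_direct_sketch(exog: np.ndarray, steps: int) -> np.ndarray:
    """Hypothetical helper: one block of exog columns per forecast step."""
    n_rows = len(exog) - (steps - 1)
    # Assumption: the block for step s holds the exog values aligned
    # with the target that lies s steps ahead.
    blocks = [exog[s : s + n_rows] for s in range(steps)]
    return np.concatenate(blocks, axis=1)


exog = np.arange(5, dtype=float).reshape(-1, 1)  # one exog column: 0..4
exog_direct = exog_to_direct_sketch(exog, steps=2)
```

With two steps, each training row carries the exogenous value for step 1 and the one for step 2 side by side. This layout is consistent with the "_step_i" column suffixes that filter_train_X_y_for_step removes.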
Create training matrices from multiple time series and exogenous
variables. The resulting matrices contain the target variable and predictors
needed to train all the estimators (one per step).
Parameters:
Name
Type
Description
Default
series
pandas DataFrame
Training time series.
required
exog
pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as series and their indexes must be aligned.
None
suppress_warnings
bool
If True, skforecast warnings will be suppressed during the creation
of the training matrices. See skforecast.exceptions.warn_skforecast_categories
for more information.
False
Returns:
Name
Type
Description
X_train
pandas DataFrame
Training values (predictors) for each step. Note that the index
corresponds to that of the last step. It is updated for the corresponding
step in the filter_train_X_y_for_step method.
y_train
dict
Values of the time series related to each row of X_train for each
step in the form {step: y_step_[i]}.
def create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> tuple[pd.DataFrame, dict[int, pd.Series]]:
    """
    Create training matrices from multiple time series and exogenous
    variables. The resulting matrices contain the target variable and predictors
    needed to train all the estimators (one per step).

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the creation
        of the training matrices. See skforecast.exceptions.warn_skforecast_categories
        for more information.

    Returns
    -------
    X_train : pandas DataFrame
        Training values (predictors) for each step. Note that the index
        corresponds to that of the last step. It is updated for the corresponding
        step in the filter_train_X_y_for_step method.
    y_train : dict
        Values of the time series related to each row of `X_train` for each
        step in the form {step: y_step_[i]}.

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    output = self._create_train_X_y(series=series, exog=exog)

    X_train = output[0]
    y_train = output[1]

    set_skforecast_warnings(suppress_warnings, action='default')

    return X_train, y_train
Select the columns needed to train a forecaster for a specific step.
The input matrices should be created using the _create_train_X_y method.
This method updates the index of X_train to match that of the y_train
entry for the selected step. If remove_suffix=True, the suffix "_step_i"
is removed from the column names.
def filter_train_X_y_for_step(
    self,
    step: int,
    X_train: pd.DataFrame,
    y_train: dict[int, pd.Series],
    remove_suffix: bool = False
) -> tuple[pd.DataFrame, pd.Series]:
    """
    Select the columns needed to train a forecaster for a specific step.
    The input matrices should be created using `_create_train_X_y` method.
    This method updates the index of `X_train` to the corresponding one
    according to `y_train`. If `remove_suffix=True` the suffix "_step_i"
    will be removed from the column names.

    Parameters
    ----------
    step : int
        step for which columns must be selected. Starts at 1.
    X_train : pandas DataFrame
        Dataframe created with the `_create_train_X_y` method, first return.
    y_train : dict
        Dict created with the `_create_train_X_y` method, second return.
    remove_suffix : bool, default False
        If True, suffix "_step_i" is removed from the column names.

    Returns
    -------
    X_train_step : pandas DataFrame
        Training values (predictors) for the selected step.
    y_train_step : pandas Series
        Values of the time series related to each row of `X_train`.

    """

    if (step < 1) or (step > self.max_step):
        raise ValueError(
            f"Invalid value `step`. For this forecaster, minimum value is 1 "
            f"and the maximum step is {self.max_step}."
        )

    y_train_step = y_train[step]

    # Matrix X_train starts at index 0.
    if not self.exog_in_:
        X_train_step = X_train
    else:
        n_lags = len(list(
            chain(*[v for v in self.lags_.values() if v is not None])
        ))
        n_window_features = (
            len(self.X_train_window_features_names_out_)
            if self.window_features is not None
            else 0
        )
        idx_columns_autoreg = np.arange(n_lags + n_window_features)
        n_exog = len(self.X_train_direct_exog_names_out_) / self.max_step
        idx_columns_exog = (
            np.arange((step - 1) * n_exog, (step) * n_exog)
            + idx_columns_autoreg[-1] + 1
        )
        idx_columns = np.concatenate((idx_columns_autoreg, idx_columns_exog))
        X_train_step = X_train.iloc[:, idx_columns]

    X_train_step.index = y_train_step.index

    if remove_suffix:
        X_train_step.columns = [
            col_name.replace(f"_step_{step}", "")
            for col_name in X_train_step.columns
        ]
        y_train_step.name = y_train_step.name.replace(f"_step_{step}", "")

    return X_train_step, y_train_step
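The column arithmetic used above can be sketched in isolation. The function `step_column_indices` below is a hypothetical helper, not a skforecast function. It assumes the layout implied by the source: all autoregressive columns (lags and window features) come first, followed by one block of `n_exog` exogenous columns per step. It uses integer counts throughout, whereas the source derives `n_exog` with a true division.

```python
import numpy as np

def step_column_indices(step, n_autoreg, n_exog, max_step):
    """Positions of the columns needed to train the model for `step`."""
    if step < 1 or step > max_step:
        raise ValueError(f"`step` must be between 1 and {max_step}.")
    # Autoregressive predictors are shared by every step.
    idx_autoreg = np.arange(n_autoreg)
    # Each step owns a contiguous block of exogenous columns after them.
    idx_exog = np.arange((step - 1) * n_exog, step * n_exog) + n_autoreg
    return np.concatenate((idx_autoreg, idx_exog))

idx = step_column_indices(step=2, n_autoreg=3, n_exog=2, max_step=3)
```

With three autoregressive columns and two exogenous variables, step 2 selects positions 0, 1, 2 plus the block 5, 6.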
def_train_test_split_one_step_ahead(self,series:pd.DataFrame,initial_train_size:int,exog:pd.Series|pd.DataFrame|None=None)->tuple[pd.DataFrame,dict[int,pd.Series],pd.DataFrame,dict[int,pd.Series],pd.Series,pd.Series]:""" Create matrices needed to train and test the forecaster for one-step-ahead predictions. Parameters ---------- series : pandas DataFrame Training time series. initial_train_size : int Initial size of the training set. It is the number of observations used to train the forecaster before making the first prediction. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned. Returns ------- X_train : pandas DataFrame Predictor values used to train the model. y_train : dict Values of the time series related to each row of `X_train` for each step in the form {step: y_step_[i]}. X_test : pandas DataFrame Predictor values used to test the model. y_test : dict Values of the time series related to each row of `X_test` for each step in the form {step: y_step_[i]}. X_train_encoding : pandas Series Series identifiers for each row of `X_train`. X_test_encoding : pandas Series Series identifiers for each row of `X_test`. 
"""span_index=series.indexfold=[0,[0,initial_train_size],[initial_train_size-self.window_size,initial_train_size],[initial_train_size-self.window_size,len(span_index)],[0,0],# Dummy valueTrue]data_fold=_extract_data_folds_multiseries(series=series,folds=[fold],span_index=span_index,window_size=self.window_size,exog=exog,dropna_last_window=self.dropna_from_series,externally_fitted=False)series_train,_,levels_last_window,exog_train,exog_test,_=next(data_fold)start_test_idx=initial_train_size-self.window_sizeseries_test=series.iloc[start_test_idx:,:]series_test=series_test.loc[:,levels_last_window]series_test=series_test.dropna(axis=1,how='all')_is_fitted=self.is_fitted_series_names_in_=self.series_names_in__exog_names_in_=self.exog_names_in_self.is_fitted=FalseX_train,y_train,series_names_in_,_,exog_names_in_,*_=(self._create_train_X_y(series=series_train,exog=exog_train,))self.series_names_in_=series_names_in_ifexogisnotNone:self.exog_names_in_=exog_names_in_self.is_fitted=TrueX_test,y_test,*_=self._create_train_X_y(series=series_test,exog=exog_test,)self.is_fitted=_is_fittedself.series_names_in_=_series_names_in_self.exog_names_in_=_exog_names_in_X_train_encoding=pd.Series(self.level,index=X_train.index)X_test_encoding=pd.Series(self.level,index=X_test.index)returnX_train,y_train,X_test,y_test,X_train_encoding,X_test_encoding
def create_sample_weights(
    self,
    X_train: pd.DataFrame
) -> np.ndarray:
    """
    Create weights for each observation according to the forecaster's
    attribute `weight_func`.

    Parameters
    ----------
    X_train : pandas DataFrame
        Dataframe created with `_create_train_X_y` and
        `filter_train_X_y_for_step` methods, first return.

    Returns
    -------
    sample_weight : numpy ndarray
        Weights to use in `fit` method.

    """

    sample_weight = None

    if self.weight_func is not None:
        sample_weight = self.weight_func(X_train.index)

    if sample_weight is not None:
        if np.isnan(sample_weight).any():
            raise ValueError(
                "The resulting `sample_weight` cannot have NaN values."
            )
        if np.any(sample_weight < 0):
            raise ValueError(
                "The resulting `sample_weight` cannot have negative values."
            )
        if np.sum(sample_weight) == 0:
            raise ValueError(
                ("The resulting `sample_weight` cannot be normalized because "
                 "the sum of the weights is zero.")
            )

    return sample_weight
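As an illustration of what a weight function might look like and how the three checks behave, here is a small standalone sketch. Both `validated_weights` and `downweight_early` are hypothetical names introduced for this example; the checks mirror those in the source, but the code does not call skforecast.

```python
import numpy as np

def validated_weights(weight_func, index):
    """Apply `weight_func` to an index and run the same three sanity checks."""
    if weight_func is None:
        return None
    weights = np.asarray(weight_func(index), dtype=float)
    if np.isnan(weights).any():
        raise ValueError("Weights cannot contain NaN values.")
    if np.any(weights < 0):
        raise ValueError("Weights cannot be negative.")
    if np.sum(weights) == 0:
        raise ValueError("Weights cannot sum to zero.")
    return weights

def downweight_early(index):
    # Hypothetical rule: observations at positions below 3 count half as much.
    return np.where(np.asarray(index) < 3, 0.5, 1.0)

weights = validated_weights(downweight_early, np.arange(5))
```

The weight function receives only the index of the training matrix, so any rule must be expressible in terms of dates or positions.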
Additional arguments to be passed to the fit method of the estimator
can be added with the fit_kwargs argument when initializing the forecaster.
Parameters:
Name
Type
Description
Default
series
pandas DataFrame
Training time series.
required
exog
pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as series and their indexes must be aligned so
that series[i] is regressed on exog[i].
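None
store_last_window
bool
Whether or not to store the last window (last_window_) of training data.
True
store_in_sample_residuals
bool
If True, in-sample residuals will be stored in the forecaster object
after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_
attributes).
If False, only the intervals of the bins are stored.
False
random_state
int
Set a seed for the random generator so that the stored sample residuals
are always deterministic.
123
suppress_warnings
bool
If True, skforecast warnings will be suppressed during the training
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False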
Returns:
Type
Description
None
Source code in skforecast/direct/_forecaster_direct_multivariate.py
deffit(self,series:pd.DataFrame,exog:pd.Series|pd.DataFrame|None=None,store_last_window:bool=True,store_in_sample_residuals:bool=False,random_state:int=123,suppress_warnings:bool=False)->None:""" Training Forecaster. Additional arguments to be passed to the `fit` method of the estimator can be added with the `fit_kwargs` argument when initializing the forecaster. Parameters ---------- series : pandas DataFrame Training time series. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned so that series[i] is regressed on exog[i]. store_last_window : bool, default True Whether or not to store the last window (`last_window_`) of training data. store_in_sample_residuals : bool, default False If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored. random_state : int, default 123 Set a seed for the random generator so that the stored sample residuals are always deterministic. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the training process. See skforecast.exceptions.warn_skforecast_categories for more information. 
Returns ------- None """set_skforecast_warnings(suppress_warnings,action='ignore')# Reset values in case the forecaster has already been fitted.self.lags_=Noneself.last_window_=Noneself.index_type_=Noneself.index_freq_=Noneself.training_range_=Noneself.series_names_in_=Noneself.exog_in_=Falseself.exog_names_in_=Noneself.exog_type_in_=Noneself.exog_dtypes_in_=Noneself.exog_dtypes_out_=Noneself.X_train_series_names_in_=Noneself.X_train_window_features_names_out_=Noneself.X_train_exog_names_out_=Noneself.X_train_direct_exog_names_out_=Noneself.X_train_features_names_out_=Noneself.in_sample_residuals_=Noneself.in_sample_residuals_by_bin_=Noneself.binner={}self.binner_intervals_={}self.is_fitted=Falseself.fit_date=None(X_train,y_train,series_names_in_,X_train_series_names_in_,exog_names_in_,X_train_exog_names_out_,X_train_features_names_out_,exog_dtypes_in_,exog_dtypes_out_)=self._create_train_X_y(series=series,exog=exog)deffit_forecaster(estimator,X_train,y_train,step):""" Auxiliary function to fit each of the forecaster's estimators in parallel. Parameters ---------- estimator : object Estimator to be fitted. X_train : pandas DataFrame Dataframe created with the `_create_train_X_y` method, first return. y_train : dict Dict created with the `_create_train_X_y` method, second return. step : int Step of the forecaster to be fitted. Returns ------- Tuple with the step, fitted estimator, in-sample residuals, true values and predicted values for the step. 
"""X_train_step,y_train_step=self.filter_train_X_y_for_step(step=step,X_train=X_train,y_train=y_train,remove_suffix=True)sample_weight=self.create_sample_weights(X_train=X_train_step)ifsample_weightisnotNone:estimator.fit(X=X_train_step,y=y_train_step,sample_weight=sample_weight,**self.fit_kwargs)else:estimator.fit(X=X_train_step,y=y_train_step,**self.fit_kwargs)# NOTE: This is done to save time during fit in functions such as backtesting()y_true_step=Noney_pred_step=Noneifself._probabilistic_modeisnotFalse:y_true_step=y_train_step.to_numpy()y_pred_step=estimator.predict(X_train_step)returnstep,estimator,y_true_step,y_pred_stepresults_fit=(Parallel(n_jobs=self.n_jobs)(delayed(fit_forecaster)(estimator=copy(self.estimator),X_train=X_train,y_train=y_train,step=step)forstepinself.steps))self.estimators_={step:estimatorforstep,estimator,*_inresults_fit}self.in_sample_residuals_={}self.in_sample_residuals_by_bin_={}ifself._probabilistic_modeisnotFalse:forlevelin[self.level]:y_true_level,y_pred_level=zip(*[(y_true,y_pred)for*_,y_true,y_predinresults_fit])self._binning_in_sample_residuals(level=level,y_true=np.concatenate(y_true_level),y_pred=np.concatenate(y_pred_level),store_in_sample_residuals=store_in_sample_residuals,random_state=random_state)ifnotstore_in_sample_residuals:forlevelin[self.level]:self.in_sample_residuals_[level]=Noneself.in_sample_residuals_by_bin_[level]=Noneself.series_names_in_=series_names_in_self.X_train_series_names_in_=X_train_series_names_in_self.X_train_features_names_out_=X_train_features_names_out_self.is_fitted=Trueself.fit_date=pd.Timestamp.today().strftime('%Y-%m-%d 
%H:%M:%S')self.training_range_=series.index[[0,-1]]self.index_type_=type(series.index)ifisinstance(series.index,pd.DatetimeIndex):self.index_freq_=series.index.freqelse:self.index_freq_=series.index.stepifexogisnotNone:self.exog_in_=Trueself.exog_names_in_=exog_names_in_self.exog_type_in_=type(exog)self.exog_dtypes_in_=exog_dtypes_in_self.exog_dtypes_out_=exog_dtypes_out_self.X_train_exog_names_out_=X_train_exog_names_out_ifstore_last_window:self.last_window_=series.iloc[-self.window_size:,][self.X_train_series_names_in_].copy()set_skforecast_warnings(suppress_warnings,action='default')
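The core of the training routine is that each step receives its own copy of the estimator, fitted on the shared predictors and that step's target. The sketch below shows this idea sequentially with a toy estimator. `MeanModel` and `fit_direct` are hypothetical names; the source instead dispatches the per-step fits through `Parallel` and `delayed`, and filters the columns per step first.

```python
import copy
import numpy as np

class MeanModel:
    """Hypothetical minimal estimator with a scikit-learn-like fit/predict."""

    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def fit_direct(estimator, X_train, y_by_step):
    """Fit one independent copy of `estimator` per forecast step."""
    fitted = {}
    for step, y_step in y_by_step.items():
        model = copy.copy(estimator)  # never reuse the same object across steps
        fitted[step] = model.fit(X_train, y_step)
    return fitted

X_train = np.zeros((4, 2))
y_by_step = {
    1: np.array([1.0, 2.0, 3.0, 4.0]),
    2: np.array([2.0, 3.0, 4.0, 5.0]),
}
estimators = fit_direct(MeanModel(), X_train, y_by_step)
```

Because the copies are independent, the per-step fits have no data dependencies on one another, which is what makes them straightforward to run in parallel.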
Bin residuals according to the predicted value each residual is
associated with. First a skforecast.preprocessing.QuantileBinner object
is fitted to the predicted values. Then, residuals are binned according
to the predicted value each residual is associated with. Residuals are
stored in the forecaster object as in_sample_residuals_ and
in_sample_residuals_by_bin_.
y_true and y_pred are assumed to be differentiated and/or transformed
according to the attributes differentiation and transformer_series.
The number of residuals stored per bin is limited to
10_000 // self.binner[level].n_bins_. The total number of residuals stored
is at most 10_000.
New in version 0.15.0
If True, in-sample residuals will be stored in the forecaster object
after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_
attributes).
If False, only the intervals of the bins are stored.
def _binning_in_sample_residuals(
    self,
    level: str,
    y_true: np.ndarray,
    y_pred: np.ndarray,
    store_in_sample_residuals: bool = False,
    random_state: int = 123
) -> None:
    """
    Bin residuals according to the predicted value each residual is
    associated with. First a `skforecast.preprocessing.QuantileBinner` object
    is fitted to the predicted values. Then, residuals are binned according
    to the predicted value each residual is associated with. Residuals are
    stored in the forecaster object as `in_sample_residuals_` and
    `in_sample_residuals_by_bin_`.

    `y_true` and `y_pred` assumed to be differentiated and/or transformed
    according to the attributes `differentiation` and `transformer_series`.
    The number of residuals stored per bin is limited to
    `10_000 // self.binner.n_bins_`. The total number of residuals stored is
    `10_000`.
    **New in version 0.15.0**

    Parameters
    ----------
    level : str
        Name of the series (level) to store the residuals.
    y_true : numpy ndarray
        True values of the time series.
    y_pred : numpy ndarray
        Predicted values of the time series.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample
        residuals are always deterministic.

    Returns
    -------
    None

    """

    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    residuals = y_true - y_pred

    if self._probabilistic_mode == "binned":
        data = pd.DataFrame({'prediction': y_pred, 'residuals': residuals})
        self.binner[level] = QuantileBinner(**self.binner_kwargs)
        self.binner[level].fit(y_pred)
        self.binner_intervals_[level] = self.binner[level].intervals_

    if store_in_sample_residuals:
        rng = np.random.default_rng(seed=random_state)
        if self._probabilistic_mode == "binned":
            data['bin'] = self.binner[level].transform(y_pred).astype(int)
            self.in_sample_residuals_by_bin_[level] = (
                data.groupby('bin')['residuals'].apply(np.array).to_dict()
            )

            max_sample = 10_000 // self.binner[level].n_bins_
            for k, v in self.in_sample_residuals_by_bin_[level].items():
                if len(v) > max_sample:
                    sample = v[rng.integers(low=0, high=len(v), size=max_sample)]
                    self.in_sample_residuals_by_bin_[level][k] = sample
        else:
            self.in_sample_residuals_by_bin_[level] = None

        if len(residuals) > 10_000:
            residuals = residuals[
                rng.integers(low=0, high=len(residuals), size=10_000)
            ]

        self.in_sample_residuals_[level] = residuals
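The binning idea can be approximated with plain NumPy, as in the sketch below. This is not the `QuantileBinner` implementation: the function `bin_residuals`, its parameters, and the use of `np.quantile` with `np.digitize` are assumptions chosen to illustrate grouping residuals by the quantile bin of their prediction and capping the number kept per bin.

```python
import numpy as np

def bin_residuals(y_true, y_pred, n_bins=4, max_total=10_000, seed=123):
    """Group residuals by the quantile bin of the prediction they belong to."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    # Interior quantiles of the predictions act as bin edges.
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(y_pred, edges)  # integer labels in 0..n_bins-1
    rng = np.random.default_rng(seed)
    max_per_bin = max_total // n_bins
    by_bin = {}
    for b in range(n_bins):
        in_bin = residuals[bins == b]
        if len(in_bin) > max_per_bin:
            # Subsample with replacement so each bin stays within its cap.
            in_bin = in_bin[rng.integers(low=0, high=len(in_bin), size=max_per_bin)]
        by_bin[b] = in_bin
    return by_bin, edges

y_pred = np.arange(100, dtype=float)
by_bin, edges = bin_residuals(y_pred + 1.0, y_pred, n_bins=4, max_total=40)
```

Keeping residuals per bin allows the error sampled for a new prediction to come from past predictions of similar magnitude.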
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas Series, pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
def_create_predict_inputs(self,steps:int|list[int]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,predict_probabilistic:bool=False,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,check_inputs:bool=True)->tuple[list[np.ndarray],list[str],list[int],pd.Index]:""" Create the inputs needed for the prediction process. Parameters ---------- steps : int, list, None, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as were defined at initialization. last_window : pandas Series, pandas DataFrame, default None Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. predict_probabilistic : bool, default False If `True`, the necessary checks for probabilistic predictions will be performed. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. 
This argument is created for internal use and is not recommended to be changed. Returns ------- Xs : list List of numpy arrays with the predictors for each step. Xs_col_names : list Names of the columns of the matrix created internally for prediction. steps : list Steps to predict. prediction_index : pandas Index Index of the predictions. """steps=prepare_steps_direct(steps=steps,max_step=self.max_step)iflast_windowisNone:last_window=self.last_window_ifcheck_inputs:check_predict_input(forecaster_name=type(self).__name__,steps=steps,is_fitted=self.is_fitted,exog_in_=self.exog_in_,index_type_=self.index_type_,index_freq_=self.index_freq_,window_size=self.window_size,last_window=last_window,exog=exog,exog_names_in_=self.exog_names_in_,interval=None,max_step=self.max_step,series_names_in_=self.X_train_series_names_in_)ifpredict_probabilistic:check_residuals_input(forecaster_name=type(self).__name__,use_in_sample_residuals=use_in_sample_residuals,in_sample_residuals_=self.in_sample_residuals_,out_sample_residuals_=self.out_sample_residuals_,use_binned_residuals=use_binned_residuals,in_sample_residuals_by_bin_=self.in_sample_residuals_by_bin_,out_sample_residuals_by_bin_=self.out_sample_residuals_by_bin_,levels=[self.level],)last_window=last_window.iloc[-self.window_size:,last_window.columns.get_indexer(self.X_train_series_names_in_)].copy()X_autoreg=[]Xs_col_names=[]forseriesinself.X_train_series_names_in_:last_window_series=transform_numpy(array=last_window[series].to_numpy(),transformer=self.transformer_series_[series],fit=False,inverse_transform=False)ifself.differentiationisnotNone:last_window_series=self.differentiator_[series].fit_transform(last_window_series)ifself.lagsisnotNone:X_lags=last_window_series[-self.lags_[series]]X_autoreg.append(X_lags)Xs_col_names.extend(self.lags_names[series])ifself.window_featuresisnotNone:n_diff=0ifself.differentiationisNoneelseself.differentiationX_window_features=np.concatenate([wf.transform(last_window_series[n_diff:])forwfinse
lf.window_features])X_autoreg.append(X_window_features)# HACK: This is not the best way to do it. Can have any problem# if the window_features are not in the same order as the# self.window_features_names.Xs_col_names.extend([f"{series}_{wf}"forwfinself.window_features_names])X_autoreg=np.concatenate(X_autoreg).reshape(1,-1)ifexogisnotNone:exog=input_to_frame(data=exog,input_name='exog')ifexog.columns.tolist()!=self.exog_names_in_:exog=exog[self.exog_names_in_]exog=transform_dataframe(df=exog,transformer=self.transformer_exog,fit=False,inverse_transform=False)# NOTE: Only check dtypes if they are not the same as seen in trainingifnotexog.dtypes.to_dict()==self.exog_dtypes_out_:check_exog_dtypes(exog=exog)else:check_exog(exog=exog,allow_nan=False)exog_values,_=exog_to_direct_numpy(exog=exog.to_numpy()[:max(steps)],steps=max(steps))exog_values=exog_values[0]n_exog=exog.shape[1]Xs=[np.concatenate([X_autoreg,exog_values[(step-1)*n_exog:step*n_exog].reshape(1,-1)],axis=1)forstepinsteps]# HACK: This is not the best way to do it. Can have any problem# if the exog_columns are not in the same order as the# self.window_features_names.Xs_col_names=Xs_col_names+exog.columns.to_list()else:Xs=[X_autoreg]*len(steps)prediction_index=expand_index(index=last_window.index,steps=max(steps))[np.array(steps)-1]ifisinstance(last_window.index,pd.DatetimeIndex)andnp.array_equal(steps,np.arange(min(steps),max(steps)+1)):prediction_index.freq=last_window.index.freq# HACK: Why no use self.X_train_features_names_out_ as Xs_col_names?returnXs,Xs_col_names,steps,prediction_index
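Two indexing steps in this method are easy to misread: selecting lags from the end of the last window, and slicing the exogenous values that belong to each step. The sketch below reproduces both with small arrays. The flattened, step-major layout of `exog_flat` is an assumption made for illustration and is built here with `reshape` rather than with skforecast's `exog_to_direct_numpy`.

```python
import numpy as np

lags = np.array([1, 2, 3])
last_window = np.array([10.0, 11.0, 12.0, 13.0])

# Negative indexing picks lag 1 (most recent value) first, then lag 2, lag 3.
X_lags = last_window[-lags]

# Future exogenous values: one row per step, one column per variable.
exog_future = np.array([[0.1, 1.0], [0.2, 2.0], [0.3, 3.0]])
n_exog = exog_future.shape[1]
exog_flat = exog_future.reshape(-1)  # step 1 values, then step 2, then step 3

steps = [1, 3]
Xs = [
    np.concatenate(
        [X_lags, exog_flat[(step - 1) * n_exog: step * n_exog]]
    ).reshape(1, -1)
    for step in steps
]
```

Each element of `Xs` is a single-row matrix, since every step is predicted once from the same window.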
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
X_predict
pandas DataFrame
Pandas DataFrame with the predictors for each step. The index
is the same as the prediction index.
Source code in skforecast/direct/_forecaster_direct_multivariate.py
defcreate_predict_X(self,steps:int|list[int]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,suppress_warnings:bool=False,check_inputs:bool=True,levels:Any=None)->pd.DataFrame:""" Create the predictors needed to predict `steps` ahead. Parameters ---------- steps : int, list, None, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as were defined at initialization. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. levels : Ignored Not used, present here for API consistency by convention. Returns ------- X_predict : pandas DataFrame Pandas DataFrame with the predictors for each step. The index is the same as the prediction index. 
"""set_skforecast_warnings(suppress_warnings,action='ignore')(Xs,Xs_col_names,steps,prediction_index)=self._create_predict_inputs(steps=steps,last_window=last_window,exog=exog,check_inputs=check_inputs)X_predict=pd.DataFrame(data=np.concatenate(Xs,axis=0),columns=Xs_col_names,index=prediction_index)X_predict.insert(0,'level',np.tile([self.level],len(steps)))ifself.exog_in_:categorical_features=any(notpd.api.types.is_numeric_dtype(dtype)orpd.api.types.is_bool_dtype(dtype)fordtypeinset(self.exog_dtypes_out_.values()))ifcategorical_features:X_predict=X_predict.astype(self.exog_dtypes_out_)ifself.transformer_seriesisnotNoneorself.differentiationisnotNone:warnings.warn("The output matrix is in the transformed scale due to the ""inclusion of transformations or differentiation in the Forecaster. ""As a result, any predictions generated using this matrix will also ""be in the transformed scale. Please refer to the documentation ""for more details: ""https://skforecast.org/latest/user_guides/training-and-prediction-matrices.html",DataTransformationWarning)set_skforecast_warnings(suppress_warnings,action='default')returnX_predict
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the predictions. The columns are level
and pred.
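The long-format output described above can be illustrated without the library. In this sketch, `ConstantModel` is a hypothetical stand-in for a fitted estimator, `"series_a"` is an invented level name, and the prediction index is a plain `RangeIndex`. It only shows how one prediction per requested step is collected and paired with the matching index positions.

```python
import numpy as np
import pandas as pd

class ConstantModel:
    """Hypothetical stand-in for a fitted regressor."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return np.full(len(X), self.value)

# One fitted model per step, keyed by step number.
estimators = {1: ConstantModel(1.5), 2: ConstantModel(2.5), 3: ConstantModel(3.5)}
steps = [1, 3]
X = np.zeros((1, 4))  # a single row of predictors

# Each requested step is predicted by its own model.
preds = np.array([estimators[step].predict(X).ravel().item() for step in steps])

# Keep only the index positions of the requested steps.
full_index = pd.RangeIndex(start=100, stop=103)
prediction_index = full_index[np.array(steps) - 1]

predictions = pd.DataFrame(
    {"level": np.tile(["series_a"], len(steps)), "pred": preds},
    index=prediction_index,
)
```

When `steps` is a list with gaps, the resulting index has gaps too, which is why the source only assigns a frequency to the index for consecutive steps.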
Source code in skforecast/direct/_forecaster_direct_multivariate.py
defpredict(self,steps:int|list[int]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,suppress_warnings:bool=False,check_inputs:bool=True,levels:Any=None)->pd.DataFrame:""" Predict n steps ahead Parameters ---------- steps : int, list, None, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as were defined at initialization. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. levels : Ignored Not used, present here for API consistency by convention. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the predictions. The columns are `level` and `pred`. 
"""set_skforecast_warnings(suppress_warnings,action='ignore')(Xs,_,steps,prediction_index)=self._create_predict_inputs(steps=steps,last_window=last_window,exog=exog,check_inputs=check_inputs,)estimators=[self.estimators_[step]forstepinsteps]withwarnings.catch_warnings():warnings.filterwarnings("ignore",message="X does not have valid feature names",category=UserWarning)predictions=np.array([estimator.predict(X).ravel().item()forestimator,Xinzip(estimators,Xs)])ifself.differentiationisnotNone:predictions=(self.differentiator_[self.level].inverse_transform_next_window(predictions))predictions=transform_numpy(array=predictions,transformer=self.transformer_series_[self.level],fit=False,inverse_transform=True)# TODO: This DataFrame has freq because it only contain 1 level# TODO: Adapt to multiple levels# n_steps, n_levels = predictions.shape# predictions = pd.DataFrame(# {"level": np.tile(levels, n_steps), "pred": predictions.ravel()},# index = np.repeat(prediction_index, n_levels),# )predictions=pd.DataFrame({"level":np.tile([self.level],len(steps)),"pred":predictions},index=prediction_index,)set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
Generate multiple forecasting predictions using a bootstrapping process.
By sampling from a collection of past observed errors (the residuals),
each iteration of bootstrapping generates a different set of predictions.
See the References section for more information.
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
boot_predictions
pandas DataFrame
Long-format DataFrame with the bootstrapping predictions. The columns
are level, pred_boot_0, pred_boot_1, ..., pred_boot_{n_boot - 1}.
def predict_bootstrapping(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    random_state: int = 123,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Generate multiple forecasting predictions using a bootstrapping process.
    By sampling from a collection of past observed errors (the residuals),
    each iteration of bootstrapping generates a different set of predictions.
    See the References section for more information.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as were defined at
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions.
        If `False`, out of sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    boot_predictions : pandas DataFrame
        Long-format DataFrame with the bootstrapping predictions. The columns
        are `level`, `pred_boot_0`, `pred_boot_1`, ..., `pred_boot_n_boot`.

    References
    ----------
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    (
        Xs,
        _,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
        steps=steps,
        last_window=last_window,
        exog=exog,
        predict_probabilistic=True,
        use_in_sample_residuals=use_in_sample_residuals,
        use_binned_residuals=use_binned_residuals
    )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_[self.level]
        residuals_by_bin = self.in_sample_residuals_by_bin_[self.level]
    else:
        residuals = self.out_sample_residuals_[self.level]
        residuals_by_bin = self.out_sample_residuals_by_bin_[self.level]

    # NOTE: Predictors and residuals are transformed and differentiated
    estimators = [self.estimators_[step] for step in steps]
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="X does not have valid feature names",
            category=UserWarning
        )
        predictions = np.array([
            estimator.predict(X).ravel().item()
            for estimator, X in zip(estimators, Xs)
        ])

    rng = np.random.default_rng(seed=random_state)
    if not use_binned_residuals:
        sampled_residuals = residuals[
            rng.integers(low=0, high=residuals.size, size=(len(steps), n_boot))
        ]
    else:
        predicted_bins = self.binner[self.level].transform(predictions)
        sampled_residuals = np.full(
            shape=(predicted_bins.size, n_boot),
            fill_value=np.nan,
            order='C',
            dtype=float
        )
        for i, bin in enumerate(predicted_bins):
            sampled_residuals[i, :] = residuals_by_bin[bin][
                rng.integers(low=0, high=residuals_by_bin[bin].size, size=n_boot)
            ]

    boot_predictions = np.tile(predictions, (n_boot, 1)).T
    boot_columns = [f"pred_boot_{i}" for i in range(n_boot)]
    boot_predictions = boot_predictions + sampled_residuals

    if self.differentiation is not None:
        boot_predictions = (
            self.differentiator_[self.level]
            .inverse_transform_next_window(boot_predictions)
        )

    if self.transformer_series_[self.level]:
        boot_predictions = np.apply_along_axis(
            func1d=transform_numpy,
            axis=0,
            arr=boot_predictions,
            transformer=self.transformer_series_[self.level],
            fit=False,
            inverse_transform=True
        )

    # TODO: This DataFrame has freq because it only contain 1 level
    # TODO: Adapt to multiple levels
    boot_predictions = pd.DataFrame(
        data=boot_predictions,
        index=prediction_index,
        columns=boot_columns
    )
    boot_predictions.insert(0, 'level', np.tile([self.level], len(steps)))

    set_skforecast_warnings(suppress_warnings, action='default')

    return boot_predictions
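The core of the non-binned bootstrapping path can be illustrated in isolation. The following is a minimal sketch, not the library's implementation: the point forecasts, the residual pool and the variable names are hypothetical toy values, and transformations and differencing are left out. It shows how one residual is drawn per (step, iteration) pair and added to the point forecast, and that the resulting columns run from pred_boot_0 to pred_boot_(n_boot - 1).

```python
import numpy as np

# Hypothetical inputs: one point forecast per step and a pool of past residuals.
point_forecasts = np.array([10.0, 11.0, 12.0])   # 3 steps
residual_pool = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
n_boot = 4

rng = np.random.default_rng(seed=123)

# Draw one residual index per (step, iteration) pair, with replacement.
idx = rng.integers(
    low=0, high=residual_pool.size, size=(point_forecasts.size, n_boot)
)
sampled_residuals = residual_pool[idx]

# Repeat each point forecast across the bootstrap columns and add the residuals.
boot_predictions = np.tile(point_forecasts, (n_boot, 1)).T + sampled_residuals

boot_columns = [f"pred_boot_{i}" for i in range(n_boot)]
print(boot_columns)            # ['pred_boot_0', 'pred_boot_1', 'pred_boot_2', 'pred_boot_3']
print(boot_predictions.shape)  # (3, 4)
```

With binned residuals the only change is that each step draws from the residual pool associated with the bin of its predicted value rather than from a single shared pool.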
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as a proxy for the
prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
def _predict_interval_conformal(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    nominal_coverage: float = 0.95,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True
) -> pd.DataFrame:
    """
    Generate prediction intervals using the conformal prediction
    split method [1]_.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as were defined at
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    nominal_coverage : float, default 0.95
        Nominal coverage, also known as expected coverage, of the prediction
        intervals. Must be between 0 and 1.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions.
        If `False`, out of sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values
        (binned selection).
        If `False`, residuals are selected randomly.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    References
    ----------
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    (
        Xs,
        _,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
        steps=steps,
        last_window=last_window,
        exog=exog,
        predict_probabilistic=True,
        use_in_sample_residuals=use_in_sample_residuals,
        use_binned_residuals=use_binned_residuals
    )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_[self.level]
        residuals_by_bin = self.in_sample_residuals_by_bin_[self.level]
    else:
        residuals = self.out_sample_residuals_[self.level]
        residuals_by_bin = self.out_sample_residuals_by_bin_[self.level]

    # NOTE: Predictors and residuals are transformed and differentiated
    estimators = [self.estimators_[step] for step in steps]
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="X does not have valid feature names",
            category=UserWarning
        )
        predictions = np.array([
            estimator.predict(X).ravel().item()
            for estimator, X in zip(estimators, Xs)
        ])

    if use_binned_residuals:
        correction_factor_by_bin = {
            k: np.quantile(np.abs(v), nominal_coverage)
            for k, v in residuals_by_bin.items()
        }
        replace_func = np.vectorize(lambda x: correction_factor_by_bin[x])
        predictions_bin = self.binner[self.level].transform(predictions)
        correction_factor = replace_func(predictions_bin)
    else:
        correction_factor = np.quantile(np.abs(residuals), nominal_coverage)

    lower_bound = predictions - correction_factor
    upper_bound = predictions + correction_factor
    predictions = np.column_stack([predictions, lower_bound, upper_bound])

    if self.differentiation is not None:
        predictions = (
            self.differentiator_[self.level]
            .inverse_transform_next_window(predictions)
        )

    if self.transformer_series_[self.level]:
        predictions = np.apply_along_axis(
            func1d=transform_numpy,
            axis=0,
            arr=predictions,
            transformer=self.transformer_series_[self.level],
            fit=False,
            inverse_transform=True
        )

    predictions = pd.DataFrame(
        data=predictions,
        index=prediction_index,
        columns=["pred", "lower_bound", "upper_bound"]
    )
    predictions.insert(0, 'level', np.tile([self.level], len(steps)))

    return predictions
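The non-binned branch of the conformal method reduces to a single quantile of the absolute residuals. The sketch below reproduces that calculation on hypothetical toy arrays (the forecasts and residuals are invented for illustration); it omits the inverse differencing and inverse transformation steps that the method applies afterwards.

```python
import numpy as np

# Hypothetical point forecasts and calibration residuals.
predictions = np.array([10.0, 20.0])
residuals = np.array([-2.0, -1.0, 0.5, 1.0, 3.0])
nominal_coverage = 0.8

# The half-width of the interval is the `nominal_coverage` quantile
# of the absolute residuals, shared by every step.
correction_factor = np.quantile(np.abs(residuals), nominal_coverage)

lower_bound = predictions - correction_factor
upper_bound = predictions + correction_factor
intervals = np.column_stack([predictions, lower_bound, upper_bound])
print(intervals)
```

Because the same half-width is subtracted and added, the interval is always symmetric around the point forecast. With binned residuals, a separate half-width is computed per bin and each step uses the one matching the bin of its prediction.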
Predict n steps ahead and estimate prediction intervals using either
bootstrapping or conformal prediction methods. Refer to the References
section for additional details on these methods.
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
Confidence level of the prediction interval. Interpretation depends
on the method used:
If float, represents the nominal (expected) coverage (between 0
and 1). For instance, interval=0.95 corresponds to the [2.5, 97.5]
percentiles.
If list or tuple, defines the exact percentiles to compute, which
must be between 0 and 100 inclusive. For example, a 95% interval
is specified as interval = [2.5, 97.5].
When using method='conformal', the interval must be a float or
a list/tuple defining a symmetric interval.
If True, residuals from the training data are used as a proxy for the
prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the predictions and the lower and upper
bounds of the estimated interval. The columns are level, pred,
lower_bound, upper_bound.
def predict_interval(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    method: str = 'conformal',
    interval: float | list[float] | tuple[float] = [5, 95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Predict n steps ahead and estimate prediction intervals using either
    bootstrapping or conformal prediction methods. Refer to the References
    section for additional details on these methods.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as were defined at
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    method : str, default 'conformal'
        Technique used to estimate prediction intervals. Available options:

        - 'bootstrapping': Bootstrapping is used to generate prediction
        intervals [1]_.
        - 'conformal': Employs the conformal prediction split method for
        interval estimation [2]_.
    interval : float, list, tuple, default [5, 95]
        Confidence level of the prediction interval. Interpretation depends
        on the method used:

        - If `float`, represents the nominal (expected) coverage (between 0
        and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]`
        percentiles.
        - If `list` or `tuple`, defines the exact percentiles to compute, which
        must be between 0 and 100 inclusive. For example, interval
        of 95% should be as `interval = [2.5, 97.5]`.
        - When using `method='conformal'`, the interval must be a float or
        a list/tuple defining a symmetric interval.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions.
        If `False`, out of sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the predictions and the lower and upper
        bounds of the estimated interval. The columns are `level`, `pred`,
        `lower_bound`, `upper_bound`.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    .. [2] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    if method == "bootstrapping":

        if isinstance(interval, (list, tuple)):
            check_interval(interval=interval, ensure_symmetric_intervals=False)
            interval = np.array(interval) / 100
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            interval = np.array([0.5 - interval / 2, 0.5 + interval / 2])

        boot_predictions = self.predict_bootstrapping(
            steps=steps,
            last_window=last_window,
            exog=exog,
            n_boot=n_boot,
            random_state=random_state,
            use_in_sample_residuals=use_in_sample_residuals,
            use_binned_residuals=use_binned_residuals
        )

        predictions = self.predict(
            steps=steps,
            last_window=last_window,
            exog=exog,
            check_inputs=False
        )

        boot_predictions[['lower_bound', 'upper_bound']] = (
            boot_predictions.iloc[:, 1:].quantile(q=interval, axis=1).transpose()
        )
        predictions = pd.concat(
            [predictions, boot_predictions[['lower_bound', 'upper_bound']]],
            axis=1
        )

    elif method == 'conformal':

        if isinstance(interval, (list, tuple)):
            check_interval(interval=interval, ensure_symmetric_intervals=True)
            nominal_coverage = (interval[1] - interval[0]) / 100
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            nominal_coverage = interval

        predictions = self._predict_interval_conformal(
            steps=steps,
            last_window=last_window,
            exog=exog,
            nominal_coverage=nominal_coverage,
            use_in_sample_residuals=use_in_sample_residuals,
            use_binned_residuals=use_binned_residuals
        )
    else:
        raise ValueError(
            f"Invalid `method` '{method}'. Choose 'bootstrapping' or 'conformal'."
        )

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions
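The two accepted forms of the interval argument are converted differently depending on the method. The helper functions below are hypothetical (they are not part of skforecast) and only mirror the arithmetic in the source above, without the validation done by check_interval: bootstrapping needs a pair of quantiles in [0, 1], while the conformal method needs a single nominal coverage.

```python
import math


def interval_to_quantiles(interval):
    """Hypothetical helper: (lower, upper) quantiles in [0, 1] for bootstrapping."""
    if isinstance(interval, (list, tuple)):
        lower, upper = interval
        return lower / 100, upper / 100          # percentiles -> quantiles
    return 0.5 - interval / 2, 0.5 + interval / 2  # coverage -> symmetric quantiles


def interval_to_coverage(interval):
    """Hypothetical helper: nominal coverage in [0, 1] for the conformal method."""
    if isinstance(interval, (list, tuple)):
        return (interval[1] - interval[0]) / 100
    return interval


# A float coverage of 0.95 and the percentile pair [2.5, 97.5] are equivalent.
q_from_float = interval_to_quantiles(0.95)
q_from_list = interval_to_quantiles([2.5, 97.5])
print(all(math.isclose(a, b) for a, b in zip(q_from_float, q_from_list)))  # True

# The default [5, 95] corresponds to a nominal coverage of 0.9.
print(interval_to_coverage([5, 95]))  # 0.9
```

This also makes the symmetry requirement for method='conformal' easier to see: only the width of the interval is kept, so an asymmetric pair such as [1, 91] would lose information.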
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
Sequence of quantiles to compute, which must be between 0 and 1
inclusive. For example, the quantiles 0.05, 0.5 and 0.95 are specified as
quantiles = [0.05, 0.5, 0.95].
If True, residuals from the training data are used as a proxy for the
prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the quantiles predicted by the forecaster.
For example, if quantiles = [0.05, 0.5, 0.95], the columns are
level, q_0.05, q_0.5, q_0.95.
def predict_quantiles(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    quantiles: list[float] | tuple[float] = [0.05, 0.5, 0.95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Bootstrapping based predicted quantiles.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as were defined at
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    quantiles : list, tuple, default [0.05, 0.5, 0.95]
        Sequence of quantiles to compute, which must be between 0 and 1
        inclusive. For example, quantiles of 0.05, 0.5 and 0.95 should be as
        `quantiles = [0.05, 0.5, 0.95]`.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating quantiles.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions.
        If `False`, out of sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the quantiles predicted by the forecaster.
        For example, if `quantiles = [0.05, 0.5, 0.95]`, the columns are
        `level`, `q_0.05`, `q_0.5`, `q_0.95`.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    check_interval(quantiles=quantiles)

    predictions = self.predict_bootstrapping(
        steps=steps,
        last_window=last_window,
        exog=exog,
        n_boot=n_boot,
        random_state=random_state,
        use_in_sample_residuals=use_in_sample_residuals,
        use_binned_residuals=use_binned_residuals
    )

    quantiles_cols = [f'q_{q}' for q in quantiles]
    predictions[quantiles_cols] = (
        predictions.iloc[:, 1:].quantile(q=quantiles, axis=1).transpose()
    )
    predictions = predictions[['level'] + quantiles_cols]

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions
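The quantile step operates row by row on the bootstrapped predictions. The following sketch applies the same pandas calls to a small hand-made DataFrame; the series name and the values are hypothetical, and in practice the input would come from predict_bootstrapping.

```python
import pandas as pd

# Hypothetical bootstrapping output: one row per step, one column per iteration.
boot = pd.DataFrame({
    "level": ["series_1", "series_1"],
    "pred_boot_0": [1.0, 10.0],
    "pred_boot_1": [2.0, 20.0],
    "pred_boot_2": [3.0, 30.0],
})

quantiles = [0.05, 0.5, 0.95]
quantiles_cols = [f"q_{q}" for q in quantiles]

# Row-wise quantiles over the bootstrap columns (everything except `level`).
boot[quantiles_cols] = boot.iloc[:, 1:].quantile(q=quantiles, axis=1).transpose()
result = boot[["level"] + quantiles_cols]

print(list(result.columns))      # ['level', 'q_0.05', 'q_0.5', 'q_0.95']
print(result["q_0.5"].tolist())  # [2.0, 20.0]
```

The column names are built directly from the float values, so quantiles=[0.5] yields q_0.5 rather than q_50.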
Fit a given probability distribution for each step. After generating
multiple forecasting predictions through a bootstrapping process, each
step is fitted to the given distribution.
Predict n steps. The value of steps must be less than or equal to the
value of steps defined when initializing the forecaster. Starts at 1.
If int: Only steps within the range of 1 to int are predicted.
If list: List of ints. Only the steps contained in the list
are predicted.
If None: As many steps are predicted as were defined at
initialization.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as a proxy for the
prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
levels
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the parameters of the fitted distribution
for each step. The columns are level, param_0, param_1, ...,
param_n, where param_i are the parameters of the distribution.
def predict_dist(
    self,
    distribution: object,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Fit a given probability distribution for each step. After generating
    multiple forecasting predictions through a bootstrapping process, each
    step is fitted to the given distribution.

    Parameters
    ----------
    distribution : object
        A distribution object from scipy.stats with methods `_pdf` and `fit`.
        For example scipy.stats.norm.
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as were defined at
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions.
        If `False`, out of sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the parameters of the fitted distribution
        for each step. The columns are `level`, `param_0`, `param_1`, ...,
        `param_n`, where `param_i` are the parameters of the distribution.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    if not hasattr(distribution, "_pdf") or not callable(getattr(distribution, "fit", None)):
        raise TypeError(
            "`distribution` must be a valid probability distribution object "
            "from scipy.stats, with methods `_pdf` and `fit`."
        )

    set_skforecast_warnings(suppress_warnings, action='ignore')

    predictions = self.predict_bootstrapping(
        steps=steps,
        last_window=last_window,
        exog=exog,
        n_boot=n_boot,
        random_state=random_state,
        use_in_sample_residuals=use_in_sample_residuals,
        use_binned_residuals=use_binned_residuals
    )

    param_names = [
        p for p in inspect.signature(distribution._pdf).parameters
        if not p == "x"
    ] + ["loc", "scale"]

    predictions[param_names] = (
        predictions.iloc[:, 1:].apply(
            lambda x: distribution.fit(x), axis=1, result_type='expand'
        )
    )
    predictions = predictions[['level'] + param_names]

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions
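The way parameter names are derived and the distribution is fitted per step can be tried outside the forecaster. This is a minimal sketch with simulated bootstrap samples (the means 5.0 and 8.0 are arbitrary, hypothetical values) and scipy.stats.norm as the distribution; it mirrors the name-building and fitting logic of the source above but skips the DataFrame handling.

```python
import inspect

import numpy as np
from scipy import stats

distribution = stats.norm

# Shape parameters come from the signature of `_pdf` (minus `x`);
# `loc` and `scale` are always appended.
param_names = [
    p for p in inspect.signature(distribution._pdf).parameters if p != "x"
] + ["loc", "scale"]
print(param_names)  # ['loc', 'scale'] for the normal distribution

# Simulated bootstrap predictions: 2 steps, 500 iterations each.
rng = np.random.default_rng(123)
boot = rng.normal(loc=[[5.0], [8.0]], scale=1.0, size=(2, 500))

# One fit per step (row); each fit returns the values in `param_names` order.
params = np.array([distribution.fit(row) for row in boot])
print(params.shape)  # (2, 2)
```

For a distribution with shape parameters, such as scipy.stats.gamma, the list would start with the shape name (a) followed by loc and scale, which is why the number of output columns depends on the distribution passed.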
Set new values to the parameters of the scikit-learn model stored in the
forecaster. It is important to note that all models share the same
configuration of parameters and hyperparameters.
def set_params(
    self,
    params: dict[str, object]
) -> None:
    """
    Set new values to the parameters of the scikit-learn model stored in the
    forecaster. It is important to note that all models share the same
    configuration of parameters and hyperparameters.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns
    -------
    None

    """

    self.estimator = clone(self.estimator)
    self.estimator.set_params(**params)
    self.estimators_ = {step: clone(self.estimator) for step in self.steps}
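The effect of this method on the per-step models can be reproduced with plain scikit-learn objects. The sketch below is illustrative only: Ridge, the alpha values and the three steps are hypothetical choices, not something the forecaster prescribes.

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge

# Hypothetical base estimator and forecast horizon.
estimator = Ridge(alpha=1.0)
steps = [1, 2, 3]

# Clone first so the object originally passed by the user is left untouched,
# then update the parameters and rebuild one independent copy per step.
estimator = clone(estimator)
estimator.set_params(alpha=0.1)
estimators_ = {step: clone(estimator) for step in steps}

print({step: est.get_params()["alpha"] for step, est in estimators_.items()})
# {1: 0.1, 2: 0.1, 3: 0.1}
```

Since clone returns unfitted copies, the rebuilt per-step models hold no learned coefficients, so the forecaster presumably needs to be fitted again before predicting.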
def set_fit_kwargs(
    self,
    fit_kwargs: dict[str, object]
) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit`
    method of the estimator.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns
    -------
    None

    """

    self.fit_kwargs = check_select_fit_kwargs(self.estimator, fit_kwargs=fit_kwargs)
def set_lags(
    self,
    lags: int | list[int] | np.ndarray[int] | range[int] | dict[str, int | list] | None = None,
) -> None:
    """
    Set new value to the attribute `lags`. Attributes `lags_names`,
    `max_lag` and `window_size` are also updated.

    Parameters
    ----------
    lags : int, list, numpy ndarray, range, dict, default None
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.

        - `int`: include lags from 1 to `lags` (included).
        - `list`, `1d numpy ndarray` or `range`: include only lags present in
        `lags`, all elements must be int.
        - `dict`: create different lags for each series. {'series_column_name': lags}.
        - `None`: no lags are included as predictors.

    Returns
    -------
    None

    """

    if self.window_features is None and lags is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    if isinstance(lags, dict):
        self.lags = {}
        self.lags_names = {}
        list_max_lags = []
        for key in lags:
            if lags[key] is None:
                self.lags[key] = None
                self.lags_names[key] = None
            else:
                self.lags[key], lags_names, max_lag = initialize_lags(
                    forecaster_name=type(self).__name__,
                    lags=lags[key]
                )
                self.lags_names[key] = (
                    [f'{key}_{lag}' for lag in lags_names]
                    if lags_names is not None
                    else None
                )
                if max_lag is not None:
                    list_max_lags.append(max_lag)

        self.max_lag = max(list_max_lags) if len(list_max_lags) != 0 else None
    else:
        self.lags, self.lags_names, self.max_lag = initialize_lags(
            forecaster_name=type(self).__name__,
            lags=lags
        )

    # Repeated here in case of lags is a dict with all values as None
    if self.window_features is None and (lags is None or self.max_lag is None):
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features]
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)
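The window size update at the end of this method follows a simple rule: take the larger of the maximum lag and the maximum window-feature size, ignoring whichever is None, then add the differentiation order if one is set. The function below is a hypothetical standalone helper (not part of skforecast) that restates that rule, together with the per-series maximum used when lags is a dict with explicit lists.

```python
def compute_window_size(max_lag, max_size_window_features, differentiation=None):
    """Hypothetical helper restating the window size rule used by the forecaster."""
    sizes = [ws for ws in (max_lag, max_size_window_features) if ws is not None]
    window_size = max(sizes)  # raises ValueError if both inputs are None
    if differentiation is not None:
        window_size += differentiation
    return window_size


# Lags given per series; a None entry means that series contributes no lags.
lags_by_series = {"series_a": [1, 2, 3], "series_b": [1, 7], "series_c": None}
max_lags = [max(lags) for lags in lags_by_series.values() if lags is not None]
max_lag = max(max_lags) if max_lags else None

print(max_lag)                                                # 7
print(compute_window_size(max_lag, None))                     # 7
print(compute_window_size(max_lag, 10, differentiation=1))    # 11
```

The extra observations added for differentiation are needed because differencing a series of order d consumes d leading values before any lag can be computed.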
Set new value to the attribute window_features. Attributes
max_size_window_features, window_features_names,
window_features_class_names and window_size are also updated.
Instance or list of instances used to create window features. Window features
are created from the original time series and are included as predictors.
None
Returns:
Type
Description
None
Source code in skforecast/direct/_forecaster_direct_multivariate.py
def set_window_features(
    self,
    window_features: object | list[object] | None = None
) -> None:
    """
    Set new value to the attribute `window_features`. Attributes
    `max_size_window_features`, `window_features_names`,
    `window_features_class_names` and `window_size` are also updated.

    Parameters
    ----------
    window_features : object, list, default None
        Instance or list of instances used to create window features. Window features
        are created from the original time series and are included as predictors.

    Returns
    -------
    None

    """

    if window_features is None and self.max_lag is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ]
    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features]
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)
Set in-sample residuals in case they were not calculated during the
training process.
In-sample residuals are calculated as the difference between the true
values and the predictions made by the forecaster using the training
data. The following internal attributes are updated:
in_sample_residuals_: Dictionary containing a numpy ndarray with the
residuals for each series in the form {series: residuals}.
binner_intervals_: intervals used to bin the residuals are calculated
using the quantiles of the predicted values.
in_sample_residuals_by_bin_: residuals are binned according to the
predicted value they are associated with and stored in a dictionary, where
the keys are the intervals of the predicted values and the values are
the residuals associated with that range.
At most 10_000 residuals are stored in the attribute in_sample_residuals_.
If more than 10_000 residuals are available, a random sample of
10_000 residuals is stored. The number of residuals stored per bin is
limited to 10_000 // self.binner.n_bins_.
Parameters:
Name
Type
Description
Default
series
pandas DataFrame
Training time series.
required
exog
pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as series and their indexes must be aligned so
that series[i] is regressed on exog[i].
If True, skforecast warnings will be suppressed during the sampling
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Type
Description
None
Source code in skforecast/direct/_forecaster_direct_multivariate.py
def set_in_sample_residuals(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> None:
    """
    Set in-sample residuals in case they were not calculated during the
    training process.

    In-sample residuals are calculated as the difference between the true
    values and the predictions made by the forecaster using the training data.
    The following internal attributes are updated:

    + `in_sample_residuals_`: Dictionary containing a numpy ndarray with the
    residuals for each series in the form `{series: residuals}`.
    + `binner_intervals_`: intervals used to bin the residuals are calculated
    using the quantiles of the predicted values.
    + `in_sample_residuals_by_bin_`: residuals are binned according to the
    predicted value they are associated with and stored in a dictionary, where
    the keys are the intervals of the predicted values and the values are
    the residuals associated with that range.

    A total of 10_000 residuals are stored in the attribute `in_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored. The number of residuals stored per bin is
    limited to `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned so
        that y[i] is regressed on exog[i].
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the sampling
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    None

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_in_sample_residuals()`."
        )

    check_y(y=series[self.level], series_id='`series`')
    series_index_range = check_extract_values_and_index(
        data=series, data_label='`series`', return_values=False
    )[1][[0, -1]]
    if not series_index_range.equals(self.training_range_):
        raise IndexError(
            f"The index range of `series` does not match the range "
            f"used during training. Please ensure the index is aligned "
            f"with the training data.\n"
            f" Expected : {self.training_range_}\n"
            f" Received : {series_index_range}"
        )

    # NOTE: These attributes are modified in _create_train_X_y, store original values
    original_exog_in_ = self.exog_in_
    original_X_train_window_features_names_out_ = self.X_train_window_features_names_out_
    original_X_train_direct_exog_names_out_ = self.X_train_direct_exog_names_out_

    (
        X_train,
        y_train,
        _,
        _,
        _,
        _,
        X_train_features_names_out_,
        *_
    ) = self._create_train_X_y(series=series, exog=exog)

    if not X_train_features_names_out_ == self.X_train_features_names_out_:

        # NOTE: Reset attributes modified in _create_train_X_y to their original values
        self.exog_in_ = original_exog_in_
        self.X_train_window_features_names_out_ = original_X_train_window_features_names_out_
        self.X_train_direct_exog_names_out_ = original_X_train_direct_exog_names_out_

        raise ValueError(
            f"Feature mismatch detected after matrix creation. The features "
            f"generated from the provided data do not match those used during "
            f"the training process. To correctly set in-sample residuals, "
            f"ensure that the same data and preprocessing steps are applied.\n"
            f" Expected output : {self.X_train_features_names_out_}\n"
            f" Current output : {X_train_features_names_out_}"
        )

    y_true_steps = []
    y_pred_steps = []
    self.in_sample_residuals_ = {}
    for step in self.steps:
        X_train_step, y_train_step = self.filter_train_X_y_for_step(
            step=step,
            X_train=X_train,
            y_train=y_train,
            remove_suffix=True
        )
        y_true_steps.append(y_train_step.to_numpy())
        y_pred_steps.append(self.estimators_[step].predict(X_train_step))

    self._binning_in_sample_residuals(
        level=self.level,
        y_true=np.concatenate(y_true_steps),
        y_pred=np.concatenate(y_pred_steps),
        store_in_sample_residuals=True,
        random_state=random_state
    )

    # NOTE: Reset attributes modified in _create_train_X_y to their original values
    self.exog_in_ = original_exog_in_
    self.X_train_window_features_names_out_ = original_X_train_window_features_names_out_
    self.X_train_direct_exog_names_out_ = original_X_train_direct_exog_names_out_

    set_skforecast_warnings(suppress_warnings, action='default')
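The docstring of `set_in_sample_residuals` describes residuals being binned by the quantiles of the predicted values, with a cap on the number stored per bin. The sketch below illustrates that idea with plain NumPy. It is not the skforecast implementation: the helper name `bin_residuals_by_prediction` and its parameters are hypothetical, and the library's own binner may place bin edges differently.

```python
import numpy as np


def bin_residuals_by_prediction(y_true, y_pred, n_bins=4, max_per_bin=None, seed=123):
    """Group residuals (y_true - y_pred) by the quantile bin of y_pred.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred

    # Bin edges are the quantiles of the predicted values.
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    # Use interior edges only, so bin ids fall in 0 .. n_bins - 1.
    bin_ids = np.digitize(y_pred, edges[1:-1])

    rng = np.random.default_rng(seed)
    residuals_by_bin = {}
    for b in range(n_bins):
        values = residuals[bin_ids == b]
        # Optionally cap the number of residuals kept in each bin.
        if max_per_bin is not None and len(values) > max_per_bin:
            values = rng.choice(values, size=max_per_bin, replace=False)
        residuals_by_bin[b] = values

    intervals = {b: (edges[b], edges[b + 1]) for b in range(n_bins)}
    return intervals, residuals_by_bin


# Usage: eight predictions, each with a residual of exactly 1.0
intervals, by_bin = bin_residuals_by_prediction(
    y_true=np.arange(8.0) + 1.0, y_pred=np.arange(8.0), n_bins=4
)
```

Keying residuals by the range of the predicted value lets an interval method draw errors that are typical for predictions of a similar magnitude.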
Set new values to the attribute out_sample_residuals_. Out of sample
residuals are meant to be calculated using observations that did not
participate in the training process. y_true and y_pred are expected
to be in the original scale of the time series. Residuals are calculated
as y_true - y_pred, after applying the necessary transformations and
differentiations if the forecaster includes them (self.transformer_series
and self.differentiation).
A total of 10_000 residuals are stored in the attribute out_sample_residuals_.
If the number of residuals is greater than 10_000, a random sample of
10_000 residuals is stored.
If True, new residuals are added to the ones already stored in the
attribute out_sample_residuals_. If after appending the new residuals,
the limit of 10_000 samples is exceeded, a random sample of 10_000 is
kept.
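The interaction between `append` and the 10_000-sample limit can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the library code: `update_residuals` and its arguments are hypothetical names, and the per-bin bookkeeping done by the forecaster is omitted.

```python
import numpy as np


def update_residuals(stored, new, append=False, max_samples=10_000, random_state=123):
    """Return the residuals to keep after adding `new`.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    new = np.asarray(new, dtype=float)
    if append and stored is not None:
        # Keep the old residuals and add the new ones.
        combined = np.concatenate([stored, new])
    else:
        # Overwrite whatever was stored before.
        combined = new

    if len(combined) > max_samples:
        # Over the limit: keep a reproducible random sample without replacement.
        rng = np.random.default_rng(seed=random_state)
        combined = rng.choice(combined, size=max_samples, replace=False)
    return combined


# Usage with a small limit so the cap is easy to see
first = update_residuals(None, np.ones(6), max_samples=10)
second = update_residuals(first, np.zeros(6), append=True, max_samples=10)
```

Here `first` holds 6 residuals and `second` holds 10, because the 12 combined values exceed the limit and are down-sampled.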
def set_out_sample_residuals(
    self,
    y_true: dict[str, np.ndarray | pd.Series],
    y_pred: dict[str, np.ndarray | pd.Series],
    append: bool = False,
    random_state: int = 123
) -> None:
    """
    Set new values to the attribute `out_sample_residuals_`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process. `y_true` and `y_pred` are expected
    to be in the original scale of the time series. Residuals are calculated
    as `y_true` - `y_pred`, after applying the necessary transformations and
    differentiations if the forecaster includes them (`self.transformer_series`
    and `self.differentiation`).

    A total of 10_000 residuals are stored in the attribute `out_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored.

    Parameters
    ----------
    y_true : dict
        Dictionary of numpy ndarrays or pandas Series with the true values of
        the time series for each series in the form {series: y_true}.
    y_pred : dict
        Dictionary of numpy ndarrays or pandas Series with the predicted values
        of the time series for each series in the form {series: y_pred}.
    append : bool, default False
        If `True`, new residuals are added to the ones already stored in the
        attribute `out_sample_residuals_`. If after appending the new residuals,
        the limit of 10_000 samples is exceeded, a random sample of 10_000 is
        kept.
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_out_sample_residuals()`."
        )

    if not isinstance(y_true, dict):
        raise TypeError(
            f"`y_true` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_true)}."
        )

    if not isinstance(y_pred, dict):
        raise TypeError(
            f"`y_pred` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_pred)}."
        )

    if not set(y_true.keys()) == set(y_pred.keys()):
        raise ValueError(
            f"`y_true` and `y_pred` must have the same keys. "
            f"Got {set(y_true.keys())} and {set(y_pred.keys())}."
        )

    for k in y_true.keys():
        if not isinstance(y_true[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_true` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_true[k])} for series {k}."
            )
        if not isinstance(y_pred[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_pred` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_pred[k])} for series {k}."
            )
        if len(y_true[k]) != len(y_pred[k]):
            raise ValueError(
                f"`y_true` and `y_pred` must have the same length. "
                f"Got {len(y_true[k])} and {len(y_pred[k])} for series {k}."
            )
        if isinstance(y_true[k], pd.Series) and isinstance(y_pred[k], pd.Series):
            if not y_true[k].index.equals(y_pred[k].index):
                raise ValueError(
                    f"When containing pandas Series, elements in `y_true` and "
                    f"`y_pred` must have the same index. Error in series {k}."
                )

    if not set(y_pred.keys()) == {self.level}:
        raise ValueError(
            f"`y_pred` and `y_true` must have only the key '{self.level}'. "
            f"Got {set(y_pred.keys())}."
        )

    y_true = deepcopy(y_true[self.level])
    y_pred = deepcopy(y_pred[self.level])
    if not isinstance(y_pred, np.ndarray):
        y_pred = y_pred.to_numpy()
    if not isinstance(y_true, np.ndarray):
        y_true = y_true.to_numpy()

    if self.transformer_series:
        y_true = transform_numpy(
            array=y_true,
            transformer=self.transformer_series_[self.level],
            fit=False,
            inverse_transform=False
        )
        y_pred = transform_numpy(
            array=y_pred,
            transformer=self.transformer_series_[self.level],
            fit=False,
            inverse_transform=False
        )

    if self.differentiation is not None:
        differentiator = copy(self.differentiator)
        differentiator.set_params(window_size=None)
        y_true = differentiator.fit_transform(y_true)[self.differentiation:]
        y_pred = differentiator.fit_transform(y_pred)[self.differentiation:]

    data = pd.DataFrame(
        {'prediction': y_pred, 'residuals': y_true - y_pred}
    ).dropna()
    y_pred = data['prediction'].to_numpy()
    residuals = data['residuals'].to_numpy()

    data['bin'] = self.binner[self.level].transform(y_pred).astype(int)
    residuals_by_bin = data.groupby('bin')['residuals'].apply(np.array).to_dict()

    if self.out_sample_residuals_ is None:
        self.out_sample_residuals_ = {self.level: None}
        self.out_sample_residuals_by_bin_ = {self.level: None}

    out_sample_residuals = (
        np.array([])
        if self.out_sample_residuals_[self.level] is None
        else self.out_sample_residuals_[self.level]
    )
    out_sample_residuals_by_bin = (
        {}
        if self.out_sample_residuals_by_bin_[self.level] is None
        else self.out_sample_residuals_by_bin_[self.level]
    )
    if append:
        out_sample_residuals = np.concatenate([out_sample_residuals, residuals])
        for k, v in residuals_by_bin.items():
            if k in out_sample_residuals_by_bin:
                out_sample_residuals_by_bin[k] = np.concatenate(
                    (out_sample_residuals_by_bin[k], v)
                )
            else:
                out_sample_residuals_by_bin[k] = v
    else:
        out_sample_residuals = residuals
        out_sample_residuals_by_bin = residuals_by_bin

    max_samples = 10_000 // self.binner[self.level].n_bins_
    rng = np.random.default_rng(seed=random_state)

    for k, v in out_sample_residuals_by_bin.items():
        if len(v) > max_samples:
            sample = rng.choice(a=v, size=max_samples, replace=False)
            out_sample_residuals_by_bin[k] = sample

    for k in self.binner_intervals_.get(self.level, {}).keys():
        if k not in out_sample_residuals_by_bin:
            out_sample_residuals_by_bin[k] = np.array([])

    empty_bins = [
        k for k, v in out_sample_residuals_by_bin.items()
        if v.size == 0
    ]
    if empty_bins:
        warnings.warn(
            f"The following bins of level '{self.level}' have no out of sample residuals: "
            f"{empty_bins}. No predicted values fall in the interval "
            f"{[self.binner_intervals_[self.level][bin] for bin in empty_bins]}. "
            f"Empty bins will be filled with a random sample of residuals.",
            ResidualsUsageWarning
        )
        empty_bin_size = min(max_samples, len(out_sample_residuals))
        for k in empty_bins:
            out_sample_residuals_by_bin[k] = rng.choice(
                a=out_sample_residuals, size=empty_bin_size, replace=False
            )

    if len(out_sample_residuals) > 10_000:
        out_sample_residuals = rng.choice(
            a=out_sample_residuals, size=10_000, replace=False
        )

    self.out_sample_residuals_[self.level] = out_sample_residuals
    self.out_sample_residuals_by_bin_[self.level] = out_sample_residuals_by_bin
Return feature importance of the model stored in the forecaster for a
specific step. Since a separate model is created for each forecast time
step, it is necessary to select the model from which to retrieve information.
Only valid when estimator stores internally the feature importances in
the attribute feature_importances_ or coef_. Otherwise, it returns None.
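The per-step lookup described above can be illustrated without the library. The sketch below uses stand-in classes (`TreeLike`, `LinearLike`, `Opaque`) and a hypothetical helper `extract_importances`. It only mirrors the attribute check, assuming one fitted model per step stored in a dictionary, and returns a list of pairs rather than a pandas DataFrame.

```python
class TreeLike:
    """Stand-in for a fitted estimator exposing `feature_importances_`."""
    feature_importances_ = [0.2, 0.7, 0.1]


class LinearLike:
    """Stand-in for a fitted estimator exposing `coef_`."""
    coef_ = [1.5, -0.3, 0.8]


class Opaque:
    """Stand-in for an estimator exposing neither attribute."""


def extract_importances(estimator, feature_names, sort_importance=True):
    """Return (feature, importance) pairs, or None if none are available.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    if hasattr(estimator, "feature_importances_"):
        values = estimator.feature_importances_
    elif hasattr(estimator, "coef_"):
        values = estimator.coef_
    else:
        return None

    pairs = list(zip(feature_names, values))
    if sort_importance:
        # Largest importance first.
        pairs.sort(key=lambda pair: pair[1], reverse=True)
    return pairs


# One model per forecast step; steps are numbered from 1.
estimators_by_step = {1: TreeLike(), 2: LinearLike(), 3: Opaque()}
names = ["lag_1", "lag_2", "exog_temp"]

step_1 = extract_importances(estimators_by_step[1], names)
step_3 = extract_importances(estimators_by_step[3], names)  # None
```

Because each step has its own model, importances for step 1 and step 2 can differ even when both models see the same predictors.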
def get_feature_importances(
    self,
    step: int,
    sort_importance: bool = True
) -> pd.DataFrame:
    """
    Return feature importance of the model stored in the forecaster for a
    specific step. Since a separate model is created for each forecast time
    step, it is necessary to select the model from which to retrieve information.
    Only valid when estimator stores internally the feature importances in
    the attribute `feature_importances_` or `coef_`. Otherwise, it returns
    `None`.

    Parameters
    ----------
    step : int
        Model from which to retrieve information (a separate model is created
        for each forecast time step). First step is 1.
    sort_importance: bool, default True
        If `True`, sorts the feature importances in descending order.

    Returns
    -------
    feature_importances : pandas DataFrame
        Feature importances associated with each predictor.

    """

    if not isinstance(step, int):
        raise TypeError(
            f"`step` must be an integer. Got {type(step)}."
        )

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `get_feature_importances()`."
        )

    if (step < 1) or (step > self.max_step):
        raise ValueError(
            f"The step must have a value from 1 to the maximum number of steps "
            f"({self.max_step}). Got {step}."
        )

    if isinstance(self.estimator, Pipeline):
        estimator = self.estimators_[step][-1]
    else:
        estimator = self.estimators_[step]

    n_lags = len(list(
        chain(*[v for v in self.lags_.values() if v is not None])
    ))
    n_window_features = (
        len(self.X_train_window_features_names_out_)
        if self.window_features is not None
        else 0
    )
    idx_columns_autoreg = np.arange(n_lags + n_window_features)
    if self.exog_in_:
        idx_columns_exog = np.flatnonzero(
            [name.endswith(f"step_{step}")
             for name in self.X_train_features_names_out_]
        )
    else:
        idx_columns_exog = np.array([], dtype=int)

    idx_columns = np.concatenate((idx_columns_autoreg, idx_columns_exog))
    idx_columns = [int(x) for x in idx_columns]  # Required since numpy 2.0
    feature_names = [
        self.X_train_features_names_out_[i].replace(f"_step_{step}", "")
        for i in idx_columns
    ]

    if hasattr(estimator, 'feature_importances_'):
        feature_importances = estimator.feature_importances_
    elif hasattr(estimator, 'coef_'):
        feature_importances = estimator.coef_
    else:
        warnings.warn(
            f"Impossible to access feature importances for estimator of type "
            f"{type(estimator)}. This method is only valid when the "
            f"estimator stores internally the feature importances in the "
            f"attribute `feature_importances_` or `coef_`."
        )
        feature_importances = None

    if feature_importances is not None:
        feature_importances = pd.DataFrame({
            'feature': feature_names,
            'importance': feature_importances
        })
        if sort_importance:
            feature_importances = feature_importances.sort_values(
                by='importance', ascending=False
            )

    return feature_importances