Instance or list of instances used to create window features. Window features
are created from the original time series and are included as predictors.
If 'ordinal', a single column is created with integer values from 0
to n_series - 1.
If 'ordinal_category', a single column is created with integer
values from 0 to n_series - 1 and the column is transformed into
pandas.category dtype so that it can be used as a categorical variable.
If 'onehot', a binary column is created for each series.
If None, no column is created to identify the series. Internally, the
series are identified as an integer from 0 to n_series - 1, but no column
is created in the training matrices.
Changed to 'ordinal' in version 0.14.0
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. Transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
inverse_transform method.
If single transformer: it is cloned and applied to all series.
If dict of transformers: a different transformer can be used for each series.
None
transformer_exog
transformer
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
Function that defines the individual weights for each sample based on the
index. For example, a function that assigns a lower weight to certain dates.
Ignored if estimator does not have the argument sample_weight in its
fit method. See Notes section for more details on the use of the weights.
If single function: it is applied to all series.
If dict {'series_column_name' : Callable}: a different function can be
used for each series, a weight of 1 is given to all series not present in
weight_func.
Weights associated with each series {'series_column_name' : float}. It is only
applied if the estimator used accepts sample_weight in its fit method.
See Notes section for more details on the use of the weights.
If a dict is provided, a weight of 1 is given to all series not present
in series_weights.
Order of differencing applied to the time series before training the forecaster.
The order of differentiation is the number of times the differencing operation
is applied to a time series. Differencing involves computing the differences
between consecutive data points in the series. Before returning a prediction,
the differencing operation is reversed.
If int, the same order of differentiation is applied to all series.
If dict, a different order of differentiation (including None) can
be used for each series. The keys must be the names of the series used
to fit the forecaster. If a series is not present in the dictionary, no
differencing is applied.
Additional arguments to pass to the QuantileBinner used to discretize
the residuals into k bins according to the predicted values associated
with each residual. Available arguments are: n_bins, method, subsample,
random_state and dtype. Argument method is passed internally to the
function numpy.percentile.
New in version 0.14.0
The window size needed to create the predictors. It is calculated as the
maximum value between max_lag and max_size_window_features. If
differentiation is used, window_size is increased by n units equal to
the order of differentiation so that predictors can be generated correctly.
If 'ordinal', a single column is created with integer values from 0
to n_series - 1.
If 'ordinal_category', a single column is created with integer
values from 0 to n_series - 1 and the column is transformed into
pandas.category dtype so that it can be used as a categorical variable.
If 'onehot', a binary column is created for each series.
If None, no column is created to identify the series. Internally, the
series are identified as an integer from 0 to n_series - 1, but no column
is created in the training matrices.
Changed to 'ordinal' in version 0.14.0
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. Transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
inverse_transform method.
If single transformer: it is cloned and applied to all series.
If dict of transformers: a different transformer can be used for each series.
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
Function that defines the individual weights for each sample based on the
index. For example, a function that assigns a lower weight to certain dates.
Ignored if estimator does not have the argument sample_weight in its
fit method. See Notes section for more details on the use of the weights.
If single function: it is applied to all series.
If dict {'series_column_name' : Callable}: a different function can be
used for each series, a weight of 1 is given to all series not present
in weight_func.
Weights associated with each series {'series_column_name' : float}. It is only
applied if the estimator used accepts sample_weight in its fit method.
See Notes section for more details on the use of the weights.
If a dict is provided, a weight of 1 is given to all series not present
in series_weights.
Last window of training data for each series. It stores the values
needed to predict the next step immediately after the training data.
These values are stored in the original scale of the time series before
undergoing any transformations or differentiation. When differentiation
parameter is specified, the dimensions of the last_window_ are expanded
as many values as the order of differentiation. For example, if
lags = 7 and differentiation = 1, last_window_ will have 8 values.
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.
Names of the series (levels) included in the matrix X_train created
internally for training. It can be different from series_names_in_ if
some series are dropped during the training process because of NaNs or
because they are not present in the training period.
Names of the exogenous variables included in the matrix X_train created
internally for training. It can be different from exog_names_in_ if
some exogenous variables are transformed during the training process.
Residuals of the model when predicting training data. Only stored up
to 10_000 values per series in the form {series: residuals}. If
transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
In sample residuals binned according to the predicted value each residual
is associated with. The number of residuals stored per bin is limited to
10_000 // self.binner.n_bins_ per series in the form {series: residuals}.
If transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
New in version 0.15.0
Residuals of the model when predicting non-training data. Only stored up
to 10_000 values per series in the form {series: residuals}. Use
set_out_sample_residuals() method to set values. If transformer_series
is not None, residuals are stored in the transformed scale. If
differentiation is not None, residuals are stored after differentiation.
Out of sample residuals binned according to the predicted value each residual
is associated with. The number of residuals stored per bin is limited to
10_000 // self.binner.n_bins_ per series in the form {series: residuals}.
If transformer_series is not None, residuals are stored in the
transformed scale. If differentiation is not None, residuals are
stored after differentiation.
New in version 0.15.0
Dictionary of skforecast.preprocessing.QuantileBinner used to discretize
residuals of each series into k bins according to the predicted values
associated with each residual. In the form {series: binner}.
New in version 0.15.0
Intervals used to discretize residuals into k bins according to the predicted
values associated with each residual. In the form {series: binner_intervals_}.
New in version 0.15.0
Private attribute used to indicate whether the forecaster should perform
some calculations during backtesting.
Notes
The weights are used to control the influence that each observation has on the
training of the model. ForecasterRecursiveMultiSeries accepts two types of weights.
If the two types of weights are indicated, they are multiplied to create the final
weights. The resulting sample_weight cannot have negative values.
series_weights : controls the relative importance of each series. If a
series has twice as much weight as the others, the observations of that series
influence the training twice as much. The higher the weight of a series
relative to the others, the more the model will focus on trying to learn
that series.
weight_func : controls the relative importance of each observation
according to its index value. For example, a function that assigns a lower
weight to certain dates.
def__init__(self,estimator:object=None,lags:int|list[int]|np.ndarray[int]|range[int]|None=None,window_features:object|list[object]|None=None,encoding:str|None='ordinal',transformer_series:object|dict[str,object]|None=None,transformer_exog:object|None=None,weight_func:Callable|dict[str,Callable]|None=None,series_weights:dict[str,float]|None=None,differentiation:int|dict[str,int|None]|None=None,dropna_from_series:bool=False,fit_kwargs:dict[str,object]|None=None,binner_kwargs:dict[str,object]|None=None,forecaster_id:str|int|None=None,regressor:object=None)->None:self.estimator=copy(initialize_estimator(estimator,regressor))self.encoding=encodingself.encoder=Noneself.encoding_mapping_={}self.transformer_series=transformer_seriesself.transformer_series_=Noneself.transformer_exog=transformer_exogself.weight_func=weight_funcself.weight_func_=Noneself.source_code_weight_func=Noneself.series_weights=series_weightsself.series_weights_=Noneself.differentiation=differentiationself.differentiation_max=Noneself.differentiator=Noneself.differentiator_=Noneself.dropna_from_series=dropna_from_seriesself.last_window_=Noneself.index_type_=Noneself.index_freq_=Noneself.training_range_=Noneself.series_names_in_=Noneself.exog_in_=Falseself.exog_names_in_=Noneself.exog_type_in_=Noneself.exog_dtypes_in_=Noneself.exog_dtypes_out_=Noneself.X_train_series_names_in_=Noneself.X_train_window_features_names_out_=Noneself.X_train_exog_names_out_=Noneself.X_train_features_names_out_=Noneself.in_sample_residuals_=Noneself.in_sample_residuals_by_bin_=Noneself.out_sample_residuals_=Noneself.out_sample_residuals_by_bin_=Noneself.creation_date=pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')self.is_fitted=Falseself.fit_date=Noneself.skforecast_version=__version__self.python_version=sys.version.split(" ")[0]self.forecaster_id=forecaster_idself._probabilistic_mode="binned"self.lags,self.lags_names,self.max_lag=initialize_lags(type(self).__name__,lags)self.window_features,self.window_features_names,self.max_size_window_features=(initialize_window_features(window_features))ifself.window_featuresisNoneandself.lagsisNone:raiseValueError("At least one of the arguments `lags` or `window_features` ""must be different from None. This is required to create the ""predictors used in training the forecaster.")self.window_size=max([wsforwsin[self.max_lag,self.max_size_window_features]ifwsisnotNone])self.window_features_class_names=Noneifwindow_featuresisnotNone:self.window_features_class_names=[type(wf).__name__forwfinself.window_features]ifself.encodingnotin['ordinal','ordinal_category','onehot',None]:raiseValueError(f"Argument `encoding` must be one of the following values: 'ordinal', "f"'ordinal_category', 'onehot' or None. Got '{self.encoding}'.")ifself.encoding=='onehot':self.encoder=OneHotEncoder(categories='auto',sparse_output=False,drop=None,dtype=int).set_output(transform='pandas')else:self.encoder=OrdinalEncoder(categories='auto',dtype=int).set_output(transform='pandas')scaling_estimators=tuple(member[1]formemberininspect.getmembers(sklearn.linear_model,inspect.isclass)+inspect.getmembers(sklearn.svm,inspect.isclass))ifself.transformer_seriesisNoneandisinstance(estimator,scaling_estimators):warnings.warn("When using a linear model, it is recommended to use a transformer_series ""to ensure all series are in the same scale. You can use, for example, a ""`StandardScaler` from sklearn.preprocessing.",DataTransformationWarning)ifisinstance(self.transformer_series,dict):ifself.encodingisNone:raiseTypeError("When `encoding` is None, `transformer_series` must be a single ""transformer (not `dict`) as it is applied to all series.")if'_unknown_level'notinself.transformer_series.keys():raiseValueError("If `transformer_series` is a `dict`, a transformer must be ""provided to transform series that do not exist during training. ""Add the key '_unknown_level' to `transformer_series`. ""For example: {'_unknown_level': your_transformer}.")self.weight_func,self.source_code_weight_func,self.series_weights=(initialize_weights(forecaster_name=type(self).__name__,estimator=estimator,weight_func=weight_func,series_weights=series_weights,))ifdifferentiationisnotNone:ifisinstance(differentiation,int):ifdifferentiation<1:raiseValueError(f"If `differentiation` is an integer, it must be equal "f"to or greater than 1. Got {differentiation}.")self.differentiation=differentiationself.differentiation_max=differentiationself.window_size+=self.differentiation_maxself.differentiator=TimeSeriesDifferentiator(order=differentiation,window_size=self.window_size)elifisinstance(differentiation,dict):ifself.encodingisNone:raiseTypeError("When `encoding` is None, `differentiation` must be an ""integer equal to or greater than 1. Same differentiation ""must be applied to all series.")if'_unknown_level'notindifferentiation.keys():raiseValueError("If `differentiation` is a `dict`, an order must be provided ""to differentiate series that do not exist during training. ""Add the key '_unknown_level' to `differentiation`. ""For example: {'_unknown_level': 1}.")differentiation_max=[]forlevel,diffindifferentiation.items():ifdiffisnotNone:ifnotisinstance(diff,int)ordiff<1:raiseValueError(f"If `differentiation` is a dict, the values must be "f"None or integers equal to or greater than 1. "f"Got {diff} for series '{level}'.")differentiation_max.append(diff)iflen(differentiation_max)==0:raiseValueError("If `differentiation` is a dict, at least one value must be ""different from None. Got all values equal to None. If you ""do not want to differentiate any series, set `differentiation` ""to None.")self.differentiation=differentiationself.differentiation_max=max(differentiation_max)self.window_size+=self.differentiation_maxself.differentiator={level:(TimeSeriesDifferentiator(order=diff,window_size=self.window_size)ifdiffisnotNoneelseNone)forlevel,diffindifferentiation.items()}else:raiseTypeError(f"When including `differentiation`, this argument must be "f"an integer (equal to or greater than 1) or a dict of "f"integers. Got {type(differentiation)}.")self.fit_kwargs=check_select_fit_kwargs(estimator=estimator,fit_kwargs=fit_kwargs)self.binner={}self.binner_intervals_={}self.binner_kwargs=binner_kwargsifbinner_kwargsisNone:self.binner_kwargs={'n_bins':10,'method':'linear','subsample':200000,'random_state':789654,'dtype':np.float64}self.__skforecast_tags__={"library":"skforecast","forecaster_name":"ForecasterRecursiveMultiSeries","forecaster_task":"regression","forecasting_scope":"global",# single-series | global"forecasting_strategy":"recursive",# recursive | direct | deep_learning"index_types_supported":["pandas.RangeIndex","pandas.DatetimeIndex"],"requires_index_frequency":True,"allowed_input_types_series":["pandas.DataFrame","dict"],"supports_exog":True,"allowed_input_types_exog":["pandas.Series","pandas.DataFrame","dict"],"handles_missing_values_series":True,"handles_missing_values_exog":True,"supports_lags":True,"supports_window_features":True,"allowed_encoding":["ordinal","ordinal_category","onehot",None],"supports_transformer_series":True,"supports_transformer_exog":True,"supports_weight_func":True,"supports_series_weights":True,"supports_differentiation":True,"prediction_types":["point","interval","bootstrapping","quantiles","distribution"],"supports_probabilistic":True,"probabilistic_methods":["bootstrapping","conformal"],"handles_binned_residuals":True}
def_create_lags(self,y:np.ndarray,X_as_pandas:bool=False,train_index:pd.Index|None=None)->tuple[np.ndarray|pd.DataFrame|None,np.ndarray]:""" Create the lagged values and their target variable from a time series. Note that the returned matrix `X_data` contains the lag 1 in the first column, the lag 2 in the in the second column and so on. Returned matrices are views into the original `y` so care must be taken when modifying them. Parameters ---------- y : numpy ndarray Training time series values. X_as_pandas : bool, default False If `True`, the returned matrix `X_data` is a pandas DataFrame. train_index : pandas Index, default None Index of the training data. It is used to create the pandas DataFrame `X_data` when `X_as_pandas` is `True`. Returns ------- X_data : numpy ndarray, pandas DataFrame, None Lagged values (predictors). y_data : numpy ndarray Values of the time series related to each row of `X_data`. Notes ----- Returned matrices are views into the original `y` so care must be taken when modifying them. """X_data=Noneifself.lagsisnotNone:y_strided=np.lib.stride_tricks.sliding_window_view(y,self.window_size)[:-1]X_data=y_strided[:,self.window_size-self.lags]ifX_as_pandas:X_data=pd.DataFrame(data=X_data,columns=self.lags_names,index=train_index)y_data=y[self.window_size:]returnX_data,y_data
def_create_window_features(self,y:pd.Series,train_index:pd.Index,X_as_pandas:bool=False,)->tuple[list[np.ndarray|pd.DataFrame],list[str]]:""" Create window features from a time series. Parameters ---------- y : pandas Series Training time series. train_index : pandas Index Index of the training data. It is used to create the pandas DataFrame `X_train_window_features` when `X_as_pandas` is `True`. X_as_pandas : bool, default False If `True`, the returned matrix `X_train_window_features` is a pandas DataFrame. Returns ------- X_train_window_features : list List of numpy ndarrays or pandas DataFrames with the window features. X_train_window_features_names_out_ : list Names of the window features. """len_train_index=len(train_index)X_train_window_features=[]X_train_window_features_names_out_=[]forwfinself.window_features:X_train_wf=wf.transform_batch(y)ifnotisinstance(X_train_wf,pd.DataFrame):raiseTypeError(f"The method `transform_batch` of {type(wf).__name__} "f"must return a pandas DataFrame.")X_train_wf=X_train_wf.iloc[-len_train_index:]ifnotlen(X_train_wf)==len_train_index:raiseValueError(f"The method `transform_batch` of {type(wf).__name__} "f"must return a DataFrame with the same number of rows as "f"the input time series - `window_size`: {len_train_index}.")ifnot(X_train_wf.index==train_index).all():raiseValueError(f"The method `transform_batch` of {type(wf).__name__} "f"must return a DataFrame with the same index as "f"the input time series - `window_size`.")X_train_window_features_names_out_.extend(X_train_wf.columns)ifnotX_as_pandas:X_train_wf=X_train_wf.to_numpy()X_train_window_features.append(X_train_wf)returnX_train_window_features,X_train_window_features_names_out_
def_create_train_X_y_single_series(self,y:pd.Series,ignore_exog:bool,exog:pd.DataFrame|None=None)->tuple[pd.DataFrame,list[str],pd.DataFrame,pd.Series]:""" Create training matrices from univariate time series and exogenous variables. This method does not transform the exog variables. Parameters ---------- y : pandas Series Training time series. ignore_exog : bool If `True`, `exog` is ignored. exog : pandas DataFrame, default None Exogenous variable/s included as predictor/s. Returns ------- X_train_lags : pandas DataFrame Training values of lags. Shape: (len(y) - self.max_lag, len(self.lags)) X_train_window_features_names_out_ : list Names of the window features. X_train_exog : pandas DataFrame Training values of exogenous variables. Shape: (len(y) - self.max_lag, len(exog.columns)) y_train : pandas Series Values (target) of the time series related to each row of `X_train`. Shape: (len(y) - self.max_lag, ) """series_name=y.nameiflen(y)<=self.window_size:raiseValueError(f"Length of '{series_name}' must be greater than the maximum window size "f"needed by the forecaster.\n"f" Length '{series_name}': {len(y)}.\n"f" Max window size: {self.window_size}.\n"f" Lags window size: {self.max_lag}.\n"f" Window features window size: {self.max_size_window_features}.")ifself.encodingisNone:fit_transformer=Falsetransformer_series=self.transformer_series_['_unknown_level']else:fit_transformer=Falseifself.is_fittedelseTruetransformer_series=self.transformer_series_[series_name]y_values=y.to_numpy()y_index=y.indexy_values=transform_numpy(array=y_values,transformer=transformer_series,fit=fit_transformer,inverse_transform=False)ifself.differentiator_[series_name]isnotNone:ifnotself.is_fitted:y_values=self.differentiator_[series_name].fit_transform(y_values)else:differentiator=copy(self.differentiator_[series_name])y_values=differentiator.fit_transform(y_values)X_train_autoreg=[]train_index=y_index[self.window_size:]X_train_lags,y_train=self._create_lags(y=y_values,X_as_pandas=True,train_index=train_index)ifX_train_lagsisnotNone:X_train_autoreg.append(X_train_lags)X_train_window_features_names_out_=Noneifself.window_featuresisnotNone:# NOTE: The first `self.differentiation_max` positions of `y_values`# must be removed to match the length of `y_train` after creating# the window features. This is because `y_train` is created using the # global window size of the Forecaster, which includes the maximum # differentiation (self.differentiation_max).n_diff=0ifself.differentiationisNoneelseself.differentiation_maxy_window_features=pd.Series(y_values[n_diff:],index=y_index[n_diff:])X_train_window_features,X_train_window_features_names_out_=(self._create_window_features(y=y_window_features,X_as_pandas=True,train_index=train_index))X_train_autoreg.extend(X_train_window_features)iflen(X_train_autoreg)==1:X_train_autoreg=X_train_autoreg[0]else:X_train_autoreg=pd.concat(X_train_autoreg,axis=1)X_train_autoreg['_level_skforecast']=series_nameifignore_exog:X_train_exog=Noneelse:ifexogisnotNone:# The first `self.window_size` positions have to be removed from exog# since they are not in X_train_autoreg.X_train_exog=exog.iloc[self.window_size:,]else:# NOTE: This is faster than creating a pandas Series without values.X_train_exog=pd.Series(data=np.nan,index=train_index,name='_dummy_exog_col_to_keep_shape')y_train=pd.Series(data=y_train,index=train_index,name='y')returnX_train_autoreg,X_train_window_features_names_out_,X_train_exog,y_train
Create training matrices from multiple time series and exogenous
variables. See Notes section for more details depending on the type of
series and exog.
Names of the series (levels) included in the matrix X_train created
internally for training. It can be different from series_names_in_ if
some series are dropped during the training process because of NaNs or
because they are not present in the training period.
Names of the exogenous variables included in the matrix X_train created
internally for training. It can be different from exog_names_in_ if
some exogenous variables are transformed during the training process.
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.
Last window of training data for each series. It stores the values
needed to predict the next step immediately after the training data.
Notes
If series is a wide-format pandas DataFrame, each column represents a
different time series, and the index must be either a DatetimeIndex or
a RangeIndex with frequency or step size, as appropriate
If series is a long-format pandas DataFrame with a MultiIndex, the
first level of the index must contain the series IDs, and the second
level must be a DatetimeIndex with the same frequency across all series.
If series is a dictionary, each key must be a series ID, and each value
must be a named pandas Series. All series must have the same index, which
must be either a DatetimeIndex or a RangeIndex, and they must share the
same frequency or step size, as appropriate.
If exog is a wide-format pandas DataFrame, it must share the same
index type as series. Each column represents a different exogenous variable,
and the same values are applied to all time series.
If exog is a long-format pandas Series or DataFrame with a MultiIndex,
the first level contains the series IDs to which it belongs, and the second
level must be a pandas DatetimeIndex. Each exogenous variable must be
represented as a separate column.
If exog is a dictionary, each key must correspond to a series ID, and
each value must be either a named pandas Series or DataFrame with the
same index type as series, or None. It is not required for all series
to contain all exogenous variables, but data types must be consistent
across series for each variable.
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
def_create_train_X_y(self,series:pd.DataFrame|dict[str,pd.Series|pd.DataFrame],exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,store_last_window:bool|list[str]=True,)->tuple[pd.DataFrame,pd.Series,dict[str,pd.Index],list[str],list[str],list[str],list[str],list[str],dict[str,type],dict[str,type],dict[str,pd.Series],]:""" Create training matrices from multiple time series and exogenous variables. See Notes section for more details depending on the type of `series` and `exog`. Parameters ---------- series : pandas DataFrame, dict Training time series. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. store_last_window : bool, list, default True Whether or not to store the last window (`last_window_`) of training data. - If `True`, last window is stored for all series. - If `list`, last window is stored for the series present in the list. - If `False`, last window is not stored. Returns ------- X_train : pandas DataFrame Training values (predictors). y_train : pandas Series Values of the time series related to each row of `X_train`. series_indexes : dict Dictionary with the index of each series. series_names_in_ : list Names of the series (levels) provided by the user during training. X_train_series_names_in_ : list Names of the series (levels) included in the matrix `X_train` created internally for training. It can be different from `series_names_in_` if some series are dropped during the training process because of NaNs or because they are not present in the training period. exog_names_in_ : list Names of the exogenous variables used during training. X_train_window_features_names_out_ : list Names of the window features included in the matrix `X_train` created internally for training. X_train_exog_names_out_ : list Names of the exogenous variables included in the matrix `X_train` created internally for training. It can be different from `exog_names_in_` if some exogenous variables are transformed during the training process. exog_dtypes_in_ : dict Type of each exogenous variable/s used in training before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`. exog_dtypes_out_ : dict Type of each exogenous variable/s used in training after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`. last_window_ : dict Last window of training data for each series. It stores the values needed to predict the next `step` immediately after the training data. Notes ----- - If `series` is a wide-format pandas DataFrame, each column represents a different time series, and the index must be either a `DatetimeIndex` or a `RangeIndex` with frequency or step size, as appropriate - If `series` is a long-format pandas DataFrame with a MultiIndex, the first level of the index must contain the series IDs, and the second level must be a `DatetimeIndex` with the same frequency across all series. - If series is a dictionary, each key must be a series ID, and each value must be a named pandas Series. All series must have the same index, which must be either a `DatetimeIndex` or a `RangeIndex`, and they must share the same frequency or step size, as appropriate. - If `exog` is a wide-format pandas DataFrame, it must share the same index type as series. Each column represents a different exogenous variable, and the same values are applied to all time series. - If `exog` is a long-format pandas Series or DataFrame with a MultiIndex, the first level contains the series IDs to which it belongs, and the second level must be a pandas `DatetimeIndex`. Each exogenous variable must be represented as a separate column. - If `exog` is a dictionary, each key must correspond to a series ID, and each value must be either a named pandas `Series` or `DataFrame` with the same index type as `series`, or `None`. It is not required for all series to contain all exogenous variables, but data types must be consistent across series for each variable. """series_dict,series_indexes=check_preprocess_series(series=series)series_names_in_=list(series_dict.keys())ifself.is_fittedandnotset(series_names_in_).issubset(set(self.series_names_in_)):raiseValueError(f"Once the Forecaster has been trained, `series` must contain "f"the same series names as those used during training:\n"f" Got : {series_names_in_}\n"f" Expected : {self.series_names_in_}")exog_dict={serie:Noneforserieinseries_names_in_}exog_names_in_=NoneifexogisnotNone:exog_dict,exog_names_in_=check_preprocess_exog_multiseries(series_names_in_=series_names_in_,series_index_type=type(series_indexes[series_names_in_[0]]),exog=exog,exog_dict=exog_dict)ifself.is_fitted:ifself.exog_names_in_isNone:raiseValueError("Once the Forecaster has been trained, `exog` must be `None` ""because no exogenous variables were added during training.")else:ifnotset(exog_names_in_)==set(self.exog_names_in_):raiseValueError(f"Once the Forecaster has been trained, `exog` must contain "f"the same exogenous variables as those used during training:\n"f" Got : {exog_names_in_}\n"f" Expected : {self.exog_names_in_}")ifnotself.is_fitted:self.transformer_series_=initialize_transformer_series(forecaster_name=type(self).__name__,series_names_in_=series_names_in_,encoding=self.encoding,transformer_series=self.transformer_series)self.differentiator_=initialize_differentiator_multiseries(series_names_in_=series_names_in_,differentiator=self.differentiator)series_dict,exog_dict=align_series_and_exog_multiseries(series_dict=series_dict,exog_dict=exog_dict)ifnotself.is_fittedandself.transformer_series_['_unknown_level']isnotNone:self.transformer_series_['_unknown_level'].fit(np.concatenate(list(series_dict.values())).reshape(-1,1))ignore_exog=TrueifexogisNoneelseFalseinput_matrices=[[series_dict[k],exog_dict[k],ignore_exog]forkinseries_dict.keys()]X_train_autoreg_buffer=[]X_train_exog_buffer=[]y_train_buffer=[]formatricesininput_matrices:(X_train_autoreg,X_train_window_features_names_out_,X_train_exog,y_train)=self._create_train_X_y_single_series(y=matrices[0],exog=matrices[1],ignore_exog=matrices[2],)X_train_autoreg_buffer.append(X_train_autoreg)X_train_exog_buffer.append(X_train_exog)y_train_buffer.append(y_train)X_train=pd.concat(X_train_autoreg_buffer,axis=0,copy=False)y_train=pd.concat(y_train_buffer,axis=0,copy=False)ifself.is_fitted:encoded_values=self.encoder.transform(X_train[['_level_skforecast']])else:encoded_values=self.encoder.fit_transform(X_train[['_level_skforecast']])fori,levelinenumerate(self.encoder.categories_[0]):self.encoding_mapping_[level]=iifself.encoding=='onehot':encoded_values.columns=encoded_values.columns.str.replace('_level_skforecast_','')X_train=pd.concat([X_train.drop(columns='_level_skforecast'),encoded_values],axis=1,copy=False)else:X_train['_level_skforecast']=encoded_valuesifself.encoding=='ordinal_category':X_train['_level_skforecast']=(X_train['_level_skforecast'].astype('category'))delencoded_valuesexog_dtypes_in_=Noneexog_dtypes_out_=NoneX_train_exog_names_out_=NoneifexogisnotNone:X_train_exog=pd.concat(X_train_exog_buffer,axis=0,copy=False)ifisinstance(X_train_exog,pd.Series):warnings.warn(f"No exogenous variables were found in `exog` that match the "f"series IDs provided in `series`. As a result, no exogenous "f"variables are included in the training matrices. Please "f"review the series IDs in `exog` and ensure they match the "f"following IDs: {series_names_in_}. The forecaster will be "f"considered trained without exogenous variables.",MissingExogWarning)else:if'_dummy_exog_col_to_keep_shape'inX_train_exog.columns:X_train_exog=(X_train_exog.drop(columns=['_dummy_exog_col_to_keep_shape']))exog_names_in_=X_train_exog.columns.to_list()exog_dtypes_in_=get_exog_dtypes(exog=X_train_exog)fit_transformer=Falseifself.is_fittedelseTrueX_train_exog=transform_dataframe(df=X_train_exog,transformer=self.transformer_exog,fit=fit_transformer,inverse_transform=False)ifnot(X_train_exog.index==X_train.index).all():raiseValueError("Different index for `series` and `exog` after transformation. ""They must be equal to ensure the correct alignment of values.")check_exog_dtypes(X_train_exog,call_check_exog=False)exog_dtypes_out_=get_exog_dtypes(exog=X_train_exog)X_train_exog_names_out_=X_train_exog.columns.to_list()X_train=pd.concat([X_train,X_train_exog],axis=1,copy=False)ify_train.isna().to_numpy().any():mask=y_train.notna().to_numpy()y_train=y_train.iloc[mask]X_train=X_train.iloc[mask,]warnings.warn("NaNs detected in `y_train`. They have been dropped because the ""target variable cannot have NaN values. Same rows have been ""dropped from `X_train` to maintain alignment. This is caused by ""series with interspersed NaNs.",MissingValuesWarning)ifself.dropna_from_series:ifnp.any(X_train.isnull().to_numpy()):mask=X_train.notna().all(axis=1).to_numpy()X_train=X_train.iloc[mask,]y_train=y_train.iloc[mask]warnings.warn("NaNs detected in `X_train`. They have been dropped. If ""you want to keep them, set `forecaster.dropna_from_series = False`. ""Same rows have been removed from `y_train` to maintain alignment. ""This caused by series with interspersed NaNs.",MissingValuesWarning)else:ifnp.any(X_train.isnull().to_numpy()):warnings.warn("NaNs detected in `X_train`. Some estimators do not allow ""NaN values during training. If you want to drop them, ""set `forecaster.dropna_from_series = True`.",MissingValuesWarning)ifX_train.empty:raiseValueError("All samples have been removed due to NaNs. Set ""`forecaster.dropna_from_series = False` or review `exog` values.")ifself.encoding=='onehot':X_train_series_names_in_=[colforcolinseries_names_in_ifX_train[col].sum()>0]else:unique_levels=X_train['_level_skforecast'].unique()X_train_series_names_in_=[kfork,vinself.encoding_mapping_.items()ifvinunique_levels]# The last time window of training data is stored so that lags needed as# predictors in the first iteration of `predict()` can be calculated.last_window_=Noneifstore_last_window:series_to_store=(X_train_series_names_in_ifstore_last_windowisTrueelsestore_last_window)series_not_in_series_dict=set(series_to_store)-set(X_train_series_names_in_)ifseries_not_in_series_dict:warnings.warn(f"Series {series_not_in_series_dict} are not present in "f"`series`. No last window is stored for them.",IgnoredArgumentWarning)series_to_store=[sforsinseries_to_storeifsnotinseries_not_in_series_dict]ifseries_to_store:last_window_={k:v.iloc[-self.window_size:].copy()fork,vinseries_dict.items()ifkinseries_to_store}return(X_train,y_train,series_indexes,series_names_in_,X_train_series_names_in_,exog_names_in_,X_train_window_features_names_out_,X_train_exog_names_out_,exog_dtypes_in_,exog_dtypes_out_,last_window_)
Create training matrices from multiple time series and exogenous
variables. See Notes section for more details depending on the type of
series and exog.
If True, skforecast warnings will be suppressed during the creation
of the training matrices. See skforecast.exceptions.warn_skforecast_categories
for more information.
False
Returns:
Name
Type
Description
X_train
pandas DataFrame
Training values (predictors).
y_train
pandas Series
Values (target) of the time series related to each row of X_train.
Notes
If series is a wide-format pandas DataFrame, each column represents a
different time series, and the index must be either a DatetimeIndex or
a RangeIndex with frequency or step size, as appropriate
If series is a long-format pandas DataFrame with a MultiIndex, the
first level of the index must contain the series IDs, and the second
level must be a DatetimeIndex with the same frequency across all series.
If series is a dictionary, each key must be a series ID, and each value
must be a named pandas Series. All series must have the same index, which
must be either a DatetimeIndex or a RangeIndex, and they must share the
same frequency or step size, as appropriate.
If exog is a wide-format pandas DataFrame, it must share the same
index type as series. Each column represents a different exogenous variable,
and the same values are applied to all time series.
If exog is a long-format pandas Series or DataFrame with a MultiIndex,
the first level contains the series IDs to which it belongs, and the second
level must be a pandas DatetimeIndex. Each exogenous variable must be
represented as a separate column.
If exog is a dictionary, each key must correspond to a series ID, and
each value must be either a named pandas Series or DataFrame with the
same index type as series, or None. It is not required for all series
to contain all exogenous variables, but data types must be consistent
across series for each variable.
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
defcreate_train_X_y(self,series:pd.DataFrame|dict[str,pd.Series|pd.DataFrame],exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,suppress_warnings:bool=False)->tuple[pd.DataFrame,pd.Series]:""" Create training matrices from multiple time series and exogenous variables. See Notes section for more details depending on the type of `series` and `exog`. Parameters ---------- series : pandas DataFrame, dict Training time series. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the creation of the training matrices. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- X_train : pandas DataFrame Training values (predictors). y_train : pandas Series Values (target) of the time series related to each row of `X_train`. Notes ----- - If `series` is a wide-format pandas DataFrame, each column represents a different time series, and the index must be either a `DatetimeIndex` or a `RangeIndex` with frequency or step size, as appropriate - If `series` is a long-format pandas DataFrame with a MultiIndex, the first level of the index must contain the series IDs, and the second level must be a `DatetimeIndex` with the same frequency across all series. - If series is a dictionary, each key must be a series ID, and each value must be a named pandas Series. All series must have the same index, which must be either a `DatetimeIndex` or a `RangeIndex`, and they must share the same frequency or step size, as appropriate. - If `exog` is a wide-format pandas DataFrame, it must share the same index type as series. Each column represents a different exogenous variable, and the same values are applied to all time series. - If `exog` is a long-format pandas Series or DataFrame with a MultiIndex, the first level contains the series IDs to which it belongs, and the second level must be a pandas `DatetimeIndex`. Each exogenous variable must be represented as a separate column. - If `exog` is a dictionary, each key must correspond to a series ID, and each value must be either a named pandas `Series` or `DataFrame` with the same index type as `series`, or `None`. It is not required for all series to contain all exogenous variables, but data types must be consistent across series for each variable. """set_skforecast_warnings(suppress_warnings,action='ignore')output=self._create_train_X_y(series=series,exog=exog,store_last_window=False)X_train=output[0]y_train=output[1]ifself.encodingisNone:X_train=X_train.drop(columns='_level_skforecast')set_skforecast_warnings(suppress_warnings,action='default')returnX_train,y_train
def_train_test_split_one_step_ahead(self,series:dict[str,pd.Series],initial_train_size:int,exog:dict[str,pd.DataFrame|None]|None=None)->tuple[pd.DataFrame,pd.Series,pd.DataFrame,pd.Series,pd.Series,pd.Series]:""" Create matrices needed to train and test the forecaster for one-step-ahead predictions. Parameters ---------- series : dict Training time series (already checked and preprocessed as a dict). initial_train_size : int Initial size of the training set. It is the number of observations used to train the forecaster before making the first prediction. exog : dict, default None Exogenous variable/s included as predictor/s (already checked and preprocessed as a dict). Returns ------- X_train : pandas DataFrame Predictor values used to train the model. y_train : pandas Series Target values related to each row of `X_train`. X_test : pandas DataFrame Predictor values used to test the model. y_test : pandas Series Target values related to each row of `X_test`. X_train_encoding : pandas Series Series identifiers for each row of `X_train`. X_test_encoding : pandas Series Series identifiers for each row of `X_test`. """# NOTE: `series` and `exog` are assumed to be an already checked dict as# they have gone through the `check_one_step_ahead_input` function.min_index=[]max_index=[]forvinseries.values():idx=v.indexmin_index.append(idx[0])max_index.append(idx[-1])ifisinstance(idx,pd.DatetimeIndex):span_index=pd.date_range(start=min(min_index),end=max(max_index),freq=idx.freq)else:span_index=pd.RangeIndex(start=min(min_index),stop=max(max_index)+1,step=idx.step)fold=[0,[0,initial_train_size],[initial_train_size-self.window_size,initial_train_size],[initial_train_size-self.window_size,len(span_index)],[0,0],# Dummy valueTrue]data_fold=_extract_data_folds_multiseries(series=series,folds=[fold],span_index=span_index,window_size=self.window_size,exog=exog,dropna_last_window=self.dropna_from_series,externally_fitted=False)series_train,_,levels_last_window,exog_train,exog_test,_=next(data_fold)start_test_date=span_index[initial_train_size-self.window_size]series_test={k:v.loc[start_test_date:]fork,vinseries.items()ifkinlevels_last_windowandnotv.emptyandnotv.isna().to_numpy().all()}forecaster_state=(self.is_fitted,self.series_names_in_,self.exog_names_in_)self.is_fitted=FalseX_train,y_train,_,series_names_in_,_,exog_names_in_,*_=(self._create_train_X_y(series=series_train,exog=exog_train,store_last_window=False))self.series_names_in_=series_names_in_ifexogisnotNone:self.exog_names_in_=exog_names_in_self.is_fitted=TrueX_test,y_test,*_=self._create_train_X_y(series=series_test,exog=exog_test,store_last_window=False)self.is_fitted,self.series_names_in_,self.exog_names_in_=forecaster_stateifself.encodingin["ordinal","ordinal_category"]:X_train_encoding=self.encoder.inverse_transform(X_train[["_level_skforecast"]]).ravel()X_test_encoding=self.encoder.inverse_transform(X_test[["_level_skforecast"]]).ravel()elifself.encoding=='onehot':encoding_keys=self.encoding_mapping_.keys()X_train_encoding=self.encoder.inverse_transform(X_train.loc[:,encoding_keys]).ravel()X_test_encoding=self.encoder.inverse_transform(X_test.loc[:,encoding_keys]).ravel()else:X_train_encoding=self.encoder.inverse_transform(X_train[["_level_skforecast"]]).ravel()X_test_encoding=self.encoder.inverse_transform(X_test[["_level_skforecast"]]).ravel()X_train=X_train.drop(columns="_level_skforecast")X_test=X_test.drop(columns="_level_skforecast")X_train_encoding=pd.Series(data=X_train_encoding,index=X_train.index).fillna("_unknown_level")X_test_encoding=pd.Series(data=X_test_encoding,index=X_test.index).fillna("_unknown_level")returnX_train,y_train,X_test,y_test,X_train_encoding,X_test_encoding
def_weight_func_all_1(self,index:pd.Index)->np.ndarray:""" Weight function that assigns a weight of 1 to all observations. Parameters ---------- index : pandas Index Index of the series. Returns ------- weights : numpy ndarray Weights to use in `fit` method. """weights=np.ones(len(index),dtype=float)returnweights
Create weights for each observation according to the forecaster's attributes
series_weights and weight_func. The resulting weights are product of both
types of weights.
defcreate_sample_weights(self,series_names_in_:list,X_train:pd.DataFrame)->np.ndarray:""" Create weights for each observation according to the forecaster's attributes `series_weights` and `weight_func`. The resulting weights are product of both types of weights. Parameters ---------- series_names_in_ : list Names of the series (levels) used during training. X_train : pandas DataFrame Dataframe created with the `create_train_X_y` method, first return. Returns ------- weights : numpy ndarray Weights to use in `fit` method. """weights=Noneweights_samples=Noneseries_weights=Noneifself.series_weightsisnotNone:# Series not present in series_weights have a weight of 1 in all their samples.# Keys in series_weights not present in series are ignored.series_not_in_series_weights=(set(series_names_in_)-set(self.series_weights.keys()))ifseries_not_in_series_weights:warnings.warn(f"{series_not_in_series_weights} not present in `series_weights`. "f"A weight of 1 is given to all their samples.",IgnoredArgumentWarning)self.series_weights_={col:1.forcolinseries_names_in_}self.series_weights_.update({k:vfork,vinself.series_weights.items()ifkinself.series_weights_})ifself.encoding=="onehot":series_weights=[np.repeat(self.series_weights_[serie],sum(X_train[serie]))forserieinseries_names_in_]else:series_weights=[np.repeat(self.series_weights_[serie],sum(X_train["_level_skforecast"]==self.encoding_mapping_[serie]),)forserieinseries_names_in_]series_weights=np.concatenate(series_weights)ifself.weight_funcisnotNone:ifisinstance(self.weight_func,Callable):self.weight_func_={col:copy(self.weight_func)forcolinseries_names_in_}else:# Series not present in weight_func have a weight of 1 in all their samplesseries_not_in_weight_func=(set(series_names_in_)-set(self.weight_func.keys()))ifseries_not_in_weight_func:warnings.warn(f"{series_not_in_weight_func} not present in `weight_func`. "f"A weight of 1 is given to all their samples.",IgnoredArgumentWarning)self.weight_func_={col:self._weight_func_all_1forcolinseries_names_in_}self.weight_func_.update({k:vfork,vinself.weight_func.items()ifkinself.weight_func_})weights_samples=[]forkeyinself.weight_func_.keys():ifself.encoding=="onehot":idx=X_train.index[X_train[key]==1.0]else:idx=X_train.index[X_train["_level_skforecast"]==self.encoding_mapping_[key]]weights_samples.append(self.weight_func_[key](idx))weights_samples=np.concatenate(weights_samples)ifseries_weightsisnotNone:weights=series_weightsifweights_samplesisnotNone:weights=weights*weights_sampleselse:ifweights_samplesisnotNone:weights=weights_samplesifweightsisnotNone:ifnp.isnan(weights).any():raiseValueError("The resulting `weights` cannot have NaN values.")ifnp.any(weights<0):raiseValueError("The resulting `weights` cannot have negative values.")ifnp.sum(weights)==0:raiseValueError("The resulting `weights` cannot be normalized because ""the sum of the weights is zero.")returnweights
If True, in-sample residuals will be stored in the forecaster object
after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_
attributes).
If False, only the intervals of the bins are stored.
If True, skforecast warnings will be suppressed during the training
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Type
Description
None
Notes
If series is a wide-format pandas DataFrame, each column represents a
different time series, and the index must be either a DatetimeIndex or
a RangeIndex with frequency or step size, as appropriate
If series is a long-format pandas DataFrame with a MultiIndex, the
first level of the index must contain the series IDs, and the second
level must be a DatetimeIndex with the same frequency across all series.
If series is a dictionary, each key must be a series ID, and each value
must be a named pandas Series. All series must have the same index, which
must be either a DatetimeIndex or a RangeIndex, and they must share the
same frequency or step size, as appropriate.
If exog is a wide-format pandas DataFrame, it must share the same
index type as series. Each column represents a different exogenous variable,
and the same values are applied to all time series.
If exog is a long-format pandas Series or DataFrame with a MultiIndex,
the first level contains the series IDs to which it belongs, and the second
level must be a pandas DatetimeIndex. Each exogenous variable must be
represented as a separate column.
If exog is a dictionary, each key must correspond to a series ID, and
each value must be either a named pandas Series or DataFrame with the
same index type as series, or None. It is not required for all series
to contain all exogenous variables, but data types must be consistent
across series for each variable.
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
deffit(self,series:pd.DataFrame|dict[str,pd.Series|pd.DataFrame],exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,store_last_window:bool|list[str]=True,store_in_sample_residuals:bool=False,random_state:int=123,suppress_warnings:bool=False)->None:""" Training Forecaster. See Notes section for more details depending on the type of `series` and `exog`. Additional arguments to be passed to the `fit` method of the estimator can be added with the `fit_kwargs` argument when initializing the forecaster. Parameters ---------- series : pandas DataFrame, dict Training time series. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. store_last_window : bool, list, default True Whether or not to store the last window (`last_window_`) of training data. - If `True`, last window is stored for all series. - If `list`, last window is stored for the series present in the list. - If `False`, last window is not stored. store_in_sample_residuals : bool, default False If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored. random_state : int, default 123 Set a seed for the random generator so that the stored sample residuals are always deterministic. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the training process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- None Notes ----- - If `series` is a wide-format pandas DataFrame, each column represents a different time series, and the index must be either a `DatetimeIndex` or a `RangeIndex` with frequency or step size, as appropriate - If `series` is a long-format pandas DataFrame with a MultiIndex, the first level of the index must contain the series IDs, and the second level must be a `DatetimeIndex` with the same frequency across all series. - If series is a dictionary, each key must be a series ID, and each value must be a named pandas Series. All series must have the same index, which must be either a `DatetimeIndex` or a `RangeIndex`, and they must share the same frequency or step size, as appropriate. - If `exog` is a wide-format pandas DataFrame, it must share the same index type as series. Each column represents a different exogenous variable, and the same values are applied to all time series. - If `exog` is a long-format pandas Series or DataFrame with a MultiIndex, the first level contains the series IDs to which it belongs, and the second level must be a pandas `DatetimeIndex`. Each exogenous variable must be represented as a separate column. - If `exog` is a dictionary, each key must correspond to a series ID, and each value must be either a named pandas `Series` or `DataFrame` with the same index type as `series`, or `None`. It is not required for all series to contain all exogenous variables, but data types must be consistent across series for each variable. """set_skforecast_warnings(suppress_warnings,action='ignore')# TODO: create a method reset_forecaster() to reset all attributes# Reset values in case the forecaster has already been fitted.self.last_window_=Noneself.index_type_=Noneself.index_freq_=Noneself.training_range_=Noneself.series_names_in_=Noneself.exog_in_=Falseself.exog_names_in_=Noneself.exog_type_in_=Noneself.exog_dtypes_in_=Noneself.exog_dtypes_out_=Noneself.X_train_series_names_in_=Noneself.X_train_window_features_names_out_=Noneself.X_train_exog_names_out_=Noneself.X_train_features_names_out_=Noneself.encoding_mapping_={}self.in_sample_residuals_=Noneself.in_sample_residuals_by_bin_=Noneself.binner={}self.binner_intervals_={}self.is_fitted=Falseself.fit_date=None(X_train,y_train,series_indexes,series_names_in_,X_train_series_names_in_,exog_names_in_,X_train_window_features_names_out_,X_train_exog_names_out_,exog_dtypes_in_,exog_dtypes_out_,last_window_)=self._create_train_X_y(series=series,exog=exog,store_last_window=store_last_window)sample_weight=self.create_sample_weights(series_names_in_=series_names_in_,X_train=X_train)X_train_estimator=(X_trainifself.encodingisnotNoneelseX_train.drop(columns="_level_skforecast"))ifsample_weightisnotNone:self.estimator.fit(X=X_train_estimator,y=y_train,sample_weight=sample_weight,**self.fit_kwargs)else:self.estimator.fit(X=X_train_estimator,y=y_train,**self.fit_kwargs)self.series_names_in_=series_names_in_self.X_train_series_names_in_=X_train_series_names_in_self.X_train_window_features_names_out_=X_train_window_features_names_out_self.X_train_features_names_out_=X_train_estimator.columns.to_list()self.is_fitted=Trueself.fit_date=pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')self.training_range_={k:v[[0,-1]]fork,vinseries_indexes.items()}self.index_type_=type(series_indexes[series_names_in_[0]])ifisinstance(series_indexes[series_names_in_[0]],pd.DatetimeIndex):self.index_freq_=series_indexes[series_names_in_[0]].freqelse:self.index_freq_=series_indexes[series_names_in_[0]].step# NOTE: When `exog` doesn't match series IDs, exogs are not included in the# training matrices, X_train_exog_names_out_ is None and the forecaster # will be considered trained without exogenous variables.ifexogisnotNoneandX_train_exog_names_out_isnotNone:self.exog_in_=Trueself.exog_names_in_=exog_names_in_self.exog_type_in_=type(exog)self.exog_dtypes_in_=exog_dtypes_in_self.exog_dtypes_out_=exog_dtypes_out_self.X_train_exog_names_out_=X_train_exog_names_out_self.in_sample_residuals_={}self.in_sample_residuals_by_bin_={}ifself._probabilistic_modeisnotFalse:y_train=y_train.to_numpy()y_pred=self.estimator.predict(X_train_estimator)ifself.encodingisnotNone:forlevelinX_train_series_names_in_:ifself.encoding=='onehot':mask=X_train[level].to_numpy()==1.else:encoded_value=self.encoding_mapping_[level]mask=X_train['_level_skforecast'].to_numpy()==encoded_valueself._binning_in_sample_residuals(level=level,y_true=y_train[mask],y_pred=y_pred[mask],store_in_sample_residuals=store_in_sample_residuals,random_state=random_state)# NOTE: the _unknown_level is a random sample of 10_000 residuals of all levels.self._binning_in_sample_residuals(level='_unknown_level',y_true=y_train,y_pred=y_pred,store_in_sample_residuals=store_in_sample_residuals,random_state=random_state)ifnotstore_in_sample_residuals:# NOTE: create empty dictionaries to avoid errors when calling predict()ifself.encodingisnotNone:forlevelinX_train_series_names_in_:self.in_sample_residuals_[level]=Noneself.in_sample_residuals_by_bin_[level]=Noneself.in_sample_residuals_['_unknown_level']=Noneself.in_sample_residuals_by_bin_['_unknown_level']=Noneifstore_last_window:self.last_window_=last_window_set_skforecast_warnings(suppress_warnings,action='default')
Bin residuals according to the predicted value each residual is
associated with. First a skforecast.preprocessing.QuantileBinner object
is fitted to the predicted values. Then, residuals are binned according
to the predicted value each residual is associated with. Residuals are
stored in the forecaster object as in_sample_residuals_ and
in_sample_residuals_by_bin_.
y_true and y_pred assumed to be differentiated and/or transformed
according to the attributes differentiation and transformer_series.
The number of residuals stored per bin is limited to
10_000 // self.binner.n_bins_. The total number of residuals stored is
10_000.
New in version 0.15.0
If True, in-sample residuals will be stored in the forecaster object
after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_
attributes).
If False, only the intervals of the bins are stored.
def_binning_in_sample_residuals(self,level:str,y_true:np.ndarray,y_pred:np.ndarray,store_in_sample_residuals:bool=False,random_state:int=123)->None:""" Bin residuals according to the predicted value each residual is associated with. First a `skforecast.preprocessing.QuantileBinner` object is fitted to the predicted values. Then, residuals are binned according to the predicted value each residual is associated with. Residuals are stored in the forecaster object as `in_sample_residuals_` and `in_sample_residuals_by_bin_`. `y_true` and `y_pred` assumed to be differentiated and/or transformed according to the attributes `differentiation` and `transformer_series`. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`. The total number of residuals stored is `10_000`. **New in version 0.15.0** Parameters ---------- level : str Name of the series (level) to store the residuals. y_true : numpy ndarray True values of the time series. y_pred : numpy ndarray Predicted values of the time series. store_in_sample_residuals : bool, default False If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored. random_state : int, default 123 Set a seed for the random generator so that the stored sample residuals are always deterministic. Returns ------- None """y_true=np.asarray(y_true)y_pred=np.asarray(y_pred)residuals=y_true-y_predifself._probabilistic_mode=="binned":data=pd.DataFrame({'prediction':y_pred,'residuals':residuals})self.binner[level]=QuantileBinner(**self.binner_kwargs)self.binner[level].fit(y_pred)self.binner_intervals_[level]=self.binner[level].intervals_ifstore_in_sample_residuals:rng=np.random.default_rng(seed=random_state)ifself._probabilistic_mode=="binned":data['bin']=self.binner[level].transform(y_pred).astype(int)self.in_sample_residuals_by_bin_[level]=(data.groupby('bin')['residuals'].apply(np.array).to_dict())max_sample=10_000//self.binner[level].n_bins_fork,vinself.in_sample_residuals_by_bin_[level].items():iflen(v)>max_sample:sample=v[rng.integers(low=0,high=len(v),size=max_sample)]self.in_sample_residuals_by_bin_[level][k]=sampleiflen(residuals)>10_000:residuals=residuals[rng.integers(low=0,high=len(residuals),size=10_000)]self.in_sample_residuals_[level]=residuals
Create the inputs needed for the first iteration of the prediction
process. As this is a recursive process, the last window is updated at
each iteration of the prediction process.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
Returns:
Name
Type
Description
last_window
pandas DataFrame
Series values used to create the predictors needed in the first
iteration of the prediction (t + 1).
Exogenous variable/s included as predictor/s for each series in
each step. The keys are the steps and the values are numpy arrays
where each column is an exog and each row a series (level).
def_create_predict_inputs(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,predict_probabilistic:bool=False,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,check_inputs:bool=True)->tuple[pd.DataFrame,dict[str,np.ndarray]|None,list[str],pd.Index]:""" Create the inputs needed for the first iteration of the prediction process. As this is a recursive process, the last window is updated at each iteration of the prediction process. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. predict_probabilistic : bool, default False If `True`, the necessary checks for probabilistic predictions will be performed. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. Returns ------- last_window : pandas DataFrame Series values used to create the predictors needed in the first iteration of the prediction (t + 1). exog_values_dict : dict, None Exogenous variable/s included as predictor/s for each series in each step. The keys are the steps and the values are numpy arrays where each column is an exog and each row a series (level). levels : list Names of the series (levels) to be predicted. prediction_index : pandas Index Index of the predictions. """input_levels_is_None=TrueiflevelsisNoneelseFalselevels,input_levels_is_list=prepare_levels_multiseries(X_train_series_names_in_=self.X_train_series_names_in_,levels=levels)ifself.is_fitted:iflast_windowisNone:levels,last_window=preprocess_levels_self_last_window_multiseries(levels=levels,input_levels_is_list=input_levels_is_list,last_window_=self.last_window_)else:ifisinstance(last_window,pd.DataFrame)andisinstance(last_window.index,pd.MultiIndex):raiseValueError("`last_window` must be a pandas DataFrame with one column ""per series and a single-level index (MultiIndex is not ""supported).")ifinput_levels_is_Noneandisinstance(last_window,pd.DataFrame):levels=last_window.columns.to_list()ifisinstance(exog,(pd.Series,pd.DataFrame))andisinstance(exog.index,pd.MultiIndex):ifnotisinstance(exog.index.levels[1],pd.DatetimeIndex):raiseTypeError(f"When `exog` is a pandas MultiIndex DataFrame, its index "f"must be a pandas DatetimeIndex. If you want to use a pandas "f"RangeIndex, use a dictionary instead. Found `exog` index "f"type: {type(exog.index.levels[1])}.")exog=exog.copy().to_frame()ifisinstance(exog,pd.Series)elseexog.copy()exog={series_id:exog.loc[series_id]forseries_idinexog.index.levels[0]ifseries_idinlevels}ifcheck_inputs:check_predict_input(forecaster_name=type(self).__name__,steps=steps,is_fitted=self.is_fitted,exog_in_=self.exog_in_,index_type_=self.index_type_,index_freq_=self.index_freq_,window_size=self.window_size,last_window=last_window,exog=exog,exog_names_in_=self.exog_names_in_,interval=None,levels=levels,series_names_in_=self.series_names_in_,encoding=self.encoding)ifpredict_probabilistic:check_residuals_input(forecaster_name=type(self).__name__,use_in_sample_residuals=use_in_sample_residuals,in_sample_residuals_=self.in_sample_residuals_,out_sample_residuals_=self.out_sample_residuals_,use_binned_residuals=use_binned_residuals,in_sample_residuals_by_bin_=self.in_sample_residuals_by_bin_,out_sample_residuals_by_bin_=self.out_sample_residuals_by_bin_,levels=levels,encoding=self.encoding)last_window=last_window.iloc[-self.window_size:,last_window.columns.get_indexer(levels)].copy()prediction_index=expand_index(index=last_window.index,steps=steps)ifexogisnotNone:ifisinstance(exog,dict):# Empty dataframe to be filled with the exog values of each levelempty_exog=pd.DataFrame(data={col:pd.Series(dtype=dtype)forcol,dtypeinself.exog_dtypes_in_.items()},index=prediction_index,)else:exog=input_to_frame(data=exog,input_name='exog')ifexog.columns.tolist()!=self.exog_names_in_:exog=exog[self.exog_names_in_]exog=transform_dataframe(df=exog,transformer=self.transformer_exog,fit=False,inverse_transform=False)# NOTE: Only check dtypes if they are not the same as seen in trainingifnotexog.dtypes.to_dict()==self.exog_dtypes_out_:check_exog_dtypes(exog=exog)else:check_exog(exog=exog,allow_nan=False)exog_values=exog.iloc[:steps,:]else:exog_values=None# NOTE: This needs to be done to ensure that the last window dtype is float.last_window_values=last_window.to_numpy()last_window_matrix=np.full(shape=last_window.shape,fill_value=np.nan,order='F',dtype=float)exog_values_all_levels=[]foridx_level,levelinenumerate(levels):last_window_level=last_window_values[:,idx_level]last_window_level=transform_numpy(array=last_window_level,transformer=self.transformer_series_.get(level,self.transformer_series_['_unknown_level']),fit=False,inverse_transform=False)ifself.differentiationisnotNone:iflevelnotinself.differentiator_.keys():self.differentiator_[level]=copy(self.differentiator_['_unknown_level'])ifself.differentiator_[level]isnotNone:last_window_level=(self.differentiator_[level].fit_transform(last_window_level))last_window_matrix[:,idx_level]=last_window_levelifisinstance(exog,dict):# Fill the empty dataframe with the exog values of each level# and transform them if necessaryexog_values=exog.get(level,None)ifexog_valuesisnotNone:ifisinstance(exog_values,pd.Series):exog_values=exog_values.to_frame()exog_values=exog_values.reindex_like(empty_exog)else:exog_values=empty_exog.copy()exog_values_all_levels.append(exog_values)last_window=pd.DataFrame(data=last_window_matrix,columns=levels,index=last_window.index)ifexogisnotNone:# Exog is transformed into a dict where each key is a step and each value# is a numpy array where each column is an exog and each row a seriesexog_values_all_levels=pd.concat(exog_values_all_levels)ifisinstance(exog,dict):exog_values_all_levels=transform_dataframe(df=exog_values_all_levels,transformer=self.transformer_exog,fit=False,inverse_transform=False)# NOTE: Only check dtypes if they are not the same as seen in trainingifnotexog_values_all_levels.dtypes.to_dict()==self.exog_dtypes_out_:check_exog_dtypes(exog=exog_values_all_levels)else:check_exog(exog=exog_values_all_levels,allow_nan=False)exog_values_all_levels=exog_values_all_levels.to_numpy()exog_values_dict={i+1:exog_values_all_levels[i::steps,:]foriinrange(steps)}else:exog_values_dict=Nonereturnlast_window,exog_values_dict,levels,prediction_index
Exogenous variable/s included as predictor/s for each series in
each step. The keys are the steps and the values are numpy arrays
where each column is an exog and each row a series (level).
None
residuals
numpy ndarray
Residuals used to generate bootstrapping predictions in the form
(steps, levels).
def_recursive_predict(self,steps:int,levels:list,last_window:pd.DataFrame,exog_values_dict:dict[str,np.ndarray]|None=None,residuals:np.ndarray|None=None,use_binned_residuals:bool=True)->np.ndarray:""" Predict n steps for one or multiple levels. It is an iterative process in which, each prediction, is used as a predictor for the next step. Parameters ---------- steps : int Number of steps to predict. levels : list Time series to be predicted. last_window : pandas DataFrame Series values used to create the predictors needed in the first iteration of the prediction (t + 1). exog_values_dict : dict, default None Exogenous variable/s included as predictor/s for each series in each step. The keys are the steps and the values are numpy arrays where each column is an exog and each row a series (level). residuals : numpy ndarray, default None Residuals used to generate bootstrapping predictions in the form (steps, levels). use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. **New in version 0.15.0** Returns ------- predictions : numpy ndarray Predicted values. """original_device=set_cpu_gpu_device(estimator=self.estimator,device='cpu')n_levels=len(levels)n_lags=len(self.lags)ifself.lagsisnotNoneelse0n_window_features=(len(self.X_train_window_features_names_out_)ifself.window_featuresisnotNoneelse0)n_autoreg=n_lags+n_window_featuresn_exog=len(self.X_train_exog_names_out_)ifexog_values_dictisnotNoneelse0ifself.encodingisnotNone:ifself.encoding=="onehot":levels_encoded=np.zeros((n_levels,len(self.X_train_series_names_in_)),dtype=float)fori,levelinenumerate(levels):iflevelinself.X_train_series_names_in_:levels_encoded[i,self.X_train_series_names_in_.index(level)]=1.else:levels_encoded=np.array([self.encoding_mapping_.get(level,np.nan)forlevelinlevels],dtype="float64").reshape(-1,1)levels_encoded_shape=levels_encoded.shape[1]else:levels_encoded_shape=0features_shape=n_autoreg+levels_encoded_shape+n_exogfeatures=np.full(shape=(n_levels,features_shape),fill_value=np.nan,order='F',dtype=float)ifself.encodingisnotNone:features[:,n_autoreg:n_autoreg+levels_encoded_shape]=levels_encodedpredictions=np.full(shape=(steps,n_levels),fill_value=np.nan,order='C',dtype=float)last_window=np.concatenate((last_window.to_numpy(),predictions),axis=0)foriinrange(steps):ifself.lagsisnotNone:features[:,:n_lags]=last_window[-self.lags-(steps-i),:].transpose()ifself.window_featuresisnotNone:features[:,n_lags:n_autoreg]=np.concatenate([wf.transform(last_window[i:-(steps-i),:])forwfinself.window_features],axis=1)ifexog_values_dictisnotNone:features[:,-n_exog:]=exog_values_dict[i+1]pred=self.estimator.predict(features)# NOTE: CatBoost makes the input array read-only.ifnotfeatures.flags.writeable:features.flags.writeable=TrueifresidualsisnotNone:ifuse_binned_residuals:step_residual=np.full(shape=n_levels,fill_value=np.nan,dtype=float)forj,levelinenumerate(levels):predicted_bin=(self.binner.get(level,self.binner['_unknown_level']).transform(pred[j]).item())step_residual[j]=residuals[predicted_bin][i,j]else:step_residual=residuals[i,:]pred+=step_residualpredictions[i,:]=pred# Update `last_window` values. The first position is discarded and # the new prediction is added at the end.last_window[-(steps-i),:]=predset_cpu_gpu_device(estimator=self.estimator,device=original_device)returnpredictions
Create the predictors needed to predict steps ahead. As it is a recursive
process, the predictors are created at each iteration of the prediction
process.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
Returns:
Name
Type
Description
X_predict
pandas DataFrame
Long-format DataFrame with the predictors. The columns are level and
one column for each predictor. The index is the same as the prediction
index.
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
defcreate_predict_X(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,suppress_warnings:bool=False,check_inputs:bool=True)->pd.DataFrame:""" Create the predictors needed to predict `steps` ahead. As it is a recursive process, the predictors are created at each iteration of the prediction process. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. Returns ------- X_predict : pandas DataFrame Long-format DataFrame with the predictors. The columns are `level` and one column for each predictor. The index is the same as the prediction index. """set_skforecast_warnings(suppress_warnings,action='ignore')(last_window,exog_values_dict,levels,prediction_index)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,check_inputs=check_inputs)withwarnings.catch_warnings():warnings.filterwarnings("ignore",message="X does not have valid feature names",category=UserWarning)predictions=self._recursive_predict(steps=steps,levels=levels,last_window=last_window,exog_values_dict=exog_values_dict)ifself.lagsisnotNone:idx_lags=np.arange(-steps,0)[:,None]-self.lagslen_X_train_series_names_in_=len(self.X_train_series_names_in_)exog_shape=len(self.X_train_exog_names_out_)ifexogisnotNoneelse0X_predict=[]fori,levelinenumerate(levels):X_predict_level=[]full_predictors_level=np.concatenate((last_window[level].to_numpy(),predictions[:,i]))ifself.lagsisnotNone:X_predict_level.append(full_predictors_level[idx_lags+len(full_predictors_level)])ifself.window_featuresisnotNone:X_window_features=np.full(shape=(steps,len(self.X_train_window_features_names_out_)),fill_value=np.nan,order='C',dtype=float)forjinrange(steps):X_window_features[j,:]=np.concatenate([wf.transform(full_predictors_level[j:-(steps-j)])forwfinself.window_features])X_predict_level.append(X_window_features)ifself.encodingisnotNone:ifself.encoding=='onehot':level_encoded=np.zeros(shape=(1,len_X_train_series_names_in_),dtype=float)level_encoded[0][self.X_train_series_names_in_.index(level)]=1.else:level_encoded=np.array([self.encoding_mapping_.get(level,None)],dtype='float64')level_encoded=np.tile(level_encoded,(steps,1))X_predict_level.append(level_encoded)ifexogisnotNone:exog_cols=np.full(shape=(steps,exog_shape),fill_value=np.nan,order='C',dtype=float)forjinrange(steps):exog_cols[j,:]=exog_values_dict[j+1][i,:]X_predict_level.append(exog_cols)X_predict.append(np.concatenate(X_predict_level,axis=1))X_predict=pd.DataFrame(data=np.concatenate(X_predict),index=np.tile(prediction_index,len(levels)),columns=self.X_train_features_names_out_)X_predict.insert(0,'level',np.repeat(levels,steps))# NOTE: Order needed to have the same structure as the output of predict methods.order_dict={level:ifori,levelinenumerate(levels)}X_predict['order']=X_predict['level'].map(order_dict)X_predict=(X_predict.reset_index().sort_values(by=['index','order']).set_index('index').rename_axis(index=None).drop(columns='order'))ifself.exog_in_:categorical_features=any(notpd.api.types.is_numeric_dtype(dtype)orpd.api.types.is_bool_dtype(dtype)fordtypeinset(self.exog_dtypes_out_.values()))ifcategorical_features:X_predict=X_predict.astype(self.exog_dtypes_out_)ifself.transformer_seriesisnotNoneorself.differentiationisnotNone:warnings.warn("The output matrix is in the transformed scale due to the ""inclusion of transformations or differentiation in the Forecaster. ""As a result, any predictions generated using this matrix will also ""be in the transformed scale. Please refer to the documentation ""for more details: ""https://skforecast.org/latest/user_guides/training-and-prediction-matrices.html",DataTransformationWarning)set_skforecast_warnings(suppress_warnings,action='default')returnX_predict
Predict n steps ahead. It is an recursive process in which, each prediction,
is used as a predictor for the next step. Only levels whose last window
ends at the same datetime index can be predicted together.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the predictions. The columns are level
and pred.
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
defpredict(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,suppress_warnings:bool=False,check_inputs:bool=True)->pd.DataFrame:""" Predict n steps ahead. It is an recursive process in which, each prediction, is used as a predictor for the next step. Only levels whose last window ends at the same datetime index can be predicted together. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the predictions. The columns are `level` and `pred`. """set_skforecast_warnings(suppress_warnings,action='ignore')(last_window,exog_values_dict,levels,prediction_index)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,check_inputs=check_inputs)withwarnings.catch_warnings():warnings.filterwarnings("ignore",message="X does not have valid feature names",category=UserWarning)predictions=self._recursive_predict(steps=steps,levels=levels,last_window=last_window,exog_values_dict=exog_values_dict)fori,levelinenumerate(levels):ifself.differentiationisnotNoneandself.differentiator_[level]isnotNone:predictions[:,i]=(self.differentiator_[level].inverse_transform_next_window(predictions[:,i]))predictions[:,i]=transform_numpy(array=predictions[:,i],transformer=self.transformer_series_.get(level,self.transformer_series_['_unknown_level']),fit=False,inverse_transform=True)n_steps,n_levels=predictions.shapepredictions=pd.DataFrame({"level":np.tile(levels,n_steps),"pred":predictions.ravel()},index=np.repeat(prediction_index,n_levels),)set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
Generate multiple forecasting predictions using a bootstrapping process.
By sampling from a collection of past observed errors (the residuals),
each iteration of bootstrapping generates a different set of predictions.
Only levels whose last window ends at the same datetime index can be
predicted together. See the References section for more information.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Name
Type
Description
boot_predictions
pandas DataFrame
Long-format DataFrame with the bootstrapping predictions. The columns
are level, pred_boot_0, pred_boot_1, ..., pred_boot_n_boot.
defpredict_bootstrapping(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,n_boot:int=250,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,random_state:int=123,suppress_warnings:bool=False)->pd.DataFrame:""" Generate multiple forecasting predictions using a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. Only levels whose last window ends at the same datetime index can be predicted together. See the References section for more information. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. n_boot : int, default 250 Number of bootstrapping iterations to perform when estimating prediction intervals. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. **New in version 0.15.0** random_state : int, default 123 Seed for the random number generator to ensure reproducibility. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- boot_predictions : pandas DataFrame Long-format DataFrame with the bootstrapping predictions. The columns are `level`, `pred_boot_0`, `pred_boot_1`, ..., `pred_boot_n_boot`. References ---------- .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html """set_skforecast_warnings(suppress_warnings,action='ignore')(last_window,exog_values_dict,levels,prediction_index)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,predict_probabilistic=True,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals)ifuse_in_sample_residuals:residuals=self.in_sample_residuals_residuals_by_bin=self.in_sample_residuals_by_bin_else:residuals=self.out_sample_residuals_residuals_by_bin=self.out_sample_residuals_by_bin_n_levels=len(levels)rng=np.random.default_rng(seed=random_state)sampled_residuals_grid=np.full(shape=(steps,n_boot,n_levels),fill_value=np.nan,order='F',dtype=float)ifuse_binned_residuals:sampled_residuals={k:sampled_residuals_grid.copy()forkinrange(self.binner_kwargs['n_bins'])}forbininsampled_residuals.keys():fori,levelinenumerate(levels):sampled_residuals[bin][:,:,i]=rng.choice(a=residuals_by_bin.get(level,residuals_by_bin['_unknown_level'])[bin],size=(steps,n_boot),replace=True)else:fori,levelinenumerate(levels):sampled_residuals_grid[:,:,i]=rng.choice(a=residuals.get(level,residuals['_unknown_level']),size=(steps,n_boot),replace=True)sampled_residuals={'all':sampled_residuals_grid}boot_columns=[]boot_predictions=np.full(shape=(steps,n_levels,n_boot),fill_value=np.nan,order='F',dtype=float)withwarnings.catch_warnings():warnings.filterwarnings("ignore",message="X does not have valid feature names",category=UserWarning)foriinrange(n_boot):ifuse_binned_residuals:boot_sampled_residuals={k:v[:,i,:]fork,vinsampled_residuals.items()}else:boot_sampled_residuals=sampled_residuals['all'][:,i,:]boot_columns.append(f"pred_boot_{i}")boot_predictions[:,:,i]=self._recursive_predict(steps=steps,levels=levels,last_window=last_window,exog_values_dict=exog_values_dict,residuals=boot_sampled_residuals,use_binned_residuals=use_binned_residuals,)fori,levelinenumerate(levels):ifself.differentiationisnotNoneandself.differentiator_[level]isnotNone:boot_predictions[:,i,:]=(self.differentiator_[level].inverse_transform_next_window(boot_predictions[:,i,:]))transformer_level=self.transformer_series_.get(level,self.transformer_series_['_unknown_level'])iftransformer_levelisnotNone:boot_predictions[:,i,:]=np.apply_along_axis(func1d=transform_numpy,axis=0,arr=boot_predictions[:,i,:],transformer=transformer_level,fit=False,inverse_transform=True)boot_predictions=pd.DataFrame(data=boot_predictions.reshape(-1,n_boot),index=np.repeat(prediction_index,n_levels),columns=boot_columns)boot_predictions.insert(0,'level',np.tile(levels,steps))set_skforecast_warnings(suppress_warnings,action='default')returnboot_predictions
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas Series, pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored inself.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
def_predict_interval_conformal(self,steps:int|str|pd.Timestamp,levels:str|list[str]|None=None,last_window:pd.Series|pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,nominal_coverage:float=0.95,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True)->pd.DataFrame:""" Generate prediction intervals using the conformal prediction split method [1]_. Parameters ---------- steps : int, str, pandas Timestamp Number of steps to predict. - If steps is int, number of steps to predict. - If str or pandas Datetime, the prediction will be up to that date. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas Series, pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in` self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. nominal_coverage : float, default 0.95 Nominal coverage, also known as expected coverage, of the prediction intervals. Must be between 0 and 1. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. Returns ------- predictions : pandas DataFrame Values predicted by the forecaster and their estimated interval. - pred: predictions. - lower_bound: lower bound of the interval. - upper_bound: upper bound of the interval. References ---------- .. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method """(last_window,exog_values_dict,levels,prediction_index)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,predict_probabilistic=True,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals)ifuse_in_sample_residuals:residuals=self.in_sample_residuals_residuals_by_bin=self.in_sample_residuals_by_bin_else:residuals=self.out_sample_residuals_residuals_by_bin=self.out_sample_residuals_by_bin_withwarnings.catch_warnings():warnings.filterwarnings("ignore",message="X does not have valid feature names",category=UserWarning)predictions=self._recursive_predict(steps=steps,levels=levels,last_window=last_window,exog_values_dict=exog_values_dict)n_levels=len(levels)correction_factor=np.full(shape=(steps,n_levels),fill_value=np.nan,order='C',dtype=float)ifuse_binned_residuals:fori,levelinenumerate(levels):residuals_level=residuals_by_bin.get(level,residuals_by_bin['_unknown_level'])correction_factor_by_bin={k:np.quantile(np.abs(v),nominal_coverage)fork,vinresiduals_level.items()}replace_func=np.vectorize(lambdax:correction_factor_by_bin[x])predictions_bin=(self.binner.get(level,self.binner['_unknown_level']).transform(predictions[:,i]))correction_factor[:,i]=replace_func(predictions_bin)else:fori,levelinenumerate(levels):correction_factor[:,i]=np.quantile(np.abs(residuals.get(level,residuals['_unknown_level'])),nominal_coverage)lower_bound=predictions-correction_factorupper_bound=predictions+correction_factor# NOTE: Create a 3D array with shape (n_levels, intervals, steps)predictions=np.array([predictions,lower_bound,upper_bound]).swapaxes(0,2)fori,levelinenumerate(levels):ifself.differentiationisnotNoneandself.differentiator_[level]isnotNone:predictions[i,:,:]=(self.differentiator_[level].inverse_transform_next_window(predictions[i,:,:]))transformer_level=self.transformer_series_.get(level,self.transformer_series_['_unknown_level'])iftransformer_levelisnotNone:predictions[i,:,:]=np.apply_along_axis(func1d=transform_numpy,axis=0,arr=predictions[i,:,:],transformer=transformer_level,fit=False,inverse_transform=True)predictions=pd.DataFrame(data=predictions.swapaxes(0,1).reshape(-1,3),index=np.repeat(prediction_index,n_levels),columns=["pred","lower_bound","upper_bound"])predictions.insert(0,'level',np.tile(levels,steps))returnpredictions
Predict n steps ahead and estimate prediction intervals using either
bootstrapping or conformal prediction methods. Refer to the References
section for additional details on these methods.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
Confidence level of the prediction interval. Interpretation depends
on the method used:
If float, represents the nominal (expected) coverage (between 0
and 1). For instance, interval=0.95 corresponds to [2.5, 97.5]
percentiles.
If list or tuple, defines the exact percentiles to compute, which
must be between 0 and 100 inclusive. For example, interval
of 95% should be as interval = [2.5, 97.5].
When using method='conformal', the interval must be a float or
a list/tuple defining a symmetric interval.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the predictions and the lower and upper
bounds of the estimated interval. The columns are level, pred,
lower_bound, upper_bound.
defpredict_interval(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,method:str='conformal',interval:float|list[float]|tuple[float]=[5,95],n_boot:int=250,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,random_state:int=123,suppress_warnings:bool=False)->pd.DataFrame:""" Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods. Refer to the References section for additional details on these methods. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. method : str, default 'conformal' Technique used to estimate prediction intervals. Available options: - 'bootstrapping': Bootstrapping is used to generate prediction intervals [1]_. - 'conformal': Employs the conformal prediction split method for interval estimation [2]_. interval : float, list, tuple, default [5, 95] Confidence level of the prediction interval. Interpretation depends on the method used: - If `float`, represents the nominal (expected) coverage (between 0 and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` percentiles. - If `list` or `tuple`, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, interval of 95% should be as `interval = [2.5, 97.5]`. - When using `method='conformal'`, the interval must be a float or a list/tuple defining a symmetric interval. n_boot : int, default 250 Number of bootstrapping iterations to perform when estimating prediction intervals. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. **New in version 0.15.0** random_state : int, default 123 Seed for the random number generator to ensure reproducibility. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the predictions and the lower and upper bounds of the estimated interval. The columns are `level`, `pred`, `lower_bound`, `upper_bound`. References ---------- .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html .. [2] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method """set_skforecast_warnings(suppress_warnings,action='ignore')ifmethod=="bootstrapping":ifisinstance(interval,(list,tuple)):check_interval(interval=interval,ensure_symmetric_intervals=False)interval=np.array(interval)/100else:check_interval(alpha=interval,alpha_literal='interval')interval=np.array([0.5-interval/2,0.5+interval/2])boot_predictions=self.predict_bootstrapping(steps=steps,levels=levels,last_window=last_window,exog=exog,n_boot=n_boot,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals,random_state=random_state,suppress_warnings=suppress_warnings)predictions=self.predict(steps=steps,levels=levels,last_window=last_window,exog=exog,suppress_warnings=suppress_warnings,check_inputs=False)boot_predictions[['lower_bound','upper_bound']]=(boot_predictions.iloc[:,1:].quantile(q=interval,axis=1).transpose())predictions=pd.concat([predictions,boot_predictions[['lower_bound','upper_bound']]],axis=1)elifmethod=='conformal':ifisinstance(interval,(list,tuple)):check_interval(interval=interval,ensure_symmetric_intervals=True)nominal_coverage=(interval[1]-interval[0])/100else:check_interval(alpha=interval,alpha_literal='interval')nominal_coverage=intervalpredictions=self._predict_interval_conformal(steps=steps,levels=levels,last_window=last_window,exog=exog,nominal_coverage=nominal_coverage,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals)else:raiseValueError(f"Invalid `method` '{method}'. Choose 'bootstrapping' or 'conformal'.")set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
Calculate the specified quantiles for each step. After generating
multiple forecasting predictions through a bootstrapping process, each
quantile is calculated for each step.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
Sequence of quantiles to compute, which must be between 0 and 1
inclusive. For example, quantiles of 0.05, 0.5 and 0.95 should be as
quantiles = [0.05, 0.5, 0.95].
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the quantiles predicted by the forecaster.
For example, if quantiles = [0.05, 0.5, 0.95], the columns are
level, q_0.05, q_0.5, q_0.95.
defpredict_quantiles(self,steps:int,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,quantiles:list[float]|tuple[float]=[0.05,0.5,0.95],n_boot:int=250,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,random_state:int=123,suppress_warnings:bool=False)->pd.DataFrame:""" Calculate the specified quantiles for each step. After generating multiple forecasting predictions through a bootstrapping process, each quantile is calculated for each step. Parameters ---------- steps : int Number of steps to predict. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. quantiles : list, tuple, default [0.05, 0.5, 0.95] Sequence of quantiles to compute, which must be between 0 and 1 inclusive. For example, quantiles of 0.05, 0.5 and 0.95 should be as `quantiles = [0.05, 0.5, 0.95]`. n_boot : int, default 250 Number of bootstrapping iterations to perform when estimating quantiles. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. **New in version 0.15.0** random_state : int, default 123 Seed for the random number generator to ensure reproducibility. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the quantiles predicted by the forecaster. For example, if `quantiles = [0.05, 0.5, 0.95]`, the columns are `level`, `q_0.05`, `q_0.5`, `q_0.95`. References ---------- .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html """set_skforecast_warnings(suppress_warnings,action='ignore')check_interval(quantiles=quantiles)predictions=self.predict_bootstrapping(steps=steps,levels=levels,last_window=last_window,exog=exog,n_boot=n_boot,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals,random_state=random_state,suppress_warnings=suppress_warnings)quantiles_cols=[f'q_{q}'forqinquantiles]predictions[quantiles_cols]=(predictions.iloc[:,1:].quantile(q=quantiles,axis=1).transpose())predictions=predictions[['level']+quantiles_cols]set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
Fit a given probability distribution for each step. After generating
multiple forecasting predictions through a bootstrapping process, each
step is fitted to the given distribution.
Time series to be predicted. If None all levels whose last window
ends at the same datetime index will be predicted together.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the parameters of the fitted distribution
for each step. The columns are level, param_0, param_1, ...,
param_n, where param_i are the parameters of the distribution.
defpredict_dist(self,steps:int,distribution:object,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,n_boot:int=250,use_in_sample_residuals:bool=True,use_binned_residuals:bool=True,random_state:int=123,suppress_warnings:bool=False)->pd.DataFrame:""" Fit a given probability distribution for each step. After generating multiple forecasting predictions through a bootstrapping process, each step is fitted to the given distribution. Parameters ---------- steps : int Number of steps to predict. distribution : object A distribution object from scipy.stats with methods `_pdf` and `fit`. For example scipy.stats.norm. levels : str, list, default None Time series to be predicted. If `None` all levels whose last window ends at the same datetime index will be predicted together. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. n_boot : int, default 250 Number of bootstrapping iterations to perform when estimating prediction intervals. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. use_binned_residuals : bool, default True If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly. **New in version 0.15.0** random_state : int, default 123 Seed for the random number generator to ensure reproducibility. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the parameters of the fitted distribution for each step. The columns are `level`, `param_0`, `param_1`, ..., `param_n`, where `param_i` are the parameters of the distribution. References ---------- .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html """ifnothasattr(distribution,"_pdf")ornotcallable(getattr(distribution,"fit",None)):raiseTypeError("`distribution` must be a valid probability distribution object ""from scipy.stats, with methods `_pdf` and `fit`.")set_skforecast_warnings(suppress_warnings,action='ignore')predictions=self.predict_bootstrapping(steps=steps,levels=levels,last_window=last_window,exog=exog,n_boot=n_boot,use_in_sample_residuals=use_in_sample_residuals,use_binned_residuals=use_binned_residuals,random_state=random_state,suppress_warnings=suppress_warnings)param_names=[pforpininspect.signature(distribution._pdf).parametersifnotp=="x"]+["loc","scale"]predictions[param_names]=(predictions.iloc[:,1:].apply(lambdax:distribution.fit(x),axis=1,result_type='expand'))predictions=predictions[['level']+param_names]set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
defset_params(self,params:dict[str,object])->None:""" Set new values to the parameters of the scikit-learn model stored in the forecaster. Parameters ---------- params : dict Parameters values. Returns ------- None """self.estimator=clone(self.estimator)self.estimator.set_params(**params)
defset_fit_kwargs(self,fit_kwargs:dict[str,object])->None:""" Set new values for the additional keyword arguments passed to the `fit` method of the estimator. Parameters ---------- fit_kwargs : dict Dict of the form {"argument": new_value}. Returns ------- None """self.fit_kwargs=check_select_fit_kwargs(self.estimator,fit_kwargs=fit_kwargs)
defset_lags(self,lags:int|list[int]|np.ndarray[int]|range[int]|None=None)->None:""" Set new value to the attribute `lags`. Attributes `lags_names`, `max_lag` and `window_size` are also updated. Parameters ---------- lags : int, list, numpy ndarray, range, default None Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. - `int`: include lags from 1 to `lags` (included). - `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. - `None`: no lags are included as predictors. Returns ------- None """ifself.window_featuresisNoneandlagsisNone:raiseValueError("At least one of the arguments `lags` or `window_features` ""must be different from None. This is required to create the ""predictors used in training the forecaster.")self.lags,self.lags_names,self.max_lag=initialize_lags(type(self).__name__,lags)self.window_size=max([wsforwsin[self.max_lag,self.max_size_window_features]ifwsisnotNone])ifself.differentiationisnotNone:self.window_size+=self.differentiation_maxifisinstance(self.differentiator,dict):forseriesinself.differentiator.keys():ifself.differentiator[series]isnotNone:self.differentiator[series].set_params(window_size=self.window_size)else:self.differentiator.set_params(window_size=self.window_size)
Set new value to the attribute window_features. Attributes
max_size_window_features, window_features_names,
window_features_class_names and window_size are also updated.
Instance or list of instances used to create window features. Window features
are created from the original time series and are included as predictors.
None
Returns:
Type
Description
None
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
defset_window_features(self,window_features:object|list[object]|None=None)->None:""" Set new value to the attribute `window_features`. Attributes `max_size_window_features`, `window_features_names`, `window_features_class_names` and `window_size` are also updated. Parameters ---------- window_features : object, list, default None Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors. Returns ------- None """ifwindow_featuresisNoneandself.lagsisNone:raiseValueError("At least one of the arguments `lags` or `window_features` ""must be different from None. This is required to create the ""predictors used in training the forecaster.")self.window_features,self.window_features_names,self.max_size_window_features=(initialize_window_features(window_features))self.window_features_class_names=Noneifwindow_featuresisnotNone:self.window_features_class_names=[type(wf).__name__forwfinself.window_features]self.window_size=max([wsforwsin[self.max_lag,self.max_size_window_features]ifwsisnotNone])ifself.differentiationisnotNone:self.window_size+=self.differentiation_maxifisinstance(self.differentiator,dict):forseriesinself.differentiator.keys():ifself.differentiator[series]isnotNone:self.differentiator[series].set_params(window_size=self.window_size)else:self.differentiator.set_params(window_size=self.window_size)
Set in-sample residuals in case they were not calculated during the
training process.
In-sample residuals are calculated as the difference between the true
values and the predictions made by the forecaster using the training
data. The following internal attributes are updated:
in_sample_residuals_: Dictionary containing a numpy ndarray with the
residuals for each series in the form {series: residuals}.
binner_intervals_: intervals used to bin the residuals are calculated
using the quantiles of the predicted values.
in_sample_residuals_by_bin_: residuals are binned according to the
predicted value they are associated with and stored in a dictionary per
series, where the keys are the intervals of the predicted values and the
values are the residuals associated with that range.
A total of 10_000 residuals are stored in the attribute in_sample_residuals_.
If the number of residuals is greater than 10_000, a random sample of
10_000 residuals is stored. The number of residuals stored per bin is
limited to 10_000 // self.binner.n_bins_.
If True, skforecast warnings will be suppressed during the sampling
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Type
Description
None
Source code in skforecast/recursive/_forecaster_recursive_multiseries.py
defset_in_sample_residuals(self,series:pd.DataFrame|dict[str,pd.Series|pd.DataFrame],exog:pd.Series|pd.DataFrame|dict[str,pd.Series|pd.DataFrame]|None=None,random_state:int=123,suppress_warnings:bool=False)->None:""" Set in-sample residuals in case they were not calculated during the training process. In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated: + `in_sample_residuals_`: Dictionary containing a numpy ndarray with the residuals for each series in the form `{series: residuals}`. + `binner_intervals_`: intervals used to bin the residuals are calculated using the quantiles of the predicted values. + `in_sample_residuals_by_bin_`: residuals are binned according to the predicted value they are associated with and stored in a dictionary per series, where the keys are the intervals of the predicted values and the values are the residuals associated with that range. A total of 10_000 residuals are stored in the attribute `in_sample_residuals_`. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`. Parameters ---------- series : pandas DataFrame, dict Training time series. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. random_state : int, default 123 Sets a seed to the random sampling for reproducible output. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the sampling process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- None """set_skforecast_warnings(suppress_warnings,action='ignore')ifnotself.is_fitted:raiseNotFittedError("This forecaster is not fitted yet. Call `fit` with appropriate ""arguments before using `set_in_sample_residuals()`.")(X_train,y_train,series_indexes,_,X_train_series_names_in_,*_)=self._create_train_X_y(series=series,exog=exog,store_last_window=False)# NOTE: Same series names as training is checked in _create_train_X_y.series_index_range={k:v[[0,-1]]fork,vinseries_indexes.items()}forlevelinself.training_range_.keys():ifnotseries_index_range[level].equals(self.training_range_[level]):raiseIndexError(f"The index range for series '{level}' does not match the range "f"used during training. Please ensure the index is aligned "f"with the training data.\n"f" Expected : {self.training_range_[level]}\n"f" Received : {series_index_range[level]}")X_train_estimator=(X_trainifself.encodingisnotNoneelseX_train.drop(columns="_level_skforecast"))X_train_features_names_out_=X_train_estimator.columns.to_list()ifnotX_train_features_names_out_==self.X_train_features_names_out_:raiseValueError(f"Feature mismatch detected after matrix creation. The features "f"generated from the provided data do not match those used during "f"the training process. To correctly set in-sample residuals, "f"ensure that the same data and preprocessing steps are applied.\n"f" Expected output : {self.X_train_features_names_out_}\n"f" Current output : {X_train_features_names_out_}")self.in_sample_residuals_={}self.in_sample_residuals_by_bin_={}y_pred=self.estimator.predict(X_train_estimator)ifself.encodingisnotNone:forlevelinX_train_series_names_in_:ifself.encoding=='onehot':mask=X_train[level].to_numpy()==1.else:encoded_value=self.encoding_mapping_[level]mask=X_train['_level_skforecast'].to_numpy()==encoded_valueself._binning_in_sample_residuals(level=level,y_true=y_train[mask],y_pred=y_pred[mask],store_in_sample_residuals=True,random_state=random_state)# NOTE: the _unknown_level is a random sample of 10_000 residuals of all levels.self._binning_in_sample_residuals(level='_unknown_level',y_true=y_train,y_pred=y_pred,store_in_sample_residuals=True,random_state=random_state)set_skforecast_warnings(suppress_warnings,action='default')
Set new values to the attribute out_sample_residuals_. Out of sample
residuals are meant to be calculated using observations that did not
participate in the training process. y_true and y_pred are expected
to be in the original scale of the time series. Residuals are calculated
as y_true - y_pred, after applying the necessary transformations and
differentiations if the forecaster includes them (self.transformer_series
and self.differentiation).
A total of 10_000 residuals are stored in the attribute out_sample_residuals_.
If the number of residuals is greater than 10_000, a random sample of
10_000 residuals is stored.
If True, new residuals are added to the once already stored in the
attribute out_sample_residuals_. If after appending the new residuals,
the limit of 10_000 samples is exceeded, a random sample of 10_000 is
kept.
defset_out_sample_residuals(self,y_true:dict[str,np.ndarray|pd.Series],y_pred:dict[str,np.ndarray|pd.Series],append:bool=False,random_state:int=123)->None:""" Set new values to the attribute `out_sample_residuals_`. Out of sample residuals are meant to be calculated using observations that did not participate in the training process. `y_true` and `y_pred` are expected to be in the original scale of the time series. Residuals are calculated as `y_true` - `y_pred`, after applying the necessary transformations and differentiations if the forecaster includes them (`self.transformer_series` and `self.differentiation`). A total of 10_000 residuals are stored in the attribute `out_sample_residuals_`. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. Parameters ---------- y_true : dict Dictionary of numpy ndarrays or pandas Series with the true values of the time series for each series in the form {series: y_true}. y_pred : dict Dictionary of numpy ndarrays or pandas Series with the predicted values of the time series for each series in the form {series: y_pred}. append : bool, default False If `True`, new residuals are added to the once already stored in the attribute `out_sample_residuals_`. If after appending the new residuals, the limit of 10_000 samples is exceeded, a random sample of 10_000 is kept. random_state : int, default 123 Sets a seed to the random sampling for reproducible output. Returns ------- None Notes ----- Out-of-sample residuals can only be stored for series seen during fit. To save residuals for unseen levels use the key '_unknown_level'. """ifnotself.is_fitted:raiseNotFittedError("This forecaster is not fitted yet. Call `fit` with appropriate ""arguments before using `set_out_sample_residuals()`.")ifnotisinstance(y_true,dict):raiseTypeError(f"`y_true` must be a dictionary of numpy ndarrays or pandas Series. "f"Got {type(y_true)}.")ifnotisinstance(y_pred,dict):raiseTypeError(f"`y_pred` must be a dictionary of numpy ndarrays or pandas Series. "f"Got {type(y_pred)}.")ifnotset(y_true.keys())==set(y_pred.keys()):raiseValueError(f"`y_true` and `y_pred` must have the same keys. "f"Got {set(y_true.keys())} and {set(y_pred.keys())}.")forkiny_true.keys():ifnotisinstance(y_true[k],(np.ndarray,pd.Series)):raiseTypeError(f"Values of `y_true` must be numpy ndarrays or pandas Series. "f"Got {type(y_true[k])} for series '{k}'.")ifnotisinstance(y_pred[k],(np.ndarray,pd.Series)):raiseTypeError(f"Values of `y_pred` must be numpy ndarrays or pandas Series. "f"Got {type(y_pred[k])} for series '{k}'.")iflen(y_true[k])!=len(y_pred[k]):raiseValueError(f"`y_true` and `y_pred` must have the same length. "f"Got {len(y_true[k])} and {len(y_pred[k])} for series '{k}'.")ifisinstance(y_true[k],pd.Series)andisinstance(y_pred[k],pd.Series):ifnoty_true[k].index.equals(y_pred[k].index):raiseValueError(f"When containing pandas Series, elements in `y_true` and "f"`y_pred` must have the same index. Error with series '{k}'.")# NOTE: Out-of-sample residuals can only be stored for series seen during # fit. To save residuals for unseen levels use the key '_unknown_level'. series_names_in_=self.series_names_in_+['_unknown_level']series_to_update=set(y_pred.keys()).intersection(set(series_names_in_))ifnotseries_to_update:raiseValueError("Provided keys in `y_pred` and `y_true` do not match any series ""seen during `fit`. Residuals cannot be updated.")ifself.out_sample_residuals_isNone:ifself.encodingisnotNone:self.out_sample_residuals_={level:Noneforlevelinseries_names_in_}self.out_sample_residuals_by_bin_={level:{}forlevelinseries_names_in_}else:self.out_sample_residuals_={'_unknown_level':None}self.out_sample_residuals_by_bin_={'_unknown_level':{}}forlevelinseries_to_update:residuals_level,residuals_by_bin_level=(self._binning_out_sample_residuals(level=level,y_true=y_true[level],y_pred=y_pred[level],append=append,random_state=random_state))self.out_sample_residuals_[level]=residuals_levelself.out_sample_residuals_by_bin_[level]=residuals_by_bin_levelifself.encodingisNoneor'_unknown_level'notinseries_to_update:ifself.encodingisNoneandlist(y_true.keys())!=['_unknown_level']:warnings.warn("As `encoding` is set to `None`, no distinction between levels ""is made. All residuals are stored in the '_unknown_level' key.",UnknownLevelWarning)# NOTE: when encoding is None, all levels are combined in '_unknown_level'.iflist(self.out_sample_residuals_.keys())!=['_unknown_level']:# To update completely _unknown_level laterself.out_sample_residuals_.pop('_unknown_level',None)residuals_all_levels=np.concatenate([valueforvalueinself.out_sample_residuals_.values()ifvalueisnotNone])rng=np.random.default_rng(seed=random_state)iflen(residuals_all_levels)>10_000:residuals_all_levels=rng.choice(a=residuals_all_levels,size=10_000,replace=False)all_bins_keys=set(bin_keyfordict_level_binsinself.out_sample_residuals_by_bin_.values()forbin_keyindict_level_bins.keys())residuals_by_bin_all_levels={bin_key:np.concatenate([dict_level_bins.get(bin_key,np.array([]))fordict_level_binsinself.out_sample_residuals_by_bin_.values()])forbin_keyinall_bins_keys}forkeyinresiduals_by_bin_all_levels.keys():iflen(residuals_by_bin_all_levels[key])>10_000:residuals_by_bin_all_levels[key]=rng.choice(a=residuals_by_bin_all_levels[key],size=10_000,replace=False)ifself.encodingisNone:self.out_sample_residuals_={'_unknown_level':residuals_all_levels}self.out_sample_residuals_by_bin_={'_unknown_level':residuals_by_bin_all_levels}else:self.out_sample_residuals_['_unknown_level']=residuals_all_levelsself.out_sample_residuals_by_bin_['_unknown_level']=residuals_by_bin_all_levels
Bin out sample residuals using the already fitted binner.
y_true and y_pred are expected to be in the original scale of the
time series. Residuals are calculated as y_true - y_pred, after
applying the necessary transformations and differentiations if the
forecaster includes them (self.transformer_series and self.differentiation).
If True, new residuals are added to the once already stored in the
attribute out_sample_residuals_. If after appending the new residuals,
the limit of 10_000 samples is exceeded, a random sample of 10_000 is
kept.
def_binning_out_sample_residuals(self,level:str,y_true:np.ndarray,y_pred:np.ndarray,append:bool=False,random_state:int=123)->tuple[np.ndarray,dict[int,np.ndarray]]:""" Bin out sample residuals using the already fitted binner. `y_true` and `y_pred` are expected to be in the original scale of the time series. Residuals are calculated as `y_true` - `y_pred`, after applying the necessary transformations and differentiations if the forecaster includes them (`self.transformer_series` and `self.differentiation`). Parameters ---------- level : str Name of the level y_true and y_pred belong to. y_true : numpy ndarray True values of the time series. y_pred : numpy ndarray Predicted values of the time series. append : bool, default False If `True`, new residuals are added to the once already stored in the attribute `out_sample_residuals_`. If after appending the new residuals, the limit of 10_000 samples is exceeded, a random sample of 10_000 is kept. random_state : int, default 123 Sets a seed to the random sampling for reproducible output. Returns ------- out_sample_residuals : numpy ndarray Array with the residual for `level`. out_sample_residuals_by_bin : dict Dictionary with the residuals binned by the fitted binner for `level`. """# NOTE: if the level is not known or encoding is None, then transformer,# differentiator and binner used are the ones of "_unknown_level"transformer=self.transformer_series_.get(level,self.transformer_series_['_unknown_level'])differentiator=copy(self.differentiator_.get(level,self.differentiator_["_unknown_level"]))ifdifferentiatorisnotNone:differentiator.set_params(window_size=None)ifisinstance(y_true,pd.Series):y_true=y_true.to_numpy()ifisinstance(y_pred,pd.Series):y_pred=y_pred.to_numpy()ifself.transformer_series:y_true=transform_numpy(array=y_true,transformer=transformer,fit=False,inverse_transform=False)y_pred=transform_numpy(array=y_pred,transformer=transformer,fit=False,inverse_transform=False)ifdifferentiatorisnotNone:y_true=differentiator.fit_transform(y_true)[differentiator.order:]y_pred=differentiator.fit_transform(y_pred)[differentiator.order:]data=pd.DataFrame({'prediction':y_pred,'residuals':y_true-y_pred}).dropna()y_pred=data['prediction'].to_numpy()residuals=data['residuals'].to_numpy()binner=self.binner.get(level,self.binner['_unknown_level'])data['bin']=binner.transform(y_pred).astype(int)residuals_by_bin=data.groupby('bin')['residuals'].apply(np.array).to_dict()out_sample_residuals=self.out_sample_residuals_.get(level,np.array([]))out_sample_residuals_by_bin=deepcopy(self.out_sample_residuals_by_bin_.get(level,{}))out_sample_residuals=(np.array([])ifout_sample_residualsisNoneelseout_sample_residuals)out_sample_residuals_by_bin=({}ifout_sample_residuals_by_binisNoneelseout_sample_residuals_by_bin)ifappend:out_sample_residuals=np.concatenate([out_sample_residuals,residuals])fork,vinresiduals_by_bin.items():ifkinout_sample_residuals_by_bin:out_sample_residuals_by_bin[k]=np.concatenate((out_sample_residuals_by_bin[k],v))else:out_sample_residuals_by_bin[k]=velse:out_sample_residuals=residualsout_sample_residuals_by_bin=residuals_by_binmax_samples=10_000//binner.n_bins_rng=np.random.default_rng(seed=random_state)fork,vinout_sample_residuals_by_bin.items():iflen(v)>max_samples:sample=rng.choice(a=v,size=max_samples,replace=False)out_sample_residuals_by_bin[k]=sampleforkinself.binner_intervals_.get(level,{}).keys():ifknotinout_sample_residuals_by_bin:out_sample_residuals_by_bin[k]=np.array([])empty_bins=[kfork,vinout_sample_residuals_by_bin.items()ifv.size==0]ifempty_bins:warnings.warn(f"The following bins of level '{level}' have no out of sample residuals: "f"{empty_bins}. No predicted values fall in the interval "f"{[self.binner_intervals_[level][bin]forbininempty_bins]}. "f"Empty bins will be filled with a random sample of residuals.",ResidualsUsageWarning)empty_bin_size=min(max_samples,len(out_sample_residuals))forkinempty_bins:out_sample_residuals_by_bin[k]=rng.choice(a=out_sample_residuals,size=empty_bin_size,replace=False)iflen(out_sample_residuals)>10_000:out_sample_residuals=rng.choice(a=out_sample_residuals,size=10_000,replace=False)returnout_sample_residuals,out_sample_residuals_by_bin
Return feature importances of the estimator stored in the
forecaster. Only valid when estimator stores internally the feature
importances in the attribute feature_importances_ or coef_.
defget_feature_importances(self,sort_importance:bool=True)->pd.DataFrame:""" Return feature importances of the estimator stored in the forecaster. Only valid when estimator stores internally the feature importances in the attribute `feature_importances_` or `coef_`. Parameters ---------- sort_importance: bool, default True If `True`, sorts the feature importances in descending order. Returns ------- feature_importances : pandas DataFrame Feature importances associated with each predictor. """ifnotself.is_fitted:raiseNotFittedError("This forecaster is not fitted yet. Call `fit` with appropriate ""arguments before using `get_feature_importances()`.")ifisinstance(self.estimator,Pipeline):estimator=self.estimator[-1]else:estimator=self.estimatorifhasattr(estimator,'feature_importances_'):feature_importances=estimator.feature_importances_elifhasattr(estimator,'coef_'):feature_importances=estimator.coef_else:warnings.warn(f"Impossible to access feature importances for estimator of type "f"{type(estimator)}. This method is only valid when the "f"estimator stores internally the feature importances in the "f"attribute `feature_importances_` or `coef_`.")feature_importances=Noneiffeature_importancesisnotNone:feature_importances=pd.DataFrame({'feature':self.X_train_features_names_out_,'importance':feature_importances})ifsort_importance:feature_importances=feature_importances.sort_values(by='importance',ascending=False)returnfeature_importances