Additional keyword arguments for the recurrent layers [1]_, [2]_, [3]_.
Can be a single dictionary for all layers or a list of dictionaries
specifying different parameters for each recurrent layer.
Additional keyword arguments for the dense layers [4]_. Can be a single
dictionary for all layers or a list of dictionaries specifying different
parameters for each dense layer.
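As a rough illustration of the "single dictionary or list of dictionaries" convention described above, the sketch below expands either form into one dictionary per layer. The helper name `expand_layer_kwargs` is hypothetical and not part of skforecast; how the library handles this internally may differ.

```python
from __future__ import annotations

from typing import Any


def expand_layer_kwargs(
    kwargs: dict[str, Any] | list[dict[str, Any]] | None,
    n_layers: int,
) -> list[dict[str, Any]]:
    """Return one kwargs dict per layer (hypothetical helper, not skforecast API)."""
    if kwargs is None:
        return [{} for _ in range(n_layers)]
    if isinstance(kwargs, dict):
        # A single dict is shared by every layer; copy it so layers stay independent.
        return [dict(kwargs) for _ in range(n_layers)]
    if len(kwargs) != n_layers:
        raise ValueError(f"Expected {n_layers} dictionaries, got {len(kwargs)}.")
    return [dict(layer_kwargs) for layer_kwargs in kwargs]


shared = expand_layer_kwargs({"activation": "tanh"}, n_layers=2)
per_layer = expand_layer_kwargs(
    [{"activation": "tanh"}, {"activation": "relu", "dropout": 0.1}],
    n_layers=2,
)
print(shared)
print(per_layer)
```

Raising on a length mismatch is a design choice of this sketch: it surfaces a misconfigured list early instead of silently reusing or dropping entries.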
def create_and_compile_model(
    series: pd.DataFrame,
    lags: int | list[int] | np.ndarray[int] | range[int],
    steps: int,
    levels: str | list[str] | tuple[str] | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    recurrent_layer: str = "LSTM",
    recurrent_units: int | list[int] | tuple[int] = 100,
    recurrent_layers_kwargs: dict[str, Any] | list[dict[str, Any]] | None = {"activation": "tanh"},
    dense_units: int | list[int] | tuple[int] | None = 64,
    dense_layers_kwargs: dict[str, Any] | list[dict[str, Any]] | None = {"activation": "relu"},
    output_dense_layer_kwargs: dict[str, Any] | None = {"activation": "linear"},
    compile_kwargs: dict[str, Any] = {"optimizer": Adam(), "loss": MeanSquaredError()},
    model_name: str | None = None
) -> keras.models.Model:
    """
    Build and compile a RNN-based Keras model for time series prediction,
    supporting exogenous variables.

    Parameters
    ----------
    series : pandas DataFrame
        Input time series with shape (n_obs, n_series). Each column is a time series.
    lags : int, list, numpy ndarray, range
        Number of lagged time steps to consider in the input, index starts at 1,
        so lag 1 is equal to t-1.

        - `int`: include lags from 1 to `lags` (included).
        - `list`, `1d numpy ndarray` or `range`: include only lags present in
        `lags`, all elements must be int.
    steps : int
        Number of steps to predict.
    levels : str, list, default None
        Output level(s) (features) to predict. If None, defaults to the names
        of input series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variables to be included as input, should have the same number
        of rows as `series`.
    recurrent_layer : str, default 'LSTM'
        Type of recurrent layer to be used, 'LSTM' [1]_, 'GRU' [2]_, or 'RNN' [3]_.
    recurrent_units : int, list, default 100
        Number of units in the recurrent layer(s). Can be an integer for single
        recurrent layer, or a list of integers for multiple recurrent layers.
    recurrent_layers_kwargs : dict, list, default {'activation': 'tanh'}
        Additional keyword arguments for the recurrent layers [1]_, [2]_, [3]_.
        Can be a single dictionary for all layers or a list of dictionaries
        specifying different parameters for each recurrent layer.
    dense_units : int, list, tuple, None, default 64
        Number of units in the dense layer(s) [4]_. Can be an integer for single
        dense layer, or a list of integers for multiple dense layers.
    dense_layers_kwargs : dict, list, default {'activation': 'relu'}
        Additional keyword arguments for the dense layers [4]_. Can be a single
        dictionary for all layers or a list of dictionaries specifying different
        parameters for each dense layer.
    output_dense_layer_kwargs : dict, default {'activation': 'linear'}
        Additional keyword arguments for the output dense layer.
    compile_kwargs : dict, default {'optimizer': Adam(), 'loss': MeanSquaredError()}
        Additional keyword arguments for the model compilation, such as optimizer
        and loss function. [5]_
    model_name : str, default None
        Name of the model.

    Returns
    -------
    model : keras.models.Model
        Compiled Keras model ready for training.

    References
    ----------
    .. [1] LSTM layer Keras documentation.
           https://keras.io/api/layers/recurrent_layers/lstm/
    .. [2] GRU layer Keras documentation.
           https://keras.io/api/layers/recurrent_layers/gru/
    .. [3] SimpleRNN layer Keras documentation.
           https://keras.io/api/layers/recurrent_layers/simple_rnn/
    .. [4] Dense layer Keras documentation.
           https://keras.io/api/layers/core_layers/dense/
    .. [5] Model training APIs: compile method.
           https://keras.io/api/models/model_training_apis/

    """

    keras_backend = keras.backend.backend()

    print(f"keras version: {keras.__version__}")
    print(f"Using backend: {keras_backend}")
    if keras_backend == "tensorflow":
        import tensorflow
        print(f"tensorflow version: {tensorflow.__version__}")
    elif keras_backend == "torch":
        import torch
        print(f"torch version: {torch.__version__}")
    elif keras_backend == "jax":
        import jax
        print(f"jax version: {jax.__version__}")
    else:
        print("Backend not recognized")
    print("")

    if exog is None:
        model = _create_and_compile_model_no_exog(
            series=series,
            lags=lags,
            steps=steps,
            levels=levels,
            recurrent_layer=recurrent_layer,
            recurrent_units=recurrent_units,
            recurrent_layers_kwargs=recurrent_layers_kwargs,
            dense_units=dense_units,
            dense_layers_kwargs=dense_layers_kwargs,
            output_dense_layer_kwargs=output_dense_layer_kwargs,
            compile_kwargs=compile_kwargs,
            model_name=model_name
        )
    else:
        model = _create_and_compile_model_exog(
            series=series,
            lags=lags,
            steps=steps,
            levels=levels,
            exog=exog,
            recurrent_layer=recurrent_layer,
            recurrent_units=recurrent_units,
            recurrent_layers_kwargs=recurrent_layers_kwargs,
            dense_units=dense_units,
            dense_layers_kwargs=dense_layers_kwargs,
            output_dense_layer_kwargs=output_dense_layer_kwargs,
            compile_kwargs=compile_kwargs,
            model_name=model_name
        )

    return model
This class turns any estimator compatible with the Keras API into a
Keras RNN multi-series multi-step forecaster. A unique model is created
to forecast all time steps and series. Keras enables workflows on top of
either JAX, TensorFlow, or PyTorch. See documentation for more details.
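Keras reads its backend from the `KERAS_BACKEND` environment variable, and the choice has to be made before `keras` is imported for the first time. A minimal sketch of selecting a backend follows; the value `"torch"` is only an example, and the Keras import is left commented out so the snippet runs even where Keras is not installed.

```python
import os

# Must be set before the first `import keras`; accepted values include
# "tensorflow", "jax" and "torch".
os.environ["KERAS_BACKEND"] = "torch"

# import keras
# print(keras.backend.backend())  # reports the backend Keras actually loaded

selected_backend = os.environ["KERAS_BACKEND"]
print(f"Requested Keras backend: {selected_backend}")
```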
Parameters:
Name
Type
Description
Default
estimator
estimator or pipeline compatible with the Keras API
An instance of an estimator or pipeline compatible with the Keras API.
Name of one or more time series to be predicted. This determines the series
the forecaster will be handling. If None, all series used during training
will be available for prediction.
required
lags
int, list, numpy ndarray, range
Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
int: include lags from 1 to lags (included).
list, 1d numpy ndarray or range: include only lags present in
lags, all elements must be int.
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. Transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
an inverse_transform method.
If single transformer: it is cloned and applied to all series.
If dict of transformers: a different transformer can be used for each series.
None
transformer_exog
transformer
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
estimator or pipeline compatible with the Keras API
An instance of an estimator or pipeline compatible with the Keras API.
An instance of this estimator is trained for each step. All of them
are stored in self.estimators_.
Name of one or more time series to be predicted. This determines the series
the forecaster will be handling. If None, all series used during training
will be available for prediction.
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API with methods: fit, transform, fit_transform and
inverse_transform. Transformation is applied to each series before training
the forecaster. ColumnTransformers are not allowed since they do not have
an inverse_transform method.
An instance of a transformer (preprocessor) compatible with the scikit-learn
preprocessing API. The transformation is applied to exog before training the
forecaster. inverse_transform is not available when using ColumnTransformers.
This window represents the most recent data observed by the predictor
during its training phase. It contains the values needed to predict the
next step immediately after the training data. These values are stored
in the original scale of the time series before undergoing any transformations
or differentiation.
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.
Residuals of the model when predicting training data. Only stored up
to 10_000 values per level in the form {level: residuals}. If
transformer_series is not None, residuals are stored in the
transformed scale.
Residuals of the model when predicting non-training data. Only stored up
to 10_000 values per step in the form {step: residuals}. Use
set_out_sample_residuals() method to set values. If transformer_series
is not None, residuals are stored in the transformed scale.
def__init__(self,levels:str|list[str],lags:int|list[int]|np.ndarray[int]|range[int],estimator:object=None,transformer_series:object|dict[str,object]|None=MinMaxScaler(feature_range=(0,1)),transformer_exog:object|None=MinMaxScaler(feature_range=(0,1)),fit_kwargs:dict[str,object]|None={},forecaster_id:str|int|None=None,regressor:object=None)->None:self.estimator=deepcopy(initialize_estimator(estimator,regressor))self.levels=Noneself.transformer_series=transformer_seriesself.transformer_series_=Noneself.transformer_exog=transformer_exogself.max_lag=Noneself.window_size=Noneself.last_window_=Noneself.index_type_=Noneself.index_freq_=Noneself.training_range_=Noneself.series_names_in_=Noneself.exog_names_in_=Noneself.exog_type_in_=Noneself.exog_dtypes_in_=Noneself.X_train_dim_names_=Noneself.y_train_dim_names_=Noneself.history_=Noneself.is_fitted=Falseself.creation_date=pd.Timestamp.today().strftime("%Y-%m-%d %H:%M:%S")self.fit_date=Noneself.keras_backend_=Noneself.skforecast_version=__version__self.python_version=sys.version.split(" ")[0]self.forecaster_id=forecaster_idself._probabilistic_mode="no_binned"self.weight_func=None# Ignored in this forecasterself.source_code_weight_func=None# Ignored in this forecasterself.dropna_from_series=False# Ignored in this forecasterself.encoding=None# Ignored in this forecasterself.in_sample_residuals_=None# Ignored in this forecasterself.in_sample_residuals_by_bin_=None# Ignored in this forecasterself.out_sample_residuals_=None# Ignored in this forecasterself.out_sample_residuals_by_bin_=None# Ignored in this forecasterself.differentiation=None# Ignored in this forecasterself.differentiation_max=None# Ignored in this forecasterself.differentiator=None# Ignored in this forecasterself.differentiator_=None# Ignored in this 
forecasterlayer_init=self.estimator.layers[0]layer_end=self.estimator.layers[-1]self.layers_names=[layer.nameforlayerinself.estimator.layers]self.lags,self.lags_names,self.max_lag=initialize_lags(type(self).__name__,lags)n_lags_estimator=layer_init.output.shape[1]iflen(self.lags)!=n_lags_estimator:raiseValueError(f"Number of lags ({len(self.lags)}) does not match the number of "f"lags expected by the estimator architecture ({n_lags_estimator}).")self.window_size=self.max_lagself.steps=np.arange(layer_end.output.shape[1])+1self.max_step=np.max(self.steps)ifisinstance(levels,str):self.levels=[levels]elifisinstance(levels,list):self.levels=levelselse:raiseTypeError(f"`levels` argument must be a string or list. Got {type(levels)}.")self.n_series_in=self.estimator.get_layer('series_input').output.shape[-1]self.n_levels_out=self.estimator.get_layer('output_dense_td_layer').output.shape[-1]self.exog_in_=Trueif"exog_input"inself.layers_nameselseFalseifself.exog_in_:self.n_exog_in=self.estimator.get_layer('exog_input').output.shape[-1]else:self.n_exog_in=None# NOTE: This is needed because the Reshape layer changes the output # shape in _create_and_compile_model_no_exogself.n_levels_out=int(self.n_levels_out/self.max_step)ifnotlen(self.levels)==self.n_levels_out:raiseValueError(f"Number of levels ({len(self.levels)}) does not match the number of "f"levels expected by the estimator architecture ({self.n_levels_out}).")self.series_val=Noneself.exog_val=Noneif"series_val"infit_kwargs:ifnotisinstance(fit_kwargs["series_val"],pd.DataFrame):raiseTypeError(f"`series_val` must be a pandas DataFrame. 
"f"Got {type(fit_kwargs['series_val'])}.")self.series_val=fit_kwargs.pop("series_val")ifself.exog_in_:if"exog_val"notinfit_kwargs.keys():raiseValueError("If `series_val` is provided, `exog_val` must also be ""provided using the `fit_kwargs` argument when the ""estimator has exogenous variables.")else:ifnotisinstance(fit_kwargs["exog_val"],(pd.Series,pd.DataFrame)):raiseTypeError(f"`exog_val` must be a pandas Series or DataFrame. "f"Got {type(fit_kwargs['exog_val'])}.")self.exog_val=input_to_frame(data=fit_kwargs.pop("exog_val"),input_name='exog_val')self.fit_kwargs=check_select_fit_kwargs(estimator=self.estimator,fit_kwargs=fit_kwargs)self.__skforecast_tags__={"library":"skforecast","forecaster_name":"ForecasterRNN","forecaster_task":"regression","forecasting_scope":"global",# single-series | global"forecasting_strategy":"deep_learning",# recursive | direct | deep_learning"index_types_supported":["pandas.RangeIndex","pandas.DatetimeIndex"],"requires_index_frequency":True,"allowed_input_types_series":["pandas.DataFrame"],"supports_exog":True,"allowed_input_types_exog":["pandas.Series","pandas.DataFrame"],"handles_missing_values_series":False,"handles_missing_values_exog":False,"supports_lags":True,"supports_window_features":False,"supports_transformer_series":True,"supports_transformer_exog":True,"supports_weight_func":False,"supports_series_weights":False,"supports_differentiation":False,"prediction_types":["point","interval"],"supports_probabilistic":True,"probabilistic_methods":["conformal"],"handles_binned_residuals":False}
Transforms a 1d array into a 2d array (X) and a 2d array (y). Each row
in X is associated with a row of y and it represents the lags that
precede it.
Notice that, when lags are contiguous (1 to max_lag), the columns of the
returned matrix X_data are ordered from the oldest lag to the most recent
one, so lag 1 is in the last column.
Parameters:
Name
Type
Description
Default
y
numpy ndarray
Training time series (1d numpy ndarray).
required
Returns:
Name
Type
Description
X_data
numpy ndarray
2d numpy ndarray with the lagged values (predictors).
Shape: (samples - max(lags) - max_step + 1, len(lags))
y_data
numpy ndarray
2d numpy ndarray with the values of the time series related to each
row of X_data for each step.
Shape: (samples - max(lags) - max_step + 1, len(steps))
Source code in skforecast/deep_learning/_forecaster_rnn.py
def _create_lags(
    self,
    y: np.ndarray
) -> tuple[np.ndarray, np.ndarray]:
    """
    Transforms a 1d array into a 2d array (X) and a 2d array (y). Each row
    in X is associated with a row of y and it represents the lags that
    precede it.

    Notice that, when lags are contiguous (1 to max_lag), the columns of the
    returned matrix X_data are ordered from the oldest lag to the most recent
    one, so lag 1 is in the last column.

    Parameters
    ----------
    y : numpy ndarray
        1d numpy ndarray Training time series.

    Returns
    -------
    X_data : numpy ndarray
        2d numpy ndarray with the lagged values (predictors).
        Shape: (samples - max(lags) - max_step + 1, len(lags))
    y_data : numpy ndarray
        2d numpy ndarray with the values of the time series related to each
        row of `X_data` for each step.
        Shape: (samples - max(lags) - max_step + 1, len(steps))

    """

    n_rows = len(y) - self.window_size - self.max_step + 1  # rows of y_data

    X_data = np.full(
        shape=(n_rows, (self.window_size)), fill_value=np.nan, order="F", dtype=float
    )
    for i, lag in enumerate(range(self.window_size - 1, -1, -1)):
        X_data[:, i] = y[self.window_size - lag - 1 : -(lag + self.max_step)]

    # Get lags index
    X_data = X_data[:, self.lags - 1]

    y_data = np.full(
        shape=(n_rows, self.max_step), fill_value=np.nan, order="F", dtype=float
    )
    for step in range(self.max_step):
        y_data[:, step] = y[self.window_size + step : self.window_size + step + n_rows]

    # Get steps index
    y_data = y_data[:, self.steps - 1]

    return X_data, y_data
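To make the windowing concrete, here is a small NumPy sketch that reproduces the slicing logic on a toy series. It is an illustrative re-implementation for contiguous lags only, not the library method; the function name `create_lags_sketch` and the toy values are assumptions.

```python
import numpy as np


def create_lags_sketch(
    y: np.ndarray, window_size: int, max_step: int
) -> tuple[np.ndarray, np.ndarray]:
    """Toy version of the windowing logic (contiguous lags, all steps)."""
    n_rows = len(y) - window_size - max_step + 1
    # Each row of X holds `window_size` consecutive values, oldest first.
    X = np.stack([y[i : i + window_size] for i in range(n_rows)])
    # Each row of Y holds the `max_step` values that follow that window.
    Y = np.stack(
        [y[i + window_size : i + window_size + max_step] for i in range(n_rows)]
    )
    return X, Y


y = np.arange(10, dtype=float)
X, Y = create_lags_sketch(y, window_size=3, max_step=2)
print(X.shape, Y.shape)  # (6, 3) (6, 2)
print(X[0], Y[0])        # [0. 1. 2.] [3. 4.]
```

With 10 observations, a window of 3 and 2 steps, only 10 - 3 - 2 + 1 = 6 complete (window, target) pairs exist, which is why both arrays have 6 rows.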
Type of each exogenous variable/s used in training before the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_out_.
Type of each exogenous variable/s used in training after the transformation
applied by transformer_exog. If transformer_exog is not used, it
is equal to exog_dtypes_in_.
Source code in skforecast/deep_learning/_forecaster_rnn.py
def_create_train_X_y(self,series:pd.DataFrame,exog:pd.Series|pd.DataFrame|None=None)->tuple[np.ndarray,np.ndarray,np.ndarray,dict[int,list],list[str],dict[str,type],dict[str,type]]:""" Create training matrices. The resulting multi-dimensional matrices contain the target variable and predictors needed to train the model. Parameters ---------- series : pandas DataFrame Training time series. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned. Returns ------- X_train : numpy ndarray Training values (predictors) for each step. The resulting array has 3 dimensions: (n_observations, n_lags, n_series) exog_train: numpy ndarray Value of exogenous variables aligned with X_train. (n_observations, n_exog) y_train : numpy ndarray Values (target) of the time series related to each row of `X_train`. The resulting array has 3 dimensions: (n_observations, n_steps, n_levels) dimension_names : dict Labels for the multi-dimensional arrays created internally for training. exog_names_in_ : list Names of the exogenous variables included in the training matrices. exog_dtypes_in_ : dict Type of each exogenous variable/s used in training before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`. exog_dtypes_out_ : dict Type of each exogenous variable/s used in training after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`. """ifnotisinstance(series,pd.DataFrame):raiseTypeError(f"`series` must be a pandas DataFrame. 
Got {type(series)}.")_,series_index=check_extract_values_and_index(data=series,data_label="`series`",return_values=False)series_names_in_=list(series.columns)ifnotlen(series_names_in_)==self.n_series_in:raiseValueError(f"Number of series in `series` ({len(series_names_in_)}) "f"does not match the number of series expected by the model "f"architecture ({self.n_series_in}).")ifnotset(self.levels).issubset(set(series_names_in_)):raiseValueError(f"`levels` defined when initializing the forecaster must be "f"included in `series` used for training. "f"{set(self.levels)-set(series_names_in_)} not found.")iflen(series)<self.window_size+self.max_step:raiseValueError(f"Minimum length of `series` for training this forecaster is "f"{self.window_size+self.max_step}. Reduce the number of "f"predicted steps, {self.max_step}, or the maximum "f"lag, {self.max_lag}, if no more data is available.\n"f" Length `series`: {len(series)}.\n"f" Max step : {self.max_step}.\n"f" Lags window size: {self.max_lag}.")ifexogisNoneandself.exog_in_:raiseValueError("The estimator architecture expects exogenous variables during ""training. Please provide the `exog` argument. If this is ""unexpected, check your estimator architecture or the ""initialization parameters of the forecaster.")ifexogisnotNoneandnotself.exog_in_:raiseValueError("Exogenous variables (`exog`) were provided, but the model ""architecture was not built to expect exogenous variables. 
Please ""remove the `exog` argument or rebuild the model to include ""exogenous inputs.")fit_transformer=Falseifnotself.is_fitted:fit_transformer=Trueself.transformer_series_=initialize_transformer_series(forecaster_name=type(self).__name__,series_names_in_=series_names_in_,transformer_series=self.transformer_series)# Step 1: Create lags for all columnsX_train=[]y_train=[]# TODO: Add method argument to calculate lags and/or stepsforserieinseries_names_in_:x=series[serie]check_y(y=x)x=transform_series(series=x,transformer=self.transformer_series_[serie],fit=fit_transformer,inverse_transform=False,)X,_=self._create_lags(x)X_train.append(X)forlevelinself.levels:y=series[level]check_y(y=y)y=transform_series(series=y,transformer=self.transformer_series_[level],fit=fit_transformer,inverse_transform=False,)_,y=self._create_lags(y)y_train.append(y)X_train=np.stack(X_train,axis=2)y_train=np.stack(y_train,axis=2)train_index=series_index[self.max_lag:(len(series_index)-self.max_step+1)]dimension_names={"X_train":{0:train_index,1:self.lags_names[::-1],2:series_names_in_,},"y_train":{0:train_index,1:[f"step_{step}"forstepinself.steps],2:self.levels,},}ifexogisnotNone:check_exog(exog=exog,allow_nan=False)exog=input_to_frame(data=exog,input_name='exog')_,exog_index=check_extract_values_and_index(data=exog,data_label='`exog`',ignore_freq=True,return_values=False)iflen(exog.columns)!=self.n_exog_in:raiseValueError(f"Number of columns in `exog` ({len(exog.columns)}) "f"does not match the number of exogenous variables expected "f"by the model architecture ({self.n_exog_in}).")series_index_no_ws=series_index[self.window_size:]len_series=len(series)len_series_no_ws=len_series-self.window_sizelen_exog=len(exog)ifnotlen_exog==len_seriesandnotlen_exog==len_series_no_ws:raiseValueError(f"Length of `exog` must be equal to the length of `series` (if "f"index is fully aligned) or length of `series` - `window_size` "f"(if `exog` starts after the first `window_size` values).\n"f" `exog` : 
({exog_index[0]} -- {exog_index[-1]}) (n={len_exog})\n"f" `series` : ({series.index[0]} -- {series.index[-1]}) (n={len_series})\n"f" `series` - `window_size` : ({series_index_no_ws[0]} -- {series_index_no_ws[-1]}) (n={len_series_no_ws})")exog_names_in_=exog.columns.to_list()iflen(set(exog_names_in_)-set(series_names_in_))!=len(exog_names_in_):raiseValueError(f"`exog` cannot contain a column named the same as one of "f"the series (column names of series).\n"f" `series` columns : {series_names_in_}.\n"f" `exog` columns : {exog_names_in_}.")exog_n_dim_in=len(exog_names_in_)exog_dtypes_in_=get_exog_dtypes(exog=exog)exog=transform_dataframe(df=exog,transformer=self.transformer_exog,fit=fit_transformer,inverse_transform=False,)exog_n_dim_out=len(exog.columns)exog_dtypes_out_=get_exog_dtypes(exog=exog)ifexog_n_dim_in!=exog_n_dim_out:raiseValueError(f"Number of columns in `exog` after transformation ({exog_n_dim_out}) "f"does not match the number of columns before transformation ({exog_n_dim_in}). "f"The ForecasterRnn does not support transformations that "f"change the number of columns in `exog`. 
Preprocess `exog` "f"before passing it to the `create_and_compile_model` function.")iflen_exog==len_series:ifnot(exog_index==series_index).all():raiseValueError("When `exog` has the same length as `series`, the index ""of `exog` must be aligned with the index of `series` ""to ensure the correct alignment of values.")else:ifnot(exog_index==series_index_no_ws).all():raiseValueError("When `exog` doesn't contain the first `window_size` ""observations, the index of `exog` must be aligned with ""the index of `series` minus the first `window_size` ""observations to ensure the correct alignment of values.")exog_train=[]for_,exog_nameinenumerate(exog.columns):_,exog_step=self._create_lags(exog[exog_name])exog_train.append(exog_step)exog_train=np.stack(exog_train,axis=2)dimension_names["exog_train"]={0:train_index,1:[f"step_{step}"forstepinself.steps],2:exog.columns.to_list(),}else:exog_train=Noneexog_names_in_=Noneexog_dtypes_in_=Noneexog_dtypes_out_=Nonedimension_names["exog_train"]={0:None,1:None,2:None}return(X_train,exog_train,y_train,dimension_names,exog_names_in_,exog_dtypes_in_,exog_dtypes_out_)
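The 3D training arrays are obtained by stacking one 2D matrix per series (or per level) along a new last axis. The sketch below shows only that stacking step with random placeholder data; the series names, sizes and values are assumptions chosen for illustration.

```python
import numpy as np

n_observations, n_lags, n_steps = 6, 3, 2
series_names = ["series_1", "series_2", "series_3"]  # hypothetical input series
levels = ["series_1", "series_2"]                    # hypothetical levels to predict

# One 2D lag matrix per input series and one 2D target matrix per level.
rng = np.random.default_rng(0)
X_per_series = [rng.random((n_observations, n_lags)) for _ in series_names]
y_per_level = [rng.random((n_observations, n_steps)) for _ in levels]

# Stacking along a new last axis yields the 3D arrays fed to the network.
X_train = np.stack(X_per_series, axis=2)
y_train = np.stack(y_per_level, axis=2)

print(X_train.shape)  # (6, 3, 3) -> (n_observations, n_lags, n_series)
print(y_train.shape)  # (6, 2, 2) -> (n_observations, n_steps, n_levels)
```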
If True, skforecast warnings will be suppressed during the creation
of the training matrices. See skforecast.exceptions.warn_skforecast_categories
for more information.
False
Returns:
Name
Type
Description
X_train
numpy ndarray
Training values (predictors) for each step. The resulting array has
3 dimensions: (n_observations, n_lags, n_series)
exog_train
numpy ndarray
Value of exogenous variables aligned with X_train. (n_observations, n_steps, n_exog)
y_train
numpy ndarray
Values (target) of the time series related to each row of X_train.
The resulting array has 3 dimensions: (n_observations, n_steps, n_levels)
def create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> tuple[np.ndarray, np.ndarray, np.ndarray, dict[int, list]]:
    """
    Create training matrices. The resulting multi-dimensional matrices contain
    the target variable and predictors needed to train the model.

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the creation
        of the training matrices. See skforecast.exceptions.warn_skforecast_categories
        for more information.

    Returns
    -------
    X_train : numpy ndarray
        Training values (predictors) for each step. The resulting array has
        3 dimensions: (n_observations, n_lags, n_series)
    exog_train: numpy ndarray
        Value of exogenous variables aligned with X_train.
        (n_observations, n_steps, n_exog)
    y_train : numpy ndarray
        Values (target) of the time series related to each row of `X_train`.
        The resulting array has 3 dimensions: (n_observations, n_steps, n_levels)
    dimension_names : dict
        Labels for the multi-dimensional arrays created internally for training.

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    output = self._create_train_X_y(series=series, exog=exog)

    X_train = output[0]
    exog_train = output[1]
    y_train = output[2]
    dimension_names = output[3]

    set_skforecast_warnings(suppress_warnings, action='default')

    return X_train, exog_train, y_train, dimension_names
Additional arguments to be passed to the fit method of the estimator
can be added with the fit_kwargs argument when initializing the forecaster.
Parameters:
Name
Type
Description
Default
series
pandas DataFrame
Training time series.
required
exog
pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as series and their indexes must be aligned so
that series[i] is regressed on exog[i].
If True, skforecast warnings will be suppressed during the training
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Type
Description
None
Source code in skforecast/deep_learning/_forecaster_rnn.py
deffit(self,series:pd.DataFrame,exog:pd.Series|pd.DataFrame=None,store_last_window:bool=True,store_in_sample_residuals:bool=False,random_state:int=123,suppress_warnings:bool=False)->None:""" Training Forecaster. Additional arguments to be passed to the `fit` method of the estimator can be added with the `fit_kwargs` argument when initializing the forecaster. Parameters ---------- series : pandas DataFrame Training time series. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned so that series[i] is regressed on exog[i]. store_last_window : bool, default True Whether or not to store the last window (`last_window_`) of training data. store_in_sample_residuals : bool, default False If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` attribute). random_state : int, default 123 Set a seed for the random generator so that the stored sample residuals are always deterministic. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. 
Returns ------- None """set_skforecast_warnings(suppress_warnings,action="ignore")# Reset values in case the forecaster has already been fitted.self.last_window_=Noneself.index_type_=Noneself.index_freq_=Noneself.training_range_=Noneself.series_names_in_=Noneself.exog_names_in_=Noneself.exog_type_in_=Noneself.exog_dtypes_in_=Noneself.exog_dtypes_out_=Noneself.X_train_dim_names_=Noneself.y_train_dim_names_=Noneself.exog_train_dim_names_=Noneself.in_sample_residuals_=Noneself.is_fitted=Falseself.fit_date=Noneself.keras_backend_=keras.backend.backend()(X_train,exog_train,y_train,dimension_names,exog_names_in_,exog_dtypes_in_,exog_dtypes_out_,)=self._create_train_X_y(series=series,exog=exog)# NOTE: Need here to avoid refitting the transformer_series_ with the # validation data.self.is_fitted=Trueseries_names_in_=dimension_names["X_train"][2]ifself.keras_backend_=="torch":importtorchdevice=torch.device("cuda"iftorch.cuda.is_available()else"cpu")print(f"Using '{self.keras_backend_}' backend with device: {device}")torch_device=torch.device(device)X_train=torch.tensor(X_train).to(torch_device)y_train=torch.tensor(y_train).to(torch_device)ifexog_trainisnotNone:exog_train=torch.tensor(exog_train).to(torch_device)ifself.series_valisnotNone:series_val=self.series_val[series_names_in_]ifexogisnotNone:exog_val=self.exog_val[exog_names_in_]else:exog_val=NoneX_val,exog_val,y_val,*_=self._create_train_X_y(series=series_val,exog=exog_val)ifself.keras_backend_=="torch":X_val=torch.tensor(X_val).to(torch_device)y_val=torch.tensor(y_val).to(torch_device)ifexog_valisnotNone:exog_val=torch.tensor(exog_val).to(torch_device)ifself.exog_valisnotNone:history=self.estimator.fit(x=[X_train,exog_train],y=y_train,validation_data=([X_val,exog_val],y_val),**self.fit_kwargs,)else:history=self.estimator.fit(x=X_train,y=y_train,validation_data=(X_val,y_val),**self.fit_kwargs,)else:history=self.estimator.fit(x=X_trainifexog_trainisNoneelse[X_train,exog_train],y=y_train,**self.fit_kwargs,)# TODO: 
Include binning in the forecasterself.in_sample_residuals_={}ifstore_in_sample_residuals:# NOTE: Convert to numpy array if using torch backendifself.keras_backend_=="torch":y_train=y_train.detach().cpu().numpy()residuals=y_train-self.estimator.predict(x=X_trainifexog_trainisNoneelse[X_train,exog_train],verbose=0)residuals=np.concatenate([residuals[:,i,:]fori,stepinenumerate(self.steps)])rng=np.random.default_rng(seed=random_state)fori,levelinenumerate(self.levels):residuals_level=residuals[:,i]iflen(residuals_level)>10_000:residuals_level=residuals_level[rng.integers(low=0,high=len(residuals_level),size=10_000)]self.in_sample_residuals_[level]=residuals_levelelse:forlevelinself.levels:self.in_sample_residuals_[level]=Noneself.series_names_in_=series_names_in_self.X_train_series_names_in_=series_names_in_self.X_train_dim_names_=dimension_names["X_train"]self.y_train_dim_names_=dimension_names["y_train"]self.history_=history.historyself.fit_date=pd.Timestamp.today().strftime("%Y-%m-%d %H:%M:%S")self.training_range_=series.index[[0,-1]]self.index_type_=type(series.index)ifisinstance(series.index,pd.DatetimeIndex):self.index_freq_=series.index.freqelse:self.index_freq_=series.index.stepifexogisnotNone:# NOTE: self.exog_in_ is determined by the estimator architecture and# set during initialization.self.exog_names_in_=exog_names_in_self.exog_type_in_=type(exog)self.exog_dtypes_in_=exog_dtypes_in_self.exog_dtypes_out_=exog_dtypes_out_self.exog_train_dim_names_=dimension_names["exog_train"]self.X_train_exog_names_out_=dimension_names["exog_train"][2]self.X_train_features_names_out_=dimension_names["X_train"][1]+dimension_names["exog_train"][2]else:self.X_train_features_names_out_=dimension_names["X_train"][1]ifstore_last_window:self.last_window_=series.iloc[-self.max_lag:,:].copy()set_skforecast_warnings(suppress_warnings,action="default")
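The residual handling at the end of `fit` can be sketched in isolation: residuals of shape (n_observations, n_steps, n_levels) are concatenated across steps, and each level keeps at most 10_000 values drawn with a seeded generator. The data, level names and sizes below are made up for illustration, and the snippet is a simplified sketch rather than the library code.

```python
import numpy as np

max_stored = 10_000
rng = np.random.default_rng(seed=123)

# Hypothetical residuals with shape (n_observations, n_steps, n_levels).
residuals = rng.normal(size=(8_000, 3, 2))
levels = ["series_1", "series_2"]  # hypothetical level names
n_steps = residuals.shape[1]

# Concatenate the per-step slices so each level becomes one long column.
flat = np.concatenate([residuals[:, i, :] for i in range(n_steps)])

stored = {}
for i, level in enumerate(levels):
    residuals_level = flat[:, i]
    if len(residuals_level) > max_stored:
        # rng.integers draws indices with replacement, so values may repeat.
        idx = rng.integers(low=0, high=len(residuals_level), size=max_stored)
        residuals_level = residuals_level[idx]
    stored[level] = residuals_level

print({level: values.shape for level, values in stored.items()})
```

Because the indices come from `rng.integers`, the subsample is drawn with replacement; fixing the seed keeps the stored residuals reproducible across fits.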
Name(s) of the time series to be predicted. It must be included
in levels, defined when initializing the forecaster. If None,
all series used during training will be available for prediction.
None
last_window
pandas Series, pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
If True, residuals from the training data are used as proxy of
prediction error to create predictions.
If False, out of sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using Forecaster's
set_out_sample_residuals() method.
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
def_create_predict_inputs(self,steps:int|list[int]|None=None,levels:str|list[str]|None=None,last_window:pd.Series|pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,predict_probabilistic:bool=False,use_in_sample_residuals:bool=True,check_inputs:bool=True)->tuple[list[np.ndarray],dict[str,dict],list[int],list[str],pd.Index]:""" Create the inputs needed for the prediction process. Parameters ---------- steps : int, list, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined in the estimator architecture. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as defined in the estimator architecture. levels : str, list, default None Name(s) of the time series to be predicted. It must be included in `levels`, defined when initializing the forecaster. If `None`, all all series used during training will be available for prediction. last_window : pandas Series, pandas DataFrame, default None Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. predict_probabilistic : bool, default False If `True`, the necessary checks for probabilistic predictions will be performed. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. 
check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. Returns ------- X : list List of numpy arrays needed for prediction. The first element is the matrix of lags and the second element is the matrix of exogenous variables. X_predict_dimension_names : dict Labels for the multi-dimensional arrays created internally for prediction. steps : list Steps to predict. levels : list Levels (series) to predict. prediction_index : pandas Index Index of the predictions. """levels,_=prepare_levels_multiseries(X_train_series_names_in_=self.levels,levels=levels)steps=prepare_steps_direct(max_step=self.steps,steps=steps)iflast_windowisNone:last_window=self.last_window_ifcheck_inputs:check_predict_input(forecaster_name=type(self).__name__,steps=steps,is_fitted=self.is_fitted,exog_in_=self.exog_in_,index_type_=self.index_type_,index_freq_=self.index_freq_,window_size=self.window_size,last_window=last_window,exog=exog,exog_names_in_=self.exog_names_in_,interval=None,max_step=self.max_step,levels=levels,levels_forecaster=self.levels,series_names_in_=self.series_names_in_,)ifpredict_probabilistic:check_residuals_input(forecaster_name=type(self).__name__,use_in_sample_residuals=use_in_sample_residuals,in_sample_residuals_=self.in_sample_residuals_,out_sample_residuals_=self.out_sample_residuals_,use_binned_residuals=False,in_sample_residuals_by_bin_=None,out_sample_residuals_by_bin_=None,levels=self.levels)last_window=last_window.iloc[-self.window_size:,last_window.columns.get_indexer(self.series_names_in_)].copy()last_window_values=last_window.to_numpy()last_window_matrix=np.full(shape=last_window.shape,fill_value=np.nan,order='F',dtype=float)foridx_series,seriesinenumerate(self.series_names_in_):last_window_series=last_window_values[:,idx_series]last_window_series=transform_numpy(array=last_window_series,transformer=self.t
ransformer_series_[series],fit=False,inverse_transform=False,)last_window_matrix[:,idx_series]=last_window_seriesX=[np.reshape(last_window_matrix,(1,self.max_lag,last_window.shape[1]))]X_predict_dimension_names={"X_autoreg":{0:"batch",1:self.lags_names[::-1],2:self.X_train_series_names_in_}}ifexogisnotNone:exog=input_to_frame(data=exog,input_name='exog')exog=transform_dataframe(df=exog,transformer=self.transformer_exog,fit=False,inverse_transform=False,)exog_pred=exog.to_numpy()[:self.max_step]# NOTE: This is done to ensure that the exogenous variables# have the same number of rows as the maximum step to predict # during backtesting when the last fold is incomplete iflen(exog_pred)<self.max_step:exog_pred=np.concatenate([exog_pred,np.full(shape=(self.max_step-len(exog_pred),exog_pred.shape[1]),fill_value=0.,dtype=float)],axis=0)exog_pred=np.expand_dims(exog_pred,axis=0)X.append(exog_pred)X_predict_dimension_names["exog_pred"]={0:"batch",1:[f"step_{step}"forstepinself.steps],2:self.X_train_exog_names_out_}prediction_index=expand_index(index=last_window.index,steps=max(steps))[np.array(steps)-1]ifisinstance(last_window.index,pd.DatetimeIndex)andnp.array_equal(steps,np.arange(min(steps),max(steps)+1)):prediction_index.freq=last_window.index.freqreturnX,X_predict_dimension_names,steps,levels,prediction_index
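The handling of `steps` and the zero-padding of `exog` in the code above can be sketched with plain Python lists. `normalize_steps` and `pad_exog_rows` are hypothetical helpers written for this illustration only; they are not part of skforecast, and the library's own `prepare_steps_direct` may validate its input differently.

```python
def normalize_steps(steps, max_step):
    """Return the list of steps to predict (hypothetical helper)."""
    if steps is None:
        # As many steps as the estimator architecture defines.
        return list(range(1, max_step + 1))
    if isinstance(steps, int):
        # Steps 1..steps, both included.
        steps = list(range(1, steps + 1))
    steps = list(steps)
    if steps and max(steps) > max_step:
        raise ValueError("`steps` cannot exceed the steps of the architecture.")
    return steps


def pad_exog_rows(exog_rows, max_step, fill_value=0.0):
    """Keep at most `max_step` rows, then pad with rows of `fill_value`."""
    rows = [list(row) for row in exog_rows[:max_step]]
    n_cols = len(rows[0]) if rows else 0
    while len(rows) < max_step:
        rows.append([fill_value] * n_cols)
    return rows


max_step = 4
print(normalize_steps(None, max_step))
print(normalize_steps(2, max_step))
print(normalize_steps([1, 3], max_step))
# Only two exog rows are available, e.g. an incomplete last backtesting fold.
print(pad_exog_rows([[0.5, 1.0], [0.7, 2.0]], max_step))
```

Padding with zeros keeps the exogenous input at the fixed shape the network expects; only the rows for the requested steps influence the predictions that are finally returned.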
levels
str, list
Name(s) of the time series to be predicted. They must be included
in levels, defined when initializing the forecaster. If None,
all series used during training will be available for prediction.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed to
predict steps.
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
None
suppress_warnings
bool
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
check_inputs
bool
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
Returns:
Name
Type
Description
X_predict
pandas DataFrame
Pandas DataFrame with the predictors for each step.
exog_predict
pandas DataFrame
Pandas DataFrame with the exogenous variables for each step.
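In the source of this method, the rows of `X_predict` are labeled with `self.lags_names[::-1]`, so the oldest observation of the window carries the largest lag. The sketch below illustrates that ordering with plain Python. It assumes lag names of the form `lag_1 ... lag_n` and contiguous lags; `label_window_rows` is a hypothetical helper, not a skforecast function.

```python
def label_window_rows(window, n_lags):
    """Pair the last `n_lags` observations with reversed lag labels."""
    # Assumed naming scheme: lag_1 is t-1, lag_n is t-n.
    lags_names = [f"lag_{i}" for i in range(1, n_lags + 1)]
    labels = lags_names[::-1]  # oldest observation first
    return dict(zip(labels, window[-n_lags:]))


# Observations ordered from oldest to newest.
window = [10, 11, 12, 13]
print(label_window_rows(window, n_lags=3))
```

Reading the labels this way makes it easy to check that the most recent observation lands in the `lag_1` row.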
Source code in skforecast/deep_learning/_forecaster_rnn.py
defcreate_predict_X(self,steps:int|list[int]|None=None,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,suppress_warnings:bool=False,check_inputs:bool=True)->tuple[pd.DataFrame,pd.DataFrame|None]:""" Create the predictors needed to predict `steps` ahead. Parameters ---------- steps : int, list, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined in the estimator architecture. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as defined in the estimator architecture. levels : str, list, default None Name(s) of the time series to be predicted. It must be included in `levels`, defined when initializing the forecaster. If `None`, all all series used during training will be available for prediction. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. check_inputs : bool, default True If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed. Returns ------- X_predict : pandas DataFrame Pandas DataFrame with the predictors for each step. exog_predict : pandas DataFrame Pandas DataFrame with the exogenous variables for each step. 
"""set_skforecast_warnings(suppress_warnings,action='ignore')(X,X_predict_dimension_names,*_)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,check_inputs=check_inputs)X_predict=pd.DataFrame(data=X[0][0],columns=X_predict_dimension_names['X_autoreg'][2],index=X_predict_dimension_names['X_autoreg'][1])exog_predict=Noneifself.exog_in_:exog_predict=pd.DataFrame(data=X[1][0],columns=X_predict_dimension_names['exog_pred'][2],index=X_predict_dimension_names['exog_pred'][1])# NOTE: not needed in this forecaster# categorical_features = any(# not pd.api.types.is_numeric_dtype(dtype) or pd.api.types.is_bool_dtype(dtype) # for dtype in set(self.exog_dtypes_out_.values())# )# if categorical_features:# X_predict = X_predict.astype(self.exog_dtypes_out_)ifself.transformer_seriesisnotNone:warnings.warn("The output matrix is in the transformed scale due to the ""inclusion of transformations in the Forecaster. ""As a result, any predictions generated using this matrix will also ""be in the transformed scale. Please refer to the documentation ""for more details: ""https://skforecast.org/latest/user_guides/training-and-prediction-matrices.html",DataTransformationWarning)set_skforecast_warnings(suppress_warnings,action='default')returnX_predict,exog_predict
levels
str, list
Name(s) of the time series to be predicted. They must be included
in levels, defined when initializing the forecaster. If None,
all series used during training will be available for prediction.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
None
check_inputs
bool
If True, the input is checked for possible warnings and errors
with the check_predict_input function. This argument is created
for internal use and is not recommended to be changed.
True
Returns:
Name
Type
Description
predictions
pandas DataFrame
Predicted values.
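The predictions are returned in long format: one row per (step, level) pair, with a `level` column and a `pred` column, filtered to the requested levels. A minimal pure-Python sketch of that layout follows; `to_long_format` is a hypothetical helper that mimics the tile/ravel/repeat logic of the method's source with nested lists instead of NumPy arrays.

```python
def to_long_format(predictions, index, all_levels, selected_levels):
    """Flatten a (n_steps, n_levels) table into (index, level, pred) rows."""
    rows = []
    for idx, step_values in zip(index, predictions):
        # Each index label is repeated once per level, in level order.
        for level, value in zip(all_levels, step_values):
            if level in selected_levels:
                rows.append((idx, level, value))
    return rows


predictions = [[1.0, 10.0], [2.0, 20.0]]  # rows: steps, columns: levels
index = ["t+1", "t+2"]
for row in to_long_format(predictions, index, ["a", "b"], ["b"]):
    print(row)
```

The network always predicts every level it was built for; the filter at the end only hides the levels that were not requested.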
Source code in skforecast/deep_learning/_forecaster_rnn.py
def predict(
    self,
    steps: int | list[int] | None = None,
    levels: str | list[str] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False,
    check_inputs: bool = True
) -> pd.DataFrame:
    """
    Predict n steps ahead.

    Parameters
    ----------
    steps : int, list, default None
        Predict n steps. The value of `steps` must be less than or equal to the
        value of steps defined in the estimator architecture.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list
        are predicted.
        - If `None`: As many steps are predicted as defined in the estimator
        architecture.
    levels : str, list, default None
        Name(s) of the time series to be predicted. They must be included
        in `levels`, defined when initializing the forecaster. If `None`,
        all series used during training will be available for prediction.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors
        with the `check_predict_input` function. This argument is created
        for internal use and is not recommended to be changed.

    Returns
    -------
    predictions : pandas DataFrame
        Predicted values.

    """

    set_skforecast_warnings(suppress_warnings, action="ignore")

    (
        X,
        _,
        steps,
        levels,
        prediction_index
    ) = self._create_predict_inputs(
        steps=steps,
        levels=levels,
        last_window=last_window,
        exog=exog,
        check_inputs=check_inputs
    )

    predictions = self.estimator.predict(
        X[0] if not self.exog_in_ else X, verbose=0
    )
    predictions = np.reshape(
        predictions, (predictions.shape[1], predictions.shape[2])
    )[np.array(steps) - 1]

    for i, level in enumerate(self.levels):
        # NOTE: The inverse transformation is applied only if the level
        # is included in the levels to predict.
        if level in levels:
            predictions[:, i] = transform_numpy(
                array=predictions[:, i],
                transformer=self.transformer_series_[level],
                fit=False,
                inverse_transform=True
            )

    n_steps, n_levels = predictions.shape
    predictions = pd.DataFrame(
        {"level": np.tile(self.levels, n_steps), "pred": predictions.ravel()},
        index=np.repeat(prediction_index, n_levels),
    )
    predictions = predictions[predictions['level'].isin(levels)]

    set_skforecast_warnings(suppress_warnings, action="default")

    return predictions
levels
str, list
Name(s) of the time series to be predicted. They must be included
in levels, defined when initializing the forecaster. If None,
all series used during training will be available for prediction.
None
last_window
pandas Series, pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
None
use_in_sample_residuals
bool
If True, residuals from the training data are used as a proxy of
the prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
True
Returns:
Name
Type
Description
predictions
pandas DataFrame
Values predicted by the forecaster and their estimated interval.
def_predict_interval_conformal(self,steps:int|list[int]|None=None,levels:str|list[str]|None=None,last_window:pd.Series|pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,nominal_coverage:float=0.95,use_in_sample_residuals:bool=True)->pd.DataFrame:""" Generate prediction intervals using the conformal prediction split method [1]_. Parameters ---------- steps : int, list, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined in the estimator architecture. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as defined in the estimator architecture. levels : str, list, default None Name(s) of the time series to be predicted. It must be included in `levels`, defined when initializing the forecaster. If `None`, all all series used during training will be available for prediction. last_window : pandas Series, pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in` self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. nominal_coverage : float, default 0.95 Nominal coverage, also known as expected coverage, of the prediction intervals. Must be between 0 and 1. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. Returns ------- predictions : pandas DataFrame Values predicted by the forecaster and their estimated interval. 
- pred: predictions. - lower_bound: lower bound of the interval. - upper_bound: upper bound of the interval. References ---------- .. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method """(X,_,steps,levels,prediction_index)=self._create_predict_inputs(steps=steps,levels=levels,last_window=last_window,exog=exog,predict_probabilistic=True,use_in_sample_residuals=use_in_sample_residuals)ifuse_in_sample_residuals:residuals=self.in_sample_residuals_else:residuals=self.out_sample_residuals_predictions=self.estimator.predict(X[0]ifnotself.exog_in_elseX,verbose=0)predictions=np.reshape(predictions,(predictions.shape[1],predictions.shape[2]))[np.array(steps)-1]n_steps=len(steps)n_levels=len(self.levels)correction_factor=np.full(shape=(n_steps,n_levels),fill_value=np.nan,order='C',dtype=float)fori,levelinenumerate(self.levels):# NOTE: The correction factor is calculated only for the levels# included in the levels to predict.iflevelinlevels:correction_factor[:,i]=np.quantile(np.abs(residuals[level]),nominal_coverage)else:correction_factor[:,i]=0.lower_bound=predictions-correction_factorupper_bound=predictions+correction_factor# NOTE: Create a 3D array with shape (n_levels, intervals, steps)predictions=np.array([predictions,lower_bound,upper_bound]).swapaxes(0,2)fori,levelinenumerate(self.levels):# NOTE: The inverse transformation is applied only if the level# is included in the levels to 
predict.iflevelinlevels:transformer_level=self.transformer_series_[level]iftransformer_levelisnotNone:predictions[i,:,:]=np.apply_along_axis(func1d=transform_numpy,axis=0,arr=predictions[i,:,:],transformer=transformer_level,fit=False,inverse_transform=True)predictions=pd.DataFrame(data=predictions.swapaxes(0,1).reshape(-1,3),index=np.repeat(prediction_index,len(self.levels)),columns=["pred","lower_bound","upper_bound"])predictions.insert(0,'level',np.tile(self.levels,n_steps))predictions=predictions[predictions['level'].isin(levels)]returnpredictions
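The core of the split conformal method used above is small: take the `nominal_coverage` quantile of the absolute residuals as a correction factor, then subtract it from and add it to each point prediction. The following pure-Python sketch shows the arithmetic for a single level. `linear_quantile` and `conformal_interval` are hypothetical helpers; the quantile uses linear interpolation between order statistics, which is assumed here to match the default behaviour of `np.quantile`.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def conformal_interval(predictions, residuals, nominal_coverage):
    """Return (pred, lower_bound, upper_bound) for each prediction."""
    # The same correction factor is applied to every step of the level.
    factor = linear_quantile([abs(r) for r in residuals], nominal_coverage)
    return [(p, p - factor, p + factor) for p in predictions]


residuals = [-1.0, 2.0, -3.0, 4.0]
for row in conformal_interval([10.0, 20.0], residuals, nominal_coverage=0.5):
    print(row)
```

In the method itself the bounds are computed in the transformed scale and only afterwards passed through the inverse transformation, so the interval in the original scale is not necessarily symmetric around the prediction.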
levels
str, list
Name(s) of the time series to be predicted. They must be included
in levels, defined when initializing the forecaster. If None,
all series used during training will be available for prediction.
None
last_window
pandas DataFrame
Series values used to create the predictors (lags) needed in the
first iteration of the prediction (t + 1).
If last_window = None, the values stored in self.last_window_ are
used to calculate the initial predictors, and the predictions start
right after training data.
None
interval
float, list, tuple
Confidence level of the prediction interval. Interpretation depends
on the method used:
- If float, represents the nominal (expected) coverage (between 0
and 1). For instance, interval=0.95 corresponds to [2.5, 97.5]
percentiles.
- If list or tuple, defines the exact percentiles to compute, which
must be between 0 and 100 inclusive. For example, an interval
of 95% should be given as interval = [2.5, 97.5].
- When using method='conformal', the interval must be a float or
a list/tuple defining a symmetric interval.
[5, 95]
use_in_sample_residuals
bool
If True, residuals from the training data are used as a proxy of
the prediction error to create predictions.
If False, out-of-sample residuals (calibration) are used.
Out-of-sample residuals must be precomputed using the forecaster's
set_out_sample_residuals() method.
True
suppress_warnings
bool
If True, skforecast warnings will be suppressed during the prediction
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
n_boot
Ignored
Not used, present here for API consistency by convention.
None
use_binned_residuals
Ignored
Not used, present here for API consistency by convention.
None
random_state
Ignored
Not used, present here for API consistency by convention.
None
Returns:
Name
Type
Description
predictions
pandas DataFrame
Long-format DataFrame with the predictions and the lower and upper
bounds of the estimated interval. The columns are level, pred,
lower_bound, upper_bound.
defpredict_interval(self,steps:int|list[int]|None=None,levels:str|list[str]|None=None,last_window:pd.DataFrame|None=None,exog:pd.Series|pd.DataFrame|None=None,method:str='conformal',interval:float|list[float]|tuple[float]=[5,95],use_in_sample_residuals:bool=True,suppress_warnings:bool=False,n_boot:Any=None,use_binned_residuals:Any=None,random_state:Any=None,)->pd.DataFrame:""" Predict n steps ahead and estimate prediction intervals using conformal prediction method. Refer to the References section for additional details. Parameters ---------- steps : int, list, default None Predict n steps. The value of `steps` must be less than or equal to the value of steps defined in the estimator architecture. - If `int`: Only steps within the range of 1 to int are predicted. - If `list`: List of ints. Only the steps contained in the list are predicted. - If `None`: As many steps are predicted as defined in the estimator architecture. levels : str, list, default None Name(s) of the time series to be predicted. It must be included in `levels`, defined when initializing the forecaster. If `None`, all all series used during training will be available for prediction. last_window : pandas DataFrame, default None Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data. exog : pandas Series, pandas DataFrame, dict, default None Exogenous variable/s included as predictor/s. method : str, default 'conformal' Employs the conformal prediction split method for interval estimation [1]_. interval : float, list, tuple, default [5, 95] Confidence level of the prediction interval. Interpretation depends on the method used: - If `float`, represents the nominal (expected) coverage (between 0 and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` percentiles. 
- If `list` or `tuple`, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, interval of 95% should be as `interval = [2.5, 97.5]`. - When using `method='conformal'`, the interval must be a float or a list/tuple defining a symmetric interval. use_in_sample_residuals : bool, default True If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information. n_boot : Ignored Not used, present here for API consistency by convention. use_binned_residuals : Ignored Not used, present here for API consistency by convention. random_state : Ignored Not used, present here for API consistency by convention. Returns ------- predictions : pandas DataFrame Long-format DataFrame with the predictions and the lower and upper bounds of the estimated interval. The columns are `level`, `pred`, `lower_bound`, `upper_bound`. References ---------- .. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method """set_skforecast_warnings(suppress_warnings,action='ignore')ifmethod=="conformal":ifisinstance(interval,(list,tuple)):check_interval(interval=interval,ensure_symmetric_intervals=True)nominal_coverage=(interval[1]-interval[0])/100else:check_interval(alpha=interval,alpha_literal='interval')nominal_coverage=intervalpredictions=self._predict_interval_conformal(steps=steps,levels=levels,last_window=last_window,exog=exog,nominal_coverage=nominal_coverage,use_in_sample_residuals=use_in_sample_residuals)else:raiseValueError(f"Invalid `method` '{method}'. 
Only 'conformal' is available.")set_skforecast_warnings(suppress_warnings,action='default')returnpredictions
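The conversion from `interval` to the nominal coverage used by the conformal method can be sketched as follows. `interval_to_nominal_coverage` is a hypothetical helper; its validation is a simplified stand-in for the library's `check_interval`, whose exact checks are not reproduced here.

```python
import math


def interval_to_nominal_coverage(interval):
    """Map a float or a [lower, upper] percentile pair to a coverage in (0, 1)."""
    if isinstance(interval, (list, tuple)):
        lower, upper = interval
        # Conformal intervals must be symmetric, e.g. [2.5, 97.5].
        if not math.isclose(lower + upper, 100.0):
            raise ValueError("`interval` must be symmetric around the median.")
        return (upper - lower) / 100
    if not 0 < interval < 1:
        raise ValueError("A float `interval` must be between 0 and 1.")
    return interval


print(interval_to_nominal_coverage([2.5, 97.5]))
print(interval_to_nominal_coverage((5, 95)))
print(interval_to_nominal_coverage(0.8))
```

Both ways of writing the interval therefore end up as the same single number, which is the quantile level applied to the absolute residuals.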
def plot_history(
    self,
    ax: matplotlib.axes.Axes = None,
    exclude_first_iteration: bool = False,
    **fig_kw,
) -> matplotlib.figure.Figure:
    """
    Plots the training and validation loss curves from the given history
    object stored in the ForecasterRnn.

    Parameters
    ----------
    ax : matplotlib.axes.Axes, default `None`
        Pre-existing ax for the plot. Otherwise, call
        matplotlib.pyplot.subplots() internally.
    exclude_first_iteration : bool, default `False`
        Whether to exclude the first epoch from the plot.
    fig_kw : dict
        Other keyword arguments are passed to matplotlib.pyplot.subplots().

    Returns
    -------
    fig: matplotlib.figure.Figure
        Matplotlib Figure.

    """

    if ax is None:
        fig, ax = plt.subplots(1, 1, **fig_kw)
    else:
        fig = ax.get_figure()

    # Setting up the plot style
    if self.history_ is None:
        raise ValueError("ForecasterRnn has not been fitted yet.")

    # Determine the range of epochs to plot, excluding the first one if specified
    epoch_range = range(1, len(self.history_["loss"]) + 1)
    if exclude_first_iteration:
        epoch_range = range(2, len(self.history_["loss"]) + 1)

    # Plotting training loss
    ax.plot(
        epoch_range,
        self.history_["loss"][exclude_first_iteration:],  # Skip first element if specified
        color="b",
        label="Training Loss",
    )

    # Plotting validation loss
    if "val_loss" in self.history_:
        ax.plot(
            epoch_range,
            self.history_["val_loss"][exclude_first_iteration:],  # Skip first element if specified
            color="r",
            label="Validation Loss",
        )

    # Labeling the axes and adding a title
    ax.set_xlabel("Epochs")
    ax.set_ylabel("Loss")
    ax.set_title("Training and Validation Loss")

    # Adding a legend
    ax.legend()

    # Displaying grid for better readability
    ax.grid(True, linestyle="--", alpha=0.7)

    # Setting x-axis ticks to integers only
    ax.set_xticks(epoch_range)

    return fig
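`plot_history` keeps the x-axis and the loss values aligned by using the boolean `exclude_first_iteration` directly as a slice start (`False` behaves as 0 and `True` as 1). A small sketch of that alignment, without any plotting, is shown below; `epochs_and_losses` is a hypothetical helper.

```python
def epochs_and_losses(losses, exclude_first_iteration=False):
    """Return the epoch numbers and the matching loss values."""
    start = 2 if exclude_first_iteration else 1
    epoch_range = range(start, len(losses) + 1)
    # A bool is a valid slice index: [False:] keeps all, [True:] drops the first.
    kept = losses[exclude_first_iteration:]
    return list(epoch_range), kept


history_loss = [0.9, 0.5, 0.4]
print(epochs_and_losses(history_loss))
print(epochs_and_losses(history_loss, exclude_first_iteration=True))
```

Dropping the first epoch is mainly useful when its loss is much larger than the rest and flattens the remainder of the curve.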
Set new values to the parameters of the scikit-learn model stored in the
forecaster. It is important to note that all models share the same
configuration of parameters and hyperparameters.
def set_params(self, params: dict) -> None:
    """
    Set new values to the parameters of the scikit-learn model stored in the
    forecaster. It is important to note that all models share the same
    configuration of parameters and hyperparameters.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns
    -------
    None

    """

    self.estimator = clone(self.estimator)
    self.estimator.reset_states()
    self.estimator.compile(**params)
def set_fit_kwargs(self, fit_kwargs: dict) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit`
    method of the estimator.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns
    -------
    None

    """

    self.series_val = None
    self.exog_val = None

    if "series_val" in fit_kwargs:
        if not isinstance(fit_kwargs["series_val"], pd.DataFrame):
            raise TypeError(
                f"`series_val` must be a pandas DataFrame. "
                f"Got {type(fit_kwargs['series_val'])}."
            )
        self.series_val = fit_kwargs.pop("series_val")

        if self.exog_in_:
            if "exog_val" not in fit_kwargs.keys():
                raise ValueError(
                    "If `series_val` is provided, `exog_val` must also be "
                    "provided using the `fit_kwargs` argument when the "
                    "estimator has exogenous variables."
                )
            else:
                if not isinstance(fit_kwargs["exog_val"], (pd.Series, pd.DataFrame)):
                    raise TypeError(
                        f"`exog_val` must be a pandas Series or DataFrame. "
                        f"Got {type(fit_kwargs['exog_val'])}."
                    )
                self.exog_val = input_to_frame(
                    data=fit_kwargs.pop("exog_val"), input_name='exog_val'
                )

    self.fit_kwargs = check_select_fit_kwargs(self.estimator, fit_kwargs=fit_kwargs)
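`set_fit_kwargs` separates the validation data (`series_val`, `exog_val`) from the keyword arguments that are forwarded to the estimator's `fit`. The sketch below reproduces only that separation logic with plain dictionaries. `split_validation_kwargs` is a hypothetical helper; unlike the method, it works on a copy instead of popping from the caller's dictionary, and it skips the pandas type checks.

```python
def split_validation_kwargs(fit_kwargs, uses_exog):
    """Return (series_val, exog_val, remaining_fit_kwargs)."""
    remaining = dict(fit_kwargs)  # copy, so the input is left untouched
    series_val = None
    exog_val = None
    if "series_val" in remaining:
        series_val = remaining.pop("series_val")
        if uses_exog:
            # Validation exog is mandatory when the model was built with exog.
            if "exog_val" not in remaining:
                raise ValueError(
                    "If `series_val` is provided, `exog_val` must also be provided."
                )
            exog_val = remaining.pop("exog_val")
    return series_val, exog_val, remaining


kwargs = {"epochs": 10, "series_val": "validation series", "exog_val": "validation exog"}
print(split_validation_kwargs(kwargs, uses_exog=True))
```

The placeholder strings stand in for pandas objects; in real use `series_val` would be a DataFrame and `exog_val` a Series or DataFrame.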
Set in-sample residuals in case they were not calculated during the
training process.
In-sample residuals are calculated as the difference between the true
values and the predictions made by the forecaster using the training
data. The following internal attributes are updated:
- in_sample_residuals_: Dictionary containing a numpy ndarray with the
residuals for each series in the form {series: residuals}.
A total of 10_000 residuals are stored in the attribute in_sample_residuals_.
If the number of residuals is greater than 10_000, a random sample of
10_000 residuals is stored. The number of residuals stored per bin is
limited to 10_000 // self.binner.n_bins_.
suppress_warnings
bool
If True, skforecast warnings will be suppressed during the sampling
process. See skforecast.exceptions.warn_skforecast_categories for more
information.
False
Returns:
Type
Description
None
Source code in skforecast/deep_learning/_forecaster_rnn.py
defset_in_sample_residuals(self,series:pd.DataFrame,exog:pd.Series|pd.DataFrame=None,random_state:int=123,suppress_warnings:bool=False)->None:""" Set in-sample residuals in case they were not calculated during the training process. In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated: + `in_sample_residuals_`: Dictionary containing a numpy ndarray with the residuals for each series in the form `{series: residuals}`. A total of 10_000 residuals are stored in the attribute `in_sample_residuals_`. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`. Parameters ---------- series : pandas DataFrame Training time series. exog : pandas Series, pandas DataFrame, default None Exogenous variable/s included as predictor/s. random_state : int, default 123 Sets a seed to the random sampling for reproducible output. suppress_warnings : bool, default False If `True`, skforecast warnings will be suppressed during the sampling process. See skforecast.exceptions.warn_skforecast_categories for more information. Returns ------- None """set_skforecast_warnings(suppress_warnings,action='ignore')ifnotself.is_fitted:raiseNotFittedError("This forecaster is not fitted yet. Call `fit` with appropriate ""arguments before using `set_in_sample_residuals()`.")ifnotisinstance(series,pd.DataFrame):raiseTypeError(f"`series` must be a pandas DataFrame. Got {type(series)}.")series_index_range=check_extract_values_and_index(data=series,data_label='`series`',return_values=False)[1][[0,-1]]ifnotseries_index_range.equals(self.training_range_):raiseIndexError(f"The index range of `series` does not match the range "f"used during training. 
Please ensure the index is aligned "f"with the training data.\n"f" Expected : {self.training_range_}\n"f" Received : {series_index_range}")(X_train,exog_train,y_train,dimension_names,*_)=self._create_train_X_y(series=series,exog=exog)ifexogisnotNone:X_train_features_names_out_=dimension_names["X_train"][1]+dimension_names["exog_train"][2]else:X_train_features_names_out_=dimension_names["X_train"][1]ifnotX_train_features_names_out_==self.X_train_features_names_out_:raiseValueError(f"Feature mismatch detected after matrix creation. The features "f"generated from the provided data do not match those used during "f"the training process. To correctly set in-sample residuals, "f"ensure that the same data and preprocessing steps are applied.\n"f" Expected output : {self.X_train_features_names_out_}\n"f" Current output : {X_train_features_names_out_}")# TODO: Include binning in the forecasterself.in_sample_residuals_={}residuals=y_train-self.estimator.predict(x=X_trainifexog_trainisNoneelse[X_train,exog_train],verbose=0)residuals=np.concatenate([residuals[:,i,:]fori,stepinenumerate(self.steps)])rng=np.random.default_rng(seed=random_state)fori,levelinenumerate(self.levels):residuals_level=residuals[:,i]iflen(residuals_level)>10_000:residuals_level=residuals_level[rng.integers(low=0,high=len(residuals_level),size=10_000)]self.in_sample_residuals_[level]=residuals_levelset_skforecast_warnings(suppress_warnings,action='default')
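Two steps in the code above are easy to miss: the residuals of all steps are stacked into one pool per level, and each pool is capped at 10_000 values by random sampling. The sketch below mirrors both with nested lists. `stack_steps` and `cap_residuals` are hypothetical helpers; the cap samples with replacement, as `rng.integers` does in the method, but it uses Python's `random` module, so the selected values differ from the ones NumPy would pick.

```python
import random


def stack_steps(residuals):
    """residuals[sample][step][level] -> rows ordered step by step."""
    n_steps = len(residuals[0])
    return [sample[step] for step in range(n_steps) for sample in residuals]


def cap_residuals(values, max_size=10_000, seed=123):
    """Keep all values if they fit, otherwise draw `max_size` with replacement."""
    if len(values) <= max_size:
        return list(values)
    rng = random.Random(seed)
    return [values[rng.randrange(len(values))] for _ in range(max_size)]


# 2 samples, 2 steps, 2 levels.
residuals = [[[1, 10], [2, 20]], [[3, 30], [4, 40]]]
stacked = stack_steps(residuals)
level_0 = [row[0] for row in stacked]
print(stacked)
print(cap_residuals(level_0))
print(len(cap_residuals(list(range(50)), max_size=10)))
```

Pooling across steps means the conformal correction is the same for every horizon of a level, which is simple but ignores that errors usually grow with the step.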
Set new values to the attribute out_sample_residuals_. Out of sample
residuals are meant to be calculated using observations that did not
participate in the training process. y_true and y_pred are expected
to be in the original scale of the time series. Residuals are calculated
as y_true - y_pred, after applying the necessary transformations and
differentiations if the forecaster includes them (self.transformer_series
and self.differentiation).
A total of 10_000 residuals are stored in the attribute out_sample_residuals_.
If the number of residuals is greater than 10_000, a random sample of
10_000 residuals is stored.
If True, new residuals are added to the ones already stored in the
attribute out_sample_residuals_. If after appending the new residuals,
the limit of 10_000 samples is exceeded, a random sample of 10_000 is
kept.
def set_out_sample_residuals(
    self,
    y_true: dict[str, np.ndarray | pd.Series],
    y_pred: dict[str, np.ndarray | pd.Series],
    append: bool = False,
    random_state: int = 123
) -> None:
    """
    Set new values to the attribute `out_sample_residuals_`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process. `y_true` and `y_pred` are expected
    to be in the original scale of the time series. Residuals are calculated
    as `y_true` - `y_pred`, after applying the necessary transformations and
    differentiations if the forecaster includes them (`self.transformer_series`
    and `self.differentiation`).

    A total of 10_000 residuals are stored in the attribute
    `out_sample_residuals_`. If the number of residuals is greater than
    10_000, a random sample of 10_000 residuals is stored.

    Parameters
    ----------
    y_true : dict
        Dictionary of numpy ndarrays or pandas Series with the true values of
        the time series for each series in the form {series: y_true}.
    y_pred : dict
        Dictionary of numpy ndarrays or pandas Series with the predicted values
        of the time series for each series in the form {series: y_pred}.
    append : bool, default False
        If `True`, new residuals are added to the ones already stored in the
        attribute `out_sample_residuals_`. If after appending the new
        residuals, the limit of 10_000 samples is exceeded, a random sample of
        10_000 is kept.
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_out_sample_residuals()`."
        )

    if not isinstance(y_true, dict):
        raise TypeError(
            f"`y_true` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_true)}."
        )

    if not isinstance(y_pred, dict):
        raise TypeError(
            f"`y_pred` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_pred)}."
        )

    if not set(y_true.keys()) == set(y_pred.keys()):
        raise ValueError(
            f"`y_true` and `y_pred` must have the same keys. "
            f"Got {set(y_true.keys())} and {set(y_pred.keys())}."
        )

    for k in y_true.keys():
        if not isinstance(y_true[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_true` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_true[k])} for series {k}."
            )
        if not isinstance(y_pred[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_pred` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_pred[k])} for series {k}."
            )
        if len(y_true[k]) != len(y_pred[k]):
            raise ValueError(
                f"`y_true` and `y_pred` must have the same length. "
                f"Got {len(y_true[k])} and {len(y_pred[k])} for series {k}."
            )
        if isinstance(y_true[k], pd.Series) and isinstance(y_pred[k], pd.Series):
            if not y_true[k].index.equals(y_pred[k].index):
                raise ValueError(
                    f"When containing pandas Series, elements in `y_true` and "
                    f"`y_pred` must have the same index. Error in series {k}."
                )

    series_to_update = set(y_pred.keys()).intersection(set(self.levels))
    if not series_to_update:
        raise ValueError(
            f"Provided keys in `y_pred` and `y_true` do not match any of the "
            f"target time series in the forecaster, {self.levels}. Residuals "
            f"cannot be updated."
        )

    if self.out_sample_residuals_ is None:
        self.out_sample_residuals_ = {level: None for level in self.levels}

    rng = np.random.default_rng(seed=random_state)
    for level in series_to_update:

        y_true_level = deepcopy(y_true[level])
        y_pred_level = deepcopy(y_pred[level])
        if not isinstance(y_true_level, np.ndarray):
            y_true_level = y_true_level.to_numpy()
        if not isinstance(y_pred_level, np.ndarray):
            y_pred_level = y_pred_level.to_numpy()

        if self.transformer_series:
            y_true_level = transform_numpy(
                array=y_true_level,
                transformer=self.transformer_series_[level],
                fit=False,
                inverse_transform=False
            )
            y_pred_level = transform_numpy(
                array=y_pred_level,
                transformer=self.transformer_series_[level],
                fit=False,
                inverse_transform=False
            )

        data = pd.DataFrame(
            {'prediction': y_pred_level, 'residuals': y_true_level - y_pred_level}
        ).dropna()
        residuals = data['residuals'].to_numpy()

        out_sample_residuals = self.out_sample_residuals_.get(level, np.array([]))
        out_sample_residuals = (
            np.array([]) if out_sample_residuals is None else out_sample_residuals
        )
        if append:
            out_sample_residuals = np.concatenate([out_sample_residuals, residuals])
        else:
            out_sample_residuals = residuals

        if len(out_sample_residuals) > 10_000:
            out_sample_residuals = rng.choice(
                a=out_sample_residuals, size=10_000, replace=False
            )

        self.out_sample_residuals_[level] = out_sample_residuals
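The append-and-cap behaviour of `set_out_sample_residuals` for a single series can be sketched without a fitted forecaster. This is an illustrative approximation under stated assumptions, not the library's code: the function name `update_residuals` and the `stored` and `max_samples` arguments are hypothetical, and the transformation step applied through `self.transformer_series_` is deliberately left out.

```python
import numpy as np


def update_residuals(stored, y_true, y_pred, append=False,
                     max_samples=10_000, random_state=123):
    """Return the residual array that would be kept for one series.

    Hypothetical helper for illustration only; no transformer is applied.
    """
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    # A NaN in either input yields a NaN residual, which is discarded
    residuals = residuals[~np.isnan(residuals)]

    if append and stored is not None:
        residuals = np.concatenate([stored, residuals])

    if len(residuals) > max_samples:
        rng = np.random.default_rng(seed=random_state)
        # Sample without replacement so no residual is duplicated
        residuals = rng.choice(a=residuals, size=max_samples, replace=False)
    return residuals


first = update_residuals(None, y_true=[1.0, 2.0, np.nan, 4.0], y_pred=[0.5, 2.5, 3.0, 3.0])
second = update_residuals(first, y_true=[10.0, 11.0], y_pred=[9.0, 12.0], append=True)
capped = update_residuals(second, y_true=[5.0], y_pred=[5.0], append=True, max_samples=4)
print(first)
print(second)
print(capped.shape)
```

With `append=False` the previously stored residuals are discarded, so repeated calls only accumulate when `append=True` is passed explicitly.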