Skip to content

ForecasterAutoregMultiSeries

ForecasterAutoregMultiSeries (ForecasterBase)

This class turns any regressor compatible with the scikit-learn API into a

recursive autoregressive (multi-step) forecaster for multiple series.

Parameters:

Name Type Description Default
regressor object

An instance of a regressor or pipeline compatible with the scikit-learn API.

required
lags Union[int, numpy.ndarray, list]

Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags (included). list, numpy ndarray or range: include only lags present in lags, all elements must be int.

required
transformer_series Union[object, dict]

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. If a single transformer is passed, it is cloned and applied to all series. If a dict, a different transformer can be used for each series. Transformation is applied to each series before training the forecaster. ColumnTransformers are not allowed since they do not have inverse_transform method.

None
transformer_exog Optional[object]

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to exog before training the forecaster. inverse_transform is not available when using ColumnTransformers.

None
weight_func Union[Callable, dict]

Function that defines the individual weights for each sample based on the index. For example, a function that assigns a lower weight to certain dates. If dict {'series_column_name' : Callable} a different function can be used for each series, a weight of 1 is given to all series not present in weight_func. Ignored if regressor does not have the argument sample_weight in its fit method. See Notes section for more details on the use of the weights. New in version 0.6.0

None
series_weights Optional[dict]

Weights associated with each series {'series_column_name' : float}. It is only applied if the regressor used accepts sample_weight in its fit method. If series_weights is provided, a weight of 1 is given to all series not present in series_weights. If None, all levels have the same weight. See Notes section for more details on the use of the weights. New in version 0.6.0

None
fit_kwargs Optional[dict]

Additional arguments to be passed to the fit method of the regressor. New in version 0.8.0

None
forecaster_id Union[str, int]

Name used as an identifier of the forecaster. New in version 0.7.0

None

Attributes:

Name Type Description
regressor regressor or pipeline compatible with the scikit-learn API

An instance of a regressor or pipeline compatible with the scikit-learn API.

lags numpy ndarray

Lags used as predictors.

transformer_series transformer (preprocessor) or dict of transformers, default `None`

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. If a single transformer is passed, it is cloned and applied to all series. If a dict, a different transformer can be used for each series. Transformation is applied to each series before training the forecaster. ColumnTransformers are not allowed since they do not have inverse_transform method.

transformer_series_ dict

Dictionary with the transformer for each series. It is created cloning the objects in transformer_series and is used internally to avoid overwriting.

transformer_exog transformer (preprocessor), default `None`

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to exog before training the forecaster. inverse_transform is not available when using ColumnTransformers.

weight_func Callable, dict, default `None`

Function that defines the individual weights of each sample based on the index. For example, a function that assigns a lower weight to certain dates. If dict {'series_column_name': Callable} a different function can be used for each series, a weight of 1 is given to all series not present in weight_func. Ignored if regressor does not have the argument sample_weight in its fit method. See Notes section for more details on the use of the weights. New in version 0.6.0

weight_func_ dict

Dictionary with the weight_func for each series. It is created cloning the objects in weight_func and is used internally to avoid overwriting. New in version 0.6.0

source_code_weight_func str, dict

Source code of the custom function(s) used to create weights. New in version 0.6.0

series_weights dict, default `None`

Weights associated with each series {'series_column_name': float}. It is only applied if the regressor used accepts sample_weight in its fit method. If series_weights is provided, a weight of 1 is given to all series not present in series_weights. If None, all levels have the same weight. See Notes section for more details on the use of the weights. New in version 0.6.0

series_weights_ dict

Weights associated with each series.It is created as a clone of series_weights and is used internally to avoid overwriting. New in version 0.6.0

max_lag int

Maximum value of lag included in lags.

window_size int

Size of the window needed to create the predictors. It is equal to max_lag.

last_window pandas Series

Last window seen by the forecaster during training. It stores the values needed to predict the next step immediately after the training data.

index_type type

Type of index of the input used in training.

index_freq str

Frequency of Index of the input used in training.

index_values pandas Index

Values of Index of the input used in training.

training_range pandas Index

First and last values of index of the data used during training.

included_exog bool

If the forecaster has been trained using exogenous variable/s.

exog_type type

Type of exogenous variable/s used in training.

exog_dtypes dict

Type of each exogenous variable/s used in training. If transformer_exog is used, the dtypes are calculated after the transformation.

exog_col_names list

Names of columns of exog if exog used in training was a pandas DataFrame.

series_col_names list

Names of the series (levels) used during training.

X_train_col_names list

Names of columns of the matrix created internally for training.

fit_kwargs dict

Additional arguments to be passed to the fit method of the regressor. New in version 0.8.0

in_sample_residuals dict

Residuals of the model when predicting training data. Only stored up to 1000 values in the form {level: residuals}. If transformer_series is not None, residuals are stored in the transformed scale.

out_sample_residuals dict

Residuals of the models when predicting non training data. Only stored up to 1000 values in the form {level: residuals}. If transformer_series is not None, residuals are assumed to be in the transformed scale. Use set_out_sample_residuals() method to set values.

fitted bool

Tag to identify if the regressor has been fitted (trained).

creation_date str

Date of creation.

fit_date str

Date of last fit.

skforcast_version str

Version of skforecast library used to create the forecaster.

python_version str

Version of python used to create the forecaster.

forecaster_id str, int default `None`

Name used as an identifier of the forecaster. New in version 0.7.0

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
class ForecasterAutoregMultiSeries(ForecasterBase):
    """
    This class turns any regressor compatible with the scikit-learn API into a
    recursive autoregressive (multi-step) forecaster for multiple series.

    Parameters
    ----------
    regressor : regressor or pipeline compatible with the scikit-learn API
        An instance of a regressor or pipeline compatible with the scikit-learn API.

    lags : int, list, numpy ndarray, range
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
            `int`: include lags from 1 to `lags` (included).
            `list`, `numpy ndarray` or `range`: include only lags present in `lags`,
            all elements must be int.

    transformer_series : transformer (preprocessor) or dict of transformers, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        If a single transformer is passed, it is cloned and applied to all series. If a
        dict, a different transformer can be used for each series. Transformation is
        applied to each `series` before training the forecaster.
        ColumnTransformers are not allowed since they do not have inverse_transform method.

    transformer_exog : transformer, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.

    weight_func : Callable, dict, default `None`
        Function that defines the individual weights for each sample based on the
        index. For example, a function that assigns a lower weight to certain dates.
        If dict {'series_column_name' : Callable} a different function can be
        used for each series, a weight of 1 is given to all series not present
        in `weight_func`. Ignored if `regressor` does not have the argument 
        `sample_weight` in its `fit` method. See Notes section for more details 
        on the use of the weights.
        **New in version 0.6.0**

    series_weights : dict, default `None`
        Weights associated with each series {'series_column_name' : float}. It is only
        applied if the `regressor` used accepts `sample_weight` in its `fit` method.
        If `series_weights` is provided, a weight of 1 is given to all series not present
        in `series_weights`. If `None`, all levels have the same weight. See Notes section
        for more details on the use of the weights.
        **New in version 0.6.0**

    fit_kwargs : dict, default `None`
        Additional arguments to be passed to the `fit` method of the regressor.
        **New in version 0.8.0**

    forecaster_id : str, int, default `None`
        Name used as an identifier of the forecaster.
        **New in version 0.7.0**


    Attributes
    ----------
    regressor : regressor or pipeline compatible with the scikit-learn API
        An instance of a regressor or pipeline compatible with the scikit-learn API.

    lags : numpy ndarray
        Lags used as predictors.

    transformer_series : transformer (preprocessor) or dict of transformers, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        If a single transformer is passed, it is cloned and applied to all series. If a
        dict, a different transformer can be used for each series. Transformation is
        applied to each `series` before training the forecaster.
        ColumnTransformers are not allowed since they do not have inverse_transform method.

    transformer_series_ : dict
        Dictionary with the transformer for each series. It is created cloning the objects
        in `transformer_series` and is used internally to avoid overwriting.

    transformer_exog : transformer (preprocessor), default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.

    weight_func : Callable, dict, default `None`
        Function that defines the individual weights of each sample based on the
        index. For example, a function that assigns a lower weight to certain dates.
        If dict {'series_column_name': Callable} a different function can be
        used for each series, a weight of 1 is given to all series not present
        in `weight_func`. Ignored if `regressor` does not have the argument 
        `sample_weight` in its `fit` method. See Notes section for more details 
        on the use of the weights.
        **New in version 0.6.0**

    weight_func_ : dict
        Dictionary with the `weight_func` for each series. It is created cloning the objects
        in `weight_func` and is used internally to avoid overwriting.
        **New in version 0.6.0**

    source_code_weight_func : str, dict
        Source code of the custom function(s) used to create weights.
        **New in version 0.6.0**

    series_weights : dict, default `None`
        Weights associated with each series {'series_column_name': float}. It is only
        applied if the `regressor` used accepts `sample_weight` in its `fit` method.
        If `series_weights` is provided, a weight of 1 is given to all series not present
        in `series_weights`. If `None`, all levels have the same weight. See Notes section
        for more details on the use of the weights.
        **New in version 0.6.0**

    series_weights_ : dict
        Weights associated with each series.It is created as a clone of `series_weights`
        and is used internally to avoid overwriting.
        **New in version 0.6.0**

    max_lag : int
        Maximum value of lag included in `lags`.

    window_size : int
        Size of the window needed to create the predictors. It is equal to
        `max_lag`.

    last_window : pandas Series
        Last window seen by the forecaster during training. It stores the values 
        needed to predict the next `step` immediately after the training data.

    index_type : type
        Type of index of the input used in training.

    index_freq : str
        Frequency of Index of the input used in training.

    index_values : pandas Index
        Values of Index of the input used in training.

    training_range: pandas Index
        First and last values of index of the data used during training.

    included_exog : bool
        If the forecaster has been trained using exogenous variable/s.

    exog_type : type
        Type of exogenous variable/s used in training.

    exog_dtypes : dict
        Type of each exogenous variable/s used in training. If `transformer_exog` 
        is used, the dtypes are calculated after the transformation.

    exog_col_names : list
        Names of columns of `exog` if `exog` used in training was a pandas
        DataFrame.

    series_col_names : list
        Names of the series (levels) used during training.

    X_train_col_names : list
        Names of columns of the matrix created internally for training.

    fit_kwargs : dict
        Additional arguments to be passed to the `fit` method of the regressor.
        **New in version 0.8.0**

    in_sample_residuals : dict
        Residuals of the model when predicting training data. Only stored up to
        1000 values in the form `{level: residuals}`. If `transformer_series` 
        is not `None`, residuals are stored in the transformed scale.

    out_sample_residuals : dict
        Residuals of the models when predicting non training data. Only stored
        up to 1000 values in the form `{level: residuals}`. If `transformer_series` 
        is not `None`, residuals are assumed to be in the transformed scale. Use 
        `set_out_sample_residuals()` method to set values.

    fitted : bool
        Tag to identify if the regressor has been fitted (trained).

    creation_date : str
        Date of creation.

    fit_date : str
        Date of last fit.

    skforcast_version : str
        Version of skforecast library used to create the forecaster.

    python_version : str
        Version of python used to create the forecaster.

    forecaster_id : str, int default `None`
        Name used as an identifier of the forecaster.
        **New in version 0.7.0**


    Notes
    -----

    The weights are used to control the influence that each observation has on the
    training of the model. `ForecasterAutoregMultiseries` accepts two types of weights:

    + series_weights : controls the relative importance of each series. If a series has
    twice as much weight as the others, the observations of that series influence the
    training twice as much. The higher the weight of a series relative to the others,
    the more the model will focus on trying to learn that series.

    + weight_func : controls the relative importance of each observation according to its
    index value. For example, a function that assigns a lower weight to certain dates.

    If the two types of weights are indicated, they are multiplied to create the final
    weights. The resulting `sample_weight` cannot have negative values.

    """

    def __init__(
        self,
        regressor: object,
        lags: Union[int, np.ndarray, list],
        transformer_series: Optional[Union[object, dict]]=None,
        transformer_exog: Optional[object]=None,
        weight_func: Optional[Union[Callable, dict]]=None,
        series_weights: Optional[dict]=None,
        fit_kwargs: Optional[dict]=None,
        forecaster_id: Optional[Union[str, int]]=None
    ) -> None:

        self.regressor               = regressor
        self.transformer_series      = transformer_series
        self.transformer_series_     = None
        self.transformer_exog        = transformer_exog
        self.weight_func             = weight_func
        self.weight_func_            = None
        self.source_code_weight_func = None
        self.series_weights          = series_weights
        self.series_weights_         = None
        self.index_type              = None
        self.index_freq              = None
        self.index_values            = None
        self.training_range          = None
        self.last_window             = None
        self.included_exog           = False
        self.exog_type               = None
        self.exog_dtypes             = None
        self.exog_col_names          = None
        self.series_col_names        = None
        self.X_train_col_names       = None
        self.in_sample_residuals     = None
        self.out_sample_residuals    = None
        self.fitted                  = False
        self.creation_date           = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.fit_date                = None
        self.skforcast_version       = skforecast.__version__
        self.python_version          = sys.version.split(" ")[0]
        self.forecaster_id           = forecaster_id

        self.lags = initialize_lags(type(self).__name__, lags)
        self.max_lag = max(self.lags)
        self.window_size = self.max_lag

        self.weight_func, self.source_code_weight_func, self.series_weights = initialize_weights(
            forecaster_name = type(self).__name__, 
            regressor       = regressor, 
            weight_func     = weight_func, 
            series_weights  = series_weights
        )

        self.fit_kwargs = check_select_fit_kwargs(
                              regressor  = regressor,
                              fit_kwargs = fit_kwargs
                          )


    def __repr__(
        self
    ) -> str:
        """
        Information displayed when a ForecasterAutoregMultiSeries object is printed.
        """

        if isinstance(self.regressor, sklearn.pipeline.Pipeline):
            name_pipe_steps = tuple(name + "__" for name in self.regressor.named_steps.keys())
            params = {key : value for key, value in self.regressor.get_params().items() \
                      if key.startswith(name_pipe_steps)}
        else:
            params = self.regressor.get_params()

        info = (
            f"{'=' * len(type(self).__name__)} \n"
            f"{type(self).__name__} \n"
            f"{'=' * len(type(self).__name__)} \n"
            f"Regressor: {self.regressor} \n"
            f"Lags: {self.lags} \n"
            f"Transformer for series: {self.transformer_series} \n"
            f"Transformer for exog: {self.transformer_exog} \n"
            f"Window size: {self.window_size} \n"
            f"Series levels (names): {self.series_col_names} \n"
            f"Series weights: {self.series_weights} \n"
            f"Weight function included: {True if self.weight_func is not None else False} \n"
            f"Exogenous included: {self.included_exog} \n"
            f"Type of exogenous variable: {self.exog_type} \n"
            f"Exogenous variables names: {self.exog_col_names} \n"
            f"Training range: {self.training_range.to_list() if self.fitted else None} \n"
            f"Training index type: {str(self.index_type).split('.')[-1][:-2] if self.fitted else None} \n"
            f"Training index frequency: {self.index_freq if self.fitted else None} \n"
            f"Regressor parameters: {params} \n"
            f"fit_kwargs: {self.fit_kwargs} \n"
            f"Creation date: {self.creation_date} \n"
            f"Last fit date: {self.fit_date} \n"
            f"Skforecast version: {self.skforcast_version} \n"
            f"Python version: {self.python_version} \n"
            f"Forecaster id: {self.forecaster_id} \n"
        )

        return info


    def _create_lags(
        self, 
        y: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """       
        Transforms a 1d array into a 2d array (X) and a 1d array (y). Each row
        in X is associated with a value of y and it represents the lags that
        precede it.

        Notice that, the returned matrix X_data, contains the lag 1 in the first
        column, the lag 2 in the second column and so on.

        Parameters
        ----------        
        y : 1d numpy ndarray
            Training time series.

        Returns 
        -------
        X_data : 2d numpy ndarray, shape (samples - max(self.lags), len(self.lags))
            2d numpy array with the lagged values (predictors).

        y_data : 1d numpy ndarray, shape (samples - max(self.lags),)
            Values of the time series related to each row of `X_data`.

        """

        n_splits = len(y) - self.max_lag
        if n_splits <= 0:
            raise ValueError(
                f"The maximum lag ({self.max_lag}) must be less than the length "
                f"of the series ({len(y)})."
            )

        X_data = np.full(shape=(n_splits, len(self.lags)), fill_value=np.nan, dtype=float)

        for i, lag in enumerate(self.lags):
            X_data[:, i] = y[self.max_lag - lag: -lag]

        y_data = y[self.max_lag:]

        return X_data, y_data


    def create_train_X_y(
        self,
        series: pd.DataFrame,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> Tuple[pd.DataFrame, pd.Series, pd.Index, pd.Index]:
        """
        Create training matrices from multiple time series and exogenous
        variables.

        Parameters
        ----------        
        series : pandas DataFrame
            Training time series.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s. Must have the same
            number of observations as `series` and their indexes must be aligned.

        Returns 
        -------
        X_train : pandas DataFrame
            Pandas DataFrame with the training values (predictors).

        y_train : pandas Series, shape (len(series) - self.max_lag, )
            Values (target) of the time series related to each row of `X_train`.

        y_index : pandas Index
            Index of `series`.

        y_train_index: pandas Index
            Index of `y_train`.

        """

        if not isinstance(series, pd.DataFrame):
            raise TypeError(f"`series` must be a pandas DataFrame. Got {type(series)}.")

        series_col_names = list(series.columns)

        if self.transformer_series is None:
            self.transformer_series_ = {serie: None for serie in series_col_names}
        elif not isinstance(self.transformer_series, dict):
            self.transformer_series_ = {serie: clone(self.transformer_series) 
                                        for serie in series_col_names}
        else:
            self.transformer_series_ = {serie: None for serie in series_col_names}
            # Only elements already present in transformer_series_ are updated
            self.transformer_series_.update(
                (k, v) for k, v in deepcopy(self.transformer_series).items() if k in self.transformer_series_
            )
            series_not_in_transformer_series = set(series.columns) - set(self.transformer_series.keys())
            if series_not_in_transformer_series:
                warnings.warn(
                    (f"{series_not_in_transformer_series} not present in `transformer_series`."
                     f" No transformation is applied to these series."),
                     IgnoredArgumentWarning
                )

        if exog is not None:
            if len(exog) != len(series):
                raise ValueError(
                    (f"`exog` must have same number of samples as `series`. "
                     f"length `exog`: ({len(exog)}), length `series`: ({len(series)})")
                )
            check_exog(exog=exog, allow_nan=True)
            if isinstance(exog, pd.Series):
                exog = transform_series(
                           series            = exog,
                           transformer       = self.transformer_exog,
                           fit               = True,
                           inverse_transform = False
                       )
            else:
                exog = transform_dataframe(
                           df                = exog,
                           transformer       = self.transformer_exog,
                           fit               = True,
                           inverse_transform = False
                       )

            check_exog(exog=exog, allow_nan=False)
            check_exog_dtypes(exog)
            self.exog_dtypes = get_exog_dtypes(exog=exog)

            _, exog_index = preprocess_exog(exog=exog, return_values=False)
            if not (exog_index[:len(series.index)] == series.index).all():
                raise ValueError(
                    ("Different index for `series` and `exog`. They must be equal "
                     "to ensure the correct alignment of values.")
                )

        X_levels = []
        X_train_col_names = [f"lag_{lag}" for lag in self.lags]

        for i, serie in enumerate(series.columns):

            y = series[serie]
            check_y(y=y)
            y = transform_series(
                    series            = y,
                    transformer       = self.transformer_series_[serie],
                    fit               = True,
                    inverse_transform = False
                )

            y_values, y_index = preprocess_y(y=y)
            X_train_values, y_train_values = self._create_lags(y=y_values)

            if i == 0:
                X_train = X_train_values
                y_train = y_train_values
            else:
                X_train = np.vstack((X_train, X_train_values))
                y_train = np.append(y_train, y_train_values)

            X_level = [serie]*len(X_train_values)
            X_levels.extend(X_level)

        X_levels = pd.Series(X_levels)
        X_levels = pd.get_dummies(X_levels, dtype=float)

        X_train = pd.DataFrame(
                      data    = X_train,
                      columns = X_train_col_names
                  )

        if exog is not None:
            # The first `self.max_lag` positions have to be removed from exog
            # since they are not in X_train. Then exog is cloned as many times
            # as series.
            exog_to_train = exog.iloc[self.max_lag:, ]
            exog_to_train = pd.concat([exog_to_train]*len(series_col_names)).reset_index(drop=True)
        else:
            exog_to_train = None

        X_train = pd.concat([X_train, exog_to_train, X_levels], axis=1)
        self.X_train_col_names = X_train.columns.to_list()

        y_train = pd.Series(
                      data = y_train,
                      name = 'y'
                  )

        y_train_index = pd.Index(
                            np.tile(
                                y_index[self.max_lag: ].to_numpy(),
                                reps = len(series_col_names)
                            )
                        )

        return X_train, y_train, y_index, y_train_index


    def create_sample_weights(
        self,
        series: pd.DataFrame,
        X_train: pd.DataFrame,
        y_train_index: pd.Index,
    )-> np.ndarray:
        """
        Crate weights for each observation according to the forecaster's attributes
        `series_weights` and `weight_func`. The resulting weights are product of both
        types of weights.

        Parameters
        ----------
        series : pandas DataFrame
            Time series used to create `X_train` with the method `create_train_X_y`.
        X_train : pandas DataFrame
            Dataframe generated with the method `create_train_X_y`, first return.
        y_train_index : pandas Index
            Index of `y_train` generated with the method `create_train_X_y`, fourth return.

        Returns
        -------
        weights : numpy ndarray
            Weights to use in `fit` method.

        """

        weights = None
        weights_samples = None
        weights_series = None

        if self.series_weights is not None:
            # Series not present in series_weights have a weight of 1 in all their samples.
            # Keys in series_weights not present in series are ignored.
            series_not_in_series_weights = set(series.columns) - set(self.series_weights.keys())
            if series_not_in_series_weights:
                warnings.warn(
                    (f"{series_not_in_series_weights} not present in `series_weights`. "
                     f"A weight of 1 is given to all their samples."),
                     IgnoredArgumentWarning
                )
            self.series_weights_ = {col: 1. for col in series.columns}
            self.series_weights_.update((k, v) for k, v in self.series_weights.items() if k in self.series_weights_)
            weights_series = [np.repeat(self.series_weights_[serie], sum(X_train[serie])) 
                              for serie in series.columns]
            weights_series = np.concatenate(weights_series)

        if self.weight_func is not None:
            if isinstance(self.weight_func, Callable):
                self.weight_func_ = {col: copy(self.weight_func) for col in series.columns}
            else:
                # Series not present in weight_func have a weight of 1 in all their samples
                series_not_in_weight_func = set(series.columns) - set(self.weight_func.keys())
                if series_not_in_weight_func:
                    warnings.warn(
                        (f"{series_not_in_weight_func} not present in `weight_func`. "
                         f"A weight of 1 is given to all their samples."),
                         IgnoredArgumentWarning
                    )
                self.weight_func_ = {col: lambda x: np.ones_like(x, dtype=float) for col in series.columns}
                self.weight_func_.update((k, v) for k, v in self.weight_func.items() if k in self.weight_func_)

            weights_samples = []
            for key in self.weight_func_.keys():
                idx = y_train_index[X_train[X_train[key] == 1.0].index]
                weights_samples.append(self.weight_func_[key](idx))
            weights_samples = np.concatenate(weights_samples)

        if weights_series is not None:
            weights = weights_series
            if weights_samples is not None:
                weights = weights * weights_samples
        else:
            if weights_samples is not None:
                weights = weights_samples

        if weights is not None:
            if np.isnan(weights).any():
                raise ValueError(
                    "The resulting `weights` cannot have NaN values."
                )
            if np.any(weights < 0):
                raise ValueError(
                    "The resulting `weights` cannot have negative values."
                )
            if np.sum(weights) == 0:
                raise ValueError(
                    ("The resulting `weights` cannot be normalized because "
                     "the sum of the weights is zero.")
                )

        return weights


    def fit(
        self,
        series: pd.DataFrame,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        store_in_sample_residuals: bool=True
    ) -> None:
        """
        Training Forecaster.

        Parameters
        ----------        
        series : pandas DataFrame
            Training time series.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s. Must have the same
            number of observations as `series` and their indexes must be aligned so
            that series[i] is regressed on exog[i].

        store_in_sample_residuals : bool, default `True`
            if True, in_sample_residuals are stored.

        Returns 
        -------
        None

        """

        # Reset values in case the forecaster has already been fitted.
        self.index_type          = None
        self.index_freq          = None
        self.index_values        = None
        self.last_window         = None
        self.included_exog       = False
        self.exog_type           = None
        self.exog_dtypes         = None
        self.exog_col_names      = None
        self.series_col_names    = None
        self.X_train_col_names   = None
        self.in_sample_residuals = None
        self.fitted              = False
        self.training_range      = None

        self.series_col_names = list(series.columns)        

        if exog is not None:
            self.included_exog = True
            self.exog_type = type(exog)
            self.exog_col_names = \
                 exog.columns.to_list() if isinstance(exog, pd.DataFrame) else [exog.name]

            if len(set(self.exog_col_names) - set(self.series_col_names)) != len(self.exog_col_names):
                raise ValueError(
                    (f"`exog` cannot contain a column named the same as one of the series"
                     f" (column names of series).\n"
                     f"    `series` columns : {self.series_col_names}.\n"
                     f"    `exog`   columns : {self.exog_col_names}.")
                )

        X_train, y_train, y_index, y_train_index = self.create_train_X_y(series=series, exog=exog)
        sample_weight = self.create_sample_weights(
                            series        = series,
                            X_train       = X_train,
                            y_train_index = y_train_index,
                        )

        if sample_weight is not None:
            self.regressor.fit(
                X             = X_train,
                y             = y_train,
                sample_weight = sample_weight,
                **self.fit_kwargs
            )
        else:
            self.regressor.fit(X=X_train, y=y_train, **self.fit_kwargs)

        self.fitted = True
        self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.training_range = y_index[[0, -1]]
        self.index_type = type(y_index)
        if isinstance(y_index, pd.DatetimeIndex):
            self.index_freq = y_index.freqstr
        else: 
            self.index_freq = y_index.step
        self.index_values = y_index

        in_sample_residuals = {}

        # This is done to save time during fit in functions such as backtesting()
        if store_in_sample_residuals:

            residuals = y_train - self.regressor.predict(X_train)

            for serie in series.columns:
                in_sample_residuals[serie] = residuals.loc[X_train[serie] == 1.].to_numpy()
                if len(in_sample_residuals[serie]) > 1000:
                    # Only up to 1000 residuals are stored
                    rng = np.random.default_rng(seed=123)
                    in_sample_residuals[serie] = rng.choice(
                                                     a       = in_sample_residuals[serie], 
                                                     size    = 1000, 
                                                     replace = False
                                                 )
        else:
            for serie in series.columns:
                in_sample_residuals[serie] = None

        self.in_sample_residuals = in_sample_residuals

        # The last time window of training data is stored so that lags needed as
        # predictors in the first iteration of `predict()` can be calculated.
        self.last_window = series.iloc[-self.max_lag:, ].copy()


    def _recursive_predict(
        self,
        steps: int,
        level: str,
        last_window: np.ndarray,
        exog: Optional[np.ndarray]=None
    ) -> np.ndarray:
        """
        Predict n steps ahead. It is an iterative process in which, each prediction,
        is used as a predictor for the next step.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.

        level : str
            Time series to be predicted.

        last_window : numpy ndarray
            Series values used to create the predictors (lags) needed in the 
            first iteration of the prediction (t + 1).

        exog : numpy ndarray, default `None`
            Exogenous variable/s included as predictor/s.

        Returns 
        -------
        predictions : numpy ndarray
            Predicted values.

        """

        predictions = np.full(shape=steps, fill_value=np.nan)

        for i in range(steps):
            X = last_window[-self.lags].reshape(1, -1)
            if exog is not None:
                X = np.column_stack((X, exog[i, ].reshape(1, -1)))

            levels_dummies = np.zeros(shape=(1, len(self.series_col_names)), dtype=float)
            levels_dummies[0][self.series_col_names.index(level)] = 1.

            X = np.column_stack((X, levels_dummies.reshape(1, -1)))

            with warnings.catch_warnings():
                # Suppress scikit-learn warning: "X does not have valid feature names,
                # but NoOpTransformer was fitted with feature names".
                warnings.simplefilter("ignore")
                prediction = self.regressor.predict(X)
                predictions[i] = prediction.ravel()[0]

            # Update `last_window` values. The first position is discarded and 
            # the new prediction is added at the end.
            last_window = np.append(last_window[1:], prediction)

        return predictions


    def predict(
        self,
        steps: int,
        levels: Optional[Union[str, list]]=None,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> pd.DataFrame:
        """
        Predict n steps ahead. It is an recursive process in which, each prediction,
        is used as a predictor for the next step.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.

        levels : str, list, default `None`
            Time series to be predicted. If `None` all levels will be predicted.
            **New in version 0.6.0**

        last_window : pandas DataFrame, default `None`
            Series values used to create the predictors (lags) needed in the 
            first iteration of the prediction (t + 1).

            If `last_window = None`, the values stored in `self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        Returns
        -------
        predictions : pandas DataFrame
            Predicted values, one column for each level.

        """

        if levels is None:
            levels = self.series_col_names
        elif isinstance(levels, str):
            levels = [levels]

        if last_window is None:
            last_window = deepcopy(self.last_window)

        check_predict_input(
            forecaster_name  = type(self).__name__,
            steps            = steps,
            fitted           = self.fitted,
            included_exog    = self.included_exog,
            index_type       = self.index_type,
            index_freq       = self.index_freq,
            window_size      = self.window_size,
            last_window      = last_window,
            last_window_exog = None,
            exog             = exog,
            exog_type        = self.exog_type,
            exog_col_names   = self.exog_col_names,
            interval         = None,
            alpha            = None,
            max_steps        = None,
            levels           = levels,
            series_col_names = self.series_col_names
        )

        if exog is not None:
            if isinstance(exog, pd.DataFrame):
                exog = transform_dataframe(
                           df                = exog,
                           transformer       = self.transformer_exog,
                           fit               = False,
                           inverse_transform = False
                       )
            else:
                exog = transform_series(
                           series            = exog,
                           transformer       = self.transformer_exog,
                           fit               = False,
                           inverse_transform = False
                       )
            check_exog_dtypes(exog=exog)
            exog_values = exog.iloc[:steps, ].to_numpy()
        else:
            exog_values = None

        predictions = []

        for level in levels:

            last_window_level = transform_series(
                                    series            = last_window[level],
                                    transformer       = self.transformer_series_[level],
                                    fit               = False,
                                    inverse_transform = False
                                )
            last_window_values, last_window_index = preprocess_last_window(
                                                        last_window = last_window_level
                                                    )

            preds_level = self._recursive_predict(
                              steps       = steps,
                              level       = level,
                              last_window = copy(last_window_values),
                              exog        = copy(exog_values)
                          )

            preds_level = pd.Series(
                              data  = preds_level,
                              index = expand_index(
                                          index = last_window_index,
                                          steps = steps
                                      ),
                              name = level
                          )

            preds_level = transform_series(
                              series            = preds_level,
                              transformer       = self.transformer_series_[level],
                              fit               = False,
                              inverse_transform = True
                          )

            predictions.append(preds_level)    

        predictions = pd.concat(predictions, axis=1)

        return predictions


    def predict_bootstrapping(
        self,
        steps: int,
        levels: Optional[Union[str, list]]=None,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        n_boot: int=500,
        random_state: int=123,
        in_sample_residuals: bool=True
    ) -> dict:
        """
        Generate multiple forecasting predictions using a bootstrapping process. 
        By sampling from a collection of past observed errors (the residuals),
        each iteration of bootstrapping generates a different set of predictions. 
        See the Notes section for more information. 

        Parameters
        ----------   
        steps : int
            Number of future steps predicted.

        levels : str, list, default `None`
            Time series to be predicted. If `None` all levels will be predicted.

        last_window : pandas DataFrame, default `None`
            Series values used to create the predictors (lags) needed in the 
            first iteration of the prediction (t + 1).

            If `last_window = None`, the values stored in `self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        n_boot : int, default `500`
            Number of bootstrapping iterations used to estimate prediction
            intervals.

        random_state : int, default `123`
            Sets a seed to the random generator, so that boot intervals are always 
            deterministic.

        in_sample_residuals : bool, default `True`
            If `True`, residuals from the training data are used as proxy of
            prediction error to create prediction intervals. If `False`, out of
            sample residuals are used. In the latter case, the user should have
            calculated and stored the residuals within the forecaster (see
            `set_out_sample_residuals()`).

        Returns 
        -------
        boot_predictions : dict
            Predictions generated by bootstrapping for each level. 
            {level: pandas DataFrame, shape (steps, n_boot)}

        Notes
        -----
        More information about prediction intervals in forecasting:
        https://otexts.com/fpp3/prediction-intervals.html#prediction-intervals-from-bootstrapped-residuals
        Forecasting: Principles and Practice (3nd ed) Rob J Hyndman and George Athanasopoulos.

        """

        if levels is None:
            levels = self.series_col_names
        elif isinstance(levels, str):
            levels = [levels]

        if in_sample_residuals:
            if not set(levels).issubset(set(self.in_sample_residuals.keys())):
                raise ValueError(
                    (f"Not `forecaster.in_sample_residuals` for levels: "
                     f"{set(levels) - set(self.in_sample_residuals.keys())}.")
                )
            residuals_levels = self.in_sample_residuals
        else:
            if self.out_sample_residuals is None:
                raise ValueError(
                    ("`forecaster.out_sample_residuals` is `None`. Use "
                     "`in_sample_residuals=True` or method `set_out_sample_residuals()` "
                     "before `predict_interval()`, `predict_bootstrapping()` or "
                     "`predict_dist()`.")
                )
            else:
                if not set(levels).issubset(set(self.out_sample_residuals.keys())):
                    raise ValueError(
                        (f"Not `forecaster.out_sample_residuals` for levels: "
                         f"{set(levels) - set(self.out_sample_residuals.keys())}. "
                         f"Use method `set_out_sample_residuals()`.")
                    )
            residuals_levels = self.out_sample_residuals

        check_residuals = 'forecaster.in_sample_residuals' if in_sample_residuals else 'forecaster.out_sample_residuals'
        for level in levels:
            if residuals_levels[level] is None:
                raise ValueError(
                    (f"forecaster residuals for level '{level}' are `None`. Check `{check_residuals}`.")
                )
            elif (residuals_levels[level] == None).any():
                raise ValueError(
                    (f"forecaster residuals for level '{level}' contains `None` values. Check `{check_residuals}`.")
                )

        if last_window is None:
            last_window = deepcopy(self.last_window)

        check_predict_input(
            forecaster_name  = type(self).__name__,
            steps            = steps,
            fitted           = self.fitted,
            included_exog    = self.included_exog,
            index_type       = self.index_type,
            index_freq       = self.index_freq,
            window_size      = self.window_size,
            last_window      = last_window,
            last_window_exog = None,
            exog             = exog,
            exog_type        = self.exog_type,
            exog_col_names   = self.exog_col_names,
            interval         = None,
            alpha            = None,
            max_steps        = None,
            levels           = levels,
            series_col_names = self.series_col_names
        )

        if exog is not None:
            if isinstance(exog, pd.DataFrame):
                exog = transform_dataframe(
                           df                = exog,
                           transformer       = self.transformer_exog,
                           fit               = False,
                           inverse_transform = False
                       )
            else:
                exog = transform_series(
                           series            = exog,
                           transformer       = self.transformer_exog,
                           fit               = False,
                           inverse_transform = False
                       )
            exog_values = exog.iloc[:steps, ].to_numpy()
        else:
            exog_values = None

        boot_predictions = {}

        for level in levels:

            last_window_level = transform_series(
                                    series            = last_window[level],
                                    transformer       = self.transformer_series_[level],
                                    fit               = False,
                                    inverse_transform = False
                                )
            last_window_values, last_window_index = preprocess_last_window(
                                                        last_window = last_window_level
                                                    )

            level_boot_predictions = np.full(
                                         shape      = (steps, n_boot),
                                         fill_value = np.nan,
                                         dtype      = float
                                     )
            rng = np.random.default_rng(seed=random_state)
            seeds = rng.integers(low=0, high=10000, size=n_boot)

            residuals = residuals_levels[level]

            for i in range(n_boot):
                # In each bootstraping iteration the initial last_window and exog 
                # need to be restored.
                last_window_boot = last_window_values.copy()
                exog_boot = exog_values.copy() if exog is not None else None

                rng = np.random.default_rng(seed=seeds[i])
                sample_residuals = rng.choice(
                                       a       = residuals,
                                       size    = steps,
                                       replace = True
                                   )

                for step in range(steps):

                    prediction = self._recursive_predict(
                                     steps       = 1,
                                     level       = level,
                                     last_window = last_window_boot,
                                     exog        = exog_boot 
                                 )

                    prediction_with_residual = prediction + sample_residuals[step]
                    level_boot_predictions[step, i] = prediction_with_residual

                    last_window_boot = np.append(
                                           last_window_boot[1:],
                                           prediction_with_residual
                                       )
                    if exog is not None:
                        exog_boot = exog_boot[1:]

            level_boot_predictions = pd.DataFrame(
                                         data    = level_boot_predictions,
                                         index   = expand_index(last_window_index, steps=steps),
                                         columns = [f"pred_boot_{i}" for i in range(n_boot)]
                                     )

            if self.transformer_series_[level]:
                for col in level_boot_predictions.columns:
                    level_boot_predictions[col] = transform_series(
                                                      series            = level_boot_predictions[col],
                                                      transformer       = self.transformer_series_[level],
                                                      fit               = False,
                                                      inverse_transform = True
                                                  )

            boot_predictions[level] = level_boot_predictions

        return boot_predictions


    def predict_interval(
        self,
        steps: int,
        levels: Optional[Union[str, list]]=None,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        interval: list=[5, 95],
        n_boot: int=500,
        random_state: int=123,
        in_sample_residuals: bool=True
    ) -> pd.DataFrame:
        """
        Iterative process in which, each prediction, is used as a predictor
        for the next step and bootstrapping is used to estimate prediction
        intervals. Both predictions and intervals are returned.

        Parameters
        ---------- 
        steps : int
            Number of future steps predicted.

        levels : str, list, default `None`
            Time series to be predicted. If `None` all levels will be predicted.  
            **New in version 0.6.0**  

        last_window : pandas DataFrame, default `None`
            Series values used to create the predictors (lags) needed in the 
            first iteration of the prediction (t + 1).

            If `last_window = None`, the values stored in` self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        interval : list, default `[5, 95]`
            Confidence of the prediction interval estimated. Sequence of 
            percentiles to compute, which must be between 0 and 100 inclusive. 
            For example, interval of 95% should be as `interval = [2.5, 97.5]`.

        n_boot : int, default `500`
            Number of bootstrapping iterations used to estimate prediction
            intervals.

        random_state : int, default `123`
            Sets a seed to the random generator, so that boot intervals are always 
            deterministic.

        in_sample_residuals : bool, default `True`
            If `True`, residuals from the training data are used as proxy of
            prediction error to create prediction intervals. If `False`, out of
            sample residuals are used. In the latter case, the user should have
            calculated and stored the residuals within the forecaster (see
            `set_out_sample_residuals()`).

        Returns 
        -------
        predictions : pandas DataFrame
            Values predicted by the forecaster and their estimated interval.
                level: predictions.
                level_lower_bound: lower bound of the interval.
                level_upper_bound: upper bound interval of the interval.

        Notes
        -----
        More information about prediction intervals in forecasting:
        https://otexts.com/fpp2/prediction-intervals.html
        Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and
        George Athanasopoulos.

        """

        if levels is None:
            levels = self.series_col_names
        elif isinstance(levels, str):
            levels = [levels]

        check_interval(interval=interval)

        preds = self.predict(
                    steps       = steps,
                    levels      = levels,
                    last_window = last_window,
                    exog        = exog
                )

        boot_predictions = self.predict_bootstrapping(
                               steps               = steps,
                               levels              = levels,
                               last_window         = last_window,
                               exog                = exog,
                               n_boot              = n_boot,
                               random_state        = random_state,
                               in_sample_residuals = in_sample_residuals
                           )

        interval = np.array(interval)/100
        predictions = []

        for level in levels:
            preds_interval = boot_predictions[level].quantile(q=interval, axis=1).transpose()
            preds_interval.columns = [f'{level}_lower_bound', f'{level}_upper_bound']
            predictions.append(preds[level])
            predictions.append(preds_interval)

        predictions = pd.concat(predictions, axis=1)

        return predictions


    def predict_dist(
        self,
        steps: int,
        distribution: object,
        levels: Optional[Union[str, list]]=None,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        n_boot: int=500,
        random_state: int=123,
        in_sample_residuals: bool=True
    ) -> pd.DataFrame:
        """
        Fit a given probability distribution for each step. After generating 
        multiple forecasting predictions through a bootstrapping process, each 
        step is fitted to the given distribution.

        Parameters
        ---------- 
        steps : int
            Number of future steps predicted.

        distribution : Object
            A distribution object from scipy.stats.

        levels : str, list, default `None`
            Time series to be predicted. If `None` all levels will be predicted.  
            **New in version 0.6.0**  

        last_window : pandas DataFrame, default `None`
            Series values used to create the predictors (lags) needed in the 
            first iteration of the prediction (t + 1).

            If `last_window = None`, the values stored in` self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        n_boot : int, default `500`
            Number of bootstrapping iterations used to estimate prediction
            intervals.

        random_state : int, default `123`
            Sets a seed to the random generator, so that boot intervals are always 
            deterministic.

        in_sample_residuals : bool, default `True`
            If `True`, residuals from the training data are used as proxy of
            prediction error to create prediction intervals. If `False`, out of
            sample residuals are used. In the latter case, the user should have
            calculated and stored the residuals within the forecaster (see
            `set_out_sample_residuals()`).

        Returns 
        -------
        predictions : pandas DataFrame
            Distribution parameters estimated for each step and level.

        """

        if levels is None:
            levels = self.series_col_names
        elif isinstance(levels, str):
            levels = [levels]

        boot_samples = self.predict_bootstrapping(
                           steps               = steps,
                           levels              = levels,
                           last_window         = last_window,
                           exog                = exog,
                           n_boot              = n_boot,
                           random_state        = random_state,
                           in_sample_residuals = in_sample_residuals
                       )

        param_names = [p for p in inspect.signature(distribution._pdf).parameters if not p=='x'] + ["loc","scale"]
        predictions = []

        for level in levels:
            param_values = np.apply_along_axis(lambda x: distribution.fit(x), axis=1, arr=boot_samples[level])
            level_param_names = [f'{level}_{p}' for p in param_names]

            pred_level = pd.DataFrame(
                             data    = param_values,
                             columns = level_param_names,
                             index   = boot_samples[level].index
                         )

            predictions.append(pred_level)

        predictions = pd.concat(predictions, axis=1)

        return predictions


    def set_params(
        self, 
        params: dict
    ) -> None:
        """
        Set new values to the parameters of the scikit learn model stored in the
        forecaster.

        Parameters
        ----------
        params : dict
            Parameters values.

        Returns 
        -------
        self

        """

        self.regressor = clone(self.regressor)
        self.regressor.set_params(**params)


    def set_fit_kwargs(
        self, 
        fit_kwargs: dict
    ) -> None:
        """
        Set new values for the additional keyword arguments passed to the `fit` 
        method of the regressor.

        Parameters
        ----------
        fit_kwargs : dict
            Dict of the form {"argument": new_value}.

        Returns 
        -------
        None

        """

        self.fit_kwargs = check_select_fit_kwargs(self.regressor, fit_kwargs=fit_kwargs)


    def set_lags(
        self, 
        lags: Union[int, list, np.ndarray, range]
    ) -> None:
        """      
        Set new value to the attribute `lags`.
        Attributes `max_lag` and `window_size` are also updated.

        Parameters
        ----------
        lags : int, list, 1D np.array, range
            Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
                `int`: include lags from 1 to `lags`.
                `list` or `np.array`: include only lags present in `lags`.

        Returns 
        -------
        None

        """

        self.lags = initialize_lags(type(self).__name__, lags)            
        self.max_lag  = max(self.lags)
        self.window_size = max(self.lags)


    def set_out_sample_residuals(
        self, 
        residuals: dict,
        append: bool=True,
        transform: bool=True,
        random_state: int=123
    )-> None:
        """
        Set new values to the attribute `out_sample_residuals`. Out of sample
        residuals are meant to be calculated using observations that did not
        participate in the training process.

        Parameters
        ----------
        residuals : dict
            Dictionary of numpy ndarrays with the residuals of each level in the
            form {level: residuals}. If len(residuals) > 1000, only a random 
            sample of 1000 values are stored. Keys must be the same as `levels`.

        append : bool, default `True`
            If `True`, new residuals are added to the once already stored in the
            attribute `out_sample_residuals`. Once the limit of 1000 values is
            reached, no more values are appended. If False, `out_sample_residuals`
            is overwritten with the new residuals.

        transform : bool, default `True`
            If `True`, new residuals are transformed using self.transformer_series.

        random_state : int, default `123`
            Sets a seed to the random sampling for reproducible output.

        Returns 
        -------
        self

        """

        if not isinstance(residuals, dict) or not all(isinstance(x, np.ndarray) for x in residuals.values()):
            raise TypeError(
                f"`residuals` argument must be a dict of numpy ndarrays in the form "
                "`{level: residuals}`. " 
                f"Got {type(residuals)}."
            )

        if not self.fitted:
            raise sklearn.exceptions.NotFittedError(
                ("This forecaster is not fitted yet. Call `fit` with appropriate "
                 "arguments before using `set_out_sample_residuals()`.")
            )

        if self.out_sample_residuals is None:
            self.out_sample_residuals = {level: None for level in self.series_col_names}

        if not set(self.out_sample_residuals.keys()).issubset(set(residuals.keys())):
            warnings.warn(
                (f"""
                Only residuals of levels 
                {set(self.out_sample_residuals.keys()).intersection(set(residuals.keys()))} 
                are updated.
                """), IgnoredArgumentWarning
            )

        residuals = {key: value for key, value in residuals.items() if key in self.out_sample_residuals.keys()}

        for level, value in residuals.items():

            residuals_level = value

            if not transform and self.transformer_series_[level] is not None:
                warnings.warn(
                    ("Argument `transform` is set to `False` but forecaster was "
                    f"trained using a transformer {self.transformer_series_[level]} "
                    f"for level {level}. Ensure that the new residuals are "
                     "already transformed or set `transform=True`.")
                )

            if transform and self.transformer_series_ and self.transformer_series_[level]:
                warnings.warn(
                    ("Residuals will be transformed using the same transformer used "
                    f"when training the forecaster for level {level} : "
                    f"({self.transformer_series_[level]}). Ensure that the new "
                     "residuals are on the same scale as the original time series.")
                )
                residuals_level = transform_series(
                                      series            = pd.Series(residuals_level, name='residuals'),
                                      transformer       = self.transformer_series_[level],
                                      fit               = False,
                                      inverse_transform = False
                                  ).to_numpy()

            if len(residuals_level) > 1000:
                rng = np.random.default_rng(seed=random_state)
                residuals_level = rng.choice(a=residuals_level, size=1000, replace=False)

            if append and self.out_sample_residuals[level] is not None:
                free_space = max(0, 1000 - len(self.out_sample_residuals[level]))
                if len(residuals_level) < free_space:
                    residuals_level = np.hstack((
                                            self.out_sample_residuals[level],
                                            residuals_level
                                        ))
                else:
                    residuals_level = np.hstack((
                                            self.out_sample_residuals[level],
                                            residuals_level[:free_space]
                                        ))

            self.out_sample_residuals[level] = residuals_level


    def get_feature_importances(
        self
    ) -> pd.DataFrame:
        """
        Return feature importances of the regressor stored in the
        forecaster. Only valid when regressor stores internally the feature
        importances in the attribute `feature_importances_` or `coef_`.

        Parameters
        ----------
        self

        Returns
        -------
        feature_importances : pandas DataFrame
            Feature importances associated with each predictor.

        """

        if not self.fitted:
            raise sklearn.exceptions.NotFittedError(
                ("This forecaster is not fitted yet. Call `fit` with appropriate "
                 "arguments before using `get_feature_importances()`.")
            )

        if isinstance(self.regressor, sklearn.pipeline.Pipeline):
            estimator = self.regressor[-1]
        else:
            estimator = self.regressor

        if hasattr(estimator, 'feature_importances_'):
            feature_importances = estimator.feature_importances_
        elif hasattr(estimator, 'coef_'):
            feature_importances = estimator.coef_
        else:
            warnings.warn(
                (f"Impossible to access feature importances for regressor of type "
                 f"{type(estimator)}. This method is only valid when the "
                 f"regressor stores internally the feature importances in the "
                 f"attribute `feature_importances_` or `coef_`.")
            )
            feature_importances = None

        if feature_importances is not None:
            feature_importances = pd.DataFrame({
                                      'feature': self.X_train_col_names,
                                      'importance': feature_importances
                                  })

        return feature_importances


    def get_feature_importance(
        self
    ) -> pd.DataFrame:
        """
        This method has been replaced by `get_feature_importances()`.

        Return feature importances of the regressor stored in the
        forecaster. Only valid when regressor stores internally the feature
        importances in the attribute `feature_importances_` or `coef_`.

        Parameters
        ----------
        self

        Returns
        -------
        feature_importances : pandas DataFrame
            Feature importances associated with each predictor.

        """

        warnings.warn(
            ("get_feature_importance() method has been renamed to get_feature_importances()."
             "This method will be removed in skforecast 0.9.0.")
        )

        return self.get_feature_importances()

create_sample_weights(self, series, X_train, y_train_index)

Crate weights for each observation according to the forecaster's attributes

series_weights and weight_func. The resulting weights are product of both types of weights.

Parameters:

Name Type Description Default
series DataFrame

Time series used to create X_train with the method create_train_X_y.

required
X_train DataFrame

Dataframe generated with the method create_train_X_y, first return.

required
y_train_index Index

Index of y_train generated with the method create_train_X_y, fourth return.

required

Returns:

Type Description
ndarray

Weights to use in fit method.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def create_sample_weights(
    self,
    series: pd.DataFrame,
    X_train: pd.DataFrame,
    y_train_index: pd.Index,
)-> np.ndarray:
    """
    Crate weights for each observation according to the forecaster's attributes
    `series_weights` and `weight_func`. The resulting weights are product of both
    types of weights.

    Parameters
    ----------
    series : pandas DataFrame
        Time series used to create `X_train` with the method `create_train_X_y`.
    X_train : pandas DataFrame
        Dataframe generated with the method `create_train_X_y`, first return.
    y_train_index : pandas Index
        Index of `y_train` generated with the method `create_train_X_y`, fourth return.

    Returns
    -------
    weights : numpy ndarray
        Weights to use in `fit` method.

    """

    weights = None
    weights_samples = None
    weights_series = None

    if self.series_weights is not None:
        # Series not present in series_weights have a weight of 1 in all their samples.
        # Keys in series_weights not present in series are ignored.
        series_not_in_series_weights = set(series.columns) - set(self.series_weights.keys())
        if series_not_in_series_weights:
            warnings.warn(
                (f"{series_not_in_series_weights} not present in `series_weights`. "
                 f"A weight of 1 is given to all their samples."),
                 IgnoredArgumentWarning
            )
        self.series_weights_ = {col: 1. for col in series.columns}
        self.series_weights_.update((k, v) for k, v in self.series_weights.items() if k in self.series_weights_)
        weights_series = [np.repeat(self.series_weights_[serie], sum(X_train[serie])) 
                          for serie in series.columns]
        weights_series = np.concatenate(weights_series)

    if self.weight_func is not None:
        if isinstance(self.weight_func, Callable):
            self.weight_func_ = {col: copy(self.weight_func) for col in series.columns}
        else:
            # Series not present in weight_func have a weight of 1 in all their samples
            series_not_in_weight_func = set(series.columns) - set(self.weight_func.keys())
            if series_not_in_weight_func:
                warnings.warn(
                    (f"{series_not_in_weight_func} not present in `weight_func`. "
                     f"A weight of 1 is given to all their samples."),
                     IgnoredArgumentWarning
                )
            self.weight_func_ = {col: lambda x: np.ones_like(x, dtype=float) for col in series.columns}
            self.weight_func_.update((k, v) for k, v in self.weight_func.items() if k in self.weight_func_)

        weights_samples = []
        for key in self.weight_func_.keys():
            idx = y_train_index[X_train[X_train[key] == 1.0].index]
            weights_samples.append(self.weight_func_[key](idx))
        weights_samples = np.concatenate(weights_samples)

    if weights_series is not None:
        weights = weights_series
        if weights_samples is not None:
            weights = weights * weights_samples
    else:
        if weights_samples is not None:
            weights = weights_samples

    if weights is not None:
        if np.isnan(weights).any():
            raise ValueError(
                "The resulting `weights` cannot have NaN values."
            )
        if np.any(weights < 0):
            raise ValueError(
                "The resulting `weights` cannot have negative values."
            )
        if np.sum(weights) == 0:
            raise ValueError(
                ("The resulting `weights` cannot be normalized because "
                 "the sum of the weights is zero.")
            )

    return weights

create_train_X_y(self, series, exog=None)

Create training matrices from multiple time series and exogenous

variables.

Parameters:

Name Type Description Default
series DataFrame

Training time series.

required
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as series and their indexes must be aligned.

None

Returns:

Type Description
Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series, pandas.core.indexes.base.Index, pandas.core.indexes.base.Index]

Pandas DataFrame with the training values (predictors).

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None
) -> Tuple[pd.DataFrame, pd.Series, pd.Index, pd.Index]:
    """
    Create training matrices from multiple time series and exogenous
    variables.

    Parameters
    ----------        
    series : pandas DataFrame
        Training time series.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.

    Returns 
    -------
    X_train : pandas DataFrame
        Pandas DataFrame with the training values (predictors).

    y_train : pandas Series, shape (len(series) - self.max_lag, )
        Values (target) of the time series related to each row of `X_train`.

    y_index : pandas Index
        Index of `series`.

    y_train_index: pandas Index
        Index of `y_train`.

    """

    if not isinstance(series, pd.DataFrame):
        raise TypeError(f"`series` must be a pandas DataFrame. Got {type(series)}.")

    series_col_names = list(series.columns)

    if self.transformer_series is None:
        self.transformer_series_ = {serie: None for serie in series_col_names}
    elif not isinstance(self.transformer_series, dict):
        self.transformer_series_ = {serie: clone(self.transformer_series) 
                                    for serie in series_col_names}
    else:
        self.transformer_series_ = {serie: None for serie in series_col_names}
        # Only elements already present in transformer_series_ are updated
        self.transformer_series_.update(
            (k, v) for k, v in deepcopy(self.transformer_series).items() if k in self.transformer_series_
        )
        series_not_in_transformer_series = set(series.columns) - set(self.transformer_series.keys())
        if series_not_in_transformer_series:
            warnings.warn(
                (f"{series_not_in_transformer_series} not present in `transformer_series`."
                 f" No transformation is applied to these series."),
                 IgnoredArgumentWarning
            )

    if exog is not None:
        if len(exog) != len(series):
            raise ValueError(
                (f"`exog` must have same number of samples as `series`. "
                 f"length `exog`: ({len(exog)}), length `series`: ({len(series)})")
            )
        check_exog(exog=exog, allow_nan=True)
        if isinstance(exog, pd.Series):
            exog = transform_series(
                       series            = exog,
                       transformer       = self.transformer_exog,
                       fit               = True,
                       inverse_transform = False
                   )
        else:
            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = True,
                       inverse_transform = False
                   )

        check_exog(exog=exog, allow_nan=False)
        check_exog_dtypes(exog)
        self.exog_dtypes = get_exog_dtypes(exog=exog)

        _, exog_index = preprocess_exog(exog=exog, return_values=False)
        if not (exog_index[:len(series.index)] == series.index).all():
            raise ValueError(
                ("Different index for `series` and `exog`. They must be equal "
                 "to ensure the correct alignment of values.")
            )

    X_levels = []
    X_train_col_names = [f"lag_{lag}" for lag in self.lags]

    for i, serie in enumerate(series.columns):

        y = series[serie]
        check_y(y=y)
        y = transform_series(
                series            = y,
                transformer       = self.transformer_series_[serie],
                fit               = True,
                inverse_transform = False
            )

        y_values, y_index = preprocess_y(y=y)
        X_train_values, y_train_values = self._create_lags(y=y_values)

        if i == 0:
            X_train = X_train_values
            y_train = y_train_values
        else:
            X_train = np.vstack((X_train, X_train_values))
            y_train = np.append(y_train, y_train_values)

        X_level = [serie]*len(X_train_values)
        X_levels.extend(X_level)

    X_levels = pd.Series(X_levels)
    X_levels = pd.get_dummies(X_levels, dtype=float)

    X_train = pd.DataFrame(
                  data    = X_train,
                  columns = X_train_col_names
              )

    if exog is not None:
        # The first `self.max_lag` positions have to be removed from exog
        # since they are not in X_train. Then exog is cloned as many times
        # as series.
        exog_to_train = exog.iloc[self.max_lag:, ]
        exog_to_train = pd.concat([exog_to_train]*len(series_col_names)).reset_index(drop=True)
    else:
        exog_to_train = None

    X_train = pd.concat([X_train, exog_to_train, X_levels], axis=1)
    self.X_train_col_names = X_train.columns.to_list()

    y_train = pd.Series(
                  data = y_train,
                  name = 'y'
              )

    y_train_index = pd.Index(
                        np.tile(
                            y_index[self.max_lag: ].to_numpy(),
                            reps = len(series_col_names)
                        )
                    )

    return X_train, y_train, y_index, y_train_index

fit(self, series, exog=None, store_in_sample_residuals=True)

Training Forecaster.

Parameters:

Name Type Description Default
series DataFrame

Training time series.

required
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as series and their indexes must be aligned so that series[i] is regressed on exog[i].

None
store_in_sample_residuals bool

if True, in_sample_residuals are stored.

True
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def fit(
    self,
    series: pd.DataFrame,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    store_in_sample_residuals: bool=True
) -> None:
    """
    Training Forecaster.

    Parameters
    ----------        
    series : pandas DataFrame
        Training time series.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned so
        that series[i] is regressed on exog[i].

    store_in_sample_residuals : bool, default `True`
        if True, in_sample_residuals are stored.

    Returns 
    -------
    None

    """

    # Reset values in case the forecaster has already been fitted.
    self.index_type          = None
    self.index_freq          = None
    self.index_values        = None
    self.last_window         = None
    self.included_exog       = False
    self.exog_type           = None
    self.exog_dtypes         = None
    self.exog_col_names      = None
    self.series_col_names    = None
    self.X_train_col_names   = None
    self.in_sample_residuals = None
    self.fitted              = False
    self.training_range      = None

    self.series_col_names = list(series.columns)        

    if exog is not None:
        self.included_exog = True
        self.exog_type = type(exog)
        self.exog_col_names = \
             exog.columns.to_list() if isinstance(exog, pd.DataFrame) else [exog.name]

        if len(set(self.exog_col_names) - set(self.series_col_names)) != len(self.exog_col_names):
            raise ValueError(
                (f"`exog` cannot contain a column named the same as one of the series"
                 f" (column names of series).\n"
                 f"    `series` columns : {self.series_col_names}.\n"
                 f"    `exog`   columns : {self.exog_col_names}.")
            )

    X_train, y_train, y_index, y_train_index = self.create_train_X_y(series=series, exog=exog)
    sample_weight = self.create_sample_weights(
                        series        = series,
                        X_train       = X_train,
                        y_train_index = y_train_index,
                    )

    if sample_weight is not None:
        self.regressor.fit(
            X             = X_train,
            y             = y_train,
            sample_weight = sample_weight,
            **self.fit_kwargs
        )
    else:
        self.regressor.fit(X=X_train, y=y_train, **self.fit_kwargs)

    self.fitted = True
    self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.training_range = y_index[[0, -1]]
    self.index_type = type(y_index)
    if isinstance(y_index, pd.DatetimeIndex):
        self.index_freq = y_index.freqstr
    else: 
        self.index_freq = y_index.step
    self.index_values = y_index

    in_sample_residuals = {}

    # This is done to save time during fit in functions such as backtesting()
    if store_in_sample_residuals:

        residuals = y_train - self.regressor.predict(X_train)

        for serie in series.columns:
            in_sample_residuals[serie] = residuals.loc[X_train[serie] == 1.].to_numpy()
            if len(in_sample_residuals[serie]) > 1000:
                # Only up to 1000 residuals are stored
                rng = np.random.default_rng(seed=123)
                in_sample_residuals[serie] = rng.choice(
                                                 a       = in_sample_residuals[serie], 
                                                 size    = 1000, 
                                                 replace = False
                                             )
    else:
        for serie in series.columns:
            in_sample_residuals[serie] = None

    self.in_sample_residuals = in_sample_residuals

    # The last time window of training data is stored so that lags needed as
    # predictors in the first iteration of `predict()` can be calculated.
    self.last_window = series.iloc[-self.max_lag:, ].copy()

get_feature_importance(self)

This method has been replaced by get_feature_importances().

Return feature importances of the regressor stored in the forecaster. Only valid when regressor stores internally the feature importances in the attribute feature_importances_ or coef_.

Parameters:

Name Type Description Default
self None required

Returns:

Type Description
DataFrame

Feature importances associated with each predictor.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def get_feature_importance(
    self
) -> pd.DataFrame:
    """
    This method has been replaced by `get_feature_importances()`.

    Return feature importances of the regressor stored in the
    forecaster. Only valid when regressor stores internally the feature
    importances in the attribute `feature_importances_` or `coef_`.

    Parameters
    ----------
    self

    Returns
    -------
    feature_importances : pandas DataFrame
        Feature importances associated with each predictor.

    """

    warnings.warn(
        ("get_feature_importance() method has been renamed to get_feature_importances()."
         "This method will be removed in skforecast 0.9.0.")
    )

    return self.get_feature_importances()

get_feature_importances(self)

Return feature importances of the regressor stored in the

forecaster. Only valid when regressor stores internally the feature importances in the attribute feature_importances_ or coef_.

Parameters:

Name Type Description Default
self None required

Returns:

Type Description
DataFrame

Feature importances associated with each predictor.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def get_feature_importances(
    self
) -> pd.DataFrame:
    """
    Return feature importances of the regressor stored in the
    forecaster. Only valid when regressor stores internally the feature
    importances in the attribute `feature_importances_` or `coef_`.

    Parameters
    ----------
    self

    Returns
    -------
    feature_importances : pandas DataFrame
        Feature importances associated with each predictor.

    """

    if not self.fitted:
        raise sklearn.exceptions.NotFittedError(
            ("This forecaster is not fitted yet. Call `fit` with appropriate "
             "arguments before using `get_feature_importances()`.")
        )

    if isinstance(self.regressor, sklearn.pipeline.Pipeline):
        estimator = self.regressor[-1]
    else:
        estimator = self.regressor

    if hasattr(estimator, 'feature_importances_'):
        feature_importances = estimator.feature_importances_
    elif hasattr(estimator, 'coef_'):
        feature_importances = estimator.coef_
    else:
        warnings.warn(
            (f"Impossible to access feature importances for regressor of type "
             f"{type(estimator)}. This method is only valid when the "
             f"regressor stores internally the feature importances in the "
             f"attribute `feature_importances_` or `coef_`.")
        )
        feature_importances = None

    if feature_importances is not None:
        feature_importances = pd.DataFrame({
                                  'feature': self.X_train_col_names,
                                  'importance': feature_importances
                              })

    return feature_importances

predict(self, steps, levels=None, last_window=None, exog=None)

Predict n steps ahead. It is an recursive process in which, each prediction,

is used as a predictor for the next step.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
levels Union[str, list]

Time series to be predicted. If None all levels will be predicted. New in version 0.6.0

None
last_window Optional[pandas.core.frame.DataFrame]

Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1).

If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None

Returns:

Type Description
DataFrame

Predicted values, one column for each level.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict(
    self,
    steps: int,
    levels: Optional[Union[str, list]]=None,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None
) -> pd.DataFrame:
    """
    Predict n steps ahead. It is an recursive process in which, each prediction,
    is used as a predictor for the next step.

    Parameters
    ----------
    steps : int
        Number of future steps predicted.

    levels : str, list, default `None`
        Time series to be predicted. If `None` all levels will be predicted.
        **New in version 0.6.0**

    last_window : pandas DataFrame, default `None`
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).

        If `last_window = None`, the values stored in `self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    Returns
    -------
    predictions : pandas DataFrame
        Predicted values, one column for each level.

    """

    if levels is None:
        levels = self.series_col_names
    elif isinstance(levels, str):
        levels = [levels]

    if last_window is None:
        last_window = deepcopy(self.last_window)

    check_predict_input(
        forecaster_name  = type(self).__name__,
        steps            = steps,
        fitted           = self.fitted,
        included_exog    = self.included_exog,
        index_type       = self.index_type,
        index_freq       = self.index_freq,
        window_size      = self.window_size,
        last_window      = last_window,
        last_window_exog = None,
        exog             = exog,
        exog_type        = self.exog_type,
        exog_col_names   = self.exog_col_names,
        interval         = None,
        alpha            = None,
        max_steps        = None,
        levels           = levels,
        series_col_names = self.series_col_names
    )

    if exog is not None:
        if isinstance(exog, pd.DataFrame):
            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
        else:
            exog = transform_series(
                       series            = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
        check_exog_dtypes(exog=exog)
        exog_values = exog.iloc[:steps, ].to_numpy()
    else:
        exog_values = None

    predictions = []

    for level in levels:

        last_window_level = transform_series(
                                series            = last_window[level],
                                transformer       = self.transformer_series_[level],
                                fit               = False,
                                inverse_transform = False
                            )
        last_window_values, last_window_index = preprocess_last_window(
                                                    last_window = last_window_level
                                                )

        preds_level = self._recursive_predict(
                          steps       = steps,
                          level       = level,
                          last_window = copy(last_window_values),
                          exog        = copy(exog_values)
                      )

        preds_level = pd.Series(
                          data  = preds_level,
                          index = expand_index(
                                      index = last_window_index,
                                      steps = steps
                                  ),
                          name = level
                      )

        preds_level = transform_series(
                          series            = preds_level,
                          transformer       = self.transformer_series_[level],
                          fit               = False,
                          inverse_transform = True
                      )

        predictions.append(preds_level)    

    predictions = pd.concat(predictions, axis=1)

    return predictions

predict_bootstrapping(self, steps, levels=None, last_window=None, exog=None, n_boot=500, random_state=123, in_sample_residuals=True)

Generate multiple forecasting predictions using a bootstrapping process.

By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. See the Notes section for more information.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
levels Union[str, list]

Time series to be predicted. If None all levels will be predicted.

None
last_window Optional[pandas.core.frame.DataFrame]

Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1).

If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None
n_boot int

Number of bootstrapping iterations used to estimate prediction intervals.

500
random_state int

Sets a seed to the random generator, so that boot intervals are always deterministic.

123
in_sample_residuals bool

If True, residuals from the training data are used as proxy of prediction error to create prediction intervals. If False, out of sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).

True

Returns:

Type Description
dict

Predictions generated by bootstrapping for each level. {level: pandas DataFrame, shape (steps, n_boot)}

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict_bootstrapping(
    self,
    steps: int,
    levels: Optional[Union[str, list]]=None,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    n_boot: int=500,
    random_state: int=123,
    in_sample_residuals: bool=True
) -> dict:
    """
    Generate multiple forecasting predictions using a bootstrapping process. 
    By sampling from a collection of past observed errors (the residuals),
    each iteration of bootstrapping generates a different set of predictions. 
    See the Notes section for more information. 

    Parameters
    ----------   
    steps : int
        Number of future steps predicted.

    levels : str, list, default `None`
        Time series to be predicted. If `None` all levels will be predicted.

    last_window : pandas DataFrame, default `None`
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).

        If `last_window = None`, the values stored in `self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    n_boot : int, default `500`
        Number of bootstrapping iterations used to estimate prediction
        intervals.

    random_state : int, default `123`
        Sets a seed to the random generator, so that boot intervals are always 
        deterministic.

    in_sample_residuals : bool, default `True`
        If `True`, residuals from the training data are used as proxy of
        prediction error to create prediction intervals. If `False`, out of
        sample residuals are used. In the latter case, the user should have
        calculated and stored the residuals within the forecaster (see
        `set_out_sample_residuals()`).

    Returns 
    -------
    boot_predictions : dict
        Predictions generated by bootstrapping for each level. 
        {level: pandas DataFrame, shape (steps, n_boot)}

    Notes
    -----
    More information about prediction intervals in forecasting:
    https://otexts.com/fpp3/prediction-intervals.html#prediction-intervals-from-bootstrapped-residuals
    Forecasting: Principles and Practice (3nd ed) Rob J Hyndman and George Athanasopoulos.

    """

    if levels is None:
        levels = self.series_col_names
    elif isinstance(levels, str):
        levels = [levels]

    if in_sample_residuals:
        if not set(levels).issubset(set(self.in_sample_residuals.keys())):
            raise ValueError(
                (f"Not `forecaster.in_sample_residuals` for levels: "
                 f"{set(levels) - set(self.in_sample_residuals.keys())}.")
            )
        residuals_levels = self.in_sample_residuals
    else:
        if self.out_sample_residuals is None:
            raise ValueError(
                ("`forecaster.out_sample_residuals` is `None`. Use "
                 "`in_sample_residuals=True` or method `set_out_sample_residuals()` "
                 "before `predict_interval()`, `predict_bootstrapping()` or "
                 "`predict_dist()`.")
            )
        else:
            if not set(levels).issubset(set(self.out_sample_residuals.keys())):
                raise ValueError(
                    (f"Not `forecaster.out_sample_residuals` for levels: "
                     f"{set(levels) - set(self.out_sample_residuals.keys())}. "
                     f"Use method `set_out_sample_residuals()`.")
                )
        residuals_levels = self.out_sample_residuals

    check_residuals = 'forecaster.in_sample_residuals' if in_sample_residuals else 'forecaster.out_sample_residuals'
    for level in levels:
        if residuals_levels[level] is None:
            raise ValueError(
                (f"forecaster residuals for level '{level}' are `None`. Check `{check_residuals}`.")
            )
        elif (residuals_levels[level] == None).any():
            raise ValueError(
                (f"forecaster residuals for level '{level}' contains `None` values. Check `{check_residuals}`.")
            )

    if last_window is None:
        last_window = deepcopy(self.last_window)

    check_predict_input(
        forecaster_name  = type(self).__name__,
        steps            = steps,
        fitted           = self.fitted,
        included_exog    = self.included_exog,
        index_type       = self.index_type,
        index_freq       = self.index_freq,
        window_size      = self.window_size,
        last_window      = last_window,
        last_window_exog = None,
        exog             = exog,
        exog_type        = self.exog_type,
        exog_col_names   = self.exog_col_names,
        interval         = None,
        alpha            = None,
        max_steps        = None,
        levels           = levels,
        series_col_names = self.series_col_names
    )

    if exog is not None:
        if isinstance(exog, pd.DataFrame):
            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
        else:
            exog = transform_series(
                       series            = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
        exog_values = exog.iloc[:steps, ].to_numpy()
    else:
        exog_values = None

    boot_predictions = {}

    for level in levels:

        last_window_level = transform_series(
                                series            = last_window[level],
                                transformer       = self.transformer_series_[level],
                                fit               = False,
                                inverse_transform = False
                            )
        last_window_values, last_window_index = preprocess_last_window(
                                                    last_window = last_window_level
                                                )

        level_boot_predictions = np.full(
                                     shape      = (steps, n_boot),
                                     fill_value = np.nan,
                                     dtype      = float
                                 )
        rng = np.random.default_rng(seed=random_state)
        seeds = rng.integers(low=0, high=10000, size=n_boot)

        residuals = residuals_levels[level]

        for i in range(n_boot):
            # In each bootstraping iteration the initial last_window and exog 
            # need to be restored.
            last_window_boot = last_window_values.copy()
            exog_boot = exog_values.copy() if exog is not None else None

            rng = np.random.default_rng(seed=seeds[i])
            sample_residuals = rng.choice(
                                   a       = residuals,
                                   size    = steps,
                                   replace = True
                               )

            for step in range(steps):

                prediction = self._recursive_predict(
                                 steps       = 1,
                                 level       = level,
                                 last_window = last_window_boot,
                                 exog        = exog_boot 
                             )

                prediction_with_residual = prediction + sample_residuals[step]
                level_boot_predictions[step, i] = prediction_with_residual

                last_window_boot = np.append(
                                       last_window_boot[1:],
                                       prediction_with_residual
                                   )
                if exog is not None:
                    exog_boot = exog_boot[1:]

        level_boot_predictions = pd.DataFrame(
                                     data    = level_boot_predictions,
                                     index   = expand_index(last_window_index, steps=steps),
                                     columns = [f"pred_boot_{i}" for i in range(n_boot)]
                                 )

        if self.transformer_series_[level]:
            for col in level_boot_predictions.columns:
                level_boot_predictions[col] = transform_series(
                                                  series            = level_boot_predictions[col],
                                                  transformer       = self.transformer_series_[level],
                                                  fit               = False,
                                                  inverse_transform = True
                                              )

        boot_predictions[level] = level_boot_predictions

    return boot_predictions

predict_dist(self, steps, distribution, levels=None, last_window=None, exog=None, n_boot=500, random_state=123, in_sample_residuals=True)

Fit a given probability distribution for each step. After generating

multiple forecasting predictions through a bootstrapping process, each step is fitted to the given distribution.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
distribution object

A distribution object from scipy.stats.

required
levels Union[str, list]

Time series to be predicted. If None all levels will be predicted.
New in version 0.6.0

None
last_window Optional[pandas.core.frame.DataFrame]

Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1).

If last_window = None, the values stored inself.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None
n_boot int

Number of bootstrapping iterations used to estimate prediction intervals.

500
random_state int

Sets a seed to the random generator, so that boot intervals are always deterministic.

123
in_sample_residuals bool

If True, residuals from the training data are used as proxy of prediction error to create prediction intervals. If False, out of sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).

True

Returns:

Type Description
DataFrame

Distribution parameters estimated for each step and level.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict_dist(
    self,
    steps: int,
    distribution: object,
    levels: Optional[Union[str, list]]=None,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    n_boot: int=500,
    random_state: int=123,
    in_sample_residuals: bool=True
) -> pd.DataFrame:
    """
    Fit a given probability distribution for each step. After generating 
    multiple forecasting predictions through a bootstrapping process, each 
    step is fitted to the given distribution.

    Parameters
    ---------- 
    steps : int
        Number of future steps predicted.

    distribution : Object
        A distribution object from scipy.stats.

    levels : str, list, default `None`
        Time series to be predicted. If `None` all levels will be predicted.  
        **New in version 0.6.0**  

    last_window : pandas DataFrame, default `None`
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).

        If `last_window = None`, the values stored in` self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    n_boot : int, default `500`
        Number of bootstrapping iterations used to estimate prediction
        intervals.

    random_state : int, default `123`
        Sets a seed to the random generator, so that boot intervals are always 
        deterministic.

    in_sample_residuals : bool, default `True`
        If `True`, residuals from the training data are used as proxy of
        prediction error to create prediction intervals. If `False`, out of
        sample residuals are used. In the latter case, the user should have
        calculated and stored the residuals within the forecaster (see
        `set_out_sample_residuals()`).

    Returns 
    -------
    predictions : pandas DataFrame
        Distribution parameters estimated for each step and level.

    """

    if levels is None:
        levels = self.series_col_names
    elif isinstance(levels, str):
        levels = [levels]

    boot_samples = self.predict_bootstrapping(
                       steps               = steps,
                       levels              = levels,
                       last_window         = last_window,
                       exog                = exog,
                       n_boot              = n_boot,
                       random_state        = random_state,
                       in_sample_residuals = in_sample_residuals
                   )

    param_names = [p for p in inspect.signature(distribution._pdf).parameters if not p=='x'] + ["loc","scale"]
    predictions = []

    for level in levels:
        param_values = np.apply_along_axis(lambda x: distribution.fit(x), axis=1, arr=boot_samples[level])
        level_param_names = [f'{level}_{p}' for p in param_names]

        pred_level = pd.DataFrame(
                         data    = param_values,
                         columns = level_param_names,
                         index   = boot_samples[level].index
                     )

        predictions.append(pred_level)

    predictions = pd.concat(predictions, axis=1)

    return predictions

predict_interval(self, steps, levels=None, last_window=None, exog=None, interval=[5, 95], n_boot=500, random_state=123, in_sample_residuals=True)

Iterative process in which, each prediction, is used as a predictor

for the next step and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
levels Union[str, list]

Time series to be predicted. If None all levels will be predicted.
New in version 0.6.0

None
last_window Optional[pandas.core.frame.DataFrame]

Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1).

If last_window = None, the values stored inself.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None
interval list

Confidence of the prediction interval estimated. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. For example, interval of 95% should be as interval = [2.5, 97.5].

[5, 95]
n_boot int

Number of bootstrapping iterations used to estimate prediction intervals.

500
random_state int

Sets a seed to the random generator, so that boot intervals are always deterministic.

123
in_sample_residuals bool

If True, residuals from the training data are used as proxy of prediction error to create prediction intervals. If False, out of sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).

True

Returns:

Type Description
DataFrame

Values predicted by the forecaster and their estimated interval. level: predictions. level_lower_bound: lower bound of the interval. level_upper_bound: upper bound interval of the interval.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict_interval(
    self,
    steps: int,
    levels: Optional[Union[str, list]]=None,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    interval: list=[5, 95],
    n_boot: int=500,
    random_state: int=123,
    in_sample_residuals: bool=True
) -> pd.DataFrame:
    """
    Iterative process in which, each prediction, is used as a predictor
    for the next step and bootstrapping is used to estimate prediction
    intervals. Both predictions and intervals are returned.

    Parameters
    ---------- 
    steps : int
        Number of future steps predicted.

    levels : str, list, default `None`
        Time series to be predicted. If `None` all levels will be predicted.  
        **New in version 0.6.0**  

    last_window : pandas DataFrame, default `None`
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).

        If `last_window = None`, the values stored in` self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    interval : list, default `[5, 95]`
        Confidence of the prediction interval estimated. Sequence of 
        percentiles to compute, which must be between 0 and 100 inclusive. 
        For example, interval of 95% should be as `interval = [2.5, 97.5]`.

    n_boot : int, default `500`
        Number of bootstrapping iterations used to estimate prediction
        intervals.

    random_state : int, default `123`
        Sets a seed to the random generator, so that boot intervals are always 
        deterministic.

    in_sample_residuals : bool, default `True`
        If `True`, residuals from the training data are used as proxy of
        prediction error to create prediction intervals. If `False`, out of
        sample residuals are used. In the latter case, the user should have
        calculated and stored the residuals within the forecaster (see
        `set_out_sample_residuals()`).

    Returns 
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.
            level: predictions.
            level_lower_bound: lower bound of the interval.
            level_upper_bound: upper bound interval of the interval.

    Notes
    -----
    More information about prediction intervals in forecasting:
    https://otexts.com/fpp2/prediction-intervals.html
    Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and
    George Athanasopoulos.

    """

    if levels is None:
        levels = self.series_col_names
    elif isinstance(levels, str):
        levels = [levels]

    check_interval(interval=interval)

    preds = self.predict(
                steps       = steps,
                levels      = levels,
                last_window = last_window,
                exog        = exog
            )

    boot_predictions = self.predict_bootstrapping(
                           steps               = steps,
                           levels              = levels,
                           last_window         = last_window,
                           exog                = exog,
                           n_boot              = n_boot,
                           random_state        = random_state,
                           in_sample_residuals = in_sample_residuals
                       )

    interval = np.array(interval)/100
    predictions = []

    for level in levels:
        preds_interval = boot_predictions[level].quantile(q=interval, axis=1).transpose()
        preds_interval.columns = [f'{level}_lower_bound', f'{level}_upper_bound']
        predictions.append(preds[level])
        predictions.append(preds_interval)

    predictions = pd.concat(predictions, axis=1)

    return predictions

set_fit_kwargs(self, fit_kwargs)

Set new values for the additional keyword arguments passed to the fit

method of the regressor.

Parameters:

Name Type Description Default
fit_kwargs dict

Dict of the form {"argument": new_value}.

required
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_fit_kwargs(
    self, 
    fit_kwargs: dict
) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit` 
    method of the regressor.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns 
    -------
    None

    """

    self.fit_kwargs = check_select_fit_kwargs(self.regressor, fit_kwargs=fit_kwargs)

set_lags(self, lags)

Set new value to the attribute lags.

Attributes max_lag and window_size are also updated.

Parameters:

Name Type Description Default
lags Union[int, list, numpy.ndarray, range]

Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags. list or np.array: include only lags present in lags.

required
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_lags(
    self, 
    lags: Union[int, list, np.ndarray, range]
) -> None:
    """      
    Set new value to the attribute `lags`.
    Attributes `max_lag` and `window_size` are also updated.

    Parameters
    ----------
    lags : int, list, 1D np.array, range
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
            `int`: include lags from 1 to `lags`.
            `list` or `np.array`: include only lags present in `lags`.

    Returns 
    -------
    None

    """

    self.lags = initialize_lags(type(self).__name__, lags)            
    self.max_lag  = max(self.lags)
    self.window_size = max(self.lags)

set_out_sample_residuals(self, residuals, append=True, transform=True, random_state=123)

Set new values to the attribute out_sample_residuals. Out of sample

residuals are meant to be calculated using observations that did not participate in the training process.

Parameters:

Name Type Description Default
residuals dict

Dictionary of numpy ndarrays with the residuals of each level in the form {level: residuals}. If len(residuals) > 1000, only a random sample of 1000 values are stored. Keys must be the same as levels.

required
append bool

If True, new residuals are added to the once already stored in the attribute out_sample_residuals. Once the limit of 1000 values is reached, no more values are appended. If False, out_sample_residuals is overwritten with the new residuals.

True
transform bool

If True, new residuals are transformed using self.transformer_series.

True
random_state int

Sets a seed to the random sampling for reproducible output.

123
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_out_sample_residuals(
    self, 
    residuals: dict,
    append: bool=True,
    transform: bool=True,
    random_state: int=123
)-> None:
    """
    Set new values to the attribute `out_sample_residuals`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process.

    Parameters
    ----------
    residuals : dict
        Dictionary of numpy ndarrays with the residuals of each level in the
        form {level: residuals}. If len(residuals) > 1000, only a random 
        sample of 1000 values are stored. Keys must be the same as `levels`.

    append : bool, default `True`
        If `True`, new residuals are added to the once already stored in the
        attribute `out_sample_residuals`. Once the limit of 1000 values is
        reached, no more values are appended. If False, `out_sample_residuals`
        is overwritten with the new residuals.

    transform : bool, default `True`
        If `True`, new residuals are transformed using self.transformer_series.

    random_state : int, default `123`
        Sets a seed to the random sampling for reproducible output.

    Returns 
    -------
    self

    """

    if not isinstance(residuals, dict) or not all(isinstance(x, np.ndarray) for x in residuals.values()):
        raise TypeError(
            f"`residuals` argument must be a dict of numpy ndarrays in the form "
            "`{level: residuals}`. " 
            f"Got {type(residuals)}."
        )

    if not self.fitted:
        raise sklearn.exceptions.NotFittedError(
            ("This forecaster is not fitted yet. Call `fit` with appropriate "
             "arguments before using `set_out_sample_residuals()`.")
        )

    if self.out_sample_residuals is None:
        self.out_sample_residuals = {level: None for level in self.series_col_names}

    if not set(self.out_sample_residuals.keys()).issubset(set(residuals.keys())):
        warnings.warn(
            (f"""
            Only residuals of levels 
            {set(self.out_sample_residuals.keys()).intersection(set(residuals.keys()))} 
            are updated.
            """), IgnoredArgumentWarning
        )

    residuals = {key: value for key, value in residuals.items() if key in self.out_sample_residuals.keys()}

    for level, value in residuals.items():

        residuals_level = value

        if not transform and self.transformer_series_[level] is not None:
            warnings.warn(
                ("Argument `transform` is set to `False` but forecaster was "
                f"trained using a transformer {self.transformer_series_[level]} "
                f"for level {level}. Ensure that the new residuals are "
                 "already transformed or set `transform=True`.")
            )

        if transform and self.transformer_series_ and self.transformer_series_[level]:
            warnings.warn(
                ("Residuals will be transformed using the same transformer used "
                f"when training the forecaster for level {level} : "
                f"({self.transformer_series_[level]}). Ensure that the new "
                 "residuals are on the same scale as the original time series.")
            )
            residuals_level = transform_series(
                                  series            = pd.Series(residuals_level, name='residuals'),
                                  transformer       = self.transformer_series_[level],
                                  fit               = False,
                                  inverse_transform = False
                              ).to_numpy()

        if len(residuals_level) > 1000:
            rng = np.random.default_rng(seed=random_state)
            residuals_level = rng.choice(a=residuals_level, size=1000, replace=False)

        if append and self.out_sample_residuals[level] is not None:
            free_space = max(0, 1000 - len(self.out_sample_residuals[level]))
            if len(residuals_level) < free_space:
                residuals_level = np.hstack((
                                        self.out_sample_residuals[level],
                                        residuals_level
                                    ))
            else:
                residuals_level = np.hstack((
                                        self.out_sample_residuals[level],
                                        residuals_level[:free_space]
                                    ))

        self.out_sample_residuals[level] = residuals_level

set_params(self, params)

Set new values to the parameters of the scikit learn model stored in the

forecaster.

Parameters:

Name Type Description Default
params dict

Parameters values.

required
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_params(
    self, 
    params: dict
) -> None:
    """
    Set new values to the parameters of the scikit learn model stored in the
    forecaster.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns 
    -------
    self

    """

    self.regressor = clone(self.regressor)
    self.regressor.set_params(**params)

_create_lags(self, y) private

Transforms a 1d array into a 2d array (X) and a 1d array (y). Each row

in X is associated with a value of y and it represents the lags that precede it.

Notice that, the returned matrix X_data, contains the lag 1 in the first column, the lag 2 in the second column and so on.

Parameters:

Name Type Description Default
y ndarray

Training time series.

required

Returns:

Type Description
Tuple[numpy.ndarray, numpy.ndarray]

2d numpy array with the lagged values (predictors).

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def _create_lags(
    self, 
    y: np.ndarray
) -> Tuple[np.ndarray, np.ndarray]:
    """       
    Transforms a 1d array into a 2d array (X) and a 1d array (y). Each row
    in X is associated with a value of y and it represents the lags that
    precede it.

    Notice that, the returned matrix X_data, contains the lag 1 in the first
    column, the lag 2 in the second column and so on.

    Parameters
    ----------        
    y : 1d numpy ndarray
        Training time series.

    Returns 
    -------
    X_data : 2d numpy ndarray, shape (samples - max(self.lags), len(self.lags))
        2d numpy array with the lagged values (predictors).

    y_data : 1d numpy ndarray, shape (samples - max(self.lags),)
        Values of the time series related to each row of `X_data`.

    """

    n_splits = len(y) - self.max_lag
    if n_splits <= 0:
        raise ValueError(
            f"The maximum lag ({self.max_lag}) must be less than the length "
            f"of the series ({len(y)})."
        )

    X_data = np.full(shape=(n_splits, len(self.lags)), fill_value=np.nan, dtype=float)

    for i, lag in enumerate(self.lags):
        X_data[:, i] = y[self.max_lag - lag: -lag]

    y_data = y[self.max_lag:]

    return X_data, y_data

_recursive_predict(self, steps, level, last_window, exog=None) private

Predict n steps ahead. It is an iterative process in which, each prediction,

is used as a predictor for the next step.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
level str

Time series to be predicted.

required
last_window ndarray

Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1).

required
exog Optional[numpy.ndarray]

Exogenous variable/s included as predictor/s.

None

Returns:

Type Description
ndarray

Predicted values.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def _recursive_predict(
    self,
    steps: int,
    level: str,
    last_window: np.ndarray,
    exog: Optional[np.ndarray]=None
) -> np.ndarray:
    """
    Predict n steps ahead. It is an iterative process in which, each prediction,
    is used as a predictor for the next step.

    Parameters
    ----------
    steps : int
        Number of future steps predicted.

    level : str
        Time series to be predicted.

    last_window : numpy ndarray
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).

    exog : numpy ndarray, default `None`
        Exogenous variable/s included as predictor/s.

    Returns 
    -------
    predictions : numpy ndarray
        Predicted values.

    """

    predictions = np.full(shape=steps, fill_value=np.nan)

    for i in range(steps):
        X = last_window[-self.lags].reshape(1, -1)
        if exog is not None:
            X = np.column_stack((X, exog[i, ].reshape(1, -1)))

        levels_dummies = np.zeros(shape=(1, len(self.series_col_names)), dtype=float)
        levels_dummies[0][self.series_col_names.index(level)] = 1.

        X = np.column_stack((X, levels_dummies.reshape(1, -1)))

        with warnings.catch_warnings():
            # Suppress scikit-learn warning: "X does not have valid feature names,
            # but NoOpTransformer was fitted with feature names".
            warnings.simplefilter("ignore")
            prediction = self.regressor.predict(X)
            predictions[i] = prediction.ravel()[0]

        # Update `last_window` values. The first position is discarded and 
        # the new prediction is added at the end.
        last_window = np.append(last_window[1:], prediction)

    return predictions

__repr__(self) special

Information displayed when a ForecasterAutoregMultiSeries object is printed.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def __repr__(
    self
) -> str:
    """
    Information displayed when a ForecasterAutoregMultiSeries object is printed.
    """

    if isinstance(self.regressor, sklearn.pipeline.Pipeline):
        name_pipe_steps = tuple(name + "__" for name in self.regressor.named_steps.keys())
        params = {key : value for key, value in self.regressor.get_params().items() \
                  if key.startswith(name_pipe_steps)}
    else:
        params = self.regressor.get_params()

    info = (
        f"{'=' * len(type(self).__name__)} \n"
        f"{type(self).__name__} \n"
        f"{'=' * len(type(self).__name__)} \n"
        f"Regressor: {self.regressor} \n"
        f"Lags: {self.lags} \n"
        f"Transformer for series: {self.transformer_series} \n"
        f"Transformer for exog: {self.transformer_exog} \n"
        f"Window size: {self.window_size} \n"
        f"Series levels (names): {self.series_col_names} \n"
        f"Series weights: {self.series_weights} \n"
        f"Weight function included: {True if self.weight_func is not None else False} \n"
        f"Exogenous included: {self.included_exog} \n"
        f"Type of exogenous variable: {self.exog_type} \n"
        f"Exogenous variables names: {self.exog_col_names} \n"
        f"Training range: {self.training_range.to_list() if self.fitted else None} \n"
        f"Training index type: {str(self.index_type).split('.')[-1][:-2] if self.fitted else None} \n"
        f"Training index frequency: {self.index_freq if self.fitted else None} \n"
        f"Regressor parameters: {params} \n"
        f"fit_kwargs: {self.fit_kwargs} \n"
        f"Creation date: {self.creation_date} \n"
        f"Last fit date: {self.fit_date} \n"
        f"Skforecast version: {self.skforcast_version} \n"
        f"Python version: {self.python_version} \n"
        f"Forecaster id: {self.forecaster_id} \n"
    )

    return info