ForecasterAutoregMultiSeries

ForecasterAutoregMultiSeries (ForecasterBase)

This class turns any regressor compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster for multiple series.
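
As a rough illustration of what "recursive" means here, the sketch below feeds each prediction back in as a lag for the next step. It is dependency-free and not the library's implementation: `recursive_forecast` and `toy_model` are hypothetical names, and the toy model (mean of the lag values) stands in for a fitted scikit-learn regressor.

```python
def recursive_forecast(predict_one, last_window, lags, steps):
    """Hypothetical helper: predict `steps` values, reusing each prediction as a lag."""
    window = list(last_window)
    predictions = []
    for _ in range(steps):
        # Lag k is the value k positions back from the end of the window.
        features = [window[-lag] for lag in lags]
        y_hat = predict_one(features)
        predictions.append(y_hat)
        # Drop the oldest value and append the new prediction.
        window = window[1:] + [y_hat]
    return predictions


def toy_model(features):
    # Assumption for illustration only: the "model" is the mean of its lags.
    return sum(features) / len(features)


preds = recursive_forecast(toy_model, last_window=[1.0, 2.0, 3.0], lags=[1, 2], steps=3)
print(preds)  # [2.5, 2.75, 2.625]
```

The window always keeps a fixed length, so the same lag positions stay valid at every step.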

Parameters:

Name Type Description Default
regressor regressor or pipeline compatible with the scikit-learn API

An instance of a regressor or pipeline compatible with the scikit-learn API.

required
lags Union[int, numpy.ndarray, list]

Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags (included). list, numpy ndarray or range: include only lags present in lags, all elements must be int.

required
transformer_series transformer (preprocessor) compatible with the scikit-learn preprocessing API or dict {level: transformer}, default None

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. ColumnTransformers are not allowed since they do not have an inverse_transform method. The transformation is applied to each level before training the forecaster.

None
transformer_exog transformer (preprocessor) compatible with the scikit-learn preprocessing API, default None

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to exog before training the forecaster. inverse_transform is not available when using ColumnTransformers.

None
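
The rules for `lags` described above can be sketched without any dependencies. `normalize_lags` is a hypothetical helper, not part of skforecast; it follows the documented behavior (an int expands to lags 1 to `lags` inclusive, a list or range keeps only the listed lags, values below 1 are rejected) and uses plain Python lists where the forecaster stores a numpy array.

```python
def normalize_lags(lags):
    """Hypothetical helper mirroring the documented handling of `lags`."""
    if isinstance(lags, int):
        if lags < 1:
            raise ValueError("Minimum value of lags allowed is 1.")
        # int: include lags from 1 to `lags` (included).
        return list(range(1, lags + 1))
    # list or range: include only the lags present in `lags`.
    lags = list(lags)
    if not all(isinstance(lag, int) for lag in lags):
        raise TypeError("Values in lags must be int.")
    if min(lags) < 1:
        raise ValueError("Minimum value of lags allowed is 1.")
    return lags


print(normalize_lags(3))                  # [1, 2, 3]
print(normalize_lags([1, 7]))             # [1, 7]
print(max(normalize_lags(range(1, 5))))   # 4
```

The maximum of the resulting lags is what the attributes below call `max_lag`, which also determines `window_size`.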

Attributes:

Name Type Description
regressor regressor or pipeline compatible with the scikit-learn API

An instance of a regressor or pipeline compatible with the scikit-learn API.

lags numpy ndarray

Lags used as predictors.

transformer_series transformer (preprocessor) compatible with the scikit-learn preprocessing API, default None

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. ColumnTransformers are not allowed since they do not have an inverse_transform method. The transformation is applied to each level before training the forecaster.

transformer_exog transformer (preprocessor) compatible with the scikit-learn preprocessing API, default None

An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to exog before training the forecaster. inverse_transform is not available when using ColumnTransformers.

max_lag int

Maximum value of lag included in lags.

last_window pandas Series

Last window the forecaster has seen during training. It stores the values needed to predict the next step right after the training data.

window_size int

Size of the window needed to create the predictors. It is equal to max_lag.

fitted bool

Tag to identify if the regressor has been fitted (trained).

index_type type

Type of index of the input used in training.

index_freq str

Frequency of Index of the input used in training.

index_values pandas Index

Values of Index of the input used in training.

training_range pandas Index

First and last values of index of the data used during training.

included_exog bool

If the forecaster has been trained using exogenous variable/s.

exog_type type

Type of exogenous variable/s used in training.

exog_col_names list

Names of columns of exog if exog used in training was a pandas DataFrame.

series_levels list

Names of the columns (levels) that can be predicted.

X_train_col_names list

Names of columns of the matrix created internally for training.

in_sample_residuals dict

Residuals of the model when predicting training data. Only stored up to 1000 values in the form {level: residuals}.

out_sample_residuals pandas Series

Residuals of the model when predicting non training data. Only stored up to 1000 values. Use set_out_sample_residuals to set values.

creation_date str

Date of creation.

fit_date str

Date of last fit.

skforcast_version str

Version of skforecast library used to create the forecaster.

python_version str

Version of python used to create the forecaster.
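
To make `lags`, `max_lag`, `window_size` and `last_window` more concrete, here is a minimal, dependency-free sketch of how a single series can be turned into lagged predictors. `create_lags` is a hypothetical stand-in written with plain lists; the forecaster itself works on numpy arrays and additionally stacks all series and appends level indicator columns, which this sketch leaves out.

```python
def create_lags(y, lags):
    """Hypothetical helper: build (X, target) rows from one series."""
    max_lag = max(lags)
    if len(y) <= max_lag:
        raise ValueError("The maximum lag must be less than the length of the series.")
    X, target = [], []
    for t in range(max_lag, len(y)):
        # Each row holds the values `lag` steps before position t, in the order of `lags`.
        X.append([y[t - lag] for lag in lags])
        target.append(y[t])
    return X, target


y = [10, 11, 12, 13, 14, 15]
lags = [1, 3]
X, target = create_lags(y, lags)
print(X)       # [[12, 10], [13, 11], [14, 12]]
print(target)  # [13, 14, 15]

# The last `max_lag` observations are what is needed to build the
# predictors for the first step after the training data.
last_window = y[-max(lags):]
print(last_window)  # [13, 14, 15]
```

The first `max_lag` observations produce no training row, which is why the training matrix is shorter than the series.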

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
class ForecasterAutoregMultiSeries(ForecasterBase):
    """
    This class turns any regressor compatible with the scikit-learn API into a
    recursive autoregressive (multi-step) forecaster for multiple series.

    Parameters
    ----------
    regressor : regressor or pipeline compatible with the scikit-learn API
        An instance of a regressor or pipeline compatible with the scikit-learn API.

    lags : int, list, 1d numpy ndarray, range
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
            `int`: include lags from 1 to `lags` (included).
            `list`, `numpy ndarray` or `range`: include only lags present in `lags`,
            all elements must be int.

    transformer_series : transformer (preprocessor) compatible with the scikit-learn
                         preprocessing API or `dict` {level: transformer}, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        ColumnTransformers are not allowed since they do not have inverse_transform method.
        The transformation is applied to each `level` before training the forecaster.

    transformer_exog : transformer (preprocessor) compatible with the scikit-learn
                       preprocessing API, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.

    Attributes
    ----------
    regressor : regressor or pipeline compatible with the scikit-learn API
        An instance of a regressor or pipeline compatible with the scikit-learn API.

    lags : numpy ndarray
        Lags used as predictors.

    transformer_series : transformer (preprocessor) compatible with the scikit-learn
                         preprocessing API, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        ColumnTransformers are not allowed since they do not have inverse_transform method.
        The transformation is applied to each `level` before training the forecaster.

    transformer_exog : transformer (preprocessor) compatible with the scikit-learn
                       preprocessing API, default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.

    max_lag : int
        Maximum value of lag included in `lags`.

    last_window : pandas Series
        Last window the forecaster has seen during training. It stores the
        values needed to predict the next `step` right after the training data.

    window_size: int
        Size of the window needed to create the predictors. It is equal to
        `max_lag`.

    fitted: bool
        Tag to identify if the regressor has been fitted (trained).

    index_type : type
        Type of index of the input used in training.

    index_freq : str
        Frequency of Index of the input used in training.

    index_values : pandas Index
        Values of Index of the input used in training.

    training_range: pandas Index
        First and last values of index of the data used during training.

    included_exog : bool
        If the forecaster has been trained using exogenous variable/s.

    exog_type : type
        Type of exogenous variable/s used in training.

    exog_col_names : list
        Names of columns of `exog` if `exog` used in training was a pandas
        DataFrame.

    series_levels : list
        Names of the columns (levels) that can be predicted.

    X_train_col_names : list
        Names of columns of the matrix created internally for training.

    in_sample_residuals: dict
        Residuals of the model when predicting training data. Only stored up to
        1000 values in the form `{level: residuals}`.

    out_sample_residuals: pandas Series
        Residuals of the model when predicting non training data. Only stored
        up to 1000 values. Use `set_out_sample_residuals` to set values.

    creation_date: str
        Date of creation.

    fit_date: str
        Date of last fit.

    skforcast_version: str
        Version of skforecast library used to create the forecaster.

    python_version: str
        Version of python used to create the forecaster.

    """

    def __init__(
        self,
        regressor,
        lags: Union[int, np.ndarray, list],
        transformer_series = None,
        transformer_exog = None,
    ) -> None:

        self.regressor            = regressor
        self.transformer_series   = transformer_series
        self.transformer_exog     = transformer_exog
        self.index_type           = None
        self.index_freq           = None
        self.index_values         = None
        self.training_range       = None
        self.last_window          = None
        self.included_exog        = False
        self.exog_type            = None
        self.exog_col_names       = None
        self.series_levels        = None
        self.X_train_col_names    = None
        self.in_sample_residuals  = None
        self.out_sample_residuals = None
        self.fitted               = False
        self.creation_date        = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.fit_date             = None
        self.skforcast_version    = skforecast.__version__
        self.python_version       = sys.version.split(" ")[0]

        if isinstance(lags, int) and lags < 1:
            raise Exception('Minimum value of lags allowed is 1.')

        if isinstance(lags, (list, range, np.ndarray)) and min(lags) < 1:
            raise Exception('Minimum value of lags allowed is 1.')

        if isinstance(lags, (list, np.ndarray)):
            for lag in lags:
                if not isinstance(lag, (int, np.int64, np.int32)):
                    raise Exception('Values in lags must be int.')

        if isinstance(lags, int):
            self.lags = np.arange(lags) + 1
        elif isinstance(lags, (list, range)):
            self.lags = np.array(lags)
        elif isinstance(lags, np.ndarray):
            self.lags = lags
        else:
            raise Exception(
                '`lags` argument must be int, 1d numpy ndarray, range or list. '
                f"Got {type(lags)}"
            )

        self.max_lag  = max(self.lags)
        self.window_size = self.max_lag


    def __repr__(
        self
    ) -> str:
        """
        Information displayed when a ForecasterAutoregMultiSeries object is printed.
        """

        if isinstance(self.regressor, sklearn.pipeline.Pipeline):
            name_pipe_steps = tuple(name + "__" for name in self.regressor.named_steps.keys())
            params = {key : value for key, value in self.regressor.get_params().items() \
                     if key.startswith(name_pipe_steps)}
        else:
            params = self.regressor.get_params()

        info = (
            f"{'=' * len(str(type(self)).split('.')[1])} \n"
            f"{str(type(self)).split('.')[1]} \n"
            f"{'=' * len(str(type(self)).split('.')[1])} \n"
            f"Regressor: {self.regressor} \n"
            f"Lags: {self.lags} \n"
            f"Transformer for series: {self.transformer_series} \n"
            f"Transformer for exog: {self.transformer_exog} \n"
            f"Window size: {self.window_size} \n"
            f"Series levels: {self.series_levels} \n"
            f"Included exogenous: {self.included_exog} \n"
            f"Type of exogenous variable: {self.exog_type} \n"
            f"Exogenous variables names: {self.exog_col_names} \n"
            f"Training range: {self.training_range.to_list() if self.fitted else None} \n"
            f"Training index type: {str(self.index_type).split('.')[-1][:-2] if self.fitted else None} \n"
            f"Training index frequency: {self.index_freq if self.fitted else None} \n"
            f"Regressor parameters: {params} \n"
            f"Creation date: {self.creation_date} \n"
            f"Last fit date: {self.fit_date} \n"
            f"Skforecast version: {self.skforcast_version} \n"
            f"Python version: {self.python_version} \n"
        )

        return info


    def _create_lags(
        self, 
        y: np.ndarray
    ) -> Tuple[np.ndarray, np.ndarray]:
        """       
        Transforms a 1d array into a 2d array (X) and a 1d array (y). Each row
        in X is associated with a value of y and it represents the lags that
        precede it.

        Note that the columns of the returned matrix X_data follow the order of
        `self.lags`: with consecutive lags, lag 1 is in the first column, lag 2
        in the second column and so on.

        Parameters
        ----------        
        y : 1d numpy ndarray
            Training time series.

        Returns 
        -------
        X_data : 2d numpy ndarray, shape (samples - max(self.lags), len(self.lags))
            2d numpy array with the lagged values (predictors).

        y_data : 1d numpy ndarray, shape (samples - max(self.lags),)
            Values of the time series related to each row of `X_data`.

        """

        n_splits = len(y) - self.max_lag
        if n_splits <= 0:
            raise ValueError(
                f'The maximum lag ({self.max_lag}) must be less than the length '
                f'of the series ({len(y)}).'
            )

        X_data   = np.full(shape=(n_splits, self.max_lag), fill_value=np.nan, dtype=float)
        y_data   = np.full(shape=(n_splits, 1), fill_value=np.nan, dtype=float)

        for i in range(n_splits):
            X_index = np.arange(i, self.max_lag + i)
            y_index = [self.max_lag + i]
            X_data[i, :] = y[X_index]
            y_data[i]    = y[y_index]

        X_data = X_data[:, -self.lags] # Only keep needed lags
        y_data = y_data.ravel()

        return X_data, y_data


    def create_train_X_y(
        self,
        series: pd.DataFrame,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> Tuple[pd.DataFrame, pd.Series, pd.Index]:
        """
        Create training matrices from multiple time series and exogenous
        variables.

        Parameters
        ----------        
        series : pandas DataFrame
            Training time series.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s. Must have the same
            number of observations as `series` and their indexes must be aligned.

        Returns 
        -------
        X_train : pandas DataFrame
            Pandas DataFrame with the training values (predictors).

        y_train : pandas Series, shape (n_series * (len(series) - self.max_lag), )
            Values (target) of the time series related to each row of `X_train`.

        y_index : pandas Index
            Index of the last processed column of `series`, as returned by
            `preprocess_y`.

        """

        if not isinstance(series, pd.DataFrame):
            raise TypeError('`series` must be a pandas DataFrame.')

        # TODO: move this part to an auxiliary function
        #-------------------------------------------------------------------------------
        self.series_levels = list(series.columns)

        if self.transformer_series is None:
            dict_transformers = {level: None for level in self.series_levels}
            self.transformer_series = dict_transformers
        elif not isinstance(self.transformer_series, dict):
            dict_transformers = {level: clone(self.transformer_series) 
                                 for level in self.series_levels}
            self.transformer_series = dict_transformers
        else:
            if list(self.transformer_series.keys()) != self.series_levels:
                raise ValueError(
                    (f'When `transformer_series` parameter is a `dict`, its keys '
                     f'must be the same as `series_levels` : {self.series_levels}')
                )
        #-------------------------------------------------------------------------------

        X_levels = []
        X_train_col_names = [f"lag_{lag}" for lag in self.lags]

        for i, serie in enumerate(series.columns):

            y = series[serie]
            check_y(y=y)
            y = transform_series(
                    series            = y,
                    transformer       = self.transformer_series[serie],
                    fit               = True,
                    inverse_transform = False
                )
            y_values, y_index = preprocess_y(y=y)
            X_train_values, y_train_values = self._create_lags(y=y_values)

            if i == 0:
                X_train = X_train_values
                y_train = y_train_values
            else:
                X_train = np.vstack((X_train, X_train_values))
                y_train = np.append(y_train, y_train_values)

            X_level = [serie]*len(X_train_values)
            X_levels.extend(X_level)

        if exog is not None:
            if len(exog) != len(series):
                raise ValueError(
                    f'`exog` must have same number of samples as `series`. '
                    f'length `exog`: ({len(exog)}), length `series`: ({len(series)})'
                )
            check_exog(exog=exog)
            if isinstance(exog, pd.Series):
                exog = transform_series(
                            series            = exog,
                            transformer       = self.transformer_exog,
                            fit               = True,
                            inverse_transform = False
                       )
            else:
                exog = transform_dataframe(
                            df                = exog,
                            transformer       = self.transformer_exog,
                            fit               = True,
                            inverse_transform = False
                       )
            exog_values, exog_index = preprocess_exog(exog=exog)
            if not (exog_index[:len(y_index)] == y_index).all():
                raise ValueError(
                    ('Different index for `series` and `exog`. They must be equal '
                     'to ensure the correct alignment of values.')      
                )
            col_names_exog = exog.columns if isinstance(exog, pd.DataFrame) else [exog.name]
            X_train_col_names.extend(col_names_exog)

            # The first `self.max_lag` positions have to be removed from exog
            # since they are not in X_train. Then exog is cloned as many times
            # as series.
            if exog_values.ndim == 1:
                X_train = np.column_stack((
                            X_train,
                            np.tile(exog_values[self.max_lag:, ], series.shape[1])
                          )) 

            else:
                X_train = np.column_stack((
                            X_train,
                            np.tile(exog_values[self.max_lag:, ], [series.shape[1], 1])
                          ))

        X_levels = pd.Series(X_levels)
        X_levels = pd.get_dummies(X_levels, dtype=float)
        X_train_col_names.extend(X_levels.columns)
        X_train = np.column_stack((X_train, X_levels.values))

        X_train = pd.DataFrame(
                    data    = X_train,
                    columns = X_train_col_names
                )

        y_train = pd.Series(
                    data  = y_train,
                    name  = 'y'
                )

        self.X_train_col_names = X_train_col_names

        return X_train, y_train, y_index


    def fit(
        self,
        series: pd.DataFrame,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        store_in_sample_residuals: bool=True
    ) -> None:
        """
        Training Forecaster.

        Parameters
        ----------        
        series : pandas DataFrame
            Training time series.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s. Must have the same
            number of observations as `series` and their indexes must be aligned so
            that series[i] is regressed on exog[i].

        store_in_sample_residuals : bool, default `True`
            If `True`, in_sample_residuals are stored.

        Returns 
        -------
        None

        """

        # Reset values in case the forecaster has already been fitted.
        self.index_type           = None
        self.index_freq           = None
        self.index_values         = None
        self.last_window          = None
        self.included_exog        = False
        self.exog_type            = None
        self.exog_col_names       = None
        self.series_levels        = None
        self.X_train_col_names    = None
        self.in_sample_residuals  = None
        self.fitted               = False
        self.training_range       = None

        self.series_levels = list(series.columns)

        if self.transformer_series is None:
            dict_transformers = {level: None for level in self.series_levels}
            self.transformer_series = dict_transformers
        elif not isinstance(self.transformer_series, dict):
            dict_transformers = {level: clone(self.transformer_series) 
                                 for level in self.series_levels}
            self.transformer_series = dict_transformers
        else:
            if list(self.transformer_series.keys()) != self.series_levels:
                raise ValueError(
                    (f'When `transformer_series` parameter is a `dict`, its keys '
                     f'must be the same as `series_levels` : {self.series_levels}')
                )

        if exog is not None:
            self.included_exog = True
            self.exog_type = type(exog)
            self.exog_col_names = \
                 exog.columns.to_list() if isinstance(exog, pd.DataFrame) else exog.name

        X_train, y_train, y_index = self.create_train_X_y(series=series, exog=exog)

        if not str(type(self.regressor)) == "<class 'xgboost.sklearn.XGBRegressor'>":
            self.regressor.fit(X=X_train, y=y_train)
        else:
            self.regressor.fit(X=X_train.to_numpy(), y=y_train.to_numpy())

        self.fitted = True
        self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.training_range = y_index[[0, -1]]
        self.index_type = type(y_index)
        if isinstance(y_index, pd.DatetimeIndex):
            self.index_freq = y_index.freqstr
        else: 
            self.index_freq = y_index.step
        self.index_values = y_index

        residuals_dict = {}

        # This is done to save time during fit in functions such as backtesting()
        if store_in_sample_residuals:

            if not str(type(self.regressor)) == "<class 'xgboost.sklearn.XGBRegressor'>":
                residuals = y_train - self.regressor.predict(X_train)
            else:
                residuals = y_train - self.regressor.predict(X_train.to_numpy())

            for serie in series.columns:
                residuals_dict[serie] = residuals.values[X_train[serie] == 1.]
                if len(residuals_dict[serie]) > 1000:
                    # Only up to 1000 residuals are stored
                    rng = np.random.default_rng(seed=123)
                    residuals_dict[serie] = rng.choice(
                                                a       = residuals_dict[serie], 
                                                size    = 1000, 
                                                replace = False
                                            )
        else:
            for serie in series.columns:
                residuals_dict[serie] = np.array([None])

        self.in_sample_residuals = residuals_dict

        # The last time window of training data is stored so that lags needed as
        # predictors in the first iteration of `predict()` can be calculated.
        self.last_window = series.iloc[-self.max_lag:, ].copy()


    def _recursive_predict(
        self,
        steps: int,
        level: str,
        last_window: np.ndarray,
        exog: np.ndarray
    ) -> np.ndarray:
        """
        Predict n steps ahead. It is an iterative process in which each prediction
        is used as a predictor for the next step.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.

        level : str
            Time series to be predicted.

        last_window : numpy ndarray
            Values of the series used to create the predictors (lags) needed in the 
            first iteration of prediction (t + 1).

        exog : numpy ndarray, pandas DataFrame
            Exogenous variable/s included as predictor/s.

        Returns 
        -------
        predictions : numpy ndarray
            Predicted values.

        """

        predictions = np.full(shape=steps, fill_value=np.nan)

        for i in range(steps):
            X = last_window[-self.lags].reshape(1, -1)
            if exog is not None:
                X = np.column_stack((X, exog[i, ].reshape(1, -1)))

            levels_dummies = np.zeros(shape=(1, len(self.series_levels)), dtype=float)
            levels_dummies[0][self.series_levels.index(level)] = 1.

            X = np.column_stack((X, levels_dummies.reshape(1, -1)))

            with warnings.catch_warnings():
                # Suppress scikit-learn warning: "X does not have valid feature names,
                # but NoOpTransformer was fitted with feature names".
                warnings.simplefilter("ignore")
                prediction = self.regressor.predict(X)
                predictions[i] = prediction.ravel()[0]

            # Update `last_window` values. The first position is discarded and 
            # the new prediction is added at the end.
            last_window = np.append(last_window[1:], prediction)

        return predictions


    def predict(
        self,
        steps: int,
        level: str,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> pd.Series:
        """
        Predict n steps ahead. It is a recursive process in which each prediction
        is used as a predictor for the next step.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.

        level : str
            Time series to be predicted.

        last_window : pandas DataFrame, default `None`
            Values of the series used to create the predictors (lags) needed in the 
            first iteration of prediction (t + 1).

            If `last_window = None`, the values stored in `self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        Returns 
        -------
        predictions : pandas Series
            Predicted values.

        """

        check_predict_input(
            forecaster_type = type(self),
            steps           = steps,
            fitted          = self.fitted,
            included_exog   = self.included_exog,
            index_type      = self.index_type,
            index_freq      = self.index_freq,
            window_size     = self.window_size,
            last_window     = last_window,
            exog            = exog,
            exog_type       = self.exog_type,
            exog_col_names  = self.exog_col_names,
            interval        = None,
            max_steps       = None,
            level           = level,
            series_levels   = self.series_levels
        )

        if exog is not None:
            if isinstance(exog, pd.DataFrame):
                exog = transform_dataframe(
                            df                = exog,
                            transformer       = self.transformer_exog,
                            fit               = False,
                            inverse_transform = False
                       )
            else:
                exog = transform_series(
                            series            = exog,
                            transformer       = self.transformer_exog,
                            fit               = False,
                            inverse_transform = False
                       )

            exog_values, _ = preprocess_exog(
                                exog = exog.iloc[:steps, ]
                             )
        else:
            exog_values = None

        if last_window is None:
            last_window = self.last_window[level]

        last_window = transform_series(
                            series            = last_window,
                            transformer       = self.transformer_series[level],
                            fit               = False,
                            inverse_transform = False
                      )
        last_window_values, last_window_index = preprocess_last_window(
                                                    last_window = last_window
                                                )

        predictions = self._recursive_predict(
                        steps       = steps,
                        level       = level,
                        last_window = copy(last_window_values),
                        exog        = copy(exog_values)
                      )

        predictions = pd.Series(
                        data  = predictions,
                        index = expand_index(
                                    index = last_window_index,
                                    steps = steps
                                ),
                        name = 'pred'
                      )

        predictions = transform_series(
                        series            = predictions,
                        transformer       = self.transformer_series[level],
                        fit               = False,
                        inverse_transform = True
                      )

        return predictions


    def _estimate_boot_interval(
        self,
        steps: int,
        level: str,
        last_window: Optional[np.ndarray]=None,
        exog: Optional[np.ndarray]=None,
        interval: list=[5, 95],
        n_boot: int=500,
        random_state: int=123,
        in_sample_residuals: bool=True
    ) -> np.ndarray:
        """
        Iterative process in which each prediction is used as a predictor
        for the next step and bootstrapping is used to estimate prediction
        intervals. This method only returns prediction intervals.
        See predict_intervals() to calculate both predictions and intervals.

        Parameters
        ----------   
        steps : int
            Number of future steps predicted.

        level : str
            Time series to be predicted.

        last_window : 1d numpy ndarray shape (, max_lag), default `None`
            Values of the series used to create the predictors (lags) needed in the 
            first iteration of prediction (t + 1).

            If `last_window = None`, the values stored in `self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : numpy ndarray, default `None`
            Exogenous variable/s included as predictor/s.

        n_boot : int, default `500`
            Number of bootstrapping iterations used to estimate prediction
            intervals.

        random_state : int, default `123`
            Seed for the random generator, so that bootstrapped intervals are
            deterministic.

        interval : list, default `[5, 95]`
            Confidence of the prediction interval estimated. Sequence of 
            percentiles to compute, which must be between 0 and 100 inclusive. 
            For example, a 95% interval is specified as `interval = [2.5, 97.5]`.

        in_sample_residuals : bool, default `True`
            If `True`, residuals from the training data are used as proxy of
            prediction error to create prediction intervals. If `False`, out of
            sample residuals are used. In the latter case, the user should have
            calculated and stored the residuals within the forecaster (see
            `set_out_sample_residuals()`).

        Returns 
        -------
        prediction_interval : numpy ndarray, shape (steps, 2)
            Interval estimated for each prediction by bootstrapping:
                first column = lower bound of the interval.
                second column = upper bound of the interval.

        Notes
        -----
        More information about prediction intervals in forecasting:
        https://otexts.com/fpp2/prediction-intervals.html
        Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and
        George Athanasopoulos.

        """

        if last_window is None:
            last_window = self.last_window[level]
            last_window = last_window.values

        boot_predictions = np.full(
                                shape      = (steps, n_boot),
                                fill_value = np.nan,
                                dtype      = float
                           )
        rng = np.random.default_rng(seed=random_state)
        seeds = rng.integers(low=0, high=10000, size=n_boot)

        for i in range(n_boot):
            # In each bootstrapping iteration the initial last_window and exog 
            # need to be restored.
            last_window_boot = last_window.copy()
            if exog is not None:
                exog_boot = exog.copy()
            else:
                exog_boot = None

            if in_sample_residuals:
                residuals = self.in_sample_residuals[level]
            else:
                residuals = self.out_sample_residuals

            rng = np.random.default_rng(seed=seeds[i])
            sample_residuals = rng.choice(
                                    a       = residuals,
                                    size    = steps,
                                    replace = True
                               )

            for step in range(steps):
                prediction = self._recursive_predict(
                                steps       = 1,
                                level       = level,
                                last_window = last_window_boot,
                                exog        = exog_boot 
                             )

                prediction_with_residual  = prediction + sample_residuals[step]
                boot_predictions[step, i] = prediction_with_residual

                last_window_boot = np.append(
                                       last_window_boot[1:],
                                       prediction_with_residual
                                   )

                if exog is not None:
                    exog_boot = exog_boot[1:]

        prediction_interval = np.percentile(boot_predictions, q=interval, axis=1)
        prediction_interval = prediction_interval.transpose()

        return prediction_interval


    def predict_interval(
        self,
        steps: int,
        level: str,
        last_window: Optional[pd.DataFrame]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        interval: list=[5, 95],
        n_boot: int=500,
        random_state: int=123,
        in_sample_residuals: bool=True
    ) -> pd.DataFrame:
        """
        Iterative process in which each prediction is used as a predictor
        for the next step, and bootstrapping is used to estimate prediction
        intervals. Both predictions and intervals are returned.

        Parameters
        ---------- 
        steps : int
            Number of future steps predicted.

        level : str
            Time series to be predicted.        

        last_window : pandas DataFrame, default `None`
            Values of the series used to create the predictors (lags) needed in the 
            first iteration of prediction (t + 1).

            If `last_window = None`, the values stored in `self.last_window` are
            used to calculate the initial predictors, and the predictions start
            right after training data.

        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.

        interval : list, default `[5, 95]`
            Confidence of the prediction interval estimated. Sequence of 
            percentiles to compute, which must be between 0 and 100 inclusive. 
            For example, a 95% interval is specified as `interval = [2.5, 97.5]`.

        n_boot : int, default `500`
            Number of bootstrapping iterations used to estimate prediction
            intervals.

        random_state : int, default `123`
            Seed for the random generator, so that bootstrapped intervals are
            deterministic.

        in_sample_residuals : bool, default `True`
            If `True`, residuals from the training data are used as proxy of
            prediction error to create prediction intervals. If `False`, out of
            sample residuals are used. In the latter case, the user should have
            calculated and stored the residuals within the forecaster (see
            `set_out_sample_residuals()`).

        Returns 
        -------
        predictions : pandas DataFrame
            Values predicted by the forecaster and their estimated interval:
                column pred = predictions.
                column lower_bound = lower bound of the interval.
                column upper_bound = upper bound of the interval.

        Notes
        -----
        More information about prediction intervals in forecasting:
        https://otexts.com/fpp2/prediction-intervals.html
        Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and
        George Athanasopoulos.

        """

        if in_sample_residuals and (self.in_sample_residuals[level] == None).any():
            raise ValueError(
                ('`forecaster.in_sample_residuals[level]` contains `None` values. '
                 'Try using `fit` method with `in_sample_residuals=True` or set in '
                 '`predict_interval` method `in_sample_residuals=False` and use '
                 '`out_sample_residuals` (see `set_out_sample_residuals()`).')
            )

        check_predict_input(
            forecaster_type = type(self),
            steps           = steps,
            fitted          = self.fitted,
            included_exog   = self.included_exog,
            index_type      = self.index_type,
            index_freq      = self.index_freq,
            window_size     = self.window_size,
            last_window     = last_window,
            exog            = exog,
            exog_type       = self.exog_type,
            exog_col_names  = self.exog_col_names,
            interval        = interval,
            max_steps       = None,
            level           = level,
            series_levels   = self.series_levels
        ) 

        if exog is not None:
            if isinstance(exog, pd.DataFrame):
                exog = transform_dataframe(
                            df                = exog,
                            transformer       = self.transformer_exog,
                            fit               = False,
                            inverse_transform = False
                       )
            else:
                exog = transform_series(
                            series            = exog,
                            transformer       = self.transformer_exog,
                            fit               = False,
                            inverse_transform = False
                       )

            exog_values, _ = preprocess_exog(
                                exog = exog.iloc[:steps, ]
                                )
        else:
            exog_values = None

        if last_window is None:
            last_window = self.last_window[level]

        last_window = transform_series(
                            series            = last_window,
                            transformer       = self.transformer_series[level],
                            fit               = False,
                            inverse_transform = False
                      )
        last_window_values, last_window_index = preprocess_last_window(
                                                    last_window = last_window
                                                )

        # Since during predict() `last_window_values` and `exog_values` are modified,
        # the originals are stored to be used later.
        last_window_values_original = last_window_values.copy()
        if exog is not None:
            exog_values_original = exog_values.copy()
        else:
            exog_values_original = None

        predictions = self._recursive_predict(
                            steps       = steps,
                            level       = level,
                            last_window = last_window_values,
                            exog        = exog_values
                      )

        predictions_interval = self._estimate_boot_interval(
                                    steps       = steps,
                                    level       = level,
                                    last_window = copy(last_window_values_original),
                                    exog        = copy(exog_values_original),
                                    interval    = interval,
                                    n_boot      = n_boot,
                                    random_state = random_state,
                                    in_sample_residuals = in_sample_residuals
                               )

        predictions = np.column_stack((predictions, predictions_interval))

        predictions = pd.DataFrame(
                        data = predictions,
                        index = expand_index(
                                    index = last_window_index,
                                    steps = steps
                                ),
                        columns = ['pred', 'lower_bound', 'upper_bound']
                      )

        if self.transformer_series[level]:
            for col in predictions.columns:
                predictions[col] = self.transformer_series[level].inverse_transform(predictions[[col]])

        return predictions


    def set_params(
        self, 
        **params: dict
    ) -> None:
        """
        Set new values to the parameters of the scikit-learn model stored in the
        forecaster.

        Parameters
        ----------
        params : dict
            Parameters values.

        Returns 
        -------
        None

        """

        self.regressor = clone(self.regressor)
        self.regressor.set_params(**params)


    def set_lags(
        self, 
        lags: Union[int, list, np.ndarray, range]
    ) -> None:
        """      
        Set new value to the attribute `lags`.
        Attributes `max_lag` and `window_size` are also updated.

        Parameters
        ----------
        lags : int, list, 1D np.array, range
            Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
                `int`: include lags from 1 to `lags`.
                `list` or `np.array`: include only lags present in `lags`.

        Returns 
        -------
        None

        """

        if isinstance(lags, int) and lags < 1:
            raise Exception('min value of lags allowed is 1')

        if isinstance(lags, (list, range, np.ndarray)) and min(lags) < 1:
            raise Exception('min value of lags allowed is 1')

        if isinstance(lags, int):
            self.lags = np.arange(lags) + 1
        elif isinstance(lags, (list, range)):
            self.lags = np.array(lags)
        elif isinstance(lags, np.ndarray):
            self.lags = lags
        else:
            raise Exception(
                f"`lags` argument must be `int`, `1D np.ndarray`, `range` or `list`. "
                f"Got {type(lags)}"
            )

        self.max_lag  = max(self.lags)
        self.window_size = max(self.lags)


    def set_out_sample_residuals(
        self, 
        residuals: pd.Series,
        level: str,
        append: bool=True,
        transform: bool=True
    )-> None:
        """
        Set new values to the attribute `out_sample_residuals`. Out of sample
        residuals are meant to be calculated using observations that did not
        participate in the training process.

        Parameters
        ----------
        residuals : pd.Series
            Values of residuals. If len(residuals) > 1000, only a random sample
            of 1000 values is stored.

        level : str
            Time series to which the out-of-sample residuals belong.

        append : bool, default `True`
            If `True`, new residuals are added to the ones already stored in the
            attribute `out_sample_residuals`. Once the limit of 1000 values is
            reached, no more values are appended. If `False`, `out_sample_residuals`
            is overwritten with the new residuals.

        transform : bool, default `True`
            If `True`, new residuals are transformed using
            `self.transformer_series[level]`.

        Returns 
        -------
        None

        """

        if not isinstance(residuals, pd.Series):
            raise TypeError(
                f"`residuals` argument must be `pd.Series`. Got {type(residuals)}"
            )

        if level not in self.series_levels:
            raise ValueError(
                f'`level` must be one of the `series_levels` : {self.series_levels}'
            )

        if not transform and self.transformer_series[level] is not None:
            warnings.warn(
                f'''
                Argument `transform` is set to `False` but forecaster was trained
                using a transformer {self.transformer_series[level]} for level {level}.
                Ensure that new residuals are already transformed or set `transform=True`.
                '''
            )

        if transform and self.transformer_series and self.transformer_series[level]:
            warnings.warn(
                f'''
                Residuals will be transformed using the same transformer used 
                when training the forecaster for level {level} ({self.transformer_series[level]}).
                Ensure that new residuals are in the same scale as the original time
                series.
                '''
            )

            residuals = transform_series(
                            series            = residuals,
                            transformer       = self.transformer_series[level],
                            fit               = False,
                            inverse_transform = False
                        ) 

        if len(residuals) > 1000:
            rng = np.random.default_rng(seed=123)
            residuals = rng.choice(a=residuals, size=1000, replace=False)
            residuals = pd.Series(residuals)   

        if append and self.out_sample_residuals is not None:
            free_space = max(0, 1000 - len(self.out_sample_residuals))
            if len(residuals) < free_space:
                residuals = np.hstack((
                                self.out_sample_residuals,
                                residuals
                            ))
            else:
                residuals = np.hstack((
                                self.out_sample_residuals,
                                residuals[:free_space]
                            ))

        self.out_sample_residuals = pd.Series(residuals)


    def get_feature_importance(
        self
    ) -> pd.DataFrame:
        """      
        Return feature importance of the regressor stored in the
        forecaster. Only valid when regressor stores internally the feature
        importance in the attribute `feature_importances_` or `coef_`.

        Parameters
        ----------
        self

        Returns
        -------
        feature_importance : pandas DataFrame
            Feature importance associated with each predictor.

        """

        if not self.fitted:
            raise sklearn.exceptions.NotFittedError(
                "This forecaster is not fitted yet. Call `fit` with appropriate "
                "arguments before using `get_feature_importance()`."
            )

        if isinstance(self.regressor, sklearn.pipeline.Pipeline):
            estimator = self.regressor[-1]
        else:
            estimator = self.regressor

        try:
            feature_importance = pd.DataFrame({
                                    'feature': self.X_train_col_names,
                                    'importance' : estimator.feature_importances_
                                })
        except:   
            try:
                feature_importance = pd.DataFrame({
                                        'feature': self.X_train_col_names,
                                        'importance' : estimator.coef_
                                    })
            except:
                warnings.warn(
                    f"Impossible to access feature importance for regressor of type {type(estimator)}. "
                    f"This method is only valid when the regressor stores internally "
                    f"the feature importance in the attribute `feature_importances_` "
                    f"or `coef_`."
                )

                feature_importance = None

        return feature_importance
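The bootstrapping loop in `_estimate_boot_interval` above can be illustrated with a minimal, library-free sketch. It is not the skforecast implementation: `boot_interval`, `percentile` and `toy_model` are hypothetical names, the one-step model is a stand-in for the fitted regressor, and residuals are drawn one per step rather than in a single batch. The idea is the same, though: restore the window on every iteration, add a sampled residual to each one-step prediction, feed the noisy value back as the newest lag, and take percentiles across iterations.

```python
import random


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    low = int(pos)
    high = min(low + 1, len(ordered) - 1)
    return ordered[low] + (ordered[high] - ordered[low]) * (pos - low)


def boot_interval(predict_one, last_window, residuals, steps,
                  interval=(5, 95), n_boot=500, random_state=123):
    """Return one (lower, upper) pair per step using bootstrapped residuals."""
    rng = random.Random(random_state)
    paths = []
    for _ in range(n_boot):
        window = list(last_window)          # restore the window on every iteration
        path = []
        for _ in range(steps):
            noisy = predict_one(window) + rng.choice(residuals)
            path.append(noisy)
            window = window[1:] + [noisy]   # the noisy prediction becomes the newest lag
        paths.append(path)
    return [
        (percentile([p[s] for p in paths], interval[0]),
         percentile([p[s] for p in paths], interval[1]))
        for s in range(steps)
    ]


# Hypothetical one-step model: half of the most recent value.
def toy_model(window):
    return 0.5 * window[-1]


bounds = boot_interval(toy_model, last_window=[1.0, 2.0, 4.0],
                       residuals=[-0.2, -0.1, 0.0, 0.1, 0.2], steps=3)
```

Because the seed is fixed, repeated calls with the same arguments return the same bounds, which mirrors the role of `random_state` in the method.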

create_train_X_y(self, series, exog=None)

Create training matrices from univariate time series and exogenous variables.

Parameters:

Name Type Description Default
series DataFrame

Training time series.

required
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as series and their indexes must be aligned.

None

Returns:

Type Description
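Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series, pandas.core.indexes.base.Index]

X_train: pandas DataFrame with the training values (predictors). y_train: values (target) of the time series related to each row of X_train. y_index: index of the training series.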
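As a rough illustration of the layout this method produces, the sketch below builds lag rows for each series, stacks them vertically, and appends one indicator column per series. It uses plain Python lists instead of numpy and pandas, and `create_lags` is a hypothetical helper whose column order (`lag_1` first) is an assumption, not a copy of the library's `_create_lags`.

```python
def create_lags(y, max_lag):
    """Return (X, target): each row of X holds lag_1..lag_max_lag for one target."""
    X, target = [], []
    for t in range(max_lag, len(y)):
        X.append([y[t - lag] for lag in range(1, max_lag + 1)])
        target.append(y[t])
    return X, target


series = {"a": [1, 2, 3, 4, 5], "b": [10, 20, 30, 40, 50]}
levels = list(series)
max_lag = 2

X_train, y_train = [], []
for level in levels:
    X_level, y_level = create_lags(series[level], max_lag)
    # One indicator column per series marks which level each row belongs to.
    one_hot = [1.0 if name == level else 0.0 for name in levels]
    X_train.extend(row + one_hot for row in X_level)
    y_train.extend(y_level)

columns = [f"lag_{lag}" for lag in range(1, max_lag + 1)] + levels
```

Each series contributes `len(y) - max_lag` rows, so a single regressor is trained on all series at once and the indicator columns let it tell them apart.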

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None
) -> Tuple[pd.DataFrame, pd.Series, pd.Index]:
    """
    Create training matrices from univariate time series and exogenous
    variables.

    Parameters
    ----------        
    series : pandas DataFrame
        Training time series.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.

    Returns 
    -------
    X_train : pandas DataFrame
        Pandas DataFrame with the training values (predictors).

    y_train : pandas Series
        Values (target) of the time series related to each row of `X_train`.

    y_index : pandas Index
        Index of `series`.

    """

    if not isinstance(series, pd.DataFrame):
        raise TypeError('`series` must be a pandas DataFrame.')

    # TODO: move this part to an auxiliary function
    #-------------------------------------------------------------------------------
    self.series_levels = list(series.columns)

    if self.transformer_series is None:
        dict_transformers = {level: None for level in self.series_levels}
        self.transformer_series = dict_transformers
    elif not isinstance(self.transformer_series, dict):
        dict_transformers = {level: clone(self.transformer_series) 
                             for level in self.series_levels}
        self.transformer_series = dict_transformers
    else:
        if list(self.transformer_series.keys()) != self.series_levels:
            raise ValueError(
                (f'When `transformer_series` parameter is a `dict`, its keys '
                 f'must be the same as `series_levels` : {self.series_levels}')
            )
    #-------------------------------------------------------------------------------

    X_levels = []
    X_train_col_names = [f"lag_{lag}" for lag in self.lags]

    for i, serie in enumerate(series.columns):

        y = series[serie]
        check_y(y=y)
        y = transform_series(
                series            = y,
                transformer       = self.transformer_series[serie],
                fit               = True,
                inverse_transform = False
            )
        y_values, y_index = preprocess_y(y=y)
        X_train_values, y_train_values = self._create_lags(y=y_values)

        if i == 0:
            X_train = X_train_values
            y_train = y_train_values
        else:
            X_train = np.vstack((X_train, X_train_values))
            y_train = np.append(y_train, y_train_values)

        X_level = [serie]*len(X_train_values)
        X_levels.extend(X_level)

    if exog is not None:
        if len(exog) != len(series):
            raise ValueError(
                f'`exog` must have same number of samples as `series`. '
                f'length `exog`: ({len(exog)}), length `series`: ({len(series)})'
            )
        check_exog(exog=exog)
        if isinstance(exog, pd.Series):
            exog = transform_series(
                        series            = exog,
                        transformer       = self.transformer_exog,
                        fit               = True,
                        inverse_transform = False
                   )
        else:
            exog = transform_dataframe(
                        df                = exog,
                        transformer       = self.transformer_exog,
                        fit               = True,
                        inverse_transform = False
                   )
        exog_values, exog_index = preprocess_exog(exog=exog)
        if not (exog_index[:len(y_index)] == y_index).all():
            raise ValueError(
                ('Different index for `series` and `exog`. They must be equal '
                 'to ensure the correct alignment of values.')      
            )
        col_names_exog = exog.columns if isinstance(exog, pd.DataFrame) else [exog.name]
        X_train_col_names.extend(col_names_exog)

        # The first `self.max_lag` positions have to be removed from exog
        # since they are not in X_train. Then exog is cloned as many times
        # as series.
        if exog_values.ndim == 1:
            X_train = np.column_stack((
                        X_train,
                        np.tile(exog_values[self.max_lag:, ], series.shape[1])
                      )) 

        else:
            X_train = np.column_stack((
                        X_train,
                        np.tile(exog_values[self.max_lag:, ], [series.shape[1], 1])
                      ))

    X_levels = pd.Series(X_levels)
    X_levels = pd.get_dummies(X_levels, dtype=float)
    X_train_col_names.extend(X_levels.columns)
    X_train = np.column_stack((X_train, X_levels.values))

    X_train = pd.DataFrame(
                data    = X_train,
                columns = X_train_col_names
            )

    y_train = pd.Series(
                data  = y_train,
                name  = 'y'
            )

    self.X_train_col_names = X_train_col_names

    return X_train, y_train, y_index
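The exogenous handling in the listing above (drop the first `max_lag` rows, then repeat the remainder once per series) can be sketched without numpy. `tile_exog` is a hypothetical helper used only for illustration; the library itself relies on `np.tile`.

```python
def tile_exog(exog_rows, max_lag, n_series):
    """Drop the first `max_lag` rows, then repeat the remainder once per series."""
    trimmed = exog_rows[max_lag:]
    return [list(row) for _ in range(n_series) for row in trimmed]


exog = [[0.1], [0.2], [0.3], [0.4], [0.5]]   # one exogenous column, 5 observations
tiled = tile_exog(exog, max_lag=2, n_series=2)
```

The trimmed block has the same number of rows as the lag matrix of a single series, so after tiling it lines up row by row with the stacked lag rows of all series.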

fit(self, series, exog=None, store_in_sample_residuals=True)

Training Forecaster.

Parameters:

Name Type Description Default
series DataFrame

Training time series.

required
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as series and their indexes must be aligned so that series[i] is regressed on exog[i].

None
store_in_sample_residuals bool

If True, in_sample_residuals are stored.

True
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def fit(
    self,
    series: pd.DataFrame,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    store_in_sample_residuals: bool=True
) -> None:
    """
    Training Forecaster.

    Parameters
    ----------        
    series : pandas DataFrame
        Training time series.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned so
        that series[i] is regressed on exog[i].

    store_in_sample_residuals : bool, default `True`
        If `True`, in_sample_residuals are stored.

    Returns 
    -------
    None

    """

    # Reset values in case the forecaster has already been fitted.
    self.index_type           = None
    self.index_freq           = None
    self.index_values         = None
    self.last_window          = None
    self.included_exog        = False
    self.exog_type            = None
    self.exog_col_names       = None
    self.series_levels        = None
    self.X_train_col_names    = None
    self.in_sample_residuals  = None
    self.fitted               = False
    self.training_range       = None

    self.series_levels = list(series.columns)

    if self.transformer_series is None:
        dict_transformers = {level: None for level in self.series_levels}
        self.transformer_series = dict_transformers
    elif not isinstance(self.transformer_series, dict):
        dict_transformers = {level: clone(self.transformer_series) 
                             for level in self.series_levels}
        self.transformer_series = dict_transformers
    else:
        if list(self.transformer_series.keys()) != self.series_levels:
            raise ValueError(
                (f'When `transformer_series` parameter is a `dict`, its keys '
                 f'must be the same as `series_levels` : {self.series_levels}')
            )

    if exog is not None:
        self.included_exog = True
        self.exog_type = type(exog)
        self.exog_col_names = \
             exog.columns.to_list() if isinstance(exog, pd.DataFrame) else exog.name

    X_train, y_train, y_index = self.create_train_X_y(series=series, exog=exog)

    if not str(type(self.regressor)) == "<class 'xgboost.sklearn.XGBRegressor'>":
        self.regressor.fit(X=X_train, y=y_train)
    else:
        self.regressor.fit(X=X_train.to_numpy(), y=y_train.to_numpy())

    self.fitted = True
    self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.training_range = y_index[[0, -1]]
    self.index_type = type(y_index)
    if isinstance(y_index, pd.DatetimeIndex):
        self.index_freq = y_index.freqstr
    else: 
        self.index_freq = y_index.step
    self.index_values = y_index

    residuals_dict = {}

    # This is done to save time during fit in functions such as backtesting()
    if store_in_sample_residuals:

        if not str(type(self.regressor)) == "<class 'xgboost.sklearn.XGBRegressor'>":
            residuals = y_train - self.regressor.predict(X_train)
        else:
            residuals = y_train - self.regressor.predict(X_train.to_numpy())

        for serie in series.columns:
            residuals_dict[serie] = residuals.values[X_train[serie] == 1.]
            if len(residuals_dict[serie]) > 1000:
                # Only up to 1000 residuals are stored
                rng = np.random.default_rng(seed=123)
                residuals_dict[serie] = rng.choice(
                                            a       = residuals_dict[serie], 
                                            size    = 1000, 
                                            replace = False
                                        )
    else:
        for serie in series.columns:
            residuals_dict[serie] = np.array([None])

    self.in_sample_residuals = residuals_dict

    # The last time window of training data is stored so that lags needed as
    # predictors in the first iteration of `predict()` can be calculated.
    self.last_window = series.iloc[-self.max_lag:, ].copy()
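The residual bookkeeping at the end of `fit` can be illustrated with a small standard-library sketch: residuals are grouped by the series each training row belongs to, and at most a fixed number is kept per series by sampling without replacement. `split_residuals` and its arguments are hypothetical names; the library uses the one-hot columns of `X_train` and a limit of 1000.

```python
import random


def split_residuals(residuals, row_levels, max_size=1000, seed=123):
    """Group residuals by level and keep at most `max_size` values per level."""
    by_level = {}
    for value, level in zip(residuals, row_levels):
        by_level.setdefault(level, []).append(value)
    rng = random.Random(seed)
    return {
        # Sampling without replacement keeps the stored values distinct draws.
        level: rng.sample(values, max_size) if len(values) > max_size else values
        for level, values in by_level.items()
    }


residuals = [0.1, -0.2, 0.3, 1.0, -1.5, 2.0]
row_levels = ["a", "a", "a", "b", "b", "b"]
stored = split_residuals(residuals, row_levels, max_size=2)
```

Capping the number of stored residuals bounds memory use and keeps the bootstrapping in `predict_interval` cheap, at the cost of using a subsample of the training errors.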

get_feature_importance(self)

Return feature importance of the regressor stored in the forecaster. Only valid when regressor stores internally the feature importance in the attribute feature_importances_ or coef_.

Parameters:

Name Type Description Default
self None required

Returns:

Type Description
DataFrame

Feature importance associated with each predictor.

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def get_feature_importance(
    self
) -> pd.DataFrame:
    """      
    Return feature importance of the regressor stored in the
    forecaster. Only valid when regressor stores internally the feature
    importance in the attribute `feature_importances_` or `coef_`.

    Parameters
    ----------
    self

    Returns
    -------
    feature_importance : pandas DataFrame
        Feature importance associated with each predictor.

    """

    if not self.fitted:
        raise sklearn.exceptions.NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `get_feature_importance()`."
        )

    if isinstance(self.regressor, sklearn.pipeline.Pipeline):
        estimator = self.regressor[-1]
    else:
        estimator = self.regressor

    try:
        feature_importance = pd.DataFrame({
                                'feature': self.X_train_col_names,
                                'importance' : estimator.feature_importances_
                            })
    except:   
        try:
            feature_importance = pd.DataFrame({
                                    'feature': self.X_train_col_names,
                                    'importance' : estimator.coef_
                                })
        except:
            warnings.warn(
                f"Impossible to access feature importance for regressor of type {type(estimator)}. "
                f"This method is only valid when the regressor stores internally "
                f"the feature importance in the attribute `feature_importances_` "
                f"or `coef_`."
            )

            feature_importance = None

    return feature_importance
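The attribute fallback used above (try `feature_importances_`, then `coef_`, otherwise give up) can also be written with `getattr`. The sketch below is an alternative formulation for illustration, not the library's code; `lookup_importance` and the three stand-in classes are hypothetical.

```python
def lookup_importance(estimator, feature_names):
    """Return (feature, importance) pairs, or None if neither attribute exists."""
    for attr in ("feature_importances_", "coef_"):
        values = getattr(estimator, attr, None)
        if values is not None:
            return list(zip(feature_names, values))
    return None


class TreeLike:            # hypothetical stand-in for a tree-based regressor
    feature_importances_ = [0.7, 0.3]


class LinearLike:          # hypothetical stand-in for a linear regressor
    coef_ = [1.5, -0.5]


class Opaque:              # exposes neither attribute
    pass


names = ["lag_1", "lag_2"]
tree_result = lookup_importance(TreeLike(), names)
linear_result = lookup_importance(LinearLike(), names)
opaque_result = lookup_importance(Opaque(), names)
```

Checking for the attribute explicitly avoids a bare `except`, which would also hide unrelated errors such as a length mismatch between feature names and importance values.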

predict(self, steps, level, last_window=None, exog=None)

Predict n steps ahead. It is a recursive process in which each prediction is used as a predictor for the next step.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
level str

Time series to be predicted.

required
last_window Optional[pandas.core.frame.DataFrame]

Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).

If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None

Returns:

Type Description
Series

Predicted values.
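The recursive strategy described here can be sketched in a few lines. This is a simplified illustration, not the library's `_recursive_predict`: `recursive_predict` and `mean_of_window` are hypothetical names, and exogenous variables, level indicators and transformers are left out.

```python
def recursive_predict(predict_one, last_window, steps):
    """Forecast `steps` values, feeding each prediction back as the newest lag."""
    window = list(last_window)
    predictions = []
    for _ in range(steps):
        value = predict_one(window)
        predictions.append(value)
        window = window[1:] + [value]   # drop the oldest value, append the prediction
    return predictions


def mean_of_window(window):     # hypothetical one-step model
    return sum(window) / len(window)


forecast = recursive_predict(mean_of_window, last_window=[2.0, 4.0], steps=3)
# forecast == [3.0, 3.5, 3.25]
```

Since every step after the first depends on earlier predictions rather than observed values, errors can accumulate as the horizon grows.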

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict(
    self,
    steps: int,
    level: str,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None
) -> pd.Series:
    """
    Predict n steps ahead. It is a recursive process in which each prediction
    is used as a predictor for the next step.

    Parameters
    ----------
    steps : int
        Number of future steps predicted.

    level : str
        Time series to be predicted.

    last_window : pandas DataFrame, default `None`
        Values of the series used to create the predictors (lags) needed in the
        first iteration of prediction (t + 1).

        If `last_window = None`, the values stored in `self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    Returns 
    -------
    predictions : pandas Series
        Predicted values.

    """

    check_predict_input(
        forecaster_type = type(self),
        steps           = steps,
        fitted          = self.fitted,
        included_exog   = self.included_exog,
        index_type      = self.index_type,
        index_freq      = self.index_freq,
        window_size     = self.window_size,
        last_window     = last_window,
        exog            = exog,
        exog_type       = self.exog_type,
        exog_col_names  = self.exog_col_names,
        interval        = None,
        max_steps       = None,
        level           = level,
        series_levels   = self.series_levels
    )

    if exog is not None:
        if isinstance(exog, pd.DataFrame):
            exog = transform_dataframe(
                        df                = exog,
                        transformer       = self.transformer_exog,
                        fit               = False,
                        inverse_transform = False
                   )
        else:
            exog = transform_series(
                        series            = exog,
                        transformer       = self.transformer_exog,
                        fit               = False,
                        inverse_transform = False
                   )

        exog_values, _ = preprocess_exog(
                            exog = exog.iloc[:steps, ]
                         )
    else:
        exog_values = None

    if last_window is None:
        last_window = self.last_window[level]

    last_window = transform_series(
                        series            = last_window,
                        transformer       = self.transformer_series[level],
                        fit               = False,
                        inverse_transform = False
                  )
    last_window_values, last_window_index = preprocess_last_window(
                                                last_window = last_window
                                            )

    predictions = self._recursive_predict(
                    steps       = steps,
                    level       = level,
                    last_window = copy(last_window_values),
                    exog        = copy(exog_values)
                  )

    predictions = pd.Series(
                    data  = predictions,
                    index = expand_index(
                                index = last_window_index,
                                steps = steps
                            ),
                    name = 'pred'
                  )

    predictions = transform_series(
                    series            = predictions,
                    transformer       = self.transformer_series[level],
                    fit               = False,
                    inverse_transform = True
                  )

    return predictions

predict_interval(self, steps, level, last_window=None, exog=None, interval=[5, 95], n_boot=500, random_state=123, in_sample_residuals=True)

Iterative process in which each prediction is used as a predictor

for the next step and bootstrapping is used to estimate prediction intervals. Both predictions and intervals are returned.

Parameters:

Name Type Description Default
steps int

Number of future steps predicted.

required
level str

Time series to be predicted.

required
last_window Optional[pandas.core.frame.DataFrame]

Values of the series used to create the predictors (lags) needed in the first iteration of prediction (t + 1).

If last_window = None, the values stored in self.last_window are used to calculate the initial predictors, and the predictions start right after training data.

None
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s.

None
interval list

Confidence of the prediction interval estimated. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. For example, a 95% interval should be specified as interval = [2.5, 97.5].

[5, 95]
n_boot int

Number of bootstrapping iterations used to estimate prediction intervals.

500
random_state int

Seed for the random number generator, so that the bootstrapped intervals are reproducible.

123
in_sample_residuals bool

If True, residuals from the training data are used as a proxy for the prediction error to create prediction intervals. If False, out-of-sample residuals are used. In the latter case, the user should have calculated and stored the residuals within the forecaster (see set_out_sample_residuals()).

True

Returns:

Type Description
DataFrame

Values predicted by the forecaster and their estimated interval: column pred = predictions. column lower_bound = lower bound of the interval. column upper_bound = upper bound of the interval.
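
How residual bootstrapping turns a recursive forecast into an interval can be sketched as follows. This shows the general technique only; the library's `_estimate_boot_interval` is not shown on this page and may differ in its details. The function name `bootstrap_interval` and the toy one-step model are hypothetical.

```python
import numpy as np

def bootstrap_interval(predict_one, last_window, lags, steps, residuals,
                       interval=(5, 95), n_boot=500, random_state=123):
    """Percentile bounds per step from residual-perturbed recursive paths."""
    lags = np.asarray(lags)
    residuals = np.asarray(residuals, dtype=float)
    rng = np.random.default_rng(random_state)
    boot = np.empty((steps, n_boot))
    for b in range(n_boot):
        window = np.asarray(last_window, dtype=float).copy()
        sampled = rng.choice(residuals, size=steps, replace=True)
        for i in range(steps):
            # Simulated value: one-step prediction plus a resampled residual.
            value = predict_one(window[-lags]) + sampled[i]
            boot[i, b] = value
            # The simulated value, not the point prediction, feeds the next step.
            window = np.append(window[1:], value)
    # One row per step: lower and upper percentile of the simulated paths.
    return np.percentile(boot, q=interval, axis=1).T

# Hypothetical model (mean of the lags) and hypothetical residuals.
bounds = bootstrap_interval(np.mean, last_window=[1.0, 2.0, 3.0],
                            lags=[1, 2, 3], steps=3,
                            residuals=[-0.5, 0.0, 0.5], n_boot=100)
print(bounds.shape)
```

With `interval=(5, 95)` the bounds enclose the central 90% of the simulated paths; a 95% interval would use `(2.5, 97.5)`.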

Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def predict_interval(
    self,
    steps: int,
    level: str,
    last_window: Optional[pd.DataFrame]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    interval: list=[5, 95],
    n_boot: int=500,
    random_state: int=123,
    in_sample_residuals: bool=True
) -> pd.DataFrame:
    """
    Iterative process in which each prediction is used as a predictor
    for the next step and bootstrapping is used to estimate prediction
    intervals. Both predictions and intervals are returned.

    Parameters
    ---------- 
    steps : int
        Number of future steps predicted.

    level : str
        Time series to be predicted.        

    last_window : pandas DataFrame, default `None`
        Values of the series used to create the predictors (lags) needed in the 
        first iteration of prediction (t + 1).

        If `last_window = None`, the values stored in `self.last_window` are
        used to calculate the initial predictors, and the predictions start
        right after training data.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.

    interval : list, default `[5, 95]`
        Confidence of the prediction interval estimated. Sequence of
        percentiles to compute, which must be between 0 and 100 inclusive.
        For example, a 95% interval should be specified as
        `interval = [2.5, 97.5]`.

    n_boot : int, default `500`
        Number of bootstrapping iterations used to estimate prediction
        intervals.

    random_state : int, default `123`
        Seed for the random number generator, so that the bootstrapped
        intervals are reproducible.

    in_sample_residuals : bool, default `True`
        If `True`, residuals from the training data are used as a proxy for the
        prediction error to create prediction intervals. If `False`,
        out-of-sample residuals are used. In the latter case, the user should
        have calculated and stored the residuals within the forecaster (see
        `set_out_sample_residuals()`).

    Returns 
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval:
            column pred = predictions.
            column lower_bound = lower bound of the interval.
            column upper_bound = upper bound of the interval.

    Notes
    -----
    More information about prediction intervals in forecasting:
    https://otexts.com/fpp2/prediction-intervals.html
    Forecasting: Principles and Practice (2nd ed) Rob J Hyndman and
    George Athanasopoulos.

    """

    if in_sample_residuals and (self.in_sample_residuals[level] == None).any():
        raise ValueError(
            ('`forecaster.in_sample_residuals[level]` contains `None` values. '
             'Try using `fit` method with `in_sample_residuals=True` or set in '
             '`predict_interval` method `in_sample_residuals=False` and use '
             '`out_sample_residuals` (see `set_out_sample_residuals()`).')
        )

    check_predict_input(
        forecaster_type = type(self),
        steps           = steps,
        fitted          = self.fitted,
        included_exog   = self.included_exog,
        index_type      = self.index_type,
        index_freq      = self.index_freq,
        window_size     = self.window_size,
        last_window     = last_window,
        exog            = exog,
        exog_type       = self.exog_type,
        exog_col_names  = self.exog_col_names,
        interval        = interval,
        max_steps       = None,
        level           = level,
        series_levels   = self.series_levels
    ) 

    if exog is not None:
        if isinstance(exog, pd.DataFrame):
            exog = transform_dataframe(
                        df                = exog,
                        transformer       = self.transformer_exog,
                        fit               = False,
                        inverse_transform = False
                   )
        else:
            exog = transform_series(
                        series            = exog,
                        transformer       = self.transformer_exog,
                        fit               = False,
                        inverse_transform = False
                   )

        exog_values, _ = preprocess_exog(
                            exog = exog.iloc[:steps, ]
                            )
    else:
        exog_values = None

    if last_window is None:
        last_window = self.last_window[level]

    last_window = transform_series(
                        series            = last_window,
                        transformer       = self.transformer_series[level],
                        fit               = False,
                        inverse_transform = False
                  )
    last_window_values, last_window_index = preprocess_last_window(
                                                last_window = last_window
                                            )

    # Since during predict() `last_window_values` and `exog_values` are modified,
    # the originals are stored to be used later.
    last_window_values_original = last_window_values.copy()
    if exog is not None:
        exog_values_original = exog_values.copy()
    else:
        exog_values_original = None

    predictions = self._recursive_predict(
                        steps       = steps,
                        level       = level,
                        last_window = last_window_values,
                        exog        = exog_values
                  )

    predictions_interval = self._estimate_boot_interval(
                                steps       = steps,
                                level       = level,
                                last_window = copy(last_window_values_original),
                                exog        = copy(exog_values_original),
                                interval    = interval,
                                n_boot      = n_boot,
                                random_state = random_state,
                                in_sample_residuals = in_sample_residuals
                           )

    predictions = np.column_stack((predictions, predictions_interval))

    predictions = pd.DataFrame(
                    data = predictions,
                    index = expand_index(
                                index = last_window_index,
                                steps = steps
                            ),
                    columns = ['pred', 'lower_bound', 'upper_bound']
                  )

    if self.transformer_series[level]:
        for col in predictions.columns:
            predictions[col] = self.transformer_series[level].inverse_transform(predictions[[col]])

    return predictions

set_lags(self, lags)

Set new value to the attribute lags.

Attributes max_lag and window_size are also updated.

Parameters:

Name Type Description Default
lags Union[int, list, numpy.ndarray, range]

Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. int: include lags from 1 to lags. list or np.array: include only lags present in lags.

required
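
The accepted forms of `lags` all reduce to a NumPy array of positive integers. The sketch below mirrors the conversion rules in the listing that follows; `normalize_lags` is a hypothetical standalone helper, not part of the library.

```python
import numpy as np

def normalize_lags(lags):
    """Convert an int, list, range or 1D array into an array of lags."""
    if isinstance(lags, int):
        if lags < 1:
            raise ValueError("min value of lags allowed is 1")
        # An int n means lags 1, 2, ..., n.
        return np.arange(lags) + 1
    if isinstance(lags, (list, range, np.ndarray)):
        if min(lags) < 1:
            raise ValueError("min value of lags allowed is 1")
        # A sequence keeps only the lags it lists.
        return np.array(lags)
    raise TypeError(
        f"`lags` argument must be `int`, `1D np.ndarray`, `range` or `list`. "
        f"Got {type(lags)}"
    )

print(normalize_lags(3))            # lags 1 to 3
print(normalize_lags([1, 7]))       # only lags 1 and 7
print(max(normalize_lags(range(1, 5))))  # value used for the window size
```

The largest lag determines how many past observations are needed to build the first set of predictors, which is why `max_lag` and `window_size` are updated together.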
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_lags(
    self, 
    lags: Union[int, list, np.ndarray, range]
) -> None:
    """      
    Set new value to the attribute `lags`.
    Attributes `max_lag` and `window_size` are also updated.

    Parameters
    ----------
    lags : int, list, 1D np.array, range
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.
            `int`: include lags from 1 to `lags`.
            `list` or `np.array`: include only lags present in `lags`.

    Returns 
    -------
    None

    """

    if isinstance(lags, int) and lags < 1:
        raise Exception('min value of lags allowed is 1')

    if isinstance(lags, (list, range, np.ndarray)) and min(lags) < 1:
        raise Exception('min value of lags allowed is 1')

    if isinstance(lags, int):
        self.lags = np.arange(lags) + 1
    elif isinstance(lags, (list, range)):
        self.lags = np.array(lags)
    elif isinstance(lags, np.ndarray):
        self.lags = lags
    else:
        raise Exception(
            f"`lags` argument must be `int`, `1D np.ndarray`, `range` or `list`. "
            f"Got {type(lags)}"
        )

    self.max_lag  = max(self.lags)
    self.window_size = max(self.lags)

set_out_sample_residuals(self, residuals, level, append=True, transform=True)

Set new values to the attribute out_sample_residuals. Out of sample

residuals are meant to be calculated using observations that did not participate in the training process.

Parameters:

Name Type Description Default
residuals Series

Values of residuals. If len(residuals) > 1000, only a random sample of 1000 values is stored.

required
level str

Time series to which the out-of-sample residuals belong.

required
append bool

If True, new residuals are added to the ones already stored in the attribute out_sample_residuals. Once the limit of 1000 values is reached, no more values are appended. If False, out_sample_residuals is overwritten with the new residuals.

True
transform bool

If True, new residuals are transformed using self.transformer_series.

True
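
The 1000-value cap described for `residuals` and `append` can be sketched independently of the forecaster. This is an illustration of the rule only: `append_residuals` and `MAX_RESIDUALS` are hypothetical names, and the transformation step is left out.

```python
import numpy as np

MAX_RESIDUALS = 1000  # hypothetical name for the documented limit

def append_residuals(stored, new, limit=MAX_RESIDUALS, seed=123):
    """Combine stored and new residuals without exceeding `limit` values."""
    new = np.asarray(new, dtype=float)
    if len(new) > limit:
        # Keep only a random sample when too many residuals are passed.
        rng = np.random.default_rng(seed)
        new = rng.choice(new, size=limit, replace=False)
    if stored is None:
        return new
    stored = np.asarray(stored, dtype=float)
    # Only as many new values as still fit are appended.
    free_space = max(0, limit - len(stored))
    return np.hstack((stored, new[:free_space]))

print(len(append_residuals(None, np.arange(1500))))         # sampled down
print(len(append_residuals(np.zeros(990), np.ones(50))))    # truncated append
print(len(append_residuals(np.zeros(10), np.ones(5))))      # plain append
```

Passing `stored=None` corresponds to overwriting, which is also what `append=False` does in the method.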
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_out_sample_residuals(
    self, 
    residuals: pd.Series,
    level: str,
    append: bool=True,
    transform: bool=True
) -> None:
    """
    Set new values to the attribute `out_sample_residuals`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process.

    Parameters
    ----------
    residuals : pd.Series
        Values of residuals. If len(residuals) > 1000, only a random sample
        of 1000 values is stored.

    level : str
        Time series to which the out-of-sample residuals belong.

    append : bool, default `True`
        If `True`, new residuals are added to the ones already stored in the
        attribute `out_sample_residuals`. Once the limit of 1000 values is
        reached, no more values are appended. If False, `out_sample_residuals`
        is overwritten with the new residuals.

    transform : bool, default `True`
        If `True`, new residuals are transformed using self.transformer_series.

    Returns 
    -------
    None

    """

    if not isinstance(residuals, pd.Series):
        raise TypeError(
            f"`residuals` argument must be `pd.Series`. Got {type(residuals)}"
        )

    if level not in self.series_levels:
        raise ValueError(
            f'`level` must be one of the `series_levels` : {self.series_levels}'
        )

    if not transform and self.transformer_series[level] is not None:
        warnings.warn(
            f'''
            Argument `transform` is set to `False` but forecaster was trained
            using a transformer {self.transformer_series[level]} for level {level}.
            Ensure that new residuals are already transformed or set `transform=True`.
            '''
        )

    if transform and self.transformer_series and self.transformer_series[level]:
        warnings.warn(
            f'''
            Residuals will be transformed using the same transformer used 
            when training the forecaster for level {level} ({self.transformer_series[level]}).
            Ensure that new residuals are in the same scale as the original time
            series.
            '''
        )

        residuals = transform_series(
                        series            = residuals,
                        transformer       = self.transformer_series[level],
                        fit               = False,
                        inverse_transform = False
                    ) 

    if len(residuals) > 1000:
        rng = np.random.default_rng(seed=123)
        residuals = rng.choice(a=residuals, size=1000, replace=False)
        residuals = pd.Series(residuals)   

    if append and self.out_sample_residuals is not None:
        free_space = max(0, 1000 - len(self.out_sample_residuals))
        if len(residuals) < free_space:
            residuals = np.hstack((
                            self.out_sample_residuals,
                            residuals
                        ))
        else:
            residuals = np.hstack((
                            self.out_sample_residuals,
                            residuals[:free_space]
                        ))

    self.out_sample_residuals = pd.Series(residuals)

set_params(self, **params)

Set new values to the parameters of the scikit-learn model stored in the

ForecasterAutoregMultiSeries.

Parameters:

Name Type Description Default
params dict

Parameters values.

{}
Source code in skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py
def set_params(
    self, 
    **params: dict
) -> None:
    """
    Set new values to the parameters of the scikit-learn model stored in the
    ForecasterAutoregMultiSeries.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns 
    -------
    None

    """

    self.regressor = clone(self.regressor)
    self.regressor.set_params(**params)