`ForecasterRecursive`

skforecast.recursive._forecaster_recursive.ForecasterRecursive


ForecasterRecursive(
    estimator,
    lags=None,
    window_features=None,
    calendar_features=None,
    transformer_y=None,
    transformer_exog=None,
    categorical_features="auto",
    weight_func=None,
    differentiation=None,
    dropna_from_series=False,
    fit_kwargs=None,
    binner_kwargs=None,
    forecaster_id=None,
)

Bases: ForecasterBase

This class turns any estimator compatible with the scikit-learn API into a recursive autoregressive (multi-step) forecaster.
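
The recursive strategy can be sketched with plain NumPy. The snippet below is a minimal illustration under stated assumptions, not skforecast's implementation: it fits a linear autoregressive model with `numpy.linalg.lstsq` on contiguous lags, and the helpers `make_lag_matrix`, `fit_linear_ar` and `predict_recursive` are hypothetical. The key point is that each one-step-ahead prediction is appended to the window and reused as lag 1 for the next step.

```python
import numpy as np


def make_lag_matrix(y, n_lags):
    """Hypothetical helper: X (lag 1 in column 0) and the target aligned with each row."""
    windows = np.lib.stride_tricks.sliding_window_view(y, n_lags)[:-1]
    return windows[:, ::-1], y[n_lags:]


def fit_linear_ar(y, n_lags):
    """Hypothetical helper: least-squares fit of y_t on an intercept and n_lags lags."""
    X, target = make_lag_matrix(y, n_lags)
    X = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


def predict_recursive(coef, last_window, steps):
    """Hypothetical helper: each prediction is reused as lag 1 for the next step."""
    n_lags = len(coef) - 1
    window = list(last_window[-n_lags:])  # oldest value first, most recent last
    predictions = []
    for _ in range(steps):
        lags = np.array(window[::-1][:n_lags])  # most recent value first (lag 1)
        pred = coef[0] + coef[1:] @ lags
        predictions.append(pred)
        window.append(pred)  # feed the prediction back as the newest observation
    return np.array(predictions)


# Noise-free AR(2) series: y_t = 1 + 0.6 * y_{t-1} - 0.2 * y_{t-2}
y = np.zeros(15)
y[1] = 1.0
for t in range(2, len(y)):
    y[t] = 1 + 0.6 * y[t - 1] - 0.2 * y[t - 2]

coef = fit_linear_ar(y, n_lags=2)          # recovers intercept 1.0 and lags 0.6, -0.2
print(predict_recursive(coef, y, steps=3))
```

Because later steps consume earlier predictions, errors can accumulate with the horizon; this is the main trade-off of the recursive strategy compared with fitting one model per step.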

Parameters:

Name	Type	Description	Default
`estimator`	`estimator or pipeline compatible with the scikit-learn API`	An instance of an estimator or pipeline compatible with the scikit-learn API.	required
`lags`	`int, list, numpy ndarray, range`	Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. `int`: include lags from 1 to `lags` (included). `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. `None`: no lags are included as predictors.	`None`
`window_features`	`(object, list)`	Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors. Skforecast provides the `RollingFeatures` class, but a custom object can also be passed as long as it implements the required interface.	`None`
`calendar_features`	`object`	Instance of `CalendarFeatures` used to create calendar features from the datetime index. Calendar features are included as predictors and are generated automatically during both training and prediction. Only supported when the index of the input data is a `pandas.DatetimeIndex`. New in version 0.23.0	`None`
`transformer_y`	`object transformer (preprocessor)`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods `fit`, `transform`, `fit_transform` and `inverse_transform`. ColumnTransformers are not allowed since they do not have an `inverse_transform` method. The transformation is applied to `y` before training the forecaster.	`None`
`transformer_exog`	`object transformer (preprocessor)`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers.	`None`
`categorical_features`	`(str, list)`	Specifies which exogenous variables should be treated as categorical features. Categorical features are encoded using an `OrdinalEncoder` internally managed by the forecaster. If `'auto'`: after applying `transformer_exog`, any column with a non-numeric dtype is treated as categorical. If `list`: a list of column names to be treated as categorical. If `None`: no categorical encoding is applied internally. New in version 0.22.0	`'auto'`
`weight_func`	`Callable`	Function that defines the individual weights for each sample based on the index. For example, a function that assigns a lower weight to certain dates. Ignored if `estimator` does not have the argument `sample_weight` in its `fit` method. The resulting `sample_weight` cannot have negative values.	`None`
`differentiation`	`int`	Order of differencing applied to the time series before training the forecaster. If `None`, no differencing is applied. The order of differentiation is the number of times the differencing operation is applied to a time series. Differencing involves computing the differences between consecutive data points in the series. Before returning a prediction, the differencing operation is reversed.	`None`
`dropna_from_series`	`bool`	Determines whether NaNs detected in the training matrices are dropped. Relevant when `y` or `exog` contain interspersed NaN values. If `True`, rows of `X_train` containing NaNs are dropped, together with the corresponding rows of `y_train`. If `False`, NaNs are left in `X_train` and a warning is issued.	`False`
`fit_kwargs`	`dict`	Additional arguments to be passed to the `fit` method of the estimator.	`None`
`binner_kwargs`	`dict`	Additional arguments to pass to the `QuantileBinner` used to discretize the residuals into k bins according to the predicted values associated with each residual. Available arguments are: `n_bins`, `method`, `subsample`, `random_state` and `dtype`. Argument `method` is passed internally to the function `numpy.percentile`. New in version 0.14.0	`None`
`forecaster_id`	`(str, int)`	Name used as an identifier of the forecaster.	`None`

Attributes:

Name	Type	Description
`estimator`	`estimator or pipeline compatible with the scikit-learn API`	An instance of an estimator or pipeline compatible with the scikit-learn API.
`lags`	`numpy ndarray`	Lags used as predictors.
`lags_names`	`list`	Names of the lags used as predictors.
`max_lag`	`int`	Maximum lag included in `lags`.
`window_features`	`list`	Class or list of classes used to create window features.
`window_features_names`	`list`	Names of the window features to be included in the `X_train` matrix.
`window_features_class_names`	`list`	Names of the classes used to create the window features.
`max_size_window_features`	`int`	Maximum window size required by the window features.
`calendar_features`	`object`	Instance of `CalendarFeatures` used to create calendar features from the datetime index.
`calendar_features_names`	`list`	Names of the calendar features to extract, taken from the `features` attribute of the `calendar_features` object.
`window_size`	`int`	The window size needed to create the predictors. It is calculated as the maximum of `max_lag` and `max_size_window_features`. If differentiation is used, `window_size` is increased by the order of differentiation so that predictors can be generated correctly.
`transformer_y`	`object transformer (preprocessor)`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods `fit`, `transform`, `fit_transform` and `inverse_transform`. ColumnTransformers are not allowed since they do not have an `inverse_transform` method. The transformation is applied to `y` before training the forecaster.
`transformer_exog`	`object transformer (preprocessor)`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers.
`weight_func`	`Callable`	Function that defines the individual weights for each sample based on the index. For example, a function that assigns a lower weight to certain dates. Ignored if `estimator` does not have the argument `sample_weight` in its `fit` method. The resulting `sample_weight` cannot have negative values.
`source_code_weight_func`	`str`	Source code of the custom function used to create weights.
`differentiation`	`int`	Order of differencing applied to the time series before training the forecaster.
`differentiation_max`	`int`	Maximum order of differentiation. For this Forecaster, it is equal to the value of the `differentiation` parameter.
`differentiator`	`TimeSeriesDifferentiator`	Skforecast object used to differentiate the time series.
`dropna_from_series`	`bool`	Determines whether NaNs detected in the training matrices are dropped.
`last_window_`	`pandas DataFrame`	The most recent data observed by the forecaster during training. It contains the values needed to predict the step immediately after the training data. These values are stored in the original scale of the time series, before any transformation or differentiation is applied. When the `differentiation` parameter is specified, `last_window_` is extended by as many values as the order of differentiation. For example, if `lags` = 7 and `differentiation` = 1, `last_window_` will have 8 values.
`index_type_`	`type`	Type of the index of the input data used in training.
`index_freq_`	`str`	Frequency of the index of the input data used in training.
`training_range_`	`pandas Index`	First and last values of the index of the data used during training.
`series_name_in_`	`str`	Name of the series provided by the user during training.
`exog_in_`	`bool`	Whether the forecaster has been trained using exogenous variables.
`exog_names_in_`	`list`	Names of the exogenous variables used during training.
`exog_type_in_`	`type`	Type of exogenous data (pandas Series or DataFrame) used in training.
`exog_dtypes_in_`	`dict`	Data type of each exogenous variable used in training, before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`.
`exog_dtypes_out_`	`dict`	Data type of each exogenous variable used in training, after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`.
`X_train_window_features_names_out_`	`list`	Names of the window features included in the matrix `X_train` created internally for training.
`X_train_calendar_features_names_out_`	`list`	Names of the calendar features included in the matrix `X_train` created internally for training.
`X_train_exog_names_out_`	`list`	Names of the exogenous variables included in the matrix `X_train` created internally for training. It can be different from `exog_names_in_` if some exogenous variables are transformed during the training process.
`X_train_features_names_out_`	`list`	Names of columns of the matrix created internally for training.
`categorical_features`	`(str, list)`	How categorical features are identified among the exogenous variables. It can be 'auto', a list of column names or `None`.
`categorical_features_names_in_`	`list`	Names of the exogenous variables considered as categorical.
`categorical_encoder`	`sklearn OrdinalEncoder`	`OrdinalEncoder` used internally to encode categorical features.
`fit_kwargs`	`dict`	Additional arguments to be passed to the `fit` method of the estimator.
`in_sample_residuals_`	`numpy ndarray`	Residuals of the model when predicting training data. Only stored up to 10_000 values. If `transformer_y` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`in_sample_residuals_by_bin_`	`dict`	In-sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_` in the form `{bin: residuals}`. If `transformer_y` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`out_sample_residuals_`	`numpy ndarray`	Residuals of the model when predicting non-training data. Only stored up to 10_000 values. Use the `set_out_sample_residuals()` method to set them. If `transformer_y` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`out_sample_residuals_by_bin_`	`dict`	Out-of-sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_` in the form `{bin: residuals}`. If `transformer_y` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`binner`	`QuantileBinner`	`QuantileBinner` used to discretize residuals into k bins according to the predicted values associated with each residual.
`binner_intervals_`	`dict`	Intervals used to discretize residuals into k bins according to the predicted values associated with each residual.
`binner_kwargs`	`dict`	Additional arguments to pass to the `QuantileBinner`.
`creation_date`	`str`	Date of creation.
`is_fitted`	`bool`	Tag to identify if the estimator has been fitted (trained).
`fit_date`	`str`	Date of last fit.
`skforecast_version`	`str`	Version of skforecast library used to create the forecaster.
`python_version`	`str`	Version of python used to create the forecaster.
`forecaster_id`	`(str, int)`	Name used as an identifier of the forecaster.
`__skforecast_tags__`	`dict`	Tags associated with the forecaster.
`_probabilistic_mode`	`(str, bool)`	Private attribute used to indicate whether the forecaster should perform some calculations during backtesting.
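
The rule for `window_size` in the table above can be written out directly. The function below is a hypothetical restatement of that rule (it mirrors the `max(...)` expression and the `+= differentiation` step in `__init__`), not a skforecast API. It reproduces the example given for `last_window_`: with `lags` = 7 and `differentiation` = 1, 8 values are needed.

```python
def compute_window_size(max_lag=None, max_size_window_features=None, differentiation=None):
    """Hypothetical helper restating how the forecaster sizes its window."""
    sizes = [ws for ws in (max_lag, max_size_window_features) if ws is not None]
    if not sizes:
        raise ValueError("At least one of `max_lag` or `max_size_window_features` is required.")
    window_size = max(sizes)
    if differentiation is not None:
        window_size += differentiation  # one extra observation per differencing order
    return window_size


print(compute_window_size(max_lag=7, differentiation=1))            # 8
print(compute_window_size(max_lag=3, max_size_window_features=24))  # 24
```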

Methods:

Name	Description
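`create_train_X_y`	Create training matrices from univariate time series and exogenous variables.
`create_sample_weights`	Create weights for each observation according to the forecaster's attribute `weight_func`.
`fit`	Train the forecaster.
`create_predict_X`	Create the predictors needed to predict `steps` ahead. As it is a recursive process, the predictors are created at each iteration of the prediction process.
`predict`	Predict n steps ahead. It is a recursive process in which each prediction is used as a predictor for the next step.
`predict_bootstrapping`	Generate multiple forecasting predictions using a bootstrapping process.
`predict_interval`	Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods.
`predict_quantiles`	Calculate the specified quantiles for each step. After generating multiple forecasting predictions through a bootstrapping process, each quantile is estimated for each step.
`predict_dist`	Fit a given probability distribution for each step. After generating multiple forecasting predictions through a bootstrapping process, the distribution is fitted to the predictions of each step.
`set_params`	Set new values for the parameters of the scikit-learn model stored in the forecaster.
`set_lags`	Set new value to the attribute `lags`. Attributes `lags_names`, `max_lag` and `window_size` are also updated.
`set_window_features`	Set new value to the attribute `window_features`. Attributes `window_features_names`, `window_features_class_names`, `max_size_window_features` and `window_size` are also updated.
`set_fit_kwargs`	Set new values for the additional keyword arguments passed to the `fit` method of the estimator.
`set_in_sample_residuals`	Set in-sample residuals in case they were not calculated during the training process.
`set_out_sample_residuals`	Set new values to the attribute `out_sample_residuals_`. Out-of-sample residuals are computed from observations that were not used to train the forecaster.
`get_feature_importances`	Return feature importances of the estimator stored in the forecaster.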
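
`predict_bootstrapping`, `predict_quantiles` and `predict_interval` build on the same recursive loop. The sketch below shows one common way to bootstrap a recursive forecaster; it rests on assumptions (a fixed AR(1) model, residuals drawn uniformly with replacement, a hypothetical helper `bootstrap_paths`) and is not skforecast's exact procedure, which can also draw from residuals binned by predicted value.

```python
import numpy as np


def bootstrap_paths(coef, last_value, residuals, steps, n_boot, seed=123):
    """
    Hypothetical helper: simulate `n_boot` recursive paths of the AR(1) model
    y_t = coef[0] + coef[1] * y_{t-1}, adding a resampled residual at each step.
    Returns an array of shape (steps, n_boot).
    """
    rng = np.random.default_rng(seed)
    # Residuals are drawn uniformly with replacement for every step and path.
    noise = rng.choice(residuals, size=(steps, n_boot), replace=True)
    paths = np.empty((steps, n_boot))
    prev = np.full(n_boot, float(last_value))
    for step in range(steps):
        # The noisy prediction of this step becomes lag 1 of the next step.
        prev = coef[0] + coef[1] * prev + noise[step]
        paths[step] = prev
    return paths


residuals = np.array([-0.5, -0.2, 0.0, 0.1, 0.6])  # e.g. stored in-sample residuals
paths = bootstrap_paths(coef=(1.0, 0.5), last_value=2.0, residuals=residuals,
                        steps=4, n_boot=500)

# Quantiles per step (rows: 10%, 50%, 90%; columns: steps). The 10% and 90%
# rows bound an 80% prediction interval.
quantiles = np.quantile(paths, [0.1, 0.5, 0.9], axis=1)
print(quantiles.shape)  # (3, 4)
```

Feeding the noisy value, rather than the point prediction, into the next step is what lets the spread of the paths widen with the horizon.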

Source code in skforecast\recursive\_forecaster_recursive.py

def __init__(
    self,
    estimator: object,
    lags: int | list[int] | np.ndarray[int] | range[int] | None = None,
    window_features: object | list[object] | None = None,
    calendar_features: object | None = None,
    transformer_y: object | None = None,
    transformer_exog: object | None = None,
    categorical_features: str | list[str] | None = 'auto',
    weight_func: Callable | None = None,
    differentiation: int | None = None,
    dropna_from_series: bool = False,
    fit_kwargs: dict[str, object] | None = None,
    binner_kwargs: dict[str, object] | None = None,
    forecaster_id: str | int | None = None
) -> None:

    self.estimator                            = clone(estimator)
    self.calendar_features                    = (
        clone(calendar_features) if calendar_features is not None else None
    )
    self.calendar_features_names              = getattr(calendar_features, 'features', None)
    self.transformer_y                        = transformer_y
    self.transformer_exog                     = transformer_exog
    self.categorical_features                 = categorical_features
    self.weight_func                          = weight_func
    self.source_code_weight_func              = None
    self.differentiation                      = differentiation
    self.differentiation_max                  = None
    self.differentiator                       = None
    self.dropna_from_series                   = dropna_from_series
    self.last_window_                         = None
    self.index_type_                          = None
    self.index_freq_                          = None
    self.training_range_                      = None
    self.series_name_in_                      = None
    self.exog_in_                             = False
    self.exog_names_in_                       = None
    self.exog_type_in_                        = None
    self.exog_dtypes_in_                      = None
    self.exog_dtypes_out_                     = None
    self.categorical_features_names_in_       = None
    self.X_train_window_features_names_out_   = None
    self.X_train_calendar_features_names_out_ = None
    self.X_train_exog_names_out_              = None
    self.X_train_features_names_out_          = None
    self.in_sample_residuals_                 = None
    self.out_sample_residuals_                = None
    self.in_sample_residuals_by_bin_          = None
    self.out_sample_residuals_by_bin_         = None
    self.creation_date                        = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.is_fitted                            = False
    self.fit_date                             = None
    self.skforecast_version                   = __version__
    self.python_version                       = sys.version.split(" ")[0]
    self.forecaster_id                        = forecaster_id
    self._probabilistic_mode                  = "binned"

    self.lags, self.lags_names, self.max_lag = initialize_lags(type(self).__name__, lags)
    self.lags_are_contiguous = (
        self.lags is not None
        and np.array_equal(self.lags, np.arange(1, self.max_lag + 1))
    )
    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    if self.window_features is None and self.lags is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ]

    if categorical_features is not None:
        if not (
            (isinstance(categorical_features, str) and categorical_features == 'auto')
            or isinstance(categorical_features, list)
        ):
            raise ValueError(
                f"Argument `categorical_features` must be `'auto'`, a list of "
                f"column names, or `None`. Got {categorical_features}."
            )
        if isinstance(categorical_features, list):
            if len(categorical_features) == 0:
                raise ValueError(
                    "Argument `categorical_features` must not be an empty list. "
                    "Use `None` to disable categorical encoding."
                )

    self.categorical_encoder = OrdinalEncoder(
                                   dtype                 = float,
                                   handle_unknown        = 'use_encoded_value',
                                   unknown_value         = np.nan,
                                   encoded_missing_value = np.nan
                               ).set_output(transform="pandas")

    self.weight_func, self.source_code_weight_func, _ = initialize_weights(
        forecaster_name = type(self).__name__, 
        estimator       = estimator, 
        weight_func     = weight_func, 
        series_weights  = None
    )

    if differentiation is not None:
        if not isinstance(differentiation, int) or differentiation < 1:
            raise ValueError(
                f"Argument `differentiation` must be an integer equal to or "
                f"greater than 1. Got {differentiation}."
            )
        self.differentiation = differentiation
        self.differentiation_max = differentiation
        self.window_size += differentiation
        self.differentiator = TimeSeriesDifferentiator(
            order=differentiation, window_size=self.window_size
        )

    self.fit_kwargs = check_select_fit_kwargs(
                          estimator  = estimator,
                          fit_kwargs = fit_kwargs
                      )

    self.binner_kwargs = binner_kwargs
    if binner_kwargs is None:
        self.binner_kwargs = {
            'n_bins': 10, 'method': 'linear', 'subsample': 200000,
            'random_state': 789654, 'dtype': np.float64
        }
    self.binner = QuantileBinner(**self.binner_kwargs)
    self.binner_intervals_ = None

    self.__skforecast_tags__ = {
        "library": "skforecast",
        "forecaster_name": "ForecasterRecursive",
        "forecaster_task": "regression",
        "forecasting_scope": "single-series",  # single-series | global
        "forecasting_strategy": "recursive",  # recursive | direct | deep_learning | foundation
        "multiple_estimators": False, 
        "index_types_supported": ["pandas.RangeIndex", "pandas.DatetimeIndex"],
        "requires_index_frequency": True,

        "allowed_input_types_series": ["pandas.Series"],
        "supports_exog": True,
        "allowed_input_types_exog": ["pandas.Series", "pandas.DataFrame"],
        "handles_missing_values_series": True, 
        "handles_missing_values_exog": True, 

        "supports_lags": True,
        "supports_window_features": True,
        "supports_calendar_features": True,
        "supports_transformer_series": True,
        "supports_transformer_exog": True,
        "supports_categorical_features": True,
        "supports_weight_func": True,
        "supports_differentiation": True,

        "prediction_types": ["point", "interval", "bootstrapping", "quantiles", "distribution"],
        "supports_probabilistic": True,
        "probabilistic_methods": ["bootstrapping", "conformal"],
        "handles_binned_residuals": True
    }
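
The default `binner_kwargs` set above (`n_bins=10`, `method='linear'`) hint at how residuals are grouped by predicted value. A rough NumPy sketch of the idea follows; the function `bin_residuals` and its edge handling are assumptions, not the internals of skforecast's `QuantileBinner`. Bin edges are percentiles of the predictions, and each bin keeps at most `10_000 // n_bins` residuals, as described for `in_sample_residuals_by_bin_`.

```python
import numpy as np


def bin_residuals(y_pred, residuals, n_bins=10, method="linear", max_total=10_000, seed=0):
    """
    Hypothetical helper: group residuals into `n_bins` quantile bins of the
    predicted values and cap each bin at `max_total // n_bins` residuals.
    """
    edges = np.percentile(y_pred, np.linspace(0, 100, n_bins + 1), method=method)
    # Digitize against the interior edges only, so the minimum and maximum
    # predictions fall in the first and last bin respectively.
    bin_ids = np.digitize(y_pred, edges[1:-1], right=False)
    rng = np.random.default_rng(seed)
    cap = max_total // n_bins
    by_bin = {}
    for b in range(n_bins):
        res_b = residuals[bin_ids == b]
        if len(res_b) > cap:
            res_b = rng.choice(res_b, size=cap, replace=False)  # random subsample
        by_bin[b] = res_b
    intervals = {b: (edges[b], edges[b + 1]) for b in range(n_bins)}
    return by_bin, intervals


rng = np.random.default_rng(42)
y_pred = rng.normal(size=50_000)
residuals = rng.normal(scale=0.1 + 0.05 * np.abs(y_pred))  # spread grows with |y_pred|
by_bin, intervals = bin_residuals(y_pred, residuals, n_bins=10)
print({b: len(r) for b, r in by_bin.items()})  # each of the 10 bins holds 1000 residuals
```

Binning lets a probabilistic forecast draw residuals from predictions of similar magnitude, which matters when the error variance depends on the level of the series, as in this toy data.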

Attributes

estimator `instance-attribute`


estimator = clone(estimator)

calendar_features `instance-attribute`


calendar_features = (
    clone(calendar_features)
    if calendar_features is not None
    else None
)

calendar_features_names `instance-attribute`


calendar_features_names = getattr(
    calendar_features, "features", None
)

transformer_y `instance-attribute`


transformer_y = transformer_y

transformer_exog `instance-attribute`


transformer_exog = transformer_exog

categorical_features `instance-attribute`


categorical_features = categorical_features

weight_func `instance-attribute`


weight_func = weight_func

source_code_weight_func `instance-attribute`


source_code_weight_func = None

differentiation `instance-attribute`


differentiation = differentiation

differentiation_max `instance-attribute`


differentiation_max = None

differentiator `instance-attribute`


differentiator = None

dropna_from_series `instance-attribute`


dropna_from_series = dropna_from_series

last_window_ `instance-attribute`


last_window_ = None

index_type_ `instance-attribute`


index_type_ = None

index_freq_ `instance-attribute`


index_freq_ = None

training_range_ `instance-attribute`


training_range_ = None

series_name_in_ `instance-attribute`


series_name_in_ = None

exog_in_ `instance-attribute`


exog_in_ = False

exog_names_in_ `instance-attribute`


exog_names_in_ = None

exog_type_in_ `instance-attribute`


exog_type_in_ = None

exog_dtypes_in_ `instance-attribute`


exog_dtypes_in_ = None

exog_dtypes_out_ `instance-attribute`


exog_dtypes_out_ = None

categorical_features_names_in_ `instance-attribute`


categorical_features_names_in_ = None

X_train_window_features_names_out_ `instance-attribute`


X_train_window_features_names_out_ = None

X_train_calendar_features_names_out_ `instance-attribute`


X_train_calendar_features_names_out_ = None

X_train_exog_names_out_ `instance-attribute`


X_train_exog_names_out_ = None

X_train_features_names_out_ `instance-attribute`


X_train_features_names_out_ = None

in_sample_residuals_ `instance-attribute`


in_sample_residuals_ = None

out_sample_residuals_ `instance-attribute`


out_sample_residuals_ = None

in_sample_residuals_by_bin_ `instance-attribute`


in_sample_residuals_by_bin_ = None

out_sample_residuals_by_bin_ `instance-attribute`


out_sample_residuals_by_bin_ = None

creation_date `instance-attribute`


creation_date = pd.Timestamp.today().strftime(
    "%Y-%m-%d %H:%M:%S"
)

is_fitted `instance-attribute`


is_fitted = False

fit_date `instance-attribute`


fit_date = None

skforecast_version `instance-attribute`


skforecast_version = __version__

python_version `instance-attribute`


python_version = sys.version.split(' ')[0]

forecaster_id `instance-attribute`


forecaster_id = forecaster_id

_probabilistic_mode `instance-attribute`


_probabilistic_mode = 'binned'

lags_are_contiguous `instance-attribute`


lags_are_contiguous = (
    self.lags is not None
    and np.array_equal(
        self.lags, np.arange(1, self.max_lag + 1)
    )
)

window_size `instance-attribute`


window_size = max(
    [
        ws
        for ws in [
            self.max_lag,
            self.max_size_window_features,
        ]
        if ws is not None
    ]
)

window_features_class_names `instance-attribute`


window_features_class_names = None

categorical_encoder `instance-attribute`


categorical_encoder = OrdinalEncoder(
    dtype=float,
    handle_unknown="use_encoded_value",
    unknown_value=np.nan,
    encoded_missing_value=np.nan,
).set_output(transform="pandas")

fit_kwargs `instance-attribute`


fit_kwargs = check_select_fit_kwargs(
    estimator=estimator, fit_kwargs=fit_kwargs
)

binner_kwargs `instance-attribute`


binner_kwargs = binner_kwargs

binner `instance-attribute`


binner = QuantileBinner(**(self.binner_kwargs))

binner_intervals_ `instance-attribute`


binner_intervals_ = None

Methods:

_repr_html_


_repr_html_()

HTML representation of the object. The "General Information" section is expanded by default.

Source code in skforecast\recursive\_forecaster_recursive.py

def _repr_html_(self) -> str:
    """
    HTML representation of the object.
    The "General Information" section is expanded by default.
    """

    (
        params,
        _,
        _,
        exog_names_in_,
        _,
    ) = self._preprocess_repr(
            estimator                       = self.estimator,
            exog_names_in_                  = self.exog_names_in_,
            categorical_features_names_in_  = self.categorical_features_names_in_,
            as_html                         = True,
        )

    style, unique_id = get_style_repr_html(self.is_fitted)

    content = f"""
    <div class="container-{unique_id}">
        <p style="font-size: 1.5em; font-weight: bold; margin-block-start: 0.83em; margin-block-end: 0.83em;">{type(self).__name__}</p>
        <details open>
            <summary>General Information</summary>
            <ul>
                <li><strong>Estimator:</strong> {type(self.estimator).__name__}</li>
                <li><strong>Lags:</strong> {self.lags}</li>
                <li><strong>Window features:</strong> {self.window_features_names}</li>
                <li><strong>Calendar features:</strong> {self.calendar_features_names}</li>
                <li><strong>Window size:</strong> {self.window_size}</li>
                <li><strong>Series name:</strong> {self.series_name_in_}</li>
                <li><strong>Exogenous included:</strong> {self.exog_in_}</li>
                <li><strong>Categorical features:</strong> {self.categorical_features}</li>
                <li><strong>Weight function included:</strong> {self.weight_func is not None}</li>
                <li><strong>Differentiation order:</strong> {self.differentiation}</li>
                <li><strong>Drop NaN from series:</strong> {self.dropna_from_series}</li>
                <li><strong>Creation date:</strong> {self.creation_date}</li>
                <li><strong>Last fit date:</strong> {self.fit_date}</li>
                <li><strong>Skforecast version:</strong> {self.skforecast_version}</li>
                <li><strong>Python version:</strong> {self.python_version}</li>
                <li><strong>Forecaster id:</strong> {self.forecaster_id}</li>
            </ul>
        </details>
        <details>
            <summary>Exogenous Variables</summary>
            <p style="margin: 0.2em 0 0.2em 1.5em;">{exog_names_in_}</p>
        </details>
        <details>
            <summary>Data Transformations</summary>
            <ul>
                <li><strong>Transformer for y:</strong> {self.transformer_y}</li>
                <li><strong>Transformer for exog:</strong> {self.transformer_exog}</li>
            </ul>
        </details>
        <details>
            <summary>Training Information</summary>
            <ul>
                <li><strong>Training range:</strong> {self.training_range_.to_list() if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index type:</strong> {str(self.index_type_).split('.')[-1][:-2] if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index frequency:</strong> {self.index_freq_.freqstr if hasattr(self.index_freq_, 'freqstr') else str(self.index_freq_) if self.is_fitted else 'Not fitted'}</li>
            </ul>
        </details>
        <details>
            <summary>Estimator Parameters</summary>
            <ul>
                {params}
            </ul>
        </details>
        <details>
            <summary>Fit Kwargs</summary>
            <ul>
                {self.fit_kwargs}
            </ul>
        </details>
        <p>
            <a href="https://skforecast.org/{__version__}/api/forecasterrecursive.html">&#128214; <strong>API Reference</strong></a>
            &nbsp;&nbsp;
            <a href="https://skforecast.org/{__version__}/user_guides/autoregressive-forecaster.html">&#128221; <strong>User Guide</strong></a>
        </p>
    </div>
    """

    return style + content

_create_lags


_create_lags(y, X_as_pandas=False, train_index=None)

Create the lagged values and their target variable from a time series.

Note that the returned matrix `X_data` contains lag 1 in the first column, lag 2 in the second column, and so on.

Parameters:

Name	Type	Description	Default
`y`	`numpy ndarray`	Training time series values.	required
`X_as_pandas`	`bool`	If `True`, the returned matrix `X_data` is a pandas DataFrame.	`False`
`train_index`	`pandas Index`	Index of the training data. It is used to create the pandas DataFrame `X_data` when `X_as_pandas` is `True`.	`None`

Returns:

Name	Type	Description
`X_data`	`numpy ndarray, pandas DataFrame, None`	Lagged values (predictors).
`y_data`	`numpy ndarray`	Values of the time series related to each row of `X_data`.

Notes

The returned matrices may be views into the original `y`, so care must be taken when modifying them.

Source code in skforecast\recursive\_forecaster_recursive.py

def _create_lags(
    self,
    y: np.ndarray,
    X_as_pandas: bool = False,
    train_index: pd.Index | None = None
) -> tuple[np.ndarray | pd.DataFrame | None, np.ndarray]:
    """
    Create the lagged values and their target variable from a time series.

    Note that the returned matrix `X_data` contains the lag 1 in the first 
    column, the lag 2 in the second column and so on.

    Parameters
    ----------
    y : numpy ndarray
        Training time series values.
    X_as_pandas : bool, default False
        If `True`, the returned matrix `X_data` is a pandas DataFrame.
    train_index : pandas Index, default None
        Index of the training data. It is used to create the pandas DataFrame
        `X_data` when `X_as_pandas` is `True`.

    Returns
    -------
    X_data : numpy ndarray, pandas DataFrame, None
        Lagged values (predictors).
    y_data : numpy ndarray
        Values of the time series related to each row of `X_data`.

    Notes
    -----
    Returned matrices may be views into the original `y` so care must be taken
    when modifying them.

    """

    X_data = None
    if self.lags is not None:
        y_strided = np.lib.stride_tricks.sliding_window_view(y, self.window_size)[:-1]
        if self.lags_are_contiguous:
            # Basic slice → view (no copy); reversed to put lag_1 first.
            X_data = y_strided[:, self.window_size - self.max_lag:][:, ::-1]
        else:
            # Non-contiguous lags require fancy indexing, which forces a copy.
            X_data = y_strided[:, self.window_size - self.lags]

        if X_as_pandas:
            X_data = pd.DataFrame(
                         data    = X_data,
                         columns = self.lags_names,
                         index   = train_index
                     )

    y_data = y[self.window_size:]

    return X_data, y_data

_create_window_features


_create_window_features(y, train_index, X_as_pandas=False)

Create window features from a time series.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`train_index`	`pandas Index`	Index of the training data. It is used to create the pandas DataFrame `X_train_window_features` when `X_as_pandas` is `True`.	required
`X_as_pandas`	`bool`	If `True`, the returned matrix `X_train_window_features` is a pandas DataFrame.	`False`

Returns:

Name	Type	Description
`X_train_window_features`	`list`	List of numpy ndarrays or pandas DataFrames with the window features.
`X_train_window_features_names_out_`	`list`	Names of the window features.

Source code in skforecast\recursive\_forecaster_recursive.py

def _create_window_features(
    self, 
    y: pd.Series,
    train_index: pd.Index,
    X_as_pandas: bool = False,
) -> tuple[list[np.ndarray | pd.DataFrame], list[str]]:
    """
    Create window features from a time series.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    train_index : pandas Index
        Index of the training data. It is used to create the pandas DataFrame
        `X_train_window_features` when `X_as_pandas` is `True`.
    X_as_pandas : bool, default False
        If `True`, the returned matrix `X_train_window_features` is a 
        pandas DataFrame.

    Returns
    -------
    X_train_window_features : list
        List of numpy ndarrays or pandas DataFrames with the window features.
    X_train_window_features_names_out_ : list
        Names of the window features.

    """

    len_train_index = len(train_index)
    X_train_window_features = []
    X_train_window_features_names_out_ = []
    for wf in self.window_features:
        X_train_wf = wf.transform_batch(y)
        if not isinstance(X_train_wf, pd.DataFrame):
            raise TypeError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a pandas DataFrame."
            )
        X_train_wf = X_train_wf.iloc[-len_train_index:]
        if not len(X_train_wf) == len_train_index:
            raise ValueError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a DataFrame with the same number of rows as "
                f"the input time series - `window_size`: {len_train_index}."
            )
        if not X_train_wf.index.equals(train_index):
            raise ValueError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a DataFrame with the same index as "
                f"the input time series - `window_size`."
            )

        X_train_window_features_names_out_.extend(X_train_wf.columns)
        if not X_as_pandas:
            X_train_wf = X_train_wf.to_numpy()
        X_train_window_features.append(X_train_wf)

    return X_train_window_features, X_train_window_features_names_out_
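Each window-feature object only has to satisfy the `transform_batch` contract checked above: return a pandas DataFrame whose last `len(train_index)` rows carry exactly `train_index`. A minimal sketch with a hypothetical `RollingMeanSketch` class (not skforecast's `RollingFeatures`; it implements only `transform_batch`, not the full interface a forecaster expects) and a helper that repeats the same checks:

```python
import numpy as np
import pandas as pd


class RollingMeanSketch:
    """Hypothetical window feature: mean of the previous `window` values."""

    def __init__(self, window):
        self.window = window

    def transform_batch(self, y: pd.Series) -> pd.DataFrame:
        # shift(1) so the row for time t only uses values up to t-1,
        # matching the way lags are aligned with the target.
        means = y.rolling(self.window).mean().shift(1)
        return means.to_frame(name=f"roll_mean_{self.window}")


def check_window_feature(wf, y, train_index):
    """Reproduce the validation applied to each `transform_batch` output."""
    X_wf = wf.transform_batch(y)
    if not isinstance(X_wf, pd.DataFrame):
        raise TypeError("`transform_batch` must return a pandas DataFrame.")
    X_wf = X_wf.iloc[-len(train_index):]
    if len(X_wf) != len(train_index) or not X_wf.index.equals(train_index):
        raise ValueError("Rows or index do not match `train_index`.")
    return X_wf


y = pd.Series(np.arange(10.0), index=pd.date_range("2024-01-01", periods=10, freq="D"))
train_index = y.index[3:]  # window_size = 3
X_wf = check_window_feature(RollingMeanSketch(3), y, train_index)
```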

_create_train_X_y


_create_train_X_y(y, exog=None)

Create training matrices from univariate time series and exogenous variables.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y`, or as `y` minus its first `window_size` observations, and their indexes must be aligned.	`None`
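The two `exog` lengths accepted by the source below can be sketched with plain pandas. `align_exog_sketch` is a hypothetical helper, not a skforecast function; it returns the `exog` rows that line up with the training targets.

```python
import pandas as pd


def align_exog_sketch(exog, y_index, window_size):
    """Return the exog rows aligned with the training targets."""
    train_index = y_index[window_size:]
    if len(exog) == len(y_index):
        if not exog.index.equals(y_index):
            raise ValueError("`exog` index must match the index of `y`.")
        # The first `window_size` rows have no training target: drop them.
        return exog.iloc[window_size:]
    if len(exog) == len(train_index):
        if not exog.index.equals(train_index):
            raise ValueError(
                "`exog` index must match the index of `y` minus its first "
                "`window_size` observations."
            )
        return exog
    raise ValueError("Length of `exog` matches neither option.")


y_index = pd.RangeIndex(10)
exog_full = pd.DataFrame({"x": range(10)}, index=y_index)
exog_short = exog_full.iloc[3:]
```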

Returns:

Name	Type	Description
`X_train`	`numpy ndarray`	Training values (predictors).
`y_train`	`numpy ndarray`	Values of the time series related to each row of `X_train`.
`train_index`	`pandas Index`	Index of the training data.
`exog_names_in_`	`list`	Names of the exogenous variables used during training.
`categorical_features_names_in_`	`list`	Names of the exogenous variables considered as categorical.
`X_train_window_features_names_out_`	`list`	Names of the window features included in the matrix `X_train` created internally for training.
`X_train_calendar_features_names_out_`	`list`	Names of the calendar features included in the matrix `X_train` created internally for training.
`X_train_exog_names_out_`	`list`	Names of the exogenous variables included in the matrix `X_train` created internally for training. It can be different from `exog_names_in_` if some exogenous variables are transformed during the training process.
`X_train_features_names_out_`	`list`	Names of the columns of the matrix created internally for training.
`exog_dtypes_in_`	`dict`	Type of each exogenous variable/s used in training before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`.
`exog_dtypes_out_`	`dict`	Type of each exogenous variable/s used in training after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`.

Notes

If `y` or `exog` contain interspersed NaN values, rows where `y_train` is NaN are always removed. Rows where `X_train` contains NaN (from lagged NaN in `y` or from NaN in `exog`) are removed only if `dropna_from_series=True`; otherwise a warning is issued.
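The filtering order described in this note can be reproduced on a toy lag-1 matrix. `filter_nan_rows_sketch` is a hypothetical helper: it uses `np.isnan` and therefore assumes an all-float matrix (the source uses `pd.isna`), and it emits one generic warning only for NaN features that are kept, whereas skforecast emits `MissingValuesWarning` in more cases.

```python
import warnings

import numpy as np


def filter_nan_rows_sketch(X, y, index, dropna_from_series=False):
    # Rows whose target is NaN are always removed.
    keep = ~np.isnan(y)
    X, y, index = X[keep], y[keep], index[keep]
    row_has_nan = np.isnan(X).any(axis=1)
    if dropna_from_series:
        # Optionally remove rows with NaN predictors as well.
        keep = ~row_has_nan
        X, y, index = X[keep], y[keep], index[keep]
    elif row_has_nan.any():
        warnings.warn("NaNs detected in X; some estimators cannot handle them.")
    if len(y) == 0:
        raise ValueError("All samples have been removed due to NaNs.")
    return X, y, index


y_full = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 6.0])
X = y_full[:-1].reshape(-1, 1)  # lag 1
target = y_full[1:]
idx = np.arange(1, 6)           # position of each target in y_full
```

Here the target at position 2 is NaN (always dropped), and the target at position 3 has NaN as its lag 1 (kept with a warning, or dropped when `dropna_from_series=True`).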

Source code in skforecast\recursive\_forecaster_recursive.py

def _create_train_X_y(
    self,
    y: pd.Series,
    exog: pd.Series | pd.DataFrame | None = None
) -> tuple[
    np.ndarray, 
    np.ndarray, 
    pd.Index,
    list[str], 
    list[str], 
    list[str], 
    list[str], 
    list[str], 
    list[str], 
    dict[str, type],
    dict[str, type]
]:
    """
    Create training matrices from univariate time series and exogenous
    variables.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned.

    Returns
    -------
    X_train : numpy ndarray
        Training values (predictors).
    y_train : numpy ndarray
        Values of the time series related to each row of `X_train`.
    train_index : pandas Index
        Index of the training data.
    exog_names_in_ : list
        Names of the exogenous variables used during training.
    categorical_features_names_in_ : list
        Names of the exogenous variables considered as categorical.
    X_train_window_features_names_out_ : list
        Names of the window features included in the matrix `X_train` created
        internally for training.
    X_train_calendar_features_names_out_ : list
        Names of the calendar features included in the matrix `X_train` created
        internally for training.
    X_train_exog_names_out_ : list
        Names of the exogenous variables included in the matrix `X_train` created
        internally for training. It can be different from `exog_names_in_` if
        some exogenous variables are transformed during the training process.
    X_train_features_names_out_ : list
        Names of the columns of the matrix created internally for training.
    exog_dtypes_in_ : dict
        Type of each exogenous variable/s used in training before the transformation
        applied by `transformer_exog`. If `transformer_exog` is not used, it
        is equal to `exog_dtypes_out_`.
    exog_dtypes_out_ : dict
        Type of each exogenous variable/s used in training after the transformation
        applied by `transformer_exog`. If `transformer_exog` is not used, it 
        is equal to `exog_dtypes_in_`.

    Notes
    -----
    If `y` or `exog` contain interspersed NaN values, rows where `y_train`
    is NaN are always removed. Rows where `X_train` contains NaN (from
    lagged NaN in `y` or from NaN in `exog`) are removed only if
    `dropna_from_series=True`; otherwise a warning is issued.

    """

    check_y(y=y, allow_nan=True)
    y = input_to_frame(data=y, input_name='y')

    if len(y) <= self.window_size:
        raise ValueError(
            f"Length of `y` must be greater than the maximum window size "
            f"needed by the forecaster.\n"
            f"    Length `y`: {len(y)}.\n"
            f"    Max window size: {self.window_size}.\n"
            f"    Lags window size: {self.max_lag}.\n"
            f"    Window features window size: {self.max_size_window_features}."
        )

    fit_transformer = not self.is_fitted
    y = transform_dataframe(
            df                  = y, 
            transformer         = self.transformer_y,
            fit                 = fit_transformer,
            inverse_transform   = False,
            force_single_column = True
        )

    y_values, y_index = check_extract_values_and_index(data=y, data_label='`y`')
    train_index = y_index[self.window_size:]

    if self.calendar_features is not None:
        if not isinstance(train_index, pd.DatetimeIndex):
            raise TypeError(
                "When `calendar_features` is not `None`, the index of `y` must "
                "be a pandas DatetimeIndex."
            )

    if self.differentiation is not None:
        if not self.is_fitted:
            y_values = self.differentiator.fit_transform(y_values)
        else:
            differentiator = copy(self.differentiator)
            y_values = differentiator.fit_transform(y_values)

    exog_names_in_ = None
    exog_dtypes_in_ = None
    exog_dtypes_out_ = None
    X_train_exog_names_out_ = None
    categorical_features_names_in_ = None
    if exog is not None:
        check_exog(exog=exog, allow_nan=True)
        exog = input_to_frame(data=exog, input_name='exog')
        _, exog_index = check_extract_values_and_index(
            data=exog, data_label='`exog`', ignore_freq=True, return_values=False
        )

        len_y = len(y_values)
        len_train_index = len(train_index)
        len_exog = len(exog)
        if not len_exog == len_y and not len_exog == len_train_index:
            raise ValueError(
                f"Length of `exog` must be equal to the length of `y` (if index is "
                f"fully aligned) or length of `y` - `window_size` (if `exog` "
                f"starts after the first `window_size` values).\n"
                f"    `exog`              : ({exog_index[0]} -- {exog_index[-1]})  (n={len_exog})\n"
                f"    `y`                 : ({y.index[0]} -- {y.index[-1]})  (n={len_y})\n"
                f"    `y` - `window_size` : ({train_index[0]} -- {train_index[-1]})  (n={len_train_index})"
            )

        exog_names_in_ = exog.columns.to_list()
        exog_dtypes_in_ = get_exog_dtypes(exog=exog)

        exog = transform_dataframe(
                   df                = exog,
                   transformer       = self.transformer_exog,
                   fit               = fit_transformer,
                   inverse_transform = False
               )

        if self.categorical_features is not None:
            if self.categorical_features == 'auto':
                categorical_features_names_in_ = [
                    col for col, dtype in exog.dtypes.items()
                    if not pd.api.types.is_numeric_dtype(dtype)
                    and not pd.api.types.is_bool_dtype(dtype)
                ]
            else:
                missing_cols = set(self.categorical_features) - set(exog.columns)
                if missing_cols:
                    raise ValueError(
                        f"The following columns specified in `categorical_features` "
                        f"are not present in `exog` after `transformer_exog`: "
                        f"{missing_cols}."
                    )
                categorical_features_names_in_ = list(self.categorical_features)

            if categorical_features_names_in_: 
                # This copy is only necessary if `transformer_exog` is not used
                if self.transformer_exog is None:
                    exog = exog.copy()
                if fit_transformer:
                    exog[categorical_features_names_in_] = (
                        self.categorical_encoder.fit_transform(
                            exog[categorical_features_names_in_]
                        )
                    )
                else:
                    exog[categorical_features_names_in_] = (
                        self.categorical_encoder.transform(
                            exog[categorical_features_names_in_]
                        )
                    )

        if self.categorical_features is None:
            check_exog_dtypes(exog, call_check_exog=False)

        X_train_exog_names_out_ = exog.columns.to_list()
        exog_dtypes_out_ = get_exog_dtypes(exog=exog)

        exog = exog.to_numpy()

        if len_exog == len_y:
            if not exog_index.equals(y_index):
                raise ValueError(
                    "When `exog` has the same length as `y`, the index of "
                    "`exog` must be aligned with the index of `y` "
                    "to ensure the correct alignment of values."
                )
            # The first `self.window_size` positions have to be removed from 
            # exog since they are not in X_train.
            exog = exog[self.window_size:, ]
        else:
            if not exog_index.equals(train_index):
                raise ValueError(
                    "When `exog` doesn't contain the first `window_size` observations, "
                    "the index of `exog` must be aligned with the index of `y` minus "
                    "the first `window_size` observations to ensure the correct "
                    "alignment of values."
                )

    X_train = []
    X_train_features_names_out_ = []

    X_train_lags, y_train = self._create_lags(
        y=y_values, train_index=train_index
    )
    if X_train_lags is not None:
        X_train.append(X_train_lags)
        X_train_features_names_out_.extend(self.lags_names)

    X_train_window_features_names_out_ = None
    if self.window_features is not None:
        n_diff = 0 if self.differentiation is None else self.differentiation
        y_window_features = pd.Series(y_values[n_diff:], index=y_index[n_diff:])
        X_train_window_features, X_train_window_features_names_out_ = (
            self._create_window_features(
                y=y_window_features, train_index=train_index
            )
        )
        X_train.extend(X_train_window_features)
        X_train_features_names_out_.extend(X_train_window_features_names_out_)

    if exog is not None:
        X_train.append(exog)
        X_train_features_names_out_.extend(X_train_exog_names_out_)

    X_train_calendar_features_names_out_ = None
    if self.calendar_features is not None:
        X_train_calendar = self.calendar_features.fit_transform(train_index).to_numpy()
        X_train.append(X_train_calendar)
        X_train_calendar_features_names_out_ = self.calendar_features.feature_names_out_
        X_train_features_names_out_.extend(X_train_calendar_features_names_out_)

    if len(X_train_features_names_out_) != len(set(X_train_features_names_out_)):
        duplicated_names = [
            name for name in set(X_train_features_names_out_)
            if X_train_features_names_out_.count(name) > 1
        ]
        raise ValueError(
            f"Duplicated feature names detected in X_train: {duplicated_names}."
        )

    if len(X_train) == 1:
        X_train = X_train[0]
    else:
        X_train = np.concatenate(X_train, axis=1)

    # --- NaN row filtering (interspersed NaN support) ---
    if np.isnan(y_train).any():
        mask = ~np.isnan(y_train)
        y_train = y_train[mask]
        X_train = X_train[mask]
        train_index = train_index[mask]
        warnings.warn(
            "NaNs detected in `y_train`. They have been dropped because the "
            "target variable cannot have NaN values. Same rows have been "
            "dropped from `X_train` to maintain alignment. This is caused by "
            "interspersed NaNs in `y`.",
            MissingValuesWarning
        )

    if self.dropna_from_series:
        nan_rows = pd.isna(X_train).any(axis=1)
        if nan_rows.any():
            mask = ~nan_rows
            X_train = X_train[mask]
            y_train = y_train[mask]
            train_index = train_index[mask]
            warnings.warn(
                "NaNs detected in `X_train`. They have been dropped. If "
                "you want to keep them, set `forecaster.dropna_from_series = False`. "
                "Same rows have been removed from `y_train` to maintain alignment. "
                "This is caused by interspersed NaNs in `y` or `exog`.",
                MissingValuesWarning
            )
    else:
        if pd.isna(X_train).any():
            warnings.warn(
                "NaNs detected in `X_train`. Some estimators do not allow "
                "NaN values during training. If you want to drop them, "
                "set `forecaster.dropna_from_series = True`.",
                MissingValuesWarning
            )

    if len(y_train) == 0:
        raise ValueError(
            "All samples have been removed due to NaNs. Set "
            "`forecaster.dropna_from_series = False` or review `y` and "
            "`exog` values."
        )

    return (
        X_train,
        y_train,
        train_index,
        exog_names_in_,
        categorical_features_names_in_,
        X_train_window_features_names_out_,
        X_train_calendar_features_names_out_,
        X_train_exog_names_out_,
        X_train_features_names_out_,
        exog_dtypes_in_,
        exog_dtypes_out_
    )
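With `categorical_features='auto'`, the columns to encode are selected by a dtype test on `exog` after `transformer_exog`. A standalone sketch of that rule (the helper name is hypothetical):

```python
import pandas as pd


def auto_categorical_columns(exog: pd.DataFrame) -> list:
    # Same test as the source: every column that is neither numeric nor
    # boolean is treated as categorical.
    return [
        col for col, dtype in exog.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
        and not pd.api.types.is_bool_dtype(dtype)
    ]


exog = pd.DataFrame({
    "temp": [20.5, 21.0, 19.8],
    "holiday": [True, False, False],
    "weekday": pd.Categorical(["mon", "tue", "wed"]),
    "store": ["a", "b", "a"],
})
```

Numeric and boolean columns pass through unchanged; `object`, string and `category` columns are handed to the categorical encoder.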

create_train_X_y


create_train_X_y(y, exog=None, suppress_warnings=False)

Create training matrices from univariate time series and exogenous variables.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y`, or as `y` minus its first `window_size` observations, and their indexes must be aligned.	`None`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the creation of the training matrices. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`X_train`	`pandas DataFrame`	Training values (predictors).
`y_train`	`pandas Series`	Values of the time series related to each row of `X_train`.

Notes

If `y` or `exog` contain interspersed NaN values, rows where `y_train` is NaN are always removed. Rows where `X_train` contains NaN (from lagged NaN in `y` or from NaN in `exog`) are removed only if `dropna_from_series=True`; otherwise a warning is issued.

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def create_train_X_y(
    self,
    y: pd.Series,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> tuple[pd.DataFrame, pd.Series]:
    """
    Create training matrices from univariate time series and exogenous
    variables.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the creation
        of the training matrices. See skforecast.exceptions.warn_skforecast_categories 
        for more information.

    Returns
    -------
    X_train : pandas DataFrame
        Training values (predictors).
    y_train : pandas Series
        Values of the time series related to each row of `X_train`.

    Notes
    -----
    If `y` or `exog` contain interspersed NaN values, rows where `y_train`
    is NaN are always removed. Rows where `X_train` contains NaN (from
    lagged NaN in `y` or from NaN in `exog`) are removed only if
    `dropna_from_series=True`; otherwise a warning is issued.

    """

    (
        X_train,
        y_train,
        train_index,
        _,
        _,
        _,
        _,
        _,
        X_train_features_names_out_,
        _,
        exog_dtypes_out_
    ) = self._create_train_X_y(y=y, exog=exog)

    X_train = pd.DataFrame(
                  data    = X_train,
                  index   = train_index,
                  columns = X_train_features_names_out_
              )

    if exog_dtypes_out_ is not None:
        X_train_dtypes = {col: float for col in X_train_features_names_out_}
        X_train_dtypes.update(exog_dtypes_out_)
        X_train = X_train.astype(X_train_dtypes, copy=False)

    y_train = pd.Series(
                  data  = y_train,
                  index = train_index,
                  name  = 'y'
              )

    return X_train, y_train
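Because the internal matrices are stacked with `np.concatenate`, all columns share a single NumPy dtype. The public method therefore rebuilds a DataFrame, defaults every column to `float`, and restores the exogenous dtypes recorded after `transformer_exog`. A minimal sketch of that step; the column names, values and `exog_dtypes_out` dictionary are made-up assumptions:

```python
import numpy as np
import pandas as pd

# Hypothetical outputs of the internal step.
X_values = np.array([[2.0, 1.0, 0.0], [3.0, 2.0, 1.0]])
names = ["lag_1", "lag_2", "is_holiday"]
exog_dtypes_out = {"is_holiday": np.dtype("int64")}

X_train = pd.DataFrame(X_values, index=pd.RangeIndex(3, 5), columns=names)
# Every column defaults to float; exogenous columns get their own dtype back.
dtypes = {col: float for col in names}
dtypes.update(exog_dtypes_out)
X_train = X_train.astype(dtypes)
```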

_train_test_split_one_step_ahead


_train_test_split_one_step_ahead(
    y, initial_train_size, exog=None
)

Create matrices needed to train and test the forecaster for one-step-ahead predictions. Uses `_create_train_X_y` to work directly with numpy arrays and precomputes sample weights and fit kwargs (including categorical feature configuration) so they are computed once rather than per trial.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`initial_train_size`	`int`	Initial size of the training set. It is the number of observations used to train the forecaster before making the first prediction.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and their indexes must be aligned.	`None`

Returns:

Name	Type	Description
`X_train`	`numpy ndarray`	Predictor values used to train the model.
`y_train`	`numpy ndarray`	Target values related to each row of `X_train`.
`X_test`	`numpy ndarray`	Predictor values used to test the model.
`y_test`	`numpy ndarray`	Target values related to each row of `X_test`.
`sample_weight`	`numpy ndarray, None`	Precomputed sample weights for training. `None` if `weight_func` is `None`.
`fit_kwargs`	`dict`	Precomputed keyword arguments for `estimator.fit`, including categorical feature configuration.

Source code in skforecast\recursive\_forecaster_recursive.py

def _train_test_split_one_step_ahead(
    self,
    y: pd.Series,
    initial_train_size: int,
    exog: pd.Series | pd.DataFrame | None = None
) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray, np.ndarray | None, dict[str, object]]:
    """
    Create matrices needed to train and test the forecaster for one-step-ahead
    predictions. Uses `_create_train_X_y` to work directly with numpy arrays
    and precomputes sample weights and fit kwargs (including categorical
    feature configuration) so they are computed once rather than per trial.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    initial_train_size : int
        Initial size of the training set. It is the number of observations used
        to train the forecaster before making the first prediction.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned.

    Returns
    -------
    X_train : numpy ndarray
        Predictor values used to train the model.
    y_train : numpy ndarray
        Target values related to each row of `X_train`.
    X_test : numpy ndarray
        Predictor values used to test the model.
    y_test : numpy ndarray
        Target values related to each row of `X_test`.
    sample_weight : numpy ndarray, None
        Precomputed sample weights for training. `None` if no `weight_func`.
    fit_kwargs : dict
        Precomputed keyword arguments for `estimator.fit`, including
        categorical feature configuration.

    """

    is_fitted = self.is_fitted
    self.is_fitted = False

    (
        X_train,
        y_train,
        train_index,
        _,
        categorical_features_names_in_,
        _,
        _,
        _,
        X_train_features_names_out_,
        _,
        _
    ) = self._create_train_X_y(
            y    = y.iloc[:initial_train_size],
            exog = exog.iloc[:initial_train_size] if exog is not None else None
        )

    test_init = initial_train_size - self.window_size
    self.is_fitted = True

    (
        X_test,
        y_test,
        *_
    ) = self._create_train_X_y(
            y    = y.iloc[test_init:],
            exog = exog.iloc[test_init:] if exog is not None else None
        )

    self.is_fitted = is_fitted

    sample_weight = self.create_sample_weights(X_train=train_index)

    if self.categorical_features is not None:
        fit_kwargs = configure_estimator_categorical_features(
                         estimator                      = self.estimator,
                         categorical_features_names_in_ = categorical_features_names_in_,
                         X_train_features_names_out_    = X_train_features_names_out_,
                         fit_kwargs                     = {**self.fit_kwargs}
                     )
    else:
        fit_kwargs = {**self.fit_kwargs}

    X_train = cast_catboost_categorical_columns(
        X=X_train, fit_kwargs=fit_kwargs, estimator=self.estimator
    )
    X_test = cast_catboost_categorical_columns(
        X=X_test, fit_kwargs=fit_kwargs, estimator=self.estimator
    )

    return X_train, y_train, X_test, y_test, sample_weight, fit_kwargs
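The index arithmetic of the split can be sketched with pandas alone. Between the two `_create_train_X_y` calls the method temporarily sets `is_fitted`, so transformers and the categorical encoder are fitted on the training block only and merely applied to the test block. The series below is hypothetical data:

```python
import pandas as pd

# Hypothetical data: 20 daily observations.
y = pd.Series(
    range(20), index=pd.date_range("2024-01-01", periods=20, freq="D"), dtype=float
)
initial_train_size, window_size = 12, 3

# Training block: its first `window_size` values only provide lags.
train_block = y.iloc[:initial_train_size]
train_target_index = train_block.index[window_size:]

# The test block starts `window_size` steps earlier so that its first
# target, position `initial_train_size`, has a complete window of history.
test_init = initial_train_size - window_size
test_block = y.iloc[test_init:]
test_target_index = test_block.index[window_size:]
```

Training and test targets are contiguous and never overlap, even though the two input blocks share `window_size` observations.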

create_sample_weights


create_sample_weights(X_train)

Create weights for each observation according to the forecaster's attribute weight_func.

Parameters:

Name	Type	Description	Default
`X_train`	`pandas DataFrame, pandas Index`	DataFrame returned as the first output of `create_train_X_y`, or its index.	required

Returns:

Name	Type	Description
`sample_weight`	`numpy ndarray, None`	Weights to pass to the `fit` method of the estimator. `None` if `weight_func` is `None`.

Source code in skforecast\recursive\_forecaster_recursive.py

def create_sample_weights(
    self,
    X_train: pd.DataFrame | pd.Index,
) -> np.ndarray | None:
    """
    Create weights for each observation according to the forecaster's attribute
    `weight_func`.

    Parameters
    ----------
    X_train : pandas DataFrame, pandas Index
        DataFrame returned as the first output of `create_train_X_y`, or
        its index.

    Returns
    -------
    sample_weight : numpy ndarray, None
        Weights to pass to the `fit` method of the estimator. `None` if
        `weight_func` is `None`.

    """

    sample_weight = None

    if self.weight_func is not None:
        sample_weight = self.weight_func(
            X_train.index if isinstance(X_train, pd.DataFrame) else X_train
        )

    if sample_weight is not None:
        if np.isnan(sample_weight).any():
            raise ValueError(
                "The resulting `sample_weight` cannot have NaN values."
            )
        if np.any(sample_weight < 0):
            raise ValueError(
                "The resulting `sample_weight` cannot have negative values."
            )
        if np.sum(sample_weight) == 0:
            raise ValueError(
                "The resulting `sample_weight` cannot be normalized because "
                "the sum of the weights is zero."
            )

    return sample_weight
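`weight_func` is called with the index of the training rows (`train_index`), so it can assign weights by date. A minimal sketch, assuming a daily `DatetimeIndex`; `custom_weights` is a hypothetical function and `validate_weights` simply repeats the three checks in the source above:

```python
import numpy as np
import pandas as pd


def custom_weights(index: pd.DatetimeIndex) -> np.ndarray:
    """Hypothetical `weight_func`: ignore a known anomalous period."""
    in_anomaly = (index >= "2024-01-05") & (index <= "2024-01-07")
    return np.where(in_anomaly, 0.0, 1.0)


def validate_weights(sample_weight: np.ndarray) -> np.ndarray:
    # Same three checks as `create_sample_weights`.
    if np.isnan(sample_weight).any():
        raise ValueError("The resulting `sample_weight` cannot have NaN values.")
    if np.any(sample_weight < 0):
        raise ValueError("The resulting `sample_weight` cannot have negative values.")
    if np.sum(sample_weight) == 0:
        raise ValueError("The resulting `sample_weight` cannot sum to zero.")
    return sample_weight


idx = pd.date_range("2024-01-01", periods=10, freq="D")
weights = validate_weights(custom_weights(idx))
```

Passing such a function as `weight_func=custom_weights` when creating the forecaster would give zero weight to the 5 to 7 January observations while keeping the rest at 1.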

fit


fit(
    y,
    exog=None,
    store_last_window=True,
    store_in_sample_residuals=False,
    random_state=123,
    suppress_warnings=False,
)

Train the forecaster.

Additional arguments for the `fit` method of the estimator can be passed through the `fit_kwargs` argument when initializing the forecaster.
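As the source below shows, `fit` copies the stored `fit_kwargs` (`{**self.fit_kwargs}`), may add categorical configuration to that copy, and forwards it to `estimator.fit` together with `sample_weight` when weights exist. A minimal sketch of that forwarding; `RecordingEstimator`, `fit_estimator_sketch` and the `cat_cols` key are hypothetical stand-ins, not skforecast or scikit-learn names:

```python
class RecordingEstimator:
    """Hypothetical estimator that records what `fit` receives."""

    def fit(self, X, y, sample_weight=None, **kwargs):
        self.received = {"sample_weight": sample_weight, **kwargs}
        return self


def fit_estimator_sketch(estimator, X, y, fit_kwargs, sample_weight=None,
                         categorical_kwargs=None):
    # Copy so per-fit additions never mutate the dict stored at init.
    kwargs = {**fit_kwargs}
    kwargs.update(categorical_kwargs or {})
    if sample_weight is not None:
        estimator.fit(X=X, y=y, sample_weight=sample_weight, **kwargs)
    else:
        estimator.fit(X=X, y=y, **kwargs)
    return estimator


stored_fit_kwargs = {"verbose": 0}  # as passed to the constructor
est = fit_estimator_sketch(
    RecordingEstimator(), X=[[1.0]], y=[2.0],
    fit_kwargs=stored_fit_kwargs, sample_weight=[1.0],
    categorical_kwargs={"cat_cols": [0]},  # hypothetical per-fit addition
)
```

Copying first keeps repeated calls to `fit` independent: configuration added for one training run does not leak into the next.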

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and their indexes must be aligned so that y[i] is regressed on exog[i].	`None`
`store_last_window`	`bool`	Whether to store the last window of the training data (`last_window_`).	`True`
`store_in_sample_residuals`	`bool`	If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored.	`False`
`random_state`	`int`	Seed for the random number generator so that the stored in-sample residuals are reproducible.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the training process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def fit(
    self,
    y: pd.Series,
    exog: pd.Series | pd.DataFrame | None = None,
    store_last_window: bool = True,
    store_in_sample_residuals: bool = False,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> None:
    """
    Train the forecaster.

    Additional arguments to be passed to the `fit` method of the estimator 
    can be added with the `fit_kwargs` argument when initializing the forecaster.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned so
        that y[i] is regressed on exog[i].
    store_last_window : bool, default True
        Whether or not to store the last window (`last_window_`) of training data.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the training 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    None

    """

    # TODO: create a method reset_forecaster() to reset all attributes
    # Reset values in case the forecaster has already been fitted.
    self.last_window_                         = None
    self.index_type_                          = None
    self.index_freq_                          = None
    self.training_range_                      = None
    self.series_name_in_                      = None
    self.exog_in_                             = False
    self.exog_names_in_                       = None
    self.exog_type_in_                        = None
    self.exog_dtypes_in_                      = None
    self.exog_dtypes_out_                     = None
    self.categorical_features_names_in_       = None
    self.X_train_window_features_names_out_   = None
    self.X_train_calendar_features_names_out_ = None
    self.X_train_exog_names_out_              = None
    self.X_train_features_names_out_          = None
    self.in_sample_residuals_                 = None
    self.in_sample_residuals_by_bin_          = None
    self.out_sample_residuals_                = None
    self.out_sample_residuals_by_bin_         = None
    self.binner_intervals_                    = None
    self.is_fitted                            = False
    self.fit_date                             = None

    (
        X_train,
        y_train,
        train_index,
        exog_names_in_,
        categorical_features_names_in_,
        X_train_window_features_names_out_,
        X_train_calendar_features_names_out_,
        X_train_exog_names_out_,
        X_train_features_names_out_,
        exog_dtypes_in_,
        exog_dtypes_out_
    ) = self._create_train_X_y(y=y, exog=exog)

    sample_weight = self.create_sample_weights(X_train=train_index)

    if self.categorical_features is not None:
        fit_kwargs = configure_estimator_categorical_features(
                         estimator                      = self.estimator,
                         categorical_features_names_in_ = categorical_features_names_in_,
                         X_train_features_names_out_    = X_train_features_names_out_,
                         fit_kwargs                     = {**self.fit_kwargs}
                     )
    else:
        fit_kwargs = {**self.fit_kwargs}

    X_train = cast_catboost_categorical_columns(
        X=X_train, fit_kwargs=fit_kwargs, estimator=self.estimator
    )

    if sample_weight is not None:
        self.estimator.fit(
            X             = X_train,
            y             = y_train,
            sample_weight = sample_weight,
            **fit_kwargs
        )
    else:
        self.estimator.fit(X=X_train, y=y_train, **fit_kwargs)

    self.X_train_window_features_names_out_ = X_train_window_features_names_out_
    self.X_train_calendar_features_names_out_ = X_train_calendar_features_names_out_
    self.X_train_features_names_out_ = X_train_features_names_out_

    self.is_fitted = True
    self.series_name_in_ = y.name if y.name is not None else 'y'
    self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.training_range_ = y.index[[0, -1]]
    self.index_type_ = type(y.index)
    if isinstance(y.index, pd.DatetimeIndex):
        self.index_freq_ = y.index.freq
    else: 
        self.index_freq_ = y.index.step

    if exog is not None:
        self.exog_in_ = True
        self.exog_type_in_ = type(exog)
        self.exog_names_in_ = exog_names_in_
        self.exog_dtypes_in_ = exog_dtypes_in_
        self.exog_dtypes_out_ = exog_dtypes_out_
        self.categorical_features_names_in_ = categorical_features_names_in_
        self.X_train_exog_names_out_ = X_train_exog_names_out_

    # NOTE: This is done to save time during fit in functions such as backtesting()
    if self._probabilistic_mode is not False:
        with warnings.catch_warnings():
            warnings.filterwarnings(
                "ignore",
                message="X does not have valid feature names",
                category=UserWarning
            )
            y_pred = self.estimator.predict(X_train).ravel()
        self._binning_in_sample_residuals(
            y_true                    = y_train,
            y_pred                    = y_pred,
            store_in_sample_residuals = store_in_sample_residuals,
            random_state              = random_state
        )

    if store_last_window:
        self.last_window_ = (
            y.iloc[-self.window_size:]
            .copy()
            .to_frame(name=y.name if y.name is not None else 'y')
        )

_binning_in_sample_residuals


_binning_in_sample_residuals(
    y_true,
    y_pred,
    store_in_sample_residuals=False,
    random_state=123,
)

Bin residuals according to the predicted value each residual is associated with. First, a `skforecast.preprocessing.QuantileBinner` object is fitted to the predicted values; then each residual is assigned to the bin of its predicted value. Residuals are stored in the forecaster object as `in_sample_residuals_` and `in_sample_residuals_by_bin_`. Binning is only performed when the forecaster's probabilistic mode is `"binned"`.

This is an optimized version that uses pure NumPy operations instead of a pandas DataFrame + `groupby`.

`y_true` and `y_pred` are assumed to be already differentiated and/or transformed according to the attributes `differentiation` and `transformer_y`. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`, and the total number of residuals stored is limited to `10_000`. When a limit is exceeded, residuals are drawn at random, with replacement, using a generator seeded with `random_state`.
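The binning and capping steps described above can be sketched in plain NumPy. This is a minimal illustration, not skforecast's code: `QuantileBinner` is replaced by `np.quantile` edges plus `np.digitize` (an assumption; the real binner may treat edges and out-of-range values differently), and `bin_residuals` is a hypothetical helper. Sampling with replacement through `rng.integers` mirrors the source listing below.

```python
import numpy as np


def bin_residuals(y_true, y_pred, n_bins=3, max_total=10_000, random_state=123):
    """Hypothetical helper: group residuals by quantile bin of their prediction."""
    residuals = y_true - y_pred
    rng = np.random.default_rng(random_state)
    # Quantile edges of the predictions stand in for QuantileBinner.
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    # Digitizing against the interior edges yields bin ids 0 .. n_bins - 1.
    bins = np.digitize(y_pred, edges[1:-1])
    max_per_bin = max_total // n_bins

    by_bin = {}
    for b in range(n_bins):
        bin_res = residuals[bins == b]
        if len(bin_res) == 0:
            continue  # empty bins are not stored
        if len(bin_res) > max_per_bin:
            # Random subsample with replacement, capped per bin.
            bin_res = bin_res[rng.integers(0, len(bin_res), size=max_per_bin)]
        by_bin[b] = bin_res

    if len(residuals) > max_total:
        residuals = residuals[rng.integers(0, len(residuals), size=max_total)]
    return residuals, by_bin


# Usage: 50_000 synthetic predictions, so both caps are exceeded.
gen = np.random.default_rng(0)
y_pred = gen.normal(size=50_000)
y_true = y_pred + gen.normal(scale=0.1, size=50_000)
all_res, by_bin = bin_residuals(y_true, y_pred)
print(len(all_res), {b: len(r) for b, r in by_bin.items()})
```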

Parameters:

Name	Type	Description	Default
`y_true`	`numpy ndarray`	True values of the time series.	required
`y_pred`	`numpy ndarray`	Predicted values of the time series.	required
`store_in_sample_residuals`	`bool`	If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored.	`False`
`random_state`	`int`	Set a seed for the random generator so that the stored sample residuals are always deterministic.	`123`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def _binning_in_sample_residuals(
    self,
    y_true: np.ndarray,
    y_pred: np.ndarray,
    store_in_sample_residuals: bool = False,
    random_state: int = 123
) -> None:
    """
    Bin residuals according to the predicted value each residual is
    associated with. First a `skforecast.preprocessing.QuantileBinner` object
    is fitted to the predicted values. Then, residuals are binned according
    to the predicted value each residual is associated with. Residuals are
    stored in the forecaster object as `in_sample_residuals_` and
    `in_sample_residuals_by_bin_`.

    Optimized version that uses pure numpy operations instead of pandas
    DataFrame + groupby.

    `y_true` and `y_pred` assumed to be differentiated and or transformed
    according to the attributes `differentiation` and `transformer_y`.
    The number of residuals stored per bin is limited to 
    `10_000 // self.binner.n_bins_`. The total number of residuals stored is
    `10_000`.

    Parameters
    ----------
    y_true : numpy ndarray
        True values of the time series.
    y_pred : numpy ndarray
        Predicted values of the time series.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.

    Returns
    -------
    None

    """

    residuals = y_true - y_pred

    if self._probabilistic_mode == "binned":
        self.binner.fit(y_pred)
        self.binner_intervals_ = self.binner.intervals_

    if store_in_sample_residuals:
        rng = np.random.default_rng(seed=random_state)
        if self._probabilistic_mode == "binned":
            bins = self.binner.transform(y_pred).astype(int)
            max_sample = 10_000 // self.binner.n_bins_

            self.in_sample_residuals_by_bin_ = {}
            for b in range(self.binner.n_bins_):
                bin_residuals = residuals[bins == b]
                if len(bin_residuals) == 0:
                    continue
                if len(bin_residuals) > max_sample:
                    bin_residuals = bin_residuals[
                        rng.integers(low=0, high=len(bin_residuals), size=max_sample)
                    ]
                self.in_sample_residuals_by_bin_[b] = bin_residuals

        if len(residuals) > 10_000:
            residuals = residuals[
                rng.integers(low=0, high=len(residuals), size=10_000)
            ]

        self.in_sample_residuals_ = residuals

_create_predict_inputs


_create_predict_inputs(
    steps,
    last_window=None,
    exog=None,
    predict_probabilistic=False,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    check_inputs=True,
)

Create the inputs needed for the first iteration of the prediction process. As this is a recursive process, the last window is updated at each iteration of the prediction process.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict. If `int`, the number of steps to predict. If `str` or `pandas Timestamp`, the prediction is made up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`predict_probabilistic`	`bool`	If `True`, the necessary checks for probabilistic predictions will be performed.	`False`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed.	`True`
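When `steps` is a date, it has to be converted into a number of steps counted from the end of `last_window`; the source below delegates this to `date_to_index_position`. The sketch here is a hypothetical stand-in, not that function, and it assumes the date is treated as the last predicted timestamp (inclusive) and that the index has a frequency.

```python
import pandas as pd


def steps_until(last_index: pd.DatetimeIndex, date) -> int:
    """Hypothetical helper: steps needed to reach `date` (inclusive) after last_index[-1]."""
    if last_index.freq is None:
        raise ValueError("The index must have a frequency.")
    target = pd.Timestamp(date)
    # date_range includes its start (the last observed timestamp), hence the -1.
    future = pd.date_range(start=last_index[-1], end=target, freq=last_index.freq)
    if len(future) < 2:
        raise ValueError("`date` must be later than the last observed timestamp.")
    return len(future) - 1


idx = pd.date_range("2024-01-01", periods=10, freq="D")  # last value: 2024-01-10
print(steps_until(idx, "2024-01-15"))  # 5
```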

Returns:

Name	Type	Description
`last_window_values`	`numpy ndarray`	Series values used to create the predictors needed in the first iteration of the prediction (t + 1).
`exog_values`	`numpy ndarray, None`	Exogenous variable/s included as predictor/s.
`calendar_values`	`numpy ndarray, None`	Calendar features included as predictor/s. `None` if no `calendar_features` is set.
`prediction_index`	`pandas Index`	Index of the predictions.
`steps`	`int`	Number of future steps predicted.
`differentiator`	`TimeSeriesDifferentiator, None`	A copy of the differentiator fitted with the last window values. `None` if no differentiation is applied. It is used to reverse the differentiation of predictions without mutating the forecaster's internal state.
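The differentiator has to be fitted on the last window because undoing a difference needs the last observed level. A minimal sketch for first-order differencing with plain NumPy (not `TimeSeriesDifferentiator`; the predicted values are made up for illustration):

```python
import numpy as np

last_window = np.array([10.0, 12.0, 15.0, 19.0])  # original scale
diff_window = np.diff(last_window)               # differenced scale seen by the model

pred_diff = np.array([5.0, 6.0])  # hypothetical predictions on the differenced scale
# Undo the differencing: start from the last observed value and accumulate.
pred = last_window[-1] + np.cumsum(pred_diff)
print(pred)  # [24. 30.]
```

Working on a copy, as the source does with `copy(self.differentiator)`, keeps this per-call state out of the fitted forecaster.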

Source code in skforecast\recursive\_forecaster_recursive.py

def _create_predict_inputs(
    self,
    steps: int | str | pd.Timestamp, 
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    predict_probabilistic: bool = False,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    check_inputs: bool = True
) -> tuple[
    np.ndarray, np.ndarray | None, np.ndarray | None, pd.Index, int, object | None
]:
    """
    Create the inputs needed for the first iteration of the prediction 
    process. As this is a recursive process, the last window is updated at 
    each iteration of the prediction process.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    predict_probabilistic : bool, default False
        If `True`, the necessary checks for probabilistic predictions will be 
        performed.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.

    Returns
    -------
    last_window_values : numpy ndarray
        Series values used to create the predictors needed in the first 
        iteration of the prediction (t + 1).
    exog_values : numpy ndarray, None
        Exogenous variable/s included as predictor/s.
    calendar_values : numpy ndarray, None
        Calendar features included as predictor/s. `None` if no
        `calendar_features` is set.
    prediction_index : pandas Index
        Index of the predictions.
    steps : int
        Number of future steps predicted.
    differentiator : TimeSeriesDifferentiator, None
        A copy of the differentiator fitted with the last window values.
        `None` if no differentiation is applied. This is used to reverse
        the differentiation of predictions without mutating the forecaster's
        internal state.

    """

    if last_window is None:
        last_window = self.last_window_

    if self.is_fitted:
        steps = date_to_index_position(
                    index        = last_window.index,
                    date_input   = steps,
                    method       = 'prediction',
                    date_literal = 'steps'
                )

    if check_inputs:
        check_predict_input(
            forecaster_name = type(self).__name__,
            steps           = steps,
            is_fitted       = self.is_fitted,
            exog_in_        = self.exog_in_,
            index_type_     = self.index_type_,
            index_freq_     = self.index_freq_,
            window_size     = self.window_size,
            last_window     = last_window,
            exog            = exog,
            exog_names_in_  = self.exog_names_in_
        )

        if predict_probabilistic:
            check_residuals_input(
                forecaster_name              = type(self).__name__,
                use_in_sample_residuals      = use_in_sample_residuals,
                in_sample_residuals_         = self.in_sample_residuals_,
                out_sample_residuals_        = self.out_sample_residuals_,
                use_binned_residuals         = use_binned_residuals,
                in_sample_residuals_by_bin_  = self.in_sample_residuals_by_bin_,
                out_sample_residuals_by_bin_ = self.out_sample_residuals_by_bin_
            )

    last_window_values = (
        last_window.iloc[-self.window_size:].to_numpy(copy=True).ravel()
    )
    last_window_values = transform_numpy(
                             array             = last_window_values,
                             transformer       = self.transformer_y,
                             fit               = False,
                             inverse_transform = False
                         )
    if self.differentiation is not None:
        differentiator = copy(self.differentiator)
        last_window_values = differentiator.fit_transform(last_window_values)
    else:
        differentiator = None

    if exog is not None:

        exog = input_to_frame(data=exog, input_name='exog')
        if exog.columns.tolist() != self.exog_names_in_:
            exog = exog[self.exog_names_in_]

        exog = transform_dataframe(
                   df                = exog,
                   transformer       = self.transformer_exog,
                   fit               = False,
                   inverse_transform = False
               )

        if self.categorical_features is not None and self.categorical_features_names_in_:
            # This copy is only necessary if `transformer_exog` is not used
            if self.transformer_exog is None:
                exog = exog.copy()

            exog[self.categorical_features_names_in_] = (
                self.categorical_encoder.transform(
                    exog[self.categorical_features_names_in_]
                )
            )

        # NOTE: Only check dtypes if they are not the same as seen in training
        if not exog.dtypes.to_dict() == self.exog_dtypes_out_:
            check_exog_dtypes(exog=exog)
        else:
            check_exog(exog=exog, allow_nan=False)

        exog_values = exog.to_numpy()[:steps]
    else:
        exog_values = None

    prediction_index = expand_index(
                           index = last_window.index,
                           steps = steps,
                       )

    if self.calendar_features is not None:
        calendar_values = self.calendar_features.transform(
            prediction_index
        ).to_numpy()
    else:
        calendar_values = None

    return (
        last_window_values,
        exog_values,
        calendar_values,
        prediction_index,
        steps,
        differentiator
    )

_recursive_predict


_recursive_predict(
    steps,
    last_window_values,
    exog_values=None,
    calendar_values=None,
)

Predict `steps` steps ahead. This is an iterative process in which each prediction is used as a predictor for the next step.
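The recursive strategy can be sketched with lag features only. This is a simplified illustration, not skforecast's implementation: it ignores window features, exogenous variables and calendar features, and `recursive_predict` is a hypothetical helper. As in the source, predictions are written into a buffer that follows the observed window, so each one becomes a lag for later steps.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def recursive_predict(model, last_window, lags, steps):
    """Hypothetical helper: recursive forecast with lag features only."""
    lags = np.asarray(lags)
    n_obs = len(last_window)  # must be at least max(lags)
    # Observed window followed by empty slots that will hold the predictions.
    buffer = np.concatenate([last_window, np.full(steps, np.nan)])
    predictions = np.empty(steps)
    for i in range(steps):
        pos = n_obs + i                         # position of the value being predicted
        X = buffer[pos - lags].reshape(1, -1)   # lag k = value k positions back
        predictions[i] = model.predict(X)[0]
        buffer[pos] = predictions[i]            # becomes a lag for the next steps
    return predictions


# Toy training data: a random walk with lags 1..3 (columns ordered lag 1, 2, 3).
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))
lags = np.array([1, 2, 3])
max_lag = lags.max()
X_train = np.column_stack([y[max_lag - k : len(y) - k] for k in lags])
y_train = y[max_lag:]
model = LinearRegression().fit(X_train, y_train)

preds = recursive_predict(model, y[-max_lag:], lags, steps=5)
print(preds)
```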

Fast prediction paths (bypassing the overhead of scikit-learn's `predict`) are used for the following estimators: linear models inheriting from scikit-learn's `LinearModel` (`np.dot`), `LGBMRegressor` (`booster.predict`), `XGBRegressor` (`booster.inplace_predict`), `RandomForestRegressor` and `DecisionTreeRegressor` (`tree_.predict`).
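The idea behind the linear fast path is that, for a fitted scikit-learn linear model, a dot product with `coef_` plus `intercept_` gives the same result as `predict` without its input validation. The sketch below only illustrates that equivalence for a single-output `Ridge`; the internals of skforecast's `_build_predict_function` are not shown here and may differ.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 4))
y_train = X_train @ np.array([0.5, -1.0, 2.0, 0.0]) + 3.0
model = Ridge(alpha=1.0).fit(X_train, y_train)

x_step = rng.normal(size=(1, 4))  # one row, as in each recursive iteration
slow = model.predict(x_step)
fast = np.dot(x_step, model.coef_) + model.intercept_
print(np.allclose(slow, fast))  # True
```

The saving is small per call but adds up because the recursive loop calls the estimator once per step.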

Parameters:

Name	Type	Description	Default
`steps`	`int`	Number of steps to predict.	required
`last_window_values`	`numpy ndarray`	Series values used to create the predictors needed in the first iteration of the prediction (t + 1).	required
`exog_values`	`numpy ndarray`	Exogenous variable/s included as predictor/s.	`None`
`calendar_values`	`numpy ndarray`	Calendar features included as predictor/s.	`None`

Returns:

Name	Type	Description
`predictions`	`numpy ndarray`	Predicted values.

Source code in skforecast\recursive\_forecaster_recursive.py

def _recursive_predict(
    self,
    steps: int,
    last_window_values: np.ndarray,
    exog_values: np.ndarray | None = None,
    calendar_values: np.ndarray | None = None
) -> np.ndarray:
    """
    Predict n steps ahead. It is an iterative process in which, each prediction,
    is used as a predictor for the next step.

    Fast prediction paths (bypassing sklearn's predict overhead) are used for
    the following estimators: linear models inheriting from sklearn's
    `LinearModel` (np.dot), `LGBMRegressor` (booster.predict),
    `XGBRegressor` (booster.inplace_predict), `RandomForestRegressor` and
    `DecisionTreeRegressor` (tree_.predict).

    Parameters
    ----------
    steps : int
        Number of steps to predict. 
    last_window_values : numpy ndarray
        Series values used to create the predictors needed in the first 
        iteration of the prediction (t + 1).
    exog_values : numpy ndarray, default None
        Exogenous variable/s included as predictor/s.
    calendar_values : numpy ndarray, default None
        Calendar features included as predictor/s.

    Returns
    -------
    predictions : numpy ndarray
        Predicted values.

    """

    original_device = set_cpu_gpu_device(estimator=self.estimator, device='cpu')

    n_lags = len(self.lags) if self.lags is not None else 0
    n_window_features = (
        len(self.X_train_window_features_names_out_)
        if self.window_features is not None
        else 0
    )
    n_exog = exog_values.shape[1] if exog_values is not None else 0
    n_calendar = calendar_values.shape[1] if calendar_values is not None else 0
    n_features = n_lags + n_window_features + n_exog + n_calendar

    X = np.full(shape=(n_features), fill_value=np.nan, dtype=float)
    predictions = np.full(shape=steps, fill_value=np.nan, dtype=float)
    last_window = np.concatenate((last_window_values, predictions))

    predict_fn = _build_predict_function(self.estimator)

    has_lags = self.lags is not None
    has_window_features = self.window_features is not None
    has_exog = exog_values is not None
    has_calendar = calendar_values is not None

    exog_start = n_lags + n_window_features
    exog_end = exog_start + n_exog

    if has_lags and not self.lags_are_contiguous:
        neg_lags = -self.lags

    for i in range(steps):

        remaining = steps - i

        if has_lags:
            if self.lags_are_contiguous:
                X[:n_lags] = last_window[-(remaining + n_lags): -remaining][::-1]
            else:
                X[:n_lags] = last_window[neg_lags - remaining]

        if has_window_features:
            window_data = last_window[i : -remaining]
            X[n_lags : exog_start] = np.concatenate(
                [
                    wf.transform(window_data)
                    for wf in self.window_features
                ]
            )

        if has_exog:
            X[exog_start : exog_end] = exog_values[i]

        if has_calendar:
            X[exog_end:] = calendar_values[i]

        pred = predict_fn(X.reshape(1, -1)).item()
        predictions[i] = pred

        # Write the new prediction into the next empty slot of `last_window`
        # so that it is used as a predictor in the following steps.
        last_window[-remaining] = pred

    set_cpu_gpu_device(estimator=self.estimator, device=original_device)

    return predictions

_recursive_predict_bootstrapping


_recursive_predict_bootstrapping(
    steps,
    last_window_values,
    sampled_residuals,
    use_binned_residuals,
    n_boot,
    exog_values=None,
    calendar_values=None,
)

Vectorized bootstrap prediction: all `n_boot` bootstrap iterations are predicted together at each step. Instead of running `n_boot` sequential recursive predictions, this method makes one prediction call per step for all bootstrap samples, which significantly reduces overhead.
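A simplified version of the vectorized scheme, assuming lag features only and plain (non-binned) residuals: every bootstrap path is a column of a 2D buffer, and each step makes a single `predict` call on an `(n_boot, n_lags)` matrix. `bootstrap_predict` is a hypothetical helper; unlike the skforecast method, it draws the residuals itself instead of receiving `sampled_residuals`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def bootstrap_predict(model, last_window, lags, residuals, steps, n_boot, seed=123):
    """Hypothetical helper: advance all bootstrap paths together, one predict per step."""
    rng = np.random.default_rng(seed)
    lags = np.asarray(lags)
    n_obs = len(last_window)
    # Rows are time positions, columns are independent bootstrap paths.
    paths = np.tile(last_window[:, np.newaxis], (1, n_boot))
    paths = np.vstack([paths, np.full((steps, n_boot), np.nan)])
    # One residual per (step, path), drawn with replacement.
    sampled = rng.choice(residuals, size=(steps, n_boot), replace=True)
    predictions = np.empty((steps, n_boot))
    for i in range(steps):
        pos = n_obs + i
        # (n_lags, n_boot) -> (n_boot, n_lags): one row of lag features per path.
        X = paths[pos - lags, :].T
        pred = model.predict(X) + sampled[i]
        predictions[i] = pred
        paths[pos] = pred  # each path feeds back its own noisy prediction
    return predictions


# Toy training data: a random walk with lags 1..3.
rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=200))
lags = np.array([1, 2, 3])
max_lag = lags.max()
X_train = np.column_stack([y[max_lag - k : len(y) - k] for k in lags])
y_train = y[max_lag:]
model = LinearRegression().fit(X_train, y_train)
residuals = y_train - model.predict(X_train)

boot = bootstrap_predict(model, y[-max_lag:], lags, residuals, steps=5, n_boot=200)
print(boot.shape)  # (5, 200)
```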

Fast prediction paths (bypassing the overhead of scikit-learn's `predict`) are used for the following estimators: linear models inheriting from scikit-learn's `LinearModel` (`np.dot`), `LGBMRegressor` (`booster.predict`), `XGBRegressor` (`booster.inplace_predict`), `RandomForestRegressor` and `DecisionTreeRegressor` (`tree_.predict`).

Parameters:

Name	Type	Description	Default
`steps`	`int`	Number of steps to predict.	required
`last_window_values`	`numpy ndarray`	Series values used to create the predictors needed in the first iteration of the prediction (t + 1).	required
`sampled_residuals`	`numpy ndarray`	Pre-sampled residuals for all bootstrap iterations. If `use_binned_residuals=True`, a 3D array of shape `(n_bins, steps, n_boot)`; if `use_binned_residuals=False`, a 2D array of shape `(steps, n_boot)`.	required
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values. If `False`, residuals are selected randomly.	required
`n_boot`	`int`	Number of bootstrap iterations.	required
`exog_values`	`numpy ndarray`	Exogenous variable/s included as predictor/s.	`None`
`calendar_values`	`numpy ndarray`	Calendar features included as predictor/s.	`None`
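With binned residuals, the source selects `sampled_residuals[pred_bins, i, boot_indices]`: for every bootstrap path `j`, it takes the residual for step `i` from the bin of that path's own prediction. The synthetic array below encodes its coordinates in its values (`100 * bin + 10 * step + path`) so the selection is easy to read.

```python
import numpy as np

n_bins, steps, n_boot = 3, 2, 4
b, s, p = np.meshgrid(
    np.arange(n_bins), np.arange(steps), np.arange(n_boot), indexing="ij"
)
sampled_residuals = 100 * b + 10 * s + p  # shape (n_bins, steps, n_boot)

i = 1                                 # current step
pred_bins = np.array([2, 0, 1, 2])    # bin of each path's prediction
boot_indices = np.arange(n_boot)      # one entry per path
picked = sampled_residuals[pred_bins, i, boot_indices]
print(picked.tolist())  # [210, 11, 112, 213]
```

The two index arrays are paired element-wise, so the result has one value per path rather than an `(n_boot, n_boot)` grid.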

Returns:

Name	Type	Description
`predictions`	`numpy ndarray`	Predicted values with shape `(steps, n_boot)`.

Source code in skforecast\recursive\_forecaster_recursive.py

def _recursive_predict_bootstrapping(
    self,
    steps: int,
    last_window_values: np.ndarray,
    sampled_residuals: np.ndarray,
    use_binned_residuals: bool,
    n_boot: int,
    exog_values: np.ndarray | None = None,
    calendar_values: np.ndarray | None = None
) -> np.ndarray:
    """
    Vectorized bootstrap prediction - predict all n_boot iterations per step.
    Instead of running n_boot sequential predictions, this method predicts 
    all bootstrap samples at once per step, significantly reducing overhead.

    Fast prediction paths (bypassing sklearn's predict overhead) are used for
    the following estimators: linear models inheriting from sklearn's
    `LinearModel` (np.dot), `LGBMRegressor` (booster.predict),
    `XGBRegressor` (booster.inplace_predict), `RandomForestRegressor` and
    `DecisionTreeRegressor` (tree_.predict).

    Parameters
    ----------
    steps : int
        Number of steps to predict. 
    last_window_values : numpy ndarray
        Series values used to create the predictors needed in the first 
        iteration of the prediction (t + 1).
    sampled_residuals : numpy ndarray
        Pre-sampled residuals for all bootstrap iterations.
        - If `use_binned_residuals=True`: 3D array of shape (n_bins, steps, n_boot)
        - If `use_binned_residuals=False`: 2D array of shape (steps, n_boot)
    use_binned_residuals : bool
        If `True`, residuals are selected based on the predicted values.
        If `False`, residuals are selected randomly.
    n_boot : int
        Number of bootstrap iterations.
    exog_values : numpy ndarray, default None
        Exogenous variable/s included as predictor/s.
    calendar_values : numpy ndarray, default None
        Calendar features included as predictor/s.

    Returns
    -------
    predictions : numpy ndarray
        Predicted values with shape (steps, n_boot).

    """

    original_device = set_cpu_gpu_device(estimator=self.estimator, device='cpu')

    n_lags = len(self.lags) if self.lags is not None else 0
    n_window_features = (
        len(self.X_train_window_features_names_out_)
        if self.window_features is not None
        else 0
    )
    n_exog = exog_values.shape[1] if exog_values is not None else 0
    n_calendar = calendar_values.shape[1] if calendar_values is not None else 0
    n_features = n_lags + n_window_features + n_exog + n_calendar

    # Input matrix for prediction: shape (n_boot, n_features)
    X = np.full((n_boot, n_features), fill_value=np.nan, dtype=float)

    # Output predictions: shape (steps, n_boot)
    predictions = np.full((steps, n_boot), fill_value=np.nan, dtype=float)

    # Expand last_window to 2D: (window_size + steps, n_boot)
    # Each column represents a separate bootstrap trajectory
    last_window = np.tile(last_window_values[:, np.newaxis], (1, n_boot))
    last_window = np.vstack([last_window, np.full((steps, n_boot), np.nan)])

    predict_fn = _build_predict_function(self.estimator)

    has_lags = self.lags is not None
    has_window_features = self.window_features is not None
    has_exog = exog_values is not None
    has_calendar = calendar_values is not None

    exog_start = n_lags + n_window_features
    exog_end = exog_start + n_exog

    if has_lags and not self.lags_are_contiguous:
        neg_lags = -self.lags

    if use_binned_residuals:
        boot_indices = np.arange(n_boot)

    for i in range(steps):

        remaining = steps - i

        if has_lags:
            if self.lags_are_contiguous:
                X[:, :n_lags] = last_window[-(remaining + n_lags): -remaining, :][::-1].T
            else:
                X[:, :n_lags] = last_window[neg_lags - remaining, :].T

        if has_window_features:
            window_data = last_window[:-remaining, :]
            # transform accepts 2D: (window_length, n_boot) -> (n_boot, n_stats)
            # and concatenate along axis=1: (n_boot, total_window_features)
            X[:, n_lags : exog_start] = np.concatenate(
                [
                    wf.transform(window_data) 
                    for wf in self.window_features
                ],
                axis=1
            )

        if has_exog:
            X[:, exog_start : exog_end] = exog_values[i]

        if has_calendar:
            X[:, exog_end:] = calendar_values[i]

        pred = predict_fn(X)

        if use_binned_residuals:
            # sampled_residuals is a 3D array: (n_bins, steps, n_boot)
            pred_bins = self.binner.transform(pred).astype(int)
            pred += sampled_residuals[pred_bins, i, boot_indices]
        else:
            pred += sampled_residuals[i, :]

        predictions[i, :] = pred
        last_window[-remaining, :] = pred

    set_cpu_gpu_device(estimator=self.estimator, device=original_device)

    return predictions

create_predict_X


create_predict_X(
    steps,
    last_window=None,
    exog=None,
    check_inputs=True,
    suppress_warnings=False,
)

Create the predictors needed to predict `steps` ahead. As this is a recursive process, the predictors are created at each iteration of the prediction process.
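Once the recursive predictions are known, the lag columns of `X_predict` can be rebuilt in one vectorized step by concatenating the last window with the predictions, using the same indexing as the source below (`np.arange(-steps, 0)[:, None] - self.lags`). The values here are toy numbers:

```python
import numpy as np

last_window_values = np.array([1.0, 2.0, 3.0])  # observed values
predictions = np.array([4.0, 5.0])              # recursive predictions
lags = np.array([1, 2, 3])
steps = len(predictions)

full = np.concatenate((last_window_values, predictions))
# Row r targets position -steps + r; subtracting each lag gives its predictors.
idx = np.arange(-steps, 0)[:, None] - lags
X_lags = full[idx + len(full)]
print(X_lags.tolist())  # [[3.0, 2.0, 1.0], [4.0, 3.0, 2.0]]
```

The second row shows the recursion: the first prediction (`4.0`) appears as lag 1 when predicting the second step.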

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict. If steps is int, number of steps to predict. If str or pandas Datetime, the prediction will be up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed.	`True`

Returns:

Name	Type	Description
`X_predict`	`pandas DataFrame`	Pandas DataFrame with the predictors for each step. The index is the same as the prediction index.

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def create_predict_X(
    self,
    steps: int,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    check_inputs: bool = True,
    suppress_warnings: bool = False
) -> pd.DataFrame:
    """
    Create the predictors needed to predict `steps` ahead. As it is a recursive
    process, the predictors are created at each iteration of the prediction 
    process.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.

    Returns
    -------
    X_predict : pandas DataFrame
        Pandas DataFrame with the predictors for each step. The index 
        is the same as the prediction index.

    """

    (
        last_window_values,
        exog_values,
        calendar_values,
        prediction_index,
        steps,
        _
    ) = self._create_predict_inputs(
            steps        = steps,
            last_window  = last_window,
            exog         = exog,
            check_inputs = check_inputs,
        )

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", 
            message="X does not have valid feature names", 
            category=UserWarning
        )
        predictions = self._recursive_predict(
                          steps              = steps,
                          last_window_values = last_window_values,
                          exog_values        = exog_values,
                          calendar_values    = calendar_values
                      )

    X_predict = []
    full_predictors = np.concatenate((last_window_values, predictions))

    if self.lags is not None:
        idx = np.arange(-steps, 0)[:, None] - self.lags
        X_lags = full_predictors[idx + len(full_predictors)]
        X_predict.append(X_lags)

    if self.window_features is not None:
        X_window_features = np.full(
            shape      = (steps, len(self.X_train_window_features_names_out_)), 
            fill_value = np.nan, 
            order      = 'C',
            dtype      = float
        )
        for i in range(steps):
            X_window_features[i, :] = np.concatenate(
                [wf.transform(full_predictors[i:-(steps - i)]) 
                 for wf in self.window_features]
            )
        X_predict.append(X_window_features)

    if exog is not None:
        X_predict.append(exog_values)

    if self.calendar_features is not None:
        X_predict.append(calendar_values)

    X_predict = pd.DataFrame(
                    data    = np.concatenate(X_predict, axis=1),
                    columns = self.X_train_features_names_out_,
                    index   = prediction_index
                )

    if self.exog_in_:
        X_predict_dtypes = {col: float for col in self.X_train_features_names_out_}
        X_predict_dtypes.update(self.exog_dtypes_out_)
        X_predict = X_predict.astype(X_predict_dtypes, copy=False)

    if self.transformer_y is not None or self.differentiation is not None:
        warnings.warn(
            "The output matrix is in the transformed scale due to the "
            "inclusion of transformations or differentiation in the Forecaster. "
            "As a result, any predictions generated using this matrix will also "
            "be in the transformed scale. Please refer to the documentation "
            "for more details: "
            "https://skforecast.org/latest/user_guides/training-and-prediction-matrices.html",
            DataTransformationWarning
        )

    return X_predict
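The lag block of `X_predict` is built with one fancy-indexing step over the concatenation of `last_window_values` and the recursive predictions. The window features are computed on a sliding slice of that same array. Below is a minimal NumPy sketch of that indexing. It assumes plain 1-D float arrays, integer lags, and a last window at least as long as the largest lag. `build_lag_matrix` and `build_rolling_mean` are hypothetical helpers, and the simple rolling mean only stands in for a window feature such as those produced by `RollingFeatures`.

```python
import numpy as np


def build_lag_matrix(last_window_values, predictions, lags):
    """Hypothetical helper: lag predictors for each recursive step."""
    lags = np.asarray(lags)
    steps = len(predictions)
    full = np.concatenate((last_window_values, predictions))
    # Row i targets position len(last_window_values) + i, so column j
    # holds the value located lags[j] positions before it.
    idx = np.arange(-steps, 0)[:, None] - lags
    return full[idx + len(full)]


def build_rolling_mean(last_window_values, predictions, window_size):
    """Hypothetical stand-in for a window feature (rolling mean)."""
    steps = len(predictions)
    full = np.concatenate((last_window_values, predictions))
    out = np.empty(steps)
    for i in range(steps):
        # Same slice as in the source: the values preceding step i,
        # with the same length as last_window_values.
        window = full[i:-(steps - i)]
        out[i] = window[-window_size:].mean()
    return out


last_window = np.array([1.0, 2.0, 3.0])
preds = np.array([4.0, 5.0])
# Lag 1 and lag 2 for each step: [[3., 2.], [4., 3.]]
print(build_lag_matrix(last_window, preds, lags=[1, 2]))
# Mean of the 2 values preceding each step: [2.5, 3.5]
print(build_rolling_mean(last_window, preds, window_size=2))
```

Vectorizing the lags this way avoids a Python loop over steps. The window features still loop because each one may need the full trailing window.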

predict ¶


predict(
    steps,
    last_window=None,
    exog=None,
    check_inputs=True,
    suppress_warnings=False,
)

Predict n steps ahead. It is a recursive process in which each prediction is used as a predictor for the next step.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict, or the date up to which to predict. If `int`, the number of steps to predict. If `str` or `pandas Timestamp`, the prediction will be up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is intended for internal use, and changing it is not recommended.	`True`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`predictions`	`pandas Series`	Predicted values.
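The recursion can be illustrated without skforecast itself. The minimal sketch below assumes a callable `predict_one` that maps a 1-D vector of lag values to a single float. This callable stands in for the fitted estimator's `predict`. The sketch ignores exogenous variables, window features, transformations and differentiation. `recursive_forecast` is a hypothetical helper.

```python
import numpy as np


def recursive_forecast(predict_one, last_window, lags, steps):
    """Hypothetical helper illustrating recursive multi-step prediction."""
    history = list(last_window)
    predictions = []
    for _ in range(steps):
        # Lag k is the value k positions before the step being predicted
        x = np.array([history[-lag] for lag in lags], dtype=float)
        y_hat = float(predict_one(x))
        predictions.append(y_hat)
        # The new prediction becomes a predictor for the next step
        history.append(y_hat)
    return np.array(predictions)


# Toy "model": the sum of lag 1 and lag 2 (a Fibonacci-like recursion)
print(recursive_forecast(lambda x: x.sum(), last_window=[1.0, 1.0], lags=[1, 2], steps=4))
# [2. 3. 5. 8.]
```

Each step consumes earlier predictions, so any error made at an early step propagates to the later ones.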

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def predict(
    self,
    steps: int | str | pd.Timestamp,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    check_inputs: bool = True,
    suppress_warnings: bool = False
) -> pd.Series:
    """
    Predict n steps ahead. It is a recursive process in which each prediction
    is used as a predictor for the next step.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    predictions : pandas Series
        Predicted values.

    """

    (
        last_window_values,
        exog_values,
        calendar_values,
        prediction_index,
        steps,
        differentiator
    ) = self._create_predict_inputs(
            steps        = steps,
            last_window  = last_window,
            exog         = exog,
            check_inputs = check_inputs
        )

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", 
            message="X does not have valid feature names", 
            category=UserWarning
        )
        predictions = self._recursive_predict(
                          steps              = steps,
                          last_window_values = last_window_values,
                          exog_values        = exog_values,
                          calendar_values    = calendar_values
                      )

    if differentiator is not None:
        predictions = differentiator.inverse_transform_next_window(predictions)

    predictions = transform_numpy(
                      array             = predictions,
                      transformer       = self.transformer_y,
                      fit               = False,
                      inverse_transform = True
                  )

    predictions = pd.Series(
                      data  = predictions,
                      index = prediction_index,
                      name  = 'pred'
                  )

    return predictions

predict_bootstrapping ¶


predict_bootstrapping(
    steps,
    last_window=None,
    exog=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
)

Generate multiple forecasting predictions using a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. See the References section for more information.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict, or the date up to which to predict. If `int`, the number of steps to predict. If `str` or `pandas Timestamp`, the prediction will be up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating prediction intervals.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`boot_predictions`	`pandas DataFrame`	Predictions generated by bootstrapping. Shape: (steps, n_boot)

References

.. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html
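Here is a minimal sketch of the bootstrapping idea described in the referenced chapter. It assumes a toy linear autoregressive model with known coefficients in place of a fitted estimator. For each bootstrap iteration, a residual drawn with replacement is added to every one-step prediction. The perturbed value is then fed back as a lag for the next step, so each column is one simulated future path. `bootstrap_paths`, `coef` and `intercept` are hypothetical. Only the residual sampling (`rng.integers(..., size=(steps, n_boot))`) mirrors the non-binned branch of the source below.

```python
import numpy as np


def bootstrap_paths(last_window, coef, intercept, residuals, steps, n_boot, seed=123):
    """Hypothetical helper: simulate n_boot future paths of a toy AR(p) model."""
    rng = np.random.default_rng(seed)
    coef = np.asarray(coef, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    p = len(coef)
    # One residual per (step, bootstrap iteration), drawn with replacement
    sampled = residuals[rng.integers(low=0, high=len(residuals), size=(steps, n_boot))]
    paths = np.empty((steps, n_boot))
    for b in range(n_boot):
        history = list(last_window)
        for s in range(steps):
            # Lag 1, lag 2, ..., lag p
            lag_values = np.array([history[-j] for j in range(1, p + 1)])
            value = intercept + coef @ lag_values + sampled[s, b]
            paths[s, b] = value
            # The perturbed value is the lag used at the next step
            history.append(value)
    return paths  # shape: (steps, n_boot)


paths = bootstrap_paths([10.0, 12.0], coef=[0.5, 0.3], intercept=1.0,
                        residuals=[-1.0, 0.0, 1.0], steps=5, n_boot=250)
print(paths.shape)  # (5, 250)
```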

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def predict_bootstrapping(
    self,
    steps: int | str | pd.Timestamp,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> pd.DataFrame:
    """
    Generate multiple forecasting predictions using a bootstrapping process.
    By sampling from a collection of past observed errors (the residuals),
    each iteration of bootstrapping generates a different set of predictions. 
    See the References section for more information. 

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy for
        the prediction error to create predictions.
        If `False`, out-of-sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    boot_predictions : pandas DataFrame
        Predictions generated by bootstrapping.
        Shape: (steps, n_boot)

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    (
        last_window_values,
        exog_values,
        calendar_values,
        prediction_index,
        steps,
        differentiator
    ) = self._create_predict_inputs(
            steps                   = steps, 
            last_window             = last_window, 
            exog                    = exog,
            predict_probabilistic   = True, 
            use_in_sample_residuals = use_in_sample_residuals,
            use_binned_residuals    = use_binned_residuals
        )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_
        residuals_by_bin = self.in_sample_residuals_by_bin_
    else:
        residuals = self.out_sample_residuals_
        residuals_by_bin = self.out_sample_residuals_by_bin_

    rng = np.random.default_rng(seed=random_state)
    if use_binned_residuals:
        # Create 3D array with sampled residuals: (n_bins, steps, n_boot)
        n_bins = len(residuals_by_bin)
        sampled_residuals = np.stack(
            [
                residuals_by_bin[k][
                    rng.integers(
                        low=0, high=len(residuals_by_bin[k]), size=(steps, n_boot)
                    )
                ]
                for k in range(n_bins)
            ],
            axis=0,
        )
    else:
        sampled_residuals = residuals[
            rng.integers(low=0, high=len(residuals), size=(steps, n_boot))
        ]

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", 
            message="X does not have valid feature names", 
            category=UserWarning
        )
        boot_predictions = self._recursive_predict_bootstrapping(
            steps                = steps,
            last_window_values   = last_window_values,
            exog_values          = exog_values,
            calendar_values      = calendar_values,
            sampled_residuals    = sampled_residuals,
            use_binned_residuals = use_binned_residuals,
            n_boot               = n_boot
        )

    if differentiator is not None:
        boot_predictions = (
            differentiator.inverse_transform_next_window(boot_predictions)
        )

    if self.transformer_y:
        boot_predictions = transform_numpy(
                               array             = boot_predictions,
                               transformer       = self.transformer_y,
                               fit               = False,
                               inverse_transform = True
                           )

    boot_columns = [f"pred_boot_{i}" for i in range(n_boot)]
    boot_predictions = pd.DataFrame(
                           data    = boot_predictions,
                           index   = prediction_index,
                           columns = boot_columns
                       )

    return boot_predictions
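When `use_binned_residuals=True`, the source pre-samples residuals for every bin into a 3-D array of shape `(n_bins, steps, n_boot)`. The sketch below reproduces that sampling. It also adds one plausible way of consuming the array: at each step, every bootstrap path takes the residual from the bin its current prediction falls into. The helper `pick_residuals` and the way the bin indices are supplied are assumptions for illustration only. They may differ from what `_recursive_predict_bootstrapping` does internally.

```python
import numpy as np


def sample_binned_residuals(residuals_by_bin, steps, n_boot, seed=123):
    """Draw residuals with replacement for every bin: shape (n_bins, steps, n_boot)."""
    rng = np.random.default_rng(seed)
    return np.stack(
        [
            residuals_by_bin[k][
                rng.integers(low=0, high=len(residuals_by_bin[k]), size=(steps, n_boot))
            ]
            for k in range(len(residuals_by_bin))
        ],
        axis=0,
    )


def pick_residuals(sampled, step, bins_of_predictions):
    """Hypothetical helper: one residual per path, taken from that path's bin."""
    n_boot = sampled.shape[2]
    return sampled[bins_of_predictions, step, np.arange(n_boot)]


# Two bins with clearly different residuals so the selection is visible
residuals_by_bin = {0: np.array([-1.0, -2.0]), 1: np.array([5.0, 6.0])}
sampled = sample_binned_residuals(residuals_by_bin, steps=3, n_boot=4)
print(sampled.shape)  # (2, 3, 4)
# Paths 0 and 2 fall in bin 0 (negative residuals), paths 1 and 3 in bin 1
print(pick_residuals(sampled, step=0, bins_of_predictions=np.array([0, 1, 0, 1])))
```

Pre-sampling every bin up front costs `n_bins` times more memory than the non-binned branch. In exchange, the recursive loop only has to index into an existing array.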

_predict_interval_conformal ¶


_predict_interval_conformal(
    steps,
    last_window=None,
    exog=None,
    nominal_coverage=0.95,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
)

Generate prediction intervals using the conformal prediction split method [1]_.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict, or the date up to which to predict. If `int`, the number of steps to predict. If `str` or `pandas Timestamp`, the prediction will be up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`nominal_coverage`	`float`	Nominal coverage, also known as expected coverage, of the prediction intervals. Must be between 0 and 1.	`0.95`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Values predicted by the forecaster and their estimated interval. pred: predictions. lower_bound: lower bound of the interval. upper_bound: upper bound of the interval.

References

.. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method
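In the split conformal method, the half-width of the interval is the `nominal_coverage` quantile of the absolute residuals, applied symmetrically around each point prediction. With binned residuals, one such correction factor is computed per bin. Below is a minimal NumPy sketch of both variants. It assumes residuals are given as arrays and that bins are defined by hypothetical interior edges passed to `np.digitize`; skforecast uses its own fitted binner instead. Differentiation and `transformer_y` are ignored.

```python
import numpy as np


def conformal_interval(predictions, residuals, nominal_coverage=0.95):
    """Columns: pred, lower_bound, upper_bound."""
    correction = np.quantile(np.abs(residuals), nominal_coverage)
    return np.column_stack([predictions, predictions - correction, predictions + correction])


def conformal_interval_binned(predictions, residuals_by_bin, bin_edges, nominal_coverage=0.95):
    """Hypothetical binned variant: bins come from interior edges via np.digitize."""
    factor_by_bin = {
        k: np.quantile(np.abs(v), nominal_coverage) for k, v in residuals_by_bin.items()
    }
    bins = np.digitize(predictions, bin_edges)  # bin index in 0 .. len(bin_edges)
    correction = np.array([factor_by_bin[b] for b in bins])
    return np.column_stack([predictions, predictions - correction, predictions + correction])


preds = np.array([1.0, 10.0])
residuals = np.array([-1.0, 2.0, -3.0, 4.0])
# The median of |residuals| is 2.5, so each bound is pred -/+ 2.5
print(conformal_interval(preds, residuals, nominal_coverage=0.5))
```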

Source code in skforecast\recursive\_forecaster_recursive.py

def _predict_interval_conformal(
    self,
    steps: int | str | pd.Timestamp,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    nominal_coverage: float = 0.95,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True
) -> pd.DataFrame:
    """
    Generate prediction intervals using the conformal prediction 
    split method [1]_.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    nominal_coverage : float, default 0.95
        Nominal coverage, also known as expected coverage, of the prediction
        intervals. Must be between 0 and 1.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy for
        the prediction error to create predictions.
        If `False`, out-of-sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    References
    ----------
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    (
        last_window_values,
        exog_values,
        calendar_values,
        prediction_index,
        steps,
        differentiator
    ) = self._create_predict_inputs(
            steps                   = steps,
            last_window             = last_window,
            exog                    = exog,
            predict_probabilistic   = True,
            use_in_sample_residuals = use_in_sample_residuals,
            use_binned_residuals    = use_binned_residuals
        )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_
        residuals_by_bin = self.in_sample_residuals_by_bin_
    else:
        residuals = self.out_sample_residuals_
        residuals_by_bin = self.out_sample_residuals_by_bin_

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", 
            message="X does not have valid feature names", 
            category=UserWarning
        )
        predictions = self._recursive_predict(
                          steps              = steps,
                          last_window_values = last_window_values,
                          exog_values        = exog_values,
                          calendar_values    = calendar_values
                      )

    if use_binned_residuals:
        correction_factor_by_bin = {
            k: np.quantile(np.abs(v), nominal_coverage)
            for k, v in residuals_by_bin.items()
        }
        replace_func = np.vectorize(lambda x: correction_factor_by_bin[x])
        predictions_bin = self.binner.transform(predictions)
        correction_factor = replace_func(predictions_bin)
    else:
        correction_factor = np.quantile(np.abs(residuals), nominal_coverage)

    if differentiator is not None:
        predictions = differentiator.inverse_transform_next_window(predictions)
        correction_factor = scale_correction_factor_differentiation(
            correction_factor     = correction_factor,
            steps                 = len(predictions),
            differentiation_order = self.differentiation
        )

    lower_bound = predictions - correction_factor
    upper_bound = predictions + correction_factor
    predictions = np.column_stack([predictions, lower_bound, upper_bound])

    if self.transformer_y:
        predictions = transform_numpy(
                          array             = predictions,
                          transformer       = self.transformer_y,
                          fit               = False,
                          inverse_transform = True
                      )

    predictions = pd.DataFrame(
                      data    = predictions,
                      index   = prediction_index,
                      columns = ["pred", "lower_bound", "upper_bound"]
                  )

    return predictions

predict_interval ¶


predict_interval(
    steps,
    last_window=None,
    exog=None,
    method="bootstrapping",
    interval=[0.05, 0.95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
)

Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods. Refer to the References section for additional details on these methods.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict, or the date up to which to predict. If `int`, the number of steps to predict. If `str` or `pandas Timestamp`, the prediction will be up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`method`	`str`	Technique used to estimate prediction intervals. Available options: 'bootstrapping': Bootstrapping is used to generate prediction intervals [1]_. 'conformal': Employs the conformal prediction split method for interval estimation [2]_.	`'bootstrapping'`
`interval`	`(float, list, tuple)`	Confidence level of the prediction interval. Its interpretation depends on the type of the value passed: If `float`, it represents the nominal (expected) coverage (between 0 and 1). For instance, `interval=0.95` corresponds to the `[0.025, 0.975]` quantiles. If `list` or `tuple`, it defines the exact quantiles to compute, which must be between 0 and 1 inclusive. For example, a 95% interval is specified as `interval = [0.025, 0.975]`. When using `method='conformal'`, the interval must be a float or a list/tuple defining a symmetric interval. Changed in version 0.23.0: `interval` is now expressed as quantiles (0-1) instead of percentiles (0-100). Passing percentiles is deprecated and emits a `FutureWarning`.	`[0.05, 0.95]`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating prediction intervals.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Values predicted by the forecaster and their estimated interval. pred: predictions. lower_bound: lower bound of the interval. upper_bound: upper bound of the interval.

References

.. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

.. [2] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method
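The two branches of `predict_interval` translate `interval` differently. In the bootstrapping branch, a float coverage becomes the symmetric quantiles `[0.5 - interval / 2, 0.5 + interval / 2]`, which are then computed row-wise over the bootstrap matrix. In the conformal branch, a symmetric list or tuple is reduced to its coverage `interval[1] - interval[0]`. The sketch below mirrors those conversions on a toy bootstrap DataFrame. The helper names are hypothetical, and input validation (`check_interval`, `_normalize_interval_scale`) is omitted.

```python
import numpy as np
import pandas as pd


def interval_to_quantiles(interval):
    """Bootstrapping branch: float coverage -> symmetric quantiles."""
    if isinstance(interval, (list, tuple)):
        return np.array(interval, dtype=float)
    return np.array([0.5 - interval / 2, 0.5 + interval / 2])


def interval_to_coverage(interval):
    """Conformal branch: symmetric quantiles -> nominal coverage."""
    if isinstance(interval, (list, tuple)):
        return interval[1] - interval[0]
    return interval


def bootstrap_bounds(boot_predictions, interval):
    """Row-wise quantiles of a (steps, n_boot) DataFrame."""
    q = interval_to_quantiles(interval)
    bounds = boot_predictions.quantile(q=q, axis=1).transpose()
    bounds.columns = ["lower_bound", "upper_bound"]
    return bounds


boot = pd.DataFrame(
    [[0.0, 1.0, 2.0, 3.0, 4.0], [10.0, 11.0, 12.0, 13.0, 14.0]],
    index=pd.date_range("2024-01-01", periods=2, freq="D"),
)
# interval=0.5 -> quantiles [0.25, 0.75] computed for each step (row)
print(bootstrap_bounds(boot, interval=0.5))
```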

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def predict_interval(
    self,
    steps: int | str | pd.Timestamp,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    method: str = 'bootstrapping',
    interval: float | list[float] | tuple[float] = [0.05, 0.95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> pd.DataFrame:
    """
    Predict n steps ahead and estimate prediction intervals using either 
    bootstrapping or conformal prediction methods. Refer to the References 
    section for additional details on these methods.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict. 

        - If steps is int, number of steps to predict. 
        - If str or pandas Datetime, the prediction will be up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    method : str, default 'bootstrapping'
        Technique used to estimate prediction intervals. Available options:

        - 'bootstrapping': Bootstrapping is used to generate prediction
          intervals [1]_.
        - 'conformal': Employs the conformal prediction split method for
          interval estimation [2]_.
    interval : float, list, tuple, default [0.05, 0.95]
        Confidence level of the prediction interval. Its interpretation
        depends on the type of the value passed:

        - If `float`, represents the nominal (expected) coverage (between 0
          and 1). For instance, `interval=0.95` corresponds to `[0.025, 0.975]`
          quantiles.
        - If `list` or `tuple`, defines the exact quantiles to compute, which
          must be between 0 and 1 inclusive. For example, a 95% interval
          is specified as `interval = [0.025, 0.975]`.
        - When using `method='conformal'`, the interval must be a float or
          a list/tuple defining a symmetric interval.

        **Changed in version 0.23.0:** `interval` is now expressed as
        quantiles (0-1) instead of percentiles (0-100). Passing percentiles
        is deprecated and emits a `FutureWarning`.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy for
        the prediction error to create predictions.
        If `False`, out-of-sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    .. [2] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    if method == "bootstrapping":

        if isinstance(interval, (list, tuple)):
            interval = _normalize_interval_scale(interval)
            check_interval(interval=interval, ensure_symmetric_intervals=False)
            interval = np.array(interval)
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            interval = np.array([0.5 - interval / 2, 0.5 + interval / 2])

        boot_predictions = self.predict_bootstrapping(
                               steps                   = steps,
                               last_window             = last_window,
                               exog                    = exog,
                               n_boot                  = n_boot,
                               random_state            = random_state,
                               use_in_sample_residuals = use_in_sample_residuals,
                               use_binned_residuals    = use_binned_residuals,
                               suppress_warnings       = suppress_warnings
                           )

        predictions = self.predict(
                          steps             = steps,
                          last_window       = last_window,
                          exog              = exog,
                          check_inputs      = False,
                          suppress_warnings = suppress_warnings
                      )

        predictions_interval = boot_predictions.quantile(q=interval, axis=1).transpose()
        predictions_interval.columns = ['lower_bound', 'upper_bound']
        predictions = pd.concat((predictions, predictions_interval), axis=1)

    elif method == 'conformal':

        if isinstance(interval, (list, tuple)):
            interval = _normalize_interval_scale(interval)
            check_interval(interval=interval, ensure_symmetric_intervals=True)
            nominal_coverage = interval[1] - interval[0]
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            nominal_coverage = interval

        predictions = self._predict_interval_conformal(
                          steps                   = steps,
                          last_window             = last_window,
                          exog                    = exog,
                          nominal_coverage        = nominal_coverage,
                          use_in_sample_residuals = use_in_sample_residuals,
                          use_binned_residuals    = use_binned_residuals
                      )
    else:
        raise ValueError(
            f"Invalid `method` '{method}'. Choose 'bootstrapping' or 'conformal'."
        )

    return predictions
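In the bootstrapping branch above, a scalar coverage is turned into a pair of quantile levels centred on the median (`0.8` becomes `[0.1, 0.9]`). The bounds are then read off the bootstrapped predictions. The sketch below shows that arithmetic. It assumes a bootstrap matrix with one row per step and one column per replicate, which is the layout the `quantile(..., axis=1)` call implies. The `bootstrap_interval` helper and the toy data are hypothetical and are not part of skforecast.

```python
import numpy as np
import pandas as pd


def bootstrap_interval(boot_predictions: pd.DataFrame, coverage: float) -> pd.DataFrame:
    # Hypothetical helper mirroring the bootstrapping branch: a coverage of 0.8
    # becomes the quantile levels [0.1, 0.9], centred on the median.
    q = np.array([0.5 - coverage / 2, 0.5 + coverage / 2])
    # Rows are steps and columns are bootstrap replicates, so the quantiles are
    # taken across axis=1 and transposed back to one row per step.
    bounds = boot_predictions.quantile(q=q, axis=1).transpose()
    bounds.columns = ["lower_bound", "upper_bound"]
    return bounds


# Toy bootstrap matrix: 3 steps x 1000 replicates, each step centred on a different level.
rng = np.random.default_rng(123)
boot = pd.DataFrame(
    rng.normal(loc=[[10.0], [20.0], [30.0]], scale=1.0, size=(3, 1000)),
    columns=[f"boot_{i}" for i in range(1000)],
)
print(bootstrap_interval(boot, coverage=0.8))
```

For a standard normal spread around each step, the 80% bounds should land roughly 1.28 units below and above the step's centre.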

predict_quantiles ¶


predict_quantiles(
    steps,
    last_window=None,
    exog=None,
    quantiles=[0.05, 0.5, 0.95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
)

Calculate the specified quantiles for each step. Multiple forecasting predictions are first generated through a bootstrapping process, and the requested quantiles are then computed for each step.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict. If `int`, the number of steps ahead. If `str` or `pandas Timestamp`, predictions are made up to that date.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after the training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`quantiles`	`(list, tuple)`	Sequence of quantiles to compute, each between 0 and 1 inclusive. For example, to compute the 0.05, 0.5 and 0.95 quantiles, use `quantiles = [0.05, 0.5, 0.95]`.	`[0.05, 0.5, 0.95]`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating quantiles.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Quantiles predicted by the forecaster.

References

.. [1] Forecasting: Principles and Practice (3rd ed.), Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def predict_quantiles(
    self,
    steps: int | str | pd.Timestamp,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    quantiles: list[float] | tuple[float, ...] = [0.05, 0.5, 0.95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> pd.DataFrame:
    """
    Calculate the specified quantiles for each step. Multiple forecasting
    predictions are first generated through a bootstrapping process, and
    the requested quantiles are then computed for each step.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict.

        - If `int`, the number of steps ahead.
        - If `str` or `pandas Timestamp`, predictions are made up to that date.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after the training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    quantiles : list, tuple, default [0.05, 0.5, 0.95]
        Sequence of quantiles to compute, each between 0 and 1 inclusive.
        For example, to compute the 0.05, 0.5 and 0.95 quantiles, use
        `quantiles = [0.05, 0.5, 0.95]`.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating quantiles.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy for
        the prediction error to create predictions.
        If `False`, out-of-sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using the forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    predictions : pandas DataFrame
        Quantiles predicted by the forecaster.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    check_interval(quantiles=quantiles)

    boot_predictions = self.predict_bootstrapping(
                           steps                   = steps,
                           last_window             = last_window,
                           exog                    = exog,
                           n_boot                  = n_boot,
                           random_state            = random_state,
                           use_in_sample_residuals = use_in_sample_residuals,
                           use_binned_residuals    = use_binned_residuals,
                           suppress_warnings       = suppress_warnings
                       )

    predictions = boot_predictions.quantile(q=quantiles, axis=1).transpose()
    predictions.columns = [f'q_{q}' for q in quantiles]

    return predictions
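`predict_quantiles` applies the same row-wise quantile computation, but it keeps every requested level as its own column. The following is a small sketch on a hand-written bootstrap matrix. The hypothetical `quantiles_from_bootstrap` helper mirrors the last two statements above and skips the input validation done by `check_interval`.

```python
import pandas as pd


def quantiles_from_bootstrap(boot_predictions: pd.DataFrame, quantiles=(0.05, 0.5, 0.95)) -> pd.DataFrame:
    # Hypothetical helper: one row per step, one column per requested quantile.
    predictions = boot_predictions.quantile(q=list(quantiles), axis=1).transpose()
    # Column names follow the same f-string pattern as the source, e.g. "q_0.05".
    predictions.columns = [f"q_{q}" for q in quantiles]
    return predictions


# Toy matrix: 2 steps x 5 bootstrap replicates.
boot = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0, 5.0],
     [10.0, 20.0, 30.0, 40.0, 50.0]],
)
print(quantiles_from_bootstrap(boot, quantiles=[0.25, 0.5, 0.75]))
```

With pandas' default linear interpolation, the first step's quartiles are 2, 3 and 4. Because the column names come from the float's string form, a level such as `0.05` produces the column `q_0.05`.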

predict_dist ¶


predict_dist(
    steps,
    distribution,
    last_window=None,
    exog=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
)

Fit a given probability distribution for each step. Multiple forecasting predictions are first generated through a bootstrapping process, and the distribution is then fitted to the predictions of each step.

Parameters:

Name	Type	Description	Default
`steps`	`int, str, pandas Timestamp`	Number of steps to predict. If `int`, the number of steps ahead. If `str` or `pandas Timestamp`, predictions are made up to that date.	required
`distribution`	`object`	A distribution object from scipy.stats with methods `_pdf` and `fit`, for example `scipy.stats.norm`.	required
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed in the first iteration of the prediction (t + 1). If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after the training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`n_boot`	`int`	Number of bootstrapping iterations used to generate the predictions to which the distribution is fitted.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as a proxy for the prediction error to create predictions. If `False`, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings are suppressed during execution. See `skforecast.exceptions.warn_skforecast_categories` for the list of warnings that are suppressed.	`False`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Distribution parameters estimated for each step.

References

.. [1] Forecasting: Principles and Practice (3rd ed.), Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def predict_dist(
    self,
    steps: int | str | pd.Timestamp,
    distribution: object,
    last_window: pd.Series | pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> pd.DataFrame:
    """
    Fit a given probability distribution for each step. Multiple forecasting
    predictions are first generated through a bootstrapping process, and the
    distribution is then fitted to the predictions of each step.

    Parameters
    ----------
    steps : int, str, pandas Timestamp
        Number of steps to predict.

        - If `int`, the number of steps ahead.
        - If `str` or `pandas Timestamp`, predictions are made up to that date.
    distribution : object
        A distribution object from scipy.stats with methods `_pdf` and `fit`,
        for example `scipy.stats.norm`.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed in the 
        first iteration of the prediction (t + 1).
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after the training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    n_boot : int, default 250
        Number of bootstrapping iterations used to generate the predictions
        to which the distribution is fitted.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy for
        the prediction error to create predictions.
        If `False`, out-of-sample residuals (calibration) are used.
        Out-of-sample residuals must be precomputed using the forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings are suppressed during execution.
        See `skforecast.exceptions.warn_skforecast_categories` for the
        list of warnings that are suppressed.

    Returns
    -------
    predictions : pandas DataFrame
        Distribution parameters estimated for each step.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    if not hasattr(distribution, "_pdf") or not callable(getattr(distribution, "fit", None)):
        raise TypeError(
            "`distribution` must be a valid probability distribution object "
            "from scipy.stats, with methods `_pdf` and `fit`."
        )

    predictions = self.predict_bootstrapping(
                      steps                   = steps,
                      last_window             = last_window,
                      exog                    = exog,
                      n_boot                  = n_boot,
                      random_state            = random_state,
                      use_in_sample_residuals = use_in_sample_residuals,
                      use_binned_residuals    = use_binned_residuals,
                      suppress_warnings       = suppress_warnings
                  )       

    param_names = [
        p for p in inspect.signature(distribution._pdf).parameters
        if p != 'x'
    ] + ["loc", "scale"]

    predictions[param_names] = (
        predictions.apply(
            lambda x: distribution.fit(x), axis=1, result_type='expand'
        )
    )
    predictions = predictions[param_names]

    return predictions
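`predict_dist` derives the parameter names from the signature of the distribution's `_pdf` method. Every argument except `x` is treated as a shape parameter, and `loc` and `scale` are appended because `fit` returns them last. The sketch below reproduces that step with `scipy.stats.norm` on synthetic bootstrap replicates. `fit_distribution_per_step` is a hypothetical helper, and the sketch assumes scipy and pandas are installed.

```python
import inspect

import numpy as np
import pandas as pd
from scipy import stats


def fit_distribution_per_step(boot_predictions: pd.DataFrame, distribution) -> pd.DataFrame:
    # Shape parameters are every argument of `_pdf` except `x`; `loc` and
    # `scale` are appended, matching the order of the tuple returned by `fit`.
    param_names = [
        p for p in inspect.signature(distribution._pdf).parameters if p != "x"
    ] + ["loc", "scale"]
    # Fit the distribution to the bootstrap replicates of each step (one row each).
    fitted = boot_predictions.apply(
        lambda row: distribution.fit(row), axis=1, result_type="expand"
    )
    fitted.columns = param_names
    return fitted


rng = np.random.default_rng(123)
boot = pd.DataFrame(rng.normal(loc=5.0, scale=2.0, size=(2, 500)))
print(fit_distribution_per_step(boot, stats.norm))  # columns: loc, scale
```

For `scipy.stats.norm`, `_pdf` takes only `x`, so the columns are `loc` and `scale`. The maximum-likelihood fit returns the row mean and the population standard deviation. A distribution with shape parameters, such as `scipy.stats.gamma`, would add its shape columns (for gamma, `a`) before them.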

set_params ¶


set_params(params)

Set new values for the parameters of the scikit-learn estimator stored in the forecaster. After calling this method, the forecaster is reset to an unfitted state, so the `fit` method must be called again before prediction.

Parameters:

Name	Type	Description	Default
`params`	`dict`	Parameter values passed to the estimator's `set_params` method.	required

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def set_params(
    self, 
    params: dict[str, object]
) -> None:
    """
    Set new values for the parameters of the scikit-learn estimator stored
    in the forecaster. After calling this method, the forecaster is reset to
    an unfitted state, so `fit` must be called again before prediction.

    Parameters
    ----------
    params : dict
        Parameter values passed to the estimator's `set_params` method.

    Returns
    -------
    None

    """

    self.estimator = clone(self.estimator)
    self.estimator.set_params(**params)
    self.is_fitted = False
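`set_params` clones the estimator before applying the new values. Cloning drops any fitted state, which is why `is_fitted` is reset. The effect of `clone` followed by `set_params` can be seen with plain scikit-learn. In this sketch, `Ridge` is an arbitrary stand-in for whatever estimator the forecaster holds.

```python
from sklearn.base import clone
from sklearn.linear_model import Ridge

# Stand-in for the estimator stored in a forecaster (illustrative only).
fitted = Ridge(alpha=1.0).fit([[0.0], [1.0], [2.0]], [0.0, 1.0, 2.0])

# `clone` returns an unfitted copy with the same hyperparameters, so learned
# attributes such as `coef_` are not carried over to the new estimator.
new_estimator = clone(fitted)
new_estimator.set_params(alpha=0.5)

print(new_estimator.alpha)              # 0.5
print(hasattr(new_estimator, "coef_"))  # False: the clone is unfitted
print(fitted.alpha)                     # 1.0: the original object is untouched
```

Cloning first means the stored estimator never mixes new hyperparameters with coefficients learned under the old ones.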

set_lags ¶


set_lags(lags=None)

Set a new value for the attribute `lags`. The attributes `lags_names`, `max_lag`, `lags_are_contiguous` and `window_size` are also updated.

Parameters:

Name	Type	Description	Default
`lags`	`int, list, numpy ndarray, range`	Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. `int`: include lags from 1 to `lags` (included). `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. `None`: no lags are included as predictors.	`None`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def set_lags(
    self, 
    lags: int | list[int] | np.ndarray[int] | range | None = None
) -> None:
    """
    Set a new value for the attribute `lags`. The attributes `lags_names`,
    `max_lag`, `lags_are_contiguous` and `window_size` are also updated.

    Parameters
    ----------
    lags : int, list, numpy ndarray, range, default None
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. 

        - `int`: include lags from 1 to `lags` (included).
        - `list`, `1d numpy ndarray` or `range`: include only lags present in 
        `lags`, all elements must be int.
        - `None`: no lags are included as predictors. 

    Returns
    -------
    None

    """

    if self.window_features is None and lags is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.lags, self.lags_names, self.max_lag = initialize_lags(type(self).__name__, lags)
    self.lags_are_contiguous = (
        self.lags is not None
        and np.array_equal(self.lags, np.arange(1, self.max_lag + 1))
    )
    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)
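`set_lags` accepts `lags` as an integer or as a list-like. Either form ends up as an array of lag values, from which `max_lag` and the contiguity flag are derived. The sketch below makes several assumptions. `normalize_lags` is a hypothetical simplification of `initialize_lags`: it assumes positive integer lags and sorts them. The `lag_{n}` naming is an assumption about `lags_names`. The contiguity test, however, uses the same expression as the source above.

```python
import numpy as np


def normalize_lags(lags):
    # Hypothetical stand-in for `initialize_lags`; assumes positive integer lags.
    if lags is None:
        return None, None, None
    if isinstance(lags, int):
        lags = np.arange(1, lags + 1)                      # int n -> lags 1..n
    else:
        lags = np.sort(np.asarray(list(lags), dtype=int))  # list, range or 1d ndarray
    names = [f"lag_{lag}" for lag in lags]                 # assumed naming scheme
    return lags, names, int(lags.max())


def lags_are_contiguous(lags, max_lag):
    # Same test as in `set_lags`: contiguous means exactly 1, 2, ..., max_lag.
    return lags is not None and np.array_equal(lags, np.arange(1, max_lag + 1))


for value in (3, [1, 2, 3], [1, 7, 24], range(1, 5)):
    lags, names, max_lag = normalize_lags(value)
    print(value, names, max_lag, lags_are_contiguous(lags, max_lag))
```

Under these assumptions, `3`, `[1, 2, 3]` and `range(1, 5)` are contiguous, while `[1, 7, 24]` is not.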

set_window_features ¶


set_window_features(window_features=None)

Set a new value for the attribute `window_features`. The attributes `max_size_window_features`, `window_features_names`, `window_features_class_names` and `window_size` are also updated.

Parameters:

Name	Type	Description	Default
`window_features`	`(object, list)`	Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors.	`None`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def set_window_features(
    self, 
    window_features: object | list[object] | None = None
) -> None:
    """
    Set a new value for the attribute `window_features`. The attributes
    `max_size_window_features`, `window_features_names`,
    `window_features_class_names` and `window_size` are also updated.

    Parameters
    ----------
    window_features : object, list, default None
        Instance or list of instances used to create window features. Window features
        are created from the original time series and are included as predictors.

    Returns
    -------
    None

    """

    if window_features is None and self.lags is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ] 
    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)
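`set_lags` and `set_window_features` both recompute `window_size` in the same way. They take the largest of `max_lag` and `max_size_window_features` (ignoring whichever is `None`) and add the differentiation order, because differencing of order d needs d extra past observations. The hypothetical `compute_window_size` helper below isolates that arithmetic.

```python
def compute_window_size(max_lag, max_size_window_features, differentiation=None):
    # Hypothetical helper reproducing the arithmetic of both setters:
    # keep the largest requirement among the ones that are defined...
    window_size = max(
        ws for ws in (max_lag, max_size_window_features) if ws is not None
    )
    # ...and add one extra past observation per order of differencing.
    if differentiation is not None:
        window_size += differentiation
    return window_size


print(compute_window_size(max_lag=24, max_size_window_features=7))    # 24
print(compute_window_size(max_lag=None, max_size_window_features=7))  # 7
print(compute_window_size(max_lag=3, max_size_window_features=14, differentiation=1))  # 15
```

Both setters raise a `ValueError` when `lags` and `window_features` are both `None`. That check guarantees `max` never receives an empty sequence.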

set_fit_kwargs ¶


set_fit_kwargs(fit_kwargs)

Set new values for the additional keyword arguments passed to the `fit` method of the estimator.

Parameters:

Name	Type	Description	Default
`fit_kwargs`	`dict`	Dict of the form {"argument": new_value}.	required

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def set_fit_kwargs(
    self, 
    fit_kwargs: dict[str, object]
) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit` 
    method of the estimator.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns
    -------
    None

    """

    self.fit_kwargs = check_select_fit_kwargs(self.estimator, fit_kwargs=fit_kwargs)

set_in_sample_residuals ¶


set_in_sample_residuals(
    y, exog=None, random_state=123, suppress_warnings=False
)

Set in-sample residuals in case they were not calculated during the training process.

In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:

`in_sample_residuals_`: residuals stored in a numpy ndarray.
`binner_intervals_`: intervals used to bin the residuals, calculated from the quantiles of the predicted values.
`in_sample_residuals_by_bin_`: residuals binned according to the predicted value they are associated with and stored in a dictionary, where the keys identify each bin (see `binner_intervals_` for the corresponding interval of predicted values) and the values are the residuals associated with that bin.

At most 10_000 residuals are stored in the attribute `in_sample_residuals_`. If more residuals are available, a random sample of 10_000 is stored. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and their indexes must be aligned so that y[i] is regressed on exog[i].	`None`
`random_state`	`int`	Sets a seed to the random sampling for reproducible output.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the sampling process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

@manage_warnings
def set_in_sample_residuals(
    self,
    y: pd.Series,
    exog: pd.Series | pd.DataFrame | None = None,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> None:
    """
    Set in-sample residuals in case they were not calculated during the
    training process. 

    In-sample residuals are calculated as the difference between the true 
    values and the predictions made by the forecaster using the training 
    data. The following internal attributes are updated:

    + `in_sample_residuals_`: residuals stored in a numpy ndarray.
    + `binner_intervals_`: intervals used to bin the residuals, calculated
      from the quantiles of the predicted values.
    + `in_sample_residuals_by_bin_`: residuals are binned according to the
      predicted value they are associated with and stored in a dictionary,
      where the keys identify each bin (see `binner_intervals_` for the
      corresponding interval of predicted values) and the values are the
      residuals associated with that bin.

    At most 10_000 residuals are stored in the attribute `in_sample_residuals_`.
    If more residuals are available, a random sample of 10_000 is stored.
    The number of residuals stored per bin is limited to
    `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned so
        that y[i] is regressed on exog[i].
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the sampling 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_in_sample_residuals()`."
        )

    check_y(y=y, allow_nan=True)
    y_index_range = check_extract_values_and_index(
        data=y, data_label='`y`', return_values=False
    )[1][[0, -1]]
    if not y_index_range.equals(self.training_range_):
        raise IndexError(
            f"The index range of `y` does not match the range "
            f"used during training. Please ensure the index is aligned "
            f"with the training data.\n"
            f"    Expected : {self.training_range_}\n"
            f"    Received : {y_index_range}"
        )

    (
        X_train,
        y_train,
        _,
        _,
        _,
        _,
        _,
        _,
        X_train_features_names_out_,
        *_
    ) = self._create_train_X_y(y=y, exog=exog)

    if not X_train_features_names_out_ == self.X_train_features_names_out_:
        raise ValueError(
            f"Feature mismatch detected after matrix creation. The features "
            f"generated from the provided data do not match those used during "
            f"the training process. To correctly set in-sample residuals, "
            f"ensure that the same data and preprocessing steps are applied.\n"
            f"    Expected output : {self.X_train_features_names_out_}\n"
            f"    Current output  : {X_train_features_names_out_}"
        )

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore",
            message="X does not have valid feature names",
            category=UserWarning
        )
        y_pred = self.estimator.predict(X_train).ravel()

    self._binning_in_sample_residuals(
        y_true                    = y_train,
        y_pred                    = y_pred,
        store_in_sample_residuals = True,
        random_state              = random_state
    )
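Binning stores each residual next to the predicted value it came from. The apparent intent is that bootstrapping can later draw errors whose size matches the level being forecast (see `use_binned_residuals`). The sketch below illustrates the idea with equal-frequency bins and the same 10_000 caps described above. It is a hypothetical simplification: `np.quantile` and `np.digitize` stand in for the forecaster's `binner`, and this is not the code of `_binning_in_sample_residuals`.

```python
import numpy as np


def bin_residuals(y_true, y_pred, n_bins=4, max_total=10_000, random_state=123):
    # Hypothetical simplification of residual binning (not skforecast's code).
    rng = np.random.default_rng(random_state)
    residuals = y_true - y_pred
    # Bin edges are quantiles of the predicted values, so bins hold similar counts.
    edges = np.quantile(y_pred, np.linspace(0, 1, n_bins + 1))
    # Using only the interior edges maps every prediction to a bin in 0..n_bins-1.
    bin_ids = np.digitize(y_pred, edges[1:-1])
    intervals = {k: (edges[k], edges[k + 1]) for k in range(n_bins)}
    # Cap the residuals kept per bin at max_total // n_bins.
    max_per_bin = max_total // n_bins
    by_bin = {}
    for k in range(n_bins):
        r = residuals[bin_ids == k]
        if len(r) > max_per_bin:
            r = rng.choice(r, size=max_per_bin, replace=False)
        by_bin[k] = r
    # Cap the flat array of residuals at max_total.
    if len(residuals) > max_total:
        residuals = rng.choice(residuals, size=max_total, replace=False)
    return residuals, by_bin, intervals


rng = np.random.default_rng(0)
y_pred = rng.uniform(0, 100, size=40_000)
# Noise grows with the predicted level, the pattern binning is meant to capture.
y_true = y_pred + rng.normal(0, 1 + y_pred / 50)
residuals, by_bin, intervals = bin_residuals(y_true, y_pred)
print(len(residuals), {k: len(v) for k, v in by_bin.items()})
print({k: round(float(np.std(v)), 2) for k, v in by_bin.items()})
```

With 40_000 residuals and 4 bins, each bin is capped at 2_500 values and the flat array at 10_000. The residual spread grows from the lowest bin to the highest.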

set_out_sample_residuals ¶


set_out_sample_residuals(
    y_true, y_pred, append=False, random_state=123
)

Set new values for the attribute `out_sample_residuals_`. Out-of-sample residuals are meant to be calculated using observations that did not participate in the training process. `y_true` and `y_pred` are expected to be in the original scale of the time series. Residuals are calculated as `y_true - y_pred`, after applying the necessary transformations and differentiations if the forecaster includes them (`self.transformer_y` and `self.differentiation`). Two internal attributes are updated:

`out_sample_residuals_`: residuals stored in a numpy ndarray.
`out_sample_residuals_by_bin_`: residuals binned according to the predicted value they are associated with and stored in a dictionary, where the keys identify each bin (see `binner_intervals_` for the corresponding interval of predicted values) and the values are the residuals associated with that bin. If a bin is empty, it is filled with a random sample of residuals from the other bins. This ensures that all bins have at least one residual and can be used in the prediction process.

At most 10_000 residuals are stored in the attribute `out_sample_residuals_`. If more residuals are available, a random sample of 10_000 is stored. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_`.

Parameters:

Name	Type	Description	Default
`y_true`	`numpy ndarray, pandas Series`	True values of the time series from which the residuals have been calculated.	required
`y_pred`	`numpy ndarray, pandas Series`	Predicted values of the time series.	required
`append`	`bool`	If `True`, new residuals are added to the ones already stored in the forecaster. If, after appending, a bin holds more than `10_000 // self.binner.n_bins_` values, a random sample of that size is kept.	`False`
`random_state`	`int`	Sets a seed to the random sampling for reproducible output.	`123`

Returns:

Type	Description
`None`

Source code in skforecast\recursive\_forecaster_recursive.py

def set_out_sample_residuals(
    self,
    y_true: np.ndarray | pd.Series,
    y_pred: np.ndarray | pd.Series,
    append: bool = False,
    random_state: int = 123
) -> None:
    """
    Set new values for the attribute `out_sample_residuals_`. Out-of-sample
    residuals are meant to be calculated using observations that did not
    participate in the training process. `y_true` and `y_pred` are expected
    to be in the original scale of the time series. Residuals are calculated
    as `y_true - y_pred`, after applying the necessary transformations and
    differentiations if the forecaster includes them (`self.transformer_y`
    and `self.differentiation`). Two internal attributes are updated:

    + `out_sample_residuals_`: residuals stored in a numpy ndarray.
    + `out_sample_residuals_by_bin_`: residuals are binned according to the
      predicted value they are associated with and stored in a dictionary,
      where the keys identify each bin (see `binner_intervals_` for the
      corresponding interval of predicted values) and the values are the
      residuals associated with that bin. If a bin is empty, it is filled
      with a random sample of residuals from the other bins. This ensures
      that all bins have at least one residual and can be used in the
      prediction process.

    At most 10_000 residuals are stored in the attribute `out_sample_residuals_`.
    If more residuals are available, a random sample of 10_000 is stored.
    The number of residuals stored per bin is limited to
    `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    y_true : numpy ndarray, pandas Series
        True values of the time series from which the residuals have been
        calculated.
    y_pred : numpy ndarray, pandas Series
        Predicted values of the time series.
    append : bool, default False
        If `True`, new residuals are added to the ones already stored in the
        forecaster. If, after appending, a bin holds more than
        `10_000 // self.binner.n_bins_` values, a random sample of that size
        is kept.
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_out_sample_residuals()`."
        )

    if not isinstance(y_true, (np.ndarray, pd.Series)):
        raise TypeError(
            f"`y_true` argument must be `numpy ndarray` or `pandas Series`. "
            f"Got {type(y_true)}."
        )

    if not isinstance(y_pred, (np.ndarray, pd.Series)):
        raise TypeError(
            f"`y_pred` argument must be `numpy ndarray` or `pandas Series`. "
            f"Got {type(y_pred)}."
        )

    if len(y_true) != len(y_pred):
        raise ValueError(
            f"`y_true` and `y_pred` must have the same length. "
            f"Got {len(y_true)} and {len(y_pred)}."
        )

    if isinstance(y_true, pd.Series) and isinstance(y_pred, pd.Series):
        if not y_true.index.equals(y_pred.index):
            raise ValueError(
                "`y_true` and `y_pred` must have the same index."
            )

    if not isinstance(y_pred, np.ndarray):
        y_pred = y_pred.to_numpy()
    if not isinstance(y_true, np.ndarray):
        y_true = y_true.to_numpy()

    if self.transformer_y:
        y_true = transform_numpy(
                     array             = y_true,
                     transformer       = self.transformer_y,
                     fit               = False,
                     inverse_transform = False
                 )
        y_pred = transform_numpy(
                     array             = y_pred,
                     transformer       = self.transformer_y,
                     fit               = False,
                     inverse_transform = False
                 )

    if self.differentiation is not None:
        differentiator = copy(self.differentiator)
        differentiator.set_params(window_size=None)
        y_true = differentiator.fit_transform(y_true)[self.differentiation:]
        y_pred = differentiator.fit_transform(y_pred)[self.differentiation:]

    data = pd.DataFrame(
        {'prediction': y_pred, 'residuals': y_true - y_pred}
    ).dropna()
    y_pred = data['prediction'].to_numpy()
    residuals = data['residuals'].to_numpy()

    data['bin'] = self.binner.transform(y_pred).astype(int)
    residuals_by_bin = data.groupby('bin')['residuals'].apply(np.array).to_dict()

    out_sample_residuals = (
        np.array([]) 
        if self.out_sample_residuals_ is None
        else self.out_sample_residuals_
    )
    out_sample_residuals_by_bin = (
        {} 
        if self.out_sample_residuals_by_bin_ is None
        else self.out_sample_residuals_by_bin_
    )
    if append:
        out_sample_residuals = np.concatenate([out_sample_residuals, residuals])
        for k, v in residuals_by_bin.items():
            if k in out_sample_residuals_by_bin:
                out_sample_residuals_by_bin[k] = np.concatenate(
                    (out_sample_residuals_by_bin[k], v)
                )
            else:
                out_sample_residuals_by_bin[k] = v
    else:
        out_sample_residuals = residuals
        out_sample_residuals_by_bin = residuals_by_bin

    max_samples = 10_000 // self.binner.n_bins_
    rng = np.random.default_rng(seed=random_state)
    for k, v in out_sample_residuals_by_bin.items():
        if len(v) > max_samples:
            sample = rng.choice(a=v, size=max_samples, replace=False)
            out_sample_residuals_by_bin[k] = sample

    bin_keys = (
        []
        if self.binner_intervals_ is None
        else self.binner_intervals_.keys()
    )
    for k in bin_keys:
        if k not in out_sample_residuals_by_bin:
            out_sample_residuals_by_bin[k] = np.array([])

    empty_bins = [
        k for k, v in out_sample_residuals_by_bin.items() 
        if v.size == 0
    ]
    if empty_bins:
        warnings.warn(
            f"The following bins have no out of sample residuals: {empty_bins}. "
            f"No predicted values fall in the interval "
            f"{[self.binner_intervals_[bin] for bin in empty_bins]}. "
            f"Empty bins will be filled with a random sample of residuals.",
            ResidualsUsageWarning
        )
        empty_bin_size = min(max_samples, len(out_sample_residuals))
        for k in empty_bins:
            out_sample_residuals_by_bin[k] = rng.choice(
                a       = out_sample_residuals,
                size    = empty_bin_size,
                replace = False
            )

    if len(out_sample_residuals) > 10_000:
        out_sample_residuals = rng.choice(
            a       = out_sample_residuals, 
            size    = 10_000, 
            replace = False
        )

    self.out_sample_residuals_ = out_sample_residuals
    self.out_sample_residuals_by_bin_ = out_sample_residuals_by_bin
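The residual-storage logic in the source above can be sketched in isolation with NumPy. The steps are: difference both series, compute residuals, group them by the bin of the predicted value, cap each bin at `10_000 // n_bins` samples, fill empty bins from the pooled residuals, and cap the pooled set at 10,000. This is a simplified sketch, not the skforecast implementation. The function name `bin_residuals` is hypothetical. `transformer_y` and the `append` branch are omitted, so both inputs are assumed to be on the transformed scale already. `np.diff` is assumed to stand in for the forecaster's differentiator. `np.digitize` over precomputed interior `edges` stands in for the fitted `binner`.

```python
import numpy as np


def bin_residuals(y_true, y_pred, edges, differentiation=None,
                  max_total=10_000, random_state=123):
    """Hypothetical helper: group residuals by the bin of the predicted value.

    `edges` are the interior bin edges, assumed to be fitted beforehand
    (for example on training predictions), so some bins may end up empty.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Residuals live on the differenced scale when differencing is used.
    if differentiation:
        y_true = np.diff(y_true, n=differentiation)
        y_pred = np.diff(y_pred, n=differentiation)

    # Drop pairs with missing values (the source uses DataFrame.dropna()).
    mask = ~(np.isnan(y_true) | np.isnan(y_pred))
    y_true, y_pred = y_true[mask], y_pred[mask]
    residuals = y_true - y_pred

    # Assign each residual to the bin of its prediction: 0 .. len(edges).
    n_bins = len(edges) + 1
    bins = np.digitize(y_pred, edges)
    by_bin = {k: residuals[bins == k] for k in range(n_bins)}

    rng = np.random.default_rng(seed=random_state)

    # Cap each bin so that all bins together hold at most `max_total` values.
    max_per_bin = max_total // n_bins
    for k, v in by_bin.items():
        if v.size > max_per_bin:
            by_bin[k] = rng.choice(v, size=max_per_bin, replace=False)

    # Fill empty bins with a random sample of all residuals.
    fill_size = min(max_per_bin, residuals.size)
    for k, v in by_bin.items():
        if v.size == 0:
            by_bin[k] = rng.choice(residuals, size=fill_size, replace=False)

    # Cap the pooled residuals as well.
    if residuals.size > max_total:
        residuals = rng.choice(residuals, size=max_total, replace=False)

    return residuals, by_bin
```

For example, with `edges=[10.0]` and every prediction below 10, bin 1 receives no residuals and is filled from the pooled sample. This mirrors the branch in the source that emits a `ResidualsUsageWarning`.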

get_feature_importances


get_feature_importances(sort_importance=True)

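Return the feature importances of the estimator stored in the forecaster. If the estimator is a scikit-learn `Pipeline`, its last step is used. This method is only valid when the estimator internally stores feature importances in the `feature_importances_` or `coef_` attribute; otherwise, a warning is issued and `None` is returned. The forecaster must be fitted first, otherwise a `NotFittedError` is raised.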

Parameters:

Name	Type	Description	Default
`sort_importance`	`bool`	If `True`, sorts the feature importances in descending order.	`True`

Returns:

Name	Type	Description
`feature_importances`	`pandas DataFrame`	Feature importances associated with each predictor (columns `feature` and `importance`), or `None` if the estimator does not expose them.

Source code in skforecast\recursive\_forecaster_recursive.py

def get_feature_importances(
    self,
    sort_importance: bool = True
) -> pd.DataFrame:
    """
    Return feature importances of the estimator stored in the forecaster.
    Only valid when estimator stores internally the feature importances in the
    attribute `feature_importances_` or `coef_`. Otherwise, returns `None`.

    Parameters
    ----------
    sort_importance: bool, default True
        If `True`, sorts the feature importances in descending order.

    Returns
    -------
    feature_importances : pandas DataFrame
        Feature importances associated with each predictor.

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `get_feature_importances()`."
        )

    if isinstance(self.estimator, Pipeline):
        estimator = self.estimator[-1]
    else:
        estimator = self.estimator

    if hasattr(estimator, 'feature_importances_'):
        feature_importances = estimator.feature_importances_
    elif hasattr(estimator, 'coef_'):
        feature_importances = estimator.coef_
    else:
        warnings.warn(
            f"Impossible to access feature importances for estimator of type "
            f"{type(estimator)}. This method is only valid when the "
            f"estimator stores internally the feature importances in the "
            f"attribute `feature_importances_` or `coef_`."
        )
        feature_importances = None

    if feature_importances is not None:
        feature_importances = pd.DataFrame({
                                  'feature': self.X_train_features_names_out_,
                                  'importance': feature_importances
                              })
        if sort_importance:
            feature_importances = feature_importances.sort_values(
                                      by='importance', ascending=False
                                  )

    return feature_importances
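When the estimator exposes `coef_` rather than `feature_importances_`, the `importance` column holds signed coefficients. Since `sort_importance=True` is implemented as `sort_values(by='importance', ascending=False)`, strongly negative predictors are ranked last. If magnitude is what matters, the returned DataFrame can be re-sorted by absolute value, as in this sketch. The feature names and coefficient values below are made up for illustration. Comparing coefficient magnitudes is only meaningful when the predictors are on comparable scales.

```python
import pandas as pd

# Hypothetical output of get_feature_importances() for a linear estimator,
# where `importance` holds signed coefficients taken from `coef_`.
importances = pd.DataFrame({
    "feature": ["lag_1", "lag_2", "lag_3", "exog_temp"],
    "importance": [0.8, -1.5, 0.1, -0.3],
})

# Sorting by the signed value (what sort_importance=True does) puts large
# negative coefficients last.
by_value = importances.sort_values(by="importance", ascending=False)

# Sorting by absolute value ranks predictors by magnitude of effect instead.
by_magnitude = importances.sort_values(
    by="importance", key=lambda s: s.abs(), ascending=False
)

print(by_value["feature"].tolist())      # ['lag_1', 'lag_3', 'exog_temp', 'lag_2']
print(by_magnitude["feature"].tolist())  # ['lag_2', 'lag_1', 'exog_temp', 'lag_3']
```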

ForecasterRecursive

Attributes (instance attributes):

estimator
calendar_features
calendar_features_names
transformer_y
transformer_exog
categorical_features
weight_func
source_code_weight_func
differentiation
differentiation_max
differentiator
dropna_from_series
last_window_
index_type_
index_freq_
training_range_
series_name_in_
exog_in_
exog_names_in_
exog_type_in_
exog_dtypes_in_
exog_dtypes_out_
categorical_features_names_in_
X_train_window_features_names_out_
X_train_calendar_features_names_out_
X_train_exog_names_out_
X_train_features_names_out_
in_sample_residuals_
out_sample_residuals_
in_sample_residuals_by_bin_
out_sample_residuals_by_bin_
creation_date
is_fitted
fit_date
skforecast_version
python_version
forecaster_id
_probabilistic_mode
lags_are_contiguous
window_size
window_features_class_names
categorical_encoder
fit_kwargs
binner_kwargs
binner
binner_intervals_

Methods:

_repr_html_
_create_lags
_create_window_features
_create_train_X_y
create_train_X_y
_train_test_split_one_step_ahead
create_sample_weights
fit
_binning_in_sample_residuals
_create_predict_inputs
_recursive_predict
_recursive_predict_bootstrapping
create_predict_X
predict
predict_bootstrapping
_predict_interval_conformal
predict_interval
predict_quantiles
predict_dist
set_params
set_lags
set_window_features
set_fit_kwargs
set_in_sample_residuals
set_out_sample_residuals
get_feature_importances
