`ForecasterDirectMultiVariate`¶

skforecast.direct._forecaster_direct_multivariate.ForecasterDirectMultiVariate ¶


ForecasterDirectMultiVariate(
    level,
    steps,
    estimator=None,
    lags=None,
    window_features=None,
    transformer_series=StandardScaler(),
    transformer_exog=None,
    weight_func=None,
    differentiation=None,
    fit_kwargs=None,
    binner_kwargs=None,
    n_jobs="auto",
    forecaster_id=None,
    regressor=None,
)

Bases: ForecasterBase

This class turns any estimator compatible with the scikit-learn API into a autoregressive multivariate direct multi-step forecaster. A separate model is created for each forecast time step. See documentation for more details.

Parameters:

Name	Type	Description	Default
`estimator`	`estimator or pipeline compatible with the scikit-learn API`	An instance of a estimator or pipeline compatible with the scikit-learn API.	`None`
`level`	`str`	Name of the time series to be predicted.	required
`steps`	`int`	Maximum number of future steps the forecaster will predict when using method `predict()`. Since a different model is created for each step, this value must be defined before training.	required
`lags`	`int, list, numpy ndarray, range, dict`	Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. `int`: include lags from 1 to `lags` (included). `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. `dict`: create different lags for each series. {'series_column_name': lags}. `None`: no lags are included as predictors.	`None`
`window_features`	`(object, list)`	Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors.	`None`
`transformer_series`	`(transformer(preprocessor), dict)`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. Transformation is applied to each `series` before training the forecaster. ColumnTransformers are not allowed since they do not have inverse_transform method. If single transformer: it is cloned and applied to all series. If `dict` of transformers: a different transformer can be used for each series.	`sklearn.preprocessing.StandardScaler`
`transformer_exog`	`transformer`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers.	`None`
`weight_func`	`Callable`	Function that defines the individual weights for each sample based on the index. For example, a function that assigns a lower weight to certain dates. Ignored if `estimator` does not have the argument `sample_weight` in its `fit` method. The resulting `sample_weight` cannot have negative values.	`None`
`differentiation`	`int`	Order of differencing applied to the time series before training the forecaster. If `None`, no differencing is applied. The order of differentiation is the number of times the differencing operation is applied to a time series. Differencing involves computing the differences between consecutive data points in the series. Before returning a prediction, the differencing operation is reversed.	`None`
`fit_kwargs`	`dict`	Additional arguments to be passed to the `fit` method of the estimator.	`None`
`binner_kwargs`	`dict`	Additional arguments to pass to the `QuantileBinner` used to discretize the residuals into k bins according to the predicted values associated with each residual. Available arguments are: `n_bins`, `method`, `subsample`, `random_state` and `dtype`. Argument `method` is passed internally to the function `numpy.percentile`. New in version 0.15.0	`None`
`n_jobs`	`(int, 'auto')`	The number of jobs to run in parallel. If `-1`, then the number of jobs is set to the number of cores. If 'auto', `n_jobs` is set using the function skforecast.utils.select_n_jobs_fit_forecaster.	`'auto'`
`forecaster_id`	`(str, int)`	Name used as an identifier of the forecaster.	`None`
`regressor`	`estimator or pipeline compatible with the Keras API`	Deprecated, alias for `estimator`.	`None`

Attributes:

Name	Type	Description
`estimator`	`estimator or pipeline compatible with the scikit-learn API`	An instance of a estimator or pipeline compatible with the scikit-learn API. An instance of this estimator is trained for each step. All of them are stored in `self.estimators_`.
`estimators_`	`dict`	Dictionary with estimators trained for each step. They are initialized as a copy of `estimator`.
`steps`	`numpy array`	Future steps the forecaster will predict when using method `predict()`. Since a different model is created for each step, this value should be defined before training.
`max_step`	`int`	Maximum step the forecaster is able to predict. It is the maximum value included in `steps`.
`lags`	`numpy ndarray, dict`	Lags used as predictors.
`lags_`	`dict`	Dictionary with the lags of each series. Created from `lags` when creating the training matrices and used internally to avoid overwriting.
`lags_names`	`dict`	Names of the lags of each series.
`max_lag`	`int`	Maximum lag included in `lags`.
`window_features`	`list`	Class or list of classes used to create window features.
`window_features_names`	`list`	Names of the window features to be included in the `X_train` matrix.
`window_features_class_names`	`list`	Names of the classes used to create the window features.
`max_size_window_features`	`int`	Maximum window size required by the window features.
`window_size`	`int`	The window size needed to create the predictors. It is calculated as the maximum value between `max_lag` and `max_size_window_features`. If differentiation is used, `window_size` is increased by n units equal to the order of differentiation so that predictors can be generated correctly.
`transformer_series`	`transformer (preprocessor), dict, default None`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: fit, transform, fit_transform and inverse_transform. Transformation is applied to each `series` before training the forecaster. ColumnTransformers are not allowed since they do not have inverse_transform method. If single transformer: it is cloned and applied to all series. If `dict` of transformers: a different transformer can be used for each series.
`transformer_series_`	`dict`	Dictionary with the transformer for each series. It is created cloning the objects in `transformer_series` and is used internally to avoid overwriting.
`transformer_exog`	`transformer`	An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers.
`weight_func`	`Callable`	Function that defines the individual weights for each sample based on the index. For example, a function that assigns a lower weight to certain dates. Ignored if `estimator` does not have the argument `sample_weight` in its `fit` method. The resulting `sample_weight` cannot have negative values.
`source_code_weight_func`	`str`	Source code of the custom function used to create weights.
`differentiation`	`int`	Order of differencing applied to the time series before training the forecaster.
`differentiation_max`	`int`	Maximum order of differentiation. For this Forecaster, it is equal to the value of the `differentiation` parameter.
`differentiator`	`TimeSeriesDifferentiator`	Skforecast object used to differentiate the time series.
`differentiator_`	`dict`	Dictionary with the `differentiator` for each series. It is created cloning the objects in `differentiator` and is used internally to avoid overwriting.
`last_window_`	`pandas DataFrame`	This window represents the most recent data observed by the predictor during its training phase. It contains the values needed to predict the next step immediately after the training data. These values are stored in the original scale of the time series before undergoing any transformations or differentiation. When `differentiation` parameter is specified, the dimensions of the `last_window_` are expanded as many values as the order of differentiation. For example, if `lags` = 7 and `differentiation` = 1, `last_window_` will have 8 values.
`index_type_`	`type`	Type of index of the input used in training.
`index_freq_`	`str`	Frequency of Index of the input used in training.
`training_range_`	`pandas Index`	First and last values of index of the data used during training.
`series_names_in_`	`list`	Names of the series used during training.
`exog_in_`	`bool`	If the forecaster has been trained using exogenous variable/s.
`exog_names_in_`	`list`	Names of the exogenous variables used during training.
`exog_type_in_`	`type`	Type of exogenous variable/s used in training.
`exog_dtypes_in_`	`dict`	Type of each exogenous variable/s used in training before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`.
`exog_dtypes_out_`	`dict`	Type of each exogenous variable/s used in training after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`.
`X_train_series_names_in_`	`list`	Names of the series added to `X_train` when creating the training matrices with `_create_train_X_y` method. It is a subset of `series_names_in_`.
`X_train_window_features_names_out_`	`list`	Names of the window features included in the matrix `X_train` created internally for training.
`X_train_exog_names_out_`	`list`	Names of the exogenous variables included in the matrix `X_train` created internally for training. It can be different from `exog_names_in_` if some exogenous variables are transformed during the training process.
`X_train_direct_exog_names_out_`	`list`	Same as `X_train_exog_names_out_` but using the direct format. The same exogenous variable is repeated for each step.
`X_train_features_names_out_`	`list`	Names of columns of the matrix created internally for training.
`fit_kwargs`	`dict`	Additional arguments to be passed to the `fit` method of the estimator.
`in_sample_residuals_`	`dict`	Residuals of the model when predicting training data. Only stored up to 10_000 values per series in the form `{series: residuals}`. If `transformer_series` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`in_sample_residuals_by_bin_`	`dict`	In-sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_` per series in the form `{series: residuals}`. If `transformer_series` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation. New in version 0.15.0
`out_sample_residuals_`	`dict`	Residuals of the model when predicting non-training data. Only stored up to 10_000 values per series in the form `{series: residuals}`. Use `set_out_sample_residuals()` method to set values. If `transformer_series` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation.
`out_sample_residuals_by_bin_`	`dict`	Out of sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to `10_000 // self.binner.n_bins_` per series in the form `{series: residuals}`. If `transformer_series` is not `None`, residuals are stored in the transformed scale. If `differentiation` is not `None`, residuals are stored after differentiation. New in version 0.15.0
`binner`	`dict`	Dictionary of `skforecast.preprocessing.QuantileBinner` used to discretize residuals of each series into k bins according to the predicted values associated with each residual. In the form `{series: binner}`. New in version 0.15.0
`binner_intervals_`	`dict`	Intervals used to discretize residuals into k bins according to the predicted values associated with each residual. In the form `{series: binner_intervals_}`. New in version 0.15.0
`binner_kwargs`	`dict`	Additional arguments to pass to the `QuantileBinner`. New in version 0.15.0
`creation_date`	`str`	Date of creation.
`is_fitted`	`bool`	Tag to identify if the estimator has been fitted (trained).
`fit_date`	`str`	Date of last fit.
`skforecast_version`	`str`	Version of skforecast library used to create the forecaster.
`python_version`	`str`	Version of python used to create the forecaster.
`n_jobs`	`(int, 'auto')`	The number of jobs to run in parallel. If `-1`, then the number of jobs is set to the number of cores. If 'auto', `n_jobs` is set using the function skforecast.utils.select_n_jobs_fit_forecaster.
`forecaster_id`	`(str, int)`	Name used as an identifier of the forecaster.
`__skforecast_tags__`	`dict`	Tags associated with the forecaster.
`_probabilistic_mode`	`(str, bool)`	Private attribute used to indicate whether the forecaster should perform some calculations during backtesting.
`dropna_from_series`	`Ignored`	Not used, present here for API consistency by convention.
`encoding`	`Ignored`	Not used, present here for API consistency by convention.

Notes

A separate model is created for each forecasting time step. It is important to note that all models share the same parameter and hyperparameter configuration.

Methods:

Name	Description
`create_train_X_y`	Create training matrices from multiple time series and exogenous
`filter_train_X_y_for_step`	Select the columns needed to train a forecaster for a specific step.
`create_sample_weights`	Create weights for each observation according to the forecaster's attribute
`fit`	Training Forecaster.
`create_predict_X`	Create the predictors needed to predict `steps` ahead.
`predict`	Predict n steps ahead
`predict_bootstrapping`	Generate multiple forecasting predictions using a bootstrapping process.
`predict_interval`	Predict n steps ahead and estimate prediction intervals using either
`predict_quantiles`	Bootstrapping based predicted quantiles.
`predict_dist`	Fit a given probability distribution for each step. After generating
`set_params`	Set new values to the parameters of the scikit-learn model stored in the
`set_fit_kwargs`	Set new values for the additional keyword arguments passed to the `fit`
`set_lags`	Set new value to the attribute `lags`. Attributes `lags_names`,
`set_window_features`	Set new value to the attribute `window_features`. Attributes
`set_in_sample_residuals`	Set in-sample residuals in case they were not calculated during the
`set_out_sample_residuals`	Set new values to the attribute `out_sample_residuals_`. Out of sample
`get_feature_importances`	Return feature importance of the model stored in the forecaster for a

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def __init__(
    self,
    level: str,
    steps: int,
    estimator: object = None,
    lags: int | list[int] | np.ndarray[int] | range[int] | dict[str, int | list] | None = None,
    window_features: object | list[object] | None = None,
    transformer_series: object | dict[str, object] | None = StandardScaler(),
    transformer_exog: object | None = None,
    weight_func: Callable | None = None,
    differentiation: int | None = None,
    fit_kwargs: dict[str, object] | None = None,
    binner_kwargs: dict[str, object] | None = None,
    n_jobs: int | str = 'auto',
    forecaster_id: str | int | None = None,
    regressor: object = None
) -> None:

    self.estimator                          = copy(initialize_estimator(estimator, regressor))
    self.level                              = level
    self.lags_                              = None
    self.transformer_series                 = transformer_series
    self.transformer_series_                = None
    self.transformer_exog                   = transformer_exog
    self.weight_func                        = weight_func
    self.source_code_weight_func            = None
    self.differentiation                    = differentiation
    self.differentiation_max                = None
    self.differentiator                     = None
    self.differentiator_                    = None
    self.last_window_                       = None
    self.index_type_                        = None
    self.index_freq_                        = None
    self.training_range_                    = None
    self.series_names_in_                   = None
    self.exog_in_                           = False
    self.exog_names_in_                     = None
    self.exog_type_in_                      = None
    self.exog_dtypes_in_                    = None
    self.exog_dtypes_out_                   = None
    self.X_train_series_names_in_           = None
    self.X_train_window_features_names_out_ = None
    self.X_train_exog_names_out_            = None
    self.X_train_direct_exog_names_out_     = None
    self.X_train_features_names_out_        = None
    self.in_sample_residuals_               = None
    self.out_sample_residuals_              = None
    self.in_sample_residuals_by_bin_        = None
    self.out_sample_residuals_by_bin_       = None
    self.creation_date                      = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.is_fitted                          = False
    self.fit_date                           = None
    self.skforecast_version                 = __version__
    self.python_version                     = sys.version.split(" ")[0]
    self.forecaster_id                      = forecaster_id
    self._probabilistic_mode                = "binned"
    self.dropna_from_series                 = False  # Ignored in this forecaster
    self.encoding                           = None   # Ignored in this forecaster

    if not isinstance(level, str):
        raise TypeError(
            f"`level` argument must be a str. Got {type(level)}."
        )

    if not isinstance(steps, int):
        raise TypeError(
            f"`steps` argument must be an int greater than or equal to 1. "
            f"Got {type(steps)}."
        )

    if steps < 1:
        raise ValueError(
            f"`steps` argument must be greater than or equal to 1. Got {steps}."
        )

    self.steps    = np.arange(steps) + 1
    self.max_step = steps

    self.estimators_ = {step: clone(self.estimator) for step in self.steps}

    if isinstance(lags, dict):
        self.lags = {}
        self.lags_names = {}
        list_max_lags = []
        for key in lags:
            if lags[key] is None:
                self.lags[key] = None
                self.lags_names[key] = None
            else:
                self.lags[key], lags_names, max_lag = initialize_lags(
                    forecaster_name = type(self).__name__,
                    lags            = lags[key]
                )
                self.lags_names[key] = (
                    [f'{key}_{lag}' for lag in lags_names] 
                     if lags_names is not None 
                     else None
                )
                if max_lag is not None:
                    list_max_lags.append(max_lag)

        self.max_lag = max(list_max_lags) if len(list_max_lags) != 0 else None
    else:
        self.lags, self.lags_names, self.max_lag = initialize_lags(
            forecaster_name = type(self).__name__, 
            lags            = lags
        )

    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    if self.window_features is None and (self.lags is None or self.max_lag is None):
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ]

    self.weight_func, self.source_code_weight_func, _ = initialize_weights(
        forecaster_name = type(self).__name__, 
        estimator       = estimator, 
        weight_func     = weight_func, 
        series_weights  = None
    )

    if differentiation is not None:
        if not isinstance(differentiation, int) or differentiation < 1:
            raise ValueError(
                f"Argument `differentiation` must be an integer equal to or "
                f"greater than 1. Got {differentiation}."
            )
        self.differentiation = differentiation
        self.differentiation_max = differentiation
        self.window_size += differentiation
        self.differentiator = TimeSeriesDifferentiator(
            order=differentiation, window_size=self.window_size
        )

    self.fit_kwargs = check_select_fit_kwargs(
                          estimator  = estimator,
                          fit_kwargs = fit_kwargs
                      )

    self.binner = {}
    self.binner_intervals_ = {}
    self.binner_kwargs = binner_kwargs
    if binner_kwargs is None:
        self.binner_kwargs = {
            'n_bins': 10, 'method': 'linear', 'subsample': 200000,
            'random_state': 789654, 'dtype': np.float64
        }

    if n_jobs == 'auto':
        self.n_jobs = select_n_jobs_fit_forecaster(
                          forecaster_name = type(self).__name__,
                          estimator       = self.estimator
                      )
    else:
        if not isinstance(n_jobs, int):
            raise TypeError(
                f"`n_jobs` must be an integer or `'auto'`. Got {type(n_jobs)}."
            )
        self.n_jobs = n_jobs if n_jobs > 0 else cpu_count()

    self.__skforecast_tags__ = {
        "library": "skforecast",
        "forecaster_name": "ForecasterDirectMultiVariate",
        "forecaster_task": "regression",
        "forecasting_scope": "global",  # single-series | global
        "forecasting_strategy": "direct",  # recursive | direct | deep_learning
        "index_types_supported": ["pandas.RangeIndex", "pandas.DatetimeIndex"],
        "requires_index_frequency": True,

        "allowed_input_types_series": ["pandas.DataFrame"],
        "supports_exog": True,
        "allowed_input_types_exog": ["pandas.Series", "pandas.DataFrame"],
        "handles_missing_values_series": False, 
        "handles_missing_values_exog": True, 

        "supports_lags": True,
        "supports_window_features": True,
        "supports_transformer_series": True,
        "supports_transformer_exog": True,
        "supports_weight_func": True,
        "supports_series_weights": False,
        "supports_differentiation": True,

        "prediction_types": ["point", "interval", "bootstrapping", "quantiles", "distribution"],
        "supports_probabilistic": True,
        "probabilistic_methods": ["bootstrapping", "conformal"],
        "handles_binned_residuals": True
    }

Attributes¶

estimator `instance-attribute` ¶


estimator = copy(initialize_estimator(estimator, regressor))

level `instance-attribute` ¶


level = level

lags_ `instance-attribute` ¶


lags_ = None

transformer_series `instance-attribute` ¶


transformer_series = transformer_series

transformer_series_ `instance-attribute` ¶


transformer_series_ = None

transformer_exog `instance-attribute` ¶


transformer_exog = transformer_exog

weight_func `instance-attribute` ¶


weight_func = weight_func

source_code_weight_func `instance-attribute` ¶


source_code_weight_func = None

differentiation `instance-attribute` ¶


differentiation = differentiation

differentiation_max `instance-attribute` ¶


differentiation_max = None

differentiator `instance-attribute` ¶


differentiator = None

differentiator_ `instance-attribute` ¶


differentiator_ = None

last_window_ `instance-attribute` ¶


last_window_ = None

index_type_ `instance-attribute` ¶


index_type_ = None

index_freq_ `instance-attribute` ¶


index_freq_ = None

training_range_ `instance-attribute` ¶


training_range_ = None

series_names_in_ `instance-attribute` ¶


series_names_in_ = None

exog_in_ `instance-attribute` ¶


exog_in_ = False

exog_names_in_ `instance-attribute` ¶


exog_names_in_ = None

exog_type_in_ `instance-attribute` ¶


exog_type_in_ = None

exog_dtypes_in_ `instance-attribute` ¶


exog_dtypes_in_ = None

exog_dtypes_out_ `instance-attribute` ¶


exog_dtypes_out_ = None

X_train_series_names_in_ `instance-attribute` ¶


X_train_series_names_in_ = None

X_train_window_features_names_out_ `instance-attribute` ¶


X_train_window_features_names_out_ = None

X_train_exog_names_out_ `instance-attribute` ¶


X_train_exog_names_out_ = None

X_train_direct_exog_names_out_ `instance-attribute` ¶


X_train_direct_exog_names_out_ = None

X_train_features_names_out_ `instance-attribute` ¶


X_train_features_names_out_ = None

in_sample_residuals_ `instance-attribute` ¶


in_sample_residuals_ = None

out_sample_residuals_ `instance-attribute` ¶


out_sample_residuals_ = None

in_sample_residuals_by_bin_ `instance-attribute` ¶


in_sample_residuals_by_bin_ = None

out_sample_residuals_by_bin_ `instance-attribute` ¶


out_sample_residuals_by_bin_ = None

creation_date `instance-attribute` ¶


creation_date = strftime('%Y-%m-%d %H:%M:%S')

is_fitted `instance-attribute` ¶


is_fitted = False

fit_date `instance-attribute` ¶


fit_date = None

skforecast_version `instance-attribute` ¶


skforecast_version = __version__

python_version `instance-attribute` ¶


python_version = split(' ')[0]

forecaster_id `instance-attribute` ¶


forecaster_id = forecaster_id

_probabilistic_mode `instance-attribute` ¶


_probabilistic_mode = 'binned'

dropna_from_series `instance-attribute` ¶


dropna_from_series = False

encoding `instance-attribute` ¶


encoding = None

steps `instance-attribute` ¶


steps = arange(steps) + 1

max_step `instance-attribute` ¶


max_step = steps

estimators_ `instance-attribute` ¶


estimators_ = {step: (clone(estimator)) for step in (steps)}

lags `instance-attribute` ¶


lags = {}

lags_names `instance-attribute` ¶


lags_names = {}

max_lag `instance-attribute` ¶


max_lag = (
    max(list_max_lags) if len(list_max_lags) != 0 else None
)

window_size `instance-attribute` ¶


window_size = max(
    [
        ws
        for ws in [max_lag, max_size_window_features]
        if ws is not None
    ]
)

window_features_class_names `instance-attribute` ¶


window_features_class_names = None

fit_kwargs `instance-attribute` ¶


fit_kwargs = check_select_fit_kwargs(
    estimator=estimator, fit_kwargs=fit_kwargs
)

binner `instance-attribute` ¶


binner = {}

binner_intervals_ `instance-attribute` ¶


binner_intervals_ = {}

binner_kwargs `instance-attribute` ¶


binner_kwargs = binner_kwargs

n_jobs `instance-attribute` ¶


n_jobs = select_n_jobs_fit_forecaster(
    forecaster_name=__name__, estimator=estimator
)

Functions¶

_repr_html_ ¶


_repr_html_()

HTML representation of the object. The "General Information" section is expanded by default.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _repr_html_(self) -> str:
    """
    HTML representation of the object.
    The "General Information" section is expanded by default.
    """

    (
        params,
        _,
        series_names_in_,
        exog_names_in_,
        transformer_series,
    ) = self._preprocess_repr(
            estimator          = self.estimator,
            series_names_in_   = self.series_names_in_,
            exog_names_in_     = self.exog_names_in_,
            transformer_series = self.transformer_series,
        )

    style, unique_id = get_style_repr_html(self.is_fitted)

    content = f"""
    <div class="container-{unique_id}">
        <p style="font-size: 1.5em; font-weight: bold; margin-block-start: 0.83em; margin-block-end: 0.83em;">{type(self).__name__}</p>
        <details open>
            <summary>General Information</summary>
            <ul>
                <li><strong>Estimator:</strong> {type(self.estimator).__name__}</li>
                <li><strong>Target series (level):</strong> {self.level}</li>
                <li><strong>Lags:</strong> {self.lags}</li>
                <li><strong>Window features:</strong> {self.window_features_names}</li>
                <li><strong>Window size:</strong> {self.window_size}</li>
                <li><strong>Maximum steps to predict:</strong> {self.max_step}</li>
                <li><strong>Exogenous included:</strong> {self.exog_in_}</li>
                <li><strong>Weight function included:</strong> {self.weight_func is not None}</li>
                <li><strong>Differentiation order:</strong> {self.differentiation}</li>
                <li><strong>Creation date:</strong> {self.creation_date}</li>
                <li><strong>Last fit date:</strong> {self.fit_date}</li>
                <li><strong>Skforecast version:</strong> {self.skforecast_version}</li>
                <li><strong>Python version:</strong> {self.python_version}</li>
                <li><strong>Forecaster id:</strong> {self.forecaster_id}</li>
            </ul>
        </details>
        <details>
            <summary>Exogenous Variables</summary>
            <ul>
                {exog_names_in_}
            </ul>
        </details>
        <details>
            <summary>Data Transformations</summary>
            <ul>
                <li><strong>Transformer for series:</strong> {transformer_series}</li>
                <li><strong>Transformer for exog:</strong> {self.transformer_exog}</li>
            </ul>
        </details>
        <details>
            <summary>Training Information</summary>
            <ul>
                <li><strong>Target series (level):</strong> {self.level}</li>
                <li><strong>Multivariate series:</strong> {series_names_in_}</li>
                <li><strong>Training range:</strong> {self.training_range_.to_list() if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index type:</strong> {str(self.index_type_).split('.')[-1][:-2] if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index frequency:</strong> {self.index_freq_ if self.is_fitted else 'Not fitted'}</li>
            </ul>
        </details>
        <details>
            <summary>Estimator Parameters</summary>
            <ul>
                {params}
            </ul>
        </details>
        <details>
            <summary>Fit Kwargs</summary>
            <ul>
                {self.fit_kwargs}
            </ul>
        </details>
        <p>
            <a href="https://skforecast.org/{__version__}/api/forecasterdirectmultivariate.html">&#128712 <strong>API Reference</strong></a>
            &nbsp;&nbsp;
            <a href="https://skforecast.org/{__version__}/user_guides/dependent-multi-series-multivariate-forecasting.html">&#128462 <strong>User Guide</strong></a>
        </p>
    </div>
    """

    # Return the combined style and content
    return style + content

_create_data_to_return_dict ¶


_create_data_to_return_dict(series_names_in_)

Create data_to_return_dict based on series names and lags configuration. The dictionary contains the information to decide what data to return in the _create_lags method.

Parameters:

Name	Type	Description	Default
`series_names_in_`	`list`	Names of the series used during training.	required

Returns:

Name	Type	Description
`data_to_return_dict`	`dict`	Dictionary with the information to decide what data to return in the `_create_lags` method. Options are 'X', 'y' or 'both'.
`X_train_series_names_in_`	`list`	Names of the series added to `X_train` when creating the training matrices with `_create_train_X_y` method. It is a subset of `series_names_in_`.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _create_data_to_return_dict(
    self, 
    series_names_in_: list[str]
) -> tuple[dict[str, str], list[str]]:
    """
    Create `data_to_return_dict` based on series names and lags configuration.
    The dictionary contains the information to decide what data to return in 
    the `_create_lags` method.

    Parameters
    ----------
    series_names_in_ : list
        Names of the series used during training.

    Returns
    -------
    data_to_return_dict : dict
        Dictionary with the information to decide what data to return in the
        `_create_lags` method. Options are 'X', 'y' or 'both'.
    X_train_series_names_in_ : list
        Names of the series added to `X_train` when creating the training
        matrices with `_create_train_X_y` method. It is a subset of 
        `series_names_in_`.

    """

    if isinstance(self.lags, dict):
        lags_keys = list(self.lags.keys())
        if set(lags_keys) != set(series_names_in_):  # Set to avoid order
            raise ValueError(
                f"When `lags` parameter is a `dict`, its keys must be the "
                f"same as `series` column names. If don't want to include lags, "
                 "add '{column: None}' to the lags dict.\n"
                f"  Lags keys        : {lags_keys}.\n"
                f"  `series` columns : {series_names_in_}."
            )
        self.lags_ = copy(self.lags)
    else:
        self.lags_ = {series: self.lags for series in series_names_in_}
        if self.lags is not None:
            # Defined `lags_names` here to avoid overwriting when fit and then create_train_X_y
            lags_names = [f'lag_{i}' for i in self.lags]
            self.lags_names = {
                series: [f'{series}_{lag}' for lag in lags_names]
                for series in series_names_in_
            }
        else:
            self.lags_names = {series: None for series in series_names_in_}

    X_train_series_names_in_ = series_names_in_
    if self.lags is None:
        data_to_return_dict = {self.level: 'y'}
    else:
        # If col is not level and has lags, create 'X' if no lags don't include
        # If col is level, create 'both' (`X` and `y`)
        data_to_return_dict = {
            col: ('both' if col == self.level else 'X')
            for col in series_names_in_
            if col == self.level or self.lags_.get(col) is not None
        }

        # Adjust 'level' in case self.lags_[level] is None
        if self.lags_.get(self.level) is None:
            data_to_return_dict[self.level] = 'y'

        if self.window_features is None:
            # X_train_series_names_in_ include series that will be added to X_train
            X_train_series_names_in_ = [
                col for col in data_to_return_dict.keys()
                if data_to_return_dict[col] in ['X', 'both']
            ]

    return data_to_return_dict, X_train_series_names_in_

_create_lags ¶


_create_lags(y, lags, data_to_return='both')

Create the lagged values and their target variable from a time series.

Note that the returned matrix X_data contains the lag 1 in the first column, the lag 2 in the in the second column and so on.

The returned matrices are views into the original y so care must be taken when modifying them.

Parameters:

Name	Type	Description	Default
`y`	`numpy ndarray`	Training time series values.	required
`lags`	`numpy ndarray`	lags to create.	required
`data_to_return`	`str`	Specifies which data to return. Options are 'X', 'y', 'both' or None.	`'both'`

Returns:

Name	Type	Description
`X_data`	`numpy ndarray, None`	Lagged values (predictors).
`y_data`	`numpy ndarray, None`	Values of the time series related to each row of `X_data`.

Notes

Returned matrices are views into the original y so care must be taken when modifying them.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _create_lags(
    self, 
    y: np.ndarray,
    lags: np.ndarray,
    data_to_return: str | None = 'both'
) -> tuple[np.ndarray | None, np.ndarray | None]:
    """
    Create the lagged values and their target variable from a time series.

    Note that the returned matrix `X_data` contains the lag 1 in the first 
    column, the lag 2 in the in the second column and so on.

    The returned matrices are views into the original `y` so care must be taken
    when modifying them.

    Parameters
    ----------
    y : numpy ndarray
        Training time series values.
    lags : numpy ndarray
        lags to create.
    data_to_return : str, default 'both'
        Specifies which data to return. Options are 'X', 'y', 'both' or None.

    Returns
    -------
    X_data : numpy ndarray, None
        Lagged values (predictors).
    y_data : numpy ndarray, None
        Values of the time series related to each row of `X_data`.

    Notes
    -----
    Returned matrices are views into the original `y` so care must be taken
    when modifying them.

    """

    X_data = None
    y_data = None
    if data_to_return is not None:
        n_rows = len(y) - self.window_size - (self.max_step - 1)
        windows = np.lib.stride_tricks.sliding_window_view(y, self.window_size + self.max_step)

        if data_to_return != 'y':
            # If `data_to_return` is not 'y', it means is 'X' or 'both', X_data is created
            lag_indices = [self.window_size - lag for lag in lags]
            X_data = windows[:n_rows, lag_indices]

        if data_to_return != 'X':
            # If `data_to_return` is not 'X', it means is 'y' or 'both', y_data is created
            y_data = windows[:n_rows, self.window_size : self.window_size + self.max_step]

    return X_data, y_data

_create_window_features ¶


_create_window_features(y, train_index, X_as_pandas=False)

Create window features from a time series.

Parameters:

Name	Type	Description	Default
`y`	`pandas Series`	Training time series.	required
`train_index`	`pandas Index`	Index of the training data. It is used to create the pandas DataFrame `X_train_window_features` when `X_as_pandas` is `True`.	required
`X_as_pandas`	`bool`	If `True`, the returned matrix `X_train_window_features` is a pandas DataFrame.	`False`

Returns:

Name	Type	Description
`X_train_window_features`	`list`	List of numpy ndarrays or pandas DataFrames with the window features.
`X_train_window_features_names_out_`	`list`	Names of the window features.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _create_window_features(
    self, 
    y: pd.Series,
    train_index: pd.Index,
    X_as_pandas: bool = False,
) -> tuple[list[np.ndarray | pd.DataFrame], list[str]]:
    """
    Create window features from a time series.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    train_index : pandas Index
        Index of the training data. It is used to create the pandas DataFrame
        `X_train_window_features` when `X_as_pandas` is `True`.
    X_as_pandas : bool, default False
        If `True`, the returned matrix `X_train_window_features` is a 
        pandas DataFrame.

    Returns
    -------
    X_train_window_features : list
        List of numpy ndarrays or pandas DataFrames with the window features.
    X_train_window_features_names_out_ : list
        Names of the window features.

    """

    len_train_index = len(train_index)
    X_train_window_features = []
    X_train_window_features_names_out_ = []
    for wf in self.window_features:
        X_train_wf = wf.transform_batch(y)
        if not isinstance(X_train_wf, pd.DataFrame):
            raise TypeError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a pandas DataFrame."
            )
        X_train_wf = X_train_wf.iloc[-len_train_index:]
        if not len(X_train_wf) == len_train_index:
            raise ValueError(
                f"The method `transform_batch` of {type(wf).__name__} "
                f"must return a DataFrame with the same number of rows as "
                f"the input time series - (`window_size` + (`steps` - 1)): {len_train_index}."
            )
        X_train_wf.index = train_index

        X_train_wf.columns = [f'{y.name}_{col}' for col in X_train_wf.columns]
        X_train_window_features_names_out_.extend(X_train_wf.columns)
        if not X_as_pandas:
            X_train_wf = X_train_wf.to_numpy()     
        X_train_window_features.append(X_train_wf)

    return X_train_window_features, X_train_window_features_names_out_

_create_train_X_y ¶


_create_train_X_y(series, exog=None)

Create training matrices from multiple time series and exogenous variables. The resulting matrices contain the target variable and predictors needed to train all the estimators (one per step).

Parameters:

Name	Type	Description	Default
`series`	`pandas DataFrame`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned.	`None`

Returns:

Name	Type	Description
`X_train`	`pandas DataFrame`	Training values (predictors) for each step. Note that the index corresponds to that of the last step. It is updated for the corresponding step in the filter_train_X_y_for_step method.
`y_train`	`dict`	Values of the time series related to each row of `X_train` for each step in the form {step: y_step_[i]}.
`series_names_in_`	`list`	Names of the series used during training.
`X_train_series_names_in_`	`list`	Names of the series added to `X_train` when creating the training matrices with `_create_train_X_y` method. It is a subset of `series_names_in_`.
`exog_names_in_`	`list`	Names of the exogenous variables included in the training matrices.
`X_train_exog_names_out_`	`list`	Names of the exogenous variables included in the matrix `X_train` created internally for training. It can be different from `exog_names_in_` if some exogenous variables are transformed during the training process.
`X_train_features_names_out_`	`list`	Names of the columns of the matrix created internally for training.
`exog_dtypes_in_`	`dict`	Type of each exogenous variable/s used in training before the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_out_`.
`exog_dtypes_out_`	`dict`	Type of each exogenous variable/s used in training after the transformation applied by `transformer_exog`. If `transformer_exog` is not used, it is equal to `exog_dtypes_in_`.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None
) -> tuple[
    pd.DataFrame, 
    dict[int, pd.Series], 
    list[str], 
    list[str], 
    list[str], 
    list[str], 
    list[str], 
    dict[str, type], 
    dict[str, type]
]:
    """
    Create training matrices from multiple time series and exogenous
    variables. The resulting matrices contain the target variable and predictors
    needed to train all the estimators (one per step).

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.

    Returns
    -------
    X_train : pandas DataFrame
        Training values (predictors) for each step. Note that the index 
        corresponds to that of the last step. It is updated for the corresponding 
        step in the filter_train_X_y_for_step method.
    y_train : dict
        Values of the time series related to each row of `X_train` for each 
        step in the form {step: y_step_[i]}.
    series_names_in_ : list
        Names of the series used during training.
    X_train_series_names_in_ : list
        Names of the series added to `X_train` when creating the training
        matrices with `_create_train_X_y` method. It is a subset of 
        `series_names_in_`.
    exog_names_in_ : list
        Names of the exogenous variables included in the training matrices.
    X_train_exog_names_out_ : list
        Names of the exogenous variables included in the matrix `X_train` created
        internally for training. It can be different from `exog_names_in_` if
        some exogenous variables are transformed during the training process.
    X_train_features_names_out_ : list
        Names of the columns of the matrix created internally for training.
    exog_dtypes_in_ : dict
        Type of each exogenous variable/s used in training before the transformation
        applied by `transformer_exog`. If `transformer_exog` is not used, it
        is equal to `exog_dtypes_out_`.
    exog_dtypes_out_ : dict
        Type of each exogenous variable/s used in training after the transformation 
        applied by `transformer_exog`. If `transformer_exog` is not used, it 
        is equal to `exog_dtypes_in_`.

    """

    if not isinstance(series, pd.DataFrame):
        raise TypeError(
            f"`series` must be a pandas DataFrame. Got {type(series)}."
        )

    if len(series) < self.window_size + self.max_step:
        raise ValueError(
            f"Minimum length of `series` for training this forecaster is "
            f"{self.window_size + self.max_step}. Reduce the number of "
            f"predicted steps, {self.max_step}, or the maximum "
            f"window_size, {self.window_size}, if no more data is available.\n"
            f"    Length `series`: {len(series)}.\n"
            f"    Max step : {self.max_step}.\n"
            f"    Max window size: {self.window_size}.\n"
            f"    Lags window size: {self.max_lag}.\n"
            f"    Window features window size: {self.max_size_window_features}."
        )

    _, series_index = check_extract_values_and_index(
        data=series, data_label="`series`", return_values=False
    )
    series_names_in_ = list(series.columns)

    if self.level not in series_names_in_:
        raise ValueError(
            f"One of the `series` columns must be named as the `level` of the forecaster.\n"
            f"  Forecaster `level` : {self.level}.\n"
            f"  `series` columns   : {series_names_in_}."
        )

    data_to_return_dict, X_train_series_names_in_ = (
        self._create_data_to_return_dict(series_names_in_=series_names_in_)
    )

    series_to_create_autoreg_features_and_y = [
        col for col in series_names_in_ 
        if col in X_train_series_names_in_ + [self.level]
    ]

    fit_transformer = False
    if not self.is_fitted:
        fit_transformer = True
        self.transformer_series_ = initialize_transformer_series(
                                       forecaster_name    = type(self).__name__,
                                       series_names_in_   = series_to_create_autoreg_features_and_y,
                                       transformer_series = self.transformer_series
                                   )

    if self.differentiation is None:
        self.differentiator_ = {
            serie: None for serie in series_to_create_autoreg_features_and_y
        }
    else:
        if not self.is_fitted:
            self.differentiator_ = {
                serie: copy(self.differentiator)
                for serie in series_to_create_autoreg_features_and_y
            }

    exog_names_in_ = None
    exog_dtypes_in_ = None
    exog_dtypes_out_ = None
    X_as_pandas = False
    if exog is not None:
        check_exog(exog=exog, allow_nan=True)
        exog = input_to_frame(data=exog, input_name='exog')
        _, exog_index = check_extract_values_and_index(
            data=exog, data_label='`exog`', ignore_freq=True, return_values=False
        )

        series_index_no_ws = series_index[self.window_size:]
        len_series = len(series)
        len_series_no_ws = len_series - self.window_size
        len_exog = len(exog)
        if not len_exog == len_series and not len_exog == len_series_no_ws:
            raise ValueError(
                f"Length of `exog` must be equal to the length of `series` (if "
                f"index is fully aligned) or length of `series` - `window_size` "
                f"(if `exog` starts after the first `window_size` values).\n"
                f"    `exog`                   : ({exog_index[0]} -- {exog_index[-1]})  (n={len_exog})\n"
                f"    `series`                 : ({series_index[0]} -- {series_index[-1]})  (n={len_series})\n"
                f"    `series` - `window_size` : ({series_index_no_ws[0]} -- {series_index_no_ws[-1]})  (n={len_series_no_ws})"
            )

        exog_names_in_ = exog.columns.to_list()
        if len(set(exog_names_in_) - set(series_names_in_)) != len(exog_names_in_):
            raise ValueError(
                f"`exog` cannot contain a column named the same as one of "
                f"the series (column names of series).\n"
                f"  `series` columns : {series_names_in_}.\n"
                f"  `exog`   columns : {exog_names_in_}."
            )

        # NOTE: Need here for filter_train_X_y_for_step to work without fitting
        self.exog_in_ = True
        exog_dtypes_in_ = get_exog_dtypes(exog=exog)

        exog = transform_dataframe(
                   df                = exog,
                   transformer       = self.transformer_exog,
                   fit               = fit_transformer,
                   inverse_transform = False
               )

        check_exog_dtypes(exog, call_check_exog=True)
        exog_dtypes_out_ = get_exog_dtypes(exog=exog)
        X_as_pandas = any(
            not pd.api.types.is_numeric_dtype(dtype) or pd.api.types.is_bool_dtype(dtype) 
            for dtype in set(exog.dtypes)
        )

        if len_exog == len_series:
            if not (exog_index == series_index).all():
                raise ValueError(
                    "When `exog` has the same length as `series`, the index "
                    "of `exog` must be aligned with the index of `series` "
                    "to ensure the correct alignment of values."
                )
            # The first `self.window_size` positions have to be removed from 
            # exog since they are not in X_train.
            exog = exog.iloc[self.window_size:, ]
        else:
            if not (exog_index == series_index_no_ws).all():
                raise ValueError(
                    "When `exog` doesn't contain the first `window_size` "
                    "observations, the index of `exog` must be aligned with "
                    "the index of `series` minus the first `window_size` "
                    "observations to ensure the correct alignment of values."
                )

    X_train_autoreg = []
    X_train_window_features_names_out_ = [] if self.window_features is not None else None
    X_train_features_names_out_ = []
    train_index = series_index[self.window_size + (self.max_step - 1):]
    for col in series_to_create_autoreg_features_and_y:

        y_values = series[col].to_numpy(copy=True).ravel()
        if np.isnan(y_values).any():
            raise ValueError(f"Column '{col}' has missing values.")

        y_values = transform_numpy(
                       array             = y_values,
                       transformer       = self.transformer_series_[col],
                       fit               = fit_transformer,
                       inverse_transform = False
                   )

        if self.differentiation is not None:
            if not self.is_fitted:
                y_values = self.differentiator_[col].fit_transform(y_values)
            else:
                differentiator = copy(self.differentiator_[col])
                y_values = differentiator.fit_transform(y_values)

        X_train_autoreg_col = []

        X_train_lags, y_train_values = self._create_lags(
            y=y_values, lags=self.lags_[col], data_to_return=data_to_return_dict.get(col, None)
        )
        if X_train_lags is not None:
            X_train_autoreg_col.append(X_train_lags)
            X_train_features_names_out_.extend(self.lags_names[col])

        if col == self.level:
            y_train = y_train_values

        if self.window_features is not None:
            n_diff = 0 if self.differentiation is None else self.differentiation
            end_wf = None if self.max_step == 1 else -(self.max_step - 1)
            y_window_features = pd.Series(
                y_values[n_diff:end_wf], index=series_index[n_diff:end_wf], name=col
            )
            X_train_window_features, X_train_wf_names_out_ = (
                self._create_window_features(
                    y=y_window_features, X_as_pandas=False, train_index=train_index
                )
            )
            X_train_autoreg_col.extend(X_train_window_features)
            X_train_window_features_names_out_.extend(X_train_wf_names_out_)
            X_train_features_names_out_.extend(X_train_wf_names_out_)

        if X_train_autoreg_col:
            if len(X_train_autoreg_col) == 1:
                X_train_autoreg_col = X_train_autoreg_col[0]
            else:
                X_train_autoreg_col = np.concatenate(X_train_autoreg_col, axis=1)

            X_train_autoreg.append(X_train_autoreg_col)

    X_train = []
    len_train_index = len(train_index)
    if X_as_pandas:
        if len(X_train_autoreg) == 1:
            X_train_autoreg = X_train_autoreg[0]
        else:
            X_train_autoreg = np.concatenate(X_train_autoreg, axis=1)
        X_train_autoreg = pd.DataFrame(
                              data    = X_train_autoreg,
                              columns = X_train_features_names_out_,
                              index   = train_index
                          )
        X_train.append(X_train_autoreg)
    else:
        X_train.extend(X_train_autoreg)

    # NOTE: Need here for filter_train_X_y_for_step to work without fitting
    self.X_train_window_features_names_out_ = X_train_window_features_names_out_

    X_train_exog_names_out_ = None
    if exog is not None:
        X_train_exog_names_out_ = exog.columns.to_list()
        if X_as_pandas:
            exog_direct, X_train_direct_exog_names_out_ = exog_to_direct(
                exog=exog, steps=self.max_step
            )
            exog_direct.index = train_index
        else:
            exog_direct, X_train_direct_exog_names_out_ = exog_to_direct_numpy(
                exog=exog, steps=self.max_step
            )

        # NOTE: Need here for filter_train_X_y_for_step to work without fitting
        self.X_train_direct_exog_names_out_ = X_train_direct_exog_names_out_

        X_train_features_names_out_.extend(self.X_train_direct_exog_names_out_)
        X_train.append(exog_direct)

    if len(X_train) == 1:
        X_train = X_train[0]
    else:
        if X_as_pandas:
            X_train = pd.concat(X_train, axis=1)
        else:
            X_train = np.concatenate(X_train, axis=1)

    if X_as_pandas:
        X_train.index = train_index
    else:
        X_train = pd.DataFrame(
                      data    = X_train,
                      index   = train_index,
                      columns = X_train_features_names_out_
                  )

    # Optimize: pre-compute indices to avoid repeated slicing
    y_train_dict = {}
    for step in self.steps:
        step_idx_start = self.window_size + step - 1
        step_index = series_index[step_idx_start:step_idx_start + len_train_index]
        y_train_dict[step] = pd.Series(
            data=y_train[:, step - 1],
            index=step_index,
            name=f"{self.level}_step_{step}"
        )
    y_train = y_train_dict

    return (
        X_train,
        y_train,
        series_names_in_,
        X_train_series_names_in_,
        exog_names_in_,
        X_train_exog_names_out_,
        X_train_features_names_out_,
        exog_dtypes_in_,
        exog_dtypes_out_
    )

create_train_X_y ¶


create_train_X_y(
    series, exog=None, suppress_warnings=False
)

Create training matrices from multiple time series and exogenous variables. The resulting matrices contain the target variable and predictors needed to train all the estimators (one per step).

Parameters:

Name	Type	Description	Default
`series`	`pandas DataFrame`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned.	`None`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the creation of the training matrices. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Name	Type	Description
`X_train`	`pandas DataFrame`	Training values (predictors) for each step. Note that the index corresponds to that of the last step. It is updated for the corresponding step in the filter_train_X_y_for_step method.
`y_train`	`dict`	Values of the time series related to each row of `X_train` for each step in the form {step: y_step_[i]}.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def create_train_X_y(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> tuple[pd.DataFrame, dict[int, pd.Series]]:
    """
    Create training matrices from multiple time series and exogenous
    variables. The resulting matrices contain the target variable and predictors
    needed to train all the estimators (one per step).

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the creation
        of the training matrices. See skforecast.exceptions.warn_skforecast_categories 
        for more information.

    Returns
    -------
    X_train : pandas DataFrame
        Training values (predictors) for each step. Note that the index 
        corresponds to that of the last step. It is updated for the corresponding 
        step in the filter_train_X_y_for_step method.
    y_train : dict
        Values of the time series related to each row of `X_train` for each 
        step in the form {step: y_step_[i]}.

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    output = self._create_train_X_y(
                 series = series, 
                 exog   = exog
             )

    X_train = output[0]
    y_train = output[1]

    set_skforecast_warnings(suppress_warnings, action='default')

    return X_train, y_train

filter_train_X_y_for_step ¶


filter_train_X_y_for_step(
    step, X_train, y_train, remove_suffix=False
)

Select the columns needed to train a forecaster for a specific step.
The input matrices should be created using _create_train_X_y method. This method updates the index of X_train to the corresponding one according to y_train. If remove_suffix=True the suffix "_step_i" will be removed from the column names.

Parameters:

Name	Type	Description	Default
`step`	`int`	step for which columns must be selected. Starts at 1.	required
`X_train`	`pandas DataFrame`	Dataframe created with the `_create_train_X_y` method, first return.	required
`y_train`	`dict`	Dict created with the `_create_train_X_y` method, second return.	required
`remove_suffix`	`bool`	If True, suffix "_step_i" is removed from the column names.	`False`

Returns:

Name	Type	Description
`X_train_step`	`pandas DataFrame`	Training values (predictors) for the selected step.
`y_train_step`	`pandas Series`	Values of the time series related to each row of `X_train`.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def filter_train_X_y_for_step(
    self,
    step: int,
    X_train: pd.DataFrame,
    y_train: dict[int, pd.Series],
    remove_suffix: bool = False
) -> tuple[pd.DataFrame, pd.Series]:
    """
    Select the columns needed to train a forecaster for a specific step.  
    The input matrices should be created using `_create_train_X_y` method. 
    This method updates the index of `X_train` to the corresponding one 
    according to `y_train`. If `remove_suffix=True` the suffix "_step_i" 
    will be removed from the column names. 

    Parameters
    ----------
    step : int
        step for which columns must be selected. Starts at 1.
    X_train : pandas DataFrame
        Dataframe created with the `_create_train_X_y` method, first return.
    y_train : dict
        Dict created with the `_create_train_X_y` method, second return.
    remove_suffix : bool, default False
        If True, suffix "_step_i" is removed from the column names.

    Returns
    -------
    X_train_step : pandas DataFrame
        Training values (predictors) for the selected step.
    y_train_step : pandas Series
        Values of the time series related to each row of `X_train`.

    """

    if (step < 1) or (step > self.max_step):
        raise ValueError(
            f"Invalid value `step`. For this forecaster, minimum value is 1 "
            f"and the maximum step is {self.max_step}."
        )

    y_train_step = y_train[step]

    # Matrix X_train starts at index 0.
    if not self.exog_in_:
        X_train_step = X_train
    else:
        n_lags = len(list(
            chain(*[v for v in self.lags_.values() if v is not None])
        ))
        n_window_features = (
            len(self.X_train_window_features_names_out_) if self.window_features is not None else 0
        )
        idx_columns_autoreg = np.arange(n_lags + n_window_features)
        n_exog = len(self.X_train_direct_exog_names_out_) / self.max_step
        idx_columns_exog = (
            np.arange((step - 1) * n_exog, (step) * n_exog) + idx_columns_autoreg[-1] + 1 
        )
        idx_columns = np.concatenate((idx_columns_autoreg, idx_columns_exog))
        X_train_step = X_train.iloc[:, idx_columns]

    X_train_step.index = y_train_step.index

    if remove_suffix:
        X_train_step.columns = [
            col_name.replace(f"_step_{step}", "")
            for col_name in X_train_step.columns
        ]
        y_train_step.name = y_train_step.name.replace(f"_step_{step}", "")

    return X_train_step, y_train_step

_train_test_split_one_step_ahead ¶


_train_test_split_one_step_ahead(
    series, initial_train_size, exog=None
)

Create matrices needed to train and test the forecaster for one-step-ahead predictions.

Parameters:

Name	Type	Description	Default
`series`	`pandas DataFrame`	Training time series.	required
`initial_train_size`	`int`	Initial size of the training set. It is the number of observations used to train the forecaster before making the first prediction.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned.	`None`

Returns:

Name	Type	Description
`X_train`	`pandas DataFrame`	Predictor values used to train the model.
`y_train`	`dict`	Values of the time series related to each row of `X_train` for each step in the form {step: y_step_[i]}.
`X_test`	`pandas DataFrame`	Predictor values used to test the model.
`y_test`	`dict`	Values of the time series related to each row of `X_test` for each step in the form {step: y_step_[i]}.
`X_train_encoding`	`pandas Series`	Series identifiers for each row of `X_train`.
`X_test_encoding`	`pandas Series`	Series identifiers for each row of `X_test`.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _train_test_split_one_step_ahead(
    self,
    series: pd.DataFrame,
    initial_train_size: int,
    exog: pd.Series | pd.DataFrame | None = None
) -> tuple[
    pd.DataFrame, 
    dict[int, pd.Series], 
    pd.DataFrame, 
    dict[int, pd.Series], 
    pd.Series, 
    pd.Series
]:
    """
    Create matrices needed to train and test the forecaster for one-step-ahead
    predictions.

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    initial_train_size : int
        Initial size of the training set. It is the number of observations used
        to train the forecaster before making the first prediction.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned.

    Returns
    -------
    X_train : pandas DataFrame
        Predictor values used to train the model.
    y_train : dict
        Values of the time series related to each row of `X_train` for each 
        step in the form {step: y_step_[i]}.
    X_test : pandas DataFrame
        Predictor values used to test the model.
    y_test : dict
        Values of the time series related to each row of `X_test` for each 
        step in the form {step: y_step_[i]}.
    X_train_encoding : pandas Series
        Series identifiers for each row of `X_train`.
    X_test_encoding : pandas Series
        Series identifiers for each row of `X_test`.

    """

    span_index = series.index

    fold = [
        0,
        [0, initial_train_size],
        [initial_train_size - self.window_size, initial_train_size],
        [initial_train_size - self.window_size, len(span_index)],
        [0, 0],  # Dummy value
        True
    ]
    data_fold = _extract_data_folds_multiseries(
                    series             = series,
                    folds              = [fold],
                    span_index         = span_index,
                    window_size        = self.window_size,
                    exog               = exog,
                    dropna_last_window = self.dropna_from_series,
                    externally_fitted  = False
                )
    series_train, _, levels_last_window, exog_train, exog_test, _ = next(data_fold)

    start_test_idx = initial_train_size - self.window_size
    series_test = series.iloc[start_test_idx:, :]
    series_test = series_test.loc[:, levels_last_window]
    series_test = series_test.dropna(axis=1, how='all')

    _is_fitted = self.is_fitted
    _series_names_in_ = self.series_names_in_
    _exog_names_in_ = self.exog_names_in_

    self.is_fitted = False
    X_train, y_train, series_names_in_, _, exog_names_in_, *_ = (
        self._create_train_X_y(
            series = series_train,
            exog   = exog_train,
        )
    )
    self.series_names_in_ = series_names_in_
    if exog is not None:
        self.exog_names_in_ = exog_names_in_
    self.is_fitted = True

    X_test, y_test, *_ = self._create_train_X_y(
                             series = series_test,
                             exog   = exog_test,
                         )
    self.is_fitted = _is_fitted
    self.series_names_in_ = _series_names_in_
    self.exog_names_in_ = _exog_names_in_

    X_train_encoding = pd.Series(self.level, index=X_train.index)
    X_test_encoding = pd.Series(self.level, index=X_test.index)

    return X_train, y_train, X_test, y_test, X_train_encoding, X_test_encoding

create_sample_weights ¶


create_sample_weights(X_train)

Create weights for each observation according to the forecaster's attribute weight_func.

Parameters:

Name	Type	Description	Default
`X_train`	`pandas DataFrame`	Dataframe created with `_create_train_X_y` and filter_train_X_y_for_step` methods, first return.	required

Returns:

Name	Type	Description
`sample_weight`	`numpy ndarray`	Weights to use in `fit` method.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def create_sample_weights(
    self,
    X_train: pd.DataFrame
) -> np.ndarray:
    """
    Create weights for each observation according to the forecaster's attribute
    `weight_func`. 

    Parameters
    ----------
    X_train : pandas DataFrame
        Dataframe created with `_create_train_X_y` and filter_train_X_y_for_step`
        methods, first return.

    Returns
    -------
    sample_weight : numpy ndarray
        Weights to use in `fit` method.

    """

    sample_weight = None

    if self.weight_func is not None:
        sample_weight = self.weight_func(X_train.index)

    if sample_weight is not None:
        if np.isnan(sample_weight).any():
            raise ValueError(
                "The resulting `sample_weight` cannot have NaN values."
            )
        if np.any(sample_weight < 0):
            raise ValueError(
                "The resulting `sample_weight` cannot have negative values."
            )
        if np.sum(sample_weight) == 0:
            raise ValueError(
                ("The resulting `sample_weight` cannot be normalized because "
                 "the sum of the weights is zero.")
            )

    return sample_weight

fit ¶


fit(
    series,
    exog=None,
    store_last_window=True,
    store_in_sample_residuals=False,
    random_state=123,
    suppress_warnings=False,
)

Training Forecaster.

Additional arguments to be passed to the fit method of the estimator can be added with the fit_kwargs argument when initializing the forecaster.

Parameters:

Name	Type	Description	Default
`series`	`pandas DataFrame`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `series` and their indexes must be aligned so that series[i] is regressed on exog[i].	`None`
`store_last_window`	`bool`	Whether or not to store the last window (`last_window_`) of training data.	`True`
`store_in_sample_residuals`	`bool`	If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored.	`False`
`random_state`	`int`	Set a seed for the random generator so that the stored sample residuals are always deterministic.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the training process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def fit(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    store_last_window: bool = True,
    store_in_sample_residuals: bool = False,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> None:
    """
    Training Forecaster.

    Additional arguments to be passed to the `fit` method of the estimator 
    can be added with the `fit_kwargs` argument when initializing the forecaster.

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `series` and their indexes must be aligned so
        that series[i] is regressed on exog[i].
    store_last_window : bool, default True
        Whether or not to store the last window (`last_window_`) of training data.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the training 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    None

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    # Reset values in case the forecaster has already been fitted.
    self.lags_                              = None
    self.last_window_                       = None
    self.index_type_                        = None
    self.index_freq_                        = None
    self.training_range_                    = None
    self.series_names_in_                   = None
    self.exog_in_                           = False
    self.exog_names_in_                     = None
    self.exog_type_in_                      = None
    self.exog_dtypes_in_                    = None
    self.exog_dtypes_out_                   = None
    self.X_train_series_names_in_           = None
    self.X_train_window_features_names_out_ = None
    self.X_train_exog_names_out_            = None
    self.X_train_direct_exog_names_out_     = None
    self.X_train_features_names_out_        = None
    self.in_sample_residuals_               = None
    self.in_sample_residuals_by_bin_        = None
    self.binner                             = {}
    self.binner_intervals_                  = {}
    self.is_fitted                          = False
    self.fit_date                           = None

    (
        X_train,
        y_train,
        series_names_in_,
        X_train_series_names_in_,
        exog_names_in_,
        X_train_exog_names_out_,
        X_train_features_names_out_,
        exog_dtypes_in_,
        exog_dtypes_out_
    ) = self._create_train_X_y(series=series, exog=exog)

    def fit_forecaster(estimator, X_train, y_train, step):
        """
        Auxiliary function to fit each of the forecaster's estimators in parallel.

        Parameters
        ----------
        estimator : object
            Estimator to be fitted.
        X_train : pandas DataFrame
            Dataframe created with the `_create_train_X_y` method, first return.
        y_train : dict
            Dict created with the `_create_train_X_y` method, second return.
        step : int
            Step of the forecaster to be fitted.

        Returns
        -------
        Tuple with the step, fitted estimator, in-sample residuals, true values
        and predicted values for the step.

        """

        X_train_step, y_train_step = self.filter_train_X_y_for_step(
                                         step          = step,
                                         X_train       = X_train,
                                         y_train       = y_train,
                                         remove_suffix = True
                                     )
        sample_weight = self.create_sample_weights(X_train=X_train_step)
        if sample_weight is not None:
            estimator.fit(
                X             = X_train_step,
                y             = y_train_step,
                sample_weight = sample_weight,
                **self.fit_kwargs
            )
        else:
            estimator.fit(
                X = X_train_step,
                y = y_train_step,
                **self.fit_kwargs
            )

        # NOTE: This is done to save time during fit in functions such as backtesting()
        y_true_step = None
        y_pred_step = None
        if self._probabilistic_mode is not False:
            y_true_step = y_train_step.to_numpy()
            y_pred_step = estimator.predict(X_train_step)

        return step, estimator, y_true_step, y_pred_step

    results_fit = (
        Parallel(n_jobs=self.n_jobs)
        (delayed(fit_forecaster)
        (
            estimator = copy(self.estimator),
            X_train   = X_train,
            y_train   = y_train,
            step      = step
        )
        for step in self.steps)
    )

    self.estimators_ = {step: estimator for step, estimator, *_ in results_fit}

    self.in_sample_residuals_ = {}
    self.in_sample_residuals_by_bin_ = {}
    if self._probabilistic_mode is not False:
        for level in [self.level]:
            y_true_level, y_pred_level = zip(
                *[(y_true, y_pred) for *_, y_true, y_pred in results_fit]
            )
            self._binning_in_sample_residuals(
                level                     = level,
                y_true                    = np.concatenate(y_true_level),
                y_pred                    = np.concatenate(y_pred_level),
                store_in_sample_residuals = store_in_sample_residuals,
                random_state              = random_state
            )

    if not store_in_sample_residuals:
        for level in [self.level]:
            self.in_sample_residuals_[level] = None
            self.in_sample_residuals_by_bin_[level] = None

    self.series_names_in_ = series_names_in_
    self.X_train_series_names_in_ = X_train_series_names_in_
    self.X_train_features_names_out_ = X_train_features_names_out_

    self.is_fitted = True
    self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.training_range_ = series.index[[0, -1]]
    self.index_type_ = type(series.index)
    if isinstance(series.index, pd.DatetimeIndex):
        self.index_freq_ = series.index.freq
    else: 
        self.index_freq_ = series.index.step

    if exog is not None:
        self.exog_in_ = True
        self.exog_names_in_ = exog_names_in_
        self.exog_type_in_ = type(exog)
        self.exog_dtypes_in_ = exog_dtypes_in_
        self.exog_dtypes_out_ = exog_dtypes_out_
        self.X_train_exog_names_out_ = X_train_exog_names_out_

    if store_last_window:
        self.last_window_ = series.iloc[-self.window_size:, ][
            self.X_train_series_names_in_
        ].copy()

    set_skforecast_warnings(suppress_warnings, action='default')

_binning_in_sample_residuals ¶


_binning_in_sample_residuals(
    level,
    y_true,
    y_pred,
    store_in_sample_residuals=False,
    random_state=123,
)

Bin residuals according to the predicted value each residual is associated with. First a skforecast.preprocessing.QuantileBinner object is fitted to the predicted values. Then, residuals are binned according to the predicted value each residual is associated with. Residuals are stored in the forecaster object as in_sample_residuals_ and in_sample_residuals_by_bin_.

y_true and y_pred assumed to be differentiated and/or transformed according to the attributes differentiation and transformer_series. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_. The total number of residuals stored is 10_000. New in version 0.15.0

Parameters:

Name	Type	Description	Default
`level`	`str`	Name of the series (level) to store the residuals.	required
`y_true`	`numpy ndarray`	True values of the time series.	required
`y_pred`	`numpy ndarray`	Predicted values of the time series.	required
`store_in_sample_residuals`	`bool`	If `True`, in-sample residuals will be stored in the forecaster object after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_` attributes). If `False`, only the intervals of the bins are stored.	`False`
`random_state`	`int`	Set a seed for the random generator so that the stored sample residuals are always deterministic.	`123`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _binning_in_sample_residuals(
    self,
    level: str,
    y_true: np.ndarray,
    y_pred: np.ndarray,
    store_in_sample_residuals: bool = False,
    random_state: int = 123
) -> None:
    """
    Bin residuals according to the predicted value each residual is
    associated with. First a `skforecast.preprocessing.QuantileBinner` object
    is fitted to the predicted values. Then, residuals are binned according
    to the predicted value each residual is associated with. Residuals are
    stored in the forecaster object as `in_sample_residuals_` and
    `in_sample_residuals_by_bin_`.

    `y_true` and `y_pred` assumed to be differentiated and/or transformed
    according to the attributes `differentiation` and `transformer_series`.
    The number of residuals stored per bin is limited to 
    `10_000 // self.binner.n_bins_`. The total number of residuals stored is
    `10_000`.
    **New in version 0.15.0**

    Parameters
    ----------
    level : str
        Name of the series (level) to store the residuals.
    y_true : numpy ndarray
        True values of the time series.
    y_pred : numpy ndarray
        Predicted values of the time series.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.

    Returns
    -------
    None

    """

    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    residuals = y_true - y_pred

    if self._probabilistic_mode == "binned":
        data = pd.DataFrame({'prediction': y_pred, 'residuals': residuals})
        self.binner[level] = QuantileBinner(**self.binner_kwargs)
        self.binner[level].fit(y_pred)
        self.binner_intervals_[level] = self.binner[level].intervals_

    if store_in_sample_residuals:
        rng = np.random.default_rng(seed=random_state)
        if self._probabilistic_mode == "binned":
            data['bin'] = self.binner[level].transform(y_pred).astype(int)
            self.in_sample_residuals_by_bin_[level] = (
                data.groupby('bin')['residuals'].apply(np.array).to_dict()
            )

            max_sample = 10_000 // self.binner[level].n_bins_
            for k, v in self.in_sample_residuals_by_bin_[level].items():
                if len(v) > max_sample:
                    sample = v[rng.integers(low=0, high=len(v), size=max_sample)]
                    self.in_sample_residuals_by_bin_[level][k] = sample
        else:
            self.in_sample_residuals_by_bin_[level] = None

        if len(residuals) > 10_000:
            residuals = residuals[
                rng.integers(low=0, high=len(residuals), size=10_000)
            ]
        self.in_sample_residuals_[level] = residuals

_create_predict_inputs ¶


_create_predict_inputs(
    steps=None,
    last_window=None,
    exog=None,
    predict_probabilistic=False,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    check_inputs=True,
)

Create the inputs needed for the prediction process.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas Series, pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`predict_probabilistic`	`bool`	If `True`, the necessary checks for probabilistic predictions will be performed.	`False`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed.	`True`

Returns:

Name	Type	Description
`Xs`	`list`	List of numpy arrays with the predictors for each step.
`Xs_col_names`	`list`	Names of the columns of the matrix created internally for prediction.
`steps`	`list`	Steps to predict.
`prediction_index`	`pandas Index`	Index of the predictions.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _create_predict_inputs(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    predict_probabilistic: bool = False,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    check_inputs: bool = True
) -> tuple[list[np.ndarray], list[str], list[int], pd.Index]:
    """
    Create the inputs needed for the prediction process.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas Series, pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    predict_probabilistic : bool, default False
        If `True`, the necessary checks for probabilistic predictions will be 
        performed.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.

    Returns
    -------
    Xs : list
        List of numpy arrays with the predictors for each step.
    Xs_col_names : list
        Names of the columns of the matrix created internally for prediction.
    steps : list
        Steps to predict.
    prediction_index : pandas Index
        Index of the predictions.

    """

    steps = prepare_steps_direct(
                steps    = steps,
                max_step = self.max_step
            )

    if last_window is None:
        last_window = self.last_window_

    if check_inputs:
        check_predict_input(
            forecaster_name  = type(self).__name__,
            steps            = steps,
            is_fitted        = self.is_fitted,
            exog_in_         = self.exog_in_,
            index_type_      = self.index_type_,
            index_freq_      = self.index_freq_,
            window_size      = self.window_size,
            last_window      = last_window,
            exog             = exog,
            exog_names_in_   = self.exog_names_in_,
            interval         = None,
            max_step         = self.max_step,
            series_names_in_ = self.X_train_series_names_in_
        )

        if predict_probabilistic:
            check_residuals_input(
                forecaster_name              = type(self).__name__,
                use_in_sample_residuals      = use_in_sample_residuals,
                in_sample_residuals_         = self.in_sample_residuals_,
                out_sample_residuals_        = self.out_sample_residuals_,
                use_binned_residuals         = use_binned_residuals,
                in_sample_residuals_by_bin_  = self.in_sample_residuals_by_bin_,
                out_sample_residuals_by_bin_ = self.out_sample_residuals_by_bin_,
                levels                       = [self.level],
            )

    last_window = last_window.iloc[
        -self.window_size:, last_window.columns.get_indexer(self.X_train_series_names_in_)
    ].copy()

    X_autoreg = []
    Xs_col_names = []
    for series in self.X_train_series_names_in_:
        last_window_series = transform_numpy(
                                 array             = last_window[series].to_numpy(),
                                 transformer       = self.transformer_series_[series],
                                 fit               = False,
                                 inverse_transform = False
                             )

        if self.differentiation is not None:
            last_window_series = self.differentiator_[series].fit_transform(last_window_series)

        if self.lags is not None:
            X_lags = last_window_series[-self.lags_[series]]
            X_autoreg.append(X_lags)
            Xs_col_names.extend(self.lags_names[series])

        if self.window_features is not None:
            n_diff = 0 if self.differentiation is None else self.differentiation
            X_window_features = np.concatenate(
                [
                    wf.transform(last_window_series[n_diff:]) 
                    for wf in self.window_features
                ]
            )
            X_autoreg.append(X_window_features)
            # HACK: This is not the best way to do it. Can have any problem
            # if the window_features are not in the same order as the
            # self.window_features_names.
            Xs_col_names.extend([f"{series}_{wf}" for wf in self.window_features_names])

    X_autoreg = np.concatenate(X_autoreg).reshape(1, -1)
    if exog is not None:

        exog = input_to_frame(data=exog, input_name='exog')
        if exog.columns.tolist() != self.exog_names_in_:
            exog = exog[self.exog_names_in_]

        exog = transform_dataframe(
                   df                = exog,
                   transformer       = self.transformer_exog,
                   fit               = False,
                   inverse_transform = False
               )

        # NOTE: Only check dtypes if they are not the same as seen in training
        if not exog.dtypes.to_dict() == self.exog_dtypes_out_:
            check_exog_dtypes(exog=exog)
        else:
            check_exog(exog=exog, allow_nan=False)

        exog_values, _ = exog_to_direct_numpy(
                             exog  = exog.to_numpy()[:max(steps)],
                             steps = max(steps)
                         )
        exog_values = exog_values[0]

        n_exog = exog.shape[1]
        Xs = [
            np.concatenate(
                [
                    X_autoreg, 
                    exog_values[(step - 1) * n_exog : step * n_exog].reshape(1, -1)
                ],
                axis=1
            )
            for step in steps
        ]
        # HACK: This is not the best way to do it. Can have any problem
        # if the exog_columns are not in the same order as the
        # self.window_features_names.
        Xs_col_names = Xs_col_names + exog.columns.to_list()
    else:
        Xs = [X_autoreg] * len(steps)

    prediction_index = expand_index(
                           index = last_window.index,
                           steps = max(steps)
                       )[np.array(steps) - 1]
    if isinstance(last_window.index, pd.DatetimeIndex) and np.array_equal(
        steps, np.arange(min(steps), max(steps) + 1)
    ):
        prediction_index.freq = last_window.index.freq

    # HACK: Why no use self.X_train_features_names_out_ as Xs_col_names?
    return Xs, Xs_col_names, steps, prediction_index

create_predict_X ¶


create_predict_X(
    steps=None,
    last_window=None,
    exog=None,
    suppress_warnings=False,
    check_inputs=True,
    levels=None,
)

Create the predictors needed to predict steps ahead.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed.	`True`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`X_predict`	`pandas DataFrame`	Pandas DataFrame with the predictors for each step. The index is the same as the prediction index.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def create_predict_X(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False,
    check_inputs: bool = True,
    levels: Any = None
) -> pd.DataFrame:
    """
    Create the predictors needed to predict `steps` ahead.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    X_predict : pandas DataFrame
        Pandas DataFrame with the predictors for each step. The index 
        is the same as the prediction index.

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    (
        Xs,
        Xs_col_names,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
            steps        = steps,
            last_window  = last_window,
            exog         = exog,
            check_inputs = check_inputs
        )

    X_predict = pd.DataFrame(
                    data    = np.concatenate(Xs, axis=0), 
                    columns = Xs_col_names, 
                    index   = prediction_index
                )
    X_predict.insert(0, 'level', np.tile([self.level], len(steps)))

    if self.exog_in_:
        categorical_features = any(
            not pd.api.types.is_numeric_dtype(dtype) or pd.api.types.is_bool_dtype(dtype) 
            for dtype in set(self.exog_dtypes_out_.values())
        )
        if categorical_features:
            X_predict = X_predict.astype(self.exog_dtypes_out_)

    if self.transformer_series is not None or self.differentiation is not None:
        warnings.warn(
            "The output matrix is in the transformed scale due to the "
            "inclusion of transformations or differentiation in the Forecaster. "
            "As a result, any predictions generated using this matrix will also "
            "be in the transformed scale. Please refer to the documentation "
            "for more details: "
            "https://skforecast.org/latest/user_guides/training-and-prediction-matrices.html",
            DataTransformationWarning
        )

    set_skforecast_warnings(suppress_warnings, action='default')

    return X_predict

_direct_predict ¶


_direct_predict(steps, Xs)

Generate predictions for the specified steps using the fitted estimators.

This method optimizes prediction for common estimator types: - LinearModel: Uses direct dot product with coefficients - LGBMRegressor: Uses booster_.predict for faster inference - XGBRegressor: Uses get_booster().inplace_predict for faster inference - Other estimators: Uses standard predict method

Parameters:

Name	Type	Description	Default
`steps`	`list[int]`	List of steps to predict. Each step corresponds to an estimator in `self.estimators_`.	required
`Xs`	`list[ndarray]`	List of numpy arrays with the predictors for each step. Each array has shape (1, n_features).	required

Returns:

Name	Type	Description
`predictions`	`numpy ndarray`	Predicted values for each step. Shape: (len(steps),)

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _direct_predict(
    self,
    steps: list[int],
    Xs: list[np.ndarray]
) -> np.ndarray:
    """
    Generate predictions for the specified steps using the fitted estimators.

    This method optimizes prediction for common estimator types:
    - LinearModel: Uses direct dot product with coefficients
    - LGBMRegressor: Uses booster_.predict for faster inference
    - XGBRegressor: Uses get_booster().inplace_predict for faster inference
    - Other estimators: Uses standard predict method

    Parameters
    ----------
    steps : list[int]
        List of steps to predict. Each step corresponds to an estimator
        in `self.estimators_`.
    Xs : list[np.ndarray]
        List of numpy arrays with the predictors for each step.
        Each array has shape (1, n_features).

    Returns
    -------
    predictions : numpy ndarray
        Predicted values for each step. Shape: (len(steps),)

    """

    estimators = [self.estimators_[step] for step in steps]

    estimator_name = type(self.estimator).__name__
    is_linear = isinstance(self.estimator, LinearModel)
    is_lightgbm = estimator_name == 'LGBMRegressor'
    is_xgboost = estimator_name == 'XGBRegressor'

    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", 
            message="X does not have valid feature names", 
            category=UserWarning
        )
        if is_linear:
            predictions = np.array([
                np.dot(X.ravel(), estimator.coef_) + estimator.intercept_
                for estimator, X in zip(estimators, Xs)
            ])
        elif is_lightgbm:
            predictions = np.array([
                estimator.booster_.predict(X).item()
                for estimator, X in zip(estimators, Xs)
            ])
        elif is_xgboost:
            predictions = np.array([
                estimator.get_booster().inplace_predict(X).item()
                for estimator, X in zip(estimators, Xs)
            ])
        else:
            predictions = np.array([
                estimator.predict(X).ravel().item()
                for estimator, X in zip(estimators, Xs)
            ])

    return predictions

predict ¶


predict(
    steps=None,
    last_window=None,
    exog=None,
    suppress_warnings=False,
    check_inputs=True,
    levels=None,
)

Predict n steps ahead

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in `self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`check_inputs`	`bool`	If `True`, the input is checked for possible warnings and errors with the `check_predict_input` function. This argument is created for internal use and is not recommended to be changed.	`True`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Long-format DataFrame with the predictions. The columns are `level` and `pred`.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def predict(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False,
    check_inputs: bool = True,
    levels: Any = None
) -> pd.DataFrame:
    """
    Predict n steps ahead

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in `self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the predictions. The columns are `level`
        and `pred`.

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    (
        Xs,
        _,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
            steps        = steps,
            last_window  = last_window,
            exog         = exog,
            check_inputs = check_inputs,
        )

    predictions = self._direct_predict(steps=steps, Xs=Xs)

    if self.differentiation is not None:
        predictions = (
            self.differentiator_[self.level]
            .inverse_transform_next_window(predictions)
        )

    predictions = transform_numpy(
                      array             = predictions,
                      transformer       = self.transformer_series_[self.level],
                      fit               = False,
                      inverse_transform = True
                  )

    # TODO: This DataFrame has freq because it only contain 1 level
    # TODO: Adapt to multiple levels
    # n_steps, n_levels = predictions.shape
    # predictions = pd.DataFrame(
    #     {"level": np.tile(levels, n_steps), "pred": predictions.ravel()},
    #     index = np.repeat(prediction_index, n_levels),
    # )
    predictions = pd.DataFrame(
        {"level": np.tile([self.level], len(steps)), "pred": predictions},
        index = prediction_index,
    )

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions

predict_bootstrapping ¶


predict_bootstrapping(
    steps=None,
    last_window=None,
    exog=None,
    n_boot=250,
    random_state=123,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    suppress_warnings=False,
    levels=None,
)

Generate multiple forecasting predictions using a bootstrapping process. By sampling from a collection of past observed errors (the residuals), each iteration of bootstrapping generates a different set of predictions. See the References section for more information.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in`self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating prediction intervals.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`boot_predictions`	`pandas DataFrame`	Long-format DataFrame with the bootstrapping predictions. The columns are `level`, `pred_boot_0`, `pred_boot_1`, ..., `pred_boot_n_boot`.

References

.. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def predict_bootstrapping(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    random_state: int = 123,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Generate multiple forecasting predictions using a bootstrapping process. 
    By sampling from a collection of past observed errors (the residuals),
    each iteration of bootstrapping generates a different set of predictions. 
    See the References section for more information. 

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in` self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.     
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.            
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.   
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    boot_predictions : pandas DataFrame
        Long-format DataFrame with the bootstrapping predictions. The columns
        are `level`, `pred_boot_0`, `pred_boot_1`, ..., `pred_boot_n_boot`.

    References
    ----------
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    (
        Xs,
        _,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
            steps                   = steps, 
            last_window             = last_window, 
            exog                    = exog,
            predict_probabilistic   = True, 
            use_in_sample_residuals = use_in_sample_residuals,
            use_binned_residuals    = use_binned_residuals
        )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_[self.level]
        residuals_by_bin = self.in_sample_residuals_by_bin_[self.level]
    else:
        residuals = self.out_sample_residuals_[self.level]
        residuals_by_bin = self.out_sample_residuals_by_bin_[self.level]

    # NOTE: Predictors and residuals are transformed and differentiated
    predictions = self._direct_predict(steps=steps, Xs=Xs)

    rng = np.random.default_rng(seed=random_state)
    if not use_binned_residuals:
        sampled_residuals = residuals[
            rng.integers(low=0, high=residuals.size, size=(len(steps), n_boot))
        ]
    else:
        predicted_bins = self.binner[self.level].transform(predictions)
        sampled_residuals = np.full(
                                shape      = (predicted_bins.size, n_boot),
                                fill_value = np.nan,
                                order      = 'C',
                                dtype      = float
                            )
        for i, bin in enumerate(predicted_bins):
            sampled_residuals[i, :] = residuals_by_bin[bin][
                rng.integers(low=0, high=residuals_by_bin[bin].size, size=n_boot)
            ]

    boot_predictions = np.tile(predictions, (n_boot, 1)).T
    boot_columns = [f"pred_boot_{i}" for i in range(n_boot)]
    boot_predictions = boot_predictions + sampled_residuals

    if self.differentiation is not None:
        boot_predictions = (
            self.differentiator_[self.level]
            .inverse_transform_next_window(boot_predictions)
        )

    if self.transformer_series_[self.level]:
        boot_predictions = transform_numpy(
                               array             = boot_predictions,
                               transformer       = self.transformer_series_[self.level],
                               fit               = False,
                               inverse_transform = True
                           )

    # TODO: This DataFrame has freq because it only contain 1 level
    # TODO: Adapt to multiple levels
    boot_predictions = pd.DataFrame(
                           data    = boot_predictions,
                           index   = prediction_index,
                           columns = boot_columns
                       )
    boot_predictions.insert(0, 'level', np.tile([self.level], len(steps)))

    set_skforecast_warnings(suppress_warnings, action='default')

    return boot_predictions

_predict_interval_conformal ¶


_predict_interval_conformal(
    steps=None,
    last_window=None,
    exog=None,
    nominal_coverage=0.95,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
)

Generate prediction intervals using the conformal prediction split method [1]_.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in`self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`nominal_coverage`	`float`	Nominal coverage, also known as expected coverage, of the prediction intervals. Must be between 0 and 1.	`0.95`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Values predicted by the forecaster and their estimated interval. pred: predictions. lower_bound: lower bound of the interval. upper_bound: upper bound of the interval.

References

.. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def _predict_interval_conformal(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    nominal_coverage: float = 0.95,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True
) -> pd.DataFrame:
    """
    Generate prediction intervals using the conformal prediction 
    split method [1]_.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in` self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    nominal_coverage : float, default 0.95
        Nominal coverage, also known as expected coverage, of the prediction
        intervals. Must be between 0 and 1.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    References
    ----------
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    (
        Xs,
        _,
        steps,
        prediction_index
    ) = self._create_predict_inputs(
            steps                   = steps, 
            last_window             = last_window, 
            exog                    = exog,
            predict_probabilistic   = True, 
            use_in_sample_residuals = use_in_sample_residuals,
            use_binned_residuals    = use_binned_residuals
        )

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_[self.level]
        residuals_by_bin = self.in_sample_residuals_by_bin_[self.level]
    else:
        residuals = self.out_sample_residuals_[self.level]
        residuals_by_bin = self.out_sample_residuals_by_bin_[self.level]

    # NOTE: Predictors and residuals are transformed and differentiated
    predictions = self._direct_predict(steps=steps, Xs=Xs)

    if use_binned_residuals:
        correction_factor_by_bin = {
            k: np.quantile(np.abs(v), nominal_coverage)
            for k, v in residuals_by_bin.items()
        }
        replace_func = np.vectorize(lambda x: correction_factor_by_bin[x])
        predictions_bin = self.binner[self.level].transform(predictions)
        correction_factor = replace_func(predictions_bin)
    else:
        correction_factor = np.quantile(np.abs(residuals), nominal_coverage)

    lower_bound = predictions - correction_factor
    upper_bound = predictions + correction_factor
    predictions = np.column_stack([predictions, lower_bound, upper_bound])

    if self.differentiation is not None:
        predictions = (
            self.differentiator_[self.level]
            .inverse_transform_next_window(predictions)
        )

    if self.transformer_series_[self.level]:
        predictions = transform_numpy(
                          array             = predictions,
                          transformer       = self.transformer_series_[self.level],
                          fit               = False,
                          inverse_transform = True
                      )

    predictions = pd.DataFrame(
                      data    = predictions,
                      index   = prediction_index,
                      columns = ["pred", "lower_bound", "upper_bound"]
                  )
    predictions.insert(0, 'level', np.tile([self.level], len(steps)))

    return predictions

predict_interval ¶


predict_interval(
    steps=None,
    last_window=None,
    exog=None,
    method="conformal",
    interval=[5, 95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
    levels=None,
)

Predict n steps ahead and estimate prediction intervals using either bootstrapping or conformal prediction methods. Refer to the References section for additional details on these methods.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in`self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`method`	`str`	Technique used to estimate prediction intervals. Available options: 'bootstrapping': Bootstrapping is used to generate prediction intervals [1]_. 'conformal': Employs the conformal prediction split method for interval estimation [2]_.	`'conformal'`
`interval`	`(float, list, tuple)`	Confidence level of the prediction interval. Interpretation depends on the method used: If `float`, represents the nominal (expected) coverage (between 0 and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` percentiles. If `list` or `tuple`, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, interval of 95% should be as `interval = [2.5, 97.5]`. When using `method='conformal'`, the interval must be a float or a list/tuple defining a symmetric interval.	`[5, 95]`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating prediction intervals.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Long-format DataFrame with the predictions and the lower and upper bounds of the estimated interval. The columns are `level`, `pred`, `lower_bound`, `upper_bound`.

References

.. [1] Forecasting: Principles and Practice (3^rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

.. [2] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def predict_interval(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    method: str = 'conformal',
    interval: float | list[float] | tuple[float] = [5, 95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Predict n steps ahead and estimate prediction intervals using either 
    bootstrapping or conformal prediction methods. Refer to the References 
    section for additional details on these methods.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in` self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    method : str, default 'conformal'
        Technique used to estimate prediction intervals. Available options:

        - 'bootstrapping': Bootstrapping is used to generate prediction 
        intervals [1]_.
        - 'conformal': Employs the conformal prediction split method for 
        interval estimation [2]_.
    interval : float, list, tuple, default [5, 95]
        Confidence level of the prediction interval. Interpretation depends 
        on the method used:

        - If `float`, represents the nominal (expected) coverage (between 0 
        and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` 
        percentiles.
        - If `list` or `tuple`, defines the exact percentiles to compute, which 
        must be between 0 and 100 inclusive. For example, interval 
        of 95% should be as `interval = [2.5, 97.5]`.
        - When using `method='conformal'`, the interval must be a float or 
        a list/tuple defining a symmetric interval.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the predictions and the lower and upper
        bounds of the estimated interval. The columns are `level`, `pred`,
        `lower_bound`, `upper_bound`.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    .. [2] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    if method == "bootstrapping":

        if isinstance(interval, (list, tuple)):
            check_interval(interval=interval, ensure_symmetric_intervals=False)
            interval = np.array(interval) / 100
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            interval = np.array([0.5 - interval / 2, 0.5 + interval / 2])

        boot_predictions = self.predict_bootstrapping(
                               steps                   = steps,
                               last_window             = last_window,
                               exog                    = exog,
                               n_boot                  = n_boot,
                               random_state            = random_state,
                               use_in_sample_residuals = use_in_sample_residuals,
                               use_binned_residuals    = use_binned_residuals
                           )

        predictions = self.predict(
                          steps        = steps,
                          last_window  = last_window,
                          exog         = exog,
                          check_inputs = False
                      )

        boot_predictions[['lower_bound', 'upper_bound']] = (
            boot_predictions.iloc[:, 1:].quantile(q=interval, axis=1).transpose()
        )
        predictions = pd.concat([
            predictions, boot_predictions[['lower_bound', 'upper_bound']]
        ], axis=1)

    elif method == 'conformal':

        if isinstance(interval, (list, tuple)):
            check_interval(interval=interval, ensure_symmetric_intervals=True)
            nominal_coverage = (interval[1] - interval[0]) / 100
        else:
            check_interval(alpha=interval, alpha_literal='interval')
            nominal_coverage = interval

        predictions = self._predict_interval_conformal(
                          steps                   = steps,
                          last_window             = last_window,
                          exog                    = exog,
                          nominal_coverage        = nominal_coverage,
                          use_in_sample_residuals = use_in_sample_residuals,
                          use_binned_residuals    = use_binned_residuals
                      )
    else:
        raise ValueError(
            f"Invalid `method` '{method}'. Choose 'bootstrapping' or 'conformal'."
        )

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions

predict_quantiles ¶


predict_quantiles(
    steps=None,
    last_window=None,
    exog=None,
    quantiles=[0.05, 0.5, 0.95],
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
    levels=None,
)

Bootstrapping based predicted quantiles.

Parameters:

Name	Type	Description	Default
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in`self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`quantiles`	`(list, tuple)`	Sequence of quantiles to compute, which must be between 0 and 1 inclusive. For example, quantiles of 0.05, 0.5 and 0.95 should be as `quantiles = [0.05, 0.5, 0.95]`.	`[0.05, 0.5, 0.95]`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating quantiles.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Long-format DataFrame with the quantiles predicted by the forecaster. For example, if `quantiles = [0.05, 0.5, 0.95]`, the columns are `level`, `q_0.05`, `q_0.5`, `q_0.95`.

References

.. [1] Forecasting: Principles and Practice (3^rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def predict_quantiles(
    self,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    quantiles: list[float] | tuple[float] = [0.05, 0.5, 0.95],
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Bootstrapping based predicted quantiles.

    Parameters
    ----------
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in` self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    quantiles : list, tuple, default [0.05, 0.5, 0.95]
        Sequence of quantiles to compute, which must be between 0 and 1 
        inclusive. For example, quantiles of 0.05, 0.5 and 0.95 should be as 
        `quantiles = [0.05, 0.5, 0.95]`.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating quantiles.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the quantiles predicted by the forecaster.
        For example, if `quantiles = [0.05, 0.5, 0.95]`, the columns are
        `level`, `q_0.05`, `q_0.5`, `q_0.95`.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    check_interval(quantiles=quantiles)

    predictions = self.predict_bootstrapping(
                      steps                   = steps,
                      last_window             = last_window,
                      exog                    = exog,
                      n_boot                  = n_boot,
                      random_state            = random_state,
                      use_in_sample_residuals = use_in_sample_residuals,
                      use_binned_residuals    = use_binned_residuals
                  )

    quantiles_cols = [f'q_{q}' for q in quantiles]
    predictions[quantiles_cols] = (
        predictions.iloc[:, 1:].quantile(q=quantiles, axis=1).transpose()
    )
    predictions = predictions[['level'] + quantiles_cols]

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions

predict_dist ¶


predict_dist(
    distribution,
    steps=None,
    last_window=None,
    exog=None,
    n_boot=250,
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=123,
    suppress_warnings=False,
    levels=None,
)

Fit a given probability distribution for each step. After generating multiple forecasting predictions through a bootstrapping process, each step is fitted to the given distribution.

Parameters:

Name	Type	Description	Default
`distribution`	`object`	A distribution object from scipy.stats with methods `_pdf` and `fit`. For example scipy.stats.norm.	required
`steps`	`(int, list, None)`	Predict n steps. The value of `steps` must be less than or equal to the value of steps defined when initializing the forecaster. Starts at 1. If `int`: Only steps within the range of 1 to int are predicted. If `list`: List of ints. Only the steps contained in the list are predicted. If `None`: As many steps are predicted as were defined at initialization.	`None`
`last_window`	`pandas DataFrame`	Series values used to create the predictors (lags) needed to predict `steps`. If `last_window = None`, the values stored in`self.last_window_` are used to calculate the initial predictors, and the predictions start right after training data.	`None`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s.	`None`
`n_boot`	`int`	Number of bootstrapping iterations to perform when estimating prediction intervals.	`250`
`use_in_sample_residuals`	`bool`	If `True`, residuals from the training data are used as proxy of prediction error to create predictions. If `False`, out of sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using Forecaster's `set_out_sample_residuals()` method.	`True`
`use_binned_residuals`	`bool`	If `True`, residuals are selected based on the predicted values (binned selection). If `False`, residuals are selected randomly.	`True`
`random_state`	`int`	Seed for the random number generator to ensure reproducibility.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the prediction process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`
`levels`	`Ignored`	Not used, present here for API consistency by convention.	`None`

Returns:

Name	Type	Description
`predictions`	`pandas DataFrame`	Long-format DataFrame with the parameters of the fitted distribution for each step. The columns are `level`, `param_0`, `param_1`, ..., `param_n`, where `param_i` are the parameters of the distribution.

References

.. [1] Forecasting: Principles and Practice (3^rd ed) Rob J Hyndman and George Athanasopoulos. https://otexts.com/fpp3/prediction-intervals.html

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def predict_dist(
    self,
    distribution: object,
    steps: int | list[int] | None = None,
    last_window: pd.DataFrame | None = None,
    exog: pd.Series | pd.DataFrame | None = None,
    n_boot: int = 250,
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: int = 123,
    suppress_warnings: bool = False,
    levels: Any = None
) -> pd.DataFrame:
    """
    Fit a given probability distribution for each step. After generating 
    multiple forecasting predictions through a bootstrapping process, each 
    step is fitted to the given distribution.

    Parameters
    ----------
    distribution : object
        A distribution object from scipy.stats with methods `_pdf` and `fit`. 
        For example scipy.stats.norm.
    steps : int, list, None, default None
        Predict n steps. The value of `steps` must be less than or equal to the 
        value of steps defined when initializing the forecaster. Starts at 1.

        - If `int`: Only steps within the range of 1 to int are predicted.
        - If `list`: List of ints. Only the steps contained in the list 
        are predicted.
        - If `None`: As many steps are predicted as were defined at 
        initialization.
    last_window : pandas DataFrame, default None
        Series values used to create the predictors (lags) needed to 
        predict `steps`.
        If `last_window = None`, the values stored in` self.last_window_` are
        used to calculate the initial predictors, and the predictions start
        right after training data.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s.
    n_boot : int, default 250
        Number of bootstrapping iterations to perform when estimating prediction
        intervals.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as proxy of
        prediction error to create predictions. 
        If `False`, out of sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : int, default 123
        Seed for the random number generator to ensure reproducibility.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the prediction 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.
    levels : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Long-format DataFrame with the parameters of the fitted distribution
        for each step. The columns are `level`, `param_0`, `param_1`, ..., 
        `param_n`, where `param_i` are the parameters of the distribution.

    References
    ----------
    .. [1] Forecasting: Principles and Practice (3rd ed) Rob J Hyndman and George Athanasopoulos.
           https://otexts.com/fpp3/prediction-intervals.html

    """

    if not hasattr(distribution, "_pdf") or not callable(getattr(distribution, "fit", None)):
        raise TypeError(
            "`distribution` must be a valid probability distribution object "
            "from scipy.stats, with methods `_pdf` and `fit`."
        )

    set_skforecast_warnings(suppress_warnings, action='ignore')

    predictions = self.predict_bootstrapping(
                      steps                   = steps,
                      last_window             = last_window,
                      exog                    = exog,
                      n_boot                  = n_boot,
                      random_state            = random_state,
                      use_in_sample_residuals = use_in_sample_residuals,
                      use_binned_residuals    = use_binned_residuals
                  )

    param_names = [
        p for p in inspect.signature(distribution._pdf).parameters if not p == "x"
    ] + ["loc", "scale"]

    predictions[param_names] = (
        predictions.iloc[:, 1:].apply(
            lambda x: distribution.fit(x), axis=1, result_type='expand'
        )
    )
    predictions = predictions[['level'] + param_names]

    set_skforecast_warnings(suppress_warnings, action='default')

    return predictions

set_params ¶


set_params(params)

Set new values to the parameters of the scikit-learn model stored in the forecaster. It is important to note that all models share the same configuration of parameters and hyperparameters. After calling this method, the forecaster is reset to an unfitted state. The fit method must be called before prediction.

Parameters:

Name	Type	Description	Default
`params`	`dict`	Parameters values.	required

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_params(
    self, 
    params: dict[str, object]
) -> None:
    """
    Set new values to the parameters of the scikit-learn model stored in the
    forecaster. It is important to note that all models share the same 
    configuration of parameters and hyperparameters. After calling this method, 
    the forecaster is reset to an unfitted state. The `fit` method must be 
    called before prediction.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns
    -------
    None

    """

    self.estimator = clone(self.estimator)
    self.estimator.set_params(**params)
    self.estimators_ = {
        step: clone(self.estimator)
        for step in self.steps
    }
    self.is_fitted = False

set_fit_kwargs ¶


set_fit_kwargs(fit_kwargs)

Set new values for the additional keyword arguments passed to the fit method of the estimator.

Parameters:

Name	Type	Description	Default
`fit_kwargs`	`dict`	Dict of the form {"argument": new_value}.	required

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_fit_kwargs(
    self, 
    fit_kwargs: dict[str, object]
) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit` 
    method of the estimator.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns
    -------
    None

    """

    self.fit_kwargs = check_select_fit_kwargs(self.estimator, fit_kwargs=fit_kwargs)

set_lags ¶


set_lags(lags=None)

Set new value to the attribute lags. Attributes lags_names, max_lag and window_size are also updated.

Parameters:

Name	Type	Description	Default
`lags`	`int, list, numpy ndarray, range, dict`	Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1. `int`: include lags from 1 to `lags` (included). `list`, `1d numpy ndarray` or `range`: include only lags present in `lags`, all elements must be int. `dict`: create different lags for each series. {'series_column_name': lags}. `None`: no lags are included as predictors.	`None`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_lags(
    self, 
    lags: int | list[int] | np.ndarray[int] | range[int] | dict[str, int | list] | None = None,
) -> None:
    """
    Set new value to the attribute `lags`. Attributes `lags_names`, 
    `max_lag` and `window_size` are also updated.

    Parameters
    ----------
    lags : int, list, numpy ndarray, range, dict, default None
        Lags used as predictors. Index starts at 1, so lag 1 is equal to t-1.

        - `int`: include lags from 1 to `lags` (included).
        - `list`, `1d numpy ndarray` or `range`: include only lags present in 
        `lags`, all elements must be int.
        - `dict`: create different lags for each series. {'series_column_name': lags}.
        - `None`: no lags are included as predictors. 

    Returns
    -------
    None

    """

    if self.window_features is None and lags is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    if isinstance(lags, dict):
        self.lags = {}
        self.lags_names = {}
        list_max_lags = []
        for key in lags:
            if lags[key] is None:
                self.lags[key] = None
                self.lags_names[key] = None
            else:
                self.lags[key], lags_names, max_lag = initialize_lags(
                    forecaster_name = type(self).__name__,
                    lags            = lags[key]
                )
                self.lags_names[key] = (
                    [f'{key}_{lag}' for lag in lags_names] 
                     if lags_names is not None 
                     else None
                )
                if max_lag is not None:
                    list_max_lags.append(max_lag)

        self.max_lag = max(list_max_lags) if len(list_max_lags) != 0 else None
    else:
        self.lags, self.lags_names, self.max_lag = initialize_lags(
            forecaster_name = type(self).__name__, 
            lags            = lags
        )

    # Repeated here in case of lags is a dict with all values as None
    if self.window_features is None and (lags is None or self.max_lag is None):
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)

set_window_features ¶


set_window_features(window_features=None)

Set new value to the attribute window_features. Attributes max_size_window_features, window_features_names, window_features_class_names and window_size are also updated.

Parameters:

Name	Type	Description	Default
`window_features`	`(object, list)`	Instance or list of instances used to create window features. Window features are created from the original time series and are included as predictors.	`None`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_window_features(
    self, 
    window_features: object | list[object] | None = None
) -> None:
    """
    Set new value to the attribute `window_features`. Attributes 
    `max_size_window_features`, `window_features_names`, 
    `window_features_class_names` and `window_size` are also updated.

    Parameters
    ----------
    window_features : object, list, default None
        Instance or list of instances used to create window features. Window features
        are created from the original time series and are included as predictors.

    Returns
    -------
    None

    """

    if window_features is None and self.max_lag is None:
        raise ValueError(
            "At least one of the arguments `lags` or `window_features` "
            "must be different from None. This is required to create the "
            "predictors used in training the forecaster."
        )

    self.window_features, self.window_features_names, self.max_size_window_features = (
        initialize_window_features(window_features)
    )
    self.window_features_class_names = None
    if window_features is not None:
        self.window_features_class_names = [
            type(wf).__name__ for wf in self.window_features
        ] 
    self.window_size = max(
        [ws for ws in [self.max_lag, self.max_size_window_features] 
         if ws is not None]
    )
    if self.differentiation is not None:
        self.window_size += self.differentiation
        self.differentiator.set_params(window_size=self.window_size)

set_in_sample_residuals ¶


set_in_sample_residuals(
    series,
    exog=None,
    random_state=123,
    suppress_warnings=False,
)

Set in-sample residuals in case they were not calculated during the training process.

In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:

in_sample_residuals_: Dictionary containing a numpy ndarray with the residuals for each series in the form {series: residuals}.
binner_intervals_: intervals used to bin the residuals are calculated using the quantiles of the predicted values.
in_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range.

A total of 10_000 residuals are stored in the attribute in_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.

Parameters:

Name	Type	Description	Default
`series`	`pandas DataFrame`	Training time series.	required
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and their indexes must be aligned so that y[i] is regressed on exog[i].	`None`
`random_state`	`int`	Sets a seed to the random sampling for reproducible output.	`123`
`suppress_warnings`	`bool`	If `True`, skforecast warnings will be suppressed during the sampling process. See skforecast.exceptions.warn_skforecast_categories for more information.	`False`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_in_sample_residuals(
    self,
    series: pd.DataFrame,
    exog: pd.Series | pd.DataFrame | None = None,
    random_state: int = 123,
    suppress_warnings: bool = False
) -> None:
    """
    Set in-sample residuals in case they were not calculated during the
    training process. 

    In-sample residuals are calculated as the difference between the true 
    values and the predictions made by the forecaster using the training 
    data. The following internal attributes are updated:

    + `in_sample_residuals_`: Dictionary containing a numpy ndarray with the
    residuals for each series in the form `{series: residuals}`.
    + `binner_intervals_`: intervals used to bin the residuals are calculated
    using the quantiles of the predicted values.
    + `in_sample_residuals_by_bin_`: residuals are binned according to the
    predicted value they are associated with and stored in a dictionary, where
    the keys are the intervals of the predicted values and the values are
    the residuals associated with that range. 

    A total of 10_000 residuals are stored in the attribute `in_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored. The number of residuals stored per bin is
    limited to `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    series : pandas DataFrame
        Training time series.
    exog : pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned so
        that y[i] is regressed on exog[i].
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.
    suppress_warnings : bool, default False
        If `True`, skforecast warnings will be suppressed during the sampling 
        process. See skforecast.exceptions.warn_skforecast_categories for more
        information.

    Returns
    -------
    None

    """

    set_skforecast_warnings(suppress_warnings, action='ignore')

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_in_sample_residuals()`."
        )

    check_y(y=series[self.level], series_id='`series`')
    series_index_range = check_extract_values_and_index(
        data=series, data_label='`series`', return_values=False
    )[1][[0, -1]]
    if not series_index_range.equals(self.training_range_):
        raise IndexError(
            f"The index range of `series` does not match the range "
            f"used during training. Please ensure the index is aligned "
            f"with the training data.\n"
            f"    Expected : {self.training_range_}\n"
            f"    Received : {series_index_range}"
        )

    # NOTE: This attributes are modified in _create_train_X_y, store original values
    original_exog_in_ = self.exog_in_
    original_X_train_window_features_names_out_ = self.X_train_window_features_names_out_
    original_X_train_direct_exog_names_out_ = self.X_train_direct_exog_names_out_

    (
        X_train,
        y_train,
        _,
        _,
        _,
        _,
        X_train_features_names_out_,
        *_
    ) = self._create_train_X_y(series=series, exog=exog)

    if not X_train_features_names_out_ == self.X_train_features_names_out_:

        # NOTE: Reset attributes modified in _create_train_X_y to their original values
        self.exog_in_ = original_exog_in_
        self.X_train_window_features_names_out_ = original_X_train_window_features_names_out_
        self.X_train_direct_exog_names_out_ = original_X_train_direct_exog_names_out_

        raise ValueError(
            f"Feature mismatch detected after matrix creation. The features "
            f"generated from the provided data do not match those used during "
            f"the training process. To correctly set in-sample residuals, "
            f"ensure that the same data and preprocessing steps are applied.\n"
            f"    Expected output : {self.X_train_features_names_out_}\n"
            f"    Current output  : {X_train_features_names_out_}"
        )

    y_true_steps = []
    y_pred_steps = []
    self.in_sample_residuals_ = {}
    for step in self.steps:
        X_train_step, y_train_step = self.filter_train_X_y_for_step(
                                         step          = step,
                                         X_train       = X_train,
                                         y_train       = y_train,
                                         remove_suffix = True
                                     )

        y_true_steps.append(y_train_step.to_numpy())
        y_pred_steps.append(self.estimators_[step].predict(X_train_step))

    self._binning_in_sample_residuals(
        level                     = self.level,
        y_true                    = np.concatenate(y_true_steps),
        y_pred                    = np.concatenate(y_pred_steps),
        store_in_sample_residuals = True,
        random_state              = random_state
    )

    # NOTE: Reset attributes modified in _create_train_X_y to their original values
    self.exog_in_ = original_exog_in_
    self.X_train_window_features_names_out_ = original_X_train_window_features_names_out_
    self.X_train_direct_exog_names_out_ = original_X_train_direct_exog_names_out_

    set_skforecast_warnings(suppress_warnings, action='default')

set_out_sample_residuals ¶


set_out_sample_residuals(
    y_true, y_pred, append=False, random_state=123
)

Set new values to the attribute out_sample_residuals_. Out of sample residuals are meant to be calculated using observations that did not participate in the training process. y_true and y_pred are expected to be in the original scale of the time series. Residuals are calculated as y_true - y_pred, after applying the necessary transformations and differentiations if the forecaster includes them (self.transformer_series and self.differentiation).

A total of 10_000 residuals are stored in the attribute out_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored.

Parameters:

Name	Type	Description	Default
`y_true`	`dict`	Dictionary of numpy ndarrays or pandas Series with the true values of the time series for each series in the form {series: y_true}.	required
`y_pred`	`dict`	Dictionary of numpy ndarrays or pandas Series with the predicted values of the time series for each series in the form {series: y_pred}.	required
`append`	`bool`	If `True`, new residuals are added to the once already stored in the attribute `out_sample_residuals_`. If after appending the new residuals, the limit of 10_000 samples is exceeded, a random sample of 10_000 is kept.	`False`
`random_state`	`int`	Sets a seed to the random sampling for reproducible output.	`123`

Returns:

Type	Description
`None`

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def set_out_sample_residuals(
    self,
    y_true: dict[str, np.ndarray | pd.Series],
    y_pred: dict[str, np.ndarray | pd.Series],
    append: bool = False,
    random_state: int = 123
) -> None:
    """
    Set new values to the attribute `out_sample_residuals_`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process. `y_true` and `y_pred` are expected
    to be in the original scale of the time series. Residuals are calculated
    as `y_true` - `y_pred`, after applying the necessary transformations and
    differentiations if the forecaster includes them (`self.transformer_series`
    and `self.differentiation`).

    A total of 10_000 residuals are stored in the attribute `out_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored.

    Parameters
    ----------
    y_true : dict
        Dictionary of numpy ndarrays or pandas Series with the true values of
        the time series for each series in the form {series: y_true}.
    y_pred : dict
        Dictionary of numpy ndarrays or pandas Series with the predicted values
        of the time series for each series in the form {series: y_pred}.
    append : bool, default False
        If `True`, new residuals are added to the once already stored in the
        attribute `out_sample_residuals_`. If after appending the new residuals,
        the limit of 10_000 samples is exceeded, a random sample of 10_000 is
        kept.
    random_state : int, default 123
        Sets a seed to the random sampling for reproducible output.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_out_sample_residuals()`."
        )

    if not isinstance(y_true, dict):
        raise TypeError(
            f"`y_true` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_true)}."
        )

    if not isinstance(y_pred, dict):
        raise TypeError(
            f"`y_pred` must be a dictionary of numpy ndarrays or pandas Series. "
            f"Got {type(y_pred)}."
        )

    if not set(y_true.keys()) == set(y_pred.keys()):
        raise ValueError(
            f"`y_true` and `y_pred` must have the same keys. "
            f"Got {set(y_true.keys())} and {set(y_pred.keys())}."
        )

    for k in y_true.keys():
        if not isinstance(y_true[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_true` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_true[k])} for series {k}."
            )
        if not isinstance(y_pred[k], (np.ndarray, pd.Series)):
            raise TypeError(
                f"Values of `y_pred` must be numpy ndarrays or pandas Series. "
                f"Got {type(y_pred[k])} for series {k}."
            )
        if len(y_true[k]) != len(y_pred[k]):
            raise ValueError(
                f"`y_true` and `y_pred` must have the same length. "
                f"Got {len(y_true[k])} and {len(y_pred[k])} for series {k}."
            )
        if isinstance(y_true[k], pd.Series) and isinstance(y_pred[k], pd.Series):
            if not y_true[k].index.equals(y_pred[k].index):
                raise ValueError(
                    f"When containing pandas Series, elements in `y_true` and "
                    f"`y_pred` must have the same index. Error in series {k}."
                )

    if not set(y_pred.keys()) == {self.level}:
        raise ValueError(
            f"`y_pred` and `y_true` must have only the key '{self.level}'. " 
            f"Got {set(y_pred.keys())}."
        )

    y_true = y_true[self.level]
    y_pred = y_pred[self.level]
    if not isinstance(y_pred, np.ndarray):
        y_pred = y_pred.to_numpy()
    if not isinstance(y_true, np.ndarray):
        y_true = y_true.to_numpy()

    if self.transformer_series:
        y_true = transform_numpy(
                     array             = y_true,
                     transformer       = self.transformer_series_[self.level],
                     fit               = False,
                     inverse_transform = False
                 )
        y_pred = transform_numpy(
                     array             = y_pred,
                     transformer       = self.transformer_series_[self.level],
                     fit               = False,
                     inverse_transform = False
                 )

    if self.differentiation is not None:
        differentiator = copy(self.differentiator)
        differentiator.set_params(window_size=None)
        y_true = differentiator.fit_transform(y_true)[self.differentiation:]
        y_pred = differentiator.fit_transform(y_pred)[self.differentiation:]

    data = pd.DataFrame(
        {'prediction': y_pred, 'residuals': y_true - y_pred}
    ).dropna()
    y_pred = data['prediction'].to_numpy()
    residuals = data['residuals'].to_numpy()

    data['bin'] = self.binner[self.level].transform(y_pred).astype(int)
    residuals_by_bin = data.groupby('bin')['residuals'].apply(np.array).to_dict()

    if self.out_sample_residuals_ is None:
        self.out_sample_residuals_ = {self.level: None}
        self.out_sample_residuals_by_bin_ = {self.level: None}

    out_sample_residuals = (
        np.array([]) 
        if self.out_sample_residuals_[self.level] is None
        else self.out_sample_residuals_[self.level]
    )
    out_sample_residuals_by_bin = (
        {} 
        if self.out_sample_residuals_by_bin_[self.level] is None
        else self.out_sample_residuals_by_bin_[self.level]
    )
    if append:
        out_sample_residuals = np.concatenate([out_sample_residuals, residuals])
        for k, v in residuals_by_bin.items():
            if k in out_sample_residuals_by_bin:
                out_sample_residuals_by_bin[k] = np.concatenate(
                    (out_sample_residuals_by_bin[k], v)
                )
            else:
                out_sample_residuals_by_bin[k] = v
    else:
        out_sample_residuals = residuals
        out_sample_residuals_by_bin = residuals_by_bin

    max_samples = 10_000 // self.binner[self.level].n_bins_
    rng = np.random.default_rng(seed=random_state)
    for k, v in out_sample_residuals_by_bin.items():
        if len(v) > max_samples:
            sample = rng.choice(a=v, size=max_samples, replace=False)
            out_sample_residuals_by_bin[k] = sample

    for k in self.binner_intervals_.get(self.level, {}).keys():
        if k not in out_sample_residuals_by_bin:
            out_sample_residuals_by_bin[k] = np.array([])

    empty_bins = [
        k for k, v in out_sample_residuals_by_bin.items() 
        if v.size == 0
    ]
    if empty_bins:
        warnings.warn(
            f"The following bins of level '{self.level}' have no out of sample residuals: "
            f"{empty_bins}. No predicted values fall in the interval "
            f"{[self.binner_intervals_[self.level][bin] for bin in empty_bins]}. "
            f"Empty bins will be filled with a random sample of residuals.", 
            ResidualsUsageWarning
        )
        empty_bin_size = min(max_samples, len(out_sample_residuals))
        for k in empty_bins:
            out_sample_residuals_by_bin[k] = rng.choice(
                a       = out_sample_residuals,
                size    = empty_bin_size,
                replace = False
            )

    if len(out_sample_residuals) > 10_000:
        out_sample_residuals = rng.choice(
            a       = out_sample_residuals, 
            size    = 10_000, 
            replace = False
        )

    self.out_sample_residuals_[self.level] = out_sample_residuals
    self.out_sample_residuals_by_bin_[self.level] = out_sample_residuals_by_bin

get_feature_importances ¶


get_feature_importances(step, sort_importance=True)

Return feature importance of the model stored in the forecaster for a specific step. Since a separate model is created for each forecast time step, it is necessary to select the model from which retrieve information. Only valid when estimator stores internally the feature importances in the attribute feature_importances_ or coef_. Otherwise, it returns
None.

Parameters:

Name	Type	Description	Default
`step`	`int`	Model from which retrieve information (a separate model is created for each forecast time step). First step is 1.	required
`sort_importance`	`bool`	If `True`, sorts the feature importances in descending order.	`True`

Returns:

Name	Type	Description
`feature_importances`	`pandas DataFrame`	Feature importances associated with each predictor.

Source code in skforecast/direct/_forecaster_direct_multivariate.py

def get_feature_importances(
    self,
    step: int,
    sort_importance: bool = True
) -> pd.DataFrame:
    """
    Return feature importance of the model stored in the forecaster for a
    specific step. Since a separate model is created for each forecast time
    step, it is necessary to select the model from which retrieve information.
    Only valid when estimator stores internally the feature importances in
    the attribute `feature_importances_` or `coef_`. Otherwise, it returns  
    `None`.

    Parameters
    ----------
    step : int
        Model from which retrieve information (a separate model is created 
        for each forecast time step). First step is 1.
    sort_importance: bool, default True
        If `True`, sorts the feature importances in descending order.

    Returns
    -------
    feature_importances : pandas DataFrame
        Feature importances associated with each predictor.

    """

    if not isinstance(step, int):
        raise TypeError(
            f"`step` must be an integer. Got {type(step)}."
        )

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `get_feature_importances()`."
        )

    if (step < 1) or (step > self.max_step):
        raise ValueError(
            f"The step must have a value from 1 to the maximum number of steps "
            f"({self.max_step}). Got {step}."
        )

    if isinstance(self.estimator, Pipeline):
        estimator = self.estimators_[step][-1]
    else:
        estimator = self.estimators_[step]

    n_lags = len(list(
        chain(*[v for v in self.lags_.values() if v is not None])
    ))
    n_window_features = (
        len(self.X_train_window_features_names_out_) if self.window_features is not None else 0
    )
    idx_columns_autoreg = np.arange(n_lags + n_window_features)
    if self.exog_in_:
        idx_columns_exog = np.flatnonzero(
            [name.endswith(f"step_{step}") for name in self.X_train_features_names_out_]
        )
    else:
        idx_columns_exog = np.array([], dtype=int)

    idx_columns = np.concatenate((idx_columns_autoreg, idx_columns_exog))
    idx_columns = [int(x) for x in idx_columns]  # Required since numpy 2.0
    feature_names = [
        self.X_train_features_names_out_[i].replace(f"_step_{step}", "") 
        for i in idx_columns
    ]

    if hasattr(estimator, 'feature_importances_'):
        feature_importances = estimator.feature_importances_
    elif hasattr(estimator, 'coef_'):
        feature_importances = estimator.coef_
    else:
        warnings.warn(
            f"Impossible to access feature importances for estimator of type "
            f"{type(estimator)}. This method is only valid when the "
            f"estimator stores internally the feature importances in the "
            f"attribute `feature_importances_` or `coef_`."
        )
        feature_importances = None

    if feature_importances is not None:
        feature_importances = pd.DataFrame({
                                  'feature': feature_names,
                                  'importance': feature_importances
                              })
        if sort_importance:
            feature_importances = feature_importances.sort_values(
                                      by='importance', ascending=False
                                  )

    return feature_importances

ForecasterDirectMultiVariate¶

skforecast.direct._forecaster_direct_multivariate.ForecasterDirectMultiVariate ¶

Attributes¶

estimator instance-attribute ¶

level instance-attribute ¶

lags_ instance-attribute ¶

transformer_series instance-attribute ¶

transformer_series_ instance-attribute ¶

transformer_exog instance-attribute ¶

weight_func instance-attribute ¶

source_code_weight_func instance-attribute ¶

differentiation instance-attribute ¶

differentiation_max instance-attribute ¶

differentiator instance-attribute ¶

differentiator_ instance-attribute ¶

last_window_ instance-attribute ¶

index_type_ instance-attribute ¶

index_freq_ instance-attribute ¶

training_range_ instance-attribute ¶

series_names_in_ instance-attribute ¶

exog_in_ instance-attribute ¶

exog_names_in_ instance-attribute ¶

exog_type_in_ instance-attribute ¶

exog_dtypes_in_ instance-attribute ¶

exog_dtypes_out_ instance-attribute ¶

X_train_series_names_in_ instance-attribute ¶

X_train_window_features_names_out_ instance-attribute ¶

X_train_exog_names_out_ instance-attribute ¶

X_train_direct_exog_names_out_ instance-attribute ¶

X_train_features_names_out_ instance-attribute ¶

in_sample_residuals_ instance-attribute ¶

out_sample_residuals_ instance-attribute ¶

in_sample_residuals_by_bin_ instance-attribute ¶

out_sample_residuals_by_bin_ instance-attribute ¶

creation_date instance-attribute ¶

is_fitted instance-attribute ¶

fit_date instance-attribute ¶

skforecast_version instance-attribute ¶

python_version instance-attribute ¶

forecaster_id instance-attribute ¶

_probabilistic_mode instance-attribute ¶

dropna_from_series instance-attribute ¶

encoding instance-attribute ¶

steps instance-attribute ¶

max_step instance-attribute ¶

estimators_ instance-attribute ¶

lags instance-attribute ¶

lags_names instance-attribute ¶

max_lag instance-attribute ¶

window_size instance-attribute ¶

window_features_class_names instance-attribute ¶

fit_kwargs instance-attribute ¶

binner instance-attribute ¶

binner_intervals_ instance-attribute ¶

binner_kwargs instance-attribute ¶

n_jobs instance-attribute ¶

Functions¶

_repr_html_ ¶

_create_data_to_return_dict ¶

_create_lags ¶

_create_window_features ¶

_create_train_X_y ¶

create_train_X_y ¶

filter_train_X_y_for_step ¶

_train_test_split_one_step_ahead ¶

create_sample_weights ¶

fit ¶

_binning_in_sample_residuals ¶

_create_predict_inputs ¶

create_predict_X ¶

_direct_predict ¶

predict ¶

predict_bootstrapping ¶

_predict_interval_conformal ¶

predict_interval ¶

predict_quantiles ¶

predict_dist ¶

set_params ¶

set_fit_kwargs ¶

set_lags ¶

`ForecasterDirectMultiVariate`¶

estimator `instance-attribute` ¶

level `instance-attribute` ¶

lags_ `instance-attribute` ¶

transformer_series `instance-attribute` ¶

transformer_series_ `instance-attribute` ¶

transformer_exog `instance-attribute` ¶

weight_func `instance-attribute` ¶

source_code_weight_func `instance-attribute` ¶

differentiation `instance-attribute` ¶

differentiation_max `instance-attribute` ¶

differentiator `instance-attribute` ¶

differentiator_ `instance-attribute` ¶

last_window_ `instance-attribute` ¶

index_type_ `instance-attribute` ¶

index_freq_ `instance-attribute` ¶

training_range_ `instance-attribute` ¶

series_names_in_ `instance-attribute` ¶

exog_in_ `instance-attribute` ¶

exog_names_in_ `instance-attribute` ¶

exog_type_in_ `instance-attribute` ¶

exog_dtypes_in_ `instance-attribute` ¶

exog_dtypes_out_ `instance-attribute` ¶

X_train_series_names_in_ `instance-attribute` ¶

X_train_window_features_names_out_ `instance-attribute` ¶

X_train_exog_names_out_ `instance-attribute` ¶

X_train_direct_exog_names_out_ `instance-attribute` ¶

X_train_features_names_out_ `instance-attribute` ¶

in_sample_residuals_ `instance-attribute` ¶

out_sample_residuals_ `instance-attribute` ¶

in_sample_residuals_by_bin_ `instance-attribute` ¶

out_sample_residuals_by_bin_ `instance-attribute` ¶

creation_date `instance-attribute` ¶

is_fitted `instance-attribute` ¶

fit_date `instance-attribute` ¶

skforecast_version `instance-attribute` ¶

python_version `instance-attribute` ¶

forecaster_id `instance-attribute` ¶

_probabilistic_mode `instance-attribute` ¶

dropna_from_series `instance-attribute` ¶

encoding `instance-attribute` ¶

steps `instance-attribute` ¶

max_step `instance-attribute` ¶

estimators_ `instance-attribute` ¶

lags `instance-attribute` ¶

lags_names `instance-attribute` ¶

max_lag `instance-attribute` ¶

window_size `instance-attribute` ¶

window_features_class_names `instance-attribute` ¶

fit_kwargs `instance-attribute` ¶

binner `instance-attribute` ¶

binner_intervals_ `instance-attribute` ¶

binner_kwargs `instance-attribute` ¶

n_jobs `instance-attribute` ¶