model_selection_sarimax

backtesting_sarimax(forecaster, y, steps, metric, initial_train_size, fixed_train_size=True, exog=None, refit=False, alpha=None, interval=None, verbose=False)

Backtesting of ForecasterSarimax.

If refit is False, the model is trained only once using the initial_train_size first observations. If refit is True, the model is retrained in each iteration. The training set then either grows or, if fixed_train_size is True, keeps its size and moves forward by steps. A copy of the original forecaster is created so it is not modified during the process.
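The fold layout implied by the description above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the helper name `backtest_folds` is hypothetical, indices are half-open `(start, end)` positions, and the last fold is assumed to be shortened when fewer than `steps` observations remain.

```python
def backtest_folds(n_obs, initial_train_size, steps,
                   refit=False, fixed_train_size=True):
    """Return a list of ((train_start, train_end), (test_start, test_end)).

    Hypothetical helper for illustration only. All ranges are half-open.
    """
    if initial_train_size >= n_obs:
        raise ValueError("initial_train_size must be smaller than n_obs.")

    folds = []
    test_start = initial_train_size
    while test_start < n_obs:
        # The last fold may contain fewer than `steps` observations.
        test_end = min(test_start + steps, n_obs)
        if not refit:
            # Trained once on the first `initial_train_size` observations.
            train = (0, initial_train_size)
        elif fixed_train_size:
            # Window keeps its size and slides forward by `steps`.
            train = (test_start - initial_train_size, test_start)
        else:
            # Window grows to include everything before the test fold.
            train = (0, test_start)
        folds.append((train, (test_start, test_end)))
        test_start = test_end
    return folds


for train, test in backtest_folds(10, 4, 3, refit=True, fixed_train_size=True):
    print("train", train, "test", test)
```

With `refit=True` and `fixed_train_size=False`, the same call would keep `train_start` at 0 in every fold.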

Parameters:

Name Type Description Default
forecaster ForecasterSarimax

Forecaster model.

required
y Series

Training time series.

required
steps int

Number of steps to predict.

required
metric Union[str, Callable, list]

Metric used to quantify the goodness of fit of the model.

If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}

If Callable: Function with arguments y_true, y_pred that returns a float.

If list: List containing several strings and/or Callable.

required
initial_train_size int

Number of samples in the initial train split. The backtest forecaster is trained using the first initial_train_size observations.

required
fixed_train_size bool

If True, train size doesn't increase but moves by steps in each iteration.

True
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].

None
refit bool

Whether to re-fit the forecaster in each iteration.

False
alpha Optional[float]

Confidence level of the forecast intervals, expressed as 100 * (1 - alpha) %. For example, alpha = 0.05 gives 95% intervals. If both alpha and interval are provided, alpha will be used.

None
interval Optional[list]

Percentiles that bound the estimated prediction interval, given as a sequence of two values between 0 and 100 inclusive. The values must be symmetric. For example, a 95% interval is specified as interval = [2.5, 97.5]. If both alpha and interval are provided, alpha will be used.

None
verbose bool

Print number of folds and index of training and validation sets used for backtesting.

False
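The Callable and list forms of metric can be illustrated with a short sketch. The function `median_absolute_error` below is a hypothetical example and is not provided by this module; it only follows the stated contract of taking y_true and y_pred and returning a float.

```python
from statistics import median


def median_absolute_error(y_true, y_pred) -> float:
    """Custom metric: median of the absolute errors.

    Works with any pair of equal-length iterables of numbers.
    """
    return float(median(abs(t - p) for t, p in zip(y_true, y_pred)))


# A list may mix built-in metric names and custom callables.
metric = ["mean_absolute_error", median_absolute_error]

# Quick check of the callable on its own.
print(median_absolute_error([1, 2, 3], [1, 4, 0]))  # 2.0
```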

Returns:

Type Description
Tuple[Union[float, list], pandas.core.frame.DataFrame]

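Value(s) of the metric(s), followed by a DataFrame with the predictions and their estimated interval if interval is not None (columns pred, lower_bound, upper_bound).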
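The relationship between alpha and interval described in the parameters can be sketched as follows. The helper names `interval_to_alpha` and `alpha_to_interval` are hypothetical and only illustrate the arithmetic; how the library converts between the two internally is not shown on this page.

```python
def interval_to_alpha(interval):
    """Convert symmetric percentiles, e.g. [2.5, 97.5], to alpha."""
    lower, upper = interval
    if not (0 <= lower < upper <= 100):
        raise ValueError("Percentiles must satisfy 0 <= lower < upper <= 100.")
    if abs((lower + upper) - 100) > 1e-9:
        raise ValueError("Percentiles must be symmetric, e.g. [2.5, 97.5].")
    return 1 - (upper - lower) / 100


def alpha_to_interval(alpha):
    """Convert alpha, e.g. 0.05, to symmetric percentiles."""
    if not (0 < alpha < 1):
        raise ValueError("alpha must be between 0 and 1 (exclusive).")
    half = 100 * alpha / 2
    return [half, 100 - half]


print(round(interval_to_alpha([2.5, 97.5]), 4))  # 0.05
print(alpha_to_interval(0.05))
```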

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py
def backtesting_sarimax(
    forecaster,
    y: pd.Series,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: bool=False,
    alpha: Optional[float]=None,
    interval: Optional[list]=None,
    verbose: bool=False
) -> Tuple[Union[float, list], pd.DataFrame]:
    """
    Backtesting of ForecasterSarimax.

    If `refit` is False, the model is trained only once using the `initial_train_size`
    first observations. If `refit` is True, the model is retrained in each iteration.
    The training set then either grows or, if `fixed_train_size` is True, keeps its
    size and moves forward by `steps`. A copy of the original forecaster is created
    so it is not modified during the process.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.

    y : pandas Series
        Training time series.

    steps : int
        Number of steps to predict.

    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

        If string:
            {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}

        If Callable:
            Function with arguments y_true, y_pred that returns a float.

        If list:
            List containing several strings and/or Callable.

    initial_train_size : int
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.

    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].

    refit : bool, default `False`
        Whether to re-fit the forecaster in each iteration.

    alpha : float, default `None`
        Confidence level of the forecast intervals, expressed as
        100 * (1 - alpha) %. For example, `alpha = 0.05` gives 95% intervals.
        If both `alpha` and `interval` are provided, `alpha` will be used.

    interval : list, default `None`
        Percentiles that bound the estimated prediction interval, given as a
        sequence of two values between 0 and 100 inclusive. The values must be
        symmetric. For example, a 95% interval is specified as
        `interval = [2.5, 97.5]`. If both `alpha` and `interval` are
        provided, `alpha` will be used.

    verbose : bool, default `False`
        Print number of folds and index of training and validation sets used for backtesting.

    Returns 
    -------
    metrics_value : float, list
        Value(s) of the metric(s).

    backtest_predictions : pandas DataFrame
        Value of predictions and their estimated interval if `interval` is not `None`.
            column pred = predictions.
            column lower_bound = lower bound of the interval.
            column upper_bound = upper bound of the interval.

    """

    if initial_train_size is None:
        raise ValueError(
            '`initial_train_size` must be an int smaller than the length of `y`.'
        )

    if initial_train_size is not None and initial_train_size >= len(y):
        raise ValueError(
            '`initial_train_size` must be an int smaller than the length of `y`.'
        )

    if initial_train_size is not None and initial_train_size < forecaster.window_size:
        raise ValueError(
            (f"`initial_train_size` must be greater than or equal to "
             f"forecaster's window_size ({forecaster.window_size}).")
        )

    if not isinstance(refit, bool):
        raise TypeError(
            '`refit` must be boolean: `True`, `False`.'
        )

    if refit:
        metrics_values, backtest_predictions = _backtesting_sarimax_refit(
            forecaster          = forecaster,
            y                   = y,
            steps               = steps,
            metric              = metric,
            initial_train_size  = initial_train_size,
            fixed_train_size    = fixed_train_size,
            exog                = exog,
            alpha               = alpha,
            interval            = interval,
            verbose             = verbose
        )
    else:
        metrics_values, backtest_predictions = _backtesting_sarimax_no_refit(
            forecaster          = forecaster,
            y                   = y,
            steps               = steps,
            metric              = metric,
            initial_train_size  = initial_train_size,
            exog                = exog,
            alpha               = alpha,
            interval            = interval,
            verbose             = verbose
        )

    return metrics_values, backtest_predictions

grid_search_sarimax(forecaster, y, param_grid, steps, metric, initial_train_size, fixed_train_size=True, exog=None, refit=False, return_best=True, verbose=True)

Exhaustive search over specified parameter values for a ForecasterSarimax object.

Validation is done using time series backtesting.

Parameters:

Name Type Description Default
forecaster ForecasterSarimax

Forecaster model.

required
y Series

Training time series values.

required
param_grid dict

Dictionary with parameters names (str) as keys and lists of parameter settings to try as values.

required
steps int

Number of steps to predict.

required
metric Union[str, Callable, list]

Metric used to quantify the goodness of fit of the model.

If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}

If Callable: Function with arguments y_true, y_pred that returns a float.

If list: List containing several strings and/or Callable.

required
initial_train_size int

Number of samples in the initial train split. The backtest forecaster is trained using the first initial_train_size observations.

required
fixed_train_size bool

If True, train size doesn't increase but moves by steps in each iteration.

True
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].

None
refit bool

Whether to re-fit the forecaster in each iteration of backtesting.

False
return_best bool

Refit the forecaster using the best found parameters on the whole data.

True
verbose bool

Print number of folds used for cv or backtesting.

True
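What an exhaustive search over param_grid enumerates can be shown with a standard-library sketch. The source below calls scikit-learn's `ParameterGrid`; the hypothetical helper `expand_grid` here only mimics that kind of expansion with `itertools.product`, and the keys `order` and `trend` are illustrative assumptions about parameters the forecaster accepts.

```python
from itertools import product


def expand_grid(param_grid):
    """Return every combination of the listed values as a list of dicts."""
    keys = sorted(param_grid)
    value_lists = [param_grid[key] for key in keys]
    return [dict(zip(keys, values)) for values in product(*value_lists)]


# Illustrative grid: 2 values x 2 values = 4 candidate settings.
param_grid = {
    "order": [(1, 0, 0), (1, 1, 1)],
    "trend": [None, "c"],
}

candidates = expand_grid(param_grid)
for params in candidates:
    print(params)
```

Each dictionary in `candidates` corresponds to one model configuration that would be evaluated by backtesting.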

Returns:

Type Description
DataFrame

Results for each combination of parameters. column params = parameter combination evaluated. column metric = metric value estimated for the combination of parameters. additional n columns with param = value.
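The shape of the results can be sketched with plain dictionaries. This is an assumption-laden illustration: the helper `tabulate_results` is hypothetical, the metric values are made up, the column is named `metric` for simplicity, and lower values are assumed to be better. It only shows the idea of one row per combination with the metric value and one extra column per parameter.

```python
def tabulate_results(evaluated):
    """Build one row per (params, metric_value) pair, best (lowest) first."""
    rows = []
    for params, value in evaluated:
        row = {"params": params, "metric": value}
        row.update(params)  # additional columns: param = value
        rows.append(row)
    return sorted(rows, key=lambda row: row["metric"])


# Made-up metric values, for illustration only.
evaluated = [
    ({"order": (1, 0, 0), "trend": None}, 12.4),
    ({"order": (1, 1, 1), "trend": "c"}, 9.8),
    ({"order": (1, 1, 1), "trend": None}, 10.3),
]

results = tabulate_results(evaluated)
best_params = results[0]["params"]
print(best_params)
```

With return_best set to True, the forecaster is refitted with the best parameters found; in this sketch that would be `best_params`.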

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py
def grid_search_sarimax(
    forecaster,
    y: pd.Series,
    param_grid: dict,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: bool=False,
    return_best: bool=True,
    verbose: bool=True
) -> pd.DataFrame:
    """
    Exhaustive search over specified parameter values for a ForecasterSarimax object.
    Validation is done using time series backtesting.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.

    y : pandas Series
        Training time series values. 

    param_grid : dict
        Dictionary with parameters names (`str`) as keys and lists of parameter
        settings to try as values.

    steps : int
        Number of steps to predict.

    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

        If string:
            {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}

        If Callable:
            Function with arguments y_true, y_pred that returns a float.

        If list:
            List containing several strings and/or Callable.

    initial_train_size : int 
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.

    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].

    refit : bool, default `False`
        Whether to re-fit the forecaster in each iteration of backtesting.

    return_best : bool, default `True`
        Refit the `forecaster` using the best found parameters on the whole data.

    verbose : bool, default `True`
        Print number of folds used for cv or backtesting.

    Returns 
    -------
    results : pandas DataFrame
        Results for each combination of parameters.
            column params = parameter combination evaluated.
            column metric = metric value estimated for the combination of parameters.
            additional n columns with param = value.

    """

    param_grid = list(ParameterGrid(param_grid))

    results = _evaluate_grid_hyperparameters_sarimax(
        forecaster          = forecaster,
        y                   = y,
        param_grid          = param_grid,
        steps               = steps,
        metric              = metric,
        initial_train_size  = initial_train_size,
        fixed_train_size    = fixed_train_size,
        exog                = exog,
        refit               = refit,
        return_best         = return_best,
        verbose             = verbose
    )

    return results

random_search_sarimax(forecaster, y, param_distributions, steps, metric, initial_train_size, fixed_train_size=True, exog=None, refit=False, n_iter=10, random_state=123, return_best=True, verbose=True)

Random search over specified parameter values or distributions for a ForecasterSarimax object.

Validation is done using time series backtesting.

Parameters:

Name Type Description Default
forecaster ForecasterSarimax

Forecaster model.

required
y Series

Training time series.

required
param_distributions dict

Dictionary with parameters names (str) as keys and distributions or lists of parameters to try.

required
steps int

Number of steps to predict.

required
metric Union[str, Callable, list]

Metric used to quantify the goodness of fit of the model.

If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}

If Callable: Function with arguments y_true, y_pred that returns a float.

If list: List containing several strings and/or Callable.

required
initial_train_size int

Number of samples in the initial train split. The backtest forecaster is trained using the first initial_train_size observations.

required
fixed_train_size bool

If True, train size doesn't increase but moves by steps in each iteration.

True
exog Union[pandas.core.series.Series, pandas.core.frame.DataFrame]

Exogenous variable/s included as predictor/s. Must have the same number of observations as y and should be aligned so that y[i] is regressed on exog[i].

None
refit bool

Whether to re-fit the forecaster in each iteration of backtesting.

False
n_iter int

Number of parameter settings that are sampled. n_iter trades off runtime vs quality of the solution.

10
random_state int

Sets a seed to the random sampling for reproducible output.

123
return_best bool

Refit the forecaster using the best found parameters on the whole data.

True
verbose bool

Print number of folds used for cv or backtesting.

True
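How n_iter and random_state shape a random search can be sketched with the standard library. The source below uses scikit-learn's `ParameterSampler`; the hypothetical helper `sample_param_settings` here is a simplified stand-in that only handles lists of values, may repeat a combination, and will not reproduce the exact settings drawn by scikit-learn. The keys `order` and `trend` are illustrative assumptions.

```python
import random


def sample_param_settings(param_distributions, n_iter=10, random_state=123):
    """Draw `n_iter` parameter settings, one value per key for each draw."""
    rng = random.Random(random_state)  # seeded for reproducible output
    keys = sorted(param_distributions)
    return [
        {key: rng.choice(param_distributions[key]) for key in keys}
        for _ in range(n_iter)
    ]


param_distributions = {
    "order": [(1, 0, 0), (1, 1, 1), (2, 1, 1)],
    "trend": [None, "c", "t"],
}

settings = sample_param_settings(param_distributions, n_iter=5, random_state=123)
for params in settings:
    print(params)
```

A larger n_iter explores more of the search space at the cost of more backtesting runs, and reusing the same random_state yields the same sampled settings.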

Returns:

Type Description
DataFrame

Results for each combination of parameters. column params = parameter combination evaluated. column metric = metric value estimated for the combination of parameters. additional n columns with param = value.

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py
def random_search_sarimax(
    forecaster,
    y: pd.Series,
    param_distributions: dict,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: bool=False,
    n_iter: int=10,
    random_state: int=123,
    return_best: bool=True,
    verbose: bool=True
) -> pd.DataFrame:
    """
    Random search over specified parameter values or distributions for a ForecasterSarimax object.
    Validation is done using time series backtesting.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.

    y : pandas Series
        Training time series. 

    param_distributions : dict
        Dictionary with parameters names (`str`) as keys and 
        distributions or lists of parameters to try.

    steps : int
        Number of steps to predict.

    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

        If string:
            {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}

        If Callable:
            Function with arguments y_true, y_pred that returns a float.

        If list:
            List containing several strings and/or Callable.

    initial_train_size : int 
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.

    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.

    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].

    refit : bool, default `False`
        Whether to re-fit the forecaster in each iteration of backtesting.

    n_iter : int, default `10`
        Number of parameter settings that are sampled. 
        n_iter trades off runtime vs quality of the solution.

    random_state : int, default `123`
        Sets a seed to the random sampling for reproducible output.

    return_best : bool, default `True`
        Refit the `forecaster` using the best found parameters on the whole data.

    verbose : bool, default `True`
        Print number of folds used for cv or backtesting.

    Returns 
    -------
    results : pandas DataFrame
        Results for each combination of parameters.
            column params = parameter combination evaluated.
            column metric = metric value estimated for the combination of parameters.
            additional n columns with param = value.

    """

    param_grid = list(ParameterSampler(param_distributions, n_iter=n_iter, random_state=random_state))

    results = _evaluate_grid_hyperparameters_sarimax(
        forecaster          = forecaster,
        y                   = y,
        param_grid          = param_grid,
        steps               = steps,
        metric              = metric,
        initial_train_size  = initial_train_size,
        fixed_train_size    = fixed_train_size,
        exog                = exog,
        refit               = refit,
        return_best         = return_best,
        verbose             = verbose
    )

    return results