`model_selection_sarimax`

`backtesting_sarimax(forecaster, y, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, exog=None, refit=False, alpha=None, interval=None, n_jobs='auto', verbose=False, show_progress=True)`

Backtesting of ForecasterSarimax.

If `refit` is `False`, the model is trained only once, using the first `initial_train_size` observations.
If `refit` is `True`, the model is retrained in every iteration; the training set grows unless `fixed_train_size` is `True`.
If `refit` is an integer, the model is retrained every `refit` iterations.

A copy of the original forecaster is created so that it is not modified during the process.
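The refit rules above can be illustrated with a small, library-independent sketch. The helper `sketch_folds` and its tuple layout are hypothetical and are not part of skforecast. It assumes positive integer sizes, that each test window starts `gap` observations after the window preceding it, and that windows advance by `steps`.

```python
def sketch_folds(n_obs, initial_train_size, steps, fixed_train_size=True,
                 gap=0, allow_incomplete_fold=True, refit=False):
    """Hypothetical helper: list (train_start, train_end, test_start, test_end, fitted).

    All ends are exclusive. This illustrates the documented behaviour;
    it is not the code skforecast runs internally.
    """
    folds = []
    train_start, train_end = 0, initial_train_size
    i = 0
    while initial_train_size + i * steps + gap < n_obs:
        window_end = initial_train_size + i * steps
        test_start = window_end + gap
        test_end = min(test_start + steps, n_obs)
        if test_end - test_start < steps and not allow_incomplete_fold:
            break  # drop the shorter last fold
        # bool is a subclass of int, so check for bool before using refit as a count
        if isinstance(refit, bool):
            fitted = refit or i == 0
        else:
            fitted = i % refit == 0
        if fitted:
            # the training window is only updated when the model is (re)fitted
            train_end = window_end
            train_start = i * steps if fixed_train_size else 0
        folds.append((train_start, train_end, test_start, test_end, fitted))
        i += 1
    return folds


for fold in sketch_folds(n_obs=20, initial_train_size=10, steps=4, refit=2):
    print(fold)
```

With `refit=2`, the second fold reuses the model trained on the first window, and the third fold triggers a new fit on a window that has moved forward by two iterations.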

Parameters:

Name	Type	Description	Default
`forecaster`	`ForecasterSarimax`	Forecaster model.	required
`y`	`pandas Series`	Training time series.	required
`steps`	`int`	Number of steps to predict.	required
`metric`	`str, Callable, list`	Metric used to quantify the goodness of fit of the model. If `str`: one of {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}. If `Callable`: a function with arguments `y_true`, `y_pred` that returns a float. If `list`: a list containing multiple strings and/or Callables.	required
`initial_train_size`	`int`	Number of samples in the initial train split. The backtest forecaster is trained using the first `initial_train_size` observations.	required
`fixed_train_size`	`bool`	If `True`, the training set keeps a fixed size and moves forward by `steps` in each iteration instead of growing.	`True`
`gap`	`int`	Number of samples to be excluded after the end of each training set and before the test set.	`0`
`allow_incomplete_fold`	`bool`	Whether the last fold may contain fewer samples than `steps`. If `False`, an incomplete last fold is excluded.	`True`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].	`None`
`refit`	`bool, int`	Whether to re-fit the forecaster in each iteration. If `refit` is an integer, the forecaster is re-fitted every `refit` iterations.	`False`
`alpha`	`float`	The confidence intervals for the forecasts have a confidence level of 100 * (1 - alpha) %. If both `alpha` and `interval` are provided, `alpha` is used.	`None`
`interval`	`list`	Confidence of the estimated prediction interval, given as a sequence of two percentiles between 0 and 100 inclusive. The values must be symmetric. For example, a 95% interval is `interval = [2.5, 97.5]`. If both `alpha` and `interval` are provided, `alpha` is used.	`None`
`n_jobs`	`int, 'auto'`	The number of jobs to run in parallel. If `-1`, the number of jobs is set to the number of cores. If `'auto'`, `n_jobs` is set using the function `skforecast.utils.select_n_jobs_backtesting`. New in version 0.9.0.	`'auto'`
`verbose`	`bool`	Print number of folds and index of training and validation sets used for backtesting.	`False`
`show_progress`	`bool`	Whether to show a progress bar.	`True`
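The relationship between the `alpha` and `interval` rows can be made concrete with a short sketch. The helpers `interval_to_alpha` and `resolve_alpha` are hypothetical names used only for illustration. That the library converts percentiles in exactly this way is an assumption; the table only states that `[2.5, 97.5]` corresponds to 95% and that `alpha` takes precedence.

```python
def interval_to_alpha(interval):
    """Hypothetical helper: convert symmetric percentiles to an alpha value."""
    lower, upper = interval
    if not 0 <= lower < upper <= 100:
        raise ValueError("percentiles must satisfy 0 <= lower < upper <= 100")
    if abs(lower + upper - 100) > 1e-9:
        raise ValueError("percentiles must be symmetric around 50")
    # coverage is (upper - lower) percent, alpha is the remaining fraction
    return (100 - (upper - lower)) / 100


def resolve_alpha(alpha=None, interval=None):
    """Return the alpha that applies, or None when no interval is requested."""
    if alpha is not None:
        return alpha  # alpha wins when both arguments are given
    if interval is not None:
        return interval_to_alpha(interval)
    return None


print(resolve_alpha(interval=[2.5, 97.5]))
print(resolve_alpha(alpha=0.1, interval=[2.5, 97.5]))
```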

Returns:

Name	Type	Description
`metrics_value`	`float, list`	Value(s) of the metric(s).
`backtest_predictions`	`pandas DataFrame`	Value of predictions and their estimated interval if `interval` is not `None`. Column `pred`: predictions. Column `lower_bound`: lower bound of the interval. Column `upper_bound`: upper bound of the interval.

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py

def backtesting_sarimax(
    forecaster,
    y: pd.Series,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    gap: int=0,
    allow_incomplete_fold: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: Optional[Union[bool, int]]=False,
    alpha: Optional[float]=None,
    interval: Optional[list]=None,
    n_jobs: Optional[Union[int, str]]='auto',
    verbose: bool=False,
    show_progress: bool=True
) -> Tuple[Union[float, list], pd.DataFrame]:
    """
    Backtesting of ForecasterSarimax.

    - If `refit` is `False`, the model will be trained only once using the 
    `initial_train_size` first observations. 
    - If `refit` is `True`, the model is trained on each iteration, increasing
    the training set. 
    - If `refit` is an `integer`, the model will be trained every that number 
    of iterations.

    A copy of the original forecaster is created so that it is not modified during 
    the process.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.
    y : pandas Series
        Training time series.
    steps : int
        Number of steps to predict.
    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

            - If `string`: {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}
            - If `Callable`: Function with arguments y_true, y_pred that returns 
            a float.
            - If `list`: List containing multiple strings and/or Callables.
    initial_train_size : int
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.
    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.
    gap : int, default `0`
        Number of samples to be excluded after the end of each training set and 
        before the test set.
    allow_incomplete_fold : bool, default `True`
        Last fold is allowed to have a smaller number of samples than the 
        `test_size`. If `False`, the last fold is excluded.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].
    refit : bool, int, default `False`
        Whether to re-fit the forecaster in each iteration. If `refit` is an integer, 
        the Forecaster will be trained every that number of iterations.
    alpha : float, default `None`
        The confidence intervals for the forecasts are (1 - alpha) %.
        If both, `alpha` and `interval` are provided, `alpha` will be used.
    interval : list, default `None`
        Confidence of the prediction interval estimated. The values must be
        symmetric. Sequence of percentiles to compute, which must be between 
        0 and 100 inclusive. For example, interval of 95% should be as 
        `interval = [2.5, 97.5]`. If both, `alpha` and `interval` are 
        provided, `alpha` will be used.
    n_jobs : int, 'auto', default `'auto'`
        The number of jobs to run in parallel. If `-1`, then the number of jobs is 
        set to the number of cores. If 'auto', `n_jobs` is set using the function
        skforecast.utils.select_n_jobs_backtesting.
        **New in version 0.9.0**     
    verbose : bool, default `False`
        Print number of folds and index of training and validation sets used 
        for backtesting.
    show_progress: bool, default `True`
        Whether to show a progress bar.

    Returns
    -------
    metrics_value : float, list
        Value(s) of the metric(s).
    backtest_predictions : pandas DataFrame
        Value of predictions and their estimated interval if `interval` is not `None`.

            - column pred: predictions.
            - column lower_bound: lower bound of the interval.
            - column upper_bound: upper bound of the interval.

    """

    if type(forecaster).__name__ not in ['ForecasterSarimax']:
        raise TypeError(
            ("`forecaster` must be of type `ForecasterSarimax`, for all other "
             "types of forecasters use the functions available in the other "
             "`model_selection` modules.")
        )

    check_backtesting_input(
        forecaster            = forecaster,
        steps                 = steps,
        metric                = metric,
        y                     = y,
        initial_train_size    = initial_train_size,
        fixed_train_size      = fixed_train_size,
        gap                   = gap,
        allow_incomplete_fold = allow_incomplete_fold,
        refit                 = refit,
        interval              = interval,
        alpha                 = alpha,
        n_jobs                = n_jobs,
        verbose               = verbose,
        show_progress         = show_progress
    )

    metrics_values, backtest_predictions = _backtesting_sarimax(
        forecaster            = forecaster,
        y                     = y,
        steps                 = steps,
        metric                = metric,
        initial_train_size    = initial_train_size,
        fixed_train_size      = fixed_train_size,
        gap                   = gap,
        allow_incomplete_fold = allow_incomplete_fold,
        exog                  = exog,
        refit                 = refit,
        alpha                 = alpha,
        interval              = interval,
        n_jobs                = n_jobs,
        verbose               = verbose,
        show_progress         = show_progress
    )

    return metrics_values, backtest_predictions

`grid_search_sarimax(forecaster, y, param_grid, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, exog=None, refit=False, return_best=True, n_jobs='auto', verbose=True, show_progress=True)`

Exhaustive search over specified parameter values for a ForecasterSarimax object. Validation is done using time series backtesting.
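To show what "exhaustive" means here, the sketch below expands a grid into every combination of its values. The source listing for this function calls `ParameterGrid(param_grid)`; the hypothetical helper `expand_grid` only mimics that expansion with the standard library, and the `order` and `trend` keys are illustrative parameter names, not a recommendation.

```python
from itertools import product


def expand_grid(param_grid):
    """Hypothetical helper: every combination of the listed values, as dicts."""
    names = sorted(param_grid)
    value_lists = [param_grid[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


# Illustrative values only; `order` and `trend` are assumed model parameters.
param_grid = {
    "order": [(1, 0, 0), (1, 1, 1)],
    "trend": [None, "c"],
}

candidates = expand_grid(param_grid)
print(len(candidates))
for params in candidates:
    print(params)
```

Each candidate is backtested once, so the cost of the search grows with the product of the list lengths.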

Parameters:

Name	Type	Description	Default
`forecaster`	`ForecasterSarimax`	Forecaster model.	required
`y`	`pandas Series`	Training time series.	required
`param_grid`	`dict`	Dictionary with parameters names (`str`) as keys and lists of parameter settings to try as values.	required
`steps`	`int`	Number of steps to predict.	required
`metric`	`str, Callable, list`	Metric used to quantify the goodness of fit of the model. If `str`: one of {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}. If `Callable`: a function with arguments `y_true`, `y_pred` that returns a float. If `list`: a list containing multiple strings and/or Callables.	required
`initial_train_size`	`int`	Number of samples in the initial train split. The backtest forecaster is trained using the first `initial_train_size` observations.	required
`fixed_train_size`	`bool`	If `True`, the training set keeps a fixed size and moves forward by `steps` in each iteration instead of growing.	`True`
`gap`	`int`	Number of samples to be excluded after the end of each training set and before the test set.	`0`
`allow_incomplete_fold`	`bool`	Whether the last fold may contain fewer samples than `steps`. If `False`, an incomplete last fold is excluded.	`True`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].	`None`
`refit`	`bool, int`	Whether to re-fit the forecaster in each iteration. If `refit` is an integer, the forecaster is re-fitted every `refit` iterations.	`False`
`return_best`	`bool`	Refit the `forecaster` on the whole data using the best parameters found.	`True`
`n_jobs`	`int, 'auto'`	The number of jobs to run in parallel. If `-1`, the number of jobs is set to the number of cores. If `'auto'`, `n_jobs` is set using the function `skforecast.utils.select_n_jobs_backtesting`. New in version 0.9.0.	`'auto'`
`verbose`	`bool`	Print number of folds used for cv or backtesting.	`True`
`show_progress`	`bool`	Whether to show a progress bar.	`True`
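The `metric` row accepts a callable with arguments `y_true` and `y_pred` that returns a float. A minimal sketch of such a callable follows; the function `root_mean_squared_error` is written here for illustration and is not one of the built-in metric names listed in the table.

```python
import math


def root_mean_squared_error(y_true, y_pred):
    """Custom metric: takes y_true and y_pred, returns a float."""
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# A list may mix built-in metric names and callables.
metric = ["mean_absolute_error", root_mean_squared_error]

print(root_mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```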

Returns:

Name	Type	Description
`results`	`pandas DataFrame`	Results for each combination of parameters. Column `params`: parameter configuration for each iteration. Column `metric`: metric value estimated for each iteration. Additional n columns with param = value.

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py

def grid_search_sarimax(
    forecaster,
    y: pd.Series,
    param_grid: dict,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    gap: int=0,
    allow_incomplete_fold: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: Optional[Union[bool, int]]=False,
    return_best: bool=True,
    n_jobs: Optional[Union[int, str]]='auto',
    verbose: bool=True,
    show_progress: bool=True
) -> pd.DataFrame:
    """
    Exhaustive search over specified parameter values for a ForecasterSarimax object.
    Validation is done using time series backtesting.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.
    y : pandas Series
        Training time series. 
    param_grid : dict
        Dictionary with parameters names (`str`) as keys and lists of parameter
        settings to try as values.
    steps : int
        Number of steps to predict.
    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

            - If `string`: {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}
            - If `Callable`: Function with arguments y_true, y_pred that returns 
            a float.
            - If `list`: List containing multiple strings and/or Callables.
    initial_train_size : int 
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.
    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.
    gap : int, default `0`
        Number of samples to be excluded after the end of each training set and 
        before the test set.
    allow_incomplete_fold : bool, default `True`
        Last fold is allowed to have a smaller number of samples than the 
        `test_size`. If `False`, the last fold is excluded.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].
    refit : bool, int, default `False`
        Whether to re-fit the forecaster in each iteration. If `refit` is an integer, 
        the Forecaster will be trained every that number of iterations.
    return_best : bool, default `True`
        Refit the `forecaster` using the best found parameters on the whole data.
    n_jobs : int, 'auto', default `'auto'`
        The number of jobs to run in parallel. If `-1`, then the number of jobs is 
        set to the number of cores. If 'auto', `n_jobs` is set using the function
        skforecast.utils.select_n_jobs_backtesting.
        **New in version 0.9.0**
    verbose : bool, default `True`
        Print number of folds used for cv or backtesting.
    show_progress: bool, default `True`
        Whether to show a progress bar.

    Returns
    -------
    results : pandas DataFrame
        Results for each combination of parameters.

            - column params: parameters configuration for each iteration.
            - column metric: metric value estimated for each iteration.
            - additional n columns with param = value.

    """

    param_grid = list(ParameterGrid(param_grid))

    results = _evaluate_grid_hyperparameters_sarimax(
        forecaster            = forecaster,
        y                     = y,
        param_grid            = param_grid,
        steps                 = steps,
        metric                = metric,
        initial_train_size    = initial_train_size,
        fixed_train_size      = fixed_train_size,
        gap                   = gap,
        allow_incomplete_fold = allow_incomplete_fold,
        exog                  = exog,
        refit                 = refit,
        return_best           = return_best,
        n_jobs                = n_jobs,
        verbose               = verbose,
        show_progress         = show_progress
    )

    return results

`random_search_sarimax(forecaster, y, param_distributions, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, exog=None, refit=False, n_iter=10, random_state=123, return_best=True, n_jobs='auto', verbose=True, show_progress=True)`

Random search over specified parameter values or distributions for a ForecasterSarimax object. Validation is done using time series backtesting.
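The idea of drawing `n_iter` reproducible settings from lists of values can be sketched with the standard library. The source listing for this function uses `ParameterSampler`; the hypothetical helper `sample_settings` below is a simplified stand-in that handles lists only, samples with replacement, and will not reproduce the same draws as the library for a given seed.

```python
import random


def sample_settings(param_distributions, n_iter=10, random_state=123):
    """Hypothetical helper: draw n_iter parameter settings from lists of values."""
    rng = random.Random(random_state)  # seeded generator for reproducible output
    names = sorted(param_distributions)
    return [
        {name: rng.choice(param_distributions[name]) for name in names}
        for _ in range(n_iter)
    ]


# Illustrative values only; `order` and `trend` are assumed model parameters.
param_distributions = {
    "order": [(1, 0, 0), (2, 1, 1), (1, 1, 1)],
    "trend": [None, "n", "c"],
}

settings = sample_settings(param_distributions, n_iter=5, random_state=123)
for params in settings:
    print(params)
```

Unlike the exhaustive grid, the number of backtests is fixed by `n_iter` regardless of how many values each list contains.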

Parameters:

Name	Type	Description	Default
`forecaster`	`ForecasterSarimax`	Forecaster model.	required
`y`	`pandas Series`	Training time series.	required
`param_distributions`	`dict`	Dictionary with parameters names (`str`) as keys and distributions or lists of parameters to try.	required
`steps`	`int`	Number of steps to predict.	required
`metric`	`str, Callable, list`	Metric used to quantify the goodness of fit of the model. If `str`: one of {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'}. If `Callable`: a function with arguments `y_true`, `y_pred` that returns a float. If `list`: a list containing multiple strings and/or Callables.	required
`initial_train_size`	`int`	Number of samples in the initial train split. The backtest forecaster is trained using the first `initial_train_size` observations.	required
`fixed_train_size`	`bool`	If `True`, the training set keeps a fixed size and moves forward by `steps` in each iteration instead of growing.	`True`
`gap`	`int`	Number of samples to be excluded after the end of each training set and before the test set.	`0`
`allow_incomplete_fold`	`bool`	Whether the last fold may contain fewer samples than `steps`. If `False`, an incomplete last fold is excluded.	`True`
`exog`	`pandas Series, pandas DataFrame`	Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and should be aligned so that y[i] is regressed on exog[i].	`None`
`refit`	`bool, int`	Whether to re-fit the forecaster in each iteration. If `refit` is an integer, the forecaster is re-fitted every `refit` iterations.	`False`
`n_iter`	`int`	Number of parameter settings that are sampled. `n_iter` trades off runtime against the quality of the solution.	`10`
`random_state`	`int`	Sets a seed to the random sampling for reproducible output.	`123`
`return_best`	`bool`	Refit the `forecaster` on the whole data using the best parameters found.	`True`
`n_jobs`	`int, 'auto'`	The number of jobs to run in parallel. If `-1`, the number of jobs is set to the number of cores. If `'auto'`, `n_jobs` is set using the function `skforecast.utils.select_n_jobs_backtesting`. New in version 0.9.0.	`'auto'`
`verbose`	`bool`	Print number of folds used for cv or backtesting.	`True`
`show_progress`	`bool`	Whether to show a progress bar.	`True`

Returns:

Name	Type	Description
`results`	`pandas DataFrame`	Results for each combination of parameters. Column `params`: parameter configuration for each iteration. Column `metric`: metric value estimated for each iteration. Additional n columns with param = value.

Source code in skforecast\model_selection_sarimax\model_selection_sarimax.py

def random_search_sarimax(
    forecaster,
    y: pd.Series,
    param_distributions: dict,
    steps: int,
    metric: Union[str, Callable, list],
    initial_train_size: int,
    fixed_train_size: bool=True,
    gap: int=0,
    allow_incomplete_fold: bool=True,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    refit: Optional[Union[bool, int]]=False,
    n_iter: int=10,
    random_state: int=123,
    return_best: bool=True,
    n_jobs: Optional[Union[int, str]]='auto',
    verbose: bool=True,
    show_progress: bool=True
) -> pd.DataFrame:
    """
    Random search over specified parameter values or distributions for a Forecaster 
    object. Validation is done using time series backtesting.

    Parameters
    ----------
    forecaster : ForecasterSarimax
        Forecaster model.
    y : pandas Series
        Training time series. 
    param_distributions : dict
        Dictionary with parameters names (`str`) as keys and 
        distributions or lists of parameters to try.
    steps : int
        Number of steps to predict.
    metric : str, Callable, list
        Metric used to quantify the goodness of fit of the model.

            - If `string`: {'mean_squared_error', 'mean_absolute_error',
             'mean_absolute_percentage_error', 'mean_squared_log_error'}
            - If `Callable`: Function with arguments y_true, y_pred that returns 
            a float.
            - If `list`: List containing multiple strings and/or Callables.
    initial_train_size : int 
        Number of samples in the initial train split. The backtest forecaster is
        trained using the first `initial_train_size` observations.
    fixed_train_size : bool, default `True`
        If True, train size doesn't increase but moves by `steps` in each iteration.
    gap : int, default `0`
        Number of samples to be excluded after the end of each training set and 
        before the test set.
    allow_incomplete_fold : bool, default `True`
        Last fold is allowed to have a smaller number of samples than the 
        `test_size`. If `False`, the last fold is excluded.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and should be aligned so that y[i] is
        regressed on exog[i].
    refit : bool, int, default `False`
        Whether to re-fit the forecaster in each iteration. If `refit` is an integer, 
        the Forecaster will be trained every that number of iterations.
    n_iter : int, default `10`
        Number of parameter settings that are sampled. 
        n_iter trades off runtime vs quality of the solution.
    random_state : int, default `123`
        Sets a seed to the random sampling for reproducible output.
    return_best : bool, default `True`
        Refit the `forecaster` using the best found parameters on the whole data.
    n_jobs : int, 'auto', default `'auto'`
        The number of jobs to run in parallel. If `-1`, then the number of jobs is 
        set to the number of cores. If 'auto', `n_jobs` is set using the function
        skforecast.utils.select_n_jobs_backtesting.
        **New in version 0.9.0**
    verbose : bool, default `True`
        Print number of folds used for cv or backtesting.
    show_progress: bool, default `True`
        Whether to show a progress bar.

    Returns
    -------
    results : pandas DataFrame
        Results for each combination of parameters.

            - column params: parameters configuration for each iteration.
            - column metric: metric value estimated for each iteration.
            - additional n columns with param = value.

    """

    param_grid = list(ParameterSampler(param_distributions, n_iter=n_iter, random_state=random_state))

    results = _evaluate_grid_hyperparameters_sarimax(
        forecaster            = forecaster,
        y                     = y,
        param_grid            = param_grid,
        steps                 = steps,
        metric                = metric,
        initial_train_size    = initial_train_size,
        fixed_train_size      = fixed_train_size,
        gap                   = gap,
        allow_incomplete_fold = allow_incomplete_fold,
        exog                  = exog,
        refit                 = refit,
        return_best           = return_best,
        n_jobs                = n_jobs,
        verbose               = verbose,
        show_progress         = show_progress
    )

    return results
