model_selection_multiseries
¶
backtesting_forecaster_multiseries(forecaster, series, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, refit=False, interval=None, n_boot=500, random_state=123, in_sample_residuals=True, verbose=False, show_progress=True)
¶
Backtesting for multi-series and multivariate forecasters.
If refit
is False, the model is trained only once using the initial_train_size
first observations. If refit
is True, the model is trained in each iteration
increasing the training set. A copy of the original forecaster is created so
it is not modified during the process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forecaster model. |
required |
series |
DataFrame |
Training time series. |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
Optional[int] |
Number of samples in the initial train split. If
|
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
Time series to be predicted. If |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration. |
False |
interval |
Optional[list] |
Confidence of the prediction interval estimated. Sequence of percentiles
to compute, which must be between 0 and 100 inclusive. If |
None |
n_boot |
int |
Number of bootstrapping iterations used to estimate prediction intervals. |
500 |
random_state |
int |
Sets a seed to the random generator, so that boot intervals are always deterministic. |
123 |
in_sample_residuals |
bool |
If |
True |
verbose |
bool |
Print number of folds and index of training and validation sets used for backtesting. |
False |
show_progress |
bool |
Whether to show a progress bar. Defaults to True. |
True |
Returns:
Type | Description |
---|---|
Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame] |
Value(s) of the metric(s). Index are the levels and columns the metrics. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def backtesting_forecaster_multiseries(
forecaster,
series: pd.DataFrame,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: Optional[int],
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
refit: bool=False,
interval: Optional[list]=None,
n_boot: int=500,
random_state: int=123,
in_sample_residuals: bool=True,
verbose: bool=False,
show_progress: bool=True
) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""
Backtesting for multi-series and multivariate forecasters.
If `refit` is False, the model is trained only once using the `initial_train_size`
first observations. If `refit` is True, the model is trained in each iteration
increasing the training set. A copy of the original forecaster is created so
it is not modified during the process.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forecaster model.
series : pandas DataFrame
Training time series.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int, default `None`
Number of samples in the initial train split. If `None` and `forecaster` is already
trained, no initial train is done and all data is used to evaluate the model. However,
the first `len(forecaster.last_window)` observations are needed to create the
initial predictors, so no predictions are calculated for them.
`None` is only allowed when `refit` is `False`.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
Time series to be predicted. If `None` all levels will be predicted.
**New in version 0.6.0**
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration.
interval : list, default `None`
Confidence of the prediction interval estimated. Sequence of percentiles
to compute, which must be between 0 and 100 inclusive. If `None`, no
intervals are estimated. Only available for forecaster of type ForecasterAutoreg
and ForecasterAutoregCustom.
n_boot : int, default `500`
Number of bootstrapping iterations used to estimate prediction
intervals.
random_state : int, default `123`
Sets a seed to the random generator, so that boot intervals are always
deterministic.
in_sample_residuals : bool, default `True`
If `True`, residuals from the training data are used as proxy of
prediction error to create prediction intervals. If `False`, out_sample_residuals
are used if they are already stored inside the forecaster.
verbose : bool, default `False`
Print number of folds and index of training and validation sets used
for backtesting.
show_progress: bool, default `True`
Whether to show a progress bar. Defaults to True.
Returns
-------
metrics_levels : pandas DataFrame
Value(s) of the metric(s). Index are the levels and columns the metrics.
backtest_predictions : pandas DataFrame
Value of predictions and their estimated interval if `interval` is not `None`.
If there is more than one level, this structure will be repeated for each of them.
column pred = predictions.
column lower_bound = lower bound of the interval.
column upper_bound = upper bound interval of the interval.
"""
if type(forecaster).__name__ not in ['ForecasterAutoregMultiSeries',
'ForecasterAutoregMultiSeriesCustom',
'ForecasterAutoregMultiVariate']:
raise TypeError(
("`forecaster` must be of type `ForecasterAutoregMultiSeries`, "
"`ForecasterAutoregMultiSeriesCustom` or `ForecasterAutoregMultiVariate`, "
"for all other types of forecasters use the functions available in "
f"the `model_selection` module. Got {type(forecaster).__name__}")
)
check_backtesting_input(
forecaster = forecaster,
steps = steps,
metric = metric,
series = series,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
refit = refit,
interval = interval,
n_boot = n_boot,
random_state = random_state,
in_sample_residuals = in_sample_residuals,
verbose = verbose,
show_progress = show_progress
)
if type(forecaster).__name__ in ['ForecasterAutoregMultiSeries',
'ForecasterAutoregMultiSeriesCustom'] \
and levels is not None and not isinstance(levels, (str, list)):
raise TypeError(
("`levels` must be a `list` of column names, a `str` of a column name "
"or `None` when using a `ForecasterAutoregMultiSeries` or "
"`ForecasterAutoregMultiSeriesCustom`. If the forecaster is of type "
"`ForecasterAutoregMultiVariate`, this argument is ignored.")
)
if type(forecaster).__name__ == 'ForecasterAutoregMultiVariate' \
and levels and levels != forecaster.level and levels != [forecaster.level]:
warnings.warn(
(f"`levels` argument have no use when the forecaster is of type "
f"`ForecasterAutoregMultiVariate`. The level of this forecaster is "
f"{forecaster.level}, to predict another level, change the `level` "
f"argument when initializing the forecaster."),
IgnoredArgumentWarning
)
if refit:
metrics_levels, backtest_predictions = _backtesting_forecaster_multiseries_refit(
forecaster = forecaster,
series = series,
steps = steps,
levels = levels,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
exog = exog,
interval = interval,
n_boot = n_boot,
random_state = random_state,
in_sample_residuals = in_sample_residuals,
verbose = verbose,
show_progress = show_progress
)
else:
metrics_levels, backtest_predictions = _backtesting_forecaster_multiseries_no_refit(
forecaster = forecaster,
series = series,
steps = steps,
levels = levels,
metric = metric,
initial_train_size = initial_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
exog = exog,
interval = interval,
n_boot = n_boot,
random_state = random_state,
in_sample_residuals = in_sample_residuals,
verbose = verbose,
show_progress = show_progress
)
return metrics_levels, backtest_predictions
grid_search_forecaster_multiseries(forecaster, series, param_grid, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)
¶
Exhaustive search over specified parameter values for a Forecaster object.
Validation is done using multi-series backtesting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forcaster model. |
required |
series |
DataFrame |
Training time series. |
required |
param_grid |
dict |
Dictionary with parameters names ( |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
int |
Number of samples in the initial train split. |
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
level ( |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
lags_grid |
Optional[list] |
Lists of |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration of backtesting. |
False |
return_best |
bool |
Refit the |
True |
verbose |
bool |
Print number of folds used for cv or backtesting. |
True |
Returns:
Type | Description |
---|---|
DataFrame |
Results for each combination of parameters. column levels = levels. column lags = predictions. column params = lower bound of the interval. column metric = metric(s) value(s) estimated for each combination of parameters. The resulting metric will be the average of the optimization of all levels. additional n columns with param = value. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def grid_search_forecaster_multiseries(
forecaster,
series: pd.DataFrame,
param_grid: dict,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: int,
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
lags_grid: Optional[list]=None,
refit: bool=False,
return_best: bool=True,
verbose: bool=True
) -> pd.DataFrame:
"""
Exhaustive search over specified parameter values for a Forecaster object.
Validation is done using multi-series backtesting.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forcaster model.
series : pandas DataFrame
Training time series.
param_grid : dict
Dictionary with parameters names (`str`) as keys and lists of parameter
settings to try as values.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int
Number of samples in the initial train split.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
level (`str`) or levels (`list`) at which the forecaster is optimized.
If `None`, all levels are taken into account. The resulting metric will be
the average of the optimization of all levels.
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
lags_grid : list of int, lists, np.narray or range, default `None`
Lists of `lags` to try. Only used if forecaster is an instance of
`ForecasterAutoregMultiSeries` or `ForecasterAutoregMultiVariate`.
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration of backtesting.
return_best : bool, default `True`
Refit the `forecaster` using the best found parameters on the whole data.
verbose : bool, default `True`
Print number of folds used for cv or backtesting.
Returns
-------
results : pandas DataFrame
Results for each combination of parameters.
column levels = levels.
column lags = predictions.
column params = lower bound of the interval.
column metric = metric(s) value(s) estimated for each combination of
parameters. The resulting metric will be the average
of the optimization of all levels.
additional n columns with param = value.
"""
param_grid = list(ParameterGrid(param_grid))
results = _evaluate_grid_hyperparameters_multiseries(
forecaster = forecaster,
series = series,
param_grid = param_grid,
steps = steps,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
levels = levels,
exog = exog,
lags_grid = lags_grid,
refit = refit,
return_best = return_best,
verbose = verbose
)
return results
random_search_forecaster_multiseries(forecaster, series, param_distributions, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, lags_grid=None, refit=False, n_iter=10, random_state=123, return_best=True, verbose=True)
¶
Random search over specified parameter values or distributions for a Forecaster
object. Validation is done using multi-series backtesting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forcaster model. |
required |
series |
DataFrame |
Training time series. |
required |
param_distributions |
dict |
Dictionary with parameters names ( |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
int |
Number of samples in the initial train split. |
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
level ( |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
lags_grid |
Optional[list] |
Lists of |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration of backtesting. |
False |
n_iter |
int |
Number of parameter settings that are sampled per lags configuration. n_iter trades off runtime vs quality of the solution. |
10 |
random_state |
int |
Sets a seed to the random sampling for reproducible output. |
123 |
return_best |
bool |
Refit the |
True |
verbose |
bool |
Print number of folds used for cv or backtesting. |
True |
Returns:
Type | Description |
---|---|
DataFrame |
Results for each combination of parameters. column levels = levels. column lags = predictions. column params = lower bound of the interval. column metric = metric(s) value(s) estimated for each combination of parameters. The resulting metric will be the average of the optimization of all levels. additional n columns with param = value. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def random_search_forecaster_multiseries(
forecaster,
series: pd.DataFrame,
param_distributions: dict,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: int,
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
lags_grid: Optional[list]=None,
refit: bool=False,
n_iter: int=10,
random_state: int=123,
return_best: bool=True,
verbose: bool=True
) -> pd.DataFrame:
"""
Random search over specified parameter values or distributions for a Forecaster
object. Validation is done using multi-series backtesting.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forcaster model.
series : pandas DataFrame
Training time series.
param_distributions : dict
Dictionary with parameters names (`str`) as keys and
distributions or lists of parameters to try.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int
Number of samples in the initial train split.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
level (`str`) or levels (`list`) at which the forecaster is optimized.
If `None`, all levels are taken into account. The resulting metric will be
the average of the optimization of all levels.
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
lags_grid : list of int, lists, np.narray or range, default `None`
Lists of `lags` to try. Only used if forecaster is an instance of
`ForecasterAutoregMultiSeries` or `ForecasterAutoregMultiVariate`.
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration of backtesting.
n_iter : int, default `10`
Number of parameter settings that are sampled per lags configuration.
n_iter trades off runtime vs quality of the solution.
random_state : int, default `123`
Sets a seed to the random sampling for reproducible output.
return_best : bool, default `True`
Refit the `forecaster` using the best found parameters on the whole data.
verbose : bool, default `True`
Print number of folds used for cv or backtesting.
Returns
-------
results : pandas DataFrame
Results for each combination of parameters.
column levels = levels.
column lags = predictions.
column params = lower bound of the interval.
column metric = metric(s) value(s) estimated for each combination of
parameters. The resulting metric will be the average
of the optimization of all levels.
additional n columns with param = value.
"""
param_grid = list(ParameterSampler(param_distributions, n_iter=n_iter,
random_state=random_state))
results = _evaluate_grid_hyperparameters_multiseries(
forecaster = forecaster,
series = series,
param_grid = param_grid,
steps = steps,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
levels = levels,
exog = exog,
lags_grid = lags_grid,
refit = refit,
return_best = return_best,
verbose = verbose
)
return results
backtesting_forecaster_multivariate(forecaster, series, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, refit=False, interval=None, n_boot=500, random_state=123, in_sample_residuals=True, verbose=False, show_progress=True)
¶
This function is an alias of backtesting_forecaster_multiseries.
Backtesting for multi-series and multivariate forecasters.
If refit
is False, the model is trained only once using the initial_train_size
first observations. If refit
is True, the model is trained in each iteration
increasing the training set. A copy of the original forecaster is created so
it is not modified during the process.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forecaster model. |
required |
series |
DataFrame |
Training time series. |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
Optional[int] |
Number of samples in the initial train split. If
|
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
Time series to be predicted. If |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration. |
False |
interval |
Optional[list] |
Confidence of the prediction interval estimated. Sequence of percentiles
to compute, which must be between 0 and 100 inclusive. If |
None |
n_boot |
int |
Number of bootstrapping iterations used to estimate prediction intervals. |
500 |
random_state |
int |
Sets a seed to the random generator, so that boot intervals are always deterministic. |
123 |
in_sample_residuals |
bool |
If |
True |
verbose |
bool |
Print number of folds and index of training and validation sets used for backtesting. |
False |
show_progress |
bool |
Whether to show a progress bar. Defaults to True. |
True |
Returns:
Type | Description |
---|---|
Tuple[pandas.core.frame.DataFrame, pandas.core.frame.DataFrame] |
Value(s) of the metric(s). Index are the levels and columns the metrics. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def backtesting_forecaster_multivariate(
forecaster,
series: pd.DataFrame,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: Optional[int],
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
refit: bool=False,
interval: Optional[list]=None,
n_boot: int=500,
random_state: int=123,
in_sample_residuals: bool=True,
verbose: bool=False,
show_progress: bool=True
) -> Tuple[pd.DataFrame, pd.DataFrame]:
"""
This function is an alias of backtesting_forecaster_multiseries.
Backtesting for multi-series and multivariate forecasters.
If `refit` is False, the model is trained only once using the `initial_train_size`
first observations. If `refit` is True, the model is trained in each iteration
increasing the training set. A copy of the original forecaster is created so
it is not modified during the process.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forecaster model.
series : pandas DataFrame
Training time series.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int, default `None`
Number of samples in the initial train split. If `None` and `forecaster` is already
trained, no initial train is done and all data is used to evaluate the model. However,
the first `len(forecaster.last_window)` observations are needed to create the
initial predictors, so no predictions are calculated for them.
`None` is only allowed when `refit` is `False`.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
Time series to be predicted. If `None` all levels will be predicted.
**New in version 0.6.0**
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration.
interval : list, default `None`
Confidence of the prediction interval estimated. Sequence of percentiles
to compute, which must be between 0 and 100 inclusive. If `None`, no
intervals are estimated. Only available for forecaster of type ForecasterAutoreg
and ForecasterAutoregCustom.
n_boot : int, default `500`
Number of bootstrapping iterations used to estimate prediction
intervals.
random_state : int, default `123`
Sets a seed to the random generator, so that boot intervals are always
deterministic.
in_sample_residuals : bool, default `True`
If `True`, residuals from the training data are used as proxy of
prediction error to create prediction intervals. If `False`, out_sample_residuals
are used if they are already stored inside the forecaster.
verbose : bool, default `False`
Print number of folds and index of training and validation sets used for backtesting.
show_progress: bool, default `True`
Whether to show a progress bar. Defaults to True.
Returns
-------
metrics_levels : pandas DataFrame
Value(s) of the metric(s). Index are the levels and columns the metrics.
backtest_predictions : pandas DataFrame
Value of predictions and their estimated interval if `interval` is not `None`.
If there is more than one level, this structure will be repeated for each of them.
column pred = predictions.
column lower_bound = lower bound of the interval.
column upper_bound = upper bound interval of the interval.
"""
metrics_levels, backtest_predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = series,
steps = steps,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
levels = levels,
exog = exog,
refit = refit,
interval = interval,
n_boot = n_boot,
random_state = random_state,
in_sample_residuals = in_sample_residuals,
verbose = verbose,
show_progress = show_progress
)
return metrics_levels, backtest_predictions
grid_search_forecaster_multivariate(forecaster, series, param_grid, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, lags_grid=None, refit=False, return_best=True, verbose=True)
¶
This function is an alias of grid_search_forecaster_multiseries.
Exhaustive search over specified parameter values for a Forecaster object. Validation is done using multi-series backtesting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forcaster model. |
required |
series |
DataFrame |
Training time series. |
required |
param_grid |
dict |
Dictionary with parameters names ( |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
int |
Number of samples in the initial train split. |
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
level ( |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
lags_grid |
Optional[list] |
Lists of |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration of backtesting. |
False |
return_best |
bool |
Refit the |
True |
verbose |
bool |
Print number of folds used for cv or backtesting. |
True |
Returns:
Type | Description |
---|---|
DataFrame |
Results for each combination of parameters. column levels = levels. column lags = predictions. column params = lower bound of the interval. column metric = metric(s) value(s) estimated for each combination of parameters. The resulting metric will be the average of the optimization of all levels. additional n columns with param = value. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def grid_search_forecaster_multivariate(
forecaster,
series: pd.DataFrame,
param_grid: dict,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: int,
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
lags_grid: Optional[list]=None,
refit: bool=False,
return_best: bool=True,
verbose: bool=True
) -> pd.DataFrame:
"""
This function is an alias of grid_search_forecaster_multiseries.
Exhaustive search over specified parameter values for a Forecaster object.
Validation is done using multi-series backtesting.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forcaster model.
series : pandas DataFrame
Training time series.
param_grid : dict
Dictionary with parameters names (`str`) as keys and lists of parameter
settings to try as values.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int
Number of samples in the initial train split.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
level (`str`) or levels (`list`) at which the forecaster is optimized.
If `None`, all levels are taken into account. The resulting metric will be
the average of the optimization of all levels.
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
lags_grid : list of int, lists, np.narray or range, default `None`
Lists of `lags` to try. Only used if forecaster is an instance of
`ForecasterAutoregMultiSeries` or `ForecasterAutoregMultiVariate`.
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration of backtesting.
return_best : bool, default `True`
Refit the `forecaster` using the best found parameters on the whole data.
verbose : bool, default `True`
Print number of folds used for cv or backtesting.
Returns
-------
results : pandas DataFrame
Results for each combination of parameters.
column levels = levels.
column lags = predictions.
column params = lower bound of the interval.
column metric = metric(s) value(s) estimated for each combination of
parameters. The resulting metric will be the average
of the optimization of all levels.
additional n columns with param = value.
"""
results = grid_search_forecaster_multiseries(
forecaster = forecaster,
series = series,
param_grid = param_grid,
steps = steps,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
levels = levels,
exog = exog,
lags_grid = lags_grid,
refit = refit,
return_best = return_best,
verbose = verbose
)
return results
random_search_forecaster_multivariate(forecaster, series, param_distributions, steps, metric, initial_train_size, fixed_train_size=True, gap=0, allow_incomplete_fold=True, levels=None, exog=None, lags_grid=None, refit=False, n_iter=10, random_state=123, return_best=True, verbose=True)
¶
This function is an alias of random_search_forecaster_multiseries.
Random search over specified parameter values or distributions for a Forecaster object. Validation is done using multi-series backtesting.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
forecaster |
ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate |
Forcaster model. |
required |
series |
DataFrame |
Training time series. |
required |
param_distributions |
dict |
Dictionary with parameters names ( |
required |
steps |
int |
Number of steps to predict. |
required |
metric |
Union[str, Callable, list] |
Metric used to quantify the goodness of fit of the model. If string: {'mean_squared_error', 'mean_absolute_error', 'mean_absolute_percentage_error', 'mean_squared_log_error'} If Callable: Function with arguments y_true, y_pred that returns a float. If list: List containing multiple strings and/or Callables. |
required |
initial_train_size |
int |
Number of samples in the initial train split. |
required |
fixed_train_size |
bool |
If True, train size doesn't increase but moves by |
True |
gap |
int |
Number of samples to be excluded after the end of each training set and before the test set. |
0 |
allow_incomplete_fold |
bool |
Last fold is allowed to have a smaller number of samples than the
|
True |
levels |
Union[str, list] |
level ( |
None |
exog |
Union[pandas.core.series.Series, pandas.core.frame.DataFrame] |
Exogenous variable/s included as predictor/s. Must have the same
number of observations as |
None |
lags_grid |
Optional[list] |
Lists of |
None |
refit |
bool |
Whether to re-fit the forecaster in each iteration of backtesting. |
False |
n_iter |
int |
Number of parameter settings that are sampled per lags configuration. n_iter trades off runtime vs quality of the solution. |
10 |
random_state |
int |
Sets a seed to the random sampling for reproducible output. |
123 |
return_best |
bool |
Refit the |
True |
verbose |
bool |
Print number of folds used for cv or backtesting. |
True |
Returns:
Type | Description |
---|---|
DataFrame |
Results for each combination of parameters. column levels = levels. column lags = predictions. column params = lower bound of the interval. column metric = metric(s) value(s) estimated for each combination of parameters. The resulting metric will be the average of the optimization of all levels. additional n columns with param = value. |
Source code in skforecast/model_selection_multiseries/model_selection_multiseries.py
def random_search_forecaster_multivariate(
forecaster,
series: pd.DataFrame,
param_distributions: dict,
steps: int,
metric: Union[str, Callable, list],
initial_train_size: int,
fixed_train_size: bool=True,
gap: int=0,
allow_incomplete_fold: bool=True,
levels: Optional[Union[str, list]]=None,
exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
lags_grid: Optional[list]=None,
refit: bool=False,
n_iter: int=10,
random_state: int=123,
return_best: bool=True,
verbose: bool=True
) -> pd.DataFrame:
"""
This function is an alias of random_search_forecaster_multiseries.
Random search over specified parameter values or distributions for a Forecaster
object. Validation is done using multi-series backtesting.
Parameters
----------
forecaster : ForecasterAutoregMultiSeries, ForecasterAutoregMultiSeriesCustom, ForecasterAutoregMultiVariate
Forcaster model.
series : pandas DataFrame
Training time series.
param_distributions : dict
Dictionary with parameters names (`str`) as keys and
distributions or lists of parameters to try.
steps : int
Number of steps to predict.
metric : str, Callable, list
Metric used to quantify the goodness of fit of the model.
If string:
{'mean_squared_error', 'mean_absolute_error',
'mean_absolute_percentage_error', 'mean_squared_log_error'}
If Callable:
Function with arguments y_true, y_pred that returns a float.
If list:
List containing multiple strings and/or Callables.
initial_train_size : int
Number of samples in the initial train split.
fixed_train_size : bool, default `True`
If True, train size doesn't increase but moves by `steps` in each iteration.
gap : int, default `0`
Number of samples to be excluded after the end of each training set and
before the test set.
allow_incomplete_fold : bool, default `True`
Last fold is allowed to have a smaller number of samples than the
`test_size`. If `False`, the last fold is excluded.
levels : str, list, default `None`
level (`str`) or levels (`list`) at which the forecaster is optimized.
If `None`, all levels are taken into account. The resulting metric will be
the average of the optimization of all levels.
exog : pandas Series, pandas DataFrame, default `None`
Exogenous variable/s included as predictor/s. Must have the same
number of observations as `y` and should be aligned so that y[i] is
regressed on exog[i].
lags_grid : list of int, lists, np.narray or range, default `None`
Lists of `lags` to try. Only used if forecaster is an instance of
`ForecasterAutoregMultiSeries` or `ForecasterAutoregMultiVariate`.
refit : bool, default `False`
Whether to re-fit the forecaster in each iteration of backtesting.
n_iter : int, default `10`
Number of parameter settings that are sampled per lags configuration.
n_iter trades off runtime vs quality of the solution.
random_state : int, default `123`
Sets a seed to the random sampling for reproducible output.
return_best : bool, default `True`
Refit the `forecaster` using the best found parameters on the whole data.
verbose : bool, default `True`
Print number of folds used for cv or backtesting.
Returns
-------
results : pandas DataFrame
Results for each combination of parameters.
column levels = levels.
column lags = predictions.
column params = lower bound of the interval.
column metric = metric(s) value(s) estimated for each combination of
parameters. The resulting metric will be the average
of the optimization of all levels.
additional n columns with param = value.
"""
results = random_search_forecaster_multiseries(
forecaster = forecaster,
series = series,
param_distributions = param_distributions,
steps = steps,
metric = metric,
initial_train_size = initial_train_size,
fixed_train_size = fixed_train_size,
gap = gap,
allow_incomplete_fold = allow_incomplete_fold,
levels = levels,
exog = exog,
lags_grid = lags_grid,
refit = refit,
n_iter = n_iter,
random_state = random_state,
return_best = return_best,
verbose = verbose
)
return results