ForecasterSarimax
This class turns an ARIMA model from the pmdarima library into a Forecaster
compatible with the skforecast API. **New in version 0.7.0**
Parameters:
Name | Type | Description | Default |
---|---|---|---|
regressor | ARIMA | An instance of an ARIMA from the pmdarima library. This model internally wraps the statsmodels SARIMAX class. | required |
transformer_y | Optional[object] | An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: `fit`, `transform`, `fit_transform` and `inverse_transform`. ColumnTransformers are not allowed since they do not have an `inverse_transform` method. The transformation is applied to `y` before training the forecaster. | None |
transformer_exog | Optional[object] | An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers. | None |
fit_kwargs | Optional[dict] | Additional arguments to be passed to the `fit` method of the regressor. **New in version 0.8.0** | None |
forecaster_id | Union[str, int] | Name used as an identifier of the forecaster. | None |
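The `transformer_y` contract above works in three steps. The transformer is fitted on `y`, the model is trained on the transformed values, and predictions are mapped back with `inverse_transform`. The following is a minimal, library-free sketch of that round trip. It assumes plain Python lists instead of pandas objects. `LogTransformer`, `NoInverse` and `check_transformer_y` are hypothetical names, not part of skforecast. `NoInverse` stands in for an object that, like `ColumnTransformer`, lacks `inverse_transform`.

```python
import math

REQUIRED_METHODS = ("fit", "transform", "fit_transform", "inverse_transform")


class LogTransformer:
    """Hypothetical transformer exposing the four methods transformer_y needs."""

    def fit(self, values):
        # A log transform has nothing to learn, but `fit` must still exist.
        return self

    def transform(self, values):
        return [math.log(v) for v in values]

    def fit_transform(self, values):
        return self.fit(values).transform(values)

    def inverse_transform(self, values):
        return [math.exp(v) for v in values]


class NoInverse:
    """Stand-in for an object lacking inverse_transform, as ColumnTransformer does."""

    def fit(self, values):
        return self

    def transform(self, values):
        return list(values)

    def fit_transform(self, values):
        return list(values)


def check_transformer_y(transformer):
    """Hypothetical helper: reject transformers missing any required method."""
    missing = [m for m in REQUIRED_METHODS if not callable(getattr(transformer, m, None))]
    if missing:
        raise TypeError(f"`transformer_y` is missing method(s): {missing}")
    return transformer


y = [10.0, 12.0, 15.0]
transformer = check_transformer_y(LogTransformer())
y_train = transformer.fit_transform(y)        # the model would be trained on log(y)
fake_predictions = [y_train[-1]]              # pretend the model forecasts the last value
print(transformer.inverse_transform(fake_predictions))  # back on the original scale
```

Without `inverse_transform`, predictions made on the transformed scale could not be returned in the units of `y`. That is why such transformers are rejected for `transformer_y` but tolerated for `transformer_exog`.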
Attributes:
Name | Type | Description |
---|---|---|
regressor | pmdarima.arima.ARIMA | An instance of an ARIMA from the pmdarima library. The model internally wraps the statsmodels SARIMAX class. |
params | dict | Parameters of the SARIMAX model. |
transformer_y | object transformer (preprocessor), default `None` | An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API with methods: `fit`, `transform`, `fit_transform` and `inverse_transform`. ColumnTransformers are not allowed since they do not have an `inverse_transform` method. The transformation is applied to `y` before training the forecaster. |
transformer_exog | object transformer (preprocessor), default `None` | An instance of a transformer (preprocessor) compatible with the scikit-learn preprocessing API. The transformation is applied to `exog` before training the forecaster. `inverse_transform` is not available when using ColumnTransformers. |
window_size | int, `1` | Not used, present here for API consistency by convention. |
last_window | pandas Series | Last window the forecaster has seen during training. It stores the values needed to predict the next `step` immediately after the training data. |
extended_index | pandas Index | When predicting using `last_window` and `last_window_exog`, the internal statsmodels SARIMAX is updated using its append method. To do this, `last_window` data must start at the end of the index seen by the forecaster, which is stored in `forecaster.extended_index`. Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html to know more about the statsmodels append method. |
fitted | bool | Tag to identify if the regressor has been fitted (trained). |
index_type | type | Type of index of the input used in training. |
index_freq | str | Frequency of the index of the input used in training (the index `step` when it is not a DatetimeIndex). |
training_range | pandas Index | First and last values of the index of the data used during training. |
included_exog | bool | Whether the forecaster has been trained using exogenous variable/s. |
exog_type | type | Type of exogenous variable/s used in training. |
exog_col_names | list | Names of columns of `exog` if `exog` used in training was a pandas DataFrame. |
fit_kwargs | dict | Additional arguments to be passed to the `fit` method of the regressor. **New in version 0.8.0** |
creation_date | str | Date of creation. |
fit_date | str | Date of last fit. |
skforcast_version | str | Version of the skforecast library used to create the forecaster. |
python_version | str | Version of Python used to create the forecaster. |
forecaster_id | str, int, default `None` | Name used as an identifier of the forecaster. |
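The `fit_kwargs` attribute is produced by `check_select_fit_kwargs(regressor=..., fit_kwargs=...)`, whose body is not shown on this page. The following is a sketch of one plausible behaviour. It assumes that keys the regressor's `fit` signature does not accept are dropped with a warning, and it checks the signature with the standard `inspect` module. `ToyARIMA` and `select_fit_kwargs` are hypothetical names, not the skforecast implementation.

```python
import inspect
import warnings


class ToyARIMA:
    """Hypothetical regressor whose fit accepts y, X and one extra keyword."""

    def fit(self, y, X=None, maxiter=50):
        return self


def select_fit_kwargs(regressor, fit_kwargs=None):
    """Sketch: keep only the keys accepted by regressor.fit."""
    if fit_kwargs is None:
        return {}
    if not isinstance(fit_kwargs, dict):
        raise TypeError(f"`fit_kwargs` must be a dict. Got {type(fit_kwargs)}.")
    # For a bound method, `self` is not listed among the parameters.
    accepted = inspect.signature(regressor.fit).parameters
    unknown = [k for k in fit_kwargs if k not in accepted]
    if unknown:
        warnings.warn(f"Argument/s {unknown} ignored: not used by the regressor's `fit` method.")
    return {k: v for k, v in fit_kwargs.items() if k in accepted}


kwargs = select_fit_kwargs(ToyARIMA(), {"maxiter": 200, "disp": 0})
print(kwargs)  # {'maxiter': 200}
```

Filtering at construction time means an invalid key surfaces once, as a warning, instead of failing later inside `fit`.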
Source code in skforecast/ForecasterSarimax/ForecasterSarimax.py
```python
class ForecasterSarimax():
    """
    This class turns an ARIMA model from the pmdarima library into a Forecaster
    compatible with the skforecast API.
    **New in version 0.7.0**

    Parameters
    ----------
    regressor : pmdarima.arima.ARIMA
        An instance of an ARIMA from pmdarima library. This model internally wraps the
        statsmodels SARIMAX class.
    transformer_y : object transformer (preprocessor), default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        ColumnTransformers are not allowed since they do not have inverse_transform method.
        The transformation is applied to `y` before training the forecaster.
    transformer_exog : object transformer (preprocessor), default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.
    fit_kwargs : dict, default `None`
        Additional arguments to be passed to the `fit` method of the regressor.
        **New in version 0.8.0**
    forecaster_id : str, int, default `None`
        Name used as an identifier of the forecaster.

    Attributes
    ----------
    regressor : pmdarima.arima.ARIMA
        An instance of an ARIMA from pmdarima library. The model internally wraps the
        statsmodels SARIMAX class.
    params : dict
        Parameters of the sarimax model.
    transformer_y : object transformer (preprocessor), default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API with methods: fit, transform, fit_transform and inverse_transform.
        ColumnTransformers are not allowed since they do not have inverse_transform method.
        The transformation is applied to `y` before training the forecaster.
    transformer_exog : object transformer (preprocessor), default `None`
        An instance of a transformer (preprocessor) compatible with the scikit-learn
        preprocessing API. The transformation is applied to `exog` before training the
        forecaster. `inverse_transform` is not available when using ColumnTransformers.
    window_size : int, `1`
        Not used, present here for API consistency by convention.
    last_window : pandas Series
        Last window the forecaster has seen during training. It stores the
        values needed to predict the next `step` immediately after the training data.
    extended_index : pandas Index
        When predicting using `last_window` and `last_window_exog`, the internal
        statsmodels SARIMAX will be updated using its append method. To do this,
        `last_window` data must start at the end of the index seen by the
        forecaster, which is stored in forecaster.extended_index.
        Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html
        to know more about statsmodels append method.
    fitted : bool
        Tag to identify if the regressor has been fitted (trained).
    index_type : type
        Type of index of the input used in training.
    index_freq : str
        Frequency of the index of the input used in training.
    training_range : pandas Index
        First and last values of the index of the data used during training.
    included_exog : bool
        If the forecaster has been trained using exogenous variable/s.
    exog_type : type
        Type of exogenous variable/s used in training.
    exog_col_names : list
        Names of columns of `exog` if `exog` used in training was a pandas
        DataFrame.
    fit_kwargs : dict
        Additional arguments to be passed to the `fit` method of the regressor.
        **New in version 0.8.0**
    creation_date : str
        Date of creation.
    fit_date : str
        Date of last fit.
    skforcast_version : str
        Version of skforecast library used to create the forecaster.
    python_version : str
        Version of python used to create the forecaster.
    forecaster_id : str, int, default `None`
        Name used as an identifier of the forecaster.

    """

    def __init__(
        self,
        regressor: ARIMA,
        transformer_y: Optional[object]=None,
        transformer_exog: Optional[object]=None,
        fit_kwargs: Optional[dict]=None,
        forecaster_id: Optional[Union[str, int]]=None
    ) -> None:

        self.regressor         = regressor
        self.transformer_y     = transformer_y
        self.transformer_exog  = transformer_exog
        self.window_size       = 1
        self.last_window       = None
        self.extended_index    = None
        self.fitted            = False
        self.index_type        = None
        self.index_freq        = None
        self.training_range    = None
        self.included_exog     = False
        self.exog_type         = None
        self.exog_col_names    = None
        self.creation_date     = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.fit_date          = None
        self.skforcast_version = skforecast.__version__
        self.python_version    = sys.version.split(" ")[0]
        self.forecaster_id     = forecaster_id

        if not isinstance(self.regressor, pmdarima.arima.ARIMA):
            raise TypeError(
                (f"`regressor` must be an instance of type pmdarima.arima.ARIMA. "
                 f"Got {type(regressor)}.")
            )

        self.params = self.regressor.get_params(deep=True)

        self.fit_kwargs = check_select_fit_kwargs(
                              regressor  = regressor,
                              fit_kwargs = fit_kwargs
                          )


    def __repr__(
        self
    ) -> str:
        """
        Information displayed when a ForecasterSarimax object is printed.
        """

        info = (
            f"{'=' * len(type(self).__name__)} \n"
            f"{type(self).__name__} \n"
            f"{'=' * len(type(self).__name__)} \n"
            f"Regressor: {self.regressor} \n"
            f"Regressor parameters: {self.params} \n"
            f"fit_kwargs: {self.fit_kwargs} \n"
            f"Window size: {self.window_size} \n"
            f"Transformer for y: {self.transformer_y} \n"
            f"Transformer for exog: {self.transformer_exog} \n"
            f"Exogenous included: {self.included_exog} \n"
            f"Type of exogenous variable: {self.exog_type} \n"
            f"Exogenous variables names: {self.exog_col_names} \n"
            f"Training range: {self.training_range.to_list() if self.fitted else None} \n"
            f"Training index type: {str(self.index_type).split('.')[-1][:-2] if self.fitted else None} \n"
            f"Training index frequency: {self.index_freq if self.fitted else None} \n"
            f"Creation date: {self.creation_date} \n"
            f"Last fit date: {self.fit_date} \n"
            f"Index seen by the forecaster: {self.extended_index} \n"
            f"Skforecast version: {self.skforcast_version} \n"
            f"Python version: {self.python_version} \n"
            f"Forecaster id: {self.forecaster_id} \n"
        )

        return info


    def fit(
        self,
        y: pd.Series,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> None:
        """
        Training Forecaster.

        Parameters
        ----------
        y : pandas Series
            Training time series.
        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s. Must have the same
            number of observations as `y` and their indexes must be aligned so
            that y[i] is regressed on exog[i].

        Returns
        -------
        None

        """

        check_y(y=y)
        if exog is not None:
            if len(exog) != len(y):
                raise ValueError(
                    (f"`exog` must have same number of samples as `y`. "
                     f"length `exog`: ({len(exog)}), length `y`: ({len(y)})")
                )
            check_exog(exog=exog)

        # Reset values in case the forecaster has already been fitted.
        self.index_type          = None
        self.index_freq          = None
        self.last_window         = None
        self.extended_index      = None
        self.included_exog       = False
        self.exog_type           = None
        self.exog_col_names      = None
        self.X_train_col_names   = None
        self.in_sample_residuals = None
        self.fitted              = False
        self.training_range      = None

        if exog is not None:
            self.included_exog = True
            self.exog_type = type(exog)
            self.exog_col_names = \
                exog.columns.to_list() if isinstance(exog, pd.DataFrame) else exog.name

        y = transform_series(
                series            = y,
                transformer       = self.transformer_y,
                fit               = True,
                inverse_transform = False
            )

        if exog is not None:
            if isinstance(exog, pd.Series):
                # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                exog = exog.to_frame()

            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = True,
                       inverse_transform = False
                   )

        self.regressor.fit(y=y, X=exog, **self.fit_kwargs)
        self.fitted = True
        self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
        self.training_range = y.index[[0, -1]]
        self.index_type = type(y.index)
        if isinstance(y.index, pd.DatetimeIndex):
            self.index_freq = y.index.freqstr
        else:
            self.index_freq = y.index.step
        self.last_window = y.copy()
        self.extended_index = self.regressor.arima_res_.fittedvalues.index.copy()
        self.params = self.regressor.get_params(deep=True)


    def predict(
        self,
        steps: int,
        last_window: Optional[pd.Series]=None,
        last_window_exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None
    ) -> pd.Series:
        """
        Forecast future values

        Generate predictions (forecasts) n steps in the future. Note that if
        exogenous variables were used in the model fit, they will be expected
        for the predict procedure and will fail otherwise.

        When predicting using `last_window` and `last_window_exog`, the internal
        statsmodels SARIMAX will be updated using its append method. To do this,
        `last_window` data must start at the end of the index seen by the
        forecaster, which is stored in forecaster.extended_index.

        Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html
        to know more about statsmodels append method.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.
        last_window : pandas Series, default `None`
            Series values used to create the predictors needed in the
            predictions. Used to make predictions unrelated to the original data.
            Values have to start at the end of the training data.
        last_window_exog : pandas Series, pandas DataFrame, default `None`
            Values of the exogenous variables aligned with `last_window`. Only
            needed when `last_window` is not None and the forecaster has been
            trained including exogenous variables. Used to make predictions
            unrelated to the original data. Values have to start at the end
            of the training data.
        exog : pandas Series, pandas DataFrame, default `None`
            Value of the exogenous variable/s for the next steps.

        Returns
        -------
        predictions : pandas Series
            Predicted values.

        """

        # Needs to be a new variable to avoid arima_res_.append if not needed
        last_window_check = last_window.copy() if last_window is not None else self.last_window.copy()

        check_predict_input(
            forecaster_name  = type(self).__name__,
            steps            = steps,
            fitted           = self.fitted,
            included_exog    = self.included_exog,
            index_type       = self.index_type,
            index_freq       = self.index_freq,
            window_size      = self.window_size,
            last_window      = last_window_check,
            last_window_exog = last_window_exog,
            exog             = exog,
            exog_type        = self.exog_type,
            exog_col_names   = self.exog_col_names,
            interval         = None,
            alpha            = None,
            max_steps        = None,
            levels           = None,
            series_col_names = None
        )

        # If last_window_exog is provided but no last_window
        if last_window is None and last_window_exog is not None:
            raise ValueError(
                ("To make predictions unrelated to the original data, both "
                 "`last_window` and `last_window_exog` must be provided.")
            )

        # Check if forecaster needs exog
        if last_window is not None and last_window_exog is None and self.included_exog:
            raise ValueError(
                ("Forecaster trained with exogenous variable/s. To make predictions "
                 "unrelated to the original data, same variable/s must be provided "
                 "using `last_window_exog`.")
            )

        if last_window is not None:
            # If predictions do not follow directly from the end of the training
            # data. The internal statsmodels SARIMAX model needs to be updated
            # using its append method. The data needs to start at the end of the
            # training series.

            # check index append values
            expected_index = expand_index(index=self.extended_index, steps=1)[0]
            if expected_index != last_window.index[0]:
                raise ValueError(
                    (f"To make predictions unrelated to the original data, `last_window` "
                     f"has to start at the end of the index seen by the forecaster.\n"
                     f"    Series last index         : {self.extended_index[-1]}.\n"
                     f"    Expected index            : {expected_index}.\n"
                     f"    `last_window` index start : {last_window.index[0]}.")
                )

            last_window = transform_series(
                              series            = last_window.copy(),
                              transformer       = self.transformer_y,
                              fit               = False,
                              inverse_transform = False
                          )

            # TODO -----------------------------------------------------------------------------------------------------
            # This is done because pmdarima deletes the series name
            # Check issue: https://github.com/alkaline-ml/pmdarima/issues/535
            last_window.name = None
            # ----------------------------------------------------------------------------------------------------------

            # last_window_exog
            if last_window_exog is not None:
                # check index last_window_exog
                if expected_index != last_window_exog.index[0]:
                    raise ValueError(
                        (f"To make predictions unrelated to the original data, `last_window_exog` "
                         f"has to start at the end of the index seen by the forecaster.\n"
                         f"    Series last index              : {self.extended_index[-1]}.\n"
                         f"    Expected index                 : {expected_index}.\n"
                         f"    `last_window_exog` index start : {last_window_exog.index[0]}.")
                    )

                if isinstance(last_window_exog, pd.Series):
                    # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                    last_window_exog = last_window_exog.to_frame()

                last_window_exog = transform_dataframe(
                                       df                = last_window_exog,
                                       transformer       = self.transformer_exog,
                                       fit               = False,
                                       inverse_transform = False
                                   )

            self.regressor.arima_res_ = self.regressor.arima_res_.append(
                                            endog = last_window,
                                            exog  = last_window_exog,
                                            refit = False
                                        )
            self.extended_index = self.regressor.arima_res_.fittedvalues.index

        # Exog
        if exog is not None:
            if isinstance(exog, pd.Series):
                # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                exog = exog.to_frame()

            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
            exog = exog.iloc[:steps, ]

        # Get following n steps predictions
        predictions = self.regressor.predict(
                          n_periods = steps,
                          X         = exog
                      )

        # Reverse the transformation if needed
        predictions = transform_series(
                          series            = predictions,
                          transformer       = self.transformer_y,
                          fit               = False,
                          inverse_transform = True
                      )
        predictions.name = 'pred'

        return predictions


    def predict_interval(
        self,
        steps: int,
        last_window: Optional[pd.Series]=None,
        last_window_exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
        alpha: float=0.05,
        interval: list=None,
    ) -> pd.DataFrame:
        """
        Forecast future values and their confidence intervals

        Generate predictions (forecasts) n steps in the future with confidence
        intervals. Note that if exogenous variables were used in the model fit,
        they will be expected for the predict procedure and will fail otherwise.

        When predicting using `last_window` and `last_window_exog`, the internal
        statsmodels SARIMAX will be updated using its append method. To do this,
        `last_window` data must start at the end of the index seen by the
        forecaster, which is stored in forecaster.extended_index.

        Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html
        to know more about statsmodels append method.

        Parameters
        ----------
        steps : int
            Number of future steps predicted.
        last_window : pandas Series, default `None`
            Series values used to create the predictors needed in the
            predictions. Used to make predictions unrelated to the original data.
            Values have to start at the end of the training data.
        last_window_exog : pandas Series, pandas DataFrame, default `None`
            Values of the exogenous variables aligned with `last_window`. Only
            needed when `last_window` is not None and the forecaster has been
            trained including exogenous variables.
        exog : pandas Series, pandas DataFrame, default `None`
            Exogenous variable/s included as predictor/s.
        alpha : float, default `0.05`
            The confidence intervals for the forecasts are (1 - alpha) %.
            If both `alpha` and `interval` are provided, `alpha` will be used.
        interval : list, default `None`
            Confidence of the prediction interval estimated. The values must be
            symmetric. Sequence of percentiles to compute, which must be between
            0 and 100 inclusive. For example, interval of 95% should be as
            `interval = [2.5, 97.5]`. If both `alpha` and `interval` are
            provided, `alpha` will be used.

        Returns
        -------
        predictions : pandas DataFrame
            Values predicted by the forecaster and their estimated interval:

            - pred: predictions.
            - lower_bound: lower bound of the interval.
            - upper_bound: upper bound of the interval.

        """

        # Needs to be a new variable to avoid arima_res_.append if not needed
        last_window_check = last_window.copy() if last_window is not None else self.last_window.copy()

        check_predict_input(
            forecaster_name  = type(self).__name__,
            steps            = steps,
            fitted           = self.fitted,
            included_exog    = self.included_exog,
            index_type       = self.index_type,
            index_freq       = self.index_freq,
            window_size      = self.window_size,
            last_window      = last_window_check,
            last_window_exog = last_window_exog,
            exog             = exog,
            exog_type        = self.exog_type,
            exog_col_names   = self.exog_col_names,
            interval         = interval,
            alpha            = alpha,
            max_steps        = None,
            levels           = None,
            series_col_names = None
        )

        # If last_window_exog is provided but no last_window
        if last_window is None and last_window_exog is not None:
            raise ValueError(
                ("To make predictions unrelated to the original data, both "
                 "`last_window` and `last_window_exog` must be provided.")
            )

        # Check if forecaster needs exog
        if last_window is not None and last_window_exog is None and self.included_exog:
            raise ValueError(
                ("Forecaster trained with exogenous variable/s. To make predictions "
                 "unrelated to the original data, same variable/s must be provided "
                 "using `last_window_exog`.")
            )

        # If interval and alpha take alpha, if interval transform to alpha
        if alpha is None:
            if 100 - interval[1] != interval[0]:
                raise ValueError(
                    (f"When using `interval` in ForecasterSarimax, it must be symmetrical. "
                     f"For example, interval of 95% should be as `interval = [2.5, 97.5]`. "
                     f"Got {interval}.")
                )
            alpha = 2*(100 - interval[1])/100

        if last_window is not None:
            # If predictions do not follow directly from the end of the training
            # data. The internal statsmodels SARIMAX model needs to be updated
            # using its append method. The data needs to start at the end of the
            # training series.

            # check index append values
            expected_index = expand_index(index=self.extended_index, steps=1)[0]
            if expected_index != last_window.index[0]:
                raise ValueError(
                    (f"To make predictions unrelated to the original data, `last_window` "
                     f"has to start at the end of the index seen by the forecaster.\n"
                     f"    Series last index         : {self.extended_index[-1]}.\n"
                     f"    Expected index            : {expected_index}.\n"
                     f"    `last_window` index start : {last_window.index[0]}.")
                )

            last_window = transform_series(
                              series            = last_window,
                              transformer       = self.transformer_y,
                              fit               = False,
                              inverse_transform = False
                          )

            # TODO -----------------------------------------------------------------------------------------------------
            # This is done because pmdarima deletes the series name
            # Check issue: https://github.com/alkaline-ml/pmdarima/issues/535
            last_window.name = None
            # ----------------------------------------------------------------------------------------------------------

            # Transform last_window_exog
            if last_window_exog is not None:
                # check index last_window_exog
                if expected_index != last_window_exog.index[0]:
                    raise ValueError(
                        (f"To make predictions unrelated to the original data, `last_window_exog` "
                         f"has to start at the end of the index seen by the forecaster.\n"
                         f"    Series last index              : {self.extended_index[-1]}.\n"
                         f"    Expected index                 : {expected_index}.\n"
                         f"    `last_window_exog` index start : {last_window_exog.index[0]}.")
                    )

                if isinstance(last_window_exog, pd.Series):
                    # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                    last_window_exog = last_window_exog.to_frame()

                last_window_exog = transform_dataframe(
                                       df                = last_window_exog,
                                       transformer       = self.transformer_exog,
                                       fit               = False,
                                       inverse_transform = False
                                   )

            self.regressor.arima_res_ = self.regressor.arima_res_.append(
                                            endog = last_window,
                                            exog  = last_window_exog,
                                            refit = False
                                        )
            self.extended_index = self.regressor.arima_res_.fittedvalues.index

        # Exog
        if exog is not None:
            if isinstance(exog, pd.Series):
                # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                exog = exog.to_frame()

            exog = transform_dataframe(
                       df                = exog,
                       transformer       = self.transformer_exog,
                       fit               = False,
                       inverse_transform = False
                   )
            exog = exog.iloc[:steps, ]

        # Get following n steps predictions with intervals
        predicted_mean, conf_int = self.regressor.predict(
                                       n_periods       = steps,
                                       X               = exog,
                                       alpha           = alpha,
                                       return_conf_int = True
                                   )

        predictions = predicted_mean.to_frame(name="pred")
        predictions['lower_bound'] = conf_int[:, 0]
        predictions['upper_bound'] = conf_int[:, 1]

        # Reverse the transformation if needed
        if self.transformer_y:
            for col in predictions.columns:
                predictions[col] = transform_series(
                                       series            = predictions[col],
                                       transformer       = self.transformer_y,
                                       fit               = False,
                                       inverse_transform = True
                                   )

        return predictions


    def set_params(
        self,
        params: dict
    ) -> None:
        """
        Set new values to the parameters of the model stored in the forecaster.

        Parameters
        ----------
        params : dict
            Parameters values.

        Returns
        -------
        None

        """

        self.regressor = clone(self.regressor)
        self.regressor.set_params(**params)
        self.params = self.regressor.get_params(deep=True)


    def set_fit_kwargs(
        self,
        fit_kwargs: dict
    ) -> None:
        """
        Set new values for the additional keyword arguments passed to the `fit`
        method of the regressor.

        Parameters
        ----------
        fit_kwargs : dict
            Dict of the form {"argument": new_value}.

        Returns
        -------
        None

        """

        self.fit_kwargs = check_select_fit_kwargs(self.regressor, fit_kwargs=fit_kwargs)


    def get_feature_importances(
        self
    ) -> pd.DataFrame:
        """
        Return feature importances of the regressor stored in the
        forecaster.

        Parameters
        ----------
        self

        Returns
        -------
        feature_importances : pandas DataFrame
            Feature importances associated with each predictor.

        """

        if not self.fitted:
            raise NotFittedError(
                ("This forecaster is not fitted yet. Call `fit` with appropriate "
                 "arguments before using `get_feature_importances()`.")
            )

        feature_importances = self.regressor.params().to_frame().reset_index()
        feature_importances.columns = ['feature', 'importance']

        return feature_importances


    def get_feature_importance(
        self
    ) -> pd.DataFrame:
        """
        This method has been replaced by `get_feature_importances()`.

        Return feature importance of the regressor stored in the
        forecaster.

        Parameters
        ----------
        self

        Returns
        -------
        feature_importances : pandas DataFrame
            Feature importances associated with each predictor.

        """

        warnings.warn(
            ("get_feature_importance() method has been renamed to get_feature_importances(). "
             "This method will be removed in skforecast 0.9.0.")
        )

        return self.get_feature_importances()
```
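In `predict_interval` above, a percentile `interval` is converted into the `alpha` expected by pmdarima's `predict(..., return_conf_int=True)` with `alpha = 2*(100 - interval[1])/100`, and only symmetric intervals are accepted. The following is a standalone sketch of that conversion. `interval_to_alpha` is a hypothetical name. Unlike the exact `!=` comparison in the source, the sketch tests symmetry with `math.isclose`, because in floating point `100 - 99.9` is not exactly `0.1`.

```python
import math


def interval_to_alpha(interval):
    """Convert a symmetric percentile interval, e.g. [2.5, 97.5], into alpha."""
    lower, upper = interval
    # Tolerant comparison so [0.1, 99.9] is accepted despite float rounding.
    if not math.isclose(100 - upper, lower):
        raise ValueError(
            "`interval` must be symmetrical, e.g. `interval = [2.5, 97.5]`. "
            f"Got {interval}."
        )
    return 2 * (100 - upper) / 100


print(interval_to_alpha([2.5, 97.5]))  # 0.05
print(interval_to_alpha([10, 90]))     # 0.2
```

A 95% interval therefore maps to `alpha = 0.05`, and the two bounds are the columns `lower_bound` and `upper_bound` of the returned DataFrame.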
fit(self, y, exog=None)
Training Forecaster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
y | Series | Training time series. | required |
exog | Union[pandas.core.series.Series, pandas.core.frame.DataFrame] | Exogenous variable/s included as predictor/s. Must have the same number of observations as `y` and their indexes must be aligned so that `y[i]` is regressed on `exog[i]`. | None |
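`fit` raises a `ValueError` when `exog` and `y` differ in length. The alignment requirement (`y[i]` paired with `exog[i]`) is stated in the docstring, but the length comparison is the only cross-check against `y` visible in `fit`. The following is a stdlib sketch of both conditions, using plain lists of index labels in place of pandas indexes. `check_exog_alignment` is a hypothetical helper, and its label comparison is an extra assumption beyond what `fit` does.

```python
def check_exog_alignment(y_index, exog_index):
    """Hypothetical helper: exog must match y in length and in index labels."""
    if len(exog_index) != len(y_index):
        # Same message as the length check performed by `fit`.
        raise ValueError(
            f"`exog` must have same number of samples as `y`. "
            f"length `exog`: ({len(exog_index)}), length `y`: ({len(y_index)})"
        )
    # Extra check (assumption): position i must carry the same label in both,
    # so that y[i] is regressed on exog[i].
    mismatched = [i for i, (a, b) in enumerate(zip(y_index, exog_index)) if a != b]
    if mismatched:
        raise ValueError(f"`exog` index differs from `y` index at positions {mismatched}.")


y_index = ["2023-01", "2023-02", "2023-03"]
check_exog_alignment(y_index, ["2023-01", "2023-02", "2023-03"])  # passes silently
```

Checking labels as well as lengths catches an `exog` that has the right number of rows but is shifted by one period, which a length check alone would accept.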
get_feature_importance(self)
This method has been replaced by `get_feature_importances()`.
Return feature importance of the regressor stored in the forecaster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self | None | | required |
Returns:
Type | Description |
---|---|
DataFrame | Feature importances associated with each predictor. |
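`get_feature_importance()` is kept as a thin alias that warns and delegates to the renamed method. The following is a minimal sketch of that rename pattern on a hypothetical class, with a hypothetical return value. It passes `FutureWarning` explicitly, whereas the source calls `warnings.warn` with the default `UserWarning`. This is a design choice so that end users see the notice by default.

```python
import warnings


class Forecaster:
    """Hypothetical class showing the rename pattern used by get_feature_importance()."""

    def get_feature_importances(self):
        return [("ar.L1", 0.5)]  # hypothetical result

    def get_feature_importance(self):
        # Old name kept as a wrapper: warn, then delegate to the new method.
        warnings.warn(
            "get_feature_importance() method has been renamed to "
            "get_feature_importances(). This method will be removed in "
            "skforecast 0.9.0.",
            FutureWarning,
            stacklevel=2,
        )
        return self.get_feature_importances()


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = Forecaster().get_feature_importance()

print(result)              # [('ar.L1', 0.5)]
print(caught[0].category)  # <class 'FutureWarning'>
```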
get_feature_importances(self)
Return feature importances of the regressor stored in the forecaster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
self | None | | required |
Returns:
Type | Description |
---|---|
DataFrame | Feature importances associated with each predictor. |
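`get_feature_importances()` raises `NotFittedError` before fitting. Afterwards it reshapes `self.regressor.params()` into two columns, `feature` and `importance`. The following is a stdlib sketch of that reshaping. It uses a plain dict in place of the pandas Series returned by `params()` and local stand-ins for the exception and helper, which are hypothetical names. The coefficient values are invented for illustration. For a SARIMAX model these "importances" are the fitted parameters themselves (intercept, AR/MA coefficients, `sigma2`), so they are not scale-free importance scores.

```python
class NotFittedError(ValueError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


def feature_importances_from_params(params, fitted):
    """Hypothetical helper: turn {name: coefficient} into feature/importance rows."""
    if not fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `get_feature_importances()`."
        )
    # One row per model parameter, in the order the model reports them.
    return [{"feature": name, "importance": value} for name, value in params.items()]


# Invented coefficients of an ARIMA(1, 0, 1) model with intercept.
params = {"intercept": 0.12, "ar.L1": 0.64, "ma.L1": -0.31, "sigma2": 1.05}
rows = feature_importances_from_params(params, fitted=True)
print(rows[1])  # {'feature': 'ar.L1', 'importance': 0.64}
```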
predict(self, steps, last_window=None, last_window_exog=None, exog=None)
Forecast future values
Generate predictions (forecasts) n steps in the future. Note that if exogenous variables were used in the model fit, they will be expected for the predict procedure and will fail otherwise.
When predicting using `last_window` and `last_window_exog`, the internal statsmodels SARIMAX is updated using its append method. To do this, `last_window` data must start at the end of the index seen by the forecaster, which is stored in `forecaster.extended_index`.
Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html to know more about the statsmodels append method.
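The requirement above means `last_window` must begin exactly one step after the last label in `extended_index`. The source computes that label with `expand_index(index=self.extended_index, steps=1)[0]`. The following is a stdlib sketch of the same check. It assumes a fixed step that can be added to the last label: a `timedelta` for a regular datetime index, or an integer for a RangeIndex-like index. Calendar frequencies such as month starts cannot be expressed this way. `check_last_window_start` is a hypothetical name.

```python
from datetime import datetime, timedelta


def check_last_window_start(seen_last, step, window_start):
    """Hypothetical helper: last_window must begin one step after seen_last."""
    expected = seen_last + step  # datetime + timedelta, or int + int
    if window_start != expected:
        raise ValueError(
            "To make predictions unrelated to the original data, `last_window` "
            "has to start at the end of the index seen by the forecaster.\n"
            f"    Series last index         : {seen_last}.\n"
            f"    Expected index            : {expected}.\n"
            f"    `last_window` index start : {window_start}."
        )
    return expected


# Daily data whose extended_index ends on 2023-01-31.
start = check_last_window_start(datetime(2023, 1, 31), timedelta(days=1), datetime(2023, 2, 1))
print(start)  # 2023-02-01 00:00:00

# RangeIndex-like integer labels: the last seen label is 99, step 1.
print(check_last_window_start(99, 1, 100))  # 100
```

Because a successful append extends `extended_index`, the next call with a new `last_window` must start after the newly appended data, not after the original training data.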
Parameters:
Name | Type | Description | Default |
---|---|---|---|
steps | int | Number of future steps predicted. | required |
last_window | Optional[pandas.core.series.Series] | Series values used to create the predictors needed in the predictions. Used to make predictions unrelated to the original data. Values have to start at the end of the training data. | None |
last_window_exog | Union[pandas.core.series.Series, pandas.core.frame.DataFrame] | Values of the exogenous variables aligned with `last_window`. Only needed when `last_window` is not None and the forecaster has been trained including exogenous variables. Used to make predictions unrelated to the original data. Values have to start at the end of the training data. | None |
exog | Union[pandas.core.series.Series, pandas.core.frame.DataFrame] | Value of the exogenous variable/s for the next steps. | None |
Returns:
Type | Description |
---|---|
Series | Predicted values. |
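`predict` validates the combination of `last_window` and `last_window_exog` before touching the model. Providing `last_window_exog` alone is an error. So is providing `last_window` without `last_window_exog` when the forecaster was trained with exogenous variables. The following is a standalone sketch of those two rules. `validate_window_args` is a hypothetical name, and its boolean return value (whether the internal results would be extended with `append`) is an illustrative addition.

```python
def validate_window_args(last_window, last_window_exog, included_exog):
    """Hypothetical helper mirroring the argument checks run by predict()."""
    if last_window is None and last_window_exog is not None:
        raise ValueError(
            "To make predictions unrelated to the original data, both "
            "`last_window` and `last_window_exog` must be provided."
        )
    if last_window is not None and last_window_exog is None and included_exog:
        raise ValueError(
            "Forecaster trained with exogenous variable/s. To make predictions "
            "unrelated to the original data, same variable/s must be provided "
            "using `last_window_exog`."
        )
    # True means the internal SARIMAX results would be extended via append.
    return last_window is not None


print(validate_window_args(None, None, included_exog=True))      # False
print(validate_window_args([1.0], [0.5], included_exog=True))    # True
```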
Source code in skforecast/ForecasterSarimax/ForecasterSarimax.py
def predict(
    self,
    steps: int,
    last_window: Optional[pd.Series]=None,
    last_window_exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None
) -> pd.Series:
    """
    Forecast future values

    Generate predictions (forecasts) n steps in the future. Note that if
    exogenous variables were used in the model fit, they will be expected
    for the predict procedure and will fail otherwise.

    When predicting using `last_window` and `last_window_exog`, the internal
    statsmodels SARIMAX will be updated using its append method. To do this,
    `last_window` data must start at the end of the index seen by the
    forecaster, this is stored in forecaster.extended_index.

    Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html
    to know more about statsmodels append method.

    Parameters
    ----------
    steps : int
        Number of future steps predicted.
    last_window : pandas Series, default `None`
        Series values used to create the predictors needed in the
        predictions. Used to make predictions unrelated to the original data.
        Values have to start at the end of the training data.
    last_window_exog : pandas Series, pandas DataFrame, default `None`
        Values of the exogenous variables aligned with `last_window`. Only
        needed when `last_window` is not None and the forecaster has been
        trained including exogenous variables. Used to make predictions
        unrelated to the original data. Values have to start at the end
        of the training data.
    exog : pandas Series, pandas DataFrame, default `None`
        Value of the exogenous variable/s for the next steps.

    Returns
    -------
    predictions : pandas Series
        Predicted values.

    """

    # Needs to be a new variable to avoid arima_res_.append if not needed
    last_window_check = last_window.copy() if last_window is not None else self.last_window.copy()

    check_predict_input(
        forecaster_name = type(self).__name__,
        steps = steps,
        fitted = self.fitted,
        included_exog = self.included_exog,
        index_type = self.index_type,
        index_freq = self.index_freq,
        window_size = self.window_size,
        last_window = last_window_check,
        last_window_exog = last_window_exog,
        exog = exog,
        exog_type = self.exog_type,
        exog_col_names = self.exog_col_names,
        interval = None,
        alpha = None,
        max_steps = None,
        levels = None,
        series_col_names = None
    )

    # If last_window_exog is provided but no last_window
    if last_window is None and last_window_exog is not None:
        raise ValueError(
            ("To make predictions unrelated to the original data, both "
             "`last_window` and `last_window_exog` must be provided.")
        )

    # Check if forecaster needs exog
    if last_window is not None and last_window_exog is None and self.included_exog:
        raise ValueError(
            ("Forecaster trained with exogenous variable/s. To make predictions "
             "unrelated to the original data, same variable/s must be provided "
             "using `last_window_exog`.")
        )

    if last_window is not None:
        # If predictions do not follow directly from the end of the training
        # data. The internal statsmodels SARIMAX model needs to be updated
        # using its append method. The data needs to start at the end of the
        # training series.

        # check index append values
        expected_index = expand_index(index=self.extended_index, steps=1)[0]
        if expected_index != last_window.index[0]:
            raise ValueError(
                (f"To make predictions unrelated to the original data, `last_window` "
                 f"has to start at the end of the index seen by the forecaster.\n"
                 f" Series last index : {self.extended_index[-1]}.\n"
                 f" Expected index : {expected_index}.\n"
                 f" `last_window` index start : {last_window.index[0]}.")
            )

        last_window = transform_series(
            series = last_window.copy(),
            transformer = self.transformer_y,
            fit = False,
            inverse_transform = False
        )

        # TODO -----------------------------------------------------------------------------------------------------
        # This is done because pmdarima deletes the series name
        # Check issue: https://github.com/alkaline-ml/pmdarima/issues/535
        last_window.name = None
        # ----------------------------------------------------------------------------------------------------------

        # last_window_exog
        if last_window_exog is not None:
            # check index last_window_exog
            if expected_index != last_window_exog.index[0]:
                raise ValueError(
                    (f"To make predictions unrelated to the original data, `last_window_exog` "
                     f"has to start at the end of the index seen by the forecaster.\n"
                     f" Series last index : {self.extended_index[-1]}.\n"
                     f" Expected index : {expected_index}.\n"
                     f" `last_window_exog` index start : {last_window_exog.index[0]}.")
                )

            if isinstance(last_window_exog, pd.Series):
                # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                last_window_exog = last_window_exog.to_frame()

            last_window_exog = transform_dataframe(
                df = last_window_exog,
                transformer = self.transformer_exog,
                fit = False,
                inverse_transform = False
            )

        self.regressor.arima_res_ = self.regressor.arima_res_.append(
            endog = last_window,
            exog = last_window_exog,
            refit = False
        )
        self.extended_index = self.regressor.arima_res_.fittedvalues.index

    # Exog
    if exog is not None:
        if isinstance(exog, pd.Series):
            # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
            exog = exog.to_frame()

        exog = transform_dataframe(
            df = exog,
            transformer = self.transformer_exog,
            fit = False,
            inverse_transform = False
        )
        exog = exog.iloc[:steps]

    # Get following n steps predictions
    predictions = self.regressor.predict(
        n_periods = steps,
        X = exog
    )

    # Reverse the transformation if needed
    predictions = transform_series(
        series = predictions,
        transformer = self.transformer_y,
        fit = False,
        inverse_transform = True
    )
    predictions.name = 'pred'

    return predictions
predict_interval(self, steps, last_window=None, last_window_exog=None, exog=None, alpha=0.05, interval=None)
Forecast future values and their confidence intervals
Generate predictions (forecasts) n steps in the future with confidence intervals. Note that if exogenous variables were used to fit the model, they must also be provided at prediction time; otherwise, the prediction fails.
When predicting using `last_window` and `last_window_exog`, the internal statsmodels SARIMAX will be updated using its append method. To do this, `last_window` data must start at the end of the index seen by the forecaster, which is stored in `forecaster.extended_index`.
Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html to learn more about the statsmodels append method.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
steps | int | Number of future steps predicted. | required |
last_window | Optional[pandas.core.series.Series] | Series values used to create the predictors needed in the predictions. Used to make predictions unrelated to the original data. Values have to start at the end of the training data. | None |
last_window_exog | Union[pandas.core.series.Series, pandas.core.frame.DataFrame] | Values of the exogenous variables aligned with `last_window`. Only needed when `last_window` is not None and the forecaster has been trained including exogenous variables. | None |
exog | Union[pandas.core.series.Series, pandas.core.frame.DataFrame] | Exogenous variable/s included as predictor/s. | None |
alpha | float | The confidence intervals for the forecasts are (1 - alpha) %. If both `alpha` and `interval` are provided, `alpha` will be used. | 0.05 |
interval | list | Confidence of the prediction interval estimated. The values must be symmetric. Sequence of percentiles to compute, which must be between 0 and 100 inclusive. For example, an interval of 95% should be passed as `interval = [2.5, 97.5]`. If both `alpha` and `interval` are provided, `alpha` will be used. | None |
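The conversion from `interval` to `alpha` performed inside `predict_interval` can be checked by hand: a symmetric interval `[lower, upper]` leaves `100 - upper` percent in each tail, so alpha = 2 * (100 - upper) / 100. The function `interval_to_alpha` below is a hypothetical stand-alone version of that logic, including the symmetry check.

```python
def interval_to_alpha(interval):
    """Hypothetical helper: convert a symmetric percentile interval to alpha."""
    lower, upper = interval
    if 100 - upper != lower:
        raise ValueError(f"`interval` must be symmetrical, got {interval}.")
    # Each tail holds (100 - upper) percent; both tails together 2 * (100 - upper).
    return 2 * (100 - upper) / 100


print(interval_to_alpha([2.5, 97.5]))  # 0.05
print(interval_to_alpha([10, 90]))     # 0.2
```

Because `alpha` defaults to `0.05` and the method only converts `interval` when `alpha is None`, the `interval` argument takes effect only if `alpha=None` is passed explicitly, e.g. `forecaster.predict_interval(steps=5, alpha=None, interval=[10, 90])`.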
Returns:
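Type | Description |
---|---|
DataFrame | Values predicted by the forecaster and their estimated interval: `pred` (predictions), `lower_bound` (lower bound of the interval) and `upper_bound` (upper bound of the interval). |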
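The returned DataFrame is assembled from the mean forecast and a `(steps, 2)` confidence-interval array, and when `transformer_y` is set each column is inverse-transformed separately. The sketch below reproduces that layout with made-up numbers; `np.exp` stands in for the inverse transform of a log-type `transformer_y` (an assumption for illustration). Because `exp` is increasing, the bounds keep their order after the inverse transformation, but they are no longer symmetric around `pred` in the original scale.

```python
import numpy as np
import pandas as pd

# Made-up forecasts in the transformed (log) scale for 3 daily steps.
index = pd.date_range("2020-01-11", periods=3, freq="D")
predicted_mean = pd.Series(np.log([100.0, 110.0, 120.0]), index=index)
conf_int = np.column_stack([predicted_mean - 0.1, predicted_mean + 0.1])  # shape (3, 2)

# Same layout as `predict_interval`: one column per output.
predictions = predicted_mean.to_frame(name="pred")
predictions["lower_bound"] = conf_int[:, 0]
predictions["upper_bound"] = conf_int[:, 1]

# Inverse transform column by column (np.exp plays the role of transformer_y).
for col in predictions.columns:
    predictions[col] = np.exp(predictions[col])

print(predictions.round(2))
```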
Source code in skforecast/ForecasterSarimax/ForecasterSarimax.py
def predict_interval(
    self,
    steps: int,
    last_window: Optional[pd.Series]=None,
    last_window_exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    exog: Optional[Union[pd.Series, pd.DataFrame]]=None,
    alpha: float=0.05,
    interval: list=None,
) -> pd.DataFrame:
    """
    Forecast future values and their confidence intervals

    Generate predictions (forecasts) n steps in the future with confidence
    intervals. Note that if exogenous variables were used in the model fit,
    they will be expected for the predict procedure and will fail otherwise.

    When predicting using `last_window` and `last_window_exog`, the internal
    statsmodels SARIMAX will be updated using its append method. To do this,
    `last_window` data must start at the end of the index seen by the
    forecaster, which is stored in forecaster.extended_index.

    Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.arima.model.ARIMAResults.append.html
    to know more about statsmodels append method.

    Parameters
    ----------
    steps : int
        Number of future steps predicted.
    last_window : pandas Series, default `None`
        Series values used to create the predictors needed in the
        predictions. Used to make predictions unrelated to the original data.
        Values have to start at the end of the training data.
    last_window_exog : pandas Series, pandas DataFrame, default `None`
        Values of the exogenous variables aligned with `last_window`. Only
        needed when `last_window` is not None and the forecaster has been
        trained including exogenous variables.
    exog : pandas Series, pandas DataFrame, default `None`
        Exogenous variable/s included as predictor/s.
    alpha : float, default `0.05`
        The confidence intervals for the forecasts are (1 - alpha) %.
        If both, `alpha` and `interval` are provided, `alpha` will be used.
    interval : list, default `None`
        Confidence of the prediction interval estimated. The values must be
        symmetric. Sequence of percentiles to compute, which must be between
        0 and 100 inclusive. For example, interval of 95% should be as
        `interval = [2.5, 97.5]`. If both, `alpha` and `interval` are
        provided, `alpha` will be used.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval:

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    """

    # Needs to be a new variable to avoid arima_res_.append if not needed
    last_window_check = last_window.copy() if last_window is not None else self.last_window.copy()

    check_predict_input(
        forecaster_name = type(self).__name__,
        steps = steps,
        fitted = self.fitted,
        included_exog = self.included_exog,
        index_type = self.index_type,
        index_freq = self.index_freq,
        window_size = self.window_size,
        last_window = last_window_check,
        last_window_exog = last_window_exog,
        exog = exog,
        exog_type = self.exog_type,
        exog_col_names = self.exog_col_names,
        interval = interval,
        alpha = alpha,
        max_steps = None,
        levels = None,
        series_col_names = None
    )

    # If last_window_exog is provided but no last_window
    if last_window is None and last_window_exog is not None:
        raise ValueError(
            ("To make predictions unrelated to the original data, both "
             "`last_window` and `last_window_exog` must be provided.")
        )

    # Check if forecaster needs exog
    if last_window is not None and last_window_exog is None and self.included_exog:
        raise ValueError(
            ("Forecaster trained with exogenous variable/s. To make predictions "
             "unrelated to the original data, same variable/s must be provided "
             "using `last_window_exog`.")
        )

    # If both `alpha` and `interval` are given, `alpha` is used. If only
    # `interval` is given (alpha=None), it is converted to `alpha`.
    if alpha is None:
        if 100 - interval[1] != interval[0]:
            raise ValueError(
                (f"When using `interval` in ForecasterSarimax, it must be symmetrical. "
                 f"For example, interval of 95% should be as `interval = [2.5, 97.5]`. "
                 f"Got {interval}.")
            )
        alpha = 2*(100 - interval[1])/100

    if last_window is not None:
        # If predictions do not follow directly from the end of the training
        # data. The internal statsmodels SARIMAX model needs to be updated
        # using its append method. The data needs to start at the end of the
        # training series.

        # check index append values
        expected_index = expand_index(index=self.extended_index, steps=1)[0]
        if expected_index != last_window.index[0]:
            raise ValueError(
                (f"To make predictions unrelated to the original data, `last_window` "
                 f"has to start at the end of the index seen by the forecaster.\n"
                 f" Series last index : {self.extended_index[-1]}.\n"
                 f" Expected index : {expected_index}.\n"
                 f" `last_window` index start : {last_window.index[0]}.")
            )

        # Copy so that resetting the name below does not modify the user's Series
        last_window = transform_series(
            series = last_window.copy(),
            transformer = self.transformer_y,
            fit = False,
            inverse_transform = False
        )

        # TODO -----------------------------------------------------------------------------------------------------
        # This is done because pmdarima deletes the series name
        # Check issue: https://github.com/alkaline-ml/pmdarima/issues/535
        last_window.name = None
        # ----------------------------------------------------------------------------------------------------------

        # Transform last_window_exog
        if last_window_exog is not None:
            # check index last_window_exog
            if expected_index != last_window_exog.index[0]:
                raise ValueError(
                    (f"To make predictions unrelated to the original data, `last_window_exog` "
                     f"has to start at the end of the index seen by the forecaster.\n"
                     f" Series last index : {self.extended_index[-1]}.\n"
                     f" Expected index : {expected_index}.\n"
                     f" `last_window_exog` index start : {last_window_exog.index[0]}.")
                )

            if isinstance(last_window_exog, pd.Series):
                # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
                last_window_exog = last_window_exog.to_frame()

            last_window_exog = transform_dataframe(
                df = last_window_exog,
                transformer = self.transformer_exog,
                fit = False,
                inverse_transform = False
            )

        self.regressor.arima_res_ = self.regressor.arima_res_.append(
            endog = last_window,
            exog = last_window_exog,
            refit = False
        )
        self.extended_index = self.regressor.arima_res_.fittedvalues.index

    # Exog
    if exog is not None:
        if isinstance(exog, pd.Series):
            # pmdarima.arima.ARIMA only accepts DataFrames or 2d-arrays as exog
            exog = exog.to_frame()

        exog = transform_dataframe(
            df = exog,
            transformer = self.transformer_exog,
            fit = False,
            inverse_transform = False
        )
        exog = exog.iloc[:steps]

    # Get following n steps predictions with intervals
    predicted_mean, conf_int = self.regressor.predict(
        n_periods = steps,
        X = exog,
        alpha = alpha,
        return_conf_int = True
    )

    predictions = predicted_mean.to_frame(name="pred")
    predictions['lower_bound'] = conf_int[:, 0]
    predictions['upper_bound'] = conf_int[:, 1]

    # Reverse the transformation if needed
    if self.transformer_y is not None:
        for col in predictions.columns:
            predictions[col] = transform_series(
                series = predictions[col],
                transformer = self.transformer_y,
                fit = False,
                inverse_transform = True
            )

    return predictions
set_fit_kwargs(self, fit_kwargs)
Set new values for the additional keyword arguments passed to the fit
method of the regressor.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fit_kwargs | dict | Dict of the form {"argument": new_value}. | required |
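`set_fit_kwargs` delegates validation to `check_select_fit_kwargs`. A minimal sketch of the kind of filtering involved, assuming it keeps only the keys that appear in the signature of the regressor's `fit` method and warns about the rest, is shown below. `select_fit_kwargs` and `DummyRegressor` are hypothetical and are not skforecast's implementation.

```python
import inspect
import warnings


class DummyRegressor:
    """Hypothetical regressor used only for illustration."""

    def fit(self, y, X=None, maxiter=50):
        return self


def select_fit_kwargs(regressor, fit_kwargs):
    """Hypothetical sketch: keep only kwargs accepted by `regressor.fit`."""
    if not isinstance(fit_kwargs, dict):
        raise TypeError(f"`fit_kwargs` must be a dict, got {type(fit_kwargs)}.")
    # Parameter names of the bound method (`self` is excluded).
    accepted = inspect.signature(regressor.fit).parameters
    unused = [key for key in fit_kwargs if key not in accepted]
    if unused:
        warnings.warn(f"Argument/s {unused} ignored: not used by the regressor's `fit` method.")
    return {key: value for key, value in fit_kwargs.items() if key in accepted}


selected = select_fit_kwargs(DummyRegressor(), {"maxiter": 200, "verbose": True})
print(selected)  # {'maxiter': 200}
```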
Source code in skforecast/ForecasterSarimax/ForecasterSarimax.py
def set_fit_kwargs(
    self,
    fit_kwargs: dict
) -> None:
    """
    Set new values for the additional keyword arguments passed to the `fit`
    method of the regressor.

    Parameters
    ----------
    fit_kwargs : dict
        Dict of the form {"argument": new_value}.

    Returns
    -------
    None

    """

    self.fit_kwargs = check_select_fit_kwargs(self.regressor, fit_kwargs=fit_kwargs)
set_params(self, params)
Set new values for the parameters of the model stored in the forecaster.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
params | dict | Parameter values. | required |
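`set_params` clones the regressor before changing its parameters. scikit-learn's `clone` builds a new, unfitted estimator from the constructor parameters, so the fitted state (for pmdarima, `arima_res_`) is discarded and the regressor has to be fitted again before it can predict. The sketch imitates this with a hypothetical `DummyARIMA` and a simplified `simple_clone`; the real `sklearn.base.clone` additionally deep-copies parameter values and performs extra checks.

```python
class DummyARIMA:
    """Hypothetical estimator with scikit-learn style get_params/set_params."""

    def __init__(self, order=(1, 0, 0), maxiter=50):
        self.order = order
        self.maxiter = maxiter

    def get_params(self, deep=True):
        return {"order": self.order, "maxiter": self.maxiter}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self

    def fit(self, y):
        # Stand-in for the fitted results object created by a real fit.
        self.arima_res_ = f"results for {len(y)} observations"
        return self


def simple_clone(estimator):
    """Simplified clone: a new, unfitted instance with the same constructor parameters."""
    return type(estimator)(**estimator.get_params(deep=False))


regressor = DummyARIMA().fit([1.0, 2.0, 3.0])
new_regressor = simple_clone(regressor)
new_regressor.set_params(order=(2, 1, 0))

print(new_regressor.get_params())            # {'order': (2, 1, 0), 'maxiter': 50}
print(hasattr(new_regressor, "arima_res_"))  # False
```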
Source code in skforecast/ForecasterSarimax/ForecasterSarimax.py
def set_params(
    self,
    params: dict
) -> None:
    """
    Set new values for the parameters of the model stored in the forecaster.

    Parameters
    ----------
    params : dict
        Parameter values.

    Returns
    -------
    None

    """

    self.regressor = clone(self.regressor)
    self.regressor.set_params(**params)
    self.params = self.regressor.get_params(deep=True)