Scikit-learn style wrapper for the ARIMA (AutoRegressive Integrated Moving Average)
model and auto arima selection algorithm.
This estimator takes a univariate time series as input. Call fit(y) with
a pandas Series or 1D NumPy array of observations in time order, then produce
out-of-sample forecasts via predict(steps) and prediction intervals via
predict_interval(steps, level=...).
In-sample diagnostics are available through get_fitted_values(), get_residuals() and summary().
Parameters:
order : tuple of int or None, default (1, 0, 0)
    The (p, d, q) order of the non-seasonal ARIMA model:
    - p: AR order (number of lag observations)
    - d: Degree of differencing (number of times to difference the series)
    - q: MA order (size of moving average window)
    If None, the order will be automatically selected using auto_arima during fitting.
seasonal_order : tuple of int or None, default (0, 0, 0)
    The (P, D, Q) order of the seasonal component:
    - P: Seasonal AR order
    - D: Seasonal differencing order
    - Q: Seasonal MA order
    If None, the seasonal order will be automatically selected using auto_arima during
    fitting.
method : str, default "CSS-ML"
    Estimation method. Options:
    - "CSS-ML": Conditional sum of squares for initial values, then maximum likelihood
    - "ML": Maximum likelihood only
    - "CSS": Conditional sum of squares only
Attributes:
estimator_name_ : str
    String identifier of the fitted model configuration (e.g., "Arima(1,1,1)(0,0,0)[1]").
    This is updated after fitting to reflect the selected model.
Notes
The ARIMA model supports exogenous regressors which are incorporated
directly into the likelihood function, unlike the two-step approach used in
the ARAR model. This means the exogenous variables are modeled jointly with
the ARMA errors, providing a more integrated treatment.
The model uses a state-space representation and the Kalman filter for
likelihood computation and forecasting, which allows handling of missing
values and provides efficient recursive prediction.
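As a rough illustration of why a Kalman filter handles missing values naturally, the sketch below computes the Gaussian log-likelihood of a zero-mean AR(1) process with a scalar filter. It is not the library's implementation: the function name `ar1_kalman_loglik` and its arguments are hypothetical, and the model is deliberately reduced to one state with no measurement noise.

```python
import math


def ar1_kalman_loglik(y, phi, sigma2):
    """Log-likelihood of a zero-mean, stationary AR(1) via a scalar Kalman filter.

    Hypothetical helper for illustration only. NaN observations skip the
    update step, so only the prediction step runs for them.
    """
    # Stationary prior for the first state: mean 0, variance sigma2 / (1 - phi^2).
    a = 0.0
    p = sigma2 / (1.0 - phi ** 2)
    loglik = 0.0
    for obs in y:
        if not math.isnan(obs):
            v = obs - a  # innovation
            f = p        # innovation variance (no measurement noise)
            loglik += -0.5 * (math.log(2.0 * math.pi * f) + v * v / f)
            a, p = obs, 0.0  # the state is observed exactly
        # Prediction step for the next time point.
        a = phi * a
        p = phi * phi * p + sigma2
    return loglik
```

With a gap in the series, the predicted variance simply grows for one extra step before the next observation is used, which is the behaviour the note above refers to.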
def __init__(
    self,
    order: tuple[int, int, int] | None = (1, 0, 0),
    seasonal_order: tuple[int, int, int] | None = (0, 0, 0),
    m: int = 1,
    include_mean: bool = True,
    transform_pars: bool = True,
    method: str = "CSS-ML",
    n_cond: int | None = None,
    SSinit: str = "Gardner1980",
    optim_method: str = "BFGS",
    optim_kwargs: dict | None = None,
    kappa: float = 1e6,
    max_p: int = 5,
    max_q: int = 5,
    max_P: int = 2,
    max_Q: int = 2,
    max_order: int = 5,
    max_d: int = 2,
    max_D: int = 1,
    start_p: int = 2,
    start_q: int = 2,
    start_P: int = 1,
    start_Q: int = 1,
    stationary: bool = False,
    seasonal: bool = True,
    ic: str = "aicc",
    stepwise: bool = True,
    nmodels: int = 94,
    trace: bool = False,
    approximation: bool | None = None,
    truncate: int | None = None,
    test: str = "kpss",
    test_kwargs: dict | None = None,
    seasonal_test: str = "seas",
    seasonal_test_kwargs: dict | None = None,
    allowdrift: bool = True,
    allowmean: bool = True,
    lambda_bc: float | str | None = None,
    biasadj: bool = False,
):
    if order is not None and len(order) != 3:
        raise ValueError(
            f"`order` must be a tuple of length 3, got length {len(order)}"
        )
    if seasonal_order is not None and len(seasonal_order) != 3:
        raise ValueError(
            f"`seasonal_order` must be a tuple of length 3, got length {len(seasonal_order)}"
        )
    if not isinstance(m, int) or m < 1:
        raise ValueError("`m` must be a positive integer (seasonal period).")

    self.order = order
    self.seasonal_order = seasonal_order
    self.m = m
    self.include_mean = include_mean
    self.transform_pars = transform_pars
    self.method = method
    self.n_cond = n_cond
    self.SSinit = SSinit
    self.optim_method = optim_method
    self.optim_kwargs = optim_kwargs
    self.kappa = kappa
    self.max_p = max_p
    self.max_q = max_q
    self.max_P = max_P
    self.max_Q = max_Q
    self.max_order = max_order
    self.max_d = max_d
    self.max_D = max_D
    self.start_p = start_p
    self.start_q = start_q
    self.start_P = start_P
    self.start_Q = start_Q
    self.stationary = stationary
    self.seasonal = seasonal
    self.ic = ic
    self.stepwise = stepwise
    self.nmodels = nmodels
    self.trace = trace
    self.approximation = approximation
    self.truncate = truncate
    self.test = test
    self.test_kwargs = test_kwargs
    self.seasonal_test = seasonal_test
    self.seasonal_test_kwargs = seasonal_test_kwargs
    self.allowdrift = allowdrift
    self.allowmean = allowmean
    self.lambda_bc = lambda_bc
    self.biasadj = biasadj
    self.is_auto = order is None or seasonal_order is None

    self.model_ = None
    self.y_train_ = None
    self.coef_ = None
    self.coef_names_ = None
    self.sigma2_ = None
    self.loglik_ = None
    self.aic_ = None
    self.bic_ = None
    self.arma_ = None
    self.converged_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.var_coef_ = None
    self.n_features_in_ = None
    self.n_exog_names_in_ = None
    self.n_exog_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.best_params_ = None

    if self.optim_kwargs is None:
        self.optim_kwargs = {'maxiter': 1000}

    if self.is_auto:
        estimator_name_ = "AutoArima()"
    else:
        p, d, q = self.order
        P, D, Q = self.seasonal_order
        if P == 0 and D == 0 and Q == 0:
            estimator_name_ = f"Arima({p},{d},{q})"
        else:
            estimator_name_ = f"Arima({p},{d},{q})({P},{D},{Q})[{self.m}]"

    self.estimator_name_ = estimator_name_
If order or seasonal_order were not specified during initialization
(i.e., set to None), this method will automatically determine the best
model using auto arima with stepwise search.
Parameters:
y : pandas Series, numpy ndarray of shape (n_samples,)
    Time-ordered numeric sequence.
exog : pandas Series, pandas DataFrame, numpy ndarray of shape (n_samples, n_exog_features), default None
    Exogenous regressors to include in the model. These are incorporated
    directly into the ARIMA likelihood function.
Fitted estimator. After fitting with automatic model selection, the
selected order and seasonal_order are stored in best_params_, and
estimator_name_ is updated with the chosen model.
def fit(
    self,
    y: np.ndarray | pd.Series,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> "Arima":
    """
    Fit the ARIMA model to a univariate time series.

    If `order` or `seasonal_order` were not specified during initialization
    (i.e., set to None), this method will automatically determine the best
    model using auto arima with stepwise search.

    Parameters
    ----------
    y : pandas Series, numpy ndarray of shape (n_samples,)
        Time-ordered numeric sequence.
    exog : pandas Series, pandas DataFrame, numpy ndarray of shape (n_samples, n_exog_features), default None
        Exogenous regressors to include in the model. These are incorporated
        directly into the ARIMA likelihood function.
    suppress_warnings : bool, default False
        If True, suppress warnings during fitting (e.g., convergence warnings).

    Returns
    -------
    self : Arima
        Fitted estimator. After fitting with automatic model selection, the
        selected `order` and `seasonal_order` are stored in `best_params_`,
        and `estimator_name_` is updated with the chosen model.

    """
    # Reset the fitted state in case the estimator was fitted before.
    self.is_auto = self.order is None or self.seasonal_order is None
    self.model_ = None
    self.y_train_ = None
    self.coef_ = None
    self.coef_names_ = None
    self.sigma2_ = None
    self.loglik_ = None
    self.aic_ = None
    self.bic_ = None
    self.arma_ = None
    self.converged_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.var_coef_ = None
    self.n_features_in_ = None
    self.n_exog_names_in_ = None
    self.n_exog_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.best_params_ = None

    if not isinstance(y, (np.ndarray, pd.Series)):
        raise TypeError("`y` must be a pandas Series or numpy array.")

    if not isinstance(exog, (type(None), pd.Series, pd.DataFrame, np.ndarray)):
        raise TypeError("`exog` must be a pandas Series, DataFrame, numpy array, or None.")

    y = np.asarray(y, dtype=float)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()
    elif y.ndim != 1:
        raise ValueError("`y` must be 1-dimensional.")

    exog_names_in_ = None
    if exog is not None:
        if isinstance(exog, pd.DataFrame):
            exog_names_in_ = list(exog.columns)
        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1- or 2-dimensional.")

        if len(exog) != len(y):
            raise ValueError(
                f"Length of `exog` ({len(exog)}) does not match length of `y` ({len(y)})."
            )

    ctx = warnings.catch_warnings() if suppress_warnings else nullcontext()
    with ctx:
        if suppress_warnings:
            warnings.simplefilter("ignore")

        if self.is_auto:
            self.model_ = auto_arima(
                y=y,
                m=self.m,
                d=self.order[1] if self.order is not None else None,
                D=self.seasonal_order[1] if self.seasonal_order is not None else None,
                max_p=self.max_p,
                max_q=self.max_q,
                max_P=self.max_P,
                max_Q=self.max_Q,
                max_order=self.max_order,
                max_d=self.max_d,
                max_D=self.max_D,
                start_p=self.start_p,
                start_q=self.start_q,
                start_P=self.start_P,
                start_Q=self.start_Q,
                stationary=self.stationary,
                seasonal=self.seasonal,
                ic=self.ic,
                stepwise=self.stepwise,
                nmodels=self.nmodels,
                trace=self.trace,
                approximation=self.approximation,
                method=self.method,
                truncate=self.truncate,
                xreg=exog,
                test=self.test,
                test_args=self.test_kwargs,
                seasonal_test=self.seasonal_test,
                seasonal_test_args=self.seasonal_test_kwargs,
                allowdrift=self.allowdrift,
                allowmean=self.allowmean,
                lambda_bc=self.lambda_bc,
                biasadj=self.biasadj,
                SSinit=self.SSinit,
                kappa=self.kappa
            )
            best_model_order_ = (
                self.model_['arma'][0],
                self.model_['arma'][5],
                self.model_['arma'][1]
            )
            best_seasonal_order_ = (
                self.model_['arma'][2],
                self.model_['arma'][6],
                self.model_['arma'][3]
            )
            self.best_params_ = {
                'order': best_model_order_,
                'seasonal_order': best_seasonal_order_,
                'm': self.m
            }

            # NOTE: Only needed to update `estimator_name_` when auto arima is used
            p, d, q = best_model_order_
            P, D, Q = best_seasonal_order_
            if P == 0 and D == 0 and Q == 0:
                self.estimator_name_ = f"AutoArima({p},{d},{q})"
            else:
                self.estimator_name_ = f"AutoArima({p},{d},{q})({P},{D},{Q})[{self.m}]"
        else:
            self.model_ = arima(
                x=y,
                m=self.m,
                order=self.order,
                seasonal=self.seasonal_order,
                xreg=exog,
                include_mean=self.include_mean,
                transform_pars=self.transform_pars,
                fixed=None,
                init=None,
                method=self.method,
                n_cond=self.n_cond,
                SSinit=self.SSinit,
                optim_method=self.optim_method,
                opt_options=self.optim_kwargs,
                kappa=self.kappa
            )

    self.y_train_ = self.model_['y']
    self.coef_ = self.model_['coef'].to_numpy().ravel()
    self.coef_names_ = list(self.model_['coef'].columns)
    self.sigma2_ = self.model_['sigma2']
    self.loglik_ = self.model_['loglik']
    self.aic_ = self.model_['aic']
    self.bic_ = self.model_['bic']
    self.arma_ = self.model_['arma']
    self.converged_ = self.model_['converged']
    self.fitted_values_ = self.model_['fitted']
    self.in_sample_residuals_ = self.model_['residuals']
    self.var_coef_ = self.model_['var_coef']
    self.n_exog_names_in_ = exog_names_in_
    self.n_exog_features_in_ = exog.shape[1] if exog is not None else 0
    self.n_features_in_ = 1
    self.is_memory_reduced = False
    self.is_fitted = True

    # Replace the generic exogenous coefficient names with the DataFrame columns.
    if exog_names_in_ is not None:
        n_exog = len(exog_names_in_)
        self.coef_names_ = self.coef_names_[:-n_exog] + exog_names_in_

    return self
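The fit code reads the selected orders out of the model's `arma` vector by index (0, 5, 1 for the non-seasonal part and 2, 6, 3 for the seasonal part). The sketch below shows that mapping in isolation, assuming the vector follows the layout used by R's `arima`, namely (p, q, P, Q, period, d, D); the helper name `decode_arma` is hypothetical and not part of the library.

```python
def decode_arma(arma):
    """Split a compact `arma` specification into order tuples.

    Assumes the layout (p, q, P, Q, period, d, D), which is what the
    index pattern used in `fit` implies. Hypothetical helper.
    """
    p, q, P, Q, period, d, D = arma
    return {
        "order": (p, d, q),
        "seasonal_order": (P, D, Q),
        "m": period,
    }


def format_name(order, seasonal_order, m, prefix="AutoArima"):
    """Build the display name the same way the estimator does."""
    p, d, q = order
    P, D, Q = seasonal_order
    if P == 0 and D == 0 and Q == 0:
        return f"{prefix}({p},{d},{q})"
    return f"{prefix}({p},{d},{q})({P},{D},{Q})[{m}]"


params = decode_arma([1, 2, 0, 1, 12, 1, 1])
name = format_name(params["order"], params["seasonal_order"], params["m"])
```

Here `params["order"]` is `(1, 1, 2)` and the seasonal part is `(0, 1, 1)` with period 12.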
@check_is_fitted
def predict(
    self,
    steps: int,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> np.ndarray:
    """
    Generate mean forecasts steps ahead.

    Parameters
    ----------
    steps : int
        Forecast horizon (must be > 0).
    exog : ndarray, Series or DataFrame of shape (steps, n_exog_features), default None
        Exogenous regressors for the forecast period. Must have the same
        number of features as used during fitting.

    Returns
    -------
    predictions : ndarray of shape (steps,)
        Point forecasts for steps 1..steps.

    Raises
    ------
    ValueError
        If model hasn't been fitted, steps <= 0, or exog shape is incorrect.

    """
    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    if exog is not None:
        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1- or 2-dimensional.")

        if len(exog) != steps:
            raise ValueError(
                f"Length of `exog` ({len(exog)}) must match `steps` ({steps})."
            )

        if exog.shape[1] != self.n_exog_features_in_:
            raise ValueError(
                f"Number of exogenous features ({exog.shape[1]}) does not match "
                f"the number used during fitting ({self.n_exog_features_in_})."
            )
    elif self.n_exog_features_in_ > 0:
        raise ValueError(
            f"Model was fitted with {self.n_exog_features_in_} exogenous features, "
            f"but `exog` was not provided for prediction."
        )

    if self.is_auto:
        predictions = forecast_arima(
            model=self.model_, h=steps, xreg=exog
        )['mean']
    else:
        predictions = predict_arima(
            model=self.model_, n_ahead=steps, newxreg=exog, se_fit=False
        )['mean']

    return predictions
The significance level for the prediction interval.
If specified, the confidence interval will be (1 - alpha) * 100%.
For example, alpha=0.05 gives 95% intervals.
Cannot be specified together with level.
If model hasn't been fitted, steps <= 0, or exog shape is incorrect.
Notes
Prediction intervals are computed using the standard errors from the
Kalman filter and assuming normally distributed innovations. The intervals
fully account for both parameter uncertainty (through the variance-covariance
matrix) and forecast uncertainty.
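Under the normality assumption stated above, an interval at a given level is the point forecast plus or minus a normal quantile times the forecast standard error. The sketch below shows that arithmetic and the `alpha` to `level` conversion; the function names are hypothetical and the standard errors are supplied by hand rather than taken from a fitted model.

```python
from statistics import NormalDist


def alpha_to_level(alpha):
    """Convert a significance level into a confidence level in percent."""
    return (1 - alpha) * 100


def normal_interval(mean, se, level=95.0):
    """Two-sided normal interval for each forecast step.

    `mean` and `se` are equal-length sequences of point forecasts and
    forecast standard errors. Hypothetical helper for illustration.
    """
    # Two-sided quantile: level 95 leaves 2.5% in each tail, so use the 97.5% point.
    z = NormalDist().inv_cdf(0.5 + level / 200.0)
    lower = [m - z * s for m, s in zip(mean, se)]
    upper = [m + z * s for m, s in zip(mean, se)]
    return lower, upper


lower, upper = normal_interval([10.0, 10.5], [2.0, 2.5], level=alpha_to_level(0.05))
```

For the first step this gives roughly 10 ± 1.96 × 2, that is about (6.08, 13.92).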
@check_is_fitted
def predict_interval(
    self,
    steps: int = 1,
    level: list[float] | tuple[float, ...] | None = None,
    alpha: float | None = None,
    as_frame: bool = True,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> np.ndarray | pd.DataFrame:
    """
    Forecast with prediction intervals.

    Parameters
    ----------
    steps : int, default 1
        Forecast horizon.
    level : list or tuple of float, default None
        Confidence levels in percent (e.g., 80 for 80% intervals).
        If None and alpha is None, defaults to (80, 95).
        Cannot be specified together with `alpha`.
    alpha : float, default None
        The significance level for the prediction interval.
        If specified, the confidence interval will be (1 - alpha) * 100%.
        For example, alpha=0.05 gives 95% intervals.
        Cannot be specified together with `level`.
    as_frame : bool, default True
        If True, return a tidy DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If False, return a NumPy ndarray.
    exog : ndarray, Series or DataFrame of shape (steps, n_exog_features), default None
        Exogenous regressors for the forecast period.

    Returns
    -------
    predictions : numpy ndarray, pandas DataFrame
        If as_frame=True, pandas DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If as_frame=False, numpy ndarray.

    Raises
    ------
    ValueError
        If model hasn't been fitted, steps <= 0, or exog shape is incorrect.

    Notes
    -----
    Prediction intervals are computed using the standard errors from the
    Kalman filter and assuming normally distributed innovations. The intervals
    fully account for both parameter uncertainty (through the variance-covariance
    matrix) and forecast uncertainty.

    """
    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    if level is not None and alpha is not None:
        raise ValueError(
            "Cannot specify both `level` and `alpha`. Use one or the other."
        )

    if alpha is not None:
        if not 0 < alpha < 1:
            raise ValueError("`alpha` must be between 0 and 1.")
        level = [(1 - alpha) * 100]
    elif level is None:
        level = (80, 95)

    if isinstance(level, (int, float, np.number)):
        level = [level]
    else:
        level = list(level)

    if exog is not None:
        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1- or 2-dimensional.")

        if len(exog) != steps:
            raise ValueError(
                f"Length of `exog` ({len(exog)}) must match `steps` ({steps})."
            )

        if exog.shape[1] != self.n_exog_features_in_:
            raise ValueError(
                f"Number of exogenous features ({exog.shape[1]}) does not match "
                f"the number used during fitting ({self.n_exog_features_in_})."
            )
    elif self.n_exog_features_in_ > 0:
        raise ValueError(
            f"Model was fitted with {self.n_exog_features_in_} exogenous features, "
            f"but `exog` was not provided for prediction."
        )

    if self.is_auto:
        raw_preds = forecast_arima(
            model=self.model_, h=steps, xreg=exog, level=level
        )
    else:
        raw_preds = predict_arima(
            model=self.model_, n_ahead=steps, newxreg=exog, se_fit=True, level=level
        )

    # Column layout: mean, then (lower, upper) pairs interleaved per level.
    levels = raw_preds['level']
    n_levels = len(levels)
    predictions = np.empty((steps, 1 + 2 * n_levels), dtype=float)
    predictions[:, 0] = raw_preds['mean']
    predictions[:, 1::2] = raw_preds['lower']
    predictions[:, 2::2] = raw_preds['upper']

    if as_frame:
        col_names = ["mean"]
        for level in levels:
            level = int(level)
            col_names.append(f"lower_{level}")
            col_names.append(f"upper_{level}")
        predictions = pd.DataFrame(
            data=predictions,
            columns=col_names,
            index=pd.RangeIndex(1, steps + 1, name="step")
        )

    return predictions
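The returned array interleaves the bounds: column 0 is the mean, odd columns hold the lower bounds and even columns the upper bounds, one pair per level. A minimal sketch of that layout, using plain lists and a hypothetical helper name `interval_columns`, is:

```python
def interval_columns(levels):
    """Column names in the order used by the interval output."""
    cols = ["mean"]
    for level in levels:
        level = int(level)
        cols.append(f"lower_{level}")
        cols.append(f"upper_{level}")
    return cols


def interval_row(mean, lower, upper):
    """Interleave per-level bounds for one forecast step.

    `lower` and `upper` hold one value per level, in the same order.
    """
    row = [mean]
    for lo, up in zip(lower, upper):
        row.extend([lo, up])
    return row


cols = interval_columns((80, 95))
row = interval_row(10.0, lower=[7.4, 6.1], upper=[12.6, 13.9])
```

Because the level is cast with `int()`, a non-integer level such as 97.5 would be labelled `lower_97` and `upper_97`.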
@check_is_fitted
def get_residuals(self) -> np.ndarray:
    """
    Get in-sample residuals (observed - fitted) from the ARIMA model.

    Returns
    -------
    residuals : ndarray of shape (n_samples,)
        In-sample residuals.

    Raises
    ------
    NotFittedError
        If the model has not been fitted.
    RuntimeError
        If reduce_memory() has been called (residuals are no longer available).

    """
    check_memory_reduced(self, method_name='get_residuals')
    return self.in_sample_residuals_
@check_is_fitted
def get_fitted_values(self) -> np.ndarray:
    """
    Get in-sample fitted values from the ARIMA model.

    Returns
    -------
    fitted : ndarray of shape (n_samples,)
        In-sample fitted values.

    Raises
    ------
    NotFittedError
        If the model has not been fitted.
    RuntimeError
        If reduce_memory() has been called (fitted values are no longer available).

    """
    check_memory_reduced(self, method_name='get_fitted_values')
    return self.fitted_values_
@check_is_fitted
def get_score(self, y: None = None) -> float:
    """
    Compute R^2 score using in-sample fitted values.

    Parameters
    ----------
    y : ignored
        Present for API compatibility with sklearn.

    Returns
    -------
    score : float
        Coefficient of determination (R^2).

    """
    check_memory_reduced(self, method_name='get_score')

    y = self.y_train_
    fitted = self.fitted_values_

    # Handle NaN values if any
    mask = ~(np.isnan(y) | np.isnan(fitted))
    if mask.sum() < 2:
        return np.nan

    ss_res = np.sum((y[mask] - fitted[mask]) ** 2)
    # Machine epsilon guards against division by zero for a constant series.
    ss_tot = np.sum((y[mask] - y[mask].mean()) ** 2) + np.finfo(float).eps

    return 1.0 - ss_res / ss_tot
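The score computed by get_score is the usual coefficient of determination, 1 - SS_res / SS_tot, evaluated only on positions where both the observation and the fitted value are present. The sketch below restates that calculation with plain Python lists so it can be run without a fitted model; `r2_in_sample` is a hypothetical name, not a library function.

```python
import math
import sys


def r2_in_sample(y, fitted):
    """R^2 over the pairs where neither value is NaN."""
    pairs = [
        (obs, fit) for obs, fit in zip(y, fitted)
        if not (math.isnan(obs) or math.isnan(fit))
    ]
    if len(pairs) < 2:
        return math.nan

    mean_y = sum(obs for obs, _ in pairs) / len(pairs)
    ss_res = sum((obs - fit) ** 2 for obs, fit in pairs)
    # Same guard as above: machine epsilon avoids dividing by zero.
    ss_tot = sum((obs - mean_y) ** 2 for obs, _ in pairs) + sys.float_info.epsilon
    return 1.0 - ss_res / ss_tot


perfect = r2_in_sample([1.0, 2.0, 3.0, math.nan], [1.0, 2.0, 3.0, 4.0])
mean_only = r2_in_sample([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A perfect fit scores 1.0, while fitted values equal to the series mean score approximately 0.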
@check_is_fitted
def get_info_criteria(self, criteria: str = 'aic') -> float:
    """
    Get the selected information criterion.

    Parameters
    ----------
    criteria : str, default 'aic'
        The information criterion to retrieve. Valid options are
        {'aic', 'bic'}.

    Returns
    -------
    metric : float
        The value of the selected information criterion.

    """
    if criteria not in ['aic', 'bic']:
        raise ValueError(
            f"Invalid value for `criteria`: '{criteria}'. "
            f"Valid options are 'aic' and 'bic'."
        )

    if criteria == 'aic':
        value = self.aic_
    elif criteria == 'bic':
        # NOTE: BIC may be not available. This may occur when the model did
        # not converge or other estimation issues.
        value = self.bic_ if self.bic_ is not None else np.nan

    return value
def get_params(self, deep: bool = True) -> dict:
    """
    Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict
        Parameter names mapped to their values.

    """
    return {
        "order": self.order,
        "seasonal_order": self.seasonal_order,
        "m": self.m,
        "include_mean": self.include_mean,
        "transform_pars": self.transform_pars,
        "method": self.method,
        "n_cond": self.n_cond,
        "SSinit": self.SSinit,
        "optim_method": self.optim_method,
        "optim_kwargs": self.optim_kwargs,
        "kappa": self.kappa,
    }
Set the parameters of this estimator. Internal method without resetting
the fitted state. This method is intended for internal use only, please
use set_params() instead.
def _set_params(self, **params) -> None:
    """
    Set the parameters of this estimator. Internal method without resetting
    the fitted state. This method is intended for internal use only, please
    use `set_params()` instead.

    Parameters
    ----------
    **params : dict
        Estimator parameters.

    Returns
    -------
    None

    """
    for key, value in params.items():
        setattr(self, key, value)

    self.is_auto = self.order is None or self.seasonal_order is None

    if self.is_auto:
        estimator_name_ = "AutoArima()"
    else:
        p, d, q = self.order
        P, D, Q = self.seasonal_order
        if P == 0 and D == 0 and Q == 0:
            estimator_name_ = f"Arima({p},{d},{q})"
        else:
            estimator_name_ = f"Arima({p},{d},{q})({P},{D},{Q})[{self.m}]"

    self.estimator_name_ = estimator_name_
def set_params(self, **params) -> "Arima":
    """
    Set the parameters of this estimator and reset the fitted state.

    This method resets the estimator to its unfitted state whenever parameters
    are changed, requiring the model to be refitted before making predictions.

    Parameters
    ----------
    **params : dict
        Estimator parameters. Valid parameter keys are: 'order', 'seasonal_order',
        'm', 'include_mean', 'transform_pars', 'method', 'n_cond', 'SSinit',
        'optim_method', 'optim_kwargs', 'kappa'.

    Returns
    -------
    self : Arima
        The estimator with updated parameters and reset state.

    Raises
    ------
    ValueError
        If any parameter key is invalid.

    """
    valid_params = {
        'order', 'seasonal_order', 'm', 'include_mean', 'transform_pars',
        'method', 'n_cond', 'SSinit', 'optim_method', 'optim_kwargs', 'kappa'
    }
    for key in params.keys():
        if key not in valid_params:
            raise ValueError(
                f"Invalid parameter '{key}'. Valid parameters are: {valid_params}"
            )

    self._set_params(**params)

    # Clear everything learned during a previous fit.
    fitted_attrs = [
        'model_', 'y_train_', 'coef_', 'coef_names_', 'sigma2_', 'loglik_',
        'aic_', 'bic_', 'arma_', 'converged_', 'fitted_values_',
        'in_sample_residuals_', 'var_coef_', 'n_features_in_',
        'n_exog_features_in_', 'n_exog_names_in_'
    ]
    for attr in fitted_attrs:
        setattr(self, attr, None)

    self.is_memory_reduced = False
    self.is_fitted = False

    return self
Print a summary of the fitted ARIMA model.
Includes model specification, coefficients, fit statistics, and residual diagnostics.
If reduce_memory() has been called, summary information will be limited.
@check_is_fitted
def summary(self) -> None:
    """
    Print a summary of the fitted ARIMA model.

    Includes model specification, coefficients, fit statistics, and residual diagnostics.
    If reduce_memory() has been called, summary information will be limited.
    """
    print("ARIMA Model Summary")
    print("=" * 60)
    print(f"Model : {self.estimator_name_}")
    print(f"Method : {self.model_['method']}")
    print(f"Converged : {self.converged_}")
    print()

    print("Coefficients:")
    print("-" * 60)
    for i, name in enumerate(self.coef_names_):
        # Extract standard error from variance-covariance matrix
        if (
            self.var_coef_ is not None
            and i < self.var_coef_.shape[0]
            and i < self.var_coef_.shape[1]
        ):
            se = np.sqrt(self.var_coef_[i, i])
            t_stat = self.coef_[i] / se if se > 0 else np.nan
            print(f" {name:15s}: {self.coef_[i]:10.4f} (SE: {se:8.4f}, t: {t_stat:8.2f})")
        else:
            print(f" {name:15s}: {self.coef_[i]:10.4f}")
    print()

    print("Model fit statistics:")
    print(f" sigma^2: {self.sigma2_:.6f}")
    print(f" Log-likelihood: {self.loglik_:.2f}")
    print(f" AIC: {self.aic_:.2f}")
    if self.bic_ is not None:
        print(f" BIC: {self.bic_:.2f}")
    else:
        print(f" BIC: N/A")
    print()

    # Residuals and training data are removed by reduce_memory().
    if not self.is_memory_reduced:
        print("Residual statistics:")
        print(f" Mean: {np.mean(self.in_sample_residuals_):.6f}")
        print(f" Std Dev: {np.std(self.in_sample_residuals_, ddof=1):.6f}")
        print(f" MAE: {np.mean(np.abs(self.in_sample_residuals_)):.6f}")
        print(f" RMSE: {np.sqrt(np.mean(self.in_sample_residuals_**2)):.6f}")
        print()

        print("Time Series Summary Statistics:")
        print(f"Number of observations: {len(self.y_train_)}")
        print(f" Mean: {np.mean(self.y_train_):.4f}")
        print(f" Std Dev: {np.std(self.y_train_, ddof=1):.4f}")
        print(f" Min: {np.min(self.y_train_):.4f}")
        print(f" 25%: {np.percentile(self.y_train_, 25):.4f}")
        print(f" Median: {np.median(self.y_train_):.4f}")
        print(f" 75%: {np.percentile(self.y_train_, 75):.4f}")
        print(f" Max: {np.max(self.y_train_):.4f}")
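The coefficient table printed by summary() derives each standard error from the diagonal of the variance-covariance matrix and divides the coefficient by it to get a t-statistic. A minimal sketch of that step, with made-up numbers and a hypothetical helper name `coef_table`, could look like this:

```python
import math


def coef_table(coef, var_coef):
    """Return (estimate, standard error, t-statistic) per coefficient.

    `var_coef` is a square variance-covariance matrix given as nested lists.
    """
    rows = []
    for i, value in enumerate(coef):
        se = math.sqrt(var_coef[i][i])  # standard error from the diagonal
        t_stat = value / se if se > 0 else math.nan
        rows.append((value, se, t_stat))
    return rows


# Illustrative values only, not output from a fitted model.
rows = coef_table(
    coef=[0.5, -0.2],
    var_coef=[[0.01, 0.0], [0.0, 0.04]],
)
```

The first coefficient has a standard error of 0.1 and a t-statistic of about 5; the second has a standard error of 0.2 and a t-statistic of about -1.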
Free memory by deleting large attributes after fitting.
This method removes fitted values, residuals, and other intermediate
results that are not strictly necessary for prediction. After calling
this method, certain diagnostic functions (like get_residuals(),
get_fitted_values(), summary()) will no longer work, but prediction
methods will continue to function.
Call this method only if you need to reduce memory usage and don't
need access to diagnostic information.
@check_is_fitted
def reduce_memory(self) -> "Arima":
    """
    Free memory by deleting large attributes after fitting.

    This method removes fitted values, residuals, and other intermediate
    results that are not strictly necessary for prediction. After calling
    this method, certain diagnostic functions (like get_residuals(),
    get_fitted_values(), summary()) will no longer work, but prediction
    methods will continue to function.

    Call this method only if you need to reduce memory usage and don't
    need access to diagnostic information.

    Returns
    -------
    self : Arima
        The estimator with reduced memory footprint.

    """
    attrs_to_delete = ['y_train_', 'fitted_values_', 'in_sample_residuals_']

    for attr in attrs_to_delete:
        if hasattr(self, attr):
            delattr(self, attr)

    self.is_memory_reduced = True

    warnings.warn(
        "Memory reduced. Diagnostic methods (get_residuals, get_fitted_values, "
        "summary, get_score) are no longer available. Prediction methods remain functional.",
        UserWarning
    )

    return self
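The pattern behind reduce_memory() is a flag plus a guard: large attributes are deleted, a flag records that fact, and diagnostic getters check the flag before touching the deleted data. The toy class below sketches that idea under simplified assumptions; `TinyModel` and its error message are hypothetical and stand in for the estimator and its `check_memory_reduced` helper.

```python
class TinyModel:
    """Toy stand-in showing the delete-then-guard pattern."""

    def __init__(self, fitted_values):
        self.fitted_values_ = fitted_values
        self.is_memory_reduced = False

    def reduce_memory(self):
        # Drop the large attribute and remember that it is gone.
        del self.fitted_values_
        self.is_memory_reduced = True
        return self

    def get_fitted_values(self):
        # Guard: fail with a clear message instead of an AttributeError.
        if self.is_memory_reduced:
            raise RuntimeError(
                "get_fitted_values is unavailable after reduce_memory()."
            )
        return self.fitted_values_


model = TinyModel([1.0, 2.0])
before = model.get_fitted_values()
model.reduce_memory()
try:
    model.get_fitted_values()
    raised = False
except RuntimeError:
    raised = True
```

Raising a dedicated error from the guard tells the caller why the data is missing, which a bare AttributeError would not.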
A universal scikit-learn style wrapper for statsmodels SARIMAX.
This class wraps the statsmodels.tsa.statespace.sarimax.SARIMAX model [1]_ [2]_
to follow the scikit-learn style. The following docstring is based on the
statsmodels documentation and it is highly recommended to visit their site
for the best level of detail.
order : tuple, default (1, 0, 0)
    The (p,d,q) order of the model for the number of AR parameters, differences,
    and MA parameters.
    d must be an integer indicating the integration order of the process.
    p and q may either be integers indicating the AR and MA orders
    (so that all lags up to those orders are included) or else iterables
    giving specific AR and / or MA lags to include.
seasonal_order : tuple, default (0, 0, 0, 0)
    The (P,D,Q,s) order of the seasonal component of the model for the AR
    parameters, differences, MA parameters, and periodicity.
    D must be an integer indicating the integration order of the process.
    P and Q may either be integers indicating the AR and MA orders
    (so that all lags up to those orders are included) or else iterables
    giving specific AR and / or MA lags to include.
    s is an integer giving the periodicity (number of periods in season),
    often it is 4 for quarterly data or 12 for monthly data.
trend : str, default None
    Parameter controlling the deterministic trend polynomial A(t).
    'c' indicates a constant (i.e. a degree zero component of the
    trend polynomial).
    't' indicates a linear trend with time.
    'ct' indicates both, 'c' and 't'.
    Can also be specified as an iterable defining the non-zero polynomial
    exponents to include, in increasing order. For example, [1,1,0,1]
    denotes a + b*t + c*t^3.
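To make the iterable form of the trend specification concrete, the sketch below expands an indicator list such as [1,1,0,1] into the exponents it selects and the corresponding trend regressors for the first few time points. It only illustrates the convention described above; `trend_terms` is a hypothetical helper and not how statsmodels builds the trend internally.

```python
def trend_terms(poly_spec, nobs, trend_offset=1):
    """Expand a trend indicator list into exponents and regressor rows.

    poly_spec[k] == 1 means that the term t**k is part of the trend.
    Time starts at `trend_offset`, matching the default of 1.
    """
    exponents = [k for k, flag in enumerate(poly_spec) if flag]
    times = range(trend_offset, trend_offset + nobs)
    rows = [[t ** k for k in exponents] for t in times]
    return exponents, rows


exponents, rows = trend_terms([1, 1, 0, 1], nobs=3)
```

For [1,1,0,1] the selected exponents are 0, 1 and 3, so each row holds the values of 1, t and t^3 that multiply a, b and c.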
time_varying_regression : bool, default False
    Used when explanatory variables, exog, are provided, to select whether
    or not coefficients on the exogenous regressors are allowed to vary over time.
mle_regression : bool, default True
    Whether or not to estimate the regression coefficients for the
    exogenous variables as part of maximum likelihood estimation or through
    the Kalman filter (i.e. recursive least squares). If
    time_varying_regression is True, this must be set to False.
simple_differencing : bool, default False
    Whether or not to use partially conditional maximum likelihood
    estimation.
    If True, differencing is performed prior to estimation, which
    discards the first s*D + d initial rows but results in a smaller
    state-space formulation.
    If False, the full SARIMAX model is put in state-space form so
    that all data points can be used in estimation.
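The count of discarded rows follows directly from how differencing works: each regular difference drops one observation and each seasonal difference drops s. A small sketch with plain lists, using hypothetical helper names, shows the s*D + d figure for a monthly example:

```python
def difference(x, lag=1):
    """Lagged difference; the result is `lag` elements shorter than `x`."""
    return [x[i] - x[i - lag] for i in range(lag, len(x))]


def simple_difference(x, d=0, D=0, s=0):
    """Apply D seasonal differences of period s, then d regular differences."""
    for _ in range(D):
        x = difference(x, lag=s)
    for _ in range(d):
        x = difference(x, lag=1)
    return x


series = [float(t) for t in range(30)]  # a pure linear trend
diffed = simple_difference(series, d=1, D=1, s=12)
rows_lost = len(series) - len(diffed)
```

With d=1, D=1 and s=12, 13 of the 30 rows are lost, and the linear trend is reduced to zeros.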
concentrate_scale : bool, default False
    Whether or not to concentrate the scale (variance of the error term)
    out of the likelihood. This reduces the number of parameters estimated
    by maximum likelihood by one, but standard errors will then not
    be available for the scale parameter.
trend_offset : int, default 1
    The offset at which to start time trend values. Default is 1, so that
    if trend='t' the trend is equal to 1, 2, ..., nobs. Typically only
    set when the model is created by extending a previous dataset.
use_exact_diffuse : bool, default False
    Whether or not to use exact diffuse initialization for non-stationary
    states. Default is False (in which case approximate diffuse
    initialization is used).
sm_fit_kwargs : dict, default {}
    Additional keyword arguments to pass to the fit method of the
    statsmodels SARIMAX model. The statsmodels SARIMAX.fit parameters
    method, maxiter, start_params and disp have been moved to the
    initialization of this model and will have priority over those provided
    by the user via sm_fit_kwargs.
Format of the object returned by the predict method. This is set
automatically according to the type of y used in the fit method to
train the model, 'numpy' or 'pandas'.
def __init__(
    self,
    order: tuple = (1, 0, 0),
    seasonal_order: tuple = (0, 0, 0, 0),
    trend: str = None,
    measurement_error: bool = False,
    time_varying_regression: bool = False,
    mle_regression: bool = True,
    simple_differencing: bool = False,
    enforce_stationarity: bool = True,
    enforce_invertibility: bool = True,
    hamilton_representation: bool = False,
    concentrate_scale: bool = False,
    trend_offset: int = 1,
    use_exact_diffuse: bool = False,
    dates=None,
    freq=None,
    missing='none',
    validate_specification: bool = True,
    method: str = 'lbfgs',
    maxiter: int = 50,
    start_params: np.ndarray = None,
    disp: bool = False,
    sm_init_kwargs: dict[str, object] = {},
    sm_fit_kwargs: dict[str, object] = {},
    sm_predict_kwargs: dict[str, object] = {}
) -> None:

    self.order = order
    self.seasonal_order = seasonal_order
    self.trend = trend
    self.measurement_error = measurement_error
    self.time_varying_regression = time_varying_regression
    self.mle_regression = mle_regression
    self.simple_differencing = simple_differencing
    self.enforce_stationarity = enforce_stationarity
    self.enforce_invertibility = enforce_invertibility
    self.hamilton_representation = hamilton_representation
    self.concentrate_scale = concentrate_scale
    self.trend_offset = trend_offset
    self.use_exact_diffuse = use_exact_diffuse
    self.dates = dates
    self.freq = freq
    self.missing = missing
    self.validate_specification = validate_specification
    self.method = method
    self.maxiter = maxiter
    self.start_params = start_params
    self.disp = disp

    # Create the dictionaries with the additional statsmodels parameters to be
    # used during the init, fit and predict methods. Note that the statsmodels
    # SARIMAX.fit parameters `method`, `maxiter`, `start_params` and `disp`
    # have been moved to the initialization of this model and will have
    # priority over those provided by the user via `sm_fit_kwargs`.
    self.sm_init_kwargs = sm_init_kwargs
    self.sm_fit_kwargs = sm_fit_kwargs
    self.sm_predict_kwargs = sm_predict_kwargs

    # Params that can be set with the `set_params` method
    _, _, _, _sarimax_params = inspect.getargvalues(inspect.currentframe())
    self._sarimax_params = {
        k: v for k, v in _sarimax_params.items()
        if k not in ['self', '_', '_sarimax_params']
    }

    self._consolidate_kwargs()

    # Create Results Attributes
    self.output_type = None
    self.sarimax = None
    self.is_fitted = False
    self.sarimax_res = None
    self.training_index = None

    p, d, q = self.order
    P, D, Q, m = self.seasonal_order
    self.estimator_name_ = f"Sarimax({p},{d},{q})({P},{D},{Q})[{m}]"
Create the dictionaries to be used during the init, fit, and predict methods.
Note that the parameters in this model's initialization take precedence
over those provided by the user via the statsmodels kwargs dicts.
def _consolidate_kwargs(self) -> None:
    """
    Create the dictionaries to be used during the init, fit, and predict methods.
    Note that the parameters in this model's initialization take precedence
    over those provided by the user using via the statsmodels kwargs dicts.

    Parameters
    ----------
    self

    Returns
    -------
    None

    """
    # statsmodels.tsa.statespace.SARIMAX parameters
    _init_kwargs = self.sm_init_kwargs.copy()
    _init_kwargs.update({
        'order': self.order,
        'seasonal_order': self.seasonal_order,
        'trend': self.trend,
        'measurement_error': self.measurement_error,
        'time_varying_regression': self.time_varying_regression,
        'mle_regression': self.mle_regression,
        'simple_differencing': self.simple_differencing,
        'enforce_stationarity': self.enforce_stationarity,
        'enforce_invertibility': self.enforce_invertibility,
        'hamilton_representation': self.hamilton_representation,
        'concentrate_scale': self.concentrate_scale,
        'trend_offset': self.trend_offset,
        'use_exact_diffuse': self.use_exact_diffuse,
        'dates': self.dates,
        'freq': self.freq,
        'missing': self.missing,
        'validate_specification': self.validate_specification
    })
    self._init_kwargs = _init_kwargs

    # statsmodels.tsa.statespace.SARIMAX.fit parameters
    _fit_kwargs = self.sm_fit_kwargs.copy()
    _fit_kwargs.update({
        'method': self.method,
        'maxiter': self.maxiter,
        'start_params': self.start_params,
        'disp': self.disp,
    })
    self._fit_kwargs = _fit_kwargs

    # statsmodels.tsa.statespace.SARIMAXResults.get_forecast parameters
    self._predict_kwargs = self.sm_predict_kwargs.copy()
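The precedence rule comes from the order of operations: the user's dictionary is copied first and then updated with the values set at initialization, so any key present in both ends up with the initialization value. A minimal sketch of that merge, with a hypothetical function name and illustrative keys, is:

```python
def consolidate(user_kwargs, init_params):
    """Merge two dicts so that `init_params` wins on key collisions."""
    merged = dict(user_kwargs)  # copy, so the caller's dict is not mutated
    merged.update(init_params)  # later update overrides earlier values
    return merged


# Illustrative values: the user asks for maxiter=10 through the kwargs dict,
# but the estimator was initialized with maxiter=50.
user = {"maxiter": 10, "cov_type": "robust"}
fit_kwargs = consolidate(user, {"method": "lbfgs", "maxiter": 50})
```

Keys that appear only in the user's dictionary, such as `cov_type` here, pass through unchanged.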
A helper method to create a new statsmodels SARIMAX model.
Additional keyword arguments to pass to the statsmodels SARIMAX model
when it is initialized can be added with the sm_init_kwargs argument
when initializing the model.
def _create_sarimax(
    self,
    endog: np.ndarray | pd.Series | pd.DataFrame,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> None:
    """
    A helper method to create a new statsmodels SARIMAX model.

    Additional keyword arguments to pass to the statsmodels SARIMAX model
    when it is initialized can be added with the `sm_init_kwargs` argument
    when initializing the model.

    Parameters
    ----------
    endog : numpy ndarray, pandas Series, pandas DataFrame
        The endogenous variable.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        The exogenous variables.

    Returns
    -------
    None

    """
    self.sarimax = SARIMAX(endog=endog, exog=exog, **self._init_kwargs)
Additional keyword arguments to pass to the fit method of the
statsmodels SARIMAX model can be added with the fit_kwargs argument
when initializing the model.
Parameters:
Name
Type
Description
Default
y
numpy ndarray, pandas Series, pandas DataFrame
Training time series.
required
exog
numpy ndarray, pandas Series, pandas DataFrame
Exogenous variable/s included as predictor/s. Must have the same
number of observations as y and their indexes must be aligned so
that y[i] is regressed on exog[i].
def fit(
    self,
    y: np.ndarray | pd.Series | pd.DataFrame,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> None:
    """
    Fit the model to the data.

    Additional keyword arguments to pass to the `fit` method of the
    statsmodels SARIMAX model can be added with the `fit_kwargs` argument
    when initializing the model.

    Parameters
    ----------
    y : numpy ndarray, pandas Series, pandas DataFrame
        Training time series.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        Exogenous variable/s included as predictor/s. Must have the same
        number of observations as `y` and their indexes must be aligned so
        that y[i] is regressed on exog[i].

    Returns
    -------
    None

    """

    # Reset values in case the model has already been fitted.
    self.output_type = None
    self.sarimax_res = None
    self.is_fitted = False
    self.training_index = None

    self.output_type = 'numpy' if isinstance(y, np.ndarray) else 'pandas'

    self._create_sarimax(endog=y, exog=exog)
    self.sarimax_res = self.sarimax.fit(**self._fit_kwargs)
    self.is_fitted = True

    if self.output_type == 'pandas':
        self.training_index = y.index
Forecast future values and, if desired, their confidence intervals.
Generate predictions (forecasts) n steps in the future with confidence
intervals. Note that if exogenous variables were used in the model fit,
they will be expected for the predict procedure and will fail otherwise.
Additional keyword arguments to pass to the get_forecast method of the
statsmodels SARIMAX model can be added with the predict_kwargs argument
when initializing the model.
@check_is_fitted
def predict(
    self,
    steps: int,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    return_conf_int: bool = False,
    alpha: float = 0.05
) -> np.ndarray | pd.DataFrame:
    """
    Forecast future values and, if desired, their confidence intervals.

    Generate predictions (forecasts) n steps in the future with confidence
    intervals. Note that if exogenous variables were used in the model fit,
    they will be expected for the predict procedure and will fail otherwise.

    Additional keyword arguments to pass to the `get_forecast` method of the
    statsmodels SARIMAX model can be added with the `predict_kwargs` argument
    when initializing the model.

    Parameters
    ----------
    steps : int
        Number of steps to predict.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        Value of the exogenous variable/s for the next steps. The number of
        observations needed is the number of steps to predict.
    return_conf_int : bool, default False
        Whether to get the confidence intervals of the forecasts.
    alpha : float, default 0.05
        The confidence intervals for the forecasts are (1 - alpha) %.

    Returns
    -------
    predictions : numpy ndarray, pandas DataFrame
        Values predicted by the forecaster and their estimated interval. The
        output type is the same as the type of `y` used in the fit method.

        - pred: predictions.
        - lower_bound: lower bound of the interval. (if `return_conf_int`)
        - upper_bound: upper bound of the interval. (if `return_conf_int`)

    """

    # This is done because statsmodels doesn't allow `exog` length greater than
    # the number of steps
    if exog is not None and len(exog) > steps:
        warnings.warn(
            f"When predicting using exogenous variables, the `exog` parameter "
            f"must have the same length as the number of predicted steps. Since "
            f"len(exog) > steps, only the first {steps} observations are used."
        )
        exog = exog[:steps]

    predictions = self.sarimax_res.get_forecast(
        steps=steps, exog=exog, **self._predict_kwargs
    )

    if not return_conf_int:
        predictions = predictions.predicted_mean
        if self.output_type == 'pandas':
            predictions = predictions.rename("pred").to_frame()
    else:
        if self.output_type == 'numpy':
            predictions = np.column_stack(
                [predictions.predicted_mean, predictions.conf_int(alpha=alpha)]
            )
        else:
            predictions = pd.concat(
                (predictions.predicted_mean, predictions.conf_int(alpha=alpha)),
                axis=1
            )
            predictions.columns = ['pred', 'lower_bound', 'upper_bound']

    return predictions
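The `exog` length guard at the top of `predict` can be isolated as a small function. The sketch below is an illustration under assumptions: `trim_exog` is a hypothetical name, and a plain list stands in for the array or DataFrame that the wrapper would receive.

```python
import warnings


def trim_exog(exog, steps: int):
    """Keep only the first `steps` rows of `exog`, warning when rows are dropped."""
    if exog is not None and len(exog) > steps:
        warnings.warn(
            f"len(exog) > steps, only the first {steps} observations are used."
        )
        exog = exog[:steps]
    return exog


# Five future exog rows supplied, but only three steps requested.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    trimmed = trim_exog([10, 11, 12, 13, 14], steps=3)

print(trimmed, len(caught))
```

Note that the guard only handles the case where `exog` is too long. An `exog` shorter than `steps` is passed through unchanged, so any resulting error comes from the underlying forecasting call.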
Recreate the results object with new data appended to the original data.
Creates a new result object applied to a dataset that is created by
appending new data to the end of the model's original data [1]_. The new
results can then be used for analysis or forecasting.
Parameters:
Name
Type
Description
Default
y
numpy ndarray, pandas Series, pandas DataFrame
New observations from the modeled time-series process.
required
exog
numpy ndarray, pandas Series, pandas DataFrame
New observations of exogenous variables, if applicable. Must have
the same number of observations as y and their indexes must be
aligned so that y[i] is regressed on exog[i].
None
refit
bool
Whether to re-fit the parameters, based on the combined dataset.
False
copy_initialization
bool
Whether or not to copy the initialization from the current results
set to the new model.
False
**kwargs
Keyword arguments may be used to modify model specification arguments
when creating the new model object.
{}
Returns:
Type
Description
None
Notes
The y and exog arguments to this method must be formatted in the same
way (e.g. Pandas Series versus Numpy array) as were the y and exog
arrays passed to the original model.
The y argument to this method should consist of new observations that
occurred directly after the last element of y. For any other kind of
dataset, see the apply method.
This method will apply filtering to all of the original data as well as
to the new data. To apply filtering only to the new data (which can be
much faster if the original dataset is large), see the extend method.
@check_is_fitted
def append(
    self,
    y: np.ndarray | pd.Series | pd.DataFrame,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    refit: bool = False,
    copy_initialization: bool = False,
    **kwargs
) -> None:
    """
    Recreate the results object with new data appended to the original data.

    Creates a new result object applied to a dataset that is created by
    appending new data to the end of the model's original data [1]_. The new
    results can then be used for analysis or forecasting.

    Parameters
    ----------
    y : numpy ndarray, pandas Series, pandas DataFrame
        New observations from the modeled time-series process.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        New observations of exogenous variables, if applicable. Must have
        the same number of observations as `y` and their indexes must be
        aligned so that y[i] is regressed on exog[i].
    refit : bool, default False
        Whether to re-fit the parameters, based on the combined dataset.
    copy_initialization : bool, default False
        Whether or not to copy the initialization from the current results
        set to the new model.
    **kwargs
        Keyword arguments may be used to modify model specification arguments
        when creating the new model object.

    Returns
    -------
    None

    Notes
    -----
    The `y` and `exog` arguments to this method must be formatted in the same
    way (e.g. Pandas Series versus Numpy array) as were the `y` and `exog`
    arrays passed to the original model.

    The `y` argument to this method should consist of new observations that
    occurred directly after the last element of `y`. For any other kind of
    dataset, see the apply method.

    This method will apply filtering to all of the original data as well as
    to the new data. To apply filtering only to the new data (which can be
    much faster if the original dataset is large), see the extend method.

    References
    ----------
    .. [1] Statsmodels MLEResults append API Reference.
           https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.mlemodel.MLEResults.append.html#statsmodels.tsa.statespace.mlemodel.MLEResults.append

    """

    fit_kwargs = self._fit_kwargs if refit else None

    self.sarimax_res = self.sarimax_res.append(
        endog=y,
        exog=exog,
        refit=refit,
        copy_initialization=copy_initialization,
        fit_kwargs=fit_kwargs,
        **kwargs
    )
Apply the fitted parameters to new data unrelated to the original data.
Creates a new result object using the current fitted parameters, applied
to a completely new dataset that is assumed to be unrelated to the model's
original data [1]_. The new results can then be used for analysis or forecasting.
Parameters:
Name
Type
Description
Default
y
numpy ndarray, pandas Series, pandas DataFrame
New observations from the modeled time-series process.
required
exog
numpy ndarray, pandas Series, pandas DataFrame
New observations of exogenous variables, if applicable. Must have
the same number of observations as y and their indexes must be
aligned so that y[i] is regressed on exog[i].
None
refit
bool
Whether to re-fit the parameters, using the new dataset.
False
copy_initialization
bool
Whether or not to copy the initialization from the current results
set to the new model.
False
**kwargs
Keyword arguments may be used to modify model specification arguments
when creating the new model object.
{}
Returns:
Type
Description
None
Notes
The y argument to this method should consist of new observations that
are not necessarily related to the original model's y dataset. For
observations that continue the original dataset by following directly
after its last element, see the append and extend methods.
@check_is_fitted
def apply(
    self,
    y: np.ndarray | pd.Series | pd.DataFrame,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    refit: bool = False,
    copy_initialization: bool = False,
    **kwargs
) -> None:
    """
    Apply the fitted parameters to new data unrelated to the original data.

    Creates a new result object using the current fitted parameters, applied
    to a completely new dataset that is assumed to be unrelated to the model's
    original data [1]_. The new results can then be used for analysis or
    forecasting.

    Parameters
    ----------
    y : numpy ndarray, pandas Series, pandas DataFrame
        New observations from the modeled time-series process.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        New observations of exogenous variables, if applicable. Must have
        the same number of observations as `y` and their indexes must be
        aligned so that y[i] is regressed on exog[i].
    refit : bool, default False
        Whether to re-fit the parameters, using the new dataset.
    copy_initialization : bool, default False
        Whether or not to copy the initialization from the current results
        set to the new model.
    **kwargs
        Keyword arguments may be used to modify model specification arguments
        when creating the new model object.

    Returns
    -------
    None

    Notes
    -----
    The `y` argument to this method should consist of new observations that
    are not necessarily related to the original model's `y` dataset. For
    observations that continue the original dataset by following directly
    after its last element, see the append and extend methods.

    References
    ----------
    .. [1] Statsmodels MLEResults apply API Reference.
           https://www.statsmodels.org/stable/generated/statsmodels.tsa.statespace.mlemodel.MLEResults.apply.html#statsmodels.tsa.statespace.mlemodel.MLEResults.apply

    """

    fit_kwargs = self._fit_kwargs if refit else None

    self.sarimax_res = self.sarimax_res.apply(
        endog=y,
        exog=exog,
        refit=refit,
        copy_initialization=copy_initialization,
        fit_kwargs=fit_kwargs,
        **kwargs
    )
Recreate the results object for new data that extends the original data.
Creates a new result object applied to a new dataset that is assumed to
follow directly from the end of the model's original data [1]_. The new
results can then be used for analysis or forecasting.
Parameters:
Name
Type
Description
Default
y
numpy ndarray, pandas Series, pandas DataFrame
New observations from the modeled time-series process.
required
exog
numpy ndarray, pandas Series, pandas DataFrame
New observations of exogenous variables, if applicable. Must have
the same number of observations as y and their indexes must be
aligned so that y[i] is regressed on exog[i].
None
**kwargs
Keyword arguments may be used to modify model specification arguments
when creating the new model object.
{}
Returns:
Type
Description
None
Notes
The y argument to this method should consist of new observations that
occurred directly after the last element of the model's original y
array. For any other kind of dataset, see the apply method.
This method will apply filtering only to the new data provided by the y
argument, which can be much faster than re-filtering the entire dataset.
However, the returned results object will only have results for the new
data. To retrieve results for both the new data and the original data,
see the append method.
@check_is_fitted
def extend(
    self,
    y: np.ndarray | pd.Series | pd.DataFrame,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    **kwargs
) -> None:
    """
    Recreate the results object for new data that extends the original data.

    Creates a new result object applied to a new dataset that is assumed to
    follow directly from the end of the model's original data [1]_. The new
    results can then be used for analysis or forecasting.

    Parameters
    ----------
    y : numpy ndarray, pandas Series, pandas DataFrame
        New observations from the modeled time-series process.
    exog : numpy ndarray, pandas Series, pandas DataFrame, default None
        New observations of exogenous variables, if applicable. Must have
        the same number of observations as `y` and their indexes must be
        aligned so that y[i] is regressed on exog[i].
    **kwargs
        Keyword arguments may be used to modify model specification arguments
        when creating the new model object.

    Returns
    -------
    None

    Notes
    -----
    The `y` argument to this method should consist of new observations that
    occurred directly after the last element of the model's original `y`
    array. For any other kind of dataset, see the apply method.

    This method will apply filtering only to the new data provided by the `y`
    argument, which can be much faster than re-filtering the entire dataset.
    However, the returned results object will only have results for the new
    data. To retrieve results for both the new data and the original data,
    see the append method.

    References
    ----------
    .. [1] Statsmodels MLEResults extend API Reference.
           https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.mlemodel.MLEResults.extend.html#statsmodels.tsa.statespace.mlemodel.MLEResults.extend

    """

    self.sarimax_res = self.sarimax_res.extend(endog=y, exog=exog, **kwargs)
def set_params(self, **params: dict[str, object]) -> None:
    """
    Set new values to the parameters of the estimator.

    Parameters
    ----------
    params : dict
        Parameters values.

    Returns
    -------
    None

    """

    params = {k: v for k, v in params.items() if k in self._sarimax_params}
    for key, value in params.items():
        setattr(self, key, value)
        self._sarimax_params[key] = value

    self._consolidate_kwargs()

    # Reset values in case the model has already been fitted.
    self.output_type = None
    self.sarimax_res = None
    self.is_fitted = False
    self.training_index = None
def get_params(self, deep: bool = True) -> dict[str, object]:
    """
    Get the non trainable parameters of the estimator. This method is
    different from the `params` method, which returns the parameters of the
    fitted model.

    Parameters
    ----------
    deep : bool, default True
        If `True`, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict
        Parameters of the estimator.

    """

    return self._sarimax_params.copy()
Get the parameters of the model. The order of variables is the trend
coefficients, the k_exog exogenous coefficients, the k_ar AR
coefficients, and finally the k_ma MA coefficients.
@check_is_fitted
def params(self) -> np.ndarray | pd.Series:
    """
    Get the parameters of the model. The order of variables is the trend
    coefficients, the `k_exog` exogenous coefficients, the `k_ar` AR
    coefficients, and finally the `k_ma` MA coefficients.

    Returns
    -------
    params : numpy ndarray, pandas Series
        The parameters of the model.

    """

    return self.sarimax_res.params
@check_is_fitted
def summary(self, alpha: float = 0.05, start: int = None) -> object:
    """
    Get a summary of the SARIMAXResults object.

    Parameters
    ----------
    alpha : float, default 0.05
        The confidence intervals for the forecasts are (1 - alpha) %.
    start : int, default None
        Integer of the start observation.

    Returns
    -------
    summary : Summary instance
        This holds the summary table and text, which can be printed or
        converted to various output formats.

    """

    return self.sarimax_res.summary(alpha=alpha, start=start)
@check_is_fitted
def get_info_criteria(
    self,
    criteria: str = 'aic',
    method: str = 'standard'
) -> float:
    """
    Get the selected information criteria.

    Check https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAXResults.info_criteria.html
    to know more about statsmodels info_criteria method.

    Parameters
    ----------
    criteria : str, default 'aic'
        The information criteria to compute. Valid options are {'aic', 'bic',
        'hqic'}.
    method : str, default 'standard'
        The method for information criteria computation. Default is 'standard'
        method; 'lutkepohl' computes the information criteria as in Lütkepohl
        (2007).

    Returns
    -------
    metric : float
        The value of the selected information criteria.

    """

    if criteria not in ['aic', 'bic', 'hqic']:
        raise ValueError(
            "Invalid value for `criteria`. Valid options are 'aic', 'bic', "
            "and 'hqic'."
        )

    if method not in ['standard', 'lutkepohl']:
        raise ValueError(
            "Invalid value for `method`. Valid options are 'standard' and "
            "'lutkepohl'."
        )

    metric = self.sarimax_res.info_criteria(criteria=criteria, method=method)

    return metric
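For orientation, the three criteria accepted by `get_info_criteria` have simple textbook definitions in terms of the log-likelihood. The sketch below uses those general formulas; the function name and the example numbers are hypothetical, and it does not reproduce how statsmodels counts parameters or its 'lutkepohl' variant.

```python
import math


def information_criteria(llf: float, n_obs: int, n_params: int) -> dict:
    """Textbook AIC, BIC and HQIC from a log-likelihood value."""
    return {
        "aic": -2.0 * llf + 2.0 * n_params,
        "bic": -2.0 * llf + n_params * math.log(n_obs),
        "hqic": -2.0 * llf + 2.0 * n_params * math.log(math.log(n_obs)),
    }


# Illustrative values: log-likelihood -100, 50 observations, 3 parameters.
ic = information_criteria(llf=-100.0, n_obs=50, n_params=3)
print(ic)
```

Lower values indicate a better trade-off between fit and complexity, and values are only comparable between models fitted to the same data.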
Scikit-learn style wrapper for the ETS (Error, Trend, Seasonality) model.
This estimator treats a univariate time series as input. Call fit(y)
with a 1D array-like of observations in time order, then produce
out-of-sample forecasts via predict(steps) and prediction intervals
via predict_interval(steps, level=...). In-sample diagnostics are
available through fitted_, residuals_() and summary().
Three-letter model specification (e.g., "ANN", "AAA", "MAM"):
- First letter: Error type (A=Additive, M=Multiplicative, Z=Auto)
- Second letter: Trend type (N=None, A=Additive, M=Multiplicative, Z=Auto)
- Third letter: Season type (N=None, A=Additive, M=Multiplicative, Z=Auto)
Use "ZZZ" or None for automatic model selection.
Three-letter model specification (e.g., "ANN", "AAA", "MAM"). Each letter
represents error, trend, and season types respectively, using A (Additive),
M (Multiplicative), N (None), or Z (Auto-select).
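The three-letter specification can be decoded mechanically. The following sketch is illustrative only: `parse_ets_spec` and the lookup tables are hypothetical names, and the letter meanings are the ones listed above.

```python
ERROR_TYPES = {"A": "additive", "M": "multiplicative", "Z": "auto"}
COMPONENT_TYPES = {"N": "none", "A": "additive", "M": "multiplicative", "Z": "auto"}


def parse_ets_spec(spec):
    """Split a spec such as 'MAM' into error, trend and season types."""
    if spec is None:
        spec = "ZZZ"  # None means fully automatic selection
    if len(spec) != 3:
        raise ValueError(f"Expected three letters, got {spec!r}.")
    error, trend, season = spec.upper()
    if error not in ERROR_TYPES:
        raise ValueError(f"Invalid error type {error!r}.")
    if trend not in COMPONENT_TYPES or season not in COMPONENT_TYPES:
        raise ValueError(f"Invalid trend or season type in {spec!r}.")
    return {
        "error": ERROR_TYPES[error],
        "trend": COMPONENT_TYPES[trend],
        "season": COMPONENT_TYPES[season],
    }


print(parse_ets_spec("MAM"))
print(parse_ets_spec("ANN"))
```

The error component has no "N" option, which is why it uses a separate lookup table in this sketch.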
Whether to apply damping to the trend component. If None with model="ZZZ"
or model=None, both damped and non-damped models are evaluated during
automatic selection.
User-provided smoothing parameter for the trend component (0 < beta < alpha).
When None, the parameter is estimated during fitting if trend is present.
User-provided smoothing parameter for the seasonal component (0 < gamma < 1-alpha).
When None, the parameter is estimated during fitting if seasonality is present.
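The two constraints stated above, 0 < beta < alpha and 0 < gamma < 1 - alpha, can be checked before fitting. This is a hypothetical helper written for illustration; it is not part of the estimator and covers only the constraints quoted in the parameter descriptions.

```python
def smoothing_constraint_violations(alpha, beta=None, gamma=None):
    """Return a list of messages for user-provided values that break the constraints."""
    violations = []
    if beta is not None and not (0 < beta < alpha):
        violations.append(f"beta={beta} must satisfy 0 < beta < alpha={alpha}")
    if gamma is not None and not (0 < gamma < 1 - alpha):
        violations.append(f"gamma={gamma} must satisfy 0 < gamma < 1 - alpha={1 - alpha}")
    return violations


print(smoothing_constraint_violations(alpha=0.5, beta=0.1, gamma=0.2))
print(smoothing_constraint_violations(alpha=0.3, beta=0.4))
```

Parameters left as None are skipped, mirroring the documented behaviour that unspecified values are estimated during fitting.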
Type of parameter bounds used during optimization: "usual" for standard bounds,
"admissible" for stability-ensuring bounds, or "both" for their intersection.
Whether trend models are considered during automatic model selection. When None
with model="ZZZ" or model=None, this is determined automatically based on the data.
Dictionary containing the model configuration including error type, trend type,
seasonal type, damping flag, and seasonal period. Available after calling fit().
Dictionary of estimated model parameters including smoothing parameters (alpha,
beta, gamma, phi) and initial state values. Available after calling fit().
If automatic model selection was used (model="ZZZ" or model=None), this dictionary contains
the parameters of the selected best model. Otherwise, it is None.
def __init__(
    self,
    m: int = 1,
    model: str | None = "ZZZ",
    damped: bool | None = None,
    alpha: float | None = None,
    beta: float | None = None,
    gamma: float | None = None,
    phi: float | None = None,
    lambda_param: float | None = None,
    lambda_auto: bool = False,
    bias_adjust: bool = True,
    bounds: str = "both",
    seasonal: bool = True,
    trend: bool | None = None,
    ic: Literal["aic", "aicc", "bic"] = "aicc",
    allow_multiplicative: bool = True,
    allow_multiplicative_trend: bool = False,
):
    if not isinstance(m, int) or m < 1:
        raise ValueError(
            f"`m` must be a positive integer greater than or equal to 1. "
            f"Got {m}."
        )

    self.m = m
    self.model = model if model is not None else "ZZZ"
    self.damped = damped
    self.alpha = alpha
    self.beta = beta
    self.gamma = gamma
    self.phi = phi
    self.lambda_param = lambda_param
    self.lambda_auto = lambda_auto
    self.bias_adjust = bias_adjust
    self.bounds = bounds
    self.seasonal = seasonal
    self.trend = trend
    self.ic = ic
    self.allow_multiplicative = allow_multiplicative
    self.allow_multiplicative_trend = allow_multiplicative_trend

    self.model_ = None
    self.model_config_ = None
    self.params_ = None
    self.aic_ = None
    self.bic_ = None
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.n_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.best_params_ = None

    self.is_auto = self.model == "ZZZ"
    if self.is_auto:
        self.estimator_name_ = "AutoEts()"
    else:
        self.estimator_name_ = f"Ets({self.model})"
def fit(self, y: pd.Series | np.ndarray, exog: None = None) -> Ets:
    """
    Fit the ETS model to a univariate time series.

    Parameters
    ----------
    y : array-like of shape (n_samples,)
        Time-ordered numeric sequence.
    exog : Ignored
        Exogenous variables. Ignored, present for API compatibility.

    Returns
    -------
    self : Ets
        Fitted estimator.

    """

    self.model_ = None
    self.model_config_ = None
    self.params_ = None
    self.aic_ = None
    self.bic_ = None
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.n_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.best_params_ = None

    if not isinstance(y, (pd.Series, np.ndarray)):
        raise ValueError("`y` must be a pandas Series or numpy ndarray.")

    y = np.asarray(y, dtype=np.float64)
    if y.ndim == 2 and y.shape[1] == 1:
        # Allow (n, 1) shaped arrays and squeeze to 1D
        y = y.ravel()
    elif y.ndim != 1:
        raise ValueError("`y` must be a 1D array-like sequence.")
    if len(y) < 1:
        raise ValueError("`y` is too short to fit ETS model.")

    # Automatic model selection
    if self.model == "ZZZ":
        self.model_ = auto_ets(
            y,
            m=self.m,
            seasonal=self.seasonal,
            trend=self.trend,
            damped=self.damped,
            ic=self.ic,
            allow_multiplicative=self.allow_multiplicative,
            allow_multiplicative_trend=self.allow_multiplicative_trend,
            lambda_auto=self.lambda_auto,
            verbose=False,
        )
        self.best_params_ = {
            "m": self.model_.config.m,
            "model": f"{self.model_.config.error}{self.model_.config.trend}{self.model_.config.season}",
            "damped": self.model_.config.damped,
            "alpha": self.model_.params.alpha,
            "beta": self.model_.params.beta,
            "gamma": self.model_.params.gamma,
            "phi": self.model_.params.phi,
            "lambda_param": self.lambda_param,
            "lambda_auto": self.lambda_auto,
            "bias_adjust": self.bias_adjust,
            "bounds": self.bounds,
            "seasonal": self.seasonal,
            "trend": self.trend,
            "ic": self.ic,
            "allow_multiplicative": self.allow_multiplicative,
            "allow_multiplicative_trend": self.allow_multiplicative_trend,
        }
    else:
        # Fit specific model
        damped_param = False if self.damped is None else self.damped
        self.model_ = ets(
            y,
            m=self.m,
            model=self.model,
            damped=damped_param,
            alpha=self.alpha,
            beta=self.beta,
            gamma=self.gamma,
            phi=self.phi,
            lambda_param=self.lambda_param,
            lambda_auto=self.lambda_auto,
            bias_adjust=self.bias_adjust,
            bounds=self.bounds,
        )

    # Extract model attributes (use references to avoid duplicating arrays)
    self.model_config_ = asdict(self.model_.config)
    self.params_ = asdict(self.model_.params)
    self.aic_ = self.model_.aic
    self.bic_ = self.model_.bic
    self.y_train_ = self.model_.y_original
    self.fitted_values_ = self.model_.fitted
    self.in_sample_residuals_ = self.model_.residuals
    self.n_features_in_ = 1
    self.is_fitted = True

    model_name = f"{self.model_config_['error']}{self.model_config_['trend']}{self.model_config_['season']}"
    if self.model_config_['damped'] and self.model_config_['trend'] != "N":
        model_name = f"{self.model_config_['error']}{self.model_config_['trend']}d{self.model_config_['season']}"
    self.estimator_name_ = f"Ets({model_name})"

    return self
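The naming rule at the end of `fit` inserts a "d" after the trend letter only when the model is damped and actually has a trend. A standalone sketch of that rule, where `ets_name` is a hypothetical helper:

```python
def ets_name(error: str, trend: str, season: str, damped: bool) -> str:
    """Build the estimator label, marking damped trends with a 'd'."""
    name = f"{error}{trend}{season}"
    if damped and trend != "N":
        name = f"{error}{trend}d{season}"
    return f"Ets({name})"


print(ets_name("A", "A", "N", damped=True))   # Ets(AAdN)
print(ets_name("A", "N", "N", damped=True))   # Ets(ANN)
print(ets_name("M", "A", "M", damped=False))  # Ets(MAM)
```

A damping flag on a model without trend has no effect on the label, since there is no trend component to damp.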
@check_is_fitted
def predict(self, steps: int, exog: None = None) -> np.ndarray:
    """
    Generate mean forecasts steps ahead.

    Parameters
    ----------
    steps : int
        Forecast horizon (must be > 0).
    exog : None
        Exogenous variables. Ignored, present for API compatibility.

    Returns
    -------
    predictions : ndarray of shape (steps,)
        Point forecasts for steps 1..h.

    """

    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    predictions = forecast_ets(
        self.model_, h=steps, bias_adjust=self.bias_adjust, level=None
    )

    return predictions["mean"]
@check_is_fitted
def predict_interval(
    self,
    steps: int = 1,
    level: list[float] | tuple[float, ...] = (80, 95),
    as_frame: bool = True,
    exog: Any = None,
) -> np.ndarray | pd.DataFrame:
    """
    Forecast with prediction intervals.

    Parameters
    ----------
    steps : int, default 1
        Forecast horizon.
    level : list or tuple of float, default (80, 95)
        Confidence levels in percent.
    as_frame : bool, default True
        If True, return a tidy DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If False, return a NumPy ndarray.
    exog : Ignored
        Exogenous variables. Ignored, present for API compatibility.

    Returns
    -------
    predictions : numpy ndarray, pandas DataFrame
        If as_frame=True, pandas DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If as_frame=False, numpy ndarray.

    """

    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    raw_preds = forecast_ets(
        self.model_, h=steps, bias_adjust=self.bias_adjust, level=list(level)
    )

    levels = list(level) if level is not None else []
    n_levels = len(levels)
    mean = np.asarray(raw_preds["mean"])
    predictions = np.empty((steps, 1 + 2 * n_levels), dtype=float)
    predictions[:, 0] = mean
    for i, lv in enumerate(levels):
        lv_int = int(lv)
        lower_key = f"lower_{lv_int}"
        upper_key = f"upper_{lv_int}"
        lower_arr = np.asarray(raw_preds[lower_key])
        upper_arr = np.asarray(raw_preds[upper_key])
        predictions[:, 1 + 2 * i] = lower_arr
        predictions[:, 1 + 2 * i + 1] = upper_arr

    if as_frame:
        col_names = ["mean"]
        for level in levels:
            level = int(level)
            col_names.append(f"lower_{level}")
            col_names.append(f"upper_{level}")
        predictions = pd.DataFrame(
            predictions,
            columns=col_names,
            index=pd.RangeIndex(1, steps + 1, name="step")
        )

    return predictions
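The column layout returned by `predict_interval` follows directly from the requested levels: one `mean` column, then a lower/upper pair per level. The sketch below reproduces only the naming logic; `interval_columns` is a hypothetical helper.

```python
def interval_columns(levels) -> list[str]:
    """Column names in output order: mean, then lower/upper for each level."""
    columns = ["mean"]
    for lv in levels:
        lv = int(lv)  # levels are truncated to integers when naming columns
        columns.append(f"lower_{lv}")
        columns.append(f"upper_{lv}")
    return columns


print(interval_columns((80, 95)))
print(interval_columns((99.5,)))
```

Because the level is truncated with `int()`, a fractional level such as 99.5 maps to `lower_99` and `upper_99`, so two levels that share an integer part would produce duplicate column names.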
Get in-sample residuals (observed - fitted) from the ETS model.
Returns:
Name
Type
Description
residuals
ndarray of shape (n_samples,)
Source code in skforecast\stats\_ets.py
@check_is_fitted
def get_residuals(self) -> np.ndarray:
    """
    Get in-sample residuals (observed - fitted) from the ETS model.

    Returns
    -------
    residuals : ndarray of shape (n_samples,)

    """

    check_memory_reduced(self, method_name='get_residuals')
    return self.in_sample_residuals_
@check_is_fitted
def get_fitted_values(self) -> np.ndarray:
    """
    Get in-sample fitted values from the ETS model.

    Returns
    -------
    fitted : ndarray of shape (n_samples,)

    """

    check_memory_reduced(self, method_name='get_fitted_values')
    return self.fitted_values_
@check_is_fitted
def get_score(self, y: Any = None) -> float:
    """
    R^2 using in-sample fitted values.

    Parameters
    ----------
    y : Ignored
        Present for API compatibility.

    Returns
    -------
    score : float
        Coefficient of determination.

    """

    check_memory_reduced(self, method_name='get_score')

    y = self.y_train_
    fitted = self.fitted_values_

    # Handle NaN values if any
    mask = ~(np.isnan(y) | np.isnan(fitted))
    if mask.sum() < 2:
        return float("nan")

    ss_res = np.sum((y[mask] - fitted[mask]) ** 2)
    ss_tot = np.sum((y[mask] - y[mask].mean()) ** 2) + np.finfo(float).eps

    return 1.0 - ss_res / ss_tot
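The score computed by `get_score` is the usual coefficient of determination, restricted to positions where both the observation and the fitted value are present. A dependency-free sketch of the same calculation follows; `r2_in_sample` is a hypothetical name, and plain lists replace the NumPy arrays used by the estimator.

```python
import math
import sys


def r2_in_sample(y, fitted) -> float:
    """R^2 over pairs where neither the observed nor the fitted value is NaN."""
    pairs = [
        (obs, fit) for obs, fit in zip(y, fitted)
        if not (math.isnan(obs) or math.isnan(fit))
    ]
    if len(pairs) < 2:
        return float("nan")
    mean_y = sum(obs for obs, _ in pairs) / len(pairs)
    ss_res = sum((obs - fit) ** 2 for obs, fit in pairs)
    # Machine epsilon keeps the ratio finite for a constant series.
    ss_tot = sum((obs - mean_y) ** 2 for obs, _ in pairs) + sys.float_info.epsilon
    return 1.0 - ss_res / ss_tot


print(r2_in_sample([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))           # perfect fit
print(r2_in_sample([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))           # mean-only fit
print(r2_in_sample([1.0, float("nan"), 3.0], [1.0, 2.0, 3.0]))  # NaN pair skipped
```

A fit that always predicts the series mean scores approximately zero, and fits worse than that produce negative values.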
def get_params(self, deep: bool = True) -> dict:
    """
    Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default True
        If True, will return the parameters for this estimator and
        contained subobjects that are estimators.

    Returns
    -------
    params : dict
        Parameter names mapped to their values.

    """

    return {
        "m": self.m,
        "model": self.model,
        "damped": self.damped,
        "alpha": self.alpha,
        "beta": self.beta,
        "gamma": self.gamma,
        "phi": self.phi,
        "seasonal": self.seasonal,
        "trend": self.trend,
        "allow_multiplicative": self.allow_multiplicative,
        "allow_multiplicative_trend": self.allow_multiplicative_trend,
    }
@check_is_fitted
def get_feature_importances(self) -> pd.DataFrame:
    """Get feature importances for ETS model."""

    features = ['alpha (level)']
    importances = [self.params_['alpha']]

    if self.model_config_['trend'] != 'N':
        features.append('beta (trend)')
        importances.append(self.params_['beta'])

    if self.model_config_['season'] != 'N':
        features.append('gamma (seasonal)')
        importances.append(self.params_['gamma'])

    if self.model_config_['damped']:
        features.append('phi (damping)')
        importances.append(self.params_['phi'])

    return pd.DataFrame({
        'feature': features,
        'importance': importances
    })
@check_is_fitted
def get_info_criteria(self, criteria: str) -> float:
    """
    Get information criteria.

    Parameters
    ----------
    criteria : str
        Information criterion to retrieve. Valid options are 'aic' and 'bic'.

    Returns
    -------
    info_criteria : float
        Value of the requested information criterion.

    """

    if criteria not in {'aic', 'bic'}:
        raise ValueError(
            "Invalid value for `criteria`. Valid options are 'aic' and 'bic' "
            "for ETS model."
        )

    if criteria == 'aic':
        value = self.aic_
    elif criteria == 'bic':
        value = self.bic_

    return value
Set the parameters of this estimator. Internal method without resetting
the fitted state. This method is intended for internal use only, please
use set_params() instead.
def _set_params(self, **params) -> None:
    """
    Set the parameters of this estimator. Internal method without resetting
    the fitted state. This method is intended for internal use only, please
    use `set_params()` instead.

    Parameters
    ----------
    **params : dict
        Estimator parameters.

    Returns
    -------
    None

    """

    for key, value in params.items():
        setattr(self, key, value)

    self.is_auto = self.model is None or self.model == "ZZZ"
    if self.is_auto:
        self.model = "ZZZ"
        estimator_name_ = "AutoEts()"
    else:
        estimator_name_ = f"Ets({self.model})"
    self.estimator_name_ = estimator_name_
def set_params(self, **params) -> Ets:
    """
    Set the parameters of this estimator and reset the fitted state.

    This method resets the estimator to its unfitted state whenever
    parameters are changed, requiring the model to be refitted before
    making predictions.

    Parameters
    ----------
    **params : dict
        Estimator parameters. Valid parameter keys are: 'm', 'model',
        'damped', 'alpha', 'beta', 'gamma', 'phi', 'lambda_param',
        'lambda_auto', 'bias_adjust', 'bounds', 'seasonal', 'trend', 'ic',
        'allow_multiplicative', 'allow_multiplicative_trend'.

    Returns
    -------
    self : Ets
        The estimator with updated parameters and reset state.

    Raises
    ------
    ValueError
        If any parameter key is invalid.

    """

    valid_params = {
        'm', 'model', 'damped', 'alpha', 'beta', 'gamma', 'phi',
        'lambda_param', 'lambda_auto', 'bias_adjust', 'bounds',
        'seasonal', 'trend', 'ic', 'allow_multiplicative',
        'allow_multiplicative_trend'
    }
    for key in params.keys():
        if key not in valid_params:
            raise ValueError(
                f"Invalid parameter '{key}' for estimator {self.__class__.__name__}. "
                f"Valid parameters are: {sorted(valid_params)}"
            )

    self._set_params(**params)

    # Reset fitted state - model needs to be refitted with new parameters
    self.model_ = None
    self.model_config_ = None
    self.params_ = None
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.n_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.best_params_ = None

    return self
Reduce memory usage by removing internal arrays not needed for prediction.
This method clears memory-heavy arrays that are only needed for diagnostics
but not for prediction. After calling this method, the following methods
will raise an error:
@check_is_fitted
def reduce_memory(self) -> Ets:
    """
    Reduce memory usage by removing internal arrays not needed for prediction.

    This method clears memory-heavy arrays that are only needed for diagnostics
    but not for prediction. After calling this method, the following methods
    will raise an error:

    - fitted_(): In-sample fitted values
    - residuals_(): In-sample residuals
    - score(): R² coefficient
    - summary(): Model summary statistics

    Prediction methods remain fully functional:

    - predict(): Point forecasts
    - predict_interval(): Prediction intervals

    Returns
    -------
    self : Ets
        The estimator with reduced memory usage.

    """

    # Clear arrays at Ets level
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None

    # Clear arrays at ETSModel level
    if hasattr(self, 'model_'):
        self.model_.fitted = None
        self.model_.residuals = None
        self.model_.y_original = None

    self.is_memory_reduced = True

    return self
Scikit-learn style wrapper for the ARAR time-series model.
This estimator treats a univariate sequence as "the feature".
Call fit(y) with a 1D array-like of observations in time order, then
produce out-of-sample forecasts via predict(steps) and prediction intervals
via predict_interval(steps, level=...). In-sample diagnostics are available
through get_fitted_values(), get_residuals() and summary().
Parameters:
max_ar_depth
Maximum AR depth considered for the (1, i, j, k) AR selection stage during
model fitting. When None, a default value is determined automatically based
on the series length.
max_lag
Maximum lag used when estimating autocovariances during the memory-shortening
step. When None, a default value is determined automatically based on the
series length.
safe
Whether to use safe mode. When True, the model falls back to a mean-only
forecast on numerical issues or very short series. When False, errors are
raised instead.
Attributes:
model_
Raw tuple returned by the underlying ARAR algorithm containing:
(Y, best_phi, best_lag, sigma2, psi, sbar, max_ar_depth, max_lag).
Available after calling fit().
coef_
Estimated AR coefficients for the selected lags (1, i, j, k). Some
coefficients may be zero if the corresponding lag was not selected.
Available after calling fit().
lags_
Selected lag indices (1, i, j, k) used in the AR model, where each
represents which past observations contribute to the forecast.
Available after calling fit().
psi_
Memory-shortening filter coefficients used to transform the original
series into one with shorter memory before AR fitting. Available after
calling fit().
aic_
Akaike Information Criterion measuring model fit quality while penalizing
complexity. For models with exogenous variables, this is an approximate
calculation that treats the two-step procedure (regression + ARAR) as
independent stages, which may underestimate total model complexity.
Available after calling fit().
bic_
Bayesian Information Criterion, similar to AIC but with a stronger penalty
for model complexity. For models with exogenous variables, this is an
approximate calculation that treats the two-step procedure (regression +
ARAR) as independent stages, which may underestimate total model complexity.
Available after calling fit().
exog_model_
Fitted linear regression model for exogenous variables. When exogenous
variables are provided during fitting, this model captures their linear
relationship with the target series. Available after calling fit() with
exogenous variables.
n_features_in_
Number of features (time series) seen during fit(). For ARAR, this is
always 1 as it handles univariate time series (present for scikit-learn
compatibility). Available after calling fit().
estimator_name_
String identifier of the fitted model configuration (e.g., "Arar(lags=(1, 2, 3, 4))").
This is updated after fitting to reflect the selected model.
Notes
When exogenous variables are provided during fitting, the model uses a
two-step approach (regression followed by ARAR on residuals). In this
approach, the target series is first regressed on the exogenous variables
using a linear regression model. The residuals from this regression,
representing the portion of the series not explained by the exogenous
variables, are then modeled using the ARAR model.
This design allows the influence of exogenous variables to be incorporated
prior to applying the ARAR model, rather than within the ARAR dynamics
themselves.
This two-step approach is necessary because the ARAR model is inherently
univariate and does not natively support exogenous variables. By separating
the regression step, the method preserves the original ARAR formulation
while still capturing the effects of external predictors.
However, this approach carries important assumptions and implications:
The relationship between the target series and the exogenous variables is
assumed to be linear and time-invariant.
The ARAR model is applied only to the residual process, meaning its
parameters describe the dynamics of the series after removing the
contribution of exogenous variables.
As a result, the interpretability of the ARAR parameters changes: they no
longer describe the full data-generating process, but rather the behavior
of the unexplained component.
Despite these limitations, this strategy provides a practical and
computationally efficient way to incorporate exogenous information into an
otherwise univariate ARAR framework.
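The two-step idea described in these notes can be sketched with NumPy alone. This is only an illustration under stated assumptions: `two_step_fit` and `ar1_forecast` are hypothetical helpers, ordinary least squares stands in for the library's regression step, and a simple AR(1) forecast stands in for the ARAR stage, which is considerably more elaborate.

```python
import numpy as np

def two_step_fit(y, exog):
    """Hypothetical helper: regress y on exog, return intercept, coefficients, residuals."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(exog, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    # Step 1: ordinary least squares with an intercept column.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    # Input for step 2: the part of y the regression does not explain.
    residuals = y - design @ beta
    return beta[0], beta[1:], residuals

def ar1_forecast(residuals, steps):
    """Stand-in for the ARAR stage (assumption): a mean-reverting AR(1) forecast."""
    mean = residuals.mean()
    centered = residuals - mean
    denom = np.dot(centered[:-1], centered[:-1])
    phi = np.dot(centered[1:], centered[:-1]) / denom if denom > 0 else 0.0
    return mean + centered[-1] * phi ** np.arange(1, steps + 1)

# Synthetic data: a linear effect of one exogenous variable plus noise.
rng = np.random.default_rng(0)
exog = rng.normal(size=200)
y = 3.0 + 2.0 * exog + rng.normal(scale=0.1, size=200)

intercept, coef, resid = two_step_fit(y, exog)

# Final forecast = regression prediction + forecast of the residual process.
future_exog = np.array([0.5, -1.0, 0.0])
y_hat = intercept + future_exog * coef[0] + ar1_forecast(resid, steps=3)
```

Because the residual model never sees the exogenous variables, its coefficients describe only the unexplained component, which is the interpretability caveat raised above.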
def fit(
    self,
    y: np.ndarray | pd.Series,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None,
    suppress_warnings: bool = False
) -> "Arar":
    """
    Fit the ARAR model to a univariate time series.

    Parameters
    ----------
    y : array-like of shape (n_samples,)
        Time-ordered numeric sequence.
    exog : Series, DataFrame, or ndarray of shape (n_samples, n_exog_features), default None
        Exogenous variables to include in the model. See Notes section for
        details on how exogenous variables are handled.
    suppress_warnings : bool, default False
        If True, suppresses the warning about exogenous variables affecting
        model interpretation.

    Returns
    -------
    self : Arar
        Fitted estimator.

    Notes
    -----
    When exogenous variables are provided during fitting, the model uses a
    two-step approach (regression followed by ARAR on residuals). In this
    approach, the target series is first regressed on the exogenous variables
    using a linear regression model. The residuals from this regression,
    representing the portion of the series not explained by the exogenous
    variables, are then modeled using the ARAR model.

    This design allows the influence of exogenous variables to be incorporated
    prior to applying the ARAR model, rather than within the ARAR dynamics
    themselves.

    This two-step approach is necessary because the ARAR model is inherently
    univariate and does not natively support exogenous variables. By separating
    the regression step, the method preserves the original ARAR formulation
    while still capturing the effects of external predictors.

    However, this approach carries important assumptions and implications:

    - The relationship between the target series and the exogenous variables is
      assumed to be linear and time-invariant.
    - The ARAR model is applied only to the residual process, meaning its
      parameters describe the dynamics of the series after removing the
      contribution of exogenous variables.
    - As a result, the interpretability of the ARAR parameters changes: they no
      longer describe the full data-generating process, but rather the behavior
      of the unexplained component.

    Despite these limitations, this strategy provides a practical and
    computationally efficient way to incorporate exogenous information into an
    otherwise univariate ARAR framework.

    """
    self.lags_ = None
    self.sigma2_ = None
    self.psi_ = None
    self.sbar_ = None
    self.model_ = None
    self.coef_ = None
    self.aic_ = None
    self.bic_ = None
    self.exog_model_ = None
    self.coef_exog_ = None
    self.n_exog_features_in_ = None
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.n_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False

    if not isinstance(y, (pd.Series, np.ndarray)):
        raise TypeError("`y` must be a pandas Series or numpy ndarray.")

    if not isinstance(exog, (type(None), pd.Series, pd.DataFrame, np.ndarray)):
        raise TypeError(
            "`exog` must be None, a pandas Series, pandas DataFrame, or numpy ndarray."
        )

    y = np.asarray(y, dtype=float)
    if y.ndim == 2 and y.shape[1] == 1:
        y = y.ravel()
    elif y.ndim != 1:
        raise ValueError("`y` must be a 1D array-like sequence.")

    series_to_arar = y

    if exog is not None:
        if not suppress_warnings:
            warnings.warn(
                "Exogenous variables are being handled using a two-step approach: "
                "(1) linear regression on exog, (2) ARAR on residuals. "
                "This affects model interpretation:\n"
                "  - ARAR coefficients (coef_) describe residual dynamics, not the original series\n"
                "  - Pred intervals reflect only ARAR uncertainty, not exog regression uncertainty\n"
                "  - Assumes a linear, time-invariant relationship between exog and target\n"
                "For more details, see the fit() method's Notes section of ARAR class.",
                ExogenousInterpretationWarning
            )

        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1D or 2D.")

        if len(exog) != len(y):
            raise ValueError(
                f"Length of exog ({len(exog)}) must match length of y ({len(y)})"
            )

        self.exog_model_ = FastLinearRegression()
        self.exog_model_.fit(exog, y)
        self.coef_exog_ = self.exog_model_.coef_
        series_to_arar = y - self.exog_model_.predict(exog)

    if series_to_arar.size < 2 and not self.safe:
        raise ValueError("Series too short to fit ARAR when safe=False.")

    self.model_ = arar(
        series_to_arar,
        max_ar_depth=self.max_ar_depth,
        max_lag=self.max_lag,
        safe=self.safe
    )

    (Y, best_phi, best_lag, sigma2, psi, sbar, max_ar_depth, max_lag) = self.model_

    self.max_ar_depth = max_ar_depth
    self.max_lag = max_lag
    self.lags_ = tuple(best_lag)
    self.sigma2_ = float(sigma2)
    self.psi_ = np.asarray(psi, dtype=float)
    self.sbar_ = float(sbar)
    self.coef_ = np.asarray(best_phi, dtype=float)
    self.y_train_ = y
    self.n_exog_features_in_ = exog.shape[1] if exog is not None else 0
    self.n_features_in_ = 1
    self.is_memory_reduced = False
    self.is_fitted = True

    arar_fitted = fitted_arar(self.model_)["fitted"]
    if self.exog_model_ is not None:
        exog_fitted = self.exog_model_.predict(exog)
        self.fitted_values_ = exog_fitted + arar_fitted
    else:
        self.fitted_values_ = arar_fitted

    # Residuals: original y minus fitted values
    self.in_sample_residuals_ = y - self.fitted_values_

    # Compute AIC and BIC
    # Note: For models with exogenous variables, this is an approximate calculation
    # that treats the two-step procedure (regression + ARAR) as independent stages.
    # This may underestimate model complexity. Use these criteria primarily for
    # comparing models with the same exogenous structure.
    largest_lag = max(self.lags_)
    valid_residuals = self.in_sample_residuals_[largest_lag:]
    # Remove NaN values for AIC/BIC calculation
    valid_residuals = valid_residuals[~np.isnan(valid_residuals)]
    n = len(valid_residuals)
    if n > 0:
        # Count parameters:
        # - ARAR: 4 AR coefficients + 1 mean parameter (sbar) + 1 variance (sigma2) = 6
        # - Exog: n_exog coefficients + 1 intercept (if exog present)
        # Note: We count all 4 AR coefficients even if some are zero, as they were
        # selected during model fitting. The variance parameter sigma2 is also estimated.
        k_arar = 6  # 4 AR coefficients + sbar + sigma2
        k_exog = (self.n_exog_features_in_ + 1) if self.exog_model_ is not None else 0  # +1 for intercept
        k = k_arar + k_exog
        sigma2 = max(np.sum(valid_residuals ** 2) / n, 1e-12)  # Ensure positive
        loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(sigma2) + 1)
        self.aic_ = -2 * loglik + 2 * k
        self.bic_ = -2 * loglik + k * np.log(n)
    else:
        self.aic_ = np.nan
        self.bic_ = np.nan

    self.estimator_name_ = f"Arar(lags={self.lags_})"

    return self

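The information criteria computed at the end of fit() follow the usual Gaussian log-likelihood formulas. A minimal standalone sketch of the same arithmetic is shown below; `gaussian_info_criteria` is a hypothetical helper name, not part of the library, and `k` must be supplied by the caller (in the listing above it is 6 for the ARAR part plus `n_exog + 1` when exogenous variables are present).

```python
import math

def gaussian_info_criteria(residuals, k):
    """Hypothetical helper: AIC and BIC from residuals under a Gaussian likelihood."""
    n = len(residuals)
    if n == 0:
        return math.nan, math.nan
    # Maximum-likelihood variance estimate, floored to keep the log finite.
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)
    loglik = -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)
    aic = -2 * loglik + 2 * k
    bic = -2 * loglik + k * math.log(n)
    return aic, bic

# Four residuals, six parameters (the ARAR-only case in the listing above).
aic, bic = gaussian_info_criteria([0.5, -0.5, 0.5, -0.5], k=6)
```

Both criteria share the same likelihood term, so they differ only in the penalty: `2 * k` for AIC versus `k * log(n)` for BIC.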
@check_is_fitted
def predict(
    self,
    steps: int,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> np.ndarray:
    """
    Generate mean forecasts steps ahead.

    Parameters
    ----------
    steps : int
        Forecast horizon (must be > 0).
    exog : ndarray, Series or DataFrame of shape (steps, n_exog_features), default None
        Exogenous variables for prediction.

    Returns
    -------
    predictions : ndarray of shape (steps,)
        Point forecasts for horizons 1..steps.

    """
    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    # Forecast ARAR component
    predictions = forecast(self.model_, h=steps)["mean"]

    if self.exog_model_ is None and exog is not None:
        raise ValueError(
            "Model was fitted without exog, but `exog` was provided for prediction. "
            "Please refit the model with exogenous variables."
        )

    if self.exog_model_ is not None:
        if exog is None:
            raise ValueError(
                "Model was fitted with exog, so `exog` is required for prediction."
            )
        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1D or 2D.")

        # Check feature consistency
        if exog.shape[1] != self.n_exog_features_in_:
            raise ValueError(
                f"Mismatch in exogenous features: fitted with {self.n_exog_features_in_}, got {exog.shape[1]}."
            )

        if len(exog) != steps:
            raise ValueError(
                f"Length of exog ({len(exog)}) must match steps ({steps})."
            )

        # Forecast Regression component
        exog_pred = self.exog_model_.predict(exog)
        predictions = predictions + exog_pred

    return predictions

as_frame : bool, default True
    If True, return a tidy DataFrame with columns 'mean', 'lower_<L>',
    'upper_<L>' for each level L. If False, return a NumPy ndarray.
exog : ndarray, Series or DataFrame of shape (steps, n_exog_features), default None
    Exogenous variables for prediction.
Returns:
predictions : numpy ndarray, pandas DataFrame
    If as_frame=True, pandas DataFrame with columns 'mean', 'lower_<L>',
    'upper_<L>' for each level L. If as_frame=False, numpy ndarray.
Notes
When exogenous variables are used, prediction intervals account only for
ARAR forecast uncertainty and do not include uncertainty from the regression
coefficients. This may result in undercoverage (actual coverage < nominal level).
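To make the column layout and the role of the exogenous shift concrete, here is a small sketch of how symmetric normal-theory intervals could be assembled. It is an assumption-laden illustration: `assemble_intervals` is a hypothetical helper, and the per-step standard errors `se` are taken as given, whereas in the library they come from the internal ARAR forecast.

```python
from statistics import NormalDist

import numpy as np

def assemble_intervals(mean, se, levels=(80, 95), exog_pred=None):
    """Hypothetical helper: build [mean, lower_L, upper_L, ...] columns."""
    mean = np.asarray(mean, dtype=float)
    se = np.asarray(se, dtype=float)
    # Two-sided normal quantile for each confidence level given in percent.
    z = np.array([NormalDist().inv_cdf(0.5 + lv / 200) for lv in levels])
    lower = mean[:, None] - se[:, None] * z
    upper = mean[:, None] + se[:, None] * z
    if exog_pred is not None:
        # The regression prediction shifts every column by the same amount,
        # so the interval width is unchanged by the exogenous component.
        exog_pred = np.asarray(exog_pred, dtype=float)
        mean = mean + exog_pred
        lower = lower + exog_pred[:, None]
        upper = upper + exog_pred[:, None]
    cols, names = [mean], ["mean"]
    for i, lv in enumerate(levels):
        cols += [lower[:, i], upper[:, i]]
        names += [f"lower_{int(lv)}", f"upper_{int(lv)}"]
    return np.column_stack(cols), names

table, names = assemble_intervals(
    mean=[10.0, 11.0], se=[1.0, 2.0], exog_pred=[0.5, -0.5]
)
```

Since the shift leaves the widths untouched, any uncertainty in the regression coefficients is absent from the intervals, which is the source of the undercoverage mentioned in the note above.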
@check_is_fitted
def predict_interval(
    self,
    steps: int = 1,
    level=(80, 95),
    as_frame: bool = True,
    exog: np.ndarray | pd.Series | pd.DataFrame | None = None
) -> np.ndarray | pd.DataFrame:
    """
    Forecast with symmetric normal-theory prediction intervals.

    Parameters
    ----------
    steps : int, default 1
        Forecast horizon.
    level : iterable of int, default (80, 95)
        Confidence levels in percent.
    as_frame : bool, default True
        If True, return a tidy DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If False, return a NumPy ndarray.
    exog : ndarray, Series or DataFrame of shape (steps, n_exog_features), default None
        Exogenous variables for prediction.

    Returns
    -------
    predictions : numpy ndarray, pandas DataFrame
        If as_frame=True, pandas DataFrame with columns 'mean', 'lower_<L>',
        'upper_<L>' for each level L. If as_frame=False, numpy ndarray.

    Notes
    -----
    When exogenous variables are used, prediction intervals account only for
    ARAR forecast uncertainty and do not include uncertainty from the regression
    coefficients. This may result in **undercoverage** (actual coverage < nominal level).

    """
    if not isinstance(steps, (int, np.integer)) or steps <= 0:
        raise ValueError("`steps` must be a positive integer.")

    raw_preds = forecast(self.model_, h=steps, level=level)

    if self.exog_model_ is None and exog is not None:
        raise ValueError(
            "Model was fitted without exog, but `exog` was provided for prediction. "
            "Please refit the model with exogenous variables."
        )

    if self.exog_model_ is not None:
        if exog is None:
            raise ValueError(
                "Model was fitted with exog, so `exog` is required for prediction."
            )
        exog = np.asarray(exog, dtype=float)
        if exog.ndim == 1:
            exog = exog.reshape(-1, 1)
        elif exog.ndim != 2:
            raise ValueError("`exog` must be 1D or 2D.")

        # Check feature consistency
        if exog.shape[1] != self.n_exog_features_in_:
            raise ValueError(
                f"Mismatch in exogenous features: fitted with {self.n_exog_features_in_}, "
                f"got {exog.shape[1]}."
            )

        if len(exog) != steps:
            raise ValueError(
                f"Length of exog ({len(exog)}) must match steps ({steps})."
            )

        exog_pred = self.exog_model_.predict(exog)
        raw_preds["mean"] = raw_preds["mean"] + exog_pred
        # Broadcast the exog prediction across confidence columns
        raw_preds["upper"] = raw_preds["upper"] + exog_pred[:, np.newaxis]
        raw_preds["lower"] = raw_preds["lower"] + exog_pred[:, np.newaxis]

    levels = raw_preds["level"]
    n_levels = len(levels)
    cols = [raw_preds["mean"]]
    for i in range(n_levels):
        cols.append(raw_preds["lower"][:, i])
        cols.append(raw_preds["upper"][:, i])

    predictions = np.column_stack(cols)

    if as_frame:
        col_names = ["mean"]
        for level in levels:
            level = int(level)
            col_names.append(f"lower_{level}")
            col_names.append(f"upper_{level}")
        predictions = pd.DataFrame(
            predictions,
            columns=col_names,
            index=pd.RangeIndex(1, steps + 1, name="step")
        )

    return predictions

Get in-sample residuals (observed - fitted) from the ARAR model.
Returns:
residuals : ndarray of shape (n_samples,)
Source code in skforecast\stats\_arar.py
@check_is_fitted
def get_residuals(self) -> np.ndarray:
    """
    Get in-sample residuals (observed - fitted) from the ARAR model.

    Returns
    -------
    residuals : ndarray of shape (n_samples,)

    """
    check_memory_reduced(self, method_name='get_residuals')
    return self.in_sample_residuals_

@check_is_fitted
def get_fitted_values(self) -> np.ndarray:
    """
    Get in-sample fitted values from the ARAR model.

    Returns
    -------
    fitted : ndarray of shape (n_samples,)

    """
    check_memory_reduced(self, method_name='get_fitted_values')
    return self.fitted_values_

def get_params(self, deep: bool = True) -> dict:
    """
    Get parameters for this estimator.

    Parameters
    ----------
    deep : bool, default True
        If True, will return the parameters for this estimator and contained
        subobjects that are estimators.

    Returns
    -------
    params : dict
        Parameter names mapped to their values.

    """
    return {
        "max_ar_depth": self.max_ar_depth,
        "max_lag": self.max_lag,
        "safe": self.safe
    }

@check_is_fitted
def get_feature_importances(self) -> pd.DataFrame:
    """Get feature importances for Arar model."""
    importances = pd.DataFrame({
        'feature': [f'lag_{lag}' for lag in self.lags_],
        'importance': self.coef_
    })

    if self.coef_exog_ is not None:
        exog_importances = pd.DataFrame({
            'feature': [f'exog_{i}' for i in range(self.coef_exog_.shape[0])],
            'importance': self.coef_exog_
        })
        importances = pd.concat([importances, exog_importances], ignore_index=True)
        warnings.warn(
            "Exogenous variables are being handled using a two-step approach: "
            "(1) linear regression on exog, (2) ARAR on residuals. "
            "This affects model interpretation:\n"
            "  - ARAR coefficients (coef_) describe residual dynamics, not the original series\n"
            "  - Exogenous coefficients (coef_exog_) describe exogenous impact on original series",
            ExogenousInterpretationWarning
        )

    return importances

@check_is_fitted
def get_info_criteria(self, criteria: str) -> float:
    """
    Get information criteria.

    Parameters
    ----------
    criteria : str
        Information criterion to retrieve. Valid options are 'aic' and 'bic'.

    Returns
    -------
    info_criteria : float
        Value of the requested information criterion.

    """
    if criteria not in {'aic', 'bic'}:
        raise ValueError(
            "Invalid value for `criteria`. Valid options are 'aic' and 'bic' "
            "for ARAR model."
        )

    if criteria == 'aic':
        value = self.aic_
    else:
        value = self.bic_

    return value

def set_params(self, **params) -> "Arar":
    """
    Set the parameters of this estimator and reset the fitted state.

    This method resets the estimator to its unfitted state whenever parameters
    are changed, requiring the model to be refitted before making predictions.

    Parameters
    ----------
    **params : dict
        Estimator parameters. Valid parameter keys are 'max_ar_depth',
        'max_lag', and 'safe'.

    Returns
    -------
    Arar
        The estimator with updated parameters and reset state.

    """
    valid_params = {'max_ar_depth', 'max_lag', 'safe'}
    for key in params.keys():
        if key not in valid_params:
            raise ValueError(
                f"Invalid parameter '{key}' for estimator {self.__class__.__name__}. "
                f"Valid parameters are: {valid_params}"
            )

    for key, value in params.items():
        setattr(self, key, value)

    # Reset fitted state
    self.lags_ = None
    self.sigma2_ = None
    self.psi_ = None
    self.sbar_ = None
    self.model_ = None
    self.coef_ = None
    self.aic_ = None
    self.bic_ = None
    self.exog_model_ = None
    self.coef_exog_ = None
    self.n_exog_features_in_ = None
    self.y_train_ = None
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.n_features_in_ = None
    self.is_memory_reduced = False
    self.is_fitted = False
    self.estimator_name_ = "Arar()"

    return self

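The validate-then-reset behaviour of set_params can be seen in isolation with a toy estimator. The sketch below is purely illustrative: `TinyEstimator` is a hypothetical class that mimics the pattern and does not use the library.

```python
class TinyEstimator:
    """Hypothetical estimator that resets its fitted state in set_params."""

    _valid_params = {"max_lag", "safe"}

    def __init__(self, max_lag=None, safe=True):
        self.max_lag = max_lag
        self.safe = safe
        self.coef_ = None
        self.is_fitted = False

    def fit(self, y):
        # Toy "model": the fitted coefficient is just the sample mean.
        self.coef_ = sum(y) / len(y)
        self.is_fitted = True
        return self

    def set_params(self, **params):
        # Validate every key before mutating anything.
        for key in params:
            if key not in self._valid_params:
                raise ValueError(
                    f"Invalid parameter '{key}'. "
                    f"Valid parameters are: {sorted(self._valid_params)}"
                )
        for key, value in params.items():
            setattr(self, key, value)
        # Any parameter change invalidates what was learned in fit().
        self.coef_ = None
        self.is_fitted = False
        return self

est = TinyEstimator().fit([1.0, 2.0, 3.0])
was_fitted = est.is_fitted
est.set_params(max_lag=5)
```

Validating all keys first means a call with one bad key leaves the estimator untouched, so a failed set_params never produces a half-updated object.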
@check_is_fitted
def summary(self) -> None:
    """
    Print a simple textual summary of the fitted Arar model.
    """
    print(f"{self.estimator_name_} Model Summary")
    print("------------------")
    print(f"Selected AR lags: {self.lags_}")
    print(f"AR coefficients (phi): {np.round(self.coef_, 4)}")
    print(f"Residual variance (sigma^2): {self.sigma2_:.4f}")
    print(f"Mean of shortened series (sbar): {self.sbar_:.4f}")
    print(f"Length of memory-shortening filter (psi): {len(self.psi_)}")

    if not self.is_memory_reduced:
        print("\nTime Series Summary Statistics")
        print(f"Number of observations: {len(self.y_train_)}")
        print(f"Mean: {np.mean(self.y_train_):.4f}")
        print(f"Std Dev: {np.std(self.y_train_, ddof=1):.4f}")
        print(f"Min: {np.min(self.y_train_):.4f}")
        print(f"25%: {np.percentile(self.y_train_, 25):.4f}")
        print(f"Median: {np.median(self.y_train_):.4f}")
        print(f"75%: {np.percentile(self.y_train_, 75):.4f}")
        print(f"Max: {np.max(self.y_train_):.4f}")

    print("\nModel Diagnostics")
    print(f"AIC: {self.aic_:.4f}")
    print(f"BIC: {self.bic_:.4f}")

    if self.exog_model_ is not None:
        print("\nExogenous Model (Linear Regression)")
        print("-----------------------------------")
        print(f"Number of features: {self.n_exog_features_in_}")
        print(f"Intercept: {self.exog_model_.intercept_:.4f}")
        print(f"Coefficients: {np.round(self.exog_model_.coef_, 4)}")

Reduce memory usage by removing internal arrays not needed for prediction.
This method clears memory-heavy arrays that are only needed for diagnostics
but not for prediction. After calling this method, the following methods
will raise an error:
- get_fitted_values(): In-sample fitted values
- get_residuals(): In-sample residuals
- score(): R² coefficient
Judging by the source listing, summary() remains available but omits the time
series summary statistics.
Prediction methods remain fully functional:
- predict(): Point forecasts
- predict_interval(): Prediction intervals
@check_is_fitted
def reduce_memory(self) -> "Arar":
    """
    Reduce memory usage by removing internal arrays not needed for prediction.

    This method clears memory-heavy arrays that are only needed for diagnostics
    but not for prediction. After calling this method, the following methods
    will raise an error:

    - get_fitted_values(): In-sample fitted values
    - get_residuals(): In-sample residuals
    - score(): R² coefficient

    summary() remains available but omits the time series summary statistics.

    Prediction methods remain fully functional:

    - predict(): Point forecasts
    - predict_interval(): Prediction intervals

    Returns
    -------
    self : Arar
        The estimator with reduced memory usage.

    """
    self.fitted_values_ = None
    self.in_sample_residuals_ = None
    self.is_memory_reduced = True

    return self