Input data
Sequential and time series data require consistent, regular spacing between observations. With irregularly spaced data, lags and forecast horizons become ambiguous, which can lead to unreliable forecasts. For this reason, skforecast strictly enforces the use of regular indices.
To ensure reproducibility and clarity in forecasting tasks, skforecast accepts only two index types:
- DatetimeIndex with frequency: a time-based index with a defined, regular frequency (e.g., daily, monthly).
- RangeIndex: a regularly spaced integer index with a fixed step, such as the default index pandas assigns to a Series or DataFrame.
Other index types (such as a DatetimeIndex without frequency, or custom indices) are not supported, and using them raises an error.
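The rule can be illustrated with plain pandas. The sketch below uses a hypothetical helper, is_supported_index, that mirrors the two accepted index types; it is not skforecast's own validation code, which may perform additional checks. Note that a DatetimeIndex built from a list of dates carries no frequency, whereas pd.date_range sets one.

```python
import pandas as pd


def is_supported_index(index: pd.Index) -> bool:
    """Hypothetical check mirroring the rule above (not skforecast's own code).

    A DatetimeIndex is accepted only if it carries a frequency;
    otherwise only a RangeIndex is accepted.
    """
    if isinstance(index, pd.DatetimeIndex):
        return index.freq is not None
    return isinstance(index, pd.RangeIndex)


candidates = {
    # date_range attaches an explicit daily frequency
    "DatetimeIndex with freq": pd.date_range("2024-01-01", periods=4, freq="D"),
    # built from a list of dates: freq is None
    "DatetimeIndex without freq": pd.DatetimeIndex(["2024-01-01", "2024-01-02", "2024-01-04"]),
    # regular integer index with step 1
    "RangeIndex": pd.RangeIndex(start=0, stop=4, step=1),
    # plain integer Index: not a RangeIndex, so it is rejected
    "Custom integer Index": pd.Index([3, 7, 10, 12]),
}

for label, idx in candidates.items():
    print(f"{label:<28} supported: {is_supported_index(idx)}")
```

A regular DatetimeIndex that lacks a frequency can usually be given one by calling asfreq() on the Series or DataFrame, as done with data.asfreq("MS") in this guide.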
Number of time series
The skforecast library offers several forecaster types, each tailored to specific needs: single or multiple time series, recursive or direct strategies, or custom predictors. Regardless of type, all forecasters share the same API, so they are trained with fit() and used with predict() in the same way.
Forecaster | Single series | Multiple series | Recursive strategy | Direct strategy | Probabilistic prediction | Time series differentiation | Exogenous features | Window features |
---|---|---|---|---|---|---|---|---|
ForecasterRecursive | ✔️ | | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ |
ForecasterDirect | ✔️ | | | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ForecasterRecursiveMultiSeries | | ✔️ | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ |
ForecasterDirectMultiVariate | | ✔️ | | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
ForecasterRNN | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |||
ForecasterSarimax | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
Libraries and data
```python
# Libraries
# ==============================================================================
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive
```

```python
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
data = data.set_index("date")
data = data.asfreq("MS")
data
```
```
h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)
```
date | y |
---|---|
1991-07-01 | 0.429795 |
1991-08-01 | 0.400906 |
1991-09-01 | 0.432159 |
1991-10-01 | 0.492543 |
1991-11-01 | 0.502369 |
... | ... |
2008-02-01 | 0.761822 |
2008-03-01 | 0.649435 |
2008-04-01 | 0.827887 |
2008-05-01 | 0.816255 |
2008-06-01 | 0.762137 |
204 rows × 1 columns
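The preparation steps used for h2o (parse the date column, set it as the index, then call asfreq("MS")) can be reproduced on a small, hypothetical DataFrame. In this synthetic example one month is missing: asfreq() inserts that month with a NaN value, so the gap becomes explicit and has to be handled (for example, by imputation) before fitting a forecaster.

```python
import pandas as pd

# Synthetic data (assumption): 2020-03 is missing from the raw records
raw = pd.DataFrame({
    "y": [0.42, 0.40, 0.49, 0.50],
    "date": ["2020-01-01", "2020-02-01", "2020-04-01", "2020-05-01"],
})

raw["date"] = pd.to_datetime(raw["date"], format="%Y-%m-%d")
df = raw.set_index("date")
print(df.index.freq)  # None: parsed dates carry no frequency

# Month-start frequency, as in the h2o example; the missing month
# is inserted as a new row with y = NaN
df = df.asfreq("MS")
print(df.index.freq)
print(df)
```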
Train and predict using input with DatetimeIndex and frequency
```python
# Index type and frequency
# ==============================================================================
print(f"Index type : {type(data.index)}")
print(f"Index frequency : {data.index.freq}")
```

```
Index type : <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Index frequency : <MonthBegin>
```
```python
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor=LGBMRegressor(random_state=123, verbose=-1),
    lags=5
)
forecaster.fit(y=data['y'])
```

```python
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
```

```
2008-07-01    0.861239
2008-08-01    0.871102
2008-09-01    0.835840
2008-10-01    0.938713
2008-11-01    1.004192
Freq: MS, Name: pred, dtype: float64
```
Train and predict using input with RangeIndex
```python
# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data
```

| | y |
---|---|
0 | 0.429795 |
1 | 0.400906 |
2 | 0.432159 |
3 | 0.492543 |
4 | 0.502369 |
... | ... |
199 | 0.761822 |
200 | 0.649435 |
201 | 0.827887 |
202 | 0.816255 |
203 | 0.762137 |
204 rows × 1 columns
```python
# Index type and step
# ==============================================================================
print(f"Index type : {type(data.index)}")
print(f"Index step : {data.index.step}")
```

```
Index type : <class 'pandas.core.indexes.range.RangeIndex'>
Index step : 1
```
```python
# Fit - Predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
```

```
204    0.861239
205    0.871102
206    0.835840
207    0.938713
208    1.004192
Name: pred, dtype: float64
```
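The predicted values are identical in both runs because the data and the model are the same; only the index labels change. In each case the prediction index continues the training index: with MS frequency the five predictions are dated 2008-07-01 to 2008-11-01, and with a RangeIndex of step 1 they are labeled 204 to 208. The hypothetical helper future_index below sketches how such an index can be derived from the last observation; it is an illustration, not skforecast's internal implementation.

```python
import pandas as pd


def future_index(index: pd.Index, steps: int) -> pd.Index:
    """Hypothetical helper: index labels for the next `steps` observations."""
    if isinstance(index, pd.DatetimeIndex):
        if index.freq is None:
            raise ValueError("A DatetimeIndex must have a frequency.")
        # First future timestamp is one frequency unit after the last one
        return pd.date_range(start=index[-1] + index.freq, periods=steps, freq=index.freq)
    if isinstance(index, pd.RangeIndex):
        last, step = index[-1], index.step
        # Continue the integer sequence with the same step
        return pd.RangeIndex(start=last + step, stop=last + step * (steps + 1), step=step)
    raise TypeError("Only a DatetimeIndex with frequency or a RangeIndex is supported.")


# Same shape as the h2o example: 204 monthly observations from 1991-07-01
monthly = pd.date_range(start="1991-07-01", periods=204, freq="MS")
print(future_index(monthly, steps=5))
print(future_index(pd.RangeIndex(204), steps=5))
```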