Input data

Working with sequential or time series data requires a consistent and regular spacing between observations. Uneven or irregularly spaced data can lead to ambiguous results and unreliable forecasts. For this reason, skforecast strictly enforces the use of regular indices.

To ensure reproducibility and clarity in forecasting tasks, skforecast only allows two types of index:

DatetimeIndex with frequency: A time-based index whose freq attribute is set to a regular frequency (e.g., daily, monthly).
RangeIndex with step: An integer index with a constant step, such as the default 0, 1, 2, ... index.

Other index types (such as DatetimeIndex without frequency, or custom indices) are not supported, and their use will raise an error.
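As an illustration of the rule above, the following pandas-only sketch checks whether an index is one of the two supported types and shows how an irregular DatetimeIndex might be regularized with `asfreq`. The helper `has_supported_index` and the toy series are hypothetical and are not part of skforecast; the library's own validation may differ in detail.

```python
import pandas as pd


def has_supported_index(obj) -> bool:
    """Hypothetical helper (not a skforecast function): True if the index is a
    DatetimeIndex with a frequency set, or a RangeIndex."""
    index = obj.index
    if isinstance(index, pd.DatetimeIndex):
        return index.freq is not None
    return isinstance(index, pd.RangeIndex)


# Toy daily series in which 2024-01-03 is missing, so no frequency is set.
dates = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-04", "2024-01-05"])
y = pd.Series([1.0, 2.0, 4.0, 5.0], index=dates, name="y")
irregular_ok = has_supported_index(y)

# asfreq("D") reindexes to a regular daily grid; the missing day becomes NaN.
y_regular = y.asfreq("D")
regular_ok = has_supported_index(y_regular)
n_missing = int(y_regular.isna().sum())

# Dropping the dates altogether yields a RangeIndex with step 1.
y_range = y.reset_index(drop=True)
range_ok = has_supported_index(y_range)

print(irregular_ok, regular_ok, range_ok, n_missing)
```

Note that regularizing the index exposes the gap as a NaN value, which would typically need to be imputed or otherwise handled before fitting a model.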

Number of time series

The skforecast library offers a variety of forecaster types, each tailored to specific requirements such as single or multiple time series, direct or recursive strategies, or custom predictors. Regardless of the specific forecaster type, all instances share the same API.

Forecaster	Single series	Multiple series	Recursive strategy	Direct strategy	Probabilistic prediction	Time series differentiation	Exogenous features	Window features
ForecasterRecursive	✔️		✔️		✔️	✔️	✔️	✔️
ForecasterDirect	✔️			✔️	✔️	✔️	✔️	✔️
ForecasterRecursiveMultiSeries		✔️	✔️		✔️	✔️	✔️	✔️
ForecasterDirectMultiVariate		✔️		✔️	✔️	✔️	✔️	✔️
ForecasterRNN	✔️	✔️		✔️	✔️		✔️
ForecasterSarimax	✔️		✔️		✔️	✔️	✔️

Libraries and data

In [1]:

# Libraries
# ==============================================================================
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive

In [2]:

# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
data = data.set_index("date")
data = data.asfreq("MS")
data

h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health
system had between 1991 and 2008.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice(3rd
Edition). http://pkg.robjhyndman.com/fpp3package/,https://github.com/robjhyndman
/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)

Out[2]:

	y
date
1991-07-01	0.429795
1991-08-01	0.400906
1991-09-01	0.432159
1991-10-01	0.492543
1991-11-01	0.502369
...	...
2008-02-01	0.761822
2008-03-01	0.649435
2008-04-01	0.827887
2008-05-01	0.816255
2008-06-01	0.762137

204 rows × 1 columns

Train and predict using input with DatetimeIndex and frequency

In [3]:

# Index type and frequency
# ==============================================================================
print(f"Index type      : {type(data.index)}")
print(f"Index frequency : {data.index.freq}")

Index type      : <class 'pandas.core.indexes.datetimes.DatetimeIndex'>
Index frequency : <MonthBegin>

In [4]:

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
                 regressor = LGBMRegressor(random_state=123, verbose=-1),
                 lags      = 5
             )

forecaster.fit(y=data['y'])

# Predictions
# ==============================================================================
forecaster.predict(steps=5)

Out[4]:

2008-07-01    0.861239
2008-08-01    0.871102
2008-09-01    0.835840
2008-10-01    0.938713
2008-11-01    1.004192
Freq: MS, Name: pred, dtype: float64
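The predictions above are indexed by the five months that follow the last training observation, at the same "MS" frequency. The pandas-only sketch below shows one way such a continuation index could be built; it is an illustration of the idea, not skforecast's internal code, and the variable names are assumptions.

```python
import pandas as pd

# Index equivalent to the training data: monthly, 1991-07-01 to 2008-06-01.
train_index = pd.date_range(start="1991-07-01", end="2008-06-01", freq="MS")
steps = 5

# Start one period after the last training timestamp and keep the frequency.
pred_index = pd.date_range(
    start=train_index[-1] + train_index.freq,
    periods=steps,
    freq=train_index.freq,
)
print(pred_index)
```

This only works because the training index carries a frequency; without it there is no unambiguous "next" timestamp to extend to.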

Train and predict using input with RangeIndex

In [5]:

# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data

Out[5]:

	y
0	0.429795
1	0.400906
2	0.432159
3	0.492543
4	0.502369
...	...
199	0.761822
200	0.649435
201	0.827887
202	0.816255
203	0.762137

204 rows × 1 columns

In [6]:

# Index type and step
# ==============================================================================
print(f"Index type : {type(data.index)}")
print(f"Index step : {data.index.step}")

Index type : <class 'pandas.core.indexes.range.RangeIndex'>
Index step : 1

In [7]:

# Fit - Predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)

Out[7]:

204    0.861239
205    0.871102
206    0.835840
207    0.938713
208    1.004192
Name: pred, dtype: float64
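With a RangeIndex, the predictions are labeled by position (204 to 208) rather than by date. If the original dates are known, they can be reattached afterwards. The sketch below assumes the start date and "MS" frequency of the dataset shown earlier and reuses the prediction values printed above; the variable names are illustrative only.

```python
import pandas as pd

# Predictions as returned above, indexed by position.
preds = pd.Series(
    [0.861239, 0.871102, 0.835840, 0.938713, 1.004192],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Assumed original calendar: 204 monthly observations starting 1991-07-01.
original_index = pd.date_range(start="1991-07-01", periods=204, freq="MS")

# Build the dates that follow the last observation and attach them.
future_dates = pd.date_range(
    start=original_index[-1] + original_index.freq,
    periods=len(preds),
    freq=original_index.freq,
)
preds_dated = preds.copy()
preds_dated.index = future_dates
print(preds_dated)
```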