Input data

Skforecast only accepts pandas Series and DataFrames as input (internally, numpy arrays are used for better performance). The type of the pandas index determines how the data is processed:

If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, it remains unchanged.
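The three cases above can be checked with plain pandas before the data is passed to a forecaster. The following is a minimal sketch: `index_case` is a hypothetical helper written for illustration, not part of skforecast, and the values are made up.

```python
import pandas as pd


def index_case(series: pd.Series) -> str:
    """Classify the index of a series into the three cases listed above.

    Hypothetical helper for illustration only; not part of skforecast.
    """
    index = series.index
    if not isinstance(index, pd.DatetimeIndex):
        return "not a DatetimeIndex"
    if index.freq is None:
        return "DatetimeIndex without frequency"
    return "DatetimeIndex with frequency"


values = [0.43, 0.40, 0.43]  # illustrative values

# Default integer index (RangeIndex)
no_dates = pd.Series(values)

# Dates parsed from strings: pandas does not attach a frequency
dates_no_freq = pd.Series(
    values, index=pd.to_datetime(["1991-07-01", "1991-08-01", "1991-09-01"])
)

# asfreq("MS") attaches a month-start frequency to the index
dates_with_freq = dates_no_freq.asfreq("MS")

for s in (no_dates, dates_no_freq, dates_with_freq):
    print(index_case(s))
```

Only the last series would keep its dates through fitting and prediction, according to the rules listed above.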

✎ Note

Although data without an associated date/time index can be used, a pandas series with an associated frequency yields predictions with a more useful index.

Libraries and data

In [1]:

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.datasets import fetch_dataset
from skforecast.recursive import ForecasterRecursive

In [2]:

# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
data = data.set_index("date")
data = data.asfreq("MS")
data

h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health
system had between 1991 and 2008.
Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice(3rd
Edition). http://pkg.robjhyndman.com/fpp3package/,https://github.com/robjhyndman
/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)

Out[2]:

	y
date
1991-07-01	0.429795
1991-08-01	0.400906
1991-09-01	0.432159
1991-10-01	0.492543
1991-11-01	0.502369
...	...
2008-02-01	0.761822
2008-03-01	0.649435
2008-04-01	0.827887
2008-05-01	0.816255
2008-06-01	0.762137

204 rows × 1 columns
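The `asfreq("MS")` call in the cell above is what attaches a frequency to the index. As a small sketch on toy data (illustrative values, not the h2o dataset), note that `asfreq` also inserts a row of missing values for any period absent from the original data, so it is worth checking for gaps afterwards:

```python
import pandas as pd

# Toy monthly data with August missing (illustrative values only)
raw = pd.DataFrame(
    {
        "date": ["1991-07-01", "1991-09-01", "1991-10-01"],
        "y": [0.43, 0.43, 0.49],
    }
)
raw["date"] = pd.to_datetime(raw["date"], format="%Y-%m-%d")
raw = raw.set_index("date")
print(raw.index.freq)  # None: no frequency attached yet

# Attach a month-start frequency; missing months become NaN rows
monthly = raw.asfreq("MS")
print(monthly.index.freqstr)
print(monthly["y"].isna().sum())  # number of rows inserted as NaN
```

How to fill such gaps (interpolation, forward fill, or something else) depends on the data and is left open here.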

Train and predict using input with datetime and frequency index

In [3]:

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
                 regressor = LGBMRegressor(random_state=123, verbose=-1),
                 lags      = 5
             )

forecaster.fit(y=data['y'])

# Predictions
# ==============================================================================
forecaster.predict(steps=5)

Out[3]:

2008-07-01    0.861239
2008-08-01    0.871102
2008-09-01    0.835840
2008-10-01    0.938713
2008-11-01    1.004192
Freq: MS, Name: pred, dtype: float64
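The dates in the output above continue the training index by one frequency step at a time. The sketch below reproduces that index with pandas alone. It is not skforecast's internal code, and `train_index` is a stand-in built to match the 204 monthly observations shown earlier.

```python
import pandas as pd

# Stand-in for the training index: 204 month-start timestamps from 1991-07-01
train_index = pd.date_range(start="1991-07-01", periods=204, freq="MS")

steps = 5

# The first forecast timestamp is one frequency step after the last training one
forecast_index = pd.date_range(
    start=train_index[-1] + train_index.freq,
    periods=steps,
    freq=train_index.freq,
)
print(forecast_index)
```

Without a frequency on the training index there is no step size to add, which is why a frequency is needed to obtain dated predictions.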

Train and predict using input without datetime index

In [4]:

# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data

Out[4]:

	y
0	0.429795
1	0.400906
2	0.432159
3	0.492543
4	0.502369
...	...
199	0.761822
200	0.649435
201	0.827887
202	0.816255
203	0.762137

204 rows × 1 columns

In [5]:

# Fit - Predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)

Out[5]:

204    0.861239
205    0.871102
206    0.835840
207    0.938713
208    1.004192
Name: pred, dtype: float64
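The predicted values match the previous ones, but the index now holds integer positions (204 to 208) instead of dates. If the start date and frequency of the original data are known, the positions can be mapped back to dates afterwards. This is a minimal sketch, assuming position 0 corresponds to 1991-07-01 and the data is monthly, with the predictions copied by hand from the output above:

```python
import pandas as pd

# Predictions as shown above, indexed by integer position
predictions = pd.Series(
    [0.861239, 0.871102, 0.835840, 0.938713, 1.004192],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Assumption: position 0 is 1991-07-01 and observations are month-start
calendar = pd.date_range(
    start="1991-07-01", periods=int(predictions.index[-1]) + 1, freq="MS"
)

# Look up the date for each integer position, then re-attach the frequency
predictions.index = calendar[predictions.index.to_numpy()]
predictions = predictions.asfreq("MS")
print(predictions)
```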