Input data
Skforecast only allows pandas series and dataframes as input (although numpy arrays are used internally for better performance). The type of pandas index is used to determine how the data is processed:
If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, it remains unchanged.
✎ Note
Although it is possible to use data without an associated date/time index, a pandas series whose index has a frequency gives more useful prediction results: the predictions are labelled with dates instead of integer positions.
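The three index rules listed above can be checked on a series before fitting. The sketch below uses only pandas; `index_case` is a hypothetical helper written for illustration and is not part of skforecast.

```python
import pandas as pd


def index_case(y: pd.Series) -> str:
    """Hypothetical helper: report which of the three index rules applies to `y`."""
    if not isinstance(y.index, pd.DatetimeIndex):
        return "not datetime: treated as RangeIndex"
    if y.index.freq is None:
        return "datetime without frequency: treated as RangeIndex"
    return "datetime with frequency: index kept"


values = [1.0, 2.0, 3.0, 4.0]

# DatetimeIndex built with an explicit frequency.
with_freq = pd.Series(values, index=pd.date_range("2020-01-01", periods=4, freq="MS"))

# DatetimeIndex parsed from strings: pandas does not attach a frequency here.
no_freq = pd.Series(
    values,
    index=pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Default integer index.
no_dates = pd.Series(values)

for name, series in [("with_freq", with_freq), ("no_freq", no_freq), ("no_dates", no_dates)]:
    print(name, "->", index_case(series))
```

Note that a parsed date column is not enough on its own: the index must also carry a frequency for the dates to be kept.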
Libraries
In [1]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.datasets import fetch_dataset
Data
In [2]:
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
data = data.set_index("date")
data = data.asfreq("MS")
data
h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)
Out[2]:
| date | y |
|---|---|
| 1991-07-01 | 0.429795 |
| 1991-08-01 | 0.400906 |
| 1991-09-01 | 0.432159 |
| 1991-10-01 | 0.492543 |
| 1991-11-01 | 0.502369 |
| ... | ... |
| 2008-02-01 | 0.761822 |
| 2008-03-01 | 0.649435 |
| 2008-04-01 | 0.827887 |
| 2008-05-01 | 0.816255 |
| 2008-06-01 | 0.762137 |
204 rows × 1 columns
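In the cell above, `asfreq("MS")` is the call that attaches a frequency to the index. A minimal sketch with made-up values (not the h2o data) shows the effect, and also that `asfreq` inserts `NaN` rows for any missing period, which is worth checking before fitting.

```python
import pandas as pd

# Made-up monthly values with March missing (illustrative only).
raw = pd.DataFrame(
    {
        "date": ["2020-01-01", "2020-02-01", "2020-04-01"],
        "y": [0.40, 0.42, 0.45],
    }
)
raw["date"] = pd.to_datetime(raw["date"], format="%Y-%m-%d")

# Setting the index alone does not give it a frequency.
indexed = raw.set_index("date")
print(indexed.index.freq)

# asfreq reindexes to a regular month-start grid and records the frequency.
regular = indexed.asfreq("MS")
print(regular.index.freqstr)

# Periods absent from the raw data appear as NaN rows.
print(regular["y"].isna().sum())
```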
Train and predict using input with datetime and frequency index
In [3]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags = 5
)
forecaster.fit(y=data['y'])
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
Out[3]:
2008-07-01    0.861239
2008-08-01    0.871102
2008-09-01    0.835840
2008-10-01    0.938713
2008-11-01    1.004192
Freq: MS, Name: pred, dtype: float64
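The prediction index continues from the last training timestamp at the frequency of the series. The sketch below reproduces that index with plain pandas; it illustrates the behaviour seen in the output and is not a claim about how skforecast builds the index internally.

```python
import pandas as pd

# Training index matching the example: month-start, 1991-07 to 2008-06.
train_index = pd.date_range("1991-07-01", "2008-06-01", freq="MS")
steps = 5

# Start one period after the last training timestamp.
start = train_index[-1] + train_index.freq
expected_index = pd.date_range(start=start, periods=steps, freq=train_index.freq)
print(expected_index)
```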
Train and predict using input without datetime index
In [4]:
# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data
Out[4]:
| | y |
|---|---|
| 0 | 0.429795 |
| 1 | 0.400906 |
| 2 | 0.432159 |
| 3 | 0.492543 |
| 4 | 0.502369 |
| ... | ... |
| 199 | 0.761822 |
| 200 | 0.649435 |
| 201 | 0.827887 |
| 202 | 0.816255 |
| 203 | 0.762137 |
204 rows × 1 columns
In [5]:
# Fit - Predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
Out[5]:
204    0.861239
205    0.871102
206    0.835840
207    0.938713
208    1.004192
Name: pred, dtype: float64
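With a positional index, the predictions are labelled 204 to 208, continuing the integer positions of the training data. If the real dates are known, they can be reattached afterwards. A minimal sketch, reusing the values printed in the output and assuming that position 0 corresponds to 1991-07-01 and that the data are monthly, as in the earlier example:

```python
import pandas as pd

# Predictions as returned with a positional index (values copied from the output).
predictions = pd.Series(
    [0.861239, 0.871102, 0.835840, 0.938713, 1.004192],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Assumption: position 0 is 1991-07-01 and each position is one month.
first_date = pd.Timestamp("1991-07-01")
dated = predictions.copy()
dated.index = pd.date_range(
    start=first_date + pd.DateOffset(months=int(predictions.index[0])),
    periods=len(predictions),
    freq="MS",
)
print(dated)
```

This only works when the positions map to a regular calendar; keeping the datetime index with a frequency from the start avoids the extra step.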