Input data
Skforecast accepts only pandas Series and DataFrames as input (NumPy arrays are used internally for better performance). The type of the pandas index determines how the data is processed:
If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, it remains unchanged.
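The three rules above can be checked on any series before passing it to a forecaster. The following is a minimal sketch using only pandas; `index_handling` is a hypothetical helper written for illustration and is not part of skforecast, so it mirrors the rules as stated here rather than the library's internal code.

```python
import pandas as pd


def index_handling(y: pd.Series) -> str:
    """Hypothetical helper: report which of the three index rules applies."""
    idx = y.index
    if not isinstance(idx, pd.DatetimeIndex):
        return "RangeIndex created (index is not a DatetimeIndex)"
    if idx.freq is None:
        return "RangeIndex created (DatetimeIndex without frequency)"
    return "index kept (DatetimeIndex with frequency)"


values = [1.0, 2.0, 3.0, 4.0]

# DatetimeIndex built with an explicit frequency
with_freq = pd.Series(values, index=pd.date_range("2020-01-01", periods=4, freq="MS"))

# DatetimeIndex built from plain dates: pandas does not infer a frequency here
no_freq = pd.Series(
    values,
    index=pd.DatetimeIndex(["2020-01-01", "2020-02-01", "2020-03-01", "2020-04-01"]),
)

# Default integer index
no_dates = pd.Series(values)

for name, series in [("with_freq", with_freq), ("no_freq", no_freq), ("no_dates", no_dates)]:
    print(name, "->", index_handling(series))
```

Checking `y.index.freq` in this way is a quick test of whether predictions are likely to come back with dates or with integer positions.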
Note
Although data without an associated date/time index can be used, a pandas series with an associated frequency produces predictions with a more useful index.
Libraries
In [11]:
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.datasets import fetch_dataset
Data
In [12]:
# Download data
# ==============================================================================
data = fetch_dataset(
    name="h2o", raw=True, kwargs_read_csv={"names": ["y", "date"], "header": 0}
)
data["date"] = pd.to_datetime(data["date"], format="%Y-%m-%d")
data = data.set_index("date")
data = data.asfreq("MS")
data
h2o
---
Monthly expenditure ($AUD) on corticosteroid drugs that the Australian health system had between 1991 and 2008. Hyndman R (2023). fpp3: Data for Forecasting: Principles and Practice (3rd Edition). http://pkg.robjhyndman.com/fpp3package/, https://github.com/robjhyndman/fpp3package, http://OTexts.com/fpp3.
Shape of the dataset: (204, 2)
Out[12]:
| date | y |
|---|---|
| 1991-07-01 | 0.429795 |
| 1991-08-01 | 0.400906 |
| 1991-09-01 | 0.432159 |
| 1991-10-01 | 0.492543 |
| 1991-11-01 | 0.502369 |
| ... | ... |
| 2008-02-01 | 0.761822 |
| 2008-03-01 | 0.649435 |
| 2008-04-01 | 0.827887 |
| 2008-05-01 | 0.816255 |
| 2008-06-01 | 0.762137 |
204 rows × 1 columns
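The `asfreq("MS")` call in the cell above is what attaches a frequency to the index. As a small illustration on toy data (the dates and values below are made up and are not taken from the h2o dataset), the sketch shows that a `DatetimeIndex` built from parsed dates has no frequency until `asfreq` is applied, and that `asfreq` inserts `NaN` rows for any missing period.

```python
import pandas as pd

# Toy monthly series with March deliberately missing (illustrative values only)
dates = pd.to_datetime(["2020-01-01", "2020-02-01", "2020-04-01"])
s = pd.Series([1.0, 2.0, 4.0], index=dates)
print(s.index.freq)  # no frequency is attached yet

# Conform the series to a month-start frequency
s_ms = s.asfreq("MS")
print(s_ms.index.freqstr)

# Gaps in the original data appear as NaN after asfreq
n_missing = int(s_ms.isna().sum())
print(s_ms)
print("missing periods:", n_missing)
```

Because `asfreq` can introduce missing values, it is worth checking for `NaN` after the call and before fitting a forecaster.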
Train and predict using input with datetime and frequency index
In [13]:
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=123),
    lags = 5
)
forecaster.fit(y=data['y'])
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
Out[13]:
2008-07-01    0.714526
2008-08-01    0.789144
2008-09-01    0.818433
2008-10-01    0.845027
2008-11-01    0.914621
Freq: MS, Name: pred, dtype: float64
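The predicted index continues the training index: the last training timestamp is 2008-06-01 and the frequency is `MS`, so the five predictions are labelled 2008-07-01 through 2008-11-01. The sketch below reproduces that date arithmetic with plain pandas. It only illustrates how such an index can be derived and is not a description of skforecast's internal code.

```python
import pandas as pd

last_train_date = pd.Timestamp("2008-06-01")  # last timestamp in the training data
steps = 5

# Generate steps + 1 month-start dates beginning at the last training date,
# then drop the first one so that only future dates remain.
future_index = pd.date_range(start=last_train_date, periods=steps + 1, freq="MS")[1:]
print(future_index)
```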
Train and predict using input without datetime index
In [14]:
# Data without datetime index
# ==============================================================================
data = data.reset_index(drop=True)
data
Out[14]:
|  | y |
|---|---|
| 0 | 0.429795 |
| 1 | 0.400906 |
| 2 | 0.432159 |
| 3 | 0.492543 |
| 4 | 0.502369 |
| ... | ... |
| 199 | 0.761822 |
| 200 | 0.649435 |
| 201 | 0.827887 |
| 202 | 0.816255 |
| 203 | 0.762137 |
204 rows × 1 columns
In [15]:
# Fit - Predict
# ==============================================================================
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
Out[15]:
204    0.714526
205    0.789144
206    0.818433
207    0.845027
208    0.914621
Name: pred, dtype: float64
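Without a datetime index, the predictions are labelled with the integer positions that follow the 204 training rows (204 to 208). If the start date and frequency of the original data are known, those positions can be mapped back to dates afterwards. The sketch below assumes the series starts on 1991-07-01 with month-start frequency, as in the dataset shown earlier. The prediction values are copied from the output above, and the variable names are illustrative only.

```python
import pandas as pd

n_train = 204  # number of training observations
steps = 5

# Positional index that continues after the training data
pred_index = pd.RangeIndex(start=n_train, stop=n_train + steps)

# Predictions as printed in the output above
pred = pd.Series(
    [0.714526, 0.789144, 0.818433, 0.845027, 0.914621],
    index=pred_index,
    name="pred",
)

# Assumption: the data starts on 1991-07-01 and is monthly (month start)
all_dates = pd.date_range("1991-07-01", periods=n_train + steps, freq="MS")
pred_dated = pred.copy()
pred_dated.index = all_dates[n_train:]
print(pred_dated)
```

The values are identical in both cases; only the labels differ, which is why keeping the frequency-aware index from the start is usually more convenient.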