Input data¶
Skforecast only allows pandas series and dataframes as input (although numpy arrays are used internally for better performance). The type of pandas index is used to determine how the data is processed:
If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, it remains unchanged.
  Note
Although it is possible to use data without an associated date/time index, when using a pandas series with an associated frequency prediction results will have a more useful index.Libraries¶
In [1]:
Copied!
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
Data¶
In [2]:
Copied!
# Download data
# ==============================================================================
url = (
'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data
# Download data
# ==============================================================================
url = (
'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data
Out[2]:
y | |
---|---|
date | |
1991-07-01 | 0.429795 |
1991-08-01 | 0.400906 |
1991-09-01 | 0.432159 |
1991-10-01 | 0.492543 |
1991-11-01 | 0.502369 |
... | ... |
2008-02-01 | 0.761822 |
2008-03-01 | 0.649435 |
2008-04-01 | 0.827887 |
2008-05-01 | 0.816255 |
2008-06-01 | 0.762137 |
204 rows × 1 columns
Train and predict using input with datetime and frequency index¶
In [3]:
Copied!
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 5
)
forecaster.fit(y=data['y'])
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 5
)
forecaster.fit(y=data['y'])
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
Out[3]:
2008-07-01 0.714526 2008-08-01 0.789144 2008-09-01 0.818433 2008-10-01 0.845027 2008-11-01 0.914621 Freq: MS, Name: pred, dtype: float64
Train and predict using input without datetime index¶
In [4]:
Copied!
data = data.reset_index(drop=True)
data
data = data.reset_index(drop=True)
data
Out[4]:
y | |
---|---|
0 | 0.429795 |
1 | 0.400906 |
2 | 0.432159 |
3 | 0.492543 |
4 | 0.502369 |
... | ... |
199 | 0.761822 |
200 | 0.649435 |
201 | 0.827887 |
202 | 0.816255 |
203 | 0.762137 |
204 rows × 1 columns
In [5]:
Copied!
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
Out[5]:
204 0.714526 205 0.789144 206 0.818433 207 0.845027 208 0.914621 Name: pred, dtype: float64
In [6]:
Copied!
%%html
<style>
.jupyter-wrapper .jp-CodeCell .jp-Cell-inputWrapper .jp-InputPrompt {display: none;}
</style>
%%html