Input data
Since version 0.4.0, only pandas Series and DataFrames are accepted as input (internally, numpy arrays are still used for performance). Depending on the type of the pandas index, the following rules apply:
If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, nothing is changed.
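The three rules above can be sketched with plain pandas. This is only an illustration of the described behaviour, not skforecast's internal code: the helper name `normalize_index` and the example series are hypothetical.

```python
import pandas as pd


def normalize_index(y: pd.Series) -> pd.Series:
    """Hypothetical helper mirroring the index rules described above."""
    y = y.copy()
    # A DatetimeIndex with a frequency is kept unchanged.
    if isinstance(y.index, pd.DatetimeIndex) and y.index.freq is not None:
        return y
    # Any other index (non-datetime, or datetime without frequency)
    # is replaced by a RangeIndex starting at 0.
    y.index = pd.RangeIndex(start=0, stop=len(y), step=1)
    return y


# DatetimeIndex with a monthly-start frequency: kept as is.
with_freq = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2020-01-01", periods=3, freq="MS"),
)

# DatetimeIndex built from irregular dates: no frequency is attached.
no_freq = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.DatetimeIndex(["2020-01-01", "2020-02-01", "2020-04-01"]),
)

# Index that is not a DatetimeIndex at all.
no_dates = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])

for name, series in [("with_freq", with_freq), ("no_freq", no_freq), ("no_dates", no_dates)]:
    print(name, type(normalize_index(series).index).__name__)
```

Checking `series.index.freq` before fitting is a quick way to tell which of the three rules will apply to your data.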
  Note
There is nothing wrong with using data that has no associated date/time index. However, if a pandas Series with an associated frequency is used, the results will have a more useful index.
Libraries
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
Data
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data
date | y |
---|---|
1991-07-01 | 0.429795 |
1991-08-01 | 0.400906 |
1991-09-01 | 0.432159 |
1991-10-01 | 0.492543 |
1991-11-01 | 0.502369 |
... | ... |
2008-02-01 | 0.761822 |
2008-03-01 | 0.649435 |
2008-04-01 | 0.827887 |
2008-05-01 | 0.816255 |
2008-06-01 | 0.762137 |
204 rows × 1 columns
Train and predict using input with datetime and frequency index
# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags      = 5
             )
forecaster.fit(y=data['y'])
# Predictions
# ==============================================================================
forecaster.predict(steps=5)
2008-07-01    0.714526
2008-08-01    0.789144
2008-09-01    0.818433
2008-10-01    0.845027
2008-11-01    0.914621
Freq: MS, Name: pred, dtype: float64
Train and predict using input without datetime index
data = data.reset_index(drop=True)
data.head()
 | y |
---|---|
0 | 0.429795 |
1 | 0.400906 |
2 | 0.432159 |
3 | 0.492543 |
4 | 0.502369 |
forecaster.fit(y=data['y'])
forecaster.predict(steps=5)
204    0.714526
205    0.789144
206    0.818433
207    0.845027
208    0.914621
Name: pred, dtype: float64
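With an integer index, the predictions are labelled by position rather than by date. If the start date and frequency of the original series are known, dates can be reattached afterwards. The following is a minimal pandas-only sketch, not a skforecast feature: the prediction values are copied from the output above, the start date and `'MS'` frequency are those of the h2o data, and the variable names are illustrative.

```python
import pandas as pd

# Predictions as returned when the input has no datetime index
# (values copied from the output above).
predictions = pd.Series(
    [0.714526, 0.789144, 0.818433, 0.845027, 0.914621],
    index=pd.RangeIndex(start=204, stop=209),
    name="pred",
)

# Rebuild the calendar that the integer positions refer to:
# the original series starts on 1991-07-01 with monthly-start frequency.
calendar = pd.date_range(
    start="1991-07-01",
    periods=predictions.index.stop,
    freq="MS",
)

# Map each integer position to its date by slicing the calendar.
predictions_dated = predictions.copy()
predictions_dated.index = calendar[predictions.index.start:predictions.index.stop]

print(predictions_dated)
```

This only works when the original series had no gaps, so that position `n` always corresponds to the `n`-th period after the start date.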