Input data¶

Skforecast only accepts pandas Series and DataFrames as input (numpy arrays are used internally for better performance). The type of the pandas index determines how the data is processed:

If the index is not of type DatetimeIndex, a RangeIndex is created.
If the index is of type DatetimeIndex but has no frequency, a RangeIndex is created.
If the index is of type DatetimeIndex and has a frequency, it remains unchanged.
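
The three cases above can be checked with plain pandas before passing data to a forecaster. The following is a minimal sketch: the helper name `describe_index` is hypothetical and not part of skforecast, the sample values are placeholders, and the function only mirrors the rules listed above rather than reproducing the library's internal code.

```python
import pandas as pd


def describe_index(series: pd.Series) -> str:
    """Report which of the three index cases described above applies.

    Hypothetical helper for illustration; not part of skforecast.
    """
    index = series.index
    if not isinstance(index, pd.DatetimeIndex):
        return "not DatetimeIndex: a RangeIndex is created"
    if index.freq is None:
        return "DatetimeIndex without frequency: a RangeIndex is created"
    return "DatetimeIndex with frequency: index remains unchanged"


# Placeholder values for illustration only
values = [0.43, 0.40, 0.43]

# Case 1: default integer index
no_dates = pd.Series(values)

# Case 2: dates passed as a list, so no frequency is attached
dates_no_freq = pd.Series(
    values, index=pd.DatetimeIndex(['1991-07-01', '1991-08-01', '1991-09-01'])
)

# Case 3: dates generated with a monthly-start ('MS') frequency
dates_with_freq = pd.Series(
    values, index=pd.date_range('1991-07-01', periods=3, freq='MS')
)

for s in (no_dates, dates_no_freq, dates_with_freq):
    print(describe_index(s))
```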

Note

Although it is possible to use data without an associated date/time index, a pandas series whose index has a frequency gives the predictions a more useful index: forecasts are labeled with dates instead of integer positions.
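
A frequency can usually be attached to a datetime index with pandas before fitting. The sketch below uses a small made-up series (the values are placeholders, not the h2o data) to show `pd.infer_freq` guessing the frequency and `asfreq` setting it.

```python
import pandas as pd

# Placeholder values for illustration only
series = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.to_datetime(
        ['2020-01-01', '2020-02-01', '2020-03-01', '2020-04-01']
    ),
    name='y',
)
print(series.index.freq)  # parsing dates does not set a frequency

# infer_freq guesses the frequency from the spacing of the timestamps
inferred = pd.infer_freq(series.index)
print(inferred)

# asfreq reindexes to a regular monthly-start grid;
# any period missing from the data would appear as NaN
series = series.asfreq('MS')
print(series.index.freq)
```

If `asfreq` introduces NaN values, the series has gaps that should be handled before training.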

Libraries¶

In [1]:

# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg

Data¶

In [2]:

# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
    'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])
data['date'] = pd.to_datetime(data['date'], format='%Y/%m/%d')
data = data.set_index('date')
data = data.asfreq('MS')
data

Out[2]:

	y
date
1991-07-01	0.429795
1991-08-01	0.400906
1991-09-01	0.432159
1991-10-01	0.492543
1991-11-01	0.502369
...	...
2008-02-01	0.761822
2008-03-01	0.649435
2008-04-01	0.827887
2008-05-01	0.816255
2008-06-01	0.762137

204 rows × 1 columns

Train and predict using input with datetime and frequency index¶

In [3]:

# Create and fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = RandomForestRegressor(random_state=123),
                 lags = 5
             )

forecaster.fit(y=data['y'])

# Predictions
# ==============================================================================
forecaster.predict(steps=5)

Out[3]:

2008-07-01    0.714526
2008-08-01    0.789144
2008-09-01    0.818433
2008-10-01    0.845027
2008-11-01    0.914621
Freq: MS, Name: pred, dtype: float64
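
The prediction index in the output above continues the training index: it starts one frequency step after the last training timestamp (2008-06-01). As a minimal pandas-only sketch, assuming monthly-start frequency, the same index can be reproduced without calling skforecast:

```python
import pandas as pd

last_train_timestamp = pd.Timestamp('2008-06-01')
steps = 5

# Start one 'MS' step after the last training timestamp
expected_index = pd.date_range(
    start=last_train_timestamp + pd.offsets.MonthBegin(1),
    periods=steps,
    freq='MS',
)
print(expected_index)
```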

Train and predict using input without datetime index¶

In [4]:

data = data.reset_index(drop=True)
data

Out[4]:

	y
0	0.429795
1	0.400906
2	0.432159
3	0.492543
4	0.502369
...	...
199	0.761822
200	0.649435
201	0.827887
202	0.816255
203	0.762137

204 rows × 1 columns

In [5]:

forecaster.fit(y=data['y'])
forecaster.predict(steps=5)

Out[5]:

204    0.714526
205    0.789144
206    0.818433
207    0.845027
208    0.914621
Name: pred, dtype: float64
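
With an integer index, predictions are labeled by position (204 to 208 here, following the 204 training rows). If the dates are known separately, they can be attached afterwards. The sketch below copies the values from the output above and assumes the series is monthly and starts on 1991-07-01, as in the original data. The variable names are illustrative.

```python
import pandas as pd

# Predictions copied from the output above, indexed by position
predictions = pd.Series(
    [0.714526, 0.789144, 0.818433, 0.845027, 0.914621],
    index=pd.RangeIndex(start=204, stop=209),
    name='pred',
)

# Assumption: first observation is 1991-07-01, monthly-start frequency
full_calendar = pd.date_range('1991-07-01', periods=209, freq='MS')

# Map each integer position to its calendar date
predictions_dated = predictions.copy()
predictions_dated.index = full_calendar[predictions.index.to_numpy()]
print(predictions_dated)
```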
