Plot forecasting residuals¶

Analyzing the residuals (errors) of predictions is useful to understand the behavior of a forecaster. The function skforecast.plot.plot_residuals creates 3 plots:

A time-ordered plot of residual values
A distribution plot that showcases the distribution of residuals
A plot showcasing the autocorrelation of residuals

By examining the residual values over time, you can determine whether there is a pattern in the errors made by the forecast model. The distribution plot helps you understand whether the residuals are normally distributed, and the autocorrelation plot helps you identify whether there are any dependencies or relationships between the residuals.

In [1]:

                
                    Copied!
                    
                        
                        
                    
                    

            
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from skforecast.plot import plot_residuals
# Libraries
# ==============================================================================
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from skforecast.plot import plot_residuals

In [2]:

                
                    Copied!
                    
                        
                        
                    
                    

            
# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
    'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])

# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')

# Plot data
# ==============================================================================
fig, ax=plt.subplots(figsize=(7, 3))
data.plot(ax=ax);
# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/'
    'data/h2o.csv'
)
data = pd.read_csv(url, sep=',', header=0, names=['y', 'date'])

# Data preprocessing
# ==============================================================================
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.asfreq('MS')

# Plot data
# ==============================================================================
fig, ax=plt.subplots(figsize=(7, 3))
data.plot(ax=ax);

In [3]:

                
                    Copied!
                    
                        
                        
                    
                    

            
# Train and backtest forecaster
# ==============================================================================
n_backtest = 36*3
data_train = data[:-n_backtest]
data_test  = data[-n_backtest:]

forecaster = ForecasterAutoreg(
                 regressor = Ridge(),
                 lags      = 5 
             )

metric, predictions = backtesting_forecaster(
                        forecaster         = forecaster,
                        y                  = data.y,
                        initial_train_size = len(data_train),
                        steps              = 36,
                        metric             = 'mean_squared_error',
                        verbose            = True
                     )
 
predictions.head()
# Train and backtest forecaster
# ==============================================================================
n_backtest = 36*3
data_train = data[:-n_backtest]
data_test  = data[-n_backtest:]

forecaster = ForecasterAutoreg(
                 regressor = Ridge(),
                 lags      = 5 
             )

metric, predictions = backtesting_forecaster(
                        forecaster         = forecaster,
                        y                  = data.y,
                        initial_train_size = len(data_train),
                        steps              = 36,
                        metric             = 'mean_squared_error',
                        verbose            = True
                     )
 
predictions.head()

Information of backtesting process
----------------------------------
Number of observations used for initial training: 96
Number of observations used for backtesting: 108
    Number of folds: 3
    Number of steps per fold: 36
    Number of steps to exclude from the end of each train set before test (gap): 0

Fold: 0
    Training:   1991-07-01 00:00:00 -- 1999-06-01 00:00:00  (n=96)
    Validation: 1999-07-01 00:00:00 -- 2002-06-01 00:00:00  (n=36)
Fold: 1
    Training:   1991-07-01 00:00:00 -- 1999-06-01 00:00:00  (n=96)
    Validation: 2002-07-01 00:00:00 -- 2005-06-01 00:00:00  (n=36)
Fold: 2
    Training:   1991-07-01 00:00:00 -- 1999-06-01 00:00:00  (n=96)
    Validation: 2005-07-01 00:00:00 -- 2008-06-01 00:00:00  (n=36)

  0%|          | 0/3 [00:00<?, ?it/s]

Out[3]:

	pred
1999-07-01	0.667651
1999-08-01	0.655759
1999-09-01	0.652177
1999-10-01	0.641377
1999-11-01	0.635245

The predictions_backtest function can be used in two ways. Firstly, it can be utilized with pre-calculated residuals to perform a backtest on the forecast model. Secondly, it can be used with both predicted values and actual values of the series to evaluate the accuracy of the forecast.

In [4]:

                
                    Copied!
                    
# Plot residuals
# ======================================================================================
residuals = predictions['pred'] - data_test['y']
_ = plot_residuals(residuals=residuals)
# Plot residuals
# ======================================================================================
residuals = predictions['pred'] - data_test['y']
_ = plot_residuals(residuals=residuals)

In [5]:

                
                    Copied!
                    
_ = plot_residuals(y_true=data_test['y'], y_pred=predictions['pred'])
_ = plot_residuals(y_true=data_test['y'], y_pred=predictions['pred'])

Customizing plots¶

It is possible to customize the plot by by either passing a pre-existing matplotlib figure object or using additional keyword arguments that are passed to matplotlib.pyplot.figure().

In [6]:

                
                    Copied!
                    
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
_ = plot_residuals(residuals=residuals, fig=fig)
fig, ax = plt.subplots(1, 1, figsize=(8, 5))
_ = plot_residuals(residuals=residuals, fig=fig)

In [7]:

                
                    Copied!
                    
_ = plot_residuals(residuals=residuals, edgecolor="black", linewidth=15, figsize=(8, 5))
_ = plot_residuals(residuals=residuals, edgecolor="black", linewidth=15, figsize=(8, 5))

It is also possible to customize the internal aesthetics of the plots by changing plt.rcParams.

In [8]:

                
                    Copied!
                    
                        
                        
                    
                    

            
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
dark_style = {
    'figure.facecolor'  : '#212946',
    'axes.facecolor'    : '#212946',
    'savefig.facecolor' :'#212946',
    'axes.grid'         : True,
    'axes.grid.which'   : 'both',
    'axes.spines.left'  : False,
    'axes.spines.right' : False,
    'axes.spines.top'   : False,
    'axes.spines.bottom': False,
    'grid.color'        : '#2A3459',
    'grid.linewidth'    : '1',
    'text.color'        : '0.9',
    'axes.labelcolor'   : '0.9',
    'xtick.color'       : '0.9',
    'ytick.color'       : '0.9',
    'font.size'         : 12
}
plt.rcParams.update(dark_style)

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
_ = plot_residuals(residuals=residuals, fig=fig)
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
dark_style = {
    'figure.facecolor'  : '#212946',
    'axes.facecolor'    : '#212946',
    'savefig.facecolor' :'#212946',
    'axes.grid'         : True,
    'axes.grid.which'   : 'both',
    'axes.spines.left'  : False,
    'axes.spines.right' : False,
    'axes.spines.top'   : False,
    'axes.spines.bottom': False,
    'grid.color'        : '#2A3459',
    'grid.linewidth'    : '1',
    'text.color'        : '0.9',
    'axes.labelcolor'   : '0.9',
    'xtick.color'       : '0.9',
    'ytick.color'       : '0.9',
    'font.size'         : 12
}
plt.rcParams.update(dark_style)

fig, ax = plt.subplots(1, 1, figsize=(8, 5))
_ = plot_residuals(residuals=residuals, fig=fig)

In [9]:

                
                    Copied!
                    
%%html
<style>
.jupyter-wrapper .jp-CodeCell .jp-Cell-inputWrapper .jp-InputPrompt {display: none;}
</style>
%%html