Backtesting¶
Backtesting is a term used in modeling to refer to testing a predictive model on historical data. It involves going back to an earlier point in the series and simulating, step by step and in as many stages as necessary, the forecasts the model would have produced from that point onward. It is therefore a special type of cross-validation applied to previous period(s).
Backtesting with refit and increasing training size (fixed origin)
The model is trained each time before making predictions. With this configuration, the model uses all the data available so far. It is a variation of the standard cross-validation but, instead of making a random distribution of the observations, the training set increases sequentially, maintaining the temporal order of the data.
Backtesting with refit and fixed training size (rolling origin)
A technique similar to the previous one but, in this case, the forecast origin rolls forward, so the size of the training set remains constant. This is also known as time series cross-validation or walk-forward validation.
Backtesting without refit
After an initial training, the model is used sequentially, without updating it, following the temporal order of the data. This strategy has the advantage of being much faster since the model is trained only once. However, the model does not incorporate the latest available information, so it may lose predictive capacity over time.
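A minimal sketch (not skforecast code; the values of n, initial_train_size, and steps are illustrative) of how the three strategies partition a series into training and test folds:

# Sketch of the three backtesting partitions (illustrative, not skforecast code)
# ==============================================================================
n, initial_train_size, steps = 50, 20, 10  # illustrative values

for fold_start in range(initial_train_size, n, steps):
    test_end = min(fold_start + steps, n)
    # Refit with increasing training size (fixed origin): train on [0, fold_start)
    # Refit with fixed training size (rolling origin): train on
    # [fold_start - initial_train_size, fold_start)
    # Without refit: the model fitted on the first training set is reused as-is
    print(
        f"fixed origin train: 0-{fold_start - 1} | "
        f"rolling origin train: {fold_start - initial_train_size}-{fold_start - 1} | "
        f"test: {fold_start}-{test_end - 1}"
    )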
Libraries¶
# Libraries
# ==============================================================================
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from sklearn.linear_model import Ridge
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
Data¶
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o.csv')
data = pd.read_csv(url, sep=',', header=0, names=['y', 'datetime'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data[['y']]
data = data.sort_index()
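Note that asfreq('MS') inserts NaN rows for any months missing from the index. A quick check, not in the original notebook, verifies the series is complete before modeling:

# Check the series has no missing values (optional, illustrative)
# ==============================================================================
assert data['y'].isnull().sum() == 0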
# Train-validation dates
# ==============================================================================
end_train = '2002-01-01 23:59:00'
print(f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} (n={len(data.loc[:end_train])})")
print(f"Validation dates : {data.loc[end_train:].index.min()} --- {data.index.max()} (n={len(data.loc[end_train:])})")
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 4))
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='validation')
ax.legend()
plt.show()
display(data.head(4))
Train dates      : 1991-07-01 00:00:00 --- 2002-01-01 00:00:00  (n=127)
Validation dates : 2002-02-01 00:00:00 --- 2008-06-01 00:00:00  (n=77)
| datetime   | y        |
|------------|----------|
| 1991-07-01 | 0.429795 |
| 1991-08-01 | 0.400906 |
| 1991-09-01 | 0.432159 |
| 1991-10-01 | 0.492543 |
Backtest¶
The backtesting process adapted to this scenario is:
1. `backtesting_forecaster` creates a copy of the forecaster object and trains the model with the length of the series set in `initial_train_size`.
2. It predicts and stores the next 10 steps, `steps=10`.
3. Since `refit=True`, the training set increases to a length of `initial_train_size` + `steps`, and the test data becomes the following 10 steps.
4. The model is re-trained with the new training set, and the new 10 steps are then predicted.
5. This process is repeated until the entire series has been run.
# Backtest forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 15
)
metric, predictions_backtest = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
steps = 10,
metric = 'mean_squared_error',
refit = True,
verbose = True
)
Information of backtesting process
----------------------------------
Number of observations used for initial training: 127
Number of observations used for backtesting: 77
    Number of folds: 8
    Number of steps per fold: 10
    Last fold only includes 7 observations.

Data partition in fold: 0
    Training:   1991-07-01 00:00:00 -- 2002-01-01 00:00:00 (n=127)
    Validation: 2002-02-01 00:00:00 -- 2002-11-01 00:00:00 (n=10)
Data partition in fold: 1
    Training:   1991-07-01 00:00:00 -- 2002-11-01 00:00:00 (n=137)
    Validation: 2002-12-01 00:00:00 -- 2003-09-01 00:00:00 (n=10)
Data partition in fold: 2
    Training:   1991-07-01 00:00:00 -- 2003-09-01 00:00:00 (n=147)
    Validation: 2003-10-01 00:00:00 -- 2004-07-01 00:00:00 (n=10)
Data partition in fold: 3
    Training:   1991-07-01 00:00:00 -- 2004-07-01 00:00:00 (n=157)
    Validation: 2004-08-01 00:00:00 -- 2005-05-01 00:00:00 (n=10)
Data partition in fold: 4
    Training:   1991-07-01 00:00:00 -- 2005-05-01 00:00:00 (n=167)
    Validation: 2005-06-01 00:00:00 -- 2006-03-01 00:00:00 (n=10)
Data partition in fold: 5
    Training:   1991-07-01 00:00:00 -- 2006-03-01 00:00:00 (n=177)
    Validation: 2006-04-01 00:00:00 -- 2007-01-01 00:00:00 (n=10)
Data partition in fold: 6
    Training:   1991-07-01 00:00:00 -- 2007-01-01 00:00:00 (n=187)
    Validation: 2007-02-01 00:00:00 -- 2007-11-01 00:00:00 (n=10)
Data partition in fold: 7
    Training:   1991-07-01 00:00:00 -- 2007-11-01 00:00:00 (n=197)
    Validation: 2007-12-01 00:00:00 -- 2008-06-01 00:00:00 (n=7)
print(f"Backtest error: {metric}")
Backtest error: 0.00818535931502708
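Since the metric is the mean squared error, its square root expresses the error in the same units as the series. An illustrative follow-up, not in the original notebook:

# RMSE in the units of the series (illustrative)
# ==============================================================================
rmse = np.sqrt(metric)
print(f"RMSE: {rmse:.4f}")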
predictions_backtest.head(4)
|            | pred     |
|------------|----------|
| 2002-02-01 | 0.594506 |
| 2002-03-01 | 0.785886 |
| 2002-04-01 | 0.698925 |
| 2002-05-01 | 0.790560 |
fig, ax = plt.subplots(figsize=(9, 4))
data.loc[end_train:, 'y'].plot(ax=ax, label='test')
predictions_backtest['pred'].plot(ax=ax, label='predictions')
ax.legend();
Backtest with prediction intervals¶
# Backtest forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = Ridge(),
lags = 15
)
metric, predictions_backtest = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
steps = 10,
metric = 'mean_squared_error',
refit = True,
interval = [5, 95],
n_boot = 500,
verbose = True
)
Information of backtesting process
----------------------------------
Number of observations used for initial training: 127
Number of observations used for backtesting: 77
    Number of folds: 8
    Number of steps per fold: 10
    Last fold only includes 7 observations.

Data partition in fold: 0
    Training:   1991-07-01 00:00:00 -- 2002-01-01 00:00:00 (n=127)
    Validation: 2002-02-01 00:00:00 -- 2002-11-01 00:00:00 (n=10)
Data partition in fold: 1
    Training:   1991-07-01 00:00:00 -- 2002-11-01 00:00:00 (n=137)
    Validation: 2002-12-01 00:00:00 -- 2003-09-01 00:00:00 (n=10)
Data partition in fold: 2
    Training:   1991-07-01 00:00:00 -- 2003-09-01 00:00:00 (n=147)
    Validation: 2003-10-01 00:00:00 -- 2004-07-01 00:00:00 (n=10)
Data partition in fold: 3
    Training:   1991-07-01 00:00:00 -- 2004-07-01 00:00:00 (n=157)
    Validation: 2004-08-01 00:00:00 -- 2005-05-01 00:00:00 (n=10)
Data partition in fold: 4
    Training:   1991-07-01 00:00:00 -- 2005-05-01 00:00:00 (n=167)
    Validation: 2005-06-01 00:00:00 -- 2006-03-01 00:00:00 (n=10)
Data partition in fold: 5
    Training:   1991-07-01 00:00:00 -- 2006-03-01 00:00:00 (n=177)
    Validation: 2006-04-01 00:00:00 -- 2007-01-01 00:00:00 (n=10)
Data partition in fold: 6
    Training:   1991-07-01 00:00:00 -- 2007-01-01 00:00:00 (n=187)
    Validation: 2007-02-01 00:00:00 -- 2007-11-01 00:00:00 (n=10)
Data partition in fold: 7
    Training:   1991-07-01 00:00:00 -- 2007-11-01 00:00:00 (n=197)
    Validation: 2007-12-01 00:00:00 -- 2008-06-01 00:00:00 (n=7)
predictions_backtest.head()
|            | pred     | lower_bound | upper_bound |
|------------|----------|-------------|-------------|
| 2002-02-01 | 0.703579 | 0.599636    | 0.834616    |
| 2002-03-01 | 0.673522 | 0.572605    | 0.787942    |
| 2002-04-01 | 0.698319 | 0.589196    | 0.815748    |
| 2002-05-01 | 0.703042 | 0.593719    | 0.831795    |
| 2002-06-01 | 0.733776 | 0.632476    | 0.842865    |
fig, ax = plt.subplots(figsize=(9, 4))
data.loc[end_train:, 'y'].plot(ax=ax, label='test')
predictions_backtest['pred'].plot(ax=ax, label='predictions')
ax.fill_between(
predictions_backtest.index,
predictions_backtest['lower_bound'],
predictions_backtest['upper_bound'],
color = 'red',
alpha = 0.2,
label = 'prediction interval'
)
ax.legend();
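As a sanity check, not part of the original notebook, the empirical coverage of the 5-95% interval over the backtest period should be close to the nominal 90%:

# Empirical coverage of the prediction interval (illustrative check)
# ==============================================================================
inside_interval = (
    (data.loc[predictions_backtest.index, 'y'] >= predictions_backtest['lower_bound'])
    & (data.loc[predictions_backtest.index, 'y'] <= predictions_backtest['upper_bound'])
)
print(f"Empirical coverage: {inside_interval.mean():.2%}")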
Backtest on training data¶
# Fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 15
)
forecaster.fit(y=data['y'])
Set the arguments `initial_train_size=None` and `refit=False` to perform backtesting using the already trained forecaster.
# Backtest train data
# ==============================================================================
metric, predictions_train = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
initial_train_size = None,
steps = 1,
metric = 'mean_squared_error',
refit = False,
verbose = False
)
print(f"Backtest training error: {metric}")
Backtest training error: 0.0005392479040738611
predictions_train.head(4)
|            | pred     |
|------------|----------|
| 1992-10-01 | 0.553611 |
| 1992-11-01 | 0.568324 |
| 1992-12-01 | 0.735167 |
| 1993-01-01 | 0.723217 |
The first 15 observations are not predicted since they are needed to create the lags used as predictors.
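An illustrative check, not in the original notebook, that the number of observations without a prediction equals the number of lags:

# Check: unpredicted observations equal the number of lags (illustrative)
# ==============================================================================
print(f"Observations without prediction: {len(data) - len(predictions_train)}")
print(f"Number of lags                 : {len(forecaster.lags)}")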
# Plot training predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 4))
data.plot(ax=ax)
predictions_train.plot(ax=ax)
ax.legend();
Predict using the internal regressor¶
# Fit forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 15
)
forecaster.fit(y=data['y'])
# Create training matrix
# ==============================================================================
X, y = forecaster.create_train_X_y(
y = data['y'],
exog = None
)
Using the internal regressor only allows predicting one step ahead, since predicting further would require feeding the model's own predictions back in as lag values, which is what the forecaster's `predict` method automates.
# Predict using the internal regressor
# ==============================================================================
forecaster.regressor.predict(X)[:4]
array([0.55361079, 0.56832448, 0.73516725, 0.72321715])
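These values coincide with the one-step predictions obtained earlier when backtesting the already trained forecaster. An illustrative check, assuming `predictions_train` from the previous cells is still in memory:

# Check against the one-step backtest predictions (illustrative)
# ==============================================================================
np.testing.assert_allclose(
    forecaster.regressor.predict(X)[:4],
    predictions_train['pred'].to_numpy()[:4]
)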
Backtest with custom metric¶
Besides the frequently used metrics mean_squared_error, mean_absolute_error, and mean_absolute_percentage_error, it is possible to use any custom function as long as:

+ It includes the arguments:
    + `y_true`: true values of the series.
    + `y_pred`: predicted values.
+ It returns a numeric value (`float` or `int`).
This allows evaluating the predictive capability of the model in a wide range of scenarios, for example:

+ Considering only certain months, days, or hours.
+ Considering only dates that are holidays.
+ Considering only the last step of the predicted horizon.
The following example shows how to forecast a 10-month horizon (`steps=10`) while considering only the last 3 months of each year to calculate the metric of interest.
# Backtest forecaster with custom metric
# ==============================================================================
def custom_metric(y_true, y_pred):
"""
Calculate the mean squared error using only the predicted values of the last
3 months of the year.
"""
mask = y_true.index.month.isin([10, 11, 12])
metric = mean_squared_error(y_true[mask], y_pred[mask])
return metric
metric, predictions_backtest = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
steps = 10,
metric = custom_metric,
refit = True,
verbose = False
)
print(f"Backtest error custom metric: {metric}")
Backtest error custom metric: 0.005580948738772139
Backtest with multiple metrics¶
The function `backtesting_forecaster` allows estimating multiple metrics at the same time if a list of metrics is provided. This list may include custom metrics.
metrics, predictions_backtest = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
steps = 10,
metric = ['mean_squared_error', 'mean_absolute_error'],
refit = True,
verbose = False
)
print(f"Backtest error metrics: {metrics}")
Backtest error metrics: [0.00818535931502708, 0.06489120319220776]
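The metric values are returned in the same order as they were requested, so pairing names with values keeps the output readable (illustrative snippet):

# Pair each metric name with its value (illustrative)
# ==============================================================================
for name, value in zip(['mean_squared_error', 'mean_absolute_error'], metrics):
    print(f"{name}: {value:.6f}")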
Backtest with exogenous variables¶
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast/master/data/h2o_exog.csv')
data = pd.read_csv(url, sep=',', header=0, names=['datetime', 'y', 'exog_1', 'exog_2'])
# Data preprocessing
# ==============================================================================
data['datetime'] = pd.to_datetime(data['datetime'], format='%Y/%m/%d')
data = data.set_index('datetime')
data = data.asfreq('MS')
data = data.sort_index()
# Train-validation dates
# ==============================================================================
end_train = '2002-01-01 23:59:00'
print(f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} (n={len(data.loc[:end_train])})")
print(f"Validation dates : {data.loc[end_train:].index.min()} --- {data.index.max()} (n={len(data.loc[end_train:])})")
# Plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(9, 4))
data.plot(ax=ax);
Train dates      : 1992-04-01 00:00:00 --- 2002-01-01 00:00:00  (n=118)
Validation dates : 2002-02-01 00:00:00 --- 2008-06-01 00:00:00  (n=77)
# Backtest forecaster exogenous variables
# ==============================================================================
forecaster = ForecasterAutoreg(
regressor = RandomForestRegressor(random_state=123),
lags = 15
)
metric, predictions_backtest = backtesting_forecaster(
forecaster = forecaster,
y = data['y'],
exog = data[['exog_1', 'exog_2']],
initial_train_size = len(data.loc[:end_train]),
fixed_train_size = False,
steps = 10,
metric = 'mean_squared_error',
refit = True,
verbose = True
)
Information of backtesting process
----------------------------------
Number of observations used for initial training: 118
Number of observations used for backtesting: 77
    Number of folds: 8
    Number of steps per fold: 10
    Last fold only includes 7 observations.

Data partition in fold: 0
    Training:   1992-04-01 00:00:00 -- 2002-01-01 00:00:00 (n=118)
    Validation: 2002-02-01 00:00:00 -- 2002-11-01 00:00:00 (n=10)
Data partition in fold: 1
    Training:   1992-04-01 00:00:00 -- 2002-11-01 00:00:00 (n=128)
    Validation: 2002-12-01 00:00:00 -- 2003-09-01 00:00:00 (n=10)
Data partition in fold: 2
    Training:   1992-04-01 00:00:00 -- 2003-09-01 00:00:00 (n=138)
    Validation: 2003-10-01 00:00:00 -- 2004-07-01 00:00:00 (n=10)
Data partition in fold: 3
    Training:   1992-04-01 00:00:00 -- 2004-07-01 00:00:00 (n=148)
    Validation: 2004-08-01 00:00:00 -- 2005-05-01 00:00:00 (n=10)
Data partition in fold: 4
    Training:   1992-04-01 00:00:00 -- 2005-05-01 00:00:00 (n=158)
    Validation: 2005-06-01 00:00:00 -- 2006-03-01 00:00:00 (n=10)
Data partition in fold: 5
    Training:   1992-04-01 00:00:00 -- 2006-03-01 00:00:00 (n=168)
    Validation: 2006-04-01 00:00:00 -- 2007-01-01 00:00:00 (n=10)
Data partition in fold: 6
    Training:   1992-04-01 00:00:00 -- 2007-01-01 00:00:00 (n=178)
    Validation: 2007-02-01 00:00:00 -- 2007-11-01 00:00:00 (n=10)
Data partition in fold: 7
    Training:   1992-04-01 00:00:00 -- 2007-11-01 00:00:00 (n=188)
    Validation: 2007-12-01 00:00:00 -- 2008-06-01 00:00:00 (n=7)
print(f"Backtest error with exogenous variables: {metric}")
Backtest error with exogenous variables: 0.007800037462113706
fig, ax = plt.subplots(figsize=(9, 4))
data.loc[end_train:].plot(ax=ax)
predictions_backtest.plot(ax=ax)
ax.legend();
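Once the configuration has been validated by backtesting, the forecaster can be refitted on the full series and used to forecast, provided that future values of the exogenous variables are available. A minimal sketch, where `future_exog` is a hypothetical placeholder built by repeating the last known row:

# Forecast beyond the available data (sketch; future_exog is a placeholder)
# ==============================================================================
forecaster.fit(y=data['y'], exog=data[['exog_1', 'exog_2']])
# In practice, real future values of exog_1 and exog_2 must be supplied here.
future_exog = pd.DataFrame(
    [data[['exog_1', 'exog_2']].iloc[-1]] * 10,
    index = pd.date_range(data.index[-1] + pd.DateOffset(months=1), periods=10, freq='MS')
)
predictions = forecaster.predict(steps=10, exog=future_exog)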