
ForecasterEquivalentDate

skforecast.recursive._forecaster_equivalent_date.ForecasterEquivalentDate

ForecasterEquivalentDate(
    offset,
    n_offsets=1,
    agg_func=np.mean,
    binner_kwargs=None,
    forecaster_id=None,
)

This forecaster predicts future values based on the most recent equivalent date. It also allows aggregating multiple past values of the equivalent date using a function (e.g. mean, median, max, min). The equivalent date is found by moving back in time a specified number of steps (offset). The offset can be defined as an integer or as a pandas DateOffset. This approach is useful as a baseline, but it is a simplistic method and may not capture complex underlying patterns.
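The core idea can be sketched with pandas and NumPy alone. This is a minimal illustration of the equivalent-date baseline with an integer offset, not the skforecast implementation:

```python
import numpy as np
import pandas as pd

# Daily series: for weekly seasonality, the equivalent date of a target
# day is the same weekday one or more weeks back.
y = pd.Series(
    np.arange(28, dtype=float),
    index=pd.date_range("2024-01-01", periods=28, freq="D"),
)

offset, n_offsets = 7, 2                      # look back 7 and 14 days
target = y.index[-1] + pd.Timedelta(days=1)   # first out-of-sample date

# Aggregate the values observed at the equivalent dates (here: mean)
equivalent_values = [
    y.loc[target - pd.Timedelta(days=offset * k)]
    for k in range(1, n_offsets + 1)
]
prediction = np.mean(equivalent_values)       # mean of 7- and 14-day-old values
```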

Parameters:

Name Type Description Default
offset (int, DateOffset)

Number of steps to go back in time to find the most recent equivalent date to the target period. If offset is an integer, it represents the number of steps to go back in time. For example, if the frequency of the time series is daily, offset = 7 means that the most recent data similar to the target period is the value observed 7 days ago. Pandas DateOffsets can also be used to move back a given number of valid dates. For example, Bday(2) can be used to move back two business days. If the starting date is not a valid date, it is first moved to a valid one: for example, a Saturday is moved to the previous Friday. Then the offset is applied. If the result is not a valid date, it is moved to the next valid date: for example, a Sunday is moved to the following Monday. For more information about offsets, see https://pandas.pydata.org/docs/reference/offset_frequency.html.

required
n_offsets int

Number of equivalent dates (multiples of offset) used in the prediction. If n_offsets is greater than 1, the values at the equivalent dates are aggregated using the agg_func function. For example, if the frequency of the time series is daily, offset = 7, n_offsets = 2 and agg_func = np.mean, the predicted value will be the mean of the values observed 7 and 14 days ago.

1
agg_func Callable

Function used to aggregate the values of the equivalent dates when the number of equivalent dates (n_offsets) is greater than 1.

np.mean
binner_kwargs dict

Additional arguments to pass to the QuantileBinner used to discretize the residuals into k bins according to the predicted values associated with each residual. Available arguments are: n_bins, method, subsample, random_state and dtype. The method argument is passed internally to numpy.percentile. New in version 0.17.0.

None
forecaster_id (str, int)

Name used as an identifier of the forecaster.

None
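How a pandas DateOffset such as Bday moves a date back can be checked directly. The exact landing date follows pandas' business-day roll rules described above; this sketch only asserts the invariants the text states (the result is a valid business day earlier than the start):

```python
import pandas as pd
from pandas.tseries.offsets import BDay

saturday = pd.Timestamp("2024-01-06")  # not a business day

# Subtracting BDay(2) first rolls the date onto the business-day
# calendar, then steps back two business days.
moved = saturday - BDay(2)

assert moved.weekday() < 5   # lands on a valid business day
assert moved < saturday      # and strictly earlier than the start
```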

Attributes:

Name Type Description
offset (int, DateOffset)

Number of steps to go back in time to find the most recent equivalent date to the target period. If offset is an integer, it represents the number of steps to go back in time. For example, if the frequency of the time series is daily, offset = 7 means that the most recent data similar to the target period is the value observed 7 days ago. Pandas DateOffsets can also be used to move back a given number of valid dates. For example, Bday(2) can be used to move back two business days. If the starting date is not a valid date, it is first moved to a valid one: for example, a Saturday is moved to the previous Friday. Then the offset is applied. If the result is not a valid date, it is moved to the next valid date: for example, a Sunday is moved to the following Monday. For more information about offsets, see https://pandas.pydata.org/docs/reference/offset_frequency.html.

n_offsets int

Number of equivalent dates (multiples of offset) used in the prediction. If n_offsets is greater than 1, the values at the equivalent dates are aggregated using the agg_func function. For example, if the frequency of the time series is daily, offset = 7, n_offsets = 2 and agg_func = np.mean, the predicted value will be the mean of the values observed 7 and 14 days ago.

agg_func Callable

Function used to aggregate the values of the equivalent dates when the number of equivalent dates (n_offsets) is greater than 1.

window_size int

Number of past values needed to include the last equivalent dates according to the offset and n_offsets.

last_window_ pandas Series

This window represents the most recent data observed by the forecaster during its training phase. It contains the past values needed to include the last equivalent date according to the offset and n_offsets.

index_type_ type

Type of index of the input used in training.

index_freq_ str

Frequency of the index of the input used in training.

training_range_ pandas Index

First and last values of index of the data used during training.

series_name_in_ str

Name of the series provided by the user during training.

in_sample_residuals_ numpy ndarray

Residuals of the model when predicting training data. Only stored up to 10_000 values. If transformer_y is not None, residuals are stored in the transformed scale. If differentiation is not None, residuals are stored after differentiation.

in_sample_residuals_by_bin_ dict

In sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_ in the form {bin: residuals}. If transformer_y is not None, residuals are stored in the transformed scale. If differentiation is not None, residuals are stored after differentiation.

out_sample_residuals_ numpy ndarray

Residuals of the model when predicting non-training data. Only stored up to 10_000 values. Use set_out_sample_residuals() method to set values. If transformer_y is not None, residuals are stored in the transformed scale. If differentiation is not None, residuals are stored after differentiation.

out_sample_residuals_by_bin_ dict

Out of sample residuals binned according to the predicted value each residual is associated with. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_ in the form {bin: residuals}. If transformer_y is not None, residuals are stored in the transformed scale. If differentiation is not None, residuals are stored after differentiation.

binner QuantileBinner

QuantileBinner used to discretize residuals into k bins according to the predicted values associated with each residual.

binner_intervals_ dict

Intervals used to discretize residuals into k bins according to the predicted values associated with each residual.

binner_kwargs dict

Additional arguments to pass to the QuantileBinner.

creation_date str

Date of creation.

is_fitted bool

Tag to identify if the regressor has been fitted (trained).

fit_date str

Date of last fit.

skforecast_version str

Version of skforecast library used to create the forecaster.

python_version str

Version of python used to create the forecaster.

forecaster_id (str, int)

Name used as an identifier of the forecaster.

_probabilistic_mode (str, bool)

Private attribute used to indicate whether the forecaster should perform some calculations during backtesting.

regressor Ignored

Not used, present here for API consistency by convention.

differentiation Ignored

Not used, present here for API consistency by convention.

differentiation_max Ignored

Not used, present here for API consistency by convention.

Methods:

Name Description
fit

Training Forecaster.

predict

Predict n steps ahead.

predict_interval

Predict n steps ahead and estimate prediction intervals using conformal prediction.

set_in_sample_residuals

Set in-sample residuals in case they were not calculated during the training process.

set_out_sample_residuals

Set new values for the attribute out_sample_residuals_ from out-of-sample residuals.

summary

Show forecaster information.

Source code in skforecast\recursive\_forecaster_equivalent_date.py
def __init__(
    self,
    offset: int | pd.tseries.offsets.DateOffset,
    n_offsets: int = 1,
    agg_func: Callable = np.mean,
    binner_kwargs: dict[str, object] | None = None,
    forecaster_id: str | int | None = None
) -> None:

    self.offset                       = offset
    self.n_offsets                    = n_offsets
    self.agg_func                     = agg_func
    self.last_window_                 = None
    self.index_type_                  = None
    self.index_freq_                  = None
    self.training_range_              = None
    self.series_name_in_              = None
    self.in_sample_residuals_         = None
    self.out_sample_residuals_        = None
    self.in_sample_residuals_by_bin_  = None
    self.out_sample_residuals_by_bin_ = None
    self.creation_date                = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.is_fitted                    = False
    self.fit_date                     = None
    self.skforecast_version           = skforecast.__version__
    self.python_version               = sys.version.split(" ")[0]
    self.forecaster_id                = forecaster_id
    self._probabilistic_mode          = "binned"
    self.regressor                    = None
    self.differentiation              = None
    self.differentiation_max          = None

    if not isinstance(self.offset, (int, pd.tseries.offsets.DateOffset)):
        raise TypeError(
            "`offset` must be an integer greater than 0 or a "
            "pandas.tseries.offsets. Find more information about offsets in "
            "https://pandas.pydata.org/docs/reference/offset_frequency.html"
        )

    self.window_size = self.offset * self.n_offsets

    self.binner_kwargs = binner_kwargs
    if binner_kwargs is None:
        self.binner_kwargs = {
            'n_bins': 10, 'method': 'linear', 'subsample': 200000,
            'random_state': 789654, 'dtype': np.float64
        }
    self.binner = QuantileBinner(**self.binner_kwargs)
    self.binner_intervals_ = None

offset instance-attribute

offset = offset

n_offsets instance-attribute

n_offsets = n_offsets

agg_func instance-attribute

agg_func = agg_func

last_window_ instance-attribute

last_window_ = None

index_type_ instance-attribute

index_type_ = None

index_freq_ instance-attribute

index_freq_ = None

training_range_ instance-attribute

training_range_ = None

series_name_in_ instance-attribute

series_name_in_ = None

in_sample_residuals_ instance-attribute

in_sample_residuals_ = None

out_sample_residuals_ instance-attribute

out_sample_residuals_ = None

in_sample_residuals_by_bin_ instance-attribute

in_sample_residuals_by_bin_ = None

out_sample_residuals_by_bin_ instance-attribute

out_sample_residuals_by_bin_ = None

creation_date instance-attribute

creation_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')

is_fitted instance-attribute

is_fitted = False

fit_date instance-attribute

fit_date = None

skforecast_version instance-attribute

skforecast_version = skforecast.__version__

python_version instance-attribute

python_version = sys.version.split(" ")[0]

forecaster_id instance-attribute

forecaster_id = forecaster_id

_probabilistic_mode instance-attribute

_probabilistic_mode = 'binned'

regressor instance-attribute

regressor = None

differentiation instance-attribute

differentiation = None

differentiation_max instance-attribute

differentiation_max = None

window_size instance-attribute

window_size = offset * n_offsets

binner_kwargs instance-attribute

binner_kwargs = binner_kwargs

binner instance-attribute

binner = QuantileBinner(**binner_kwargs)

binner_intervals_ instance-attribute

binner_intervals_ = None

_repr_html_

_repr_html_()

HTML representation of the object. The "General Information" section is expanded by default.

Source code in skforecast\recursive\_forecaster_equivalent_date.py
def _repr_html_(self):
    """
    HTML representation of the object.
    The "General Information" section is expanded by default.
    """

    style, unique_id = get_style_repr_html(self.is_fitted)

    content = f"""
    <div class="container-{unique_id}">
        <h2>{type(self).__name__}</h2>
        <details open>
            <summary>General Information</summary>
            <ul>
                <li><strong>Regressor:</strong> {type(self.regressor).__name__}</li>
                <li><strong>Offset:</strong> {self.offset}</li>
                <li><strong>Number of offsets:</strong> {self.n_offsets}</li>
                <li><strong>Aggregation function:</strong> {self.agg_func.__name__}</li>
                <li><strong>Window size:</strong> {self.window_size}</li>
                <li><strong>Creation date:</strong> {self.creation_date}</li>
                <li><strong>Last fit date:</strong> {self.fit_date}</li>
                <li><strong>Skforecast version:</strong> {self.skforecast_version}</li>
                <li><strong>Python version:</strong> {self.python_version}</li>
                <li><strong>Forecaster id:</strong> {self.forecaster_id}</li>
            </ul>
        </details>
        <details>
            <summary>Training Information</summary>
            <ul>
                <li><strong>Training range:</strong> {self.training_range_.to_list() if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index type:</strong> {str(self.index_type_).split('.')[-1][:-2] if self.is_fitted else 'Not fitted'}</li>
                <li><strong>Training index frequency:</strong> {self.index_freq_ if self.is_fitted else 'Not fitted'}</li>
            </ul>
        </details>
        <p>
            <a href="https://skforecast.org/{skforecast.__version__}/api/forecasterequivalentdate.html">&#128712 <strong>API Reference</strong></a>
            &nbsp;&nbsp;
            <a href="https://skforecast.org/{skforecast.__version__}/user_guides/forecasting-baseline.html">&#128462 <strong>User Guide</strong></a>
        </p>
    </div>
    """

    return style + content

fit

fit(
    y,
    store_in_sample_residuals=False,
    random_state=123,
    exog=None,
)

Training Forecaster.

Parameters:

Name Type Description Default
y pandas Series

Training time series.

required
store_in_sample_residuals bool

If True, in-sample residuals will be stored in the forecaster object after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_ attributes). If False, only the intervals of the bins are stored.

False
random_state int

Set a seed for the random generator so that the stored sample residuals are always deterministic.

123
exog Ignored

Not used, present here for API consistency by convention.

None

Returns:

Type Description
None
Source code in skforecast\recursive\_forecaster_equivalent_date.py
def fit(
    self,
    y: pd.Series,
    store_in_sample_residuals: bool = False,
    random_state: int = 123,
    exog: Any = None
) -> None:
    """
    Training Forecaster.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.
    exog : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    None

    """

    if not isinstance(y, pd.Series):
        raise TypeError(
            f"`y` must be a pandas Series with a DatetimeIndex or a RangeIndex. "
            f"Found {type(y)}."
        )

    if isinstance(self.offset, pd.tseries.offsets.DateOffset):
        if not isinstance(y.index, pd.DatetimeIndex):
            raise TypeError(
                "If `offset` is a pandas DateOffset, the index of `y` must be a "
                "pandas DatetimeIndex with frequency."
            )
        elif y.index.freq is None:
            raise TypeError(
                "If `offset` is a pandas DateOffset, the index of `y` must be a "
                "pandas DatetimeIndex with frequency."
            )

    # Reset values in case the forecaster has already been fitted.
    self.last_window_    = None
    self.index_type_     = None
    self.index_freq_     = None
    self.training_range_ = None
    self.series_name_in_ = None
    self.is_fitted       = False

    _, y_index = check_extract_values_and_index(
        data=y, data_label='`y`', return_values=False
    )

    if isinstance(self.offset, pd.tseries.offsets.DateOffset):
        # Calculate the window_size in steps for compatibility with the
        # check_predict_input function. This is not an exact calculation
        # because the offset follows the calendar rules and the distance
        # between two dates may not be constant.
        first_valid_index = (y_index[-1] - self.offset * self.n_offsets)

        try:
            window_size_idx_start = y_index.get_loc(first_valid_index)
            window_size_idx_end = y_index.get_loc(y_index[-1])
            self.window_size = window_size_idx_end - window_size_idx_start
        except KeyError:
            raise ValueError(
                f"The length of `y` ({len(y)}), must be greater than or equal "
                f"to the window size ({self.window_size}). This is because "
                f"the offset ({self.offset}) is larger than the available "
                f"data. Try to decrease the size of the offset ({self.offset}), "
                f"the number of `n_offsets` ({self.n_offsets}) or increase the "
                f"size of `y`."
            )
    else:
        if len(y) <= self.window_size:
            raise ValueError(
                f"Length of `y` must be greater than the maximum window size "
                f"needed by the forecaster. This is because "
                f"the offset ({self.offset}) is larger than the available "
                f"data. Try to decrease the size of the offset ({self.offset}), "
                f"the number of `n_offsets` ({self.n_offsets}) or increase the "
                f"size of `y`.\n"
                f"    Length `y`: {len(y)}.\n"
                f"    Max window size: {self.window_size}.\n"
            )

    self.is_fitted = True
    self.series_name_in_ = y.name if y.name is not None else 'y'
    self.fit_date = pd.Timestamp.today().strftime('%Y-%m-%d %H:%M:%S')
    self.training_range_ = y_index[[0, -1]]
    self.index_type_ = type(y_index)
    self.index_freq_ = (
        y_index.freqstr if isinstance(y_index, pd.DatetimeIndex) else y_index.step
    )

    # NOTE: This is done to save time during fit in functions such as backtesting()
    if self._probabilistic_mode is not False:
        self._binning_in_sample_residuals(
            y                         = y,
            store_in_sample_residuals = store_in_sample_residuals,
            random_state              = random_state
        )

    # The last time window of training data is stored so that equivalent
    # dates are available when calling the `predict` method.
    # Store the whole series to avoid errors when the offset is larger 
    # than the data available.
    self.last_window_ = y.copy()
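The window-size-in-steps computation that fit performs for a DateOffset can be reproduced on a plain DatetimeIndex. A sketch of the same index arithmetic, assuming a daily frequency where the calculation happens to be exact:

```python
import pandas as pd

idx = pd.date_range("2024-01-01", periods=30, freq="D")
offset, n_offsets = pd.DateOffset(days=7), 2

# The earliest equivalent date needed is offset * n_offsets before the end.
first_valid_index = idx[-1] - offset * n_offsets

# Window size expressed in positions (steps), as fit() does via get_loc.
window_size = idx.get_loc(idx[-1]) - idx.get_loc(first_valid_index)
```

With a calendar-aware offset (e.g. business days), the step count between two dates is not constant, which is why the docstring calls this an approximation.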

_binning_in_sample_residuals

_binning_in_sample_residuals(
    y, store_in_sample_residuals=False, random_state=123
)

Bin residuals according to the predicted value each residual is associated with. First a skforecast.preprocessing.QuantileBinner object is fitted to the predicted values. Then, residuals are binned according to the predicted value each residual is associated with. Residuals are stored in the forecaster object as in_sample_residuals_ and in_sample_residuals_by_bin_.

The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_. The total number of residuals stored is 10_000. New in version 0.17.0

Parameters:

Name Type Description Default
y pandas Series

Training time series.

required
store_in_sample_residuals bool

If True, in-sample residuals will be stored in the forecaster object after fitting (in_sample_residuals_ and in_sample_residuals_by_bin_ attributes). If False, only the intervals of the bins are stored.

False
random_state int

Set a seed for the random generator so that the stored sample residuals are always deterministic.

123

Returns:

Type Description
None
Source code in skforecast\recursive\_forecaster_equivalent_date.py
def _binning_in_sample_residuals(
    self,
    y: pd.Series,
    store_in_sample_residuals: bool = False,
    random_state: int = 123
) -> None:
    """
    Bin residuals according to the predicted value each residual is
    associated with. First a `skforecast.preprocessing.QuantileBinner` object
    is fitted to the predicted values. Then, residuals are binned according
    to the predicted value each residual is associated with. Residuals are
    stored in the forecaster object as `in_sample_residuals_` and
    `in_sample_residuals_by_bin_`.

    The number of residuals stored per bin is limited to 
    `10_000 // self.binner.n_bins_`. The total number of residuals stored is
    `10_000`.
    **New in version 0.17.0**

    Parameters
    ----------
    y : pandas Series
        Training time series.
    store_in_sample_residuals : bool, default False
        If `True`, in-sample residuals will be stored in the forecaster object
        after fitting (`in_sample_residuals_` and `in_sample_residuals_by_bin_`
        attributes).
        If `False`, only the intervals of the bins are stored.
    random_state : int, default 123
        Set a seed for the random generator so that the stored sample 
        residuals are always deterministic.

    Returns
    -------
    None

    """

    if isinstance(self.offset, pd.tseries.offsets.DateOffset):
        y_preds = []
        for n_off in range(1, self.n_offsets + 1):
            idx = y.index - self.offset * n_off
            mask = idx >= y.index[0]
            y_pred = y.loc[idx[mask]]
            y_pred.index = y.index[-mask.sum():]
            y_preds.append(y_pred)

        y_preds = pd.concat(y_preds, axis=1).to_numpy()
        y_true = y.to_numpy()[-len(y_preds):]

    else:
        y_preds = [
            y.shift(self.offset * n_off)[self.window_size:]
            for n_off in range(1, self.n_offsets + 1)
        ]
        y_preds = np.column_stack(y_preds)
        y_true = y.to_numpy()[self.window_size:]

    y_pred = np.apply_along_axis(
                 self.agg_func,
                 axis = 1,
                 arr  = y_preds
             )

    residuals = y_true - y_pred

    if self._probabilistic_mode == "binned":
        data = pd.DataFrame(
            {'prediction': y_pred, 'residuals': residuals}
        ).dropna()
        y_pred = data['prediction'].to_numpy()
        residuals = data['residuals'].to_numpy()

        self.binner.fit(y_pred)
        self.binner_intervals_ = self.binner.intervals_

    if store_in_sample_residuals:
        rng = np.random.default_rng(seed=random_state)
        if self._probabilistic_mode == "binned":
            data['bin'] = self.binner.transform(y_pred).astype(int)
            self.in_sample_residuals_by_bin_ = (
                data.groupby('bin')['residuals'].apply(np.array).to_dict()
            )

            max_sample = 10_000 // self.binner.n_bins_
            for k, v in self.in_sample_residuals_by_bin_.items():
                if len(v) > max_sample:
                    sample = v[rng.integers(low=0, high=len(v), size=max_sample)]
                    self.in_sample_residuals_by_bin_[k] = sample

            for k in self.binner_intervals_.keys():
                if k not in self.in_sample_residuals_by_bin_:
                    self.in_sample_residuals_by_bin_[k] = np.array([])

            empty_bins = [
                k for k, v in self.in_sample_residuals_by_bin_.items() 
                if v.size == 0
            ]
            if empty_bins:
                empty_bin_size = min(max_sample, len(residuals))
                for k in empty_bins:
                    self.in_sample_residuals_by_bin_[k] = rng.choice(
                        a       = residuals,
                        size    = empty_bin_size,
                        replace = False
                    )

        if len(residuals) > 10_000:
            residuals = residuals[
                rng.integers(low=0, high=len(residuals), size=10_000)
            ]

        self.in_sample_residuals_ = residuals
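For an integer offset, the in-sample "predictions" used to form residuals are just shifted copies of the series. The shifting branch above can be sketched in isolation (toy values, not skforecast output):

```python
import numpy as np
import pandas as pd

y = pd.Series(np.arange(20, dtype=float), index=pd.RangeIndex(20))
offset, n_offsets = 7, 2
window_size = offset * n_offsets

# One shifted copy per offset; rows before window_size lack full history
# and are dropped by the slice.
y_preds = np.column_stack([
    y.shift(offset * k)[window_size:] for k in range(1, n_offsets + 1)
])
y_true = y.to_numpy()[window_size:]

# Aggregate across offsets, then take residuals.
y_pred = y_preds.mean(axis=1)
residuals = y_true - y_pred
```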

predict

predict(
    steps, last_window=None, check_inputs=True, exog=None
)

Predict n steps ahead.

Parameters:

Name Type Description Default
steps int

Number of steps to predict.

required
last_window pandas Series

Past values needed to select the last equivalent dates according to the offset. If last_window = None, the values stored in self.last_window_ are used and the predictions start immediately after the training data.

None
check_inputs bool

If True, the input is checked for possible warnings and errors with the check_predict_input function. This argument is created for internal use and is not recommended to be changed.

True
exog Ignored

Not used, present here for API consistency by convention.

None

Returns:

Name Type Description
predictions pandas Series

Predicted values.

Source code in skforecast\recursive\_forecaster_equivalent_date.py
def predict(
    self,
    steps: int,
    last_window: pd.Series | None = None,
    check_inputs: bool = True,
    exog: Any = None
) -> pd.Series:
    """
    Predict n steps ahead.

    Parameters
    ----------
    steps : int
        Number of steps to predict. 
    last_window : pandas Series, default None
        Past values needed to select the last equivalent dates according to 
        the offset. If `last_window = None`, the values stored in 
        `self.last_window_` are used and the predictions start immediately 
        after the training data.
    check_inputs : bool, default True
        If `True`, the input is checked for possible warnings and errors 
        with the `check_predict_input` function. This argument is created 
        for internal use and is not recommended to be changed.
    exog : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas Series
        Predicted values.

    """

    if last_window is None:
        last_window = self.last_window_

    if check_inputs:
        check_predict_input(
            forecaster_name = type(self).__name__,
            steps           = steps,
            is_fitted       = self.is_fitted,
            exog_in_        = False,
            index_type_     = self.index_type_,
            index_freq_     = self.index_freq_,
            window_size     = self.window_size,
            last_window     = last_window
        )

    prediction_index = expand_index(index=last_window.index, steps=steps)

    if isinstance(self.offset, int):

        last_window_values = last_window.to_numpy(copy=True).ravel()
        equivalent_indexes = np.tile(
                                 np.arange(-self.offset, 0),
                                 int(np.ceil(steps / self.offset))
                             )
        equivalent_indexes = equivalent_indexes[:steps]

        if self.n_offsets == 1:
            equivalent_values = last_window_values[equivalent_indexes]
            predictions = equivalent_values.ravel()

        if self.n_offsets > 1:
            equivalent_indexes = [
                equivalent_indexes - n * self.offset 
                for n in np.arange(self.n_offsets)
            ]
            equivalent_indexes = np.vstack(equivalent_indexes)
            equivalent_values = last_window_values[equivalent_indexes]
            predictions = np.apply_along_axis(
                              self.agg_func,
                              axis = 0,
                              arr  = equivalent_values
                          )

        predictions = pd.Series(
                          data  = predictions,
                          index = prediction_index,
                          name  = 'pred'
                      )

    if isinstance(self.offset, pd.tseries.offsets.DateOffset):

        last_window = last_window.copy()
        max_allowed_date = last_window.index[-1]

        # For every date in prediction_index, calculate the n offsets
        offset_dates = []
        for date in prediction_index:
            selected_offsets = []
            while len(selected_offsets) < self.n_offsets:
                offset_date = date - self.offset
                if offset_date <= max_allowed_date:
                    selected_offsets.append(offset_date)
                date = offset_date
            offset_dates.append(selected_offsets)

        offset_dates = np.array(offset_dates)

        # Select the values of the time series corresponding to each
        # offset date. If the offset date is not in the time series, the
        # value is set to NaN.
        equivalent_values = (
            last_window.
            reindex(offset_dates.ravel())
            .to_numpy()
            .reshape(-1, self.n_offsets)
        )
        equivalent_values = pd.DataFrame(
                                data    = equivalent_values,
                                index   = prediction_index,
                                columns = [f'offset_{i}' for i in range(self.n_offsets)]
                            )

        # Error if all values are missing
        if equivalent_values.isnull().all().all():
            raise ValueError(
                f"All equivalent values are missing. This is caused by using "
                f"an offset ({self.offset}) larger than the available data. "
                f"Try to decrease the size of the offset ({self.offset}), "
                f"the number of `n_offsets` ({self.n_offsets}) or increase the "
                f"size of `last_window`. In backtesting, this error may be "
                f"caused by using an `initial_train_size` too small."
            )

        # Warning if equivalent values are missing
        incomplete_offsets = equivalent_values.isnull().any(axis=1)
        incomplete_offsets = incomplete_offsets[incomplete_offsets].index
        if not incomplete_offsets.empty:
            warnings.warn(
                f"Steps: {incomplete_offsets.strftime('%Y-%m-%d').to_list()} "
                f"are calculated with less than {self.n_offsets} `n_offsets`. "
                f"To avoid this, increase the `last_window` size or decrease "
                f"the number of `n_offsets`. The current configuration requires " 
                f"a total offset of {self.offset * self.n_offsets}.",
                MissingValuesWarning
            )

        aggregate_values = equivalent_values.apply(self.agg_func, axis=1)
        predictions = aggregate_values.rename('pred')

    return predictions
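The `DateOffset` branch above can be sketched standalone with a toy daily series. This is a minimal illustration of the equivalent-date lookup, not skforecast's implementation: it assumes a 14-day window, `offset = pd.DateOffset(days=7)`, `n_offsets = 2`, and mean aggregation.

```python
import numpy as np
import pandas as pd

# Toy daily series standing in for `last_window` (values 0..13).
last_window = pd.Series(
    np.arange(14, dtype=float),
    index=pd.date_range("2024-01-01", periods=14, freq="D"),
)
offset = pd.DateOffset(days=7)
n_offsets = 2
steps = 3

prediction_index = pd.date_range(
    last_window.index[-1] + pd.Timedelta(days=1), periods=steps, freq="D"
)

# For every future date, step back `offset` repeatedly until `n_offsets`
# dates inside the observed window have been collected.
max_allowed_date = last_window.index[-1]
offset_dates = []
for date in prediction_index:
    selected = []
    while len(selected) < n_offsets:
        date = date - offset
        if date <= max_allowed_date:
            selected.append(date)
    offset_dates.append(selected)

# Look up the values at the equivalent dates (NaN if missing) and aggregate.
flat_dates = pd.DatetimeIndex([d for row in offset_dates for d in row])
equivalent_values = last_window.reindex(flat_dates).to_numpy().reshape(-1, n_offsets)
predictions = pd.Series(np.nanmean(equivalent_values, axis=1), index=prediction_index)
```

Each prediction is the mean of the values observed 7 and 14 days before the target date.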

predict_interval

predict_interval(
    steps,
    last_window=None,
    method="conformal",
    interval=[5, 95],
    use_in_sample_residuals=True,
    use_binned_residuals=True,
    random_state=None,
    exog=None,
    n_boot=None,
)

Predict n steps ahead and estimate prediction intervals using the conformal prediction method. Refer to the References section for additional details on this method.

Parameters:

Name Type Description Default
steps int

Number of steps to predict.

required
last_window pandas Series

Past values needed to select the last equivalent dates according to the offset. If last_window = None, the values stored in self.last_window_ are used and the predictions start immediately after the training data.

None
method str

Technique used to estimate prediction intervals. Available options:

  • 'conformal': Employs the conformal prediction split method for interval estimation [1]_.
'conformal'
interval (float, list, tuple)

Confidence level of the prediction interval. Interpretation depends on the method used:

  • If float, represents the nominal (expected) coverage (between 0 and 1). For instance, interval=0.95 corresponds to [2.5, 97.5] percentiles.
  • If list or tuple, defines the exact percentiles to compute, which must be between 0 and 100 inclusive. For example, an interval of 95% should be specified as interval = [2.5, 97.5].
  • When using method='conformal', the interval must be a float or a list/tuple defining a symmetric interval.
[5, 95]
use_in_sample_residuals bool

If True, residuals from the training data are used as a proxy of the prediction error to create predictions. If False, out-of-sample residuals (calibration) are used. Out-of-sample residuals must be precomputed using the Forecaster's set_out_sample_residuals() method.

True
use_binned_residuals bool

If True, residuals are selected based on the predicted values (binned selection). If False, residuals are selected randomly.

True
random_state Ignored

Not used, present here for API consistency by convention.

None
exog Ignored

Not used, present here for API consistency by convention.

None
n_boot Ignored

Not used, present here for API consistency by convention.

None

Returns:

Name Type Description
predictions pandas DataFrame

Values predicted by the forecaster and their estimated interval.

  • pred: predictions.
  • lower_bound: lower bound of the interval.
  • upper_bound: upper bound of the interval.
References

.. [1] MAPIE - Model Agnostic Prediction Interval Estimator. https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method
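The split-conformal correction applied here can be sketched with plain NumPy. This is a hedged illustration of the mechanics (symmetric half-width from the quantile of absolute residuals), using synthetic residuals rather than a fitted forecaster:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical point predictions and calibration residuals (y_true - y_pred).
predictions = np.array([10.0, 12.0, 11.5])
residuals = rng.normal(loc=0.0, scale=1.0, size=500)

# Split-conformal correction: the `nominal_coverage` quantile of the
# absolute residuals gives a symmetric half-width for the interval.
interval = [5, 95]                                    # percentiles, as in the default
nominal_coverage = (interval[1] - interval[0]) / 100  # 0.9
correction = np.quantile(np.abs(residuals), nominal_coverage)

lower_bound = predictions - correction
upper_bound = predictions + correction
```

With `use_binned_residuals=True`, the same quantile is computed per bin of predicted values instead of over all residuals at once.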

Source code in skforecast\recursive\_forecaster_equivalent_date.py
def predict_interval(
    self,
    steps: int,
    last_window: pd.Series | None = None,
    method: str = 'conformal',
    interval: float | list[float] | tuple[float] = [5, 95],
    use_in_sample_residuals: bool = True,
    use_binned_residuals: bool = True,
    random_state: Any = None,
    exog: Any = None,
    n_boot: Any = None
) -> pd.DataFrame:
    """
    Predict n steps ahead and estimate prediction intervals using the 
    conformal prediction method. Refer to the References section for 
    additional details on this method.

    Parameters
    ----------
    steps : int
        Number of steps to predict.
    last_window : pandas Series, default None
        Past values needed to select the last equivalent dates according to 
        the offset. If `last_window = None`, the values stored in 
        `self.last_window_` are used and the predictions start immediately 
        after the training data.
    method : str, default 'conformal'
        Technique used to estimate prediction intervals. Available options:

        - 'conformal': Employs the conformal prediction split method for 
        interval estimation [1]_.
    interval : float, list, tuple, default [5, 95]
        Confidence level of the prediction interval. Interpretation depends 
        on the method used:

        - If `float`, represents the nominal (expected) coverage (between 0 
        and 1). For instance, `interval=0.95` corresponds to `[2.5, 97.5]` 
        percentiles.
        - If `list` or `tuple`, defines the exact percentiles to compute, which 
        must be between 0 and 100 inclusive. For example, an interval 
        of 95% should be specified as `interval = [2.5, 97.5]`.
        - When using `method='conformal'`, the interval must be a float or 
        a list/tuple defining a symmetric interval.
    use_in_sample_residuals : bool, default True
        If `True`, residuals from the training data are used as a proxy of
        the prediction error to create predictions. 
        If `False`, out-of-sample residuals (calibration) are used. 
        Out-of-sample residuals must be precomputed using the Forecaster's
        `set_out_sample_residuals()` method.
    use_binned_residuals : bool, default True
        If `True`, residuals are selected based on the predicted values 
        (binned selection).
        If `False`, residuals are selected randomly.
    random_state : Ignored
        Not used, present here for API consistency by convention.
    exog : Ignored
        Not used, present here for API consistency by convention.
    n_boot : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    predictions : pandas DataFrame
        Values predicted by the forecaster and their estimated interval.

        - pred: predictions.
        - lower_bound: lower bound of the interval.
        - upper_bound: upper bound of the interval.

    References
    ----------        
    .. [1] MAPIE - Model Agnostic Prediction Interval Estimator.
           https://mapie.readthedocs.io/en/stable/theoretical_description_regression.html#the-split-method

    """

    if method != 'conformal':
        raise ValueError(
            f"Method '{method}' is not supported. Only 'conformal' is available."
        )

    if last_window is None:
        last_window = self.last_window_

    check_predict_input(
        forecaster_name = type(self).__name__,
        steps           = steps,
        is_fitted       = self.is_fitted,
        exog_in_        = False,
        index_type_     = self.index_type_,
        index_freq_     = self.index_freq_,
        window_size     = self.window_size,
        last_window     = last_window
    )

    check_residuals_input(
        forecaster_name              = type(self).__name__,
        use_in_sample_residuals      = use_in_sample_residuals,
        in_sample_residuals_         = self.in_sample_residuals_,
        out_sample_residuals_        = self.out_sample_residuals_,
        use_binned_residuals         = use_binned_residuals,
        in_sample_residuals_by_bin_  = self.in_sample_residuals_by_bin_,
        out_sample_residuals_by_bin_ = self.out_sample_residuals_by_bin_
    )

    if isinstance(interval, (list, tuple)):
        check_interval(interval=interval, ensure_symmetric_intervals=True)
        nominal_coverage = (interval[1] - interval[0]) / 100
    else:
        check_interval(alpha=interval, alpha_literal='interval')
        nominal_coverage = interval

    if use_in_sample_residuals:
        residuals = self.in_sample_residuals_
        residuals_by_bin = self.in_sample_residuals_by_bin_
    else:
        residuals = self.out_sample_residuals_
        residuals_by_bin = self.out_sample_residuals_by_bin_

    prediction_index = expand_index(index=last_window.index, steps=steps)

    if isinstance(self.offset, int):

        last_window_values = last_window.to_numpy(copy=True).ravel()
        equivalent_indexes = np.tile(
                                 np.arange(-self.offset, 0),
                                 int(np.ceil(steps / self.offset))
                             )
        equivalent_indexes = equivalent_indexes[:steps]

        if self.n_offsets == 1:
            equivalent_values = last_window_values[equivalent_indexes]
            predictions = equivalent_values.ravel()

        if self.n_offsets > 1:
            equivalent_indexes = [
                equivalent_indexes - n * self.offset 
                for n in np.arange(self.n_offsets)
            ]
            equivalent_indexes = np.vstack(equivalent_indexes)
            equivalent_values = last_window_values[equivalent_indexes]
            predictions = np.apply_along_axis(
                              self.agg_func,
                              axis = 0,
                              arr  = equivalent_values
                          )

    if isinstance(self.offset, pd.tseries.offsets.DateOffset):

        last_window = last_window.copy()
        max_allowed_date = last_window.index[-1]

        # For every date in prediction_index, calculate the n offsets
        offset_dates = []
        for date in prediction_index:
            selected_offsets = []
            while len(selected_offsets) < self.n_offsets:
                offset_date = date - self.offset
                if offset_date <= max_allowed_date:
                    selected_offsets.append(offset_date)
                date = offset_date
            offset_dates.append(selected_offsets)

        offset_dates = np.array(offset_dates)

        # Select the values of the time series corresponding to each
        # offset date. If the offset date is not in the time series, the
        # value is set to NaN.
        equivalent_values = (
            last_window
            .reindex(offset_dates.ravel())
            .to_numpy()
            .reshape(-1, self.n_offsets)
        )
        equivalent_values = pd.DataFrame(
                                data    = equivalent_values,
                                index   = prediction_index,
                                columns = [f'offset_{i}' for i in range(self.n_offsets)]
                            )

        # Error if all values are missing
        if equivalent_values.isnull().all().all():
            raise ValueError(
                f"All equivalent values are missing. This is caused by using "
                f"an offset ({self.offset}) larger than the available data. "
                f"Try to decrease the size of the offset ({self.offset}), "
                f"the number of `n_offsets` ({self.n_offsets}) or increase the "
                f"size of `last_window`. In backtesting, this error may be "
                f"caused by using an `initial_train_size` too small."
            )

        # Warning if equivalent values are missing
        incomplete_offsets = equivalent_values.isnull().any(axis=1)
        incomplete_offsets = incomplete_offsets[incomplete_offsets].index
        if not incomplete_offsets.empty:
            warnings.warn(
                f"Steps: {incomplete_offsets.strftime('%Y-%m-%d').to_list()} "
                f"are calculated with less than {self.n_offsets} `n_offsets`. "
                f"To avoid this, increase the `last_window` size or decrease "
                f"the number of `n_offsets`. The current configuration requires " 
                f"a total offset of {self.offset * self.n_offsets}.",
                MissingValuesWarning
            )

        aggregate_values = equivalent_values.apply(self.agg_func, axis=1)
        predictions = aggregate_values.to_numpy()

    if use_binned_residuals:
        correction_factor_by_bin = {
            k: np.quantile(np.abs(v), nominal_coverage)
            for k, v in residuals_by_bin.items()
        }
        replace_func = np.vectorize(lambda x: correction_factor_by_bin[x])
        predictions_bin = self.binner.transform(predictions)
        correction_factor = replace_func(predictions_bin)
    else:
        correction_factor = np.quantile(np.abs(residuals), nominal_coverage)

    lower_bound = predictions - correction_factor
    upper_bound = predictions + correction_factor
    predictions = np.column_stack([predictions, lower_bound, upper_bound])

    predictions = pd.DataFrame(
                      data    = predictions,
                      index   = prediction_index,
                      columns = ["pred", "lower_bound", "upper_bound"]
                  )

    return predictions

set_in_sample_residuals

set_in_sample_residuals(y, random_state=123, exog=None)

Set in-sample residuals in case they were not calculated during the training process.

In-sample residuals are calculated as the difference between the true values and the predictions made by the forecaster using the training data. The following internal attributes are updated:

  • in_sample_residuals_: residuals stored in a numpy ndarray.
  • binner_intervals_: intervals used to bin the residuals are calculated using the quantiles of the predicted values.
  • in_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range.

A total of 10_000 residuals are stored in the attribute in_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.

Parameters:

Name Type Description Default
y pandas Series

Training time series.

required
random_state int

Sets a seed for the random sampling to ensure reproducible output.

123
exog Ignored

Not used, present here for API consistency by convention.

None

Returns:

Type Description
None
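The binned storage described above can be sketched with numpy/pandas alone. This is a simplified stand-in for the internal binner, assuming 10 quantile bins over synthetic predictions; it only illustrates the 10_000-residual budget split across bins:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(123)

# Hypothetical in-sample predictions and residuals from the training data.
y_pred = rng.uniform(0, 100, size=30_000)
residuals = rng.normal(size=30_000)

n_bins = 10
max_total = 10_000
max_per_bin = max_total // n_bins  # 1_000 residuals kept per bin

# Bin residuals by the quantiles of the predicted values, capping each bin
# with a random sample when it exceeds the per-bin budget.
bins = pd.qcut(y_pred, q=n_bins, labels=False)
residuals_by_bin = {}
for k in range(n_bins):
    in_bin = residuals[bins == k]
    if len(in_bin) > max_per_bin:
        in_bin = rng.choice(in_bin, size=max_per_bin, replace=False)
    residuals_by_bin[k] = in_bin
```

In the forecaster itself, the bin edges come from `self.binner` and are stored in `binner_intervals_`.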
Source code in skforecast\recursive\_forecaster_equivalent_date.py
def set_in_sample_residuals(
    self,
    y: pd.Series,
    random_state: int = 123,
    exog: Any = None
) -> None:
    """
    Set in-sample residuals in case they were not calculated during the
    training process. 

    In-sample residuals are calculated as the difference between the true 
    values and the predictions made by the forecaster using the training 
    data. The following internal attributes are updated:

    + `in_sample_residuals_`: residuals stored in a numpy ndarray.
    + `binner_intervals_`: intervals used to bin the residuals are calculated
    using the quantiles of the predicted values.
    + `in_sample_residuals_by_bin_`: residuals are binned according to the
    predicted value they are associated with and stored in a dictionary, where
    the keys are the intervals of the predicted values and the values are
    the residuals associated with that range. 

    A total of 10_000 residuals are stored in the attribute `in_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored. The number of residuals stored per bin is
    limited to `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    y : pandas Series
        Training time series.
    random_state : int, default 123
        Sets a seed for the random sampling to ensure reproducible output.
    exog : Ignored
        Not used, present here for API consistency by convention.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_in_sample_residuals()`."
        )

    check_y(y=y)
    y_index_range = check_extract_values_and_index(
        data=y, data_label='`y`', return_values=False
    )[1][[0, -1]]
    if not y_index_range.equals(self.training_range_):
        raise IndexError(
            f"The index range of `y` does not match the range "
            f"used during training. Please ensure the index is aligned "
            f"with the training data.\n"
            f"    Expected : {self.training_range_}\n"
            f"    Received : {y_index_range}"
        )

    self._binning_in_sample_residuals(
        y                         = y,
        store_in_sample_residuals = True,
        random_state              = random_state
    )

set_out_sample_residuals

set_out_sample_residuals(
    y_true, y_pred, append=False, random_state=123
)

Set new values to the attribute out_sample_residuals_. Out of sample residuals are meant to be calculated using observations that did not participate in the training process. Two internal attributes are updated:

  • out_sample_residuals_: residuals stored in a numpy ndarray.
  • out_sample_residuals_by_bin_: residuals are binned according to the predicted value they are associated with and stored in a dictionary, where the keys are the intervals of the predicted values and the values are the residuals associated with that range. If a bin is empty, it is filled with a random sample of residuals from other bins. This is done to ensure that all bins have at least one residual and can be used in the prediction process.

A total of 10_000 residuals are stored in the attribute out_sample_residuals_. If the number of residuals is greater than 10_000, a random sample of 10_000 residuals is stored. The number of residuals stored per bin is limited to 10_000 // self.binner.n_bins_.

Parameters:

Name Type Description Default
y_true numpy ndarray, pandas Series

True values of the time series from which the residuals have been calculated.

required
y_pred numpy ndarray, pandas Series

Predicted values of the time series.

required
append bool

If True, new residuals are added to the ones already stored in the forecaster. If, after appending the new residuals, the limit of 10_000 // self.binner.n_bins_ values per bin is reached, a random sample of residuals is stored.

False
random_state int

Sets a seed for the random sampling to ensure reproducible output.

123

Returns:

Type Description
None
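The append-and-cap behaviour can be sketched in a few lines. This is an illustrative fragment with made-up residual arrays, mirroring the concatenate-then-subsample logic rather than calling the method itself:

```python
import numpy as np

rng = np.random.default_rng(123)

# Hypothetical stored residuals and a new calibration batch.
stored = rng.normal(size=200)
y_true = rng.normal(loc=50, size=100)
y_pred = y_true + rng.normal(scale=2.0, size=100)
new_residuals = y_true - y_pred

# With append=True, the new residuals are concatenated with the stored ones...
out_sample_residuals = np.concatenate([stored, new_residuals])

# ...then, if the 10_000 cap is exceeded, a random subsample is kept.
if len(out_sample_residuals) > 10_000:
    out_sample_residuals = rng.choice(out_sample_residuals, size=10_000, replace=False)
```

With `append=False`, the stored residuals would simply be replaced by the new batch.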
Source code in skforecast\recursive\_forecaster_equivalent_date.py
def set_out_sample_residuals(
    self,
    y_true: np.ndarray | pd.Series,
    y_pred: np.ndarray | pd.Series,
    append: bool = False,
    random_state: int = 123
) -> None:
    """
    Set new values to the attribute `out_sample_residuals_`. Out of sample
    residuals are meant to be calculated using observations that did not
    participate in the training process. Two internal attributes are updated:

    + `out_sample_residuals_`: residuals stored in a numpy ndarray.
    + `out_sample_residuals_by_bin_`: residuals are binned according to the
    predicted value they are associated with and stored in a dictionary, where
    the keys are the intervals of the predicted values and the values are
    the residuals associated with that range. If a bin is empty, it
    is filled with a random sample of residuals from other bins. This is done
    to ensure that all bins have at least one residual and can be used in the
    prediction process.

    A total of 10_000 residuals are stored in the attribute `out_sample_residuals_`.
    If the number of residuals is greater than 10_000, a random sample of
    10_000 residuals is stored. The number of residuals stored per bin is
    limited to `10_000 // self.binner.n_bins_`.

    Parameters
    ----------
    y_true : numpy ndarray, pandas Series
        True values of the time series from which the residuals have been
        calculated.
    y_pred : numpy ndarray, pandas Series
        Predicted values of the time series.
    append : bool, default False
        If `True`, new residuals are added to the ones already stored in the
        forecaster. If, after appending the new residuals, the limit of
        `10_000 // self.binner.n_bins_` values per bin is reached, a random
        sample of residuals is stored.
    random_state : int, default 123
        Sets a seed for the random sampling to ensure reproducible output.

    Returns
    -------
    None

    """

    if not self.is_fitted:
        raise NotFittedError(
            "This forecaster is not fitted yet. Call `fit` with appropriate "
            "arguments before using `set_out_sample_residuals()`."
        )

    if not isinstance(y_true, (np.ndarray, pd.Series)):
        raise TypeError(
            f"`y_true` argument must be `numpy ndarray` or `pandas Series`. "
            f"Got {type(y_true)}."
        )

    if not isinstance(y_pred, (np.ndarray, pd.Series)):
        raise TypeError(
            f"`y_pred` argument must be `numpy ndarray` or `pandas Series`. "
            f"Got {type(y_pred)}."
        )

    if len(y_true) != len(y_pred):
        raise ValueError(
            f"`y_true` and `y_pred` must have the same length. "
            f"Got {len(y_true)} and {len(y_pred)}."
        )

    if isinstance(y_true, pd.Series) and isinstance(y_pred, pd.Series):
        if not y_true.index.equals(y_pred.index):
            raise ValueError(
                "`y_true` and `y_pred` must have the same index."
            )

    if not isinstance(y_pred, np.ndarray):
        y_pred = y_pred.to_numpy()
    if not isinstance(y_true, np.ndarray):
        y_true = y_true.to_numpy()

    data = pd.DataFrame(
        {'prediction': y_pred, 'residuals': y_true - y_pred}
    ).dropna()
    y_pred = data['prediction'].to_numpy()
    residuals = data['residuals'].to_numpy()

    data['bin'] = self.binner.transform(y_pred).astype(int)
    residuals_by_bin = data.groupby('bin')['residuals'].apply(np.array).to_dict()

    out_sample_residuals = (
        np.array([]) 
        if self.out_sample_residuals_ is None
        else self.out_sample_residuals_
    )
    out_sample_residuals_by_bin = (
        {} 
        if self.out_sample_residuals_by_bin_ is None
        else self.out_sample_residuals_by_bin_
    )
    if append:
        out_sample_residuals = np.concatenate([out_sample_residuals, residuals])
        for k, v in residuals_by_bin.items():
            if k in out_sample_residuals_by_bin:
                out_sample_residuals_by_bin[k] = np.concatenate(
                    (out_sample_residuals_by_bin[k], v)
                )
            else:
                out_sample_residuals_by_bin[k] = v
    else:
        out_sample_residuals = residuals
        out_sample_residuals_by_bin = residuals_by_bin

    max_samples = 10_000 // self.binner.n_bins_
    rng = np.random.default_rng(seed=random_state)
    for k, v in out_sample_residuals_by_bin.items():
        if len(v) > max_samples:
            sample = rng.choice(a=v, size=max_samples, replace=False)
            out_sample_residuals_by_bin[k] = sample

    bin_keys = (
        []
        if self.binner_intervals_ is None
        else self.binner_intervals_.keys()
    )
    for k in bin_keys:
        if k not in out_sample_residuals_by_bin:
            out_sample_residuals_by_bin[k] = np.array([])

    empty_bins = [
        k for k, v in out_sample_residuals_by_bin.items() 
        if v.size == 0
    ]
    if empty_bins:
        warnings.warn(
            f"The following bins have no out of sample residuals: {empty_bins}. "
            f"No predicted values fall in the interval "
            f"{[self.binner_intervals_[bin] for bin in empty_bins]}. "
            f"Empty bins will be filled with a random sample of residuals.",
            ResidualsUsageWarning
        )
        empty_bin_size = min(max_samples, len(out_sample_residuals))
        for k in empty_bins:
            out_sample_residuals_by_bin[k] = rng.choice(
                a       = out_sample_residuals,
                size    = empty_bin_size,
                replace = False
            )

    if len(out_sample_residuals) > 10_000:
        out_sample_residuals = rng.choice(
            a       = out_sample_residuals, 
            size    = 10_000, 
            replace = False
        )

    self.out_sample_residuals_ = out_sample_residuals
    self.out_sample_residuals_by_bin_ = out_sample_residuals_by_bin

summary

summary()

Show forecaster information.

Parameters:

Name Type Description Default
self
required

Returns:

Type Description
None
Source code in skforecast\recursive\_forecaster_equivalent_date.py
def summary(self) -> None:
    """
    Show forecaster information.

    Parameters
    ----------
    self

    Returns
    -------
    None

    """

    print(self)