Benchmarking skforecast¶
Benchmarking is an essential part of the development process of skforecast. It provides a transparent view of how the performance of the library evolves across versions and helps users make informed decisions when choosing the right forecaster or configuration for their use case.
In this section, we present benchmark results that measure the execution time of key methods (`fit`, `predict`, etc.) and their evolution across versions, allowing improvements or regressions to be detected.
Methodology¶
The benchmarking results presented here are generated using a custom benchmarking script located in the `benchmarks/` directory of the repository. This script executes a series of performance tests for the main forecasting classes and their methods, recording both execution time and variability across multiple runs.
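As an illustration of what recording "execution time and variability" means in practice, the snippet below is a minimal timing helper built on `time.perf_counter`. It is a sketch, not the actual code in `benchmarks/run_benchmarks.py`, but it produces the same kind of summary statistics (average, median, 95th percentile, standard deviation and number of repeats) that appear in the results shown further down.
# Illustrative timing helper (sketch; the real benchmark script may differ)
# ==============================================================================
import time
import numpy as np

def time_method(func, *args, n_repeats=10, **kwargs):
    """Run `func` n_repeats times and summarise its execution time."""
    run_times = []
    for _ in range(n_repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        run_times.append(time.perf_counter() - start)
    run_times = np.array(run_times)
    return {
        "run_time_avg"   : run_times.mean(),
        "run_time_median": np.median(run_times),
        "run_time_p95"   : np.percentile(run_times, 95),
        "run_time_std"   : run_times.std(),
        "n_repeats"      : n_repeats,
    }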
To ensure consistency and reproducibility, all benchmarks are automatically executed as part of the continuous integration (CI) pipeline using GitHub Actions. This guarantees that each new release of the library is tested under the same environment and dependency configuration, making performance results directly comparable across versions.
Users who wish to reproduce the benchmarks locally can execute the same script by running:
# From skforecast root directory
python benchmarks/run_benchmarks.py
Results are stored in a structured format in `benchmarks/benchmark.joblib`.
⚠ Warning
The `plot_benchmark_results` function is located at the end of the notebook. Go to plot function.
# Libraries
# ==============================================================================
import platform
import psutil
import numpy as np
import pandas as pd
import joblib
import sklearn
import skforecast
# Display all columns in pandas
pd.set_option("display.max_columns", None)
# Environment information
# ==============================================================================
print(f"Python version : {platform.python_version()}")
print(f"skforecast version : {skforecast.__version__}")
print(f"numpy version : {np.__version__}")
print(f"pandas version : {pd.__version__}")
print(f"scikit-learn version : {sklearn.__version__}")
print(f"Computer network name : {platform.node()}")
print(f"Processor type : {platform.processor()}")
print(f"Platform type : {platform.platform()}")
print(f"Number of physical cores : {psutil.cpu_count(logical=False)}")
print(f"Number of logical cores : {psutil.cpu_count(logical=True)}")
print(f"Memory total : {round(psutil.virtual_memory().total / 1e9, 2)} GB")
Python version : 3.12.11
skforecast version : 0.17.0
numpy version : 2.1.3
pandas version : 2.3.1
scikit-learn version : 1.6.1
Computer network name : ITES015-NB0029
Processor type : Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
Platform type : Windows-11-10.0.26100-SP0
Number of physical cores : 8
Number of logical cores : 16
Memory total : 34.07 GB
import warnings
warnings.filterwarnings(
"ignore",
category=FutureWarning,
message="'force_all_finite' was renamed to 'ensure_all_finite'"
)
Global results¶
# Load benchmark results
# ==============================================================================
results_benchmark_all = joblib.load("../../benchmarks/benchmark.joblib")
results_benchmark_all.head(2)
| | forecaster_name | regressor_name | function_name | function_hash | method_name | run_time_avg | run_time_median | run_time_p95 | run_time_std | n_repeats | datetime | python_version | skforecast_version | numpy_version | pandas_version | sklearn_version | lightgbm_version | platform | processor | cpu_count | memory_gb |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | ForecasterRecursive | DummyRegressor | ForecasterRecursive__create_train_X_y | 59b823f1ff395872fac4f7578bd859fa | _create_train_X_y | 0.004280 | 0.004191 | 0.004524 | 0.000344 | 30 | 2025-08-26 06:59:07.003048 | 3.12.11 | 0.17.0 | 2.1.3 | 2.3.2 | 1.6.1 | 4.6.0 | Linux-6.11.0-1018-azure-x86_64-with-glibc2.39 | x86_64 | 4 | 16.77 |
| 1 | ForecasterRecursive | DummyRegressor | ForecasterRecursive_fit | 9d73eaf5faa980194d715362601eed68 | fit | 0.005956 | 0.005379 | 0.008166 | 0.001107 | 10 | 2025-08-26 06:59:07.066656 | 3.12.11 | 0.17.0 | 2.1.3 | 2.3.2 | 1.6.1 | 4.6.0 | Linux-6.11.0-1018-azure-x86_64-with-glibc2.39 | x86_64 | 4 | 16.77 |
ForecasterRecursive¶
The `ForecasterRecursive` class is benchmarked under a fixed experimental setup to ensure fair comparison across library versions. Key conditions are:
Condition | Value |
---|---|
Regressor | sklearn.dummy.DummyRegressor (to isolate forecaster overhead) |
Dataset length | 2000 synthetic observations (generated with bench_forecaster_recursive._make_data ) |
Lags as predictors | 50 |
Exogenous features | 3 |
Data transformation | StandardScaler() for time series features and exogenous variables |
Prediction horizon | 100 steps ahead |
Backtesting split | 1200 training, remaining for testing, step size = 50, no re-fitting (see guide) |
Note: In versions < 0.14.0, `ForecasterRecursive` was named `ForecasterAutoreg`.
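For reference, a comparable single-series scenario can be assembled with the public API as sketched below. This is a sketch assuming the skforecast >= 0.14 imports; the synthetic data is generated inline rather than with the internal `bench_forecaster_recursive._make_data` helper, so absolute timings will not match the CI results.
# Comparable ForecasterRecursive setup (sketch, not the benchmark script itself)
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.preprocessing import StandardScaler
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold, backtesting_forecaster

# Synthetic data: 2000 observations and 3 exogenous features
rng = np.random.default_rng(123)
idx = pd.date_range("2020-01-01", periods=2000, freq="h")
y = pd.Series(rng.normal(size=2000).cumsum(), index=idx, name="y")
exog = pd.DataFrame(
    rng.normal(size=(2000, 3)), index=idx, columns=["exog_1", "exog_2", "exog_3"]
)

forecaster = ForecasterRecursive(
    regressor        = DummyRegressor(),
    lags             = 50,
    transformer_y    = StandardScaler(),
    transformer_exog = StandardScaler()
)

# Fit and predict 100 steps ahead
forecaster.fit(y=y.iloc[:-100], exog=exog.iloc[:-100])
predictions = forecaster.predict(steps=100, exog=exog.iloc[-100:])

# Backtesting: 1200 training observations, step size 50, no re-fitting
cv = TimeSeriesFold(steps=50, initial_train_size=1200, refit=False)
metric, predictions_backtest = backtesting_forecaster(
    forecaster = forecaster,
    y          = y,
    exog       = exog,
    cv         = cv,
    metric     = 'mean_absolute_error'
)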
# Plot benchmark results
# ==============================================================================
plot_benchmark_results(
results_benchmark_all,
forecaster_names = ['ForecasterRecursive', 'ForecasterAutoreg'],
regressors = ['DummyRegressor'],
add_mean = True,
add_median = True
)
ForecasterDirect¶
The `ForecasterDirect` class is benchmarked under a fixed experimental setup to ensure fair comparison across library versions. Key conditions are:
Condition | Value |
---|---|
Regressor | sklearn.dummy.DummyRegressor (to isolate forecaster overhead) |
Dataset length | 2000 synthetic observations (generated with bench_forecaster_direct._make_data ) |
Lags as predictors | 20 |
Exogenous features | 3 |
Data transformation | StandardScaler() for time series features and exogenous variables |
Prediction horizon | 10 steps ahead (10 regressors) |
Backtesting split | 1200 training, remaining for testing, step size = 10, no re-fitting (see guide) |
Note: In versions < 0.14.0, `ForecasterDirect` was named `ForecasterAutoregDirect`.
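The direct strategy fits one regressor per step of the prediction horizon, so the 10-step horizon above implies 10 internal regressors. A minimal constructor sketch under the same assumptions as before (skforecast >= 0.14 API):
# ForecasterDirect: one regressor per step of the 10-step horizon (sketch)
# ==============================================================================
from sklearn.dummy import DummyRegressor
from sklearn.preprocessing import StandardScaler
from skforecast.direct import ForecasterDirect

forecaster = ForecasterDirect(
    regressor        = DummyRegressor(),
    steps            = 10,  # prediction horizon: 10 regressors are trained
    lags             = 20,
    transformer_y    = StandardScaler(),
    transformer_exog = StandardScaler()
)
Backtesting then proceeds as in the recursive case, with a step size of 10 instead of 50.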
# Plot benchmark results
# ==============================================================================
plot_benchmark_results(
results_benchmark_all,
forecaster_names = ['ForecasterDirect', 'ForecasterAutoregDirect'],
regressors = ['DummyRegressor'],
add_mean = True,
add_median = True
)
ForecasterRecursiveMultiSeries¶
The `ForecasterRecursiveMultiSeries` class is benchmarked under a fixed experimental setup to ensure fair comparison across library versions. Key conditions are:
Condition | Value |
---|---|
Regressor | sklearn.dummy.DummyRegressor (to isolate forecaster overhead) |
Number of series | 600 time series |
Dataset length | 2000 synthetic observations per series (generated with bench_forecaster_recursive_multiseries._make_data ) |
Lags as predictors | 50 |
Exogenous features | 3 |
Data transformation | StandardScaler() for time series features and exogenous variables |
Prediction horizon | 100 steps ahead |
Backtesting split | 1200 training, remaining for testing, step size = 50, no re-fitting (see guide) |
Note: In versions < 0.14.0, `ForecasterRecursiveMultiSeries` was named `ForecasterAutoregMultiSeries`.
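A comparable multi-series scenario can be sketched as follows, again assuming the skforecast >= 0.14 API. The 600 series are generated inline and the column names (`series_0`, ...) are illustrative; the actual benchmark uses `bench_forecaster_recursive_multiseries._make_data`.
# Comparable ForecasterRecursiveMultiSeries setup (sketch)
# ==============================================================================
import numpy as np
import pandas as pd
from sklearn.dummy import DummyRegressor
from sklearn.preprocessing import StandardScaler
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold, backtesting_forecaster_multiseries

# 600 synthetic series of length 2000 (one column per series) plus 3 exogenous features
rng = np.random.default_rng(123)
idx = pd.date_range("2020-01-01", periods=2000, freq="h")
series = pd.DataFrame(
    rng.normal(size=(2000, 600)).cumsum(axis=0),
    index=idx,
    columns=[f"series_{i}" for i in range(600)]
)
exog = pd.DataFrame(
    rng.normal(size=(2000, 3)), index=idx, columns=["exog_1", "exog_2", "exog_3"]
)

forecaster = ForecasterRecursiveMultiSeries(
    regressor          = DummyRegressor(),
    lags               = 50,
    transformer_series = StandardScaler(),
    transformer_exog   = StandardScaler()
)

# Backtesting: 1200 training observations, step size 50, no re-fitting
cv = TimeSeriesFold(steps=50, initial_train_size=1200, refit=False)
metric, predictions = backtesting_forecaster_multiseries(
    forecaster = forecaster,
    series     = series,
    exog       = exog,
    cv         = cv,
    metric     = 'mean_absolute_error'
)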
# Plot benchmark results
# ==============================================================================
plot_benchmark_results(
results_benchmark_all,
forecaster_names = ['ForecasterRecursiveMultiSeries', 'ForecasterAutoregMultiSeries'],
regressors = ['DummyRegressor'],
add_mean = True,
add_median = True
)
ForecasterDirectMultiVariate¶
The `ForecasterDirectMultiVariate` class is benchmarked under a fixed experimental setup to ensure fair comparison across library versions. Key conditions are:
Condition | Value |
---|---|
Regressor | sklearn.dummy.DummyRegressor (to isolate forecaster overhead) |
Number of series | 13 time series |
Dataset length | 1000 synthetic observations (generated with bench_forecaster_direct_multivariate._make_data ) |
Lags as predictors | 20 |
Exogenous features | 3 |
Data transformation | StandardScaler() for time series features and exogenous variables |
Prediction horizon | 10 steps ahead (10 regressors) |
Backtesting split | 900 training, remaining for testing, step size = 10, no re-fitting (see guide) |
Note: In versions < 0.14.0, `ForecasterDirectMultiVariate` was named `ForecasterAutoregMultiVariate`.
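In this forecaster a single target series (the level) is predicted from the lags of all 13 series, with one internal regressor per step of the horizon. A minimal constructor sketch, assuming the skforecast >= 0.14 API and an illustrative level name `series_1`:
# ForecasterDirectMultiVariate: predict one target series from the lags of all series (sketch)
# ==============================================================================
from sklearn.dummy import DummyRegressor
from sklearn.preprocessing import StandardScaler
from skforecast.direct import ForecasterDirectMultiVariate

forecaster = ForecasterDirectMultiVariate(
    regressor          = DummyRegressor(),
    level              = 'series_1',  # column of `series` to forecast (illustrative name)
    steps              = 10,          # prediction horizon: 10 regressors are trained
    lags               = 20,
    transformer_series = StandardScaler(),
    transformer_exog   = StandardScaler()
)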
# Plot benchmark results
# ==============================================================================
plot_benchmark_results(
results_benchmark_all,
forecaster_names = ['ForecasterDirectMultiVariate', 'ForecasterAutoregMultiVariate'],
regressors = ['DummyRegressor'],
add_mean = True,
add_median = True
)
Plot function¶
# Plot function
# ==============================================================================
from __future__ import annotations
import numpy as np
import pandas as pd
import plotly.io as pio
import plotly.graph_objects as go
from plotly.express.colors import qualitative
from packaging.version import Version, InvalidVersion
pio.renderers.default = "notebook_connected"
def plot_benchmark_results(
df: pd.DataFrame,
forecaster_names: str | list[str],
regressors: str | list[str] | None = None,
add_median: bool = True,
add_mean: bool = True
) -> None:
"""
Plot interactive benchmark results by method and package version.
This function renders an interactive Plotly chart that visualizes execution
times for multiple methods across `skforecast` versions. Each data point
represents a run (or an aggregated run) for a given `(method, version)` and is
jittered horizontally to avoid overlap. Points are **colored by version** and
(optionally) per-version **median** and **mean** lines are overlaid for the
selected method. A dropdown allows switching the visible method.
Parameters
----------
    df : pandas DataFrame
        Benchmark results, as produced by the benchmarking script.
forecaster_names : str, list
Forecaster(s) to filter in `df` (column `forecaster_name`).
regressors : str, list, default None
Regressor(s) to filter in `df` (column `regressor_name`). If `None`,
no additional filtering by regressor is applied.
add_median : bool, default True
If `True`, draw one per-version **median** line for the selected method.
add_mean : bool, default True
If `True`, draw one per-version **mean** line for the selected method.
Returns
-------
    None
        The figure is displayed with `fig.show()`; nothing is returned.
"""
if not isinstance(forecaster_names, list):
forecaster_names = [forecaster_names]
df = df.query("forecaster_name in @forecaster_names")
if regressors is not None:
if not isinstance(regressors, list):
regressors = [regressors]
df = df.query("regressor_name in @regressors")
def try_version(v):
try:
return Version(str(v).lstrip('v'))
except InvalidVersion:
return Version("0") # fallback
versions_sorted = sorted(df['skforecast_version'].unique(), key=try_version)
version_to_num = {v: i for i, v in enumerate(versions_sorted)}
df['skforecast_version'] = pd.Categorical(
df['skforecast_version'],
categories=versions_sorted,
ordered=True
)
df['x_jittered'] = (
df['skforecast_version'].map(version_to_num).astype(float) +
np.random.uniform(-0.05, 0.05, size=len(df))
)
    # --- color palette, one color per skforecast version ---
version_colors = {
v: qualitative.Plotly[i % len(qualitative.Plotly)]
for i, v in enumerate(versions_sorted)
}
methods = list(df['method_name'].unique())
fig = go.Figure()
    # --- one trace per (method, version); marker colors encode the version ---
method_to_traces = {m: [] for m in methods}
for i, m in enumerate(methods):
for version in versions_sorted:
sub_df = df[
(df['method_name'] == m) &
(df['skforecast_version'] == version)
]
if sub_df.empty:
continue
marker = dict(
size=10,
color=version_colors[version],
opacity=0.85,
line=dict(color="white", width=1)
)
error_y = dict(
type='data',
array=sub_df['run_time_std'].to_numpy(),
visible=not sub_df['run_time_std'].isna().all(),
color=version_colors[version],
thickness=1.5,
width=5
)
fig.add_trace(go.Scatter(
x=sub_df['x_jittered'],
y=sub_df['run_time_avg'],
mode='markers',
marker=marker,
error_y=error_y,
visible=(i == 0),
# name=f"{methods[i]} — {label}",
text = sub_df.apply(lambda row: (
f"Forecaster: {row['forecaster_name']}<br>"
f"Regressor: {row['regressor_name']}<br>"
f"Function: {row['function_name']}<br>"
f"Function_hash: {row['function_hash']}<br>"
f"Method: {row['method_name']}<br>"
f"Datetime: {row['datetime']}<br>"
f"Python version: {row['python_version']}<br>"
f"skforecast version: {row['skforecast_version']}<br>"
f"numpy version: {row['numpy_version']}<br>"
f"pandas version: {row['pandas_version']}<br>"
f"sklearn version: {row['sklearn_version']}<br>"
f"lightgbm version: {row['lightgbm_version']}<br>"
f"Platform: {row['platform']}<br>"
f"Processor: {row['processor']}<br>"
f"CPU count: {row['cpu_count']}<br>"
f"Memory (GB): {row['memory_gb']:.2f}<br>"
f"Run time avg: {row['run_time_avg']:.6f} s<br>"
f"Run time median: {row['run_time_median']:.6f} s<br>"
f"Run time p95: {row['run_time_p95']:.6f} s<br>"
f"Run time std: {row['run_time_std']:.6f} s<br>"
f"Nº repeats: {row['n_repeats']}"
), axis=1),
hovertemplate = '%{text}<extra></extra>'
))
method_to_traces[m].append(len(fig.data) - 1)
median_trace_id = {}
if add_median:
for i, m in enumerate(methods):
med_df = (
df[df['method_name'] == m]
.groupby('skforecast_version', observed=True)['run_time_avg']
.median()
.reset_index()
)
if med_df.empty:
continue
median_color = "#374151"
med_df['x_center'] = med_df['skforecast_version'].map(version_to_num)
fig.add_trace(go.Scatter(
x=med_df['x_center'],
y=med_df['run_time_avg'],
mode='lines+markers',
line=dict(color=median_color, width=2),
marker=dict(size=8, color=median_color),
name='Median (per version)',
visible=(i == 0)
))
median_trace_id[m] = len(fig.data) - 1
mean_trace_id = {}
if add_mean:
for i, m in enumerate(methods):
mean_df = (
df[df['method_name'] == m]
.groupby('skforecast_version', observed=True)['run_time_avg']
.mean()
.reset_index()
)
if mean_df.empty:
continue
mean_color = "#9CA3AF"
mean_df['x_center'] = mean_df['skforecast_version'].map(version_to_num)
fig.add_trace(go.Scatter(
x=mean_df['x_center'],
y=mean_df['run_time_avg'],
mode='lines+markers',
line=dict(color=mean_color, width=2, dash='dash'),
marker=dict(size=8, color=mean_color),
name='Mean (per version)',
visible=(i == 0)
))
mean_trace_id[m] = len(fig.data) - 1
def visible_mask_for(method):
n = len(fig.data)
mask = [False] * n
        # point traces of the selected method (all its versions)
for idx in method_to_traces.get(method, []):
mask[idx] = True
        # its median and mean lines (if present)
if add_median and method in median_trace_id:
mask[median_trace_id[method]] = True
if add_mean and method in mean_trace_id:
mask[mean_trace_id[method]] = True
return mask
buttons_methods = []
for i, m in enumerate(methods):
buttons_methods.append(dict(
label=m,
method="update",
args=[
{"visible": visible_mask_for(m)},
{"title": {"text": f"Execution time — method: `{m}`"}}
]
))
fig.update_layout(
title=dict(
text=f"Execution time — method: `{methods[0]}`"
),
xaxis=dict(
tickmode="array",
tickvals=list(version_to_num.values()),
ticktext=list(version_to_num.keys()),
title="skforecast version",
tickangle=0,
automargin=True,
),
yaxis=dict(
title="Execution time (s)",
automargin=True
),
# template="plotly_white",
margin=dict(l=70, r=20, t=100, b=60),
updatemenus=[
dict(
type="dropdown",
buttons=buttons_methods,
showactive=True,
direction="down",
x=1.00,
y=1.03,
xanchor="right",
yanchor="bottom",
pad={"r": 2, "t": 0},
),
],
legend=dict(title=""),
showlegend=False,
)
fig.show()
# Show in web (other option)
# from IPython.display import HTML
# return HTML(fig.to_html(full_html=False, include_plotlyjs="cdn"))
" f"Regressor: {row['regressor_name']}
" f"Function: {row['function_name']}
" f"Function_hash: {row['function_hash']}
" f"Method: {row['method_name']}
" f"Datetime: {row['datetime']}
" f"Python version: {row['python_version']}
" f"skforecast version: {row['skforecast_version']}
" f"numpy version: {row['numpy_version']}
" f"pandas version: {row['pandas_version']}
" f"sklearn version: {row['sklearn_version']}
" f"lightgbm version: {row['lightgbm_version']}
" f"Platform: {row['platform']}
" f"Processor: {row['processor']}
" f"CPU count: {row['cpu_count']}
" f"Memory (GB): {row['memory_gb']:.2f}
" f"Run time avg: {row['run_time_avg']:.6f} s
" f"Run time median: {row['run_time_median']:.6f} s
" f"Run time p95: {row['run_time_p95']:.6f} s
" f"Run time std: {row['run_time_std']:.6f} s
" f"Nº repeats: {row['n_repeats']}" ), axis=1), hovertemplate = '%{text}