Backtest

`polars_ts.backtesting.backtest`

Unified backtesting pipeline for time series forecasting models.

Runs a model through cross-validation folds, collects per-fold metrics, and provides aggregated summaries and optional per-horizon breakdowns.

`_evaluate_fold(model, train_df, test_df, h, metrics, actual_col, predicted_col, id_col, time_col)`

Fit a model on ``train_df``, predict on ``test_df``, and compute the metrics for that fold.

`_per_horizon_scores(model, train_df, test_df, h, metrics, actual_col, predicted_col, id_col, time_col)`

Return one row per horizon step with metric scores.
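To illustrate what a per-horizon breakdown means, the sketch below groups absolute errors by forecast step and averages them. It is not the library's implementation: plain Python lists of dicts stand in for polars DataFrames, and the `step` field and the `per_horizon_mae` helper are hypothetical names chosen for this example.

```python
from collections import defaultdict


def per_horizon_mae(rows, actual_col="y", predicted_col="y_hat", step_col="step"):
    """Average absolute error per forecast step (1 = one step ahead)."""
    errors_by_step = defaultdict(list)
    for row in rows:
        error = abs(row[actual_col] - row[predicted_col])
        errors_by_step[row[step_col]].append(error)
    # One output row per horizon step, ordered by step.
    return [
        {step_col: step, "mae": sum(errs) / len(errs)}
        for step, errs in sorted(errors_by_step.items())
    ]


# Two series, each forecast two steps ahead (illustrative values only).
forecasts = [
    {"unique_id": "a", "step": 1, "y": 10.0, "y_hat": 9.0},
    {"unique_id": "a", "step": 2, "y": 12.0, "y_hat": 9.0},
    {"unique_id": "b", "step": 1, "y": 5.0, "y_hat": 6.0},
    {"unique_id": "b", "step": 2, "y": 7.0, "y_hat": 6.0},
]
horizon_scores = per_horizon_mae(forecasts)
```

Here the error grows with the step, which is the pattern a per-horizon breakdown is meant to expose and a single fold-level average would hide.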

`backtest(model, cv, metrics, *, h=None, actual_col='y', predicted_col='y_hat', id_col='unique_id', time_col='ds', n_jobs=1, return_predictions=False, per_horizon=False)`

Run a model through cross-validation folds and collect metrics.

Parameters

- ``model``: Any object with ``fit(df)`` and ``predict(df, h=...)`` methods (e.g. ``ForecastPipeline``, ``GlobalForecaster``).
- ``cv``: A cross-validation generator yielding ``(train_df, test_df)`` tuples. Use ``expanding_window_cv``, ``sliding_window_cv``, or ``rolling_origin_cv``.
- ``metrics``: Mapping of metric name to callable. Each callable must accept ``(df, actual_col=, predicted_col=)`` and return a float or a per-series DataFrame.
- ``h``: Forecast horizon. If ``None``, inferred from the first test fold.
- ``actual_col``: Column with actual values.
- ``predicted_col``: Column name for predictions (internal use).
- ``id_col``: Column identifying each time series.
- ``time_col``: Column with timestamps.
- ``n_jobs``: Number of parallel workers for fold evaluation. ``1`` (default) runs sequentially.
- ``return_predictions``: If ``True``, include a ``"predictions"`` key with per-fold forecasts concatenated.
- ``per_horizon``: If ``True``, include a ``"per_horizon"`` key with metric breakdowns by forecast step.
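The duck-typed contracts for ``model`` and ``metrics`` can be sketched as follows. This is a minimal illustration under stated assumptions, not code from `polars_ts`: lists of dicts stand in for polars DataFrames, `LastValueModel` is a hypothetical toy model, and returning `self` from `fit` is a convention assumed here rather than a documented requirement.

```python
class LastValueModel:
    """Hypothetical toy model exposing the fit(df) / predict(df, h=...) interface."""

    def __init__(self):
        self.last_value = None

    def fit(self, df):
        # Remember the last observed value of the training data.
        self.last_value = df[-1]["y"]
        return self  # assumption: returning self allows chaining

    def predict(self, df, h):
        # Repeat the last observed value for each of the h forecast steps.
        return [{"step": i, "y_hat": self.last_value} for i in range(1, h + 1)]


def mae(df, actual_col="y", predicted_col="y_hat"):
    """Metric callable with the (df, actual_col=, predicted_col=) signature."""
    errors = [abs(row[actual_col] - row[predicted_col]) for row in df]
    return sum(errors) / len(errors)


train = [{"y": 1.0}, {"y": 2.0}, {"y": 3.0}]
test = [{"y": 4.0}, {"y": 5.0}]

model = LastValueModel().fit(train)
preds = model.predict(test, h=len(test))  # h taken from the test fold length

# Align actuals with predictions row by row, then apply every metric.
scored = [{**actual, **pred} for actual, pred in zip(test, preds)]
metrics = {"mae": mae}
scores = {
    name: fn(scored, actual_col="y", predicted_col="y_hat")
    for name, fn in metrics.items()
}
```

Because only the method names and keyword arguments matter, any object or function with these shapes should satisfy the interface described above.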

Returns

``dict[str, pl.DataFrame]``

Always contains:

- ``"fold_scores"`` — one row per fold with metric columns.
- ``"summary"`` — mean and std of each metric across folds.

Optionally:

- ``"per_horizon"`` — metric scores broken down by horizon step.
- ``"predictions"`` — concatenated predictions with fold column.
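As a rough picture of how ``"fold_scores"`` relates to ``"summary"``, the sketch below aggregates per-fold metric rows into a mean and standard deviation per metric. The row layout, the `summarize` helper, and the choice of the sample standard deviation are assumptions made for illustration; the source does not specify the summary's exact columns or which standard deviation is used.

```python
import statistics


def summarize(fold_scores, metric_names):
    """Aggregate per-fold scores into one mean/std row per metric."""
    summary = []
    for name in metric_names:
        values = [row[name] for row in fold_scores]
        summary.append(
            {
                "metric": name,
                "mean": statistics.mean(values),
                # Assumption: sample standard deviation, undefined for one fold.
                "std": statistics.stdev(values) if len(values) > 1 else None,
            }
        )
    return summary


# One row per fold with metric columns (illustrative values only).
fold_scores = [
    {"fold": 0, "mae": 1.0, "rmse": 2.0},
    {"fold": 1, "mae": 2.0, "rmse": 2.0},
    {"fold": 2, "mae": 3.0, "rmse": 2.0},
]
summary = summarize(fold_scores, ["mae", "rmse"])
```

A large standard deviation relative to the mean suggests the model's accuracy depends heavily on which window it was evaluated on.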

`compare_models(models, df, cv, cv_kwargs, metrics, *, h=None, actual_col='y', predicted_col='y_hat', id_col='unique_id', time_col='ds', n_jobs=1)`

Compare multiple models using the same cross-validation setup.

Parameters

- ``models``: Mapping of model name to model object.
- ``df``: Full dataset.
- ``cv``: A CV splitter function (not a generator) such as ``expanding_window_cv``. Called once per model with ``df`` and ``**cv_kwargs``.
- ``cv_kwargs``: Keyword arguments passed to ``cv(df, **cv_kwargs)``.
- ``metrics``: Mapping of metric name to callable.
- ``h``: Forecast horizon (inferred from folds if ``None``).
- ``actual_col``: Column with actual values.
- ``predicted_col``: Column name for predictions.
- ``id_col``: Column identifying each time series.
- ``time_col``: Column with timestamps.
- ``n_jobs``: Number of parallel workers per model.

Returns

``dict[str, pl.DataFrame]``

- ``"comparison"``: one row per model with mean metric scores.
- ``"fold_scores"``: per-fold scores for all models (with model column).
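The reason ``cv`` must be a splitter function rather than an already-created generator can be shown in plain Python: a generator is exhausted after one pass, so a second model would receive no folds. The `toy_expanding_cv` splitter and its `n_folds` and `h` keyword arguments are hypothetical stand-ins; the actual signature of `expanding_window_cv` is not stated here.

```python
def toy_expanding_cv(rows, n_folds, h):
    """Hypothetical splitter: yields (train, test) with a growing train window."""
    for k in range(n_folds):
        cutoff = len(rows) - (n_folds - k) * h
        yield rows[:cutoff], rows[cutoff:cutoff + h]


data = list(range(10))
cv_kwargs = {"n_folds": 3, "h": 2}

# Sharing one generator across models: the first pass consumes every fold.
shared = toy_expanding_cv(data, **cv_kwargs)
folds_for_first = list(shared)
folds_for_second = list(shared)  # empty, the generator is already exhausted

# Calling the splitter once per model gives each model identical, fresh folds.
folds_per_model = {
    name: list(toy_expanding_cv(data, **cv_kwargs))
    for name in ["naive", "seasonal"]
}
```

Passing the function together with ``cv_kwargs`` therefore keeps the comparison fair: every model is scored on exactly the same train/test windows.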