Backtest
polars_ts.backtesting.backtest
Unified backtesting pipeline for time series forecasting models.
Runs a model through cross-validation folds, collects per-fold metrics, and provides aggregated summaries and optional per-horizon breakdowns.
_evaluate_fold(model, train_df, test_df, h, metrics, actual_col, predicted_col, id_col, time_col)
Fit a model on train_df, predict on test_df, and compute metrics.
_per_horizon_scores(model, train_df, test_df, h, metrics, actual_col, predicted_col, id_col, time_col)
Return one row per horizon step with metric scores.
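A per-horizon breakdown can be pictured as grouping forecast errors by step instead of averaging over the whole fold. The following is a minimal sketch in plain Python, not the library's implementation: rows are dicts rather than a pl.DataFrame, `per_horizon_mae` is a hypothetical helper, and it assumes that step k is the k-th timestamp of each series within the test fold.

```python
from collections import defaultdict
from statistics import mean


def per_horizon_mae(rows, id_col="unique_id", time_col="ds",
                    actual_col="y", predicted_col="y_hat"):
    """Return one dict per horizon step with the mean absolute error at that step."""
    # Group the prediction rows by series.
    by_series = defaultdict(list)
    for r in rows:
        by_series[r[id_col]].append(r)

    # Assumption: within a fold, step 1 is the earliest timestamp of each series.
    errors = defaultdict(list)
    for series_rows in by_series.values():
        ordered = sorted(series_rows, key=lambda r: r[time_col])
        for step, r in enumerate(ordered, start=1):
            errors[step].append(abs(r[actual_col] - r[predicted_col]))

    return [{"step": step, "mae": mean(errs)} for step, errs in sorted(errors.items())]


# Two series, two forecast steps each (illustrative values only).
preds = [
    {"unique_id": "a", "ds": 1, "y": 10.0, "y_hat": 9.0},
    {"unique_id": "a", "ds": 2, "y": 12.0, "y_hat": 9.0},
    {"unique_id": "b", "ds": 1, "y": 5.0, "y_hat": 5.0},
    {"unique_id": "b", "ds": 2, "y": 4.0, "y_hat": 5.0},
]
scores = per_horizon_mae(preds)
```

Here `scores` holds one entry per step, which makes it easy to see whether error grows with the forecast distance.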
backtest(model, cv, metrics, *, h=None, actual_col='y', predicted_col='y_hat', id_col='unique_id', time_col='ds', n_jobs=1, return_predictions=False, per_horizon=False)
Run a model through cross-validation folds and collect metrics.
Parameters
model
Any object with fit(df) and predict(df, h=...) methods
(e.g. ForecastPipeline, GlobalForecaster).
cv
A cross-validation generator yielding (train_df, test_df) tuples.
Use expanding_window_cv, sliding_window_cv, or
rolling_origin_cv.
metrics
Mapping of metric name to callable. Each callable must accept
(df, actual_col=..., predicted_col=...) and return either a float or a
per-series DataFrame.
h
Forecast horizon. If None, inferred from the first test fold.
actual_col
Column with actual values.
predicted_col
Column name for predictions (internal use).
id_col
Column identifying each time series.
time_col
Column with timestamps.
n_jobs
Number of parallel workers for fold evaluation. 1 (default)
runs sequentially.
return_predictions
If True, include a "predictions" key with per-fold
forecasts concatenated.
per_horizon
If True, include a "per_horizon" key with metric
breakdowns by forecast step.
Returns
dict[str, pl.DataFrame]
Always contains:
- "fold_scores": one row per fold with metric columns.
- "summary": mean and std of each metric across folds.
Optionally:
- "per_horizon": metric scores broken down by horizon step.
- "predictions": concatenated predictions with fold column.
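The fold loop and the two mandatory outputs can be sketched in plain Python. This illustrates the control flow only and is not the library's implementation: rows are lists of dicts rather than pl.DataFrame objects, `NaiveLast`, `mae`, `run_folds`, and `two_folds` are hypothetical names, and the use of the sample standard deviation in the summary is an assumption.

```python
from statistics import mean, stdev


class NaiveLast:
    """Hypothetical model: forecasts the last observed value of each series."""

    def fit(self, rows):
        # Remember the most recent value per series (rows are in time order).
        self.last_ = {}
        for r in rows:
            self.last_[r["unique_id"]] = r["y"]
        return self

    def predict(self, rows, h=None):
        # Attach a prediction to every test row; h is accepted but unused here.
        return [{**r, "y_hat": self.last_[r["unique_id"]]} for r in rows]


def mae(rows, actual_col="y", predicted_col="y_hat"):
    """Metric following the documented (df, actual_col=, predicted_col=) contract."""
    return mean(abs(r[actual_col] - r[predicted_col]) for r in rows)


def run_folds(model, cv, metrics, h):
    """Fit and score the model on each (train, test) pair, then aggregate."""
    fold_scores = []
    for fold, (train, test) in enumerate(cv):
        preds = model.fit(train).predict(test, h=h)
        row = {"fold": fold}
        for name, fn in metrics.items():
            row[name] = fn(preds, actual_col="y", predicted_col="y_hat")
        fold_scores.append(row)

    summary = {}
    for name in metrics:
        values = [row[name] for row in fold_scores]
        # Assumption: sample standard deviation (needs at least two folds).
        summary[name] = {"mean": mean(values), "std": stdev(values)}
    return {"fold_scores": fold_scores, "summary": summary}


series = [{"unique_id": "a", "ds": t, "y": float(v)}
          for t, v in enumerate([1, 2, 4, 7, 11, 16])]


def two_folds():
    # Expanding window: the training set grows, the test window stays at 2 rows.
    yield series[:2], series[2:4]
    yield series[:4], series[4:6]


result = run_folds(NaiveLast(), two_folds(), {"mae": mae}, h=2)
```

In this sketch `result["fold_scores"]` has one entry per fold and `result["summary"]` aggregates them, mirroring the "fold_scores" and "summary" keys described above.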
compare_models(models, df, cv, cv_kwargs, metrics, *, h=None, actual_col='y', predicted_col='y_hat', id_col='unique_id', time_col='ds', n_jobs=1)
Compare multiple models using the same cross-validation setup.
Parameters
models
Mapping of model name to model object.
df
Full dataset.
cv
A CV splitter function (not generator) such as
expanding_window_cv. Called once per model with df
and **cv_kwargs.
cv_kwargs
Keyword arguments passed to cv(df, **cv_kwargs).
metrics
Metric name to callable mapping.
h
Forecast horizon (inferred from folds if None).
actual_col
Column with actual values.
predicted_col
Column name for predictions.
id_col
Column identifying each time series.
time_col
Column with timestamps.
n_jobs
Number of parallel workers per model.
Returns
dict[str, pl.DataFrame]
- "comparison": one row per model with mean metric scores.
- "fold_scores": per-fold scores for all models (with
model column).
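The reason cv must be a splitter function rather than a generator is that a generator can be consumed only once, so a second model would see no folds. The sketch below demonstrates this in plain Python; `expanding_folds` and its `n_folds` and `h` arguments are hypothetical stand-ins and do not reflect the signature of expanding_window_cv.

```python
def expanding_folds(rows, n_folds, h):
    """Hypothetical splitter: yields (train, test) pairs with a growing train set."""
    for k in range(n_folds, 0, -1):
        cut = len(rows) - k * h
        yield rows[:cut], rows[cut:cut + h]


rows = list(range(10))

# Reusing one generator object: the second pass finds it exhausted.
shared = expanding_folds(rows, n_folds=2, h=2)
first_pass = list(shared)
second_pass = list(shared)

# Calling the splitter once per model gives every model identical folds.
cv_kwargs = {"n_folds": 2, "h": 2}
per_model = {
    name: list(expanding_folds(rows, **cv_kwargs))
    for name in ("model_a", "model_b")
}
```

Because each model receives a fresh generator built from the same df and cv_kwargs, the per-fold scores are directly comparable across models.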