
Kaboudan

polars_ts.metrics.kaboudan

Kaboudan Metrics Module.

Provides the Kaboudan class, which computes the Kaboudan and modified Kaboudan metrics for evaluating time series forecasting models through backtesting and block shuffling.

Kaboudan dataclass

A class for computing the Kaboudan and modified Kaboudan metrics.

It uses StatsForecast for backtesting and shuffles the data in blocks, measuring how model performance changes under these controlled perturbations.

Attributes:

| Name | Type | Description |
|---|---|---|
| sf | StatsForecast | StatsForecast instance used to train and evaluate the models. |
| backtesting_start | float | Fraction of the data used as the initial training set. |
| n_folds | int | Number of backtesting folds (rolling-origin windows). |
| block_size | int | Size of each block used during block-based shuffling. |
| seed | int | Random seed for reproducible shuffling. Defaults to 42. |
| id_col | str | Name of the column identifying each time series group. Defaults to unique_id. |
| time_col | str | Name of the column representing the time axis. Defaults to ds. |
| value_col | str | Name of the column holding the target variable. Defaults to y. |
| modified | bool | Whether to use the modified Kaboudan metric, which clips negative values to zero. Defaults to True. |
| agg | bool | Whether to average the metrics across all individual time series. Defaults to True. |

block_shuffle_by_id(df)

Randomly shuffles rows in fixed-size blocks within each group identified by id_col.

This method sorts the data by id_col and then by time_col, and then proceeds as follows:

  1. A zero-based row index (__row_in_group) is assigned within each group using cum_count().
  2. The number of blocks (num_blocks) is computed by dividing the row count of the first group by self.block_size, with a minimum of one block.
  3. Each row is assigned a __chunk_id by integer-dividing __row_in_group by num_blocks.
  4. The DataFrame is partitioned by id_col and __chunk_id, producing the blocks.
  5. The blocks are randomly shuffled, concatenated, and finally re-sorted by id_col and time_col within each group.
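The steps above can be sketched in plain Python. This is a hypothetical illustration, not the library's code: it replaces Polars with a dict that maps each series id (assumed already sorted by time) to its list of values, uses random.Random(seed) instead of whatever RNG the library uses, and tracks only the values, so it leaves out the final re-sort on time_col. The function name block_shuffle_sketch is made up for this example.

```python
import random


def block_shuffle_sketch(groups, block_size, seed=42):
    """Shuffle each group's values in contiguous chunks (hypothetical sketch).

    groups maps a series id to its values, already sorted by time.
    """
    ids = sorted(groups)  # mirror sorting by id_col
    # Step 2: chunk count derived from the first group's length, never below one
    num_blocks = max(1, len(groups[ids[0]]) // block_size)
    blocks = []
    for gid in ids:
        chunks = {}
        # Step 1: enumerate gives the zero-based row index within the group
        for row_in_group, value in enumerate(groups[gid]):
            # Step 3: chunk id = row index // num_blocks
            chunks.setdefault(row_in_group // num_blocks, []).append(value)
        # Step 4: one block per (id, chunk id) pair
        blocks.extend((gid, chunk) for chunk in chunks.values())
    # Step 5: shuffle all blocks reproducibly, then reassemble each group
    random.Random(seed).shuffle(blocks)
    shuffled = {gid: [] for gid in ids}
    for gid, chunk in blocks:
        shuffled[gid].extend(chunk)
    return shuffled


data = {"a": list(range(12)), "b": list(range(100, 112))}
print(block_shuffle_sketch(data, block_size=4))
```

With 12 rows and block_size=4, num_blocks is 3, so integer-dividing the row index by num_blocks yields four chunks of three rows each. Taken literally, the steps make the chunk length equal to num_blocks rather than block_size.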

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame containing at least id_col, time_col, and value_col. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A new DataFrame in which each group's rows are rearranged by randomly shuffling whole blocks. The shuffle is reproducible when a seed is set (self.seed). |

split_in_blocks_by_id(df)

Split each group's time series into n_folds sequential blocks.

First, the DataFrame is sorted by id_col and time_col. Then, within each group (identified by id_col), a zero-based row index is stored in row_index. Finally, the block column is computed as floor(row_index * n_folds / group_size) + 1, so that block labels range from 1 to n_folds.
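For a single sorted group, the block formula can be checked with a short sketch. The helper name split_in_blocks_sketch is hypothetical, and it uses integer floor division (row_index * n_folds // group_size) rather than multiplying by a float ratio, which gives the exact floor without rounding concerns.

```python
def split_in_blocks_sketch(group_size, n_folds):
    """Block label (1..n_folds) for each row of one sorted group (hypothetical helper)."""
    # floor(row_index * n_folds / group_size) + 1, computed with exact integer arithmetic
    return [row_index * n_folds // group_size + 1 for row_index in range(group_size)]


print(split_in_blocks_sketch(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

Because row_index never reaches group_size, the largest label is at most n_folds, and the first row always gets label 1.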

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A DataFrame containing columns matching id_col, time_col, and value_col. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A new DataFrame with one additional column, block. |

backtest(df)

Perform rolling-origin backtesting on the provided DataFrame using cross-validation.

This method implements a multi-step cross-validation approach by:

  1. Computing the minimal series length among all groups in the DataFrame.
  2. Determining the initial training length (history_len) as backtesting_start * min_len, and setting the test length (test_len) as the remainder.
  3. Dividing the test portion into n_folds sequential segments. The segment length is used as both the forecast horizon (h) and the step_size.
  4. Calling StatsForecast's cross_validation() method with h and step_size both equal to the segment length.
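The arithmetic in these steps can be sketched as a small planning function. This is an illustration under stated assumptions, not the method itself: it assumes history_len is truncated to an integer and that the segment length is test_len // n_folds, and it only computes row offsets (relative to the shortest series) instead of calling StatsForecast. The name rolling_origin_plan is hypothetical.

```python
def rolling_origin_plan(series_lengths, backtesting_start, n_folds):
    """Return (history_len, h, windows) for a rolling-origin backtest (hypothetical sketch).

    Each window is a (train_end, test_end) pair of row offsets: the model is
    trained on rows [0, train_end) and forecasts rows [train_end, test_end).
    """
    min_len = min(series_lengths)                   # step 1: shortest series
    history_len = int(backtesting_start * min_len)  # step 2 (assumed: truncated to int)
    test_len = min_len - history_len                # remainder held out for testing
    h = test_len // n_folds                         # step 3 (assumed: floor division)
    # Step 4: horizon and step size are both h, so consecutive windows tile the test span
    windows = [(history_len + k * h, history_len + (k + 1) * h) for k in range(n_folds)]
    return history_len, h, windows


print(rolling_origin_plan([100, 120], backtesting_start=0.5, n_folds=5))
```

Setting step_size equal to h means each fold's origin advances by exactly one horizon, so the test windows are contiguous and do not overlap.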

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame that must contain at least the columns id_col, time_col, and value_col. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A Polars DataFrame (or Series) of root mean squared error (RMSE) values, averaged across the rolling-origin folds for each model. Columns represent different models. |
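The returned value can be illustrated for a single model with a short sketch: compute the RMSE of each fold, then take the mean over folds. This is an assumption-labeled illustration; the page does not say whether errors are first pooled per series or per fold, and this sketch pools all points of a fold together. The function names and the folds structure are hypothetical; applying mean_rmse_over_folds once per model column gives one value per model.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error over one forecast window."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_rmse_over_folds(folds):
    """Average the per-fold RMSE; folds is a list of (actual, predicted) pairs (hypothetical)."""
    return sum(rmse(actual, predicted) for actual, predicted in folds) / len(folds)


folds = [([1.0, 2.0], [1.0, 2.0]), ([0.0, 0.0], [3.0, 3.0])]
print(mean_rmse_over_folds(folds))  # 1.5
```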

kaboudan_metric(df)

Compute the Kaboudan Metric by comparing model errors before and after block-based shuffling.

This method first calculates a baseline error using backtest. Then it applies block_shuffle_by_id to shuffle each group's rows, re-performs backtest on the shuffled data, and compares the two sets of errors. The final metric indicates how much performance degrades due to the block shuffle.

Steps:

  1. Compute the baseline RMSE on the unshuffled data (stored as sse_before).
  2. Shuffle the data in blocks with block_shuffle_by_id.
  3. Compute the RMSE on the shuffled data (stored as sse_after).
  4. Compute the ratio sse_before / sse_after and transform it as 1 - sqrt(ratio).

If modified is True, the resulting metric is clipped at 0 to avoid negative values.
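The final transformation in steps 4 and the clipping rule can be written as a small function. This is a minimal sketch that starts from the two error values already computed; the names kaboudan_score and kaboudan_per_model, as well as the model names in the example, are hypothetical, and the per-model dict only mirrors the one-column-per-model layout of the returned DataFrame.

```python
import math


def kaboudan_score(err_before, err_after, modified=True):
    """1 - sqrt(err_before / err_after), optionally clipped at zero (hypothetical helper)."""
    score = 1.0 - math.sqrt(err_before / err_after)
    # Modified variant: a negative score (error dropped after shuffling) is clipped to 0
    return max(0.0, score) if modified else score


def kaboudan_per_model(before, after, modified=True):
    """Apply kaboudan_score per model; before/after map model name -> error."""
    return {model: kaboudan_score(before[model], after[model], modified) for model in before}


print(kaboudan_per_model({"ModelA": 1.0, "ModelB": 4.0}, {"ModelA": 4.0, "ModelB": 1.0}))
# {'ModelA': 0.5, 'ModelB': 0.0}
```

A score near 1 means block shuffling sharply increased the error, so the model was exploiting temporal structure; a score near 0 means shuffling barely changed the error.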

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame with columns for id_col, time_col, and value_col. | required |

Returns:

| Type | Description |
|---|---|
| DataFrame | A Polars DataFrame with one column of Kaboudan metric values per model. If modified is True, negative values are clipped to zero. |