Kaboudan

`polars_ts.metrics.kaboudan`

Kaboudan Metrics Module.

Provides the Kaboudan class for computing Kaboudan and modified Kaboudan metrics to evaluate time series forecasting models using backtesting and block shuffling techniques.

`Kaboudan` `dataclass`

A class for computing the Kaboudan and modified Kaboudan metrics.

It backtests models with StatsForecast and applies block shuffling to the data, measuring how model performance changes under that controlled perturbation.

Attributes:

Name	Type	Description
`sf`	`StatsForecast`	StatsForecast instance for model training and evaluation.
`backtesting_start`	`float`	Fraction of the data used as the initial training set.
`n_folds`	`int`	Number of backtesting folds (rolling-origin windows).
`block_size`	`int`	Size of each block used during block-based shuffling.
`seed`	`int`	Random seed for reproducible shuffling. Defaults to 42.
`id_col`	`str`	Name of the column identifying each time series group. Defaults to `unique_id`.
`time_col`	`str`	Name of the column representing the chronological axis. Defaults to `ds`.
`value_col`	`str`	Name of the column representing the target variable. Defaults to `y`.
`modified`	`bool`	Whether to use the modified Kaboudan metric, which clips negative values to zero. Defaults to `True`.
`agg`	`bool`	Whether to average the metric across all individual time series. Defaults to `True`.

`block_shuffle_by_id(df)`

Randomly shuffles rows in fixed-size blocks within each group identified by `id_col`.

This method sorts the data by `id_col` and then by `time_col`. For each group:

A zero-based row index (`__row_in_group`) is assigned using `cum_count()`.
The method determines the number of blocks (`num_blocks`) by dividing the number of rows in the first group by `self.block_size` and forcing at least one block.
Each row is assigned a `__chunk_id` based on integer division of `__row_in_group` by `num_blocks`.
The DataFrame is then partitioned by both `id_col` and `__chunk_id`, producing blocks.
These blocks are randomly shuffled, concatenated, and finally re-sorted by `id_col` and `time_col` within each group.
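The chunking and shuffling steps can be illustrated on a single series with the standard library alone. This is a minimal sketch, not the library's Polars implementation: the function name `block_shuffle` is hypothetical, it follows the arithmetic exactly as described above (chunk id = row index // `num_blocks`), and it omits the final re-sorting step.

```python
import random


def block_shuffle(values, block_size, seed=42):
    """Shuffle one series in contiguous chunks (illustrative helper, not library code)."""
    n = len(values)
    # Number of blocks, forced to be at least one.
    num_blocks = max(1, n // block_size)
    # Group rows by chunk id = row index // num_blocks, as the description states.
    chunks = {}
    for row, value in enumerate(values):
        chunks.setdefault(row // num_blocks, []).append(value)
    blocks = list(chunks.values())
    # Seeded shuffle of whole blocks; the order inside each block is preserved.
    random.Random(seed).shuffle(blocks)
    return [value for block in blocks for value in block]


series = list(range(12))
shuffled = block_shuffle(series, block_size=3, seed=42)
```

With 12 rows and `block_size=3`, `num_blocks` is 4, so the rows fall into three chunks of four consecutive values each. Reusing the same seed reproduces the same ordering.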

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	A Polars DataFrame containing at least `id_col`, `time_col`, and `value_col`.	required

Returns:

Type	Description
`DataFrame`	A new DataFrame in which each group's rows are rearranged by randomly shuffling whole blocks. The shuffle is reproducible when a seed is set (`self.seed`).

`split_in_blocks_by_id(df)`

Split each group's time series into n_folds sequential blocks.

First, the DataFrame is sorted by `id_col` and `time_col`. Then, for each group (identified by `id_col`), a zero-based row index is assigned in `row_index`. Finally, `block` is computed by scaling `row_index` by the ratio `n_folds / group_size` for that group, flooring the result, and adding 1 so that block numbers range from 1 to `n_folds`.
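The block assignment for one group can be sketched in plain Python. The helper name `assign_blocks` is hypothetical, and integer floor division stands in for the scale-then-floor step described above.

```python
def assign_blocks(group_size, n_folds):
    """Return the 1-based block number for each row of a single group."""
    # floor(row_index * n_folds / group_size) + 1, done in integer arithmetic.
    return [row_index * n_folds // group_size + 1 for row_index in range(group_size)]


blocks = assign_blocks(group_size=10, n_folds=4)
print(blocks)  # [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]
```

When `group_size` is not a multiple of `n_folds`, the blocks differ in length by at most one row, as the output shows.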

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	A DataFrame containing columns matching `id_col`, `time_col`, and `value_col`.	required

Returns:

Type	Description
`DataFrame`	A new DataFrame with one additional `block` column.

`backtest(df)`

Perform rolling-origin backtesting on the provided DataFrame using cross-validation.

This method implements a multi-step cross-validation approach by:

Computing the minimum series length (`min_len`) across all groups in the DataFrame.
Determining the initial training length (`history_len`) as `backtesting_start * min_len`, and setting the test length (`test_len`) to the remainder.
Dividing the test portion into `n_folds` sequential segments. The segment length sets both the forecast horizon (`h`) and the `step_size`.
Calling StatsForecast's `cross_validation()` method with `h` and `step_size` both equal to the segment length.
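The window arithmetic in these steps can be worked through without StatsForecast. The sketch below is illustrative only: the helper name `backtest_windows` is hypothetical, and the rounding choices (truncating `history_len`, floor-dividing the test portion) are assumptions, since the description does not state how fractional lengths are handled.

```python
def backtest_windows(series_lengths, backtesting_start, n_folds):
    """Derive the cross-validation sizes from the shortest series (illustrative)."""
    min_len = min(series_lengths)
    # Assumption: the training length is truncated to an integer.
    history_len = int(backtesting_start * min_len)
    test_len = min_len - history_len
    # Assumption: the test portion is split with floor division.
    segment_len = test_len // n_folds
    return {
        "history_len": history_len,
        "test_len": test_len,
        "h": segment_len,          # forecast horizon per fold
        "step_size": segment_len,  # distance between consecutive fold origins
    }


params = backtest_windows([120, 100, 150], backtesting_start=0.5, n_folds=5)
print(params)  # {'history_len': 50, 'test_len': 50, 'h': 10, 'step_size': 10}
```

Because `h` equals `step_size`, consecutive folds cover adjacent, non-overlapping test segments.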

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	A Polars DataFrame that must contain at least the columns `id_col`, `time_col`, and `value_col`.	required

Returns:

Type	Description
`DataFrame`	A Polars DataFrame (or Series) of root mean squared error (RMSE) values, averaged across the rolling-origin folds for each model. Columns represent different models.

`kaboudan_metric(df)`

Compute the Kaboudan Metric by comparing model errors before and after block-based shuffling.

This method first calculates a baseline error using backtest. Then it applies block_shuffle_by_id to shuffle each group's rows, re-performs backtest on the shuffled data, and compares the two sets of errors. The final metric indicates how much performance degrades due to the block shuffle.

Steps:

Compute the baseline RMSE (`sse_before`) on the unshuffled data.
Shuffle the data in blocks (`block_shuffle_by_id`).
Compute the RMSE (`sse_after`) on the shuffled data.
Compute the ratio `sse_before / sse_after` and transform it by `1 - sqrt(ratio)`.

If modified is True, the resulting metric is clipped at 0 to avoid negative values.
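For a single model, the last step and the optional clipping reduce to a few lines. This is a sketch of the formula as stated above; the function name `kaboudan_score` and its argument names are hypothetical, and the two error values are assumed to be positive scalars.

```python
import math


def kaboudan_score(err_before, err_after, modified=True):
    """Apply 1 - sqrt(err_before / err_after), optionally clipped at zero."""
    ratio = err_before / err_after
    score = 1.0 - math.sqrt(ratio)
    # The modified variant clips negative scores to zero.
    return max(score, 0.0) if modified else score


print(kaboudan_score(1.0, 4.0))                  # 0.5
print(kaboudan_score(4.0, 1.0, modified=False))  # -1.0
print(kaboudan_score(4.0, 1.0, modified=True))   # 0.0
```

A score near 1 means the error grew sharply after shuffling, so the model was exploiting temporal structure. A score at or below 0 means shuffling did not hurt, which the modified variant reports as 0.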

Parameters:

Name	Type	Description	Default
`df`	`DataFrame`	A Polars DataFrame with columns for `id_col`, `time_col`, and `value_col`.	required

Returns:

Type	Description
`DataFrame`	A Polars DataFrame containing columns of Kaboudan Metric values for each model. If `modified` is True, negative values are clipped to zero.
