Kaboudan
polars_ts.metrics.kaboudan
Kaboudan Metrics Module.
Provides the Kaboudan class for computing Kaboudan and modified Kaboudan metrics to evaluate
time series forecasting models using backtesting and block shuffling techniques.
Kaboudan
dataclass
A class for computing the Kaboudan and modified Kaboudan metrics.
It uses StatsForecast for backtesting and block shuffling operations to measure model performance under controlled perturbations.
Attributes:
| Name | Type | Description |
|---|---|---|
| sf | StatsForecast | StatsForecast instance for model training and evaluation. |
| backtesting_start | float | Fraction of the data used as the initial training set. |
| n_folds | int | Number of backtesting folds (rolling-origin windows). |
| block_size | int | Size of each block used during block-based shuffling. |
| seed | int | Random seed for reproducible shuffling. Defaults to 42. |
| id_col | str | Name of the column identifying each time series group. Defaults to |
| time_col | str | Name of the column representing the chronological axis. Defaults to |
| value_col | str | Name of the column representing the target variable. Defaults to |
| modified | bool | Whether to use the modified Kaboudan metric, which applies clipping to zero. Defaults to |
| agg | bool | Whether to average the metrics over all the individual time series or not. Defaults to |
block_shuffle_by_id(df)
Randomly shuffles rows in fixed-size blocks within each group identified by id_col.
This method sorts the data by id_col and then by time_col. For each group:
- A zero-based row index (__row_in_group) is assigned using cum_count().
- The method determines the number of blocks (num_blocks) by dividing the number of rows in the first group by self.block_size and forcing at least one block.
- Each row is assigned a __chunk_id based on integer division of __row_in_group by num_blocks.
- The DataFrame is then partitioned by both id_col and __chunk_id, producing blocks.
- These blocks are randomly shuffled, concatenated, and finally re-sorted by id_col and time_col within each group.
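The chunking arithmetic in the steps above can be sketched on a plain Python list standing in for the values of a single group. This is a minimal illustration, not the library's Polars implementation: the function name `block_shuffle` is hypothetical, and the final re-sorting step is left out.

```python
import random


def block_shuffle(values, block_size, seed=42):
    """Shuffle one group's values in contiguous chunks (illustrative sketch)."""
    # Number of blocks: rows divided by block_size, forced to be at least one.
    num_blocks = max(1, len(values) // block_size)

    # Following the description, the chunk id is row_index // num_blocks,
    # so each chunk holds num_blocks consecutive rows.
    chunks = {}
    for row_index, value in enumerate(values):
        chunks.setdefault(row_index // num_blocks, []).append(value)

    # Shuffle whole chunks with a seeded generator, then concatenate them.
    blocks = list(chunks.values())
    random.Random(seed).shuffle(blocks)
    return [value for block in blocks for value in block]


shuffled = block_shuffle(list(range(12)), block_size=3)
```

Rows inside a chunk keep their relative order; only the order of the chunks changes, and the same seed always yields the same arrangement.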
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame containing at least | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A new DataFrame in which each group's rows are rearranged by randomly shuffling entire blocks. The shuffle is reproducible if a seed is set ( |
split_in_blocks_by_id(df)
Split each group's time series into n_folds sequential blocks.
First, the DataFrame is sorted by id_col and time_col. Then, for each group (identified
by id_col), a zero-based row index is assigned in row_index. Finally, block is
computed by scaling row_index by the ratio (n_folds / group_size) for that group,
flooring the result, and shifting by 1 to make blocks range from 1 to n_folds.
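As a small worked example of the block formula, the sketch below applies it to the row indices of one group using plain Python integers. The helper name `assign_blocks` is hypothetical and is not part of the class.

```python
def assign_blocks(group_size, n_folds):
    """Return the 1-based block label for each row index of one group."""
    # floor(row_index * n_folds / group_size) + 1, done in integer arithmetic.
    return [row_index * n_folds // group_size + 1 for row_index in range(group_size)]


print(assign_blocks(10, 3))  # [1, 1, 1, 1, 2, 2, 2, 3, 3, 3]
```

When the group size is not a multiple of n_folds, the blocks differ in length by at most one row.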
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A DataFrame containing columns matching | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A new DataFrame with one additional |
backtest(df)
Perform rolling-origin backtesting on the provided DataFrame using cross-validation.
This method implements a multi-step cross-validation approach by:
- Computing the minimal series length among all groups in the DataFrame.
- Determining the initial training length (history_len) as backtesting_start * min_len, and setting the test length (test_len) as the remainder.
- Dividing the test portion into n_folds sequential segments. Each segment length determines the forecast horizon (h) and step_size.
- Calling StatsForecast's cross_validation() method with h and step_size both equal to the segment length.
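The window arithmetic in these steps can be illustrated without StatsForecast. The sketch below assumes the training length is truncated to an integer and the segment length uses floor division; the source does not state the rounding rule, and the helper name `backtest_sizes` is hypothetical.

```python
def backtest_sizes(min_len, backtesting_start, n_folds):
    """Derive the backtesting window sizes from the shortest series length."""
    # Assumption: truncate to an integer number of training rows.
    history_len = int(min_len * backtesting_start)
    test_len = min_len - history_len
    # Assumption: floor division; any leftover rows are not forecast.
    segment_len = test_len // n_folds
    return {
        "history_len": history_len,
        "test_len": test_len,
        "h": segment_len,
        "step_size": segment_len,
    }


sizes = backtest_sizes(min_len=100, backtesting_start=0.5, n_folds=5)
print(sizes)  # {'history_len': 50, 'test_len': 50, 'h': 10, 'step_size': 10}
```

Because h equals step_size, consecutive forecast windows sit end to end without overlapping.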
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame that must contain at least the columns | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A Polars DataFrame (or Series) of root mean squared error (RMSE) values, averaged across the rolling-origin folds for each model. Columns represent different models. |
kaboudan_metric(df)
Compute the Kaboudan Metric by comparing model errors before and after block-based shuffling.
This method first calculates a baseline error using backtest. Then it applies
block_shuffle_by_id to shuffle each group's rows, re-performs backtest on the shuffled data,
and compares the two sets of errors. The final metric indicates how much performance
degrades due to the block shuffle.
Steps:
- Compute the baseline RMSE (sse_before) for the unshuffled data.
- Shuffle the data in blocks (block_shuffle_by_id).
- Compute the RMSE (sse_after) of the shuffled data.
- Compute the ratio sse_before / sse_after and transform it by (1 - sqrt(ratio)).
If modified is True, the resulting metric is clipped at 0 to avoid negative values.
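The final transform can be checked by hand with a scalar sketch. The function name `kaboudan_score` is hypothetical, and the inputs stand for the error of a single model before and after shuffling rather than the DataFrame columns the method works on.

```python
import math


def kaboudan_score(err_before, err_after, modified=True):
    """Apply 1 - sqrt(err_before / err_after), optionally clipped at zero."""
    metric = 1.0 - math.sqrt(err_before / err_after)
    # The modified variant clips negative values to zero.
    return max(metric, 0.0) if modified else metric


print(kaboudan_score(1.0, 4.0))                  # 0.5
print(kaboudan_score(4.0, 1.0, modified=False))  # -1.0
print(kaboudan_score(4.0, 1.0, modified=True))   # 0.0
```

A value near 1 means the error grows sharply once the temporal order is broken. A value at or below 0 means shuffling did not hurt the model.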
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | A Polars DataFrame with columns for | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | A Polars DataFrame containing columns of Kaboudan Metric values for each model. If |