Kaboudan
polars_ts.metrics.kaboudan
Kaboudan Metrics Module.
Provides the Kaboudan class for computing Kaboudan and modified Kaboudan metrics to evaluate
time series forecasting models using backtesting and block shuffling techniques.
Kaboudan
dataclass
A class for computing the Kaboudan and modified Kaboudan metrics.
It uses StatsForecast for backtesting and block shuffling operations to measure model performance under controlled perturbations.
Attributes:
| Name | Type | Description |
|---|---|---|
| `sf` | `StatsForecast` | StatsForecast instance for model training and evaluation. |
| `backtesting_start` | `float` | Fraction of the data used as the initial training set. |
| `n_folds` | `int` | Number of backtesting folds (rolling-origin windows). |
| `block_size` | `int` | Size of each block used during block-based shuffling. |
| `seed` | `int` | Random seed for reproducible shuffling. Defaults to 42. |
| `id_col` | `str` | Name of the column identifying each time series group. Defaults to |
| `time_col` | `str` | Name of the column representing the chronological axis. Defaults to |
| `value_col` | `str` | Name of the column representing the target variable. Defaults to |
| `modified` | `bool` | Whether to use the modified Kaboudan metric, which applies clipping to zero. Defaults to |
| `agg` | `bool` | Whether to average the metrics over all the individual time series or not. Defaults to |
Source code in polars_ts/metrics/kaboudan.py
block_shuffle_by_id(df)
Randomly shuffles rows in fixed-size blocks within each group identified by id_col.
This method sorts the data by id_col and then by time_col. For each group:
- A zero-based row index (`__row_in_group`) is assigned using `cum_count()`.
- The method determines the number of blocks (`num_blocks`) by dividing the number of rows in the first group by `self.block_size` and forcing at least one block.
- Each row is assigned a `__chunk_id` based on integer division of `__row_in_group` by `num_blocks`.
- The DataFrame is then partitioned by both `id_col` and `__chunk_id`, producing blocks.
- These blocks are randomly shuffled, concatenated, and finally re-sorted by `id_col` and `time_col` within each group.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Polars DataFrame containing at least | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A new DataFrame in which each group's rows are rearranged by randomly shuffling entire blocks. The shuffle is reproducible if a seed is set ( |
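The chunking and shuffling steps described above can be illustrated with a minimal plain-Python sketch. It is not the Polars implementation: `block_shuffle_by_id` here is a hypothetical stand-alone function that takes a dict mapping each series id to its values (assumed to be already sorted by time), and it only models how values are grouped into chunks and reordered. How the real method lays the shuffled rows back onto the time axis is not shown in this page, so that part is left out.

```python
import random


def block_shuffle_by_id(series, block_size, seed=42):
    """Shuffle each series in contiguous chunks (illustrative sketch only).

    `series` maps an id to a list of values sorted by time (an assumption
    of this sketch, standing in for the id/time/value columns).
    """
    rng = random.Random(seed)

    # As described in the text: derive num_blocks from the FIRST group,
    # dividing its row count by block_size and forcing at least one block.
    first_len = len(next(iter(series.values())))
    num_blocks = max(1, first_len // block_size)

    shuffled = {}
    for key, values in series.items():
        # Rows sharing row_index // num_blocks form one chunk; slicing in
        # steps of num_blocks produces exactly those chunks.
        chunks = [values[i:i + num_blocks] for i in range(0, len(values), num_blocks)]
        rng.shuffle(chunks)  # reorder whole chunks, never rows inside a chunk
        shuffled[key] = [v for chunk in chunks for v in chunk]
    return shuffled


data = {"a": list(range(12)), "b": list(range(100, 112))}
result = block_shuffle_by_id(data, block_size=4, seed=42)
```

Each output series contains the same values as its input, the order inside every chunk is preserved, and calling the function again with the same seed returns the same arrangement.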
split_in_blocks_by_id(df)
Split each group's time series into `n_folds` sequential blocks.
First, the DataFrame is sorted by `id_col` and `time_col`. Then, for each group (identified by `id_col`), a zero-based row index is assigned in `row_index`. Finally, `block` is computed by scaling `row_index` by the ratio (`n_folds / group_size`) for that group, flooring the result, and shifting by 1 to make blocks range from 1 to `n_folds`.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A DataFrame containing columns matching | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A new DataFrame with one additional |
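The block assignment described above reduces to one integer formula per row. The following plain-Python sketch shows it for a single group; `assign_blocks` is a hypothetical helper name used for illustration, not part of the library.

```python
def assign_blocks(group_size, n_folds):
    """Return the 1-based block label for each row of one group.

    Implements floor(row_index * n_folds / group_size) + 1 using integer
    arithmetic, so the labels run from 1 to n_folds.
    """
    return [row_index * n_folds // group_size + 1 for row_index in range(group_size)]


print(assign_blocks(10, 5))  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(assign_blocks(7, 3))   # [1, 1, 1, 2, 2, 3, 3]
```

When the group size is not a multiple of `n_folds`, the blocks differ in length by at most one row, as the second call shows.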
backtest(df)
Perform rolling-origin backtesting on the provided DataFrame using cross-validation.
This method implements a multi-step cross-validation approach by:
- Computing the minimal series length among all groups in the DataFrame.
- Determining the initial training length (`history_len`) as `backtesting_start * min_len`, and setting the test length (`test_len`) as the remainder.
- Dividing the test portion into `n_folds` sequential segments. Each segment length determines the forecast horizon (`h`) and `step_size`.
- Calling StatsForecast's `cross_validation()` method with `h` and `step_size` both equal to the segment length.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Polars DataFrame that must contain at least the columns | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A Polars DataFrame (or Series) of root mean squared error (RMSE) values, averaged across the rolling-origin folds for each model. Columns represent different models. |
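The window arithmetic in the steps above can be worked through in a few lines. This is a sketch under stated assumptions: `backtest_windows` is a hypothetical helper, and truncating to whole rows with `int()` and `//` is an assumption, since the rounding used by the actual method is not stated here. It does not call StatsForecast; it only derives the values that would be passed as `h` and `step_size`.

```python
def backtest_windows(series_lengths, backtesting_start, n_folds):
    """Derive the rolling-origin window sizes from the shortest series."""
    min_len = min(series_lengths)

    # Initial training length; truncation to an integer is an assumption.
    history_len = int(backtesting_start * min_len)
    test_len = min_len - history_len

    # Each of the n_folds segments sets both the horizon and the step.
    segment_len = test_len // n_folds
    return {
        "history_len": history_len,
        "test_len": test_len,
        "h": segment_len,
        "step_size": segment_len,
    }


windows = backtest_windows([120, 100, 150], backtesting_start=0.5, n_folds=5)
print(windows)  # {'history_len': 50, 'test_len': 50, 'h': 10, 'step_size': 10}
```

Because `h` equals `step_size`, consecutive test windows do not overlap: each fold forecasts the next segment of the test portion.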
kaboudan_metric(df)
Compute the Kaboudan Metric by comparing model errors before and after block-based shuffling.
This method first calculates a baseline error using backtest. Then it applies
block_shuffle_by_id to shuffle each group's rows, re-performs backtest on the shuffled data,
and compares the two sets of errors. The final metric indicates how much performance
degrades due to the block shuffle.
Steps:
- Compute the baseline RMSE (`sse_before`) for the unshuffled data.
- Shuffle the data in blocks (`block_shuffle_by_id`).
- Compute the RMSE (`sse_after`) of the shuffled data.
- Compute the ratio `sse_before / sse_after` and transform it by `(1 - sqrt(ratio))`.
If modified is True, the resulting metric is clipped at 0 to avoid negative values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Polars DataFrame with columns for | required |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | A Polars DataFrame containing columns of Kaboudan Metric values for each model. If |
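The final transformation, `1 - sqrt(sse_before / sse_after)` with optional clipping at 0, can be checked on scalar values. The sketch below is a hypothetical stand-alone function, not the method itself; it takes the two error values directly and requires `modified` to be passed explicitly.

```python
import math


def kaboudan_score(error_before, error_after, *, modified):
    """Compute 1 - sqrt(error_before / error_after) for one model.

    error_before: error on the original data.
    error_after:  error on the block-shuffled data.
    modified:     if True, clip negative scores to 0.
    """
    score = 1.0 - math.sqrt(error_before / error_after)
    return max(score, 0.0) if modified else score


# Shuffling makes the error four times larger: the score is 0.5.
print(kaboudan_score(1.0, 4.0, modified=False))  # 0.5
# Shuffling makes the error smaller: negative unless clipped.
print(kaboudan_score(4.0, 1.0, modified=False))  # -1.0
print(kaboudan_score(4.0, 1.0, modified=True))   # 0.0
```

A score near 1 means the model's error grows sharply once the temporal structure is destroyed; a score at or below 0 means shuffling did not hurt the model.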