Kasba

polars_ts.clustering.kasba

KASBA: K-means with Accelerated Stochastic Barycenter Averaging.

Uses the MSM (Move-Split-Merge) elastic distance. Because MSM is a proper metric, it supports triangle-inequality pruning, which gives a 10-30x speedup over the aeon implementation.
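To make the pruning idea concrete, here is a minimal pure-Python sketch of the textbook univariate MSM recurrence, together with a nearest-centroid assignment that skips distance computations using the triangle inequality. It is an illustration only, not the library's internal code: the names `msm_distance` and `assign_with_pruning` are hypothetical, series are assumed to be non-empty, and the pruning rule shown (skip centroid j when d(c_best, c_j) >= 2 * d(s, c_best)) is one common way to exploit a metric, which may differ from what KASBA does internally.

```python
from typing import Sequence


def _msm_cost(new: float, a: float, b: float, c: float) -> float:
    """Cost of a split or merge that places `new` next to neighbours a and b."""
    if a <= new <= b or b <= new <= a:
        return c
    return c + min(abs(new - a), abs(new - b))


def msm_distance(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Univariate MSM distance via the standard O(len(x) * len(y)) dynamic program."""
    n, m = len(x), len(y)
    d = [[0.0] * m for _ in range(n)]
    d[0][0] = abs(x[0] - y[0])
    for i in range(1, n):
        d[i][0] = d[i - 1][0] + _msm_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, m):
        d[0][j] = d[0][j - 1] + _msm_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, n):
        for j in range(1, m):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i] - y[j]),                # move
                d[i - 1][j] + _msm_cost(x[i], x[i - 1], y[j], c),  # split/merge in x
                d[i][j - 1] + _msm_cost(y[j], x[i], y[j - 1], c),  # split/merge in y
            )
    return d[n - 1][m - 1]


def assign_with_pruning(series, centroids, c=1.0):
    """Assign each series to its nearest centroid, skipping provably worse centroids."""
    k = len(centroids)
    # Centroid-to-centroid distances are computed once per assignment pass.
    cc = [[msm_distance(centroids[a], centroids[b], c) for b in range(k)] for a in range(k)]
    labels, skipped = [], 0
    for s in series:
        best, best_d = 0, msm_distance(s, centroids[0], c)
        for j in range(1, k):
            # Triangle inequality: d(s, c_j) >= d(c_best, c_j) - d(s, c_best).
            # If d(c_best, c_j) >= 2 * d(s, c_best), centroid j cannot be closer.
            if cc[best][j] >= 2.0 * best_d:
                skipped += 1
                continue
            dj = msm_distance(s, centroids[j], c)
            if dj < best_d:
                best, best_d = j, dj
        labels.append(best)
    return labels, skipped
```

The pruning is only valid because MSM satisfies the triangle inequality; the same shortcut applied to a non-metric distance such as DTW could return a wrong nearest centroid.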

KASBAClusterer

KASBA clustering for univariate and multivariate time series.

Parameters

n_clusters: Number of clusters. Default 8.
c: MSM cost coefficient. Default 1.0.
independent: If True, compute MSM per-channel independently (sum). If False, use dependent cross-channel MSM. Default True.
ba_subset_size: Proportion of cluster members used in barycenter SGD. Default 0.5.
initial_step_size: SGD initial learning rate. Default 0.05.
decay_rate: Exponential decay rate for step size. Default 0.1.
max_iter: Maximum k-means iterations. Default 10.
seed: Random seed for reproducibility. Default 42.
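As a rough guide to how the barycenter parameters interact, the sketch below shows one plausible reading of them. Both formulas are assumptions made for illustration: the page only says that the step size decays exponentially and that `ba_subset_size` is a proportion, so the exact schedule and rounding used by the library may differ. The helper names `step_size` and `subset_count` are hypothetical.

```python
import math


def step_size(iteration: int, initial_step_size: float = 0.05, decay_rate: float = 0.1) -> float:
    """Assumed schedule: initial_step_size * exp(-decay_rate * iteration)."""
    return initial_step_size * math.exp(-decay_rate * iteration)


def subset_count(cluster_size: int, ba_subset_size: float = 0.5) -> int:
    """Assumed number of members sampled per barycenter pass (at least one)."""
    return max(1, int(cluster_size * ba_subset_size))
```

Under these assumptions, a larger `decay_rate` shrinks the updates faster, and a smaller `ba_subset_size` trades barycenter accuracy for speed.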

fit(df, id_col='unique_id', target_col='y', channel_col=None)

Fit KASBA clustering.

Parameters

df: DataFrame with time series data.
id_col: Column identifying each time series.
target_col: Column with the time series values.
channel_col: Column identifying channels for multivariate series. If None, assumes univariate data.

predict(df, id_col='unique_id', target_col='y', channel_col=None)

Predict cluster assignments for new data.

Parameters

df: DataFrame with time series data.

Returns

pl.DataFrame: DataFrame with columns [id_col, "cluster"].
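A minimal usage sketch for `fit` and `predict` follows, using only the parameters documented above. It assumes a long-format frame with one row per observation and rows already in time order within each series; the page does not mention a time column, so none is used here. The helpers `make_long_columns` and `fit_and_predict` are hypothetical names for this example.

```python
def make_long_columns(series_by_id):
    """Flatten {id: [values, ...]} into long-format columns, one row per observation."""
    ids, values = [], []
    for uid, ys in series_by_id.items():
        ids.extend([uid] * len(ys))
        values.extend(ys)
    return {"unique_id": ids, "y": values}


def fit_and_predict(columns, n_clusters=2):
    # Imports are deferred so the helper above works without the libraries installed.
    import polars as pl
    from polars_ts.clustering.kasba import KASBAClusterer

    df = pl.DataFrame(columns)
    model = KASBAClusterer(n_clusters=n_clusters, seed=42)
    model.fit(df, id_col="unique_id", target_col="y")
    # Documented return value: a pl.DataFrame with columns [id_col, "cluster"].
    return model.predict(df, id_col="unique_id", target_col="y")
```

With `polars` and `polars_ts` installed, `fit_and_predict(make_long_columns({"a": [0.0, 0.1, 0.0], "b": [5.0, 5.2, 5.1]}))` should return one row per series id.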

_to_3d_array(df, id_col, target_col, channel_col)

Convert DataFrame to 3D numpy array (n_cases, n_channels, n_timepoints).
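The target layout can be illustrated without any dependencies. The sketch below groups long-format rows into nested lists indexed as [case][channel][timepoint]; it is not the library's helper, which returns a NumPy array. It assumes rows arrive in time order and that every series and channel has the same length. The function name `to_nested_3d` is hypothetical.

```python
def to_nested_3d(rows, id_key="unique_id", target_key="y", channel_key=None):
    """Group long-format rows into nested lists shaped [case][channel][timepoint]."""
    cases = {}  # dicts keep insertion order, so first-seen ids and channels come first
    for row in rows:
        # Univariate data is treated as a single channel, labelled 0.
        channel = row[channel_key] if channel_key is not None else 0
        cases.setdefault(row[id_key], {}).setdefault(channel, []).append(row[target_key])
    return [list(channels.values()) for channels in cases.values()]
```

Two univariate series of length 3 therefore map to a structure of shape (2, 1, 3), matching the (n_cases, n_channels, n_timepoints) convention.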

kasba(df, k, *, c=1.0, independent=True, max_iter=10, seed=42, id_col='unique_id', target_col='y', channel_col=None, **kwargs)

KASBA clustering convenience function.

Parameters

df: DataFrame with time series data.
k: Number of clusters.
c: MSM cost coefficient. Default 1.0.
independent: MSM mode. Default True.
max_iter: Maximum iterations. Default 10.
seed: Random seed. Default 42.
id_col: Series identifier column. Default "unique_id".
target_col: Values column. Default "y".
channel_col: Channel column for multivariate data. Default None.

Returns

pl.DataFrame: DataFrame with columns [id_col, "cluster"].
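A short sketch of calling the convenience function and summarising its output is shown below. The call uses only the keyword arguments listed in the signature above. The helper names `cluster_sizes` and `run_kasba` are hypothetical, and `df` is assumed to be a long-format `pl.DataFrame` with the default `unique_id` and `y` columns.

```python
from collections import Counter


def cluster_sizes(labels):
    """Count how many series landed in each cluster, ordered by cluster label."""
    return dict(sorted(Counter(labels).items()))


def run_kasba(df, k=3):
    # Deferred import: this function needs polars_ts to be installed.
    from polars_ts.clustering.kasba import kasba

    result = kasba(df, k, c=1.0, independent=True, max_iter=10, seed=42)
    # The documented result has columns [id_col, "cluster"].
    return cluster_sizes(result["cluster"].to_list())
```

Checking the cluster sizes is a quick way to spot a poor `k`: a cluster that holds almost every series suggests the chosen `k` or `c` deserves another look.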