Kasba
polars_ts.clustering.kasba
KASBA: K-means with Accelerated Stochastic Barycenter Averaging.
Uses the MSM (Move-Split-Merge) elastic distance. MSM is a proper metric, so it satisfies the triangle inequality; this enables pruning of distance computations, giving a 10-30x speedup over aeon.
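To make the distance concrete, here is a minimal pure-Python sketch of the standard MSM dynamic-programming recurrence. It also includes the per-channel sum described by the `independent=True` mode. The names `msm`, `msm_independent` and `_split_merge_cost` are hypothetical, and this is not the polars_ts implementation. `c` plays the role of the MSM cost coefficient.

```python
from typing import Sequence


def _split_merge_cost(new: float, a: float, b: float, c: float) -> float:
    # Cost of a split or merge: just c when `new` lies between its neighbours a and b,
    # otherwise c plus the distance from `new` to the nearer of the two.
    if a <= new <= b or a >= new >= b:
        return c
    return c + min(abs(new - a), abs(new - b))


def msm(x: Sequence[float], y: Sequence[float], c: float = 1.0) -> float:
    """Univariate MSM distance via O(len(x) * len(y)) dynamic programming."""
    m, n = len(x), len(y)
    d = [[0.0] * n for _ in range(m)]
    d[0][0] = float(abs(x[0] - y[0]))
    # First column / first row: only split/merge operations are possible.
    for i in range(1, m):
        d[i][0] = d[i - 1][0] + _split_merge_cost(x[i], x[i - 1], y[0], c)
    for j in range(1, n):
        d[0][j] = d[0][j - 1] + _split_merge_cost(y[j], x[0], y[j - 1], c)
    for i in range(1, m):
        for j in range(1, n):
            d[i][j] = min(
                d[i - 1][j - 1] + abs(x[i] - y[j]),  # move x[i] onto y[j]
                d[i - 1][j] + _split_merge_cost(x[i], x[i - 1], y[j], c),  # split/merge on x
                d[i][j - 1] + _split_merge_cost(y[j], x[i], y[j - 1], c),  # split/merge on y
            )
    return d[m - 1][n - 1]


def msm_independent(
    xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]], c: float = 1.0
) -> float:
    """Multivariate MSM in 'independent' mode: the sum of per-channel distances."""
    return sum(msm(x, y, c) for x, y in zip(xs, ys))


print(msm([1, 3, 2], [2, 0]))  # 5.0
print(msm([2, 0], [1, 3, 2]))  # 5.0 (MSM is symmetric)
```

Raising `c` makes stretching or compressing a series more expensive relative to moving individual values.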
KASBAClusterer
KASBA clustering for univariate and multivariate time series.
Parameters
n_clusters
Number of clusters. Default 8.
c
MSM cost coefficient. Default 1.0.
independent
If True, compute MSM independently per channel and sum the results. If False, use dependent (cross-channel) MSM. Default True.
ba_subset_size
Proportion of cluster members used in the barycenter SGD. Default 0.5.
initial_step_size
Initial SGD learning rate. Default 0.05.
decay_rate
Exponential decay rate for the step size. Default 0.1.
max_iter
Maximum number of k-means iterations. Default 10.
seed
Random seed for reproducibility. Default 42.
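The barycenter parameters can be read as a stochastic gradient descent schedule. The sketch below assumes a specific form that this page does not spell out. It uses a step size of `initial_step_size * exp(-decay_rate * t)` at iteration `t`. It also draws a random subset of `round(ba_subset_size * n)` cluster members (at least one) using the given `seed`. Both helpers, `step_sizes` and `sample_subset`, are hypothetical illustrations rather than polars_ts internals.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def step_sizes(
    initial_step_size: float = 0.05, decay_rate: float = 0.1, n_steps: int = 5
) -> List[float]:
    # Assumed exponential schedule: eta_t = eta_0 * exp(-decay_rate * t).
    return [initial_step_size * math.exp(-decay_rate * t) for t in range(n_steps)]


def sample_subset(
    members: Sequence[T], ba_subset_size: float = 0.5, seed: int = 42
) -> List[T]:
    # Assumed subset rule: a fixed proportion of the cluster, never fewer than one member.
    rng = random.Random(seed)
    size = max(1, round(ba_subset_size * len(members)))
    return rng.sample(list(members), size)


print([round(s, 4) for s in step_sizes()])
print(sample_subset(["a", "b", "c", "d"]))
```

Under this reading, a larger `decay_rate` shrinks the updates faster, and a smaller `ba_subset_size` makes each barycenter update cheaper but noisier.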
fit(df, id_col='unique_id', target_col='y', channel_col=None)
Fit KASBA clustering.
Parameters
df
DataFrame with time series data.
id_col
Column identifying each time series. Default "unique_id".
target_col
Column with the time series values. Default "y".
channel_col
Column identifying channels for multivariate series. If None, the data is treated as univariate. Default None.
predict(df, id_col='unique_id', target_col='y', channel_col=None)
Predict cluster assignments for new data.
Parameters
df
DataFrame with time series data.
id_col, target_col, channel_col
Same meaning and defaults as in fit().
Returns
pl.DataFrame
DataFrame with columns [id_col, "cluster"].
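Assigning series to their nearest cluster center is where the triangle inequality pays off. The sketch below uses the classic Elkan-style bound: if d(c_best, c_j) >= 2 * d(x, c_best), then d(x, c_j) >= d(x, c_best), so c_j can be skipped without computing its distance. The function `assign_with_pruning` is hypothetical. `dist` can be any proper metric, such as an MSM implementation. Whether polars_ts uses exactly this bound is an assumption.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def assign_with_pruning(
    points: Sequence[T],
    centers: Sequence[T],
    dist: Callable[[T, T], float],
) -> Tuple[List[int], int]:
    """Nearest-center labels plus the number of point-to-center distances computed."""
    k = len(centers)
    # Center-to-center distances, computed once per assignment pass.
    cc = [[dist(centers[a], centers[b]) for b in range(k)] for a in range(k)]
    labels: List[int] = []
    n_computed = 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        n_computed += 1
        for j in range(1, k):
            # Triangle inequality: d(p, c_j) >= d(c_best, c_j) - d(p, c_best) >= best_d,
            # so c_j cannot be strictly closer and its distance is never computed.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(p, centers[j])
            n_computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, n_computed


labels, n = assign_with_pruning([0.0, 1.0, 10.0, 11.0], [0.5, 10.5], lambda a, b: abs(a - b))
print(labels, n)  # [0, 0, 1, 1] 6
```

Two of the four points never evaluate their second distance. With an expensive metric such as MSM, each skipped call saves a full dynamic-programming pass. The labels match a brute-force search because a center is skipped only when it provably cannot be strictly closer.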
_to_3d_array(df, id_col, target_col, channel_col)
Convert DataFrame to 3D numpy array (n_cases, n_channels, n_timepoints).
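The conversion from long format to a `(n_cases, n_channels, n_timepoints)` layout can be sketched without polars or NumPy. The function `to_3d` below is hypothetical and takes rows as plain dicts rather than a polars DataFrame. It assumes rows are already in time order within each series and sorts ids and channels. The real helper's ordering rules and return type are not documented here.

```python
from collections import defaultdict
from typing import Any, Dict, Hashable, Iterable, List, Optional, Tuple


def to_3d(
    rows: Iterable[Dict[str, Any]],
    id_col: str = "unique_id",
    target_col: str = "y",
    channel_col: Optional[str] = None,
) -> Tuple[List[Hashable], List[List[List[float]]]]:
    """Group long-format rows into [case][channel][time] nested lists."""
    grouped: Dict[Hashable, Dict[Hashable, List[float]]] = defaultdict(lambda: defaultdict(list))
    for row in rows:
        # Univariate data (channel_col=None) gets a single implicit channel 0.
        channel = row[channel_col] if channel_col is not None else 0
        grouped[row[id_col]][channel].append(float(row[target_col]))

    ids = sorted(grouped)
    channels = sorted({ch for per_id in grouped.values() for ch in per_id})
    # A channel missing for some id yields an empty list, caught by the length check.
    array = [[list(grouped[i].get(ch, [])) for ch in channels] for i in ids]

    lengths = {len(series) for case in array for series in case}
    if len(lengths) != 1:
        raise ValueError(f"expected equal-length series, got lengths {sorted(lengths)}")
    return ids, array


rows = [
    {"unique_id": "b", "ch": "x", "y": 5}, {"unique_id": "b", "ch": "x", "y": 6},
    {"unique_id": "a", "ch": "x", "y": 1}, {"unique_id": "a", "ch": "x", "y": 2},
    {"unique_id": "a", "ch": "z", "y": 3}, {"unique_id": "a", "ch": "z", "y": 4},
    {"unique_id": "b", "ch": "z", "y": 7}, {"unique_id": "b", "ch": "z", "y": 8},
]
ids, arr = to_3d(rows, channel_col="ch")
print(ids, arr)
```

Returning the sorted ids alongside the array keeps row order in the array traceable back to `id_col`. That mapping is what a [id_col, "cluster"] result needs.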
kasba(df, k, *, c=1.0, independent=True, max_iter=10, seed=42, id_col='unique_id', target_col='y', channel_col=None, **kwargs)
KASBA clustering convenience function.
Parameters
df
DataFrame with time series data.
k
Number of clusters.
c
MSM cost coefficient. Default 1.0.
independent
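MSM mode. If True, compute MSM independently per channel and sum the results; if False, use dependent cross-channel MSM. Default True.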
max_iter
Maximum number of k-means iterations. Default 10.
seed
Random seed. Default 42.
id_col
Series identifier column. Default "unique_id".
target_col
Values column. Default "y".
channel_col
Channel column for multivariate data. Default None.
Returns
pl.DataFrame
DataFrame with columns [id_col, "cluster"].