Kmeans

polars_ts.clustering.kmeans

K-Means clustering for time series with DTW Barycentric Averaging (DBA).

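The assignment step uses DTW distance and the centroid update step uses DBA. DBA produces synthetic centroids: averaged sequences that need not coincide with any input series. These typically represent a cluster's average shape better than medoids, which must be actual members of the cluster.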

TimeSeriesKMeans

K-Means clustering for time series using DBA centroids.

Parameters

n_clusters: Number of clusters. Default 2.

metric: Distance metric name. Currently only "dtw" is supported (DBA requires DTW alignment paths). Default "dtw".

max_iter: Maximum number of k-means iterations. Default 50.

dba_max_iter: Maximum DBA refinement iterations per centroid update. Default 30.

seed: Random seed for initial centroid selection. Default 42.

**distance_kwargs: Extra keyword arguments forwarded to the distance function.

fit(df, id_col='unique_id', target_col='y')

Fit k-means clustering with DBA centroids.

Parameters

df: DataFrame with columns id_col and target_col.

id_col: Column identifying each time series. Default "unique_id".

target_col: Column with the time series values. Default "y".

Returns

self

_assign(series_list, centroids)

Assign each series to the nearest centroid using DTW distance.
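The assignment step can be illustrated with a minimal pure-Python sketch. It is not the library's code: the names `dtw` and `assign`, the absolute-difference local cost, and the tie-breaking rule (lowest centroid index wins) are all assumptions made for illustration.

```python
import math


def dtw(s, t):
    """DTW distance with absolute-difference cost (an assumption), two rows of memory."""
    # prev[j] holds the best cumulative cost of aligning the rows seen so far with t[:j].
    prev = [0.0] + [math.inf] * len(t)
    for x in s:
        curr = [math.inf]
        for j, y in enumerate(t, start=1):
            curr.append(abs(x - y) + min(prev[j], curr[j - 1], prev[j - 1]))
        prev = curr
    return prev[-1]


def assign(series_list, centroids):
    """Return, for each series, the index of its nearest centroid under DTW."""
    return [
        # min() keeps the first index on ties, so the lowest centroid index wins.
        min(range(len(centroids)), key=lambda k: dtw(series, centroids[k]))
        for series in series_list
    ]


series_list = [[0, 0, 1], [0, 1, 1], [5, 6, 5], [5, 5, 6, 5]]
centroids = [[0, 1], [5, 6, 5]]
labels = assign(series_list, centroids)
```

Because DTW aligns sequences elastically, the series and centroids in this sketch do not need to share a length.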

_dtw_distance(s, t) (static method)

Compute DTW distance between two series.
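As a rough illustration of what a DTW distance computes, here is a textbook dynamic-programming sketch. The function name `dtw_distance`, the absolute-difference local cost, and the absence of a warping window are assumptions; the library's own implementation may use a different cost, normalization, or constraints.

```python
import math


def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) DTW with absolute-difference local cost (assumed)."""
    n, m = len(s), len(t)
    # cost[i][j] is the minimal cumulative cost of aligning s[:i] with t[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # s advances, t repeats
                cost[i][j - 1],      # t advances, s repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

For example, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` returns 0.0 in this sketch, because the repeated 2 is absorbed by warping rather than penalized.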

_update_centroids(series_list, assignments)

Recompute centroids via DBA.
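The idea behind a DBA update can be sketched as follows: align every cluster member to the current centroid with DTW, then replace each centroid point by the mean of all member values aligned to it, and repeat. This is a simplified illustration under stated assumptions, not the library's implementation: `dtw_path` and `dba_update` are hypothetical names, the local cost is the absolute difference, and the stopping rule (exact equality between iterations) is chosen for brevity.

```python
import math


def dtw_path(s, t):
    """Return the optimal DTW alignment as a list of (i, j) index pairs."""
    n, m = len(s), len(t)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(s[i - 1] - t[j - 1])
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell, always stepping to the cheapest predecessor.
    i, j, path = n, m, []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    return path[::-1]


def dba_update(centroid, members, n_iter=30):
    """Refine `centroid` toward the DTW barycenter of `members`."""
    centroid = list(centroid)
    if not members:
        return centroid  # nothing to average; keep the centroid unchanged
    for _ in range(n_iter):
        # buckets[i] collects every member value aligned to centroid[i].
        buckets = [[] for _ in centroid]
        for series in members:
            for ci, si in dtw_path(centroid, series):
                buckets[ci].append(series[si])
        updated = [sum(b) / len(b) for b in buckets]
        if updated == centroid:
            break  # converged: another pass would not change anything
        centroid = updated
    return centroid
```

Note that the centroid keeps its initial length even when members have different lengths, and that `n_iter` plays the role that dba_max_iter plays in the class parameters above.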

kmeans_dba(df, k, method='dtw', max_iter=50, seed=42, id_col='unique_id', target_col='y', **distance_kwargs)

K-Means clustering with DBA centroids (convenience function).

Parameters

df: DataFrame with columns id_col and target_col.

k: Number of clusters.

method: Distance metric name (e.g. "dtw"). Default "dtw".

max_iter: Maximum k-means iterations. Default 50.

seed: Random seed for reproducibility. Default 42.

id_col: Column identifying each time series. Default "unique_id".

target_col: Column with the time series values. Default "y".

**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

pl.DataFrame: DataFrame with columns [id_col, "cluster"].