K-Means
polars_ts.clustering.kmeans
K-Means clustering for time series with DTW Barycentric Averaging (DBA).
Uses DTW distances for the assignment step and DBA for the centroid update step. DBA centroids are synthetic series averaged along DTW alignment paths rather than members of the dataset, so they can represent a cluster's average shape more closely than a medoid.
TimeSeriesKMeans
K-Means clustering for time series using DBA centroids.
Parameters
n_clusters
Number of clusters. Default 2.
metric
Distance metric name. Currently only "dtw" is supported, because DBA requires DTW alignment paths. Default "dtw".
max_iter
Maximum number of k-means iterations. Default 50.
dba_max_iter
Maximum DBA refinement iterations per centroid update. Default 30.
seed
Random seed for initial centroid selection. Default 42.
**distance_kwargs
Extra keyword arguments forwarded to the distance function.
fit(df, id_col='unique_id', target_col='y')
Fit k-means clustering with DBA centroids.
Parameters
df
DataFrame with columns id_col and target_col.
id_col
Column identifying each time series.
target_col
Column with the time series values.
Returns
self
_assign(series_list, centroids)
Assign each series to the nearest centroid using DTW distance.
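The assignment step amounts to an argmin over centroids. The sketch below is not the library's code: assign, abs_distance, and the sample data are hypothetical names introduced for illustration, and the distance is passed in as a callable so the example stays short. The class itself uses DTW here; the pointwise stand-in only works for equal-length series.

```python
def assign(series_list, centroids, distance):
    """Return, for each series, the index of its nearest centroid."""
    assignments = []
    for s in series_list:
        dists = [distance(s, c) for c in centroids]
        # Ties resolve to the lowest centroid index.
        assignments.append(dists.index(min(dists)))
    return assignments


def abs_distance(s, t):
    # Hypothetical stand-in for DTW: sum of pointwise absolute differences.
    return sum(abs(a - b) for a, b in zip(s, t))


series = [[0.0, 0.1, 0.0], [5.0, 5.2, 4.9], [0.2, 0.0, 0.1]]
centroids = [[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]]
labels = assign(series, centroids, abs_distance)
```

Swapping abs_distance for a DTW function gives the behavior described above, including support for series of different lengths.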
_dtw_distance(s, t)
staticmethod
Compute DTW distance between two series.
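For reference, a textbook dynamic-programming DTW can be sketched as follows. This is an illustration under assumptions, not the method's actual implementation: the squared-difference local cost, the final square root, and the absence of a warping window are choices made here, and dtw_distance is a hypothetical name.

```python
import math


def dtw_distance(s, t):
    """Classic O(len(s) * len(t)) DTW between two numeric sequences."""
    n, m = len(s), len(t)
    # cost[i][j] = minimal accumulated cost of aligning s[:i] with t[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s[i - 1] - t[j - 1]) ** 2  # assumed local cost
            cost[i][j] = d + min(
                cost[i - 1][j],      # s advances, t repeats
                cost[i][j - 1],      # t advances, s repeats
                cost[i - 1][j - 1],  # both advance
            )
    return math.sqrt(cost[n][m])
```

Because a point in one series may be matched to several points in the other, two series that differ only by a repeated sample, such as [1, 2, 3] and [1, 2, 2, 3], have distance zero.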
_update_centroids(series_list, assignments)
Recompute centroids via DBA.
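The idea behind a DBA update can be sketched as below: align every cluster member to the current centroid with DTW, collect the member values matched to each centroid position, and replace each position with the mean of its matches. This is a minimal sketch, not the library's code; dtw_path, dba_update, the squared local cost, and the n_iter default are assumptions made for illustration.

```python
import math


def dtw_path(s, t):
    """Return an optimal DTW alignment as a list of (i, j) index pairs."""
    n, m = len(s), len(t)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (s[i - 1] - t[j - 1]) ** 2  # assumed local cost
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the last cell to the first, following the cheapest predecessor.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        steps = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(steps)
    path.reverse()
    return path


def dba_update(centroid, members, n_iter=10):
    """Refine `centroid` toward the DTW barycenter of `members`."""
    if not members:
        return list(centroid)  # nothing to average; keep the centroid as-is
    for _ in range(n_iter):
        # buckets[i] gathers every member value aligned to centroid position i
        buckets = [[] for _ in centroid]
        for s in members:
            for i, j in dtw_path(centroid, s):
                buckets[i].append(s[j])
        centroid = [sum(b) / len(b) for b in buckets]
    return centroid
```

Every centroid position appears at least once in each alignment path, so no bucket is empty when the cluster has members. The centroid keeps its initial length throughout, which is why the choice of starting series matters.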
kmeans_dba(df, k, method='dtw', max_iter=50, seed=42, id_col='unique_id', target_col='y', **distance_kwargs)
K-Means clustering with DBA centroids (convenience function).
Parameters
df
DataFrame with columns id_col and target_col.
k
Number of clusters.
method
Distance metric name (e.g. "dtw"). Default "dtw".
max_iter
Maximum k-means iterations. Default 50.
seed
Random seed for reproducibility. Default 42.
id_col
Column identifying each time series.
target_col
Column with the time series values.
**distance_kwargs
Extra keyword arguments forwarded to the distance function.
Returns
pl.DataFrame
DataFrame with columns [id_col, "cluster"].