Density

polars_ts.clustering.density

Density-based clustering (HDBSCAN / DBSCAN) for time series.

Computes pairwise distances via the existing Rust-accelerated distance engine and passes the precomputed matrix to scikit-learn's implementations.

_build_square_matrix(dist_dict, ids)

Convert symmetric distance dict to a square numpy matrix.
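The layout of the distance dict is not documented here. As a minimal sketch, assume it maps `(id_a, id_b)` tuples to a distance, with one entry per unordered pair. The function name `build_square_matrix` and this dict layout are illustrative assumptions, not the library's actual code.

```python
import numpy as np


def build_square_matrix(dist_dict, ids):
    """Hypothetical stand-in for _build_square_matrix.

    Assumes dist_dict maps (id_a, id_b) tuples to a distance and holds
    one entry per unordered pair; the real dict layout may differ.
    """
    # Map each series id to its row/column position.
    index = {series_id: i for i, series_id in enumerate(ids)}
    # The diagonal stays at zero: each series is at distance 0 from itself.
    matrix = np.zeros((len(ids), len(ids)), dtype=float)
    for (a, b), dist in dist_dict.items():
        i, j = index[a], index[b]
        # Mirror each entry so the matrix is symmetric.
        matrix[i, j] = dist
        matrix[j, i] = dist
    return matrix


ids = ["a", "b", "c"]
dist_dict = {("a", "b"): 1.0, ("a", "c"): 4.0, ("b", "c"): 2.5}
matrix = build_square_matrix(dist_dict, ids)
```

Fixing the order of `ids` up front is what lets row `i` of the matrix be mapped back to a series identifier once cluster labels are returned.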

hdbscan_cluster(df, method='dtw', min_cluster_size=5, min_samples=None, id_col='unique_id', target_col='y', **distance_kwargs)

HDBSCAN clustering over time series using precomputed distances.

Parameters

df: DataFrame with columns id_col and target_col.
method: Distance metric name (e.g. "dtw", "erp", "lcss").
min_cluster_size: Minimum cluster size for HDBSCAN.
min_samples: Number of samples in a neighbourhood for a point to be a core point. Defaults to min_cluster_size when None.
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

pl.DataFrame: DataFrame with columns [id_col, "cluster"]. Noise points are labelled -1.

dbscan_cluster(df, method='dtw', eps=0.5, min_samples=5, id_col='unique_id', target_col='y', **distance_kwargs)

DBSCAN clustering over time series using precomputed distances.

Parameters

df: DataFrame with columns id_col and target_col.
method: Distance metric name (e.g. "dtw", "erp", "lcss").
eps: Maximum distance between two samples in the same neighbourhood.
min_samples: Number of samples in a neighbourhood for a point to be a core point.
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

pl.DataFrame: DataFrame with columns [id_col, "cluster"]. Noise points are labelled -1.
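To show how `eps` and `min_samples` interact with a precomputed matrix, here is a small sketch using scikit-learn's `DBSCAN` directly. The scalar positions and the absolute-difference distance are illustrative assumptions standing in for real time series distances; the call pattern is a plausible reading of the description above, not a copy of the library's code.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy stand-in for a precomputed distance matrix:
# two tight groups of three "series" and one isolated series.
positions = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0])
dist = np.abs(positions[:, None] - positions[None, :])

# eps is the neighbourhood radius in distance units; in scikit-learn,
# min_samples counts the point itself when deciding whether it is a core point.
model = DBSCAN(eps=0.5, min_samples=2, metric="precomputed")
labels = model.fit_predict(dist)

print(labels.tolist())  # [0, 0, 0, 1, 1, 1, -1]
```

Because `eps` is expressed in the units of the chosen distance, a value that works for one `method` (for example "dtw") will generally need retuning for another.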

_compute_distance_matrix(df, method, id_col, target_col, **distance_kwargs)

Shared helper: compute pairwise distances and return (sorted ids, square matrix).