Hierarchical

polars_ts.clustering.hierarchical

Agglomerative (hierarchical) clustering for time series.

Computes pairwise distances via the existing Rust-accelerated distance engine, converts to a condensed distance matrix, and delegates to scipy.cluster.hierarchy for linkage and tree cutting.
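The pipeline described above (pairwise distances, condensed matrix, linkage, tree cut) can be sketched with SciPy alone. This is a minimal illustration, not the module's actual implementation: `pairwise_distance_matrix` is a hypothetical stand-in that uses plain Euclidean distance in place of the Rust-accelerated engine, and the sample series are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform


def pairwise_distance_matrix(series):
    """Hypothetical stand-in for the distance engine (Euclidean, equal-length series)."""
    n = len(series)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = float(np.linalg.norm(series[i] - series[j]))
            dist[i, j] = dist[j, i] = d  # keep the matrix symmetric
    return dist


# Illustrative data: two rising series and two falling series.
series = [
    np.array([0.0, 1.0, 2.0, 3.0]),
    np.array([0.1, 1.1, 2.1, 3.1]),
    np.array([10.0, 9.0, 8.0, 7.0]),
    np.array([10.2, 9.1, 8.0, 7.1]),
]

square = pairwise_distance_matrix(series)

# scipy.cluster.hierarchy.linkage expects the condensed (upper-triangle) form,
# which has n * (n - 1) / 2 entries.
condensed = squareform(square, checks=True)

# Build the merge tree, then cut it into at most two flat clusters.
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
```

Here `labels` assigns the two rising series to one cluster and the two falling series to the other. Cutting with `criterion="maxclust"` is one way to obtain a fixed number of clusters; whether the module uses exactly this call is an assumption.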

agglomerative_cluster(df, method='dtw', n_clusters=2, linkage_method='average', id_col='unique_id', target_col='y', *, return_linkage=False, **distance_kwargs)

agglomerative_cluster(df: pl.DataFrame, method: str = ..., n_clusters: int = ..., linkage_method: str = ..., id_col: str = ..., target_col: str = ..., *, return_linkage: Literal[False] = ..., **distance_kwargs: Any) -> pl.DataFrame
agglomerative_cluster(df: pl.DataFrame, method: str = ..., n_clusters: int = ..., linkage_method: str = ..., id_col: str = ..., target_col: str = ..., *, return_linkage: Literal[True] = ..., **distance_kwargs: Any) -> tuple[pl.DataFrame, np.ndarray]

Agglomerative (hierarchical) clustering over time series.

Parameters

df: DataFrame with columns id_col and target_col.

method: Distance metric name (e.g. "dtw", "erp", "lcss").

n_clusters: Number of clusters to produce.

linkage_method: Linkage criterion: "single", "complete", "average", or "weighted".

id_col: Column identifying each time series.

target_col: Column with the time series values.

return_linkage: If True, also return the linkage matrix (compatible with scipy.cluster.hierarchy.dendrogram).

**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

pl.DataFrame or (pl.DataFrame, np.ndarray)

DataFrame with columns [id_col, "cluster"]. When return_linkage=True, a tuple of (labels, linkage_matrix) is returned.
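Because the linkage matrix is described as compatible with scipy.cluster.hierarchy.dendrogram, it can be inspected or plotted with SciPy directly. The sketch below does not call agglomerative_cluster; it builds a small linkage matrix with SciPy as a stand-in for the returned one, and the ids and distances are invented for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical series ids and condensed distances, in the pair order
# (a,b), (a,c), (a,d), (b,c), (b,d), (c,d).
ids = ["a", "b", "c", "d"]
condensed = np.array([0.2, 14.0, 14.1, 13.9, 14.0, 0.3])

# Stand-in for the linkage matrix returned when return_linkage=True.
Z = linkage(condensed, method="average")

# Each row of Z records one merge:
# [cluster index 1, cluster index 2, merge distance, size of the merged cluster].
# no_plot=True computes the tree layout without requiring matplotlib.
tree = dendrogram(Z, labels=ids, no_plot=True)
leaf_order = tree["ivl"]  # series ids in left-to-right leaf order
```

For n series the matrix has n - 1 rows, so the last row always describes the merge that joins everything into a single cluster. Dropping `no_plot=True` draws the dendrogram when matplotlib is installed.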