Auto

polars_ts.clustering.auto

Automated clustering pipeline selection for time series.

Performs grid search over method x distance x k combinations and returns the best result according to the chosen evaluation metric.

AutoClusterResult dataclass

Result of an auto_cluster search.

Attributes

best_labels DataFrame with [id_col, "cluster"] for the best combination.

best_method Name of the winning clustering method.

best_distance Name of the winning distance metric.

best_k Winning k (None for density-based methods).

best_score Evaluation score of the winning combination.

results_table DataFrame summarising every evaluated combination with columns ["method", "distance", "k", "score"].

_run_clustering(df, method, distance, k, id_col, target_col, seed, hdbscan_kwargs, dbscan_kwargs)

Run a single clustering method and return labels or None on failure.

_evaluate(df, labels, distance, metric, id_col, target_col)

Evaluate clustering quality. Returns None if evaluation fails.

auto_cluster(df, methods=None, distances=None, k_range=None, metric='silhouette', id_col='unique_id', target_col='y', seed=42, hdbscan_kwargs=None, dbscan_kwargs=None)

Automated clustering pipeline selection via grid search.

Enumerates method x distance x k combinations, evaluates each with the chosen metric, and returns the best result.
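The enumeration step can be pictured with a minimal sketch like the one below. It is not the library's implementation: the helper name `enumerate_combinations` and the `DENSITY_METHODS` set are hypothetical, and treating "hdbscan" and "dbscan" as the methods that take no k is an assumption based on the `hdbscan_kwargs` and `dbscan_kwargs` parameters. The defaults mirror the documented ones.

```python
from itertools import product

# Assumption: these method names are the density-based ones (no k).
# The source only says that best_k is None for density-based methods.
DENSITY_METHODS = {"hdbscan", "dbscan"}


def enumerate_combinations(methods=None, distances=None, k_range=None):
    """Yield (method, distance, k) tuples; k is None for density-based methods."""
    methods = ["kmedoids", "spectral"] if methods is None else list(methods)
    distances = ["sbd", "dtw"] if distances is None else list(distances)
    k_values = list(range(2, 6)) if k_range is None else list(k_range)

    for method, distance in product(methods, distances):
        if method in DENSITY_METHODS:
            # Density-based methods do not take k, so they appear once per distance.
            yield method, distance, None
        else:
            for k in k_values:
                yield method, distance, k


combos = list(enumerate_combinations())
print(len(combos))  # 2 methods x 2 distances x 4 values of k = 16
```

With the documented defaults the grid has 16 combinations, so widening `k_range` or adding distances grows the number of clustering runs multiplicatively.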

Parameters

df DataFrame with columns id_col and target_col.

methods Clustering methods to try. Default ["kmedoids", "spectral"].

distances Distance metrics to try. Default ["sbd", "dtw"].

k_range Range of k values for methods that accept k. Default range(2, 6).

metric Evaluation metric: "silhouette" (higher=better), "davies_bouldin" (lower=better), or "calinski_harabasz" (higher=better).

id_col Column identifying each time series.

target_col Column with the time series values.

seed Random seed for clustering methods.

hdbscan_kwargs Extra keyword arguments for HDBSCAN (e.g. min_cluster_size).

dbscan_kwargs Extra keyword arguments for DBSCAN (e.g. eps, min_samples).

Returns

AutoClusterResult Structured result with best labels, metadata, and results table.

Raises

ValueError If metric is unknown or no valid combinations are found.
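How the winner is chosen depends on the metric's direction, and the two documented error cases follow naturally from that. The sketch below is only an illustration under stated assumptions: `pick_best` and `HIGHER_IS_BETTER` are hypothetical names, results are modelled as plain dictionaries rather than a DataFrame, a score of None stands for a combination whose clustering or evaluation failed, and the scores in `rows` are made-up inputs, not real results.

```python
# Direction of each metric, as stated in the parameter description.
HIGHER_IS_BETTER = {
    "silhouette": True,
    "davies_bouldin": False,
    "calinski_harabasz": True,
}


def pick_best(rows, metric="silhouette"):
    """Return the best row for the given metric, skipping failed combinations."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"Unknown metric: {metric!r}")

    # Assumption: a failed combination is recorded with score None.
    valid = [row for row in rows if row["score"] is not None]
    if not valid:
        raise ValueError("No valid combinations found")

    chooser = max if HIGHER_IS_BETTER[metric] else min
    return chooser(valid, key=lambda row: row["score"])


# Illustrative scores only.
rows = [
    {"method": "kmedoids", "distance": "sbd", "k": 2, "score": 0.41},
    {"method": "kmedoids", "distance": "dtw", "k": 3, "score": None},
    {"method": "spectral", "distance": "sbd", "k": 4, "score": 0.57},
]

best_high = pick_best(rows, metric="silhouette")
best_low = pick_best(rows, metric="davies_bouldin")
print(best_high["method"], best_low["method"])
```

The same table yields different winners for "silhouette" and "davies_bouldin", which is why scores from different metrics should not be compared with each other.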