Auto
polars_ts.clustering.auto
Automated clustering pipeline selection for time series.
Performs a grid search over method x distance x k combinations and returns the best result according to the chosen evaluation metric.
AutoClusterResult
dataclass
Result of an auto_cluster search.
Attributes
best_labels
DataFrame with [id_col, "cluster"] for the best combination.
best_method
Name of the winning clustering method.
best_distance
Name of the winning distance metric.
best_k
Winning k (None for density-based methods).
best_score
Evaluation score of the winning combination.
results_table
DataFrame summarising every evaluated combination with columns
["method", "distance", "k", "score"].
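As a rough illustration of the shape of this result, the attributes listed above can be mirrored in a stand-in dataclass. `AutoClusterResultSketch` is a hypothetical name and not the library class; the DataFrame fields are typed as `Any` here so the sketch runs without polars, and the values are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class AutoClusterResultSketch:
    """Hypothetical stand-in mirroring the documented attributes."""

    best_labels: Any        # a DataFrame with [id_col, "cluster"] in the real result
    best_method: str        # name of the winning clustering method
    best_distance: str      # name of the winning distance metric
    best_k: Optional[int]   # None for density-based methods
    best_score: float       # evaluation score of the winning combination
    results_table: Any      # a DataFrame with method, distance, k, score columns


# Placeholder values only; a density-based winner carries no k.
result = AutoClusterResultSketch(
    best_labels=None,
    best_method="hdbscan",
    best_distance="dtw",
    best_k=None,
    best_score=0.0,
    results_table=None,
)
print(result.best_method, result.best_k)
```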
_run_clustering(df, method, distance, k, id_col, target_col, seed, hdbscan_kwargs, dbscan_kwargs)
Run a single clustering method and return labels or None on failure.
_evaluate(df, labels, distance, metric, id_col, target_col)
Evaluate clustering quality. Returns None if evaluation fails.
auto_cluster(df, methods=None, distances=None, k_range=None, metric='silhouette', id_col='unique_id', target_col='y', seed=42, hdbscan_kwargs=None, dbscan_kwargs=None)
Automated clustering pipeline selection via grid search.
Enumerates method x distance x k combinations, evaluates each with the chosen metric, and returns the best result.
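The enumeration described above might look roughly like the following sketch. It is not the library's implementation: `enumerate_grid` and `DENSITY_BASED` are hypothetical names, and the method strings "hdbscan" and "dbscan" are assumed from the keyword-argument parameters. Density-based methods contribute one combination per distance with k set to None.

```python
from itertools import product

# Assumption: these method names identify density-based methods, which take no k.
DENSITY_BASED = {"hdbscan", "dbscan"}


def enumerate_grid(methods, distances, k_range):
    """Yield (method, distance, k) tuples; k is None for density-based methods."""
    for method, distance in product(methods, distances):
        if method in DENSITY_BASED:
            yield method, distance, None
        else:
            for k in k_range:
                yield method, distance, k


grid = list(enumerate_grid(["kmedoids", "hdbscan"], ["sbd", "dtw"], range(2, 6)))
# kmedoids: 2 distances x 4 k values = 8; hdbscan: 2 distances = 2
print(len(grid))  # 10
```

The size of the grid grows multiplicatively, so narrowing `k_range` or the list of distances is the most direct way to shorten a search.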
Parameters
df
DataFrame with columns id_col and target_col.
methods
Clustering methods to try. Default ["kmedoids", "spectral"].
distances
Distance metrics to try. Default ["sbd", "dtw"].
k_range
Range of k values for methods that accept k. Default range(2, 6).
metric
Evaluation metric: "silhouette" (higher=better),
"davies_bouldin" (lower=better), or "calinski_harabasz"
(higher=better).
id_col
Column identifying each time series.
target_col
Column with the time series values.
seed
Random seed for clustering methods.
hdbscan_kwargs
Extra keyword arguments for HDBSCAN (e.g. min_cluster_size).
dbscan_kwargs
Extra keyword arguments for DBSCAN (e.g. eps, min_samples).
Returns
AutoClusterResult
Structured result with best labels, metadata, and results table.
Raises
ValueError
If metric is unknown or no valid combinations are found.
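The metric directions and the two error conditions can be sketched as below. This is a minimal illustration, not the library's code: `pick_best` and `HIGHER_IS_BETTER` are hypothetical names, the results are plain dicts rather than a DataFrame, failed runs are assumed to carry a score of None, and the scores are made-up values.

```python
# Direction of each metric, as given in the metric parameter description.
HIGHER_IS_BETTER = {
    "silhouette": True,
    "davies_bouldin": False,
    "calinski_harabasz": True,
}


def pick_best(rows, metric):
    """Return the best row from a list of dicts that each have a "score" key."""
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"Unknown metric: {metric!r}")
    # Assumption: a failed clustering or evaluation is recorded as score=None.
    valid = [row for row in rows if row["score"] is not None]
    if not valid:
        raise ValueError("No valid combinations found")
    choose = max if HIGHER_IS_BETTER[metric] else min
    return choose(valid, key=lambda row: row["score"])


# Made-up scores, for illustration only.
rows = [
    {"method": "kmedoids", "distance": "sbd", "k": 2, "score": 0.41},
    {"method": "kmedoids", "distance": "dtw", "k": 3, "score": 0.58},
    {"method": "spectral", "distance": "sbd", "k": 4, "score": None},
]
best = pick_best(rows, "silhouette")
print(best["method"], best["distance"], best["k"])
```

With the same rows, a lower-is-better metric such as "davies_bouldin" would select the k=2 row instead, which is why the direction has to be tied to the metric name.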