Evaluation

polars_ts.clustering.evaluation

Clustering evaluation metrics for time series using precomputed distances.

_build_dist_matrix(df, labels, method, id_col, target_col, **distance_kwargs)

Compute distance matrix and cluster assignments.

Returns (ids, id_to_cluster, dist_dict).

_group_by_cluster(id_to_cluster)

Group series IDs by their cluster assignment.

_find_medoids(cluster_members, dist)

Find the medoid of each cluster, i.e. the member with the smallest total distance to the other members of that cluster.
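The medoid rule described above can be sketched as follows. This is a minimal illustration and not the library's code: the names find_medoids_sketch and pair_distance are hypothetical, and the layout of the distance dictionary (one entry per unordered pair, keyed by a tuple of IDs) is an assumption, since the page does not specify how dist is stored.

```python
from typing import Dict, Hashable, List, Tuple

# Assumption: each unordered pair of series IDs appears once, keyed by a tuple.
PairDist = Dict[Tuple[Hashable, Hashable], float]


def pair_distance(dist: PairDist, a: Hashable, b: Hashable) -> float:
    """Look up a symmetric distance; a series is at distance 0 from itself."""
    if a == b:
        return 0.0
    return dist[(a, b)] if (a, b) in dist else dist[(b, a)]


def find_medoids_sketch(
    cluster_members: Dict[int, List[Hashable]], dist: PairDist
) -> Dict[int, Hashable]:
    """Return, per cluster, the member with the smallest summed distance to the rest."""
    medoids = {}
    for cluster, members in cluster_members.items():
        medoids[cluster] = min(
            members,
            key=lambda m: sum(pair_distance(dist, m, other) for other in members),
        )
    return medoids


# Hypothetical toy data: five series in two clusters.
example_dist = {
    ("a", "b"): 1.0,
    ("a", "c"): 4.0,
    ("b", "c"): 2.0,
    ("d", "e"): 3.0,
}
example_members = {0: ["a", "b", "c"], 1: ["d", "e"]}
medoids = find_medoids_sketch(example_members, example_dist)
```

In this sketch ties are broken by list order, because min returns the first minimal element; the library may break ties differently.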

silhouette_score(df, labels, method='dtw', id_col='unique_id', target_col='y', **distance_kwargs)

Compute the mean silhouette score for a clustering result.

The silhouette score for each sample measures how similar it is to its own cluster compared to the nearest other cluster. Values range from -1 to 1, where higher is better.

Parameters

df: DataFrame with columns id_col and target_col.
labels: DataFrame with columns id_col and "cluster" (e.g. from kmedoids).
method: Distance metric name (e.g. "dtw", "erp").
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

float: Mean silhouette score across all samples. Returns 0.0 if there is only one cluster or only one sample.
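To make the definition concrete, here is a minimal sketch of the standard silhouette computation on a precomputed distance matrix. It is not the library's implementation: the function name silhouette_sketch is hypothetical, and the inputs are a plain nested list of distances plus a list of integer labels rather than the DataFrames the real function takes. For sample i, a is the mean distance to the other members of its own cluster, b is the smallest mean distance to the members of any other cluster, and the score is (b - a) / max(a, b).

```python
from typing import Dict, List


def silhouette_sketch(dist: List[List[float]], labels: List[int]) -> List[float]:
    """Per-sample silhouette values from a symmetric distance matrix."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        # With a single cluster, or a singleton cluster, the score is taken as 0.
        if len(clusters) < 2 or len(own) == 1:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


# Hypothetical toy data: four points on a line, using absolute difference
# as a stand-in for a time series distance such as DTW.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]
labels = [0, 0, 1, 1]

scores = silhouette_sketch(dist, labels)
mean_score = sum(scores) / len(scores)
```

Averaging the per-sample values gives the single number that a mean silhouette score reports; on this well-separated toy data the mean is close to 1.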

silhouette_samples(df, labels, method='dtw', id_col='unique_id', target_col='y', **distance_kwargs)

Compute the silhouette score for each individual sample.

Parameters

df: DataFrame with columns id_col and target_col.
labels: DataFrame with columns id_col and "cluster".
method: Distance metric name.
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

pl.DataFrame: DataFrame with columns id_col, "cluster", and "silhouette".

davies_bouldin_score(df, labels, method='dtw', id_col='unique_id', target_col='y', **distance_kwargs)

Compute the Davies-Bouldin index for a clustering result.

Lower values indicate better clustering. The index averages, over all clusters, the similarity between each cluster and its most similar other cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Medoids are used in place of centroids, a common choice for non-Euclidean distance metrics, where a mean series is not well defined.

Parameters

df: DataFrame with columns id_col and target_col.
labels: DataFrame with columns id_col and "cluster".
method: Distance metric name.
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

float: Davies-Bouldin index. Returns 0.0 if there is only one cluster.
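A medoid-based Davies-Bouldin index can be sketched as below. This is an illustration under stated assumptions, not the library's code: the name davies_bouldin_sketch is hypothetical, the input is a plain distance matrix with integer labels, and the within-cluster scatter is assumed to be the mean distance from every member (medoid included) to the cluster medoid. The library may define scatter or tie-breaking differently.

```python
from typing import Dict, List


def davies_bouldin_sketch(dist: List[List[float]], labels: List[int]) -> float:
    """Medoid-based Davies-Bouldin index from a symmetric distance matrix."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0

    # Medoid: the member with the smallest total distance to its cluster.
    medoid = {
        lab: min(members, key=lambda c: sum(dist[c][j] for j in members))
        for lab, members in clusters.items()
    }
    # Scatter (assumption): mean distance from the members to their medoid.
    scatter = {
        lab: sum(dist[medoid[lab]][j] for j in members) / len(members)
        for lab, members in clusters.items()
    }

    total = 0.0
    for a in clusters:
        worst = 0.0
        for b in clusters:
            if a == b:
                continue
            separation = dist[medoid[a]][medoid[b]]
            # Coinciding medoids give an unbounded ratio in this sketch.
            ratio = (
                (scatter[a] + scatter[b]) / separation
                if separation > 0
                else float("inf")
            )
            worst = max(worst, ratio)
        total += worst
    return total / len(clusters)


# Hypothetical toy data: four points on a line with absolute-difference distances.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

db_good = davies_bouldin_sketch(dist, [0, 0, 1, 1])  # compact, well-separated
db_bad = davies_bouldin_sketch(dist, [0, 1, 0, 1])   # clusters interleaved
```

The well-separated labelling yields a much lower index than the interleaved one, matching the "lower is better" reading.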

calinski_harabasz_score(df, labels, method='dtw', id_col='unique_id', target_col='y', **distance_kwargs)

Compute the Calinski-Harabasz index for a clustering result.

Higher values indicate better-defined clusters. The index is the ratio of between-cluster dispersion to within-cluster dispersion, adjusted for the number of clusters and samples.

Note: This is a medoid-based adaptation of the standard Calinski-Harabasz index (which assumes Euclidean centroids). Results are meaningful for comparing clusterings under the same metric but may not be directly comparable to Euclidean implementations.

Parameters

df: DataFrame with columns id_col and target_col.
labels: DataFrame with columns id_col and "cluster".
method: Distance metric name.
id_col: Column identifying each time series.
target_col: Column with the time series values.
**distance_kwargs: Extra keyword arguments forwarded to the distance function.

Returns

float: Calinski-Harabasz index. Returns 0.0 if there is only one cluster or fewer than k + 1 samples, where k is the number of clusters.
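One way to adapt the Calinski-Harabasz ratio to medoids is sketched below. It is a hedged illustration, not the library's implementation: the name calinski_harabasz_sketch is hypothetical, and it assumes that each cluster centroid is replaced by the cluster medoid, the global centroid by the medoid of the whole dataset, and that dispersion uses squared distances as in the Euclidean formula. With n samples and k clusters, the index is (B / (k - 1)) / (W / (n - k)), where B is the between-cluster and W the within-cluster dispersion.

```python
from typing import Dict, List


def medoid_of(members: List[int], dist: List[List[float]]) -> int:
    """Index with the smallest total distance to the given members."""
    return min(members, key=lambda c: sum(dist[c][j] for j in members))


def calinski_harabasz_sketch(dist: List[List[float]], labels: List[int]) -> float:
    """Medoid-based Calinski-Harabasz index from a symmetric distance matrix."""
    clusters: Dict[int, List[int]] = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    n, k = len(labels), len(clusters)
    # Mirrors the documented edge cases: one cluster, or fewer than k + 1 samples.
    if k < 2 or n <= k:
        return 0.0

    global_medoid = medoid_of(list(range(n)), dist)
    between = 0.0
    within = 0.0
    for members in clusters.values():
        m = medoid_of(members, dist)
        # Assumption: squared distances, as in the Euclidean definition.
        between += len(members) * dist[m][global_medoid] ** 2
        within += sum(dist[m][j] ** 2 for j in members)

    if within == 0:
        # Choice made for this sketch only: zero within-cluster dispersion.
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical toy data: four points on a line with absolute-difference distances.
points = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(p - q) for q in points] for p in points]

ch_good = calinski_harabasz_sketch(dist, [0, 0, 1, 1])  # compact, well-separated
ch_bad = calinski_harabasz_sketch(dist, [0, 1, 0, 1])   # clusters interleaved
```

The well-separated labelling scores far higher than the interleaved one, consistent with "higher is better"; as the note above says, such values are only comparable across clusterings scored with the same distance.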