Shapelets

`polars_ts.clustering.shapelets`

U-Shapelet (unsupervised shapelet) clustering for time series.

Discovers discriminative subsequences (shapelets) that separate groups of time series, then clusters in shapelet-distance space.

References

Zakaria, J. et al. (2012). Clustering Time Series Using Unsupervised-Shapelets. ICDM.

`UShapeletClusterer`

Unsupervised shapelet-based time series clustering.

Discovers discriminative subsequences (shapelets) and clusters series by their shapelet distances.

Parameters

- `n_clusters`: Number of clusters.
- `n_shapelets`: Number of shapelets to select.
- `shapelet_lengths`: Candidate shapelet lengths to consider.
- `n_candidates`: Number of random shapelet candidates to evaluate.
- `target_col`: Column with the values to cluster.
- `id_col`: Column identifying each time series.
- `time_col`: Column with timestamps for ordering.
- `seed`: Random seed for reproducibility.
- `max_iter`: Maximum k-means iterations.

`fit(df)`

Discover shapelets and cluster time series.

Parameters

- `df`: Input DataFrame with time series data.

Returns

Self

`_extract_series(df, target_col, id_col, time_col)`

Extract series as a zero-padded 2-D array (n_series, max_len).
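
The padding step can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the series have already been pulled out of the DataFrame as plain Python lists, and that padding is appended at the end of shorter series. `pad_series` is a hypothetical helper name.

```python
import numpy as np


def pad_series(series_list):
    """Stack variable-length series into a zero-padded (n_series, max_len) array.

    Hypothetical helper; trailing zero-padding is an assumption of this sketch.
    """
    max_len = max(len(s) for s in series_list)
    X = np.zeros((len(series_list), max_len), dtype=float)
    for i, s in enumerate(series_list):
        # Copy the observed values; positions past len(s) stay zero.
        X[i, : len(s)] = s
    return X
```

Note that zero-padded positions are indistinguishable from genuine zeros, so later distance computations treat them as real observations.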

`_subsequence_distance(shapelet, series)`

Minimum sliding-window Euclidean distance between shapelet and series.
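
A minimal NumPy sketch of this distance, for illustration only. It assumes raw (unnormalized) Euclidean distance and a shapelet no longer than the series; whether the library normalizes windows or distances is not stated here. `subsequence_distance` is a hypothetical stand-in for the private function.

```python
import numpy as np


def subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between `shapelet` and any window of `series`."""
    shapelet = np.asarray(shapelet, dtype=float)
    series = np.asarray(series, dtype=float)
    m = len(shapelet)
    if m > len(series):
        raise ValueError("shapelet must not be longer than the series")
    # All contiguous windows of length m, shape (len(series) - m + 1, m).
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    dists = np.sqrt(((windows - shapelet) ** 2).sum(axis=1))
    return float(dists.min())
```

A distance of zero means the shapelet occurs verbatim somewhere in the series.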

`_extract_candidates(X, shapelet_lengths, n_candidates, rng)`

Extract random shapelet candidates from the dataset.
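
One plausible way to draw random candidates is sketched below. The sampling scheme (uniform choice of length, series, and start offset) is an assumption of this example rather than a description of the library's behavior, and `sample_candidates` is a hypothetical name. Every length in `shapelet_lengths` is assumed to be at most the series length.

```python
import numpy as np


def sample_candidates(X, shapelet_lengths, n_candidates, rng):
    """Draw random contiguous subsequences from the rows of X.

    X is an (n_series, max_len) array; rng is a numpy.random.Generator.
    """
    n_series, max_len = X.shape
    candidates = []
    for _ in range(n_candidates):
        length = int(rng.choice(shapelet_lengths))
        row = int(rng.integers(n_series))
        # Last valid start keeps the window inside the row.
        start = int(rng.integers(max_len - length + 1))
        candidates.append(X[row, start : start + length].copy())
    return candidates
```

Passing a generator created with `np.random.default_rng(seed)` makes the draw reproducible.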

`_score_shapelet(shapelet, X)`

Score a shapelet candidate using the gap statistic.

Computes the distances from the shapelet to all series, then finds the split point that maximizes the gap between successive sorted distances. A larger gap means the shapelet better separates series into two groups.
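
The scoring rule described above can be sketched directly from a vector of shapelet-to-series distances. This is an illustrative reading of the description (largest difference between successive sorted distances), not the library's code; `gap_score` and its return shape are assumptions.

```python
import numpy as np


def gap_score(distances):
    """Return (largest gap, number of series on the near side of the split)."""
    d = np.sort(np.asarray(distances, dtype=float))
    if d.size < 2:
        return 0.0, 0
    gaps = np.diff(d)  # gaps[i] = d[i + 1] - d[i]
    split = int(np.argmax(gaps))
    # Series with sorted rank <= split are "near" the shapelet.
    return float(gaps[split]), split + 1
```

For distances `[1, 2, 10, 11]` the successive gaps are `[1, 8, 1]`, so the score is 8 and the split separates the two closest series from the two farthest.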

`_kmeans_1d(distances, k, rng, max_iter=100)`

Run k-means on a distance-feature matrix.
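
For orientation, a plain Lloyd's-algorithm k-means over a distance-feature matrix might look like the sketch below. It assumes one row per series and one column per selected shapelet, and initializes centers from randomly chosen rows; the library's initialization and stopping rule may differ. `kmeans_labels` is a hypothetical name.

```python
import numpy as np


def kmeans_labels(D, k, rng, max_iter=100):
    """Cluster the rows of D (n_series, n_shapelets) and return integer labels."""
    D = np.asarray(D, dtype=float)
    # Initialize centers from k distinct rows (fancy indexing returns a copy).
    centers = D[rng.choice(len(D), size=k, replace=False)]
    labels = np.full(len(D), -1)
    for _ in range(max_iter):
        # Distance from every row to every center, shape (n_series, k).
        dist = np.linalg.norm(D[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = D[labels == j]
            if len(members):  # keep the old center if a cluster empties
                centers[j] = members.mean(axis=0)
    return labels
```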

`shapelet_cluster(df, k=3, n_shapelets=10, shapelet_lengths=None, n_candidates=100, target_col='y', id_col='unique_id', time_col='ds', seed=42, max_iter=100)`

Discover U-Shapelets and cluster time series.

Convenience function wrapping `UShapeletClusterer`.

Parameters

- `df`: Input DataFrame with time series data.
- `k`: Number of clusters.
- `n_shapelets`: Number of shapelets to select.
- `shapelet_lengths`: Candidate shapelet lengths to consider.
- `n_candidates`: Number of random shapelet candidates to evaluate.
- `target_col`: Column with the values to cluster.
- `id_col`: Column identifying each time series.
- `time_col`: Column with timestamps for ordering.
- `seed`: Random seed for reproducibility.
- `max_iter`: Maximum k-means iterations.

Returns

`pl.DataFrame`: DataFrame with columns `[id_col, "cluster"]`.