Shapelets

polars_ts.clustering.shapelets

U-Shapelet (unsupervised shapelet) clustering for time series.

Discovers discriminative subsequences (shapelets) that separate groups of time series, then clusters in shapelet-distance space.

References

  • Zakaria, J. et al. (2012). Clustering Time Series Using Unsupervised-Shapelets. ICDM.

UShapeletClusterer

Unsupervised shapelet-based time series clustering.

Discovers discriminative subsequences (shapelets) and clusters series by their shapelet distances.

Parameters

  • n_clusters: Number of clusters.
  • n_shapelets: Number of shapelets to select.
  • shapelet_lengths: Candidate shapelet lengths to consider.
  • n_candidates: Number of random shapelet candidates to evaluate.
  • target_col: Column with the values to cluster.
  • id_col: Column identifying each time series.
  • time_col: Column with timestamps for ordering.
  • seed: Random seed for reproducibility.
  • max_iter: Maximum k-means iterations.

fit(df)

Discover shapelets and cluster time series.

Parameters

  • df: Input DataFrame with time series data.

Returns

Self

_extract_series(df, target_col, id_col, time_col)

Extract series as a zero-padded 2-D array (n_series, max_len).
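The padding step can be illustrated with a minimal sketch in plain Python. The helper name `pad_series` is hypothetical, and it assumes zeros are appended at the end of shorter series; the description above does not say which side is padded or how the library builds the array.

```python
def pad_series(series_list):
    """Right-pad every series with zeros up to the longest length.

    Assumption: padding is appended at the end of each series.
    """
    max_len = max((len(s) for s in series_list), default=0)
    return [list(s) + [0.0] * (max_len - len(s)) for s in series_list]


# Three series of different lengths become a 3 x 3 rectangular structure.
X = pad_series([[1.0, 2.0, 3.0], [4.0], [5.0, 6.0]])
for row in X:
    print(row)
```

Padded zeros are indistinguishable from real zero observations, so windows that overlap the padding may influence shapelet distances for short series.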

_subsequence_distance(shapelet, series)

Minimum sliding-window Euclidean distance between shapelet and series.
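As a rough illustration of the sliding-window distance described above, the sketch below compares the shapelet with every same-length window of the series and keeps the smallest Euclidean distance. The function name is hypothetical, and the sketch assumes raw (unnormalized) values; whether the library normalizes windows or scales by length is not stated here.

```python
import math


def subsequence_distance(shapelet, series):
    """Smallest Euclidean distance between `shapelet` and any window of `series`."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide a window of length m across the series, one step at a time.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((s - w) ** 2 for s, w in zip(shapelet, window)))
        best = min(best, dist)
    return best


series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
exact = subsequence_distance([1.0, 2.0, 1.0], series)  # shapelet occurs at offset 2
far = subsequence_distance([5.0, 5.0], series)         # no close match anywhere
print(exact, far)
```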

_extract_candidates(X, shapelet_lengths, n_candidates, rng)

Extract random shapelet candidates from the dataset.
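One plausible way to draw random candidates is sketched below: pick a length, pick a series long enough to hold it, then pick a start offset. The function name and the uniform sampling scheme are assumptions for illustration; the library's actual sampling strategy is not documented here.

```python
import random


def extract_candidates(X, shapelet_lengths, n_candidates, rng):
    """Draw `n_candidates` random contiguous subsequences from the rows of X."""
    # Keep only lengths that fit inside at least one series.
    usable = [m for m in shapelet_lengths if any(len(row) >= m for row in X)]
    candidates = []
    if not usable:
        return candidates
    for _ in range(n_candidates):
        m = rng.choice(usable)
        row = rng.choice([r for r in X if len(r) >= m])
        start = rng.randrange(len(row) - m + 1)
        candidates.append(row[start:start + m])
    return candidates


X = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
candidates = extract_candidates(X, [2, 3], 5, random.Random(42))
print(candidates)
```

Passing the random generator in explicitly, as the `rng` parameter in the signature above suggests, keeps the draw reproducible for a fixed seed.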

_score_shapelet(shapelet, X)

Score a shapelet candidate using the gap statistic.

Computes the distances from the shapelet to all series, then finds the split point that maximizes the gap between successive sorted distances. A larger gap means the shapelet better separates series into two groups.
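The scoring rule described above can be sketched as follows, taking the shapelet-to-series distances as input. The function name and the returned split index are illustrative assumptions; this follows the description on this page, not necessarily the exact gap formula of the original paper.

```python
def gap_score(distances):
    """Return (largest gap, split index) over the sorted distances.

    The split index is the number of series on the near side of the gap.
    """
    ordered = sorted(distances)
    best_gap, best_split = 0.0, 0
    for i in range(len(ordered) - 1):
        gap = ordered[i + 1] - ordered[i]
        if gap > best_gap:
            best_gap, best_split = gap, i + 1
    return best_gap, best_split


# Three series lie close to the shapelet and two lie far from it.
gap, split = gap_score([0.1, 2.9, 0.3, 3.2, 0.2])
print(gap, split)
```

Here the widest gap falls between the third and fourth sorted distances, so the shapelet splits the five series into groups of three and two.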

_kmeans_1d(distances, k, rng, max_iter=100)

Run k-means on a distance-feature matrix.
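For orientation, a minimal Lloyd-style k-means over a feature matrix (one row per series, one column per shapelet distance) might look like the sketch below. The function name, the random initialization from existing rows, and the stopping rule are assumptions, not the library's implementation.

```python
import random


def kmeans_rows(points, k, rng, max_iter=100):
    """Cluster the rows of `points` with Lloyd's algorithm; returns one label per row."""
    # Assumption: initial centers are k distinct rows chosen at random.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each row goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centers will not move
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


features = [[0.1, 0.2], [0.0, 0.3], [5.0, 5.1], [5.2, 4.9]]
labels = kmeans_rows(features, 2, random.Random(0))
print(labels)
```

Label values themselves are arbitrary; only the grouping of rows is meaningful.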

shapelet_cluster(df, k=3, n_shapelets=10, shapelet_lengths=None, n_candidates=100, target_col='y', id_col='unique_id', time_col='ds', seed=42, max_iter=100)

Discover U-Shapelets and cluster time series.

Convenience function wrapping UShapeletClusterer.

Parameters

  • df: Input DataFrame with time series data.
  • k: Number of clusters.
  • n_shapelets: Number of shapelets to select.
  • shapelet_lengths: Candidate shapelet lengths to consider.
  • n_candidates: Number of random shapelet candidates to evaluate.
  • target_col: Column with the values to cluster.
  • id_col: Column identifying each time series.
  • time_col: Column with timestamps for ordering.
  • seed: Random seed for reproducibility.
  • max_iter: Maximum k-means iterations.

Returns

pl.DataFrame DataFrame with columns [id_col, "cluster"].