
Clustering time series based on monotonic similarity

Data Science Asked by Delforge on February 4, 2021

Context

I am working on clustering 1500 time series of 500 observations each into a small number of clusters. All the time series record the same observed property at different spatial locations and respond to the same exogenous variables. However, the magnitude of the response differs greatly from one series to another. Given a reference time series $X$, I would like every series that resembles $X^a$, for any $a > 0$, to be grouped in the same cluster as $X$.

Tryouts

So far, my interpretation of the problem is that I want to cluster time series that share a strong monotonic relationship. My first attempts used hierarchical agglomerative clustering with a distance based on Kendall’s tau rank correlation coefficient, since it measures the strength of a monotonic relationship. Judging visually, the best results were obtained with Ward’s linkage method. However, this approach seems unorthodox, non-robust, or doubtful for several reasons.
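A minimal sketch of the kind of distance described above, assuming the dissimilarity is defined as $1 - \tau$ (the post does not say how the distance was derived from Kendall’s tau). The helper name `kendall_distance_matrix` and the synthetic data are hypothetical and only illustrate that $X$ and $X^a$ end up at distance zero when $X$ is positive.

```python
import numpy as np
from scipy.stats import kendalltau
from scipy.spatial.distance import squareform

def kendall_distance_matrix(series):
    """Pairwise dissimilarity 1 - tau between the rows of `series` (assumed definition)."""
    n = series.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(series[i], series[j])
            dist[i, j] = dist[j, i] = 1.0 - tau
    return dist

rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, size=200)   # positive reference series X (synthetic)
y = rng.uniform(0.1, 2.0, size=200)   # unrelated series (synthetic)
series = np.vstack([x, x ** 0.5, x ** 3, y])

dist = kendall_distance_matrix(series)
# Condensed form expected by scipy.cluster.hierarchy.linkage
condensed = squareform(dist, checks=False)

print(np.round(dist, 3))
```

The double loop is quadratic in the number of series: for 1500 series it makes about 1.1 million calls to `kendalltau`, so a faster pairwise routine may be preferable in practice.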

First, the SciPy documentation mentions that Ward’s method is only correct when the Euclidean distance is used. Secondly, I couldn’t find any detailed application of time series clustering based on either Spearman’s or Kendall’s tau coefficient. Furthermore, I was very surprised that I couldn’t find any paper or reference that aims at clustering based on a monotonic criterion.

I am willing to consider other approaches, though I cannot assess their benefits. One option is to rescale all time series so that they map to a standard Gaussian distribution (e.g. with a Box-Cox transform) and then use the Euclidean distance. Another is to turn the first difference of each time series into a boolean vector ($1$ if $\Delta X > 0$, $0$ otherwise) and then use the Euclidean distance or another distance metric.
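The two ideas above can be sketched as follows. Note the assumptions: a plain rank transform is swapped in here for the Box-Cox mapping (which is not shown), the helper names `rank_standardize` and `increase_indicator` are hypothetical, and the data are synthetic. The sketch only checks that both transforms give the same result for $X$ and $X^a$ when $X$ is positive.

```python
import numpy as np
from scipy.stats import rankdata

def rank_standardize(x):
    """Replace values by their ranks, then scale to zero mean and unit norm."""
    r = rankdata(x)
    r = r - r.mean()
    return r / np.linalg.norm(r)

def increase_indicator(x):
    """1 where the series increases between consecutive steps, 0 otherwise."""
    return (np.diff(x) > 0).astype(int)

rng = np.random.default_rng(1)
x = rng.uniform(0.1, 2.0, size=500)   # positive reference series X (synthetic)
x_pow = x ** 2.5                      # monotonic transform of X

# Both representations are unchanged by a strictly increasing transform.
same_ranks = np.allclose(rank_standardize(x), rank_standardize(x_pow))
same_signs = np.array_equal(increase_indicator(x), increase_indicator(x_pow))
print(same_ranks, same_signs)
```

Because both representations are ordinary vectors, the Euclidean distance between them is well defined, which is the setting in which Ward’s linkage is usually justified.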

Questions

Since I am new to time series clustering, I have trouble judging on my own which approaches would be best (or worst) for this specific purpose. Hence, I have two related questions:

  1. Specifically, is hierarchical clustering based on Kendall’s tau with Ward’s linkage method the wrong way to go, and if so, why?
  2. Generally, what is the best way to cluster time series based on monotonic association?

Some references on the topic are also welcome.

One Answer

The way Ward's linkage is computed really only makes sense with squared-Euclidean-type measures. Only then can the König-Huygens theorem be used.
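For reference, the identity alluded to here can be written out. The notation ($C$ for a cluster, $\mu_C$ for its centroid, $x_i$ for its members) is assumed for this sketch and does not come from the original post. The König-Huygens identity expresses the within-cluster sum of squares through pairwise squared Euclidean distances only:

$$\sum_{i \in C} \lVert x_i - \mu_C \rVert^2 = \frac{1}{2\,|C|} \sum_{i \in C} \sum_{j \in C} \lVert x_i - x_j \rVert^2$$

Ward's criterion merges the pair of clusters $A$, $B$ with the smallest increase in that sum of squares:

$$\Delta(A, B) = \frac{|A|\,|B|}{|A| + |B|} \, \lVert \mu_A - \mu_B \rVert^2$$

With a dissimilarity such as $1 - \tau$ there is no guarantee that points exist whose Euclidean distances reproduce it, so the quantity being minimized may lose its variance interpretation.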

Why don't you consider average linkage? Why Ward?
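A minimal sketch of the average-linkage alternative, assuming a precomputed dissimilarity of $1 - \tau$. The synthetic data (two independent positive base series, each raised to a few powers) and the choice of two clusters are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(2)
base_a = rng.uniform(0.1, 2.0, size=300)   # synthetic positive series
base_b = rng.uniform(0.1, 2.0, size=300)   # independent synthetic series
exponents = [0.5, 1.0, 2.0]
# Rows 0-2 are powers of base_a, rows 3-5 are powers of base_b.
series = np.array([b ** a for b in (base_a, base_b) for a in exponents])

# DataFrame.corr works column-wise, so transpose to get one column per series.
tau = pd.DataFrame(series.T).corr(method="kendall").to_numpy()
dist = np.clip(1.0 - tau, 0.0, None)       # guard against tiny negative values
np.fill_diagonal(dist, 0.0)

# Average linkage accepts any precomputed dissimilarity in condensed form.
Z = linkage(squareform(dist, checks=False), method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Average linkage only averages the supplied pairwise dissimilarities, so it does not rely on the Euclidean assumption behind Ward's method.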

Answered by Has QUIT--Anony-Mousse on February 4, 2021
