Data Science Asked by Delforge on February 4, 2021
Context
I am working on clustering 1500 time series of 500 observations each into a small number of clusters. All the time series record the same observed property at different spatial locations and respond to the same exogenous variables. However, the magnitude of the response differs greatly from one series to another. Given a reference time series $X$, I would like series that resemble $X^a$, for any $a > 0$, to be grouped in the same cluster as $X$.
Tryouts
So far, my interpretation of the problem is that I want to cluster time series that share a strong monotonic relationship. My first attempts used hierarchical agglomerative clustering with a distance based on Kendall’s tau rank coefficient, since it measures the strength of a monotonic relationship. Judging visually, the best results were obtained with Ward’s linkage method. However, this approach seems unorthodox, non-robust, or doubtful for several reasons.
First, the SciPy documentation mentions that Ward’s method is only correct when the Euclidean distance is used. Second, I couldn’t find any detailed application of time series clustering based on either Spearman’s or Kendall’s tau coefficient. Furthermore, I was very surprised that I couldn’t find any paper or reference that aims at clustering based on a monotonic criterion.
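To make the Kendall-based distance described above concrete, here is a minimal sketch. The question does not say how tau was turned into a distance, so the conversion `1 - tau`, the helper name `kendall_distance_matrix`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_distance_matrix(series):
    """Pairwise distance d = 1 - tau between the rows of `series` (assumed conversion)."""
    n = series.shape[0]
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            tau, _ = kendalltau(series[i], series[j])
            dist[i, j] = dist[j, i] = 1.0 - tau
    return dist

# Synthetic example: a positive reference series, two power transforms of it,
# and one unrelated series.
rng = np.random.default_rng(0)
x = rng.uniform(0.1, 2.0, size=200)
y = rng.uniform(0.1, 2.0, size=200)
series = np.vstack([x, x ** 0.5, x ** 3, y])

dist = kendall_distance_matrix(series)
print(np.round(dist, 3))
```

Because $X^a$ with $a > 0$ preserves the ordering of a positive series, its distance to $X$ is zero under this definition, while the unrelated series ends up close to 1. The double loop is quadratic in the number of series, so for 1500 series it may be slow.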
I am willing to consider other approaches, though I cannot assess their benefits. One option is to transform all time series so that they follow a standardized Gaussian distribution (e.g. with Box-Cox) and then use the Euclidean distance. Another possibility is to turn the first difference of each time series into a boolean vector ($1$ if $\Delta X > 0$, $0$ otherwise) and then use the Euclidean distance or another distance metric.
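The first-difference idea can be sketched as follows. This is only an illustration under assumed names (`increase_indicator`) and synthetic data. It uses the Hamming distance because, on 0/1 vectors, the squared Euclidean distance equals the number of mismatched positions, which is the Hamming distance multiplied by the vector length.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def increase_indicator(series):
    """Return 1 where a series increases between consecutive steps, 0 otherwise."""
    return (np.diff(series, axis=1) > 0).astype(int)

# Synthetic example: a positive series, a power transform of it, an unrelated series.
rng = np.random.default_rng(1)
x = rng.uniform(0.1, 2.0, size=100)
y = rng.uniform(0.1, 2.0, size=100)
series = np.vstack([x, x ** 2, y])

flags = increase_indicator(series)

# Fraction of time steps where two series move in different directions.
hamming = squareform(pdist(flags, metric="hamming"))
print(np.round(hamming, 3))
```

A power transform of a positive series rises and falls at the same time steps as the original, so the two rows have distance zero. The encoding discards everything except the direction of change, so series that differ only in the ordering of non-adjacent values are not distinguished.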
Questions
Since I am new to time series clustering, I have trouble picturing what the best (or the worst) approaches would be for this specific purpose. Hence, I have two related questions:
Some references on the topic are also welcome.
The way Ward's linkage is computed really only makes sense with squared-Euclidean-type measures. Only then can the König-Huygens theorem be used.
Why don't you consider average linkage? Why Ward?
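As a rough sketch of what average linkage on a rank-based dissimilarity could look like with SciPy: the distance `1 - rho` (Spearman, computed as the Pearson correlation of ranks), the synthetic data, and the choice of two clusters are assumptions for illustration, not something taken from the question.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from scipy.stats import rankdata

# Synthetic example: two unrelated positive base series, each with two power transforms.
rng = np.random.default_rng(2)
x = rng.uniform(0.1, 2.0, size=200)
y = rng.uniform(0.1, 2.0, size=200)
series = np.vstack([x, x ** 0.5, x ** 3, y, y ** 0.5, y ** 3])

# Spearman correlation between rows = Pearson correlation of the row-wise ranks.
ranks = rankdata(series, axis=1)
rho = np.corrcoef(ranks)

# Assumed dissimilarity: 1 - rho, clipped and cleaned so it is a valid distance matrix.
dist = np.clip(1.0 - rho, 0.0, None)
np.fill_diagonal(dist, 0.0)

# Average linkage accepts any precomputed dissimilarity in condensed form.
condensed = squareform(dist, checks=False)
Z = linkage(condensed, method="average")
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Unlike Ward's method, average linkage only needs pairwise dissimilarities, so it does not rely on the data living in a Euclidean space.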
Answered by Has QUIT--Anony-Mousse on February 4, 2021