Data Science Asked by Aman Savaria on May 14, 2021
I am trying to implement the research paper "Combining Boosted Trees with Metafeature Engineering for Predictive Maintenance". The paper has a section on meta-feature engineering in which the authors use hierarchical clustering to create features. The paper says:
The third method we used to analyze the outliers in the dataset is based on an
hierarchical Agglomerative Clustering algorithm [5].
Hierarchical Agglomerative Clustering starts with Z groups (Z being the
number of observations), each initially containing one object, and then at each
step it merges the two most similar groups until there is only one single group,
containing all data.
The rationale for this method is that the last observation that are merged
might still be significantly different from the group they are merged into. By
definition outliers are different cases and will typically not fit well into a cluster,
unless that cluster is comprised by other outliers itself. Yet again, since these
are not ordinary data points, we do not expect them to form large groups.
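To make the quoted rationale concrete, here is a minimal sketch of one way to read it. This is not the paper's actual procedure: the function name `late_merge_outliers`, the thresholds `n_last` and `max_size`, the choice of average linkage, and the synthetic data are all assumptions made for illustration. It builds the merge tree with SciPy and flags observations that belong to small groups joined only in the final merges.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def late_merge_outliers(X, n_last=2, max_size=3, method="average"):
    """Flag observations sitting in small groups that are merged in the last steps.

    n_last and max_size are illustrative thresholds, not values from the paper.
    """
    n = X.shape[0]
    # Each row of Z is one merge: (child_a, child_b, distance, new_cluster_size).
    Z = linkage(X, method=method)

    # Track which original observations each cluster contains.
    # Indices below n are single observations; merge number s creates cluster n + s.
    members = {i: [i] for i in range(n)}
    for step, (a, b, _dist, _size) in enumerate(Z):
        members[n + step] = members[int(a)] + members[int(b)]

    flagged = set()
    for a, b, _dist, _size in Z[-n_last:]:
        for child in (int(a), int(b)):
            # A small group that only joins the rest at the very end is suspicious.
            if len(members[child]) <= max_size:
                flagged.update(members[child])
    return sorted(flagged)


# Synthetic data (assumption): 50 ordinary points plus two far-away points.
rng = np.random.default_rng(0)
inliers = rng.normal(0.0, 1.0, size=(50, 2))
outliers = np.array([[20.0, 20.0], [-25.0, 18.0]])
X = np.vstack([inliers, outliers])

flagged = late_merge_outliers(X, n_last=2, max_size=3)
print(flagged)
```

In this toy setup the two distant points stay as singletons until the end of the merge sequence, which is the behaviour the quoted passage relies on. How many final merges to inspect and what counts as a "small" group are tuning choices the quoted text does not specify.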
I am unable to understand the authors’ intuition behind doing this.
Both the paper and the problem I am trying to solve concern the IDA-2016 competition dataset. You can find more about the competition here.
Overall, the paper is not very clear, so there are a few uncertainties, but the general approach is this:
So the clustering and the feature creation are only indirectly related:
Correct answer by Erwan on May 14, 2021