TransWikia.com

Clustering of multi-label data

Data Science Asked by Yogesch on July 28, 2021

The dataset consists of

1) a set of objects and

2) a set of labels, which are used to describe the objects.

For the moment, for simplicity's sake, each label is marked as either true or false (in a more complex setup, each label would take a value from 1 to 10).

However, not every label is actually applied to every object (in principle, all the labels can and should be applied across all the objects, but in practice they are not). Also, when a label isn't applied to an object, one cannot simply assume that the label's value for that particular object is false. Therefore, the missing labels will be ignored in the model.

I need to cluster the objects based on their labels.

Any tips on how and what algorithms to use will be appreciated.

One Answer

It is possible to cluster the objects based on their labels by treating the labels as features. Typically, labels are treated as targets, which would frame the problem as a supervised machine learning problem; here they serve instead as the inputs to an unsupervised one.

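Since the labels are nominal-valued, you will need to use an appropriate distance metric. The Jaccard index is one option. It measures similarity (the number of labels true for both objects divided by the number true for at least one), so the corresponding distance is one minus the index.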
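As a rough illustration of the idea, here is a minimal sketch in plain Python. It is not a definitive implementation. It assumes that missing labels are encoded as `None`, that the Jaccard distance between two objects is computed only over the labels observed on both, and that a pair with no jointly observed true label is treated as maximally distant. The function names, the choice of average-linkage agglomerative clustering, and the toy data are all made up for the example.

```python
from itertools import combinations


def masked_jaccard_distance(a, b):
    """Jaccard distance using only labels observed (not None) on both objects."""
    both = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    union = sum(1 for x, y in both if x or y)
    if union == 0:
        # Assumption: with no jointly observed true label there is no
        # evidence of similarity, so treat the pair as maximally distant.
        return 1.0
    intersection = sum(1 for x, y in both if x and y)
    return 1.0 - intersection / union


def average_linkage_clusters(rows, n_clusters):
    """Naive agglomerative clustering (average linkage) on masked Jaccard distances."""
    # Precompute the pairwise distances between all objects.
    dist = {
        (i, j): masked_jaccard_distance(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }

    def linkage(c1, c2):
        # Mean distance over all cross-cluster pairs of objects.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Start with one cluster per object and merge the closest pair repeatedly.
    clusters = [[i] for i in range(len(rows))]
    while len(clusters) > n_clusters:
        p, q = min(
            combinations(range(len(clusters)), 2),
            key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]),
        )
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]  # p < q, so index p is unaffected by the deletion
    return [sorted(c) for c in clusters]


# Toy data: 4 objects, 4 labels; None marks a label that was never applied.
rows = [
    [True, True, None, False],
    [True, None, False, False],
    [None, False, True, True],
    [False, False, True, None],
]
clusters = average_linkage_clusters(rows, n_clusters=2)
print(clusters)
```

The naive merge loop is only practical for small datasets. For larger ones, the same pairwise distances could be passed as a precomputed matrix to a library implementation of hierarchical clustering.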

Answered by Brian Spiering on July 28, 2021
