
Clustering vs. Classification

Data Science | Asked on March 20, 2021

I am a bit new to this and have a quick question about clustering vs. classification. I have a bunch of texts that I want to classify. There are currently 4 classes, but a text can belong to more than one class. What I have seen so far is to train 4 separate binary classifiers, but I was wondering whether there is a classification algorithm that lets a text belong to more than one class. Or could I do this with clustering, using overlapping clusters?

I am trying to do this in Python.

2 Answers

Clustering is unsupervised, which means you use it when you do not know the classes and/or have no examples of correctly labeled texts.

Assuming you have some labeled texts, this is a classification problem.

Your current approach of training a separate binary model for each label is basic but sound. What validation metric do you use, and what score does each model get? If performance is already good, you might not need anything else.
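A minimal sketch of the per-label approach, assuming scikit-learn is installed; the texts and the four label names are made up for illustration. `OneVsRestClassifier` fits one independent logistic-regression model per label column, and each label gets its own F1 score from cross-validated predictions, which is the kind of per-model validation score worth checking first. The choice of `LogisticRegression(class_weight="balanced")` and 3 folds is an assumption, not a recommendation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each text carries a *set* of labels.
DATA = [
    ("stock market falls as investors sell shares", {"finance"}),
    ("bank raises interest rates again", {"finance"}),
    ("new vaccine reduces hospital admissions", {"health"}),
    ("doctors warn about rising flu cases", {"health"}),
    ("striker scores twice in cup final", {"sports"}),
    ("team wins league title after close match", {"sports"}),
    ("startup releases new smartphone app", {"tech"}),
    ("chip maker unveils faster processor", {"tech"}),
    ("tech shares rally on strong earnings", {"finance", "tech"}),
    ("fitness tracker app monitors heart rate", {"health", "tech"}),
    ("football club profits soar after sponsorship deal", {"sports", "finance"}),
    ("injured player faces long recovery after surgery", {"sports", "health"}),
]
texts = [text for text, _ in DATA]

# Turn label sets into a 0/1 indicator matrix with one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([labels for _, labels in DATA])

# One independent binary classifier per label column ("binary relevance").
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(class_weight="balanced")),
)

# Every text is predicted by a model that did not see it during fitting.
Y_pred = cross_val_predict(model, texts, Y, cv=KFold(n_splits=3))

# One validation score per binary model.
per_label_f1 = f1_score(Y, Y_pred, average=None, zero_division=0)
for name, score in zip(mlb.classes_, per_label_f1):
    print(f"{name:>8}: F1 = {score:.2f}")
```

Putting the vectorizer inside the pipeline means it is refitted on each training fold, so no vocabulary from the held-out texts leaks into validation.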

Another approach is to train one model that predicts all classes at once and outputs a probability for each class. This tells you which classes fit a text best, but you would have to choose your loss and evaluation metric accordingly; plain accuracy, for example, wouldn't make sense here.
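To make the probability route concrete, here is a small sketch that starts from a matrix of per-class probabilities (hard-coded as a hypothetical stand-in for a fitted model's `predict_proba` output), turns it into label sets with a 0.5 threshold (an assumption worth tuning), ranks the best-fitting classes, and scores the result. In scikit-learn, `accuracy_score` on multi-label input computes subset accuracy, so a text only counts as correct if its whole label set matches; Hamming loss and micro-averaged F1 give partial credit per label.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, hamming_loss

classes = np.array(["finance", "health", "sports", "tech"])

# Hypothetical true label sets as a 0/1 indicator matrix (rows = texts).
Y_true = np.array([
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 1, 0, 0],
])

# Hypothetical per-class probabilities from one multi-label model.
proba = np.array([
    [0.9, 0.2, 0.1, 0.6],
    [0.3, 0.7, 0.2, 0.1],
    [0.8, 0.4, 0.1, 0.2],
])

# Threshold each class independently, so a text can receive several labels.
Y_pred = (proba >= 0.5).astype(int)

# Best-fitting classes per text, most probable first.
ranked = [list(classes[np.argsort(-row)]) for row in proba]

subset_acc = accuracy_score(Y_true, Y_pred)  # whole label set must match
h_loss = hamming_loss(Y_true, Y_pred)        # fraction of wrong label slots
micro_f1 = f1_score(Y_true, Y_pred, average="micro")

print("predicted:", [list(classes[row == 1]) for row in Y_pred])
print("ranked:", ranked)
print(f"subset accuracy={subset_acc:.3f} hamming loss={h_loss:.3f} micro F1={micro_f1:.3f}")
```

With these numbers the third text misses `health`, so subset accuracy drops to 2/3 while Hamming loss is only 1/12 and micro F1 is 8/9.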

A third approach is to create a new categorical (factor) variable that encodes every possible combination of classes and train one multiclass model on it. However, I suspect such a model would perform badly, because many combinations would have few examples (class imbalance) and the number of combinations makes the problem complex.
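The combination-encoding idea (often called the label powerset approach) can be sketched as follows, assuming scikit-learn and hypothetical toy data; `encode` and `decode` are illustrative helpers, not library functions. Each label set is joined into a single class name such as `finance+tech`, an ordinary multiclass classifier is trained on those names, and predictions are split back into sets. With 4 labels there are up to 2^4 - 1 = 15 non-empty combinations, and the counts printed below show how thinly the mixed combinations are populated.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: each text carries a set of labels.
DATA = [
    ("stock market falls as investors sell shares", {"finance"}),
    ("bank raises interest rates again", {"finance"}),
    ("new vaccine reduces hospital admissions", {"health"}),
    ("doctors warn about rising flu cases", {"health"}),
    ("striker scores twice in cup final", {"sports"}),
    ("team wins league title after close match", {"sports"}),
    ("startup releases new smartphone app", {"tech"}),
    ("chip maker unveils faster processor", {"tech"}),
    ("tech shares rally on strong earnings", {"finance", "tech"}),
    ("fitness tracker app monitors heart rate", {"health", "tech"}),
    ("football club profits soar after sponsorship deal", {"sports", "finance"}),
    ("injured player faces long recovery after surgery", {"sports", "health"}),
]
texts = [text for text, _ in DATA]


def encode(labels):
    """Hypothetical helper: {'tech', 'finance'} -> 'finance+tech'."""
    return "+".join(sorted(labels))


def decode(combo):
    """Hypothetical helper: 'finance+tech' -> {'finance', 'tech'}."""
    return set(combo.split("+"))


combos = [encode(labels) for _, labels in DATA]

# Examples per combined class: the rare combinations are the weak spot.
combo_counts = Counter(combos)
for combo, count in combo_counts.most_common():
    print(f"{combo:>16}: {count}")

# A single multiclass model over the combined classes.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(texts, combos)

# Hypothetical new text; the predicted class is decoded into a label set.
predicted = decode(model.predict(["bank shares rally after new app launch"])[0])
print(predicted)
```

One structural limitation follows directly from the encoding: the model can only ever predict combinations that appeared in the training data.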

Answered by Fnguyen on March 20, 2021

The problem you are working on is supervised learning, since you already know the labels for each sample. If you cluster the dataset instead (unsupervised learning), you should not expect the texts to be grouped the way you want, because clustering groups samples by patterns (similarity) in the data rather than by the output labels.
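A quick way to see this, assuming scikit-learn and a handful of hypothetical texts: `KMeans.fit_predict` receives only the feature matrix, never the labels, so the cluster numbers it returns are arbitrary ids based on textual similarity and need not line up with the classes you care about. Plain k-means also puts each text in exactly one cluster.

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical texts, with the labels we would *like* the clusters to match.
DATA = [
    ("stock market falls as investors sell shares", {"finance"}),
    ("bank raises interest rates again", {"finance"}),
    ("new vaccine reduces hospital admissions", {"health"}),
    ("doctors warn about rising flu cases", {"health"}),
    ("striker scores twice in cup final", {"sports"}),
    ("team wins league title after close match", {"sports"}),
    ("tech shares rally on strong earnings", {"finance", "tech"}),
    ("injured player faces long recovery after surgery", {"sports", "health"}),
]
texts = [text for text, _ in DATA]

X = TfidfVectorizer().fit_transform(texts)

# Unsupervised: only X is passed in; the labels play no part in the fit.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Compare after the fact: cluster ids are arbitrary integers, not class names.
for cid, (text, labels) in zip(cluster_ids, DATA):
    print(cid, sorted(labels), text)
```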

To solve this problem, you can use multi-label classification, where each sample can belong to more than one class.
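A minimal multi-label sketch, assuming scikit-learn and hypothetical toy data: `MultiLabelBinarizer` converts label sets into an indicator matrix, and an estimator that accepts a 2-D target (here `KNeighborsClassifier`, one of several scikit-learn classifiers with native multi-label support) predicts all labels at once. `inverse_transform` turns each prediction back into label names, so a text can come out with zero, one, or several labels. k-NN with cosine distance is chosen only because it is short to show, not because it is the best model for text.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical toy data: each text carries a set of labels.
DATA = [
    ("stock market falls as investors sell shares", {"finance"}),
    ("bank raises interest rates again", {"finance"}),
    ("new vaccine reduces hospital admissions", {"health"}),
    ("doctors warn about rising flu cases", {"health"}),
    ("striker scores twice in cup final", {"sports"}),
    ("team wins league title after close match", {"sports"}),
    ("startup releases new smartphone app", {"tech"}),
    ("chip maker unveils faster processor", {"tech"}),
    ("tech shares rally on strong earnings", {"finance", "tech"}),
    ("fitness tracker app monitors heart rate", {"health", "tech"}),
    ("football club profits soar after sponsorship deal", {"sports", "finance"}),
    ("injured player faces long recovery after surgery", {"sports", "health"}),
]
texts = [text for text, _ in DATA]

# Label sets -> 0/1 indicator matrix, one column per label.
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform([labels for _, labels in DATA])

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# For each label column, the 3 most similar training texts vote on 0 or 1.
knn = KNeighborsClassifier(n_neighbors=3, metric="cosine")
knn.fit(X, Y)

# Hypothetical new texts.
new_texts = [
    "tech shares rally as app maker posts strong earnings",
    "hospital treats player injured in cup final",
]
Y_new = knn.predict(vectorizer.transform(new_texts))
predicted_sets = mlb.inverse_transform(Y_new)
for text, labels in zip(new_texts, predicted_sets):
    print(text, "->", labels)
```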

Answered by deepguy on March 20, 2021
