When is unsupervised learning more beneficial than supervised learning, even when labels exist?

Question

When is unsupervised learning more beneficial than supervised learning, even when labels exist?
If there are no labels, unsupervised learning is better than supervised learning, but in some cases, even when the label targets are available, the supervised learning approach works better? What are the conditions for these cases? Can we say that if there is no clear dependency between the variables, unsupervised learning works better?

Erwan · Answer

When is unsupervised learning more beneficial than supervised learning, even when labels exist?

I would say that there are two main cases:

The task is semantically more meaningful as an unsupervised task. For instance, consider a collection of books that have been annotated with topics: if the goal is to classify new books into the same pre-existing categories, then a supervised setting makes sense. However, if the goal is to discover new patterns of similarity that a human annotator might not easily notice, then unsupervised topic modeling makes more sense.
The annotations are available for some subset of the data (e.g. for evaluation purposes) but will not be available later in production. To some extent this is also a question of what the goal is, but here the constraint is technical.
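
To make the books example a bit more concrete, here is a minimal sketch in plain Python. Everything in it is invented for illustration: the four toy books, their word sets, the `classify` and `group_by_similarity` helpers and the 0.3 similarity threshold. A real system would use a proper classifier and a real topic model; this only shows how the two goals differ.

```python
# Hypothetical toy data: (annotated topic, words describing the book).
books = [
    ("history", {"war", "empire", "king", "navy"}),
    ("history", {"empire", "trade", "ships", "navy"}),
    ("science", {"ocean", "ships", "currents", "navy"}),
    ("science", {"cells", "genes", "protein"}),
]

def jaccard(a, b):
    """Word overlap between two books, between 0 and 1."""
    return len(a & b) / len(a | b)

def classify(words):
    """Supervised goal: return the annotated topic of the most similar book."""
    return max(books, key=lambda book: jaccard(words, book[1]))[0]

def group_by_similarity(threshold=0.3):
    """Unsupervised goal: ignore the topics and link books that share words."""
    groups = []
    for i, (_, words) in enumerate(books):
        # Existing groups that contain at least one sufficiently similar book.
        linked = [g for g in groups
                  if any(jaccard(words, books[j][1]) >= threshold for j in g)]
        merged = {i}.union(*linked)
        groups = [g for g in groups if g not in linked] + [merged]
    return groups

# A new book is placed into one of the pre-existing categories.
print(classify({"genes", "protein", "enzymes"}))

# The grouping puts books 0, 1 and 2 together (a "maritime" theme that cuts
# across the history/science annotation) and leaves book 3 on its own.
print(group_by_similarity())
```

The supervised function can only ever answer with one of the annotated topics, whereas the grouping is free to surface a similarity that the annotation scheme does not encode.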

If there are no labels, unsupervised learning is better than supervised learning, but in some cases, even when the label targets are available, the supervised learning approach works better? What are the conditions for these cases?

The problem here is what "better" means, i.e. how the task is evaluated. If the task is evaluated against the pre-existing labels, in theory the unsupervised version cannot work better than the supervised one, since the supervised one has access to more information. It's possible that a particularly unsuitable supervised method would perform worse than a well-chosen unsupervised one, but that's not a fair comparison, and it's very unlikely in practice.

In general the two are not comparable because the tasks are fundamentally different: in a supervised setting one wants to find patterns related to some information which is known beforehand (the labels), whereas in an unsupervised setting one wants to discover unknown patterns.
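
As a rough illustration of the point about evaluating against pre-existing labels, here is a small sketch on synthetic data. The data, the one-feature threshold rule (`fit_stump`) and the two-cluster k-means (`kmeans2`) are my own assumptions, chosen so that the most visible structure (two blobs along x) is unrelated to the label (the sign of y). It illustrates the argument; it does not prove it.

```python
import random

random.seed(0)

# Hypothetical synthetic data: x separates the points into two obvious blobs,
# but the label depends only on the sign of y.
points, labels = [], []
for _ in range(600):
    x = random.choice([-10.0, 10.0]) + random.gauss(0, 1)
    y = random.gauss(0, 1)
    points.append((x, y))
    labels.append(int(y > 0))

train_X, test_X = points[:300], points[300:]
train_y, test_y = labels[:300], labels[300:]

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def fit_stump(X, y):
    """Supervised: one-feature threshold rule with the best training accuracy."""
    best = (-1.0, 0, 0.0, False)
    for f in range(len(X[0])):
        for t in [p[f] for p in X]:
            acc = accuracy([int(p[f] > t) for p in X], y)
            for flip in (False, True):
                score = 1 - acc if flip else acc
                if score > best[0]:
                    best = (score, f, t, flip)
    return best[1:]

def dist2(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans2(X, iters=20):
    """Unsupervised: Lloyd's algorithm with k=2, farthest-point initialisation."""
    centers = [X[0], max(X, key=lambda p: dist2(p, X[0]))]
    for _ in range(iters):
        clusters = [[], []]
        for p in X:
            clusters[int(dist2(p, centers[1]) < dist2(p, centers[0]))].append(p)
        centers = [tuple(sum(c) / len(pts) for c in zip(*pts)) if pts else centers[k]
                   for k, pts in enumerate(clusters)]
    return centers

f, t, flip = fit_stump(train_X, train_y)
supervised_acc = accuracy([int((p[f] > t) != flip) for p in test_X], test_y)

centers = kmeans2(train_X)
cluster_ids = [int(dist2(p, centers[1]) < dist2(p, centers[0])) for p in test_X]
raw = accuracy(cluster_ids, test_y)
unsupervised_acc = max(raw, 1 - raw)  # cluster ids are arbitrary, try both mappings

print(f"supervised stump: {supervised_acc:.2f}, k-means vs labels: {unsupervised_acc:.2f}")
```

The clustering is not "wrong" here: it finds the two blobs, which is a perfectly good pattern. It just is not the pattern the labels encode, so scoring it against the labels mostly measures the mismatch between the two tasks.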

Can we say that if there is no clear dependency between the variables, unsupervised learning works better?

I don't think so because:

Some supervised algorithms are very good at exploiting whatever little information is available in the data. Even with "no clear dependency" visible according to standard measures, some algorithms can combine the features optimally to minimize errors.
In the case of strictly no dependency at all between the features and the label, the unsupervised method might perform better in the sense that it might find meaningful patterns, but these won't be related to the labels at all (since there's no dependency). So we come back to the question of what "better" means with respect to the task: if the task is to discover unknown patterns, then sure, unsupervised is "better", but this has nothing to do with the labels, and a supervised method would make no sense here. If the task is to predict the labels, the unsupervised approach is just as bad as the supervised one (and there's a serious flaw in the design of the task!).
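
The "strictly no dependency" case can be sketched the same way. In the hypothetical data below, one feature forms two clear blobs and the labels are coin flips, so the labels are independent of the feature by construction. The threshold rule (`fit_threshold`) and the 1-D two-cluster k-means (`two_means`) are simple stand-ins I picked for illustration, not specific recommendations.

```python
import random

random.seed(1)

# Hypothetical data: one feature with two clear blobs. The labels are coin
# flips, so by construction they do not depend on the feature at all.
n = 600
blob = [random.randint(0, 1) for _ in range(n)]
x = [(10.0 if b else -10.0) + random.gauss(0, 1) for b in blob]
label = [random.randint(0, 1) for _ in range(n)]
half = n // 2  # first half for fitting, second half for evaluation

def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def fit_threshold(xs, ys):
    """Supervised: threshold and polarity with the best training accuracy."""
    best = (-1.0, 0.0, False)
    for t in xs:
        acc = accuracy([int(v > t) for v in xs], ys)
        for flip in (False, True):
            score = 1 - acc if flip else acc
            if score > best[0]:
                best = (score, t, flip)
    return best[1], best[2]

def two_means(xs, iters=20):
    """Unsupervised: 1-D k-means with k=2, started from the extreme values."""
    lo, hi = min(xs), max(xs)
    for _ in range(iters):
        left = [v for v in xs if abs(v - lo) <= abs(v - hi)]
        right = [v for v in xs if abs(v - lo) > abs(v - hi)]
        lo, hi = sum(left) / len(left), sum(right) / len(right)
    return lo, hi

def agreement(a, b):
    """Match rate between two binary labelings, ignoring which id is which."""
    acc = accuracy(a, b)
    return max(acc, 1 - acc)

t, flip = fit_threshold(x[:half], label[:half])
supervised_acc = accuracy([int((v > t) != flip) for v in x[half:]], label[half:])

lo, hi = two_means(x[:half])
cluster = [int(abs(v - hi) < abs(v - lo)) for v in x[half:]]
clusters_vs_labels = agreement(cluster, label[half:])
clusters_vs_blobs = agreement(cluster, blob[half:])

print(f"supervised vs labels: {supervised_acc:.2f}")
print(f"clusters vs labels:   {clusters_vs_labels:.2f}")
print(f"clusters vs blobs:    {clusters_vs_blobs:.2f}")
```

Against the labels, both approaches should hover around chance level, while the clusters line up with the hidden blob structure. So the unsupervised method is only "better" if the blobs were what you wanted to find in the first place.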
