TransWikia.com

How to predict unknown unknowns in machine learning

Data Science Asked on May 5, 2021

I am working on classifying bird species by analysing MFCCs. I have built a dataset of 13 MFCCs for two kinds of birds and trained Naive Bayes and KNN models on it. However, when I tested the models on a third bird species, it was classified as one of the two known species. How can I get the model to predict unknown species as unknown? I know my existing classification models may not work for this, so what kind of model might help? Is SSL useful in my case? Or should I treat these unknowns as outliers, and if so, how can that be applied to MFCCs?

2 Answers

If you want to predict whether a bird belongs to one of your two classes or is unknown, you need three classes: $[bird A, bird B, unknown]$. For the unknown class you need data from birds that are neither $bird A$ nor $bird B$. Make sure the number of rows for each of the three classes is roughly the same.

If you don't have data from birds that are neither $bird A$ nor $bird B$, you can use anomaly detection to detect whether a bird is $unknown$ before predicting whether it is $bird A$ or $bird B$.
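A minimal sketch of this two-stage idea, assuming scikit-learn is available. The choice of `LocalOutlierFactor` as the anomaly detector, the synthetic 13-dimensional data standing in for MFCC vectors, and the helper name `predict_with_unknown` are all illustrative assumptions, not part of the original answer. Any novelty detector fitted on the known birds could fill the same role.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, LocalOutlierFactor

rng = np.random.default_rng(0)

# Synthetic stand-ins for 13-dimensional MFCC vectors (NOT real bird data).
X_a = rng.normal(loc=0.0, scale=1.0, size=(200, 13))  # hypothetical "bird A"
X_b = rng.normal(loc=5.0, scale=1.0, size=(200, 13))  # hypothetical "bird B"
X_train = np.vstack([X_a, X_b])
y_train = np.array(["bird A"] * 200 + ["bird B"] * 200)

# Stage 1: novelty detector fitted only on the known species.
novelty = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)

# Stage 2: ordinary two-class classifier.
clf = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)


def predict_with_unknown(X):
    """Label each row as 'bird A', 'bird B', or 'unknown' (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    # Cast to object so the longer string 'unknown' is not truncated.
    labels = clf.predict(X).astype(object)
    # LocalOutlierFactor.predict returns -1 for samples it considers novel.
    labels[novelty.predict(X) == -1] = "unknown"
    return labels.tolist()


# One sample near each known cluster and one far away from both.
samples = np.vstack([np.zeros(13), np.full(13, 5.0), np.full(13, 20.0)])
predictions = predict_with_unknown(samples)
print(predictions)
```

On real recordings the clusters will overlap far more than in this toy data, so the detector's settings (for example `n_neighbors`) would need tuning on held-out examples.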

Correct answer by Tim von Känel on May 5, 2021

I don't have much experience with MFCCs, but you could take inspiration from image processing and build a Siamese-network-like model that gives you a distance metric between your sample and that of your distinct classes.

You could then pick the class with the highest similarity and set a lower threshold, classifying the sample as unknown if none of the existing classes gives a high enough similarity.

After that, all you have to do is analyze the unknown sample and create a new class.
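The thresholding step can be sketched independently of how the embeddings are produced. The example below assumes that some trained model (for instance a Siamese-style network) already maps each recording to a vector. The three-dimensional prototype vectors, the cosine similarity measure, the `min_similarity` value of 0.8, and the function names are all hypothetical choices for illustration.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify_with_threshold(embedding, prototypes, min_similarity=0.8):
    """Return the most similar class, or 'unknown' if no class is similar enough.

    `prototypes` maps a class name to a reference embedding, e.g. the mean
    embedding of that class's training samples (an assumption of this sketch).
    """
    scores = {name: cosine_similarity(embedding, proto)
              for name, proto in prototypes.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_similarity else "unknown"


# Toy reference embeddings for the two known species (made-up values).
prototypes = {
    "bird A": np.array([1.0, 0.0, 0.0]),
    "bird B": np.array([0.0, 1.0, 0.0]),
}

result_a = classify_with_threshold(np.array([0.9, 0.1, 0.0]), prototypes)
result_unknown = classify_with_threshold(np.array([0.0, 0.1, 1.0]), prototypes)
print(result_a, result_unknown)
```

The threshold is the main design choice here: set it too low and new species are absorbed into a known class, set it too high and known birds are rejected. It is usually picked by checking similarity scores on a validation set.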

Answered by tehem on May 5, 2021
