2-label dataset for 3-label classifier?

Question

I have a dataset of headlines and the sentiment associated with each one. The headlines were filtered from a larger dataset by keeping only those with a very negative or a very positive sentiment, so the resulting dataset contains only very positive and very negative headlines.
My goal is to build a deep learning classifier with tf and keras that classifies new observations into three classes: positive sentiment, negative sentiment and neutral sentiment. In other words, I want to use a binary-labelled dataset to build a classifier that outputs a 3-label classification, and I want to do it by predicting the probability that a headline is positive or negative.
If the predicted probabilities of a headline are:
p(positive) = 80%
p(negative) = 20%

then the headline is positive. But if:
p(positive) = 50%
p(negative) = 50%

then the headline is neutral.
What do you think?

Oliver Foster · Answer

Yes, this is a good strategy. The only thing you need to figure out is the two threshold values: one between negative and neutral, and one between neutral and positive.
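To make the two-threshold idea concrete, here is a minimal sketch of the mapping from a predicted probability to one of three labels. It assumes the model outputs a single p(positive) per headline, with p(negative) = 1 - p(positive); the function name `to_three_labels` and the cut-offs 0.35 and 0.65 are placeholders for illustration, not recommended values.

```python
import numpy as np

def to_three_labels(p_positive, low=0.35, high=0.65):
    """Map p(positive) to 'negative', 'neutral' or 'positive'.

    `low` and `high` are hypothetical thresholds; tune them on held-out data.
    """
    p = np.asarray(p_positive, dtype=float)
    # Start with everything neutral, then overwrite the confident cases.
    labels = np.full(p.shape, "neutral", dtype=object)
    labels[p >= high] = "positive"
    labels[p <= low] = "negative"
    return labels

# The two cases from the question, plus a clearly negative one.
labels = to_three_labels([0.80, 0.50, 0.10])
print(labels)
```

With these placeholder thresholds, the 80% headline falls in the positive band and the 50% headline falls in the neutral band, which matches the behaviour described in the question.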
What you could do is decide on a precision value (say 95%) that you want to achieve when classifying a positive headline as positive and a negative headline as negative. Given this target precision, you can calculate the thresholds you need by evaluating your trained model on a holdout set. If the positive threshold is too low, you may not be able to achieve 95% precision; if it is too high, it may be too restrictive and label most headlines as neutral.
The design decision for these thresholds depends on the "business problem" you are trying to solve or the ideal client/use of your tool.
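The threshold search described above could be sketched roughly as follows. This is one possible approach, not the only one: it scans the scores seen on the holdout set and returns the lowest cut-off whose precision meets the target. The function name `lowest_threshold_for_precision` and the tiny holdout arrays are made up for illustration; in practice `p_holdout` would come from your trained model's predictions.

```python
import numpy as np

def lowest_threshold_for_precision(y_true, scores, target=0.95):
    """Return the lowest score cut-off whose precision is >= target, or None."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    # np.unique returns the candidate cut-offs sorted in ascending order.
    for t in np.unique(scores):
        predicted = scores >= t  # never empty: the items scoring exactly t are included
        precision = (y_true[predicted] == 1).mean()
        if precision >= target:
            return float(t)
    return None  # the target precision is not reachable on this holdout set

# Hypothetical holdout data: 1 = positive headline, 0 = negative headline.
y_holdout = np.array([0, 0, 1, 0, 1, 1, 1, 1])
p_holdout = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])

# Positive side: predict "positive" when p(positive) >= high.
high = lowest_threshold_for_precision(y_holdout, p_holdout)

# Negative side: same search on p(negative) = 1 - p(positive) with flipped labels,
# then convert the cut-off back to the p(positive) scale.
t_neg = lowest_threshold_for_precision(1 - y_holdout, 1 - p_holdout)
low = None if t_neg is None else 1 - t_neg

print(high, low)
```

Anything scoring between `low` and `high` would then be labelled neutral. Precision estimated from only a handful of holdout examples is noisy, so a larger holdout set gives more trustworthy thresholds.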
