NLP - Retrieval-based model

Question

My goal is to predict the most appropriate answer to an utterance from a group of 21 candidate answers. (I'm not sure "utterance" is the right term for the "question", though.)

Example:

Utterance: How are you today?
Answers: Answer1, 2, ..., 21.

I have a training file with this format:

Utterance:
Answers: Good answer, wrong answer1, wrong answer2,..., wrong answer20.

My problem

This is the first time we have had to make a prediction from a group of possible answers, so the task is effectively a multiple-choice question (MCQ).

Any ideas on how I could get started on this problem?

What I've done

So far, the only thing I have done is pick, from the 21 possible answers, the one with the highest cosine similarity to the utterance (so, unsupervised). It's not that bad (24% against 1/21, roughly 4.8%, at random), but I'm sure there are ways to do much better.
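For reference, here is a minimal sketch of that kind of baseline. The question does not say how the text was vectorized, so the plain word-count vectors here are an assumption, and the names `bag_of_words`, `cosine` and `pick_answer` are made up for illustration.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_a * norm_b)


def pick_answer(utterance, candidates):
    """Return the index of the candidate most similar to the utterance."""
    u = bag_of_words(utterance)
    scores = [cosine(u, bag_of_words(c)) for c in candidates]
    # On ties, the first best-scoring candidate wins.
    return max(range(len(candidates)), key=scores.__getitem__)


candidates = [
    "The train leaves at noon.",
    "I am fine today, how are you?",
    "Blue is a colour.",
]
best = pick_answer("How are you today?", candidates)
```

Nothing here uses the training file, which is why it counts as unsupervised.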

What I don't want to do at first

Use a generative model that predicts a full sentence. I want to choose the best candidate among the 21 answers, and use the training file, which allows supervised learning.

yoav_aaa · Answer

Since the candidate answers change from question to question, your problem does not fit 'regular' classification.
And because the inputs and outputs are text, regression is not a good fit either.
This leaves me thinking K-NN is a good way to make use of supervised learning.

I don't have a good reference for this, but this approach makes sense to me:

1) Embed both questions and answers into the same space (using TF-IDF and PCA, for example).
2) For a new (unseen) question, find its nearest labeled questions in the training set using K-NN.
3) Collect the correct answers of those neighboring questions.
4) Use K-NN (or another distance-based method) to find which of the unseen question's answer choices is nearest to those answers.
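The four steps can be sketched roughly as follows. This is only an illustration under stated assumptions: raw word counts stand in for the TF-IDF + PCA embedding, cosine similarity stands in for the distance, and step 4 is read as "score each choice by its best similarity to a neighbor's correct answer". The names `embed`, `cosine` and `predict`, and the `(question, correct_answer)` pair format for the training data, are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    # Step 1 (stand-in): word counts instead of TF-IDF + PCA.
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict(question, candidates, train, k=3):
    """Return the index of the chosen candidate.

    train is assumed to be a list of (question, correct_answer) pairs.
    """
    q = embed(question)
    # Step 2: the k training questions most similar to the new question.
    neighbors = sorted(
        train, key=lambda pair: cosine(q, embed(pair[0])), reverse=True
    )[:k]
    # Step 3: the known correct answers of those neighbors.
    neighbor_answers = [embed(answer) for _, answer in neighbors]

    # Step 4: score a candidate by its closest neighbor answer.
    def score(candidate):
        c = embed(candidate)
        return max((cosine(c, a) for a in neighbor_answers), default=0.0)

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


train = [
    ("How are you?", "I am fine, thanks."),
    ("What time is it?", "It is three o'clock."),
    ("How are you doing today?", "Doing fine, thank you."),
]
choices = ["It is raining.", "I am doing fine.", "Three tickets please."]
chosen = predict("How are you today?", choices, train, k=2)
```

The scoring in step 4 is the main design choice; averaging over the neighbor answers, or weighting them by how close their questions are, would be equally reasonable variants to try.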
