Data Science Asked by puneet on September 5, 2021
Let’s say Morpheus has multiple users to whom he can offer colored pills (drawn from an infinite supply). There are 3 unique pill colors in total (red, blue, green) that Morpheus can offer. The catch is that Morpheus can offer only one pill to each user, and the user can either take the pill or refuse it. (Also, the users’ decisions are independent of each other.)
Now Morpheus wants to be smart about his offer and wants to model the user such that the user selects the pill he is offering.
The users are moody, so there is some randomness in their choices.
Rejection can happen for multiple unknown reasons, such as "I didn't like the color of the pill", "I will choose the pill later", "I want to understand more about this pill", or "Show me other pills before I decide".
Now there are two ways I can think of modeling this:
When I treat this as binary classification, I pass pill color as a feature, along with other user features, to the model, and my output is the probability of the user taking or rejecting a pill given the pill color.
Morpheus can then offer the pill color with the highest probability. This uses both the Accept and Reject decisions of a user while modeling, but there is some uncertainty, and the same type of user can accept or reject randomly.
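The binary setup described above can be sketched roughly as follows. This is only an illustration: `toy_accept_prob` is a hypothetical stand-in for whatever trained binary classifier is used, and its weights and the single user feature are made up. The point is the scoring loop: score the same user once per pill color and offer the color with the highest predicted acceptance probability.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

COLORS = ("red", "blue", "green")


def one_hot_color(color: str) -> List[float]:
    """Encode the offered pill color as a one-hot vector over COLORS."""
    return [1.0 if color == c else 0.0 for c in COLORS]


def best_pill(
    accept_prob: Callable[[List[float]], float],
    user_features: Sequence[float],
) -> Tuple[str, Dict[str, float]]:
    """Score the user once per color and return the color with the highest P(accept)."""
    scores = {
        color: accept_prob(list(user_features) + one_hot_color(color))
        for color in COLORS
    }
    best = max(scores, key=lambda c: scores[c])
    return best, scores


def toy_accept_prob(x: List[float]) -> float:
    """Hypothetical stand-in for a trained binary classifier (weights are invented)."""
    weights = [0.5, -1.0, 0.2, 1.0]  # [user feature, red, blue, green]
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


# One user with a single (made-up) numeric feature.
best, scores = best_pill(toy_accept_prob, [0.4])
print(best, scores)
```

Any model that returns an acceptance probability for a (user, color) feature vector could be passed in place of `toy_accept_prob`.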
When I treat this as multi-class classification, I try to predict the pill color itself. I would not use the rejected cases in training and would only consider cases where the user accepted a pill.
In this way I can reduce the uncertainty, but I would have to completely ignore the rejected cases. Morpheus can then use either softmax or a sigmoid for each class and take the argmax to get the best choice to offer.
I am not sure if there are other ways to model this problem, but of these two, which is the better way?
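As a small illustration of the softmax-versus-sigmoid step, the sketch below applies both to the same per-class scores. The `logits` values are invented for the example and do not come from any trained model. Because both functions are monotonically increasing in each score, the argmax is the same either way; only the interpretation of the numbers differs (softmax outputs sum to 1, per-class sigmoids do not).

```python
import math
from typing import Dict


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert raw class scores to probabilities that sum to 1."""
    m = max(logits.values())  # subtract the max for numerical stability
    exps = {c: math.exp(z - m) for c, z in logits.items()}
    total = sum(exps.values())
    return {c: e / total for c, e in exps.items()}


def sigmoid_per_class(logits: Dict[str, float]) -> Dict[str, float]:
    """Squash each class score independently into (0, 1)."""
    return {c: 1.0 / (1.0 + math.exp(-z)) for c, z in logits.items()}


# Hypothetical raw scores for one user, one per pill color.
logits = {"red": 0.3, "blue": 1.4, "green": -0.5}

soft = softmax(logits)
sig = sigmoid_per_class(logits)

offer_softmax = max(soft, key=lambda c: soft[c])
offer_sigmoid = max(sig, key=lambda c: sig[c])
print(offer_softmax, offer_sigmoid)
```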
This is a textbook multi-armed bandit problem where Morpheus needs to learn the correct policy for offering pills. As you’ve said, the Neos are independent, so if we assume that there is a single best pill overall, we need an algorithm that will experiment with each of the pills to find out which one is most likely to be accepted. This is the same as having three one-armed bandit slot machines and trying to find out which one pays out most frequently.
In the case where the Neos are observable (so that we have some information about each Neo and can predict which pill they would like based on their characteristics), this becomes a contextual bandit problem. This is the basic form of a reinforcement learning problem.
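One common way to run this kind of experiment is Thompson sampling with a Beta prior per pill; the answer does not name a specific algorithm, so treat this as one possible choice. The sketch below is a simulation only: the `true_accept_rates` values are invented, and in practice they are exactly what Morpheus does not know.

```python
import random
from typing import Dict, Tuple


def thompson_sampling(
    true_accept_rates: Dict[str, float],
    n_users: int,
    seed: int = 0,
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Simulate Beta-Bernoulli Thompson sampling over pill colors."""
    rng = random.Random(seed)
    colors = list(true_accept_rates)
    accepts = {c: 0 for c in colors}
    rejects = {c: 0 for c in colors}

    for _ in range(n_users):
        # Draw a plausible acceptance rate for each color from its posterior,
        # Beta(accepts + 1, rejects + 1), i.e. starting from a uniform prior.
        sampled = {
            c: rng.betavariate(accepts[c] + 1, rejects[c] + 1) for c in colors
        }
        offer = max(sampled, key=lambda c: sampled[c])

        # Simulated user response (unknown to the algorithm in real life).
        if rng.random() < true_accept_rates[offer]:
            accepts[offer] += 1
        else:
            rejects[offer] += 1

    return accepts, rejects


# Hypothetical acceptance rates, used only to drive the simulation.
rates = {"red": 0.2, "blue": 0.5, "green": 0.8}
accepts, rejects = thompson_sampling(rates, n_users=2000, seed=0)
offers = {c: accepts[c] + rejects[c] for c in rates}
print(offers)
```

Early on the offers are spread across all three colors; as evidence accumulates, most offers go to the color with the highest observed acceptance rate.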
In a contextual bandit problem, you need to balance exploration (trying out offering different pills to different Neos to find out what they like) with exploitation (choosing what seems to be the best pill based on what we saw so far). This is why straight-up supervised multinomial classification approaches (as in e.g. Benji Albert’s answer) will struggle to converge: they don’t explore the “action space” (i.e. try out a bunch of responses) sufficiently in order to generate a variety of training examples for themselves.
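The exploration/exploitation trade-off can be illustrated with a very small epsilon-greedy contextual bandit. This is a sketch under simplifying assumptions: the context is reduced to a single discrete user segment, the segment names and `TRUE_RATES` are invented for the simulation, and a real system would more likely use a model over richer user features.

```python
import random
from collections import defaultdict

COLORS = ("red", "blue", "green")

# Hypothetical per-segment acceptance rates, used only to simulate users.
TRUE_RATES = {
    "cautious": {"red": 0.1, "blue": 0.7, "green": 0.3},
    "curious": {"red": 0.8, "blue": 0.2, "green": 0.4},
}


class EpsilonGreedyContextualBandit:
    """Tabular epsilon-greedy bandit keyed by (segment, color)."""

    def __init__(self, epsilon: float = 0.1, seed: int = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.offers = defaultdict(int)         # (segment, color) -> times offered
        self.accept_rate = defaultdict(float)  # (segment, color) -> running mean

    def greedy(self, segment: str) -> str:
        """Exploit: color with the highest estimated acceptance rate."""
        return max(COLORS, key=lambda c: self.accept_rate[(segment, c)])

    def choose(self, segment: str) -> str:
        """Explore with probability epsilon, otherwise exploit."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(COLORS)
        return self.greedy(segment)

    def update(self, segment: str, color: str, accepted: bool) -> None:
        """Incrementally update the running mean for the offered color."""
        key = (segment, color)
        self.offers[key] += 1
        self.accept_rate[key] += (float(accepted) - self.accept_rate[key]) / self.offers[key]


def simulate(n_users: int = 5000, seed: int = 0) -> EpsilonGreedyContextualBandit:
    bandit = EpsilonGreedyContextualBandit(epsilon=0.1, seed=seed)
    env = random.Random(seed + 1)  # separate RNG for the simulated users
    segments = list(TRUE_RATES)
    for _ in range(n_users):
        segment = env.choice(segments)
        color = bandit.choose(segment)
        accepted = env.random() < TRUE_RATES[segment][color]
        bandit.update(segment, color, accepted)
    return bandit


bandit = simulate()
print({s: bandit.greedy(s) for s in TRUE_RATES})
```

Setting `epsilon` to 0 turns this into a purely greedy learner, which is roughly the failure mode described above: it only ever collects data for the colors it already prefers.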
Correct answer by Nicholas James Bailey on September 5, 2021
In your specific case our test "Neos" may not take a pill at all since Morpheus only offers one pill of a specific color.
We would either have to amend our multi-class model to include a "No color / Rejection" class, or use the binary model, which would work much better.
From a practical standpoint I would use a multi-class model here for one simple reason: it is the only one that straight-up solves Morpheus's use case!
If we design a binary model with pill color as a predictor, we would then have to run the model three times (with the person's data and each pill color), compare the predicted acceptance probabilities, and choose the best outcome, whereas a multi-class model simply tells us the pill color with the highest acceptance likelihood.
Now, from a theoretical standpoint, we also have to consider that the classes are presented independently of each other, unlike in the movie. Only one color is presented and users have no knowledge of the other colors, so no relative preference is built. The context of the decision is therefore a bit closer to the binary model.
However, in the end, as in all prediction cases, performance wins. So I'd simply build both models and compare performance.
Answered by Fnguyen on September 5, 2021