How to model a binary classification problem in an evolving environment

Question

Suppose you have a problem with two classes, YES and NO. The YES class is fixed, in the sense that its observations do not evolve, but observations of class NO may evolve over time and may change to class YES. I would like to predict the probability that a given observation changes to YES.

What kind of ML models could be used here? Could Bayesian Networks be a solution?

I would also like to observe what changed in the variables (or which variables were most important) that turned a NO into a YES.

Thank you

Sandeep Bhutani · Answer

This is an interesting problem. It is analogous to a court trial: a person is not guilty until proven guilty (your NO label) and stays guilty once proven (your YES label).
To estimate the chance that a person is converted to guilty, I would apply the following techniques:

Method 1. Cluster similar persons and estimate the probability from the other people in the same cluster. You can cluster cases by how similar their features are to those of records that have already been labeled YES. You can use community detection for unsupervised clustering here.

Method 2. Use Markov chains, if you can define the possible steps (or stages) that observations usually go through before converting to YES. Then, based on the chain followed by a specific record, its probability can be estimated.
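As a rough illustration of Method 2, here is a minimal sketch that treats YES as an absorbing state, estimates first-order transition probabilities from observed stage histories, and then computes the probability of reaching YES within a given number of steps. The stage names `A` and `B`, the toy histories, and the function names are all made up for the example; real stages would have to come from your domain.

```python
from collections import defaultdict

def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from stage sequences."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    probs = {}
    for state, nexts in counts.items():
        total = sum(nexts.values())
        probs[state] = {s: c / total for s, c in nexts.items()}
    return probs

def prob_yes_within(probs, start, steps, absorbing="YES"):
    """Probability of having reached the absorbing state within `steps` transitions."""
    dist = {start: 1.0}
    for _ in range(steps):
        new_dist = defaultdict(float)
        for state, mass in dist.items():
            if state == absorbing or state not in probs:
                # Absorbing states (and states with no observed exits) keep their mass.
                new_dist[state] += mass
                continue
            for nxt, p in probs[state].items():
                new_dist[nxt] += mass * p
        dist = new_dist
    return dist.get(absorbing, 0.0)

# Hypothetical stage histories; "YES" marks a record that has switched.
histories = [
    ["A", "B", "YES"],
    ["A", "A", "B", "B"],
    ["A", "B", "YES"],
    ["A", "A", "A"],
]
P = estimate_transitions(histories)
print(prob_yes_within(P, "B", 1))  # chance that a record in stage B switches at the next step
print(prob_yes_within(P, "A", 2))  # chance that a record in stage A switches within two steps
```

With so few histories the estimates are very noisy; the point is only the mechanics of counting transitions and propagating the state distribution forward.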

Romain Reboulleau · Answer

If you are trying to predict something that looks like a probability but don't actually need a "real" probability (for instance, if you aim to rank observations to find those that are most likely to switch to YES), you could probably use logistic regression over the dataset. This would require the model to be refitted every once in a while.
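To make the ranking idea concrete, here is a small sketch that fits a logistic regression by plain gradient descent and sorts currently-NO observations by their score. It is hand-rolled only to stay dependency-free; in practice you would likely use a library implementation. The single feature, the toy data, and the learning-rate and epoch values are assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def score(w, b, x):
    """Model score in (0, 1); used here only to order observations."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

# Hypothetical history: one feature per record, label 1 if it later switched to YES.
X = [[0.1], [0.4], [0.5], [0.9], [1.2], [1.5]]
y = [0, 0, 1, 0, 1, 1]
w, b = fit_logistic(X, y)

# Rank records that are still NO, most likely to switch first.
candidates = [[0.2], [1.0], [0.6]]
ranked = sorted(candidates, key=lambda x: score(w, b, x), reverse=True)
print(ranked)
```

The scores are useful for ordering, but without a time horizon attached they should not be read as switch probabilities.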

If you are looking for an actual probability, it has to be the probability of switching from NO to YES over a given period of time. In that case, you are falling into time series analysis. This is a more complicated problem.

My intuition would be to consider the state change as a physical random phenomenon, such as radioactive decay, where the probability of having decayed over a time period $\Delta t$ is $1 - e^{-\lambda \Delta t}$ (the probability of not yet having decayed is $e^{-\lambda \Delta t}$). Based on cases that have switched in the past, you could build a regression model to predict a $\lambda$ value for each observation. Of course, you can use decay functions other than the exponential (find the best one through the validation step). This will eventually allow you to predict a switch probability over any given time period.
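Here is a minimal sketch of the exponential-decay idea. It uses the standard result that, with a constant rate $\lambda$, the probability of having switched within $\Delta t$ is $1 - e^{-\lambda \Delta t}$. As a simplification, it estimates a single shared rate (number of observed switches divided by total observed time, the maximum-likelihood estimate for an exponential model when some records have not switched yet) rather than a per-observation $\lambda$ from a regression. The data and function names are made up.

```python
import math

def fit_rate(durations, switched):
    """Estimate a constant switch rate: observed switches / total time at risk."""
    events = sum(switched)     # True counts as 1
    exposure = sum(durations)  # time observed, whether or not the record switched
    return events / exposure

def switch_probability(rate, delta_t):
    """Probability of having switched to YES within delta_t under a constant rate."""
    return 1.0 - math.exp(-rate * delta_t)

# Hypothetical records: time observed, and whether the record switched
# at the end of that time (False means it is still NO).
durations = [2.0, 5.0, 3.0, 10.0]
switched = [True, True, False, False]

lam = fit_rate(durations, switched)
p = switch_probability(lam, 5.0)
print(lam, p)
```

To get a per-observation rate as suggested above, `fit_rate` could be replaced by a regression that maps features to $\lambda$; the `switch_probability` step stays the same.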
