Is it possible to create a predictive model for a dataset that consists of only positive occurrences of the dependent variable?

Question

Let's say I want to predict earthquakes.
My dataset would only contain data about earthquake occurrences and no data about non-earthquake occurrences as that would basically be any other period of time which is not kept in the dataset.
In that case, I assume a decision tree or logistic regression would not work as we don't have a dichotomous dependent variable (as the occurrence only gets into the dataset if an earthquake occurred).
Are there any models which would suit this situation, or is a different approach needed?

Anoop A Nair · Answer

My dataset would only contain data about earthquake occurrences and no
data about non-earthquake occurrences as that would basically be any
other period of time which is not kept in the dataset.

It would be nice if you could specify the features used for the prediction.

In that case, I assume a decision tree or logistic regression would
not work as we don't have a dichotomous dependent variable (as the
occurrence only gets into the dataset if an earthquake occurred).

You are right here. It is better to resort to anomaly detection algorithms in this case. Here are some examples:

Unsupervised anomaly detection
Outlier detection
Anomaly detection in python
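As a rough illustration of the unsupervised route, here is a minimal sketch using scikit-learn's IsolationForest. The data is synthetic and purely hypothetical (two made-up numeric features), and the contamination value is an assumption that would have to be tuned on real data.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 500 recorded events with two numeric features
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# contamination is an assumed share of unusual rows in the training data
model = IsolationForest(contamination=0.05, random_state=0).fit(X_train)

X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = model.predict(X_new)            # +1 = resembles training data, -1 = anomaly
scores = model.decision_function(X_new)  # lower score = more anomalous
```

The second row of X_new lies far from everything seen during training, so it is the kind of point the model is meant to flag.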

andins · Answer

Given the example of earthquakes, I assume that your aim is to predict a dichotomous variable, but you actually only observe (or record) samples with one of the two labels.
In this case it is tough to make predictions. The best option would be to actually get samples with the other label. If you really can't get those samples, then Anoop A Nair's answer points in the right direction: unsupervised methods.
Basically you learn the distribution of your samples and flag as "novelty" any new sample that has low probability under the learned distribution.
To follow on from the earthquake example, it would be much better to learn the distribution of normal events than that of earthquakes, since the latter are much rarer.
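The "learn the distribution, flag low-probability samples" idea can be sketched with the standard library alone. This is a deliberately simple version under stated assumptions: the training data is synthetic and hypothetical, each feature is modelled as an independent Gaussian, and the 1% cut-off is an arbitrary choice. A real application would likely need a richer density model.

```python
import math
import random
from statistics import NormalDist

random.seed(0)
# Hypothetical "normal" readings, two features per sample
train = [(random.gauss(0.0, 1.0), random.gauss(5.0, 2.0)) for _ in range(500)]

# One Gaussian per feature; treating features as independent is an assumption
dists = [NormalDist.from_samples(column) for column in zip(*train)]

def log_density(sample):
    """Sum of per-feature Gaussian log densities for one sample."""
    total = 0.0
    for dist, value in zip(dists, sample):
        z = (value - dist.mean) / dist.stdev
        total += -0.5 * z * z - math.log(dist.stdev * math.sqrt(2.0 * math.pi))
    return total

# Flag anything less likely than the lowest 1% of the training samples
train_scores = sorted(log_density(s) for s in train)
threshold = train_scores[int(0.01 * len(train_scores))]

def is_novel(sample):
    return log_density(sample) < threshold
```

With this setup, a sample near the centre of the training data such as (0.0, 5.0) is not flagged, while a distant one such as (8.0, -10.0) is.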

Erwan · Answer

Is it possible to create a predictive model for a dataset that consists of only positive occurrences of the dependent variable?

One-class classification is a type of classification algorithm which does exactly that.
In one-class classification the principle is to discover the patterns which characterize the instances of the class, assuming that everything which doesn't follow these patterns doesn't belong to the class. The model is trained using only examples from the class, and when applied the model predicts a probability that the input instance belongs to the class. By putting a threshold on the probability the model can be used as a binary classifier.
In terms of methods/implementations, I know of one-class SVM, but there are probably other methods as well.
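For reference, a minimal sketch of a one-class SVM with scikit-learn's OneClassSVM. The training data here is synthetic and hypothetical, and the nu and gamma settings are assumptions rather than recommendations. Note that this implementation returns a signed score from decision_function rather than a calibrated probability; thresholding that score at zero is what predict does.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Hypothetical examples of the single observed class, two features each
X_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))

# nu roughly bounds the fraction of training points treated as outliers (assumed value)
clf = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

X_new = np.array([[0.0, 0.0], [10.0, 10.0]])
scores = clf.decision_function(X_new)  # positive = inside the learned region
labels = clf.predict(X_new)            # +1 = belongs to the class, -1 = does not
```

A different cut-off on the scores can be used in place of zero if a stricter or looser notion of class membership is wanted.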
