TransWikia.com

Is it possible to create a predictive model for a dataset that consists of only positive occurrences of the dependent variable?

Data Science Asked on August 18, 2021

Let's say I want to predict earthquakes.

My dataset would only contain data about earthquake occurrences and no data about non-earthquake occurrences as that would basically be any other period of time which is not kept in the dataset.

In that case, I assume a decision tree or logistic regression would not work as we don’t have a dichotomous dependent variable (as the occurrence only gets into the dataset if an earthquake occurred).

Are there any models which would suit this situation, or is a different approach needed?

3 Answers

My dataset would only contain data about earthquake occurrences and no data about non-earthquake occurrences as that would basically be any other period of time which is not kept in the dataset.

It would be nice if you could specify the features used for the prediction.

In that case, I assume a decision tree or logistic regression would not work as we don't have a dichotomous dependent variable (as the occurrence only gets into the dataset if an earthquake occurred).

You are right here. It is better to resort to anomaly detection algorithms instead. Here are some examples:

  1. Unsupervised anomaly detection
  2. Outlier detection
  3. Anomaly detection in Python

Answered by Anoop A Nair on August 18, 2021

Given the example of earthquakes, I assume that your aim is to predict a dichotomous variable, but you only observe (or record) samples with one of the two labels. In this case it is tough to make predictions. The best option would be to actually get samples with the other label. If you really can't get those samples, then Anoop A Nair's answer points in the right direction: unsupervised methods. Basically, you learn the distribution of your samples and flag as a "novelty" any new sample that has low probability under the learned distribution. To follow the earthquake example, it would be much better to learn the distribution of normal events than that of earthquakes, since the latter are much rarer.
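The idea of learning a distribution and flagging low-probability samples can be sketched roughly as follows. This is only a minimal illustration using the Python standard library: it assumes a single numeric feature that is approximately Gaussian, and the data, the function names, and the 1% cutoff are all hypothetical choices, not something taken from a real earthquake dataset.

```python
import math
import random
import statistics


def fit_gaussian(samples):
    """Estimate the mean and standard deviation of a 1-D feature."""
    return statistics.fmean(samples), statistics.stdev(samples)


def log_density(x, mean, std):
    """Log of the Gaussian probability density at x."""
    z = (x - mean) / std
    return -0.5 * z * z - math.log(std * math.sqrt(2 * math.pi))


def fit_threshold(samples, mean, std, quantile=0.01):
    """Log-density value below which roughly `quantile` of the training samples fall."""
    scores = sorted(log_density(x, mean, std) for x in samples)
    return scores[int(quantile * len(scores))]


# Hypothetical training data: one feature recorded during "normal" periods.
rng = random.Random(0)
normal_readings = [rng.gauss(10.0, 2.0) for _ in range(1000)]

mean, std = fit_gaussian(normal_readings)
threshold = fit_threshold(normal_readings, mean, std)


def is_novelty(x):
    """Flag x when its log-density under the fitted model is below the threshold."""
    return log_density(x, mean, std) < threshold


print(is_novelty(10.5))  # a reading close to the bulk of the training data
print(is_novelty(25.0))  # a reading far out in the tail
```

With real data the Gaussian assumption would likely be too crude, and a richer density model over several features would take its place, but the structure (fit on one kind of sample, score new samples, threshold) stays the same.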

Answered by andins on August 18, 2021

Is it possible to create a predictive model for a dataset that consists of only positive occurrences of the dependent variable?

One-class classification is a type of classification algorithm which does exactly that.

In one-class classification the principle is to discover the patterns which characterize the instances of the class, assuming that everything which doesn't follow these patterns doesn't belong to the class. The model is trained using only examples from the class, and when applied it outputs a score (for some methods, a probability) indicating how likely the input instance is to belong to the class. By putting a threshold on this score, the model can be used as a binary classifier.

In terms of methods/implementations, I know one-class SVM but there are probably other methods as well.
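As a rough sketch of what this could look like with a one-class SVM, the snippet below uses scikit-learn's `OneClassSVM`. The two synthetic features and the parameter values (`nu=0.05`, RBF kernel) are assumptions chosen for illustration only; real features and tuning would depend on the data.

```python
import random

from sklearn.svm import OneClassSVM

# Hypothetical training set: two numeric features, all from the one observed class.
rng = random.Random(0)
X_train = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(300)]

# nu is an upper bound on the fraction of training points treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)
model.fit(X_train)

# One point near the centre of the training data, one far away from it.
X_new = [[0.0, 0.0], [8.0, 8.0]]

scores = model.decision_function(X_new)  # signed score, positive means "inside"
labels = model.predict(X_new)            # +1 for the class, -1 for everything else

print(list(labels))
```

Note that `decision_function` returns a signed distance-like score rather than a calibrated probability; `predict` simply thresholds that score at zero, and a different threshold can be applied to the score directly if needed.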

Answered by Erwan on August 18, 2021
