Data Science Asked by Luc on February 5, 2021
I want to model whether the sequence of events influences the probability of a binary target variable. Suppose we have five different events that occur over time (A, B, C, D, E). They can occur in any order, at positions 1 to 5. I would like to check whether the order of their occurrence influences the target variable.
My first idea was to convert the time of occurrence of each event into a position number from 1 to 5 and then, for example, use logistic regression.
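To make the encoding concrete, here is a minimal sketch of what I mean, assuming every observation contains each of the five events exactly once. The helper name `position_features` and the example ordering are made up for illustration. Note that feeding these positions to a logistic regression as plain numbers treats each event's position as having a linear effect on the log-odds.

```python
# Hypothetical helper: turn one observed ordering into per-event position features.
EVENTS = ["A", "B", "C", "D", "E"]

def position_features(sequence):
    """Map an ordering such as "BADCE" to {event: 1-based position}."""
    # Assumption: each of the five events occurs exactly once per observation.
    if sorted(sequence) != EVENTS:
        raise ValueError("each of the five events must occur exactly once")
    return {event: sequence.index(event) + 1 for event in EVENTS}

features = position_features("BADCE")
print(features)  # {'A': 2, 'B': 1, 'C': 4, 'D': 3, 'E': 5}
```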
Do you know any other practices?
Any whitepapers and ideas will be helpful.
If the order in which the events appear matters, consider using a recurrent neural network. The setup that you propose is invariant to event ordering, whereas in an RNN the events are fed in sequentially, one at a time.
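As a rough illustration of why a recurrent model is sensitive to order, here is a toy forward pass in plain Python. Everything in it is an assumption made for the sketch: the weights are random and untrained, the hidden size is arbitrary, and the names `rnn_score` and `bag_score` are hypothetical. In practice you would train such a model with a library like PyTorch or Keras.

```python
import math
import random

EVENTS = "ABCDE"
HIDDEN = 4
rng = random.Random(0)  # fixed seed so the illustration is repeatable

# Untrained, randomly initialised weights (illustrative assumption only).
W_in = {e: [rng.uniform(-1, 1) for _ in range(HIDDEN)] for e in EVENTS}
W_rec = [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(HIDDEN)]
w_out = [rng.uniform(-1, 1) for _ in range(HIDDEN)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_score(sequence):
    """Feed events one at a time; score the final hidden state."""
    h = [0.0] * HIDDEN
    for event in sequence:
        # New state depends on the current event AND the previous state.
        h = [
            math.tanh(W_in[event][i] + sum(W_rec[i][j] * h[j] for j in range(HIDDEN)))
            for i in range(HIDDEN)
        ]
    return sigmoid(sum(w * x for w, x in zip(w_out, h)))

def bag_score(sequence):
    """Baseline that ignores order entirely: only which events occurred is used."""
    return sigmoid(sum(sum(W_in[e]) for e in sorted(sequence)))

print(rnn_score("ABCDE"), rnn_score("EDCBA"))  # differ: order changes the state
print(bag_score("ABCDE"), bag_score("EDCBA"))  # identical: order is discarded
```

The recurrent score changes when the same events arrive in a different order because each hidden state is computed from the previous one, while the baseline sums over the events and so cannot tell the two orderings apart.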
Answered by liangjy on February 5, 2021
If you have a large enough sample size, you can indeed carry this out the way you propose.
For five events, you have 120 ($^5P_5 = 5!$) possible permutations of the order of events. This allows you to run a logistic regression on dummy variables for the permutations: 119 dummies plus an intercept, with one permutation serving as the reference level. Because this is a logistic rather than a linear model, the overall significance test is a likelihood-ratio (chi-square) test rather than an F-test; it tells you whether the frequency of your outcome differs between orderings of events.
This does require a large sample size, however. A good rule of thumb is at least 20 observations per independent variable in a generalized linear model, which here means roughly 120 × 20 = 2,400 observations. So if you have a few thousand samples, I'd expect this model to fit reasonably well.
This does assume you have a relatively small number of events. Five seems manageable, but as your number of events increases, you quickly run into problems as your number of independent variables grows factorially.
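To see how fast this grows, here is a small sketch that combines the factorial count with the 20-observations-per-variable rule of thumb quoted above. The function name `required_samples` is made up for illustration, and the figures are only as good as the rule of thumb itself.

```python
import math

OBS_PER_VARIABLE = 20  # rule of thumb from the answer, not a hard requirement

def required_samples(n_events, per_variable=OBS_PER_VARIABLE):
    """Rough sample size: one variable per ordering, `per_variable` rows each."""
    return math.factorial(n_events) * per_variable

for n_events in range(3, 9):
    print(n_events, math.factorial(n_events), required_samples(n_events))
```

For five events this gives 2,400 observations; for eight events it is already 806,400.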
Answered by R Hill on February 5, 2021