How to infer which sequences of events are more likely to result in an event of interest?

Data Science Asked on August 4, 2020


I have a sequence/series of events. Some sequences will end up with an event of interest $y$, while others won’t.

$$S_1 = [a, b, c, d, …, y]$$

$$S_2 = [a, b, b, e, …, a]$$


$$S_n = [a, a, f, m, …, y]$$

Sequences can be of varying lengths; all sequences are independent of one another; events are linearly spaced within a sequence. Within a sequence, memory of previous events is important (i.e., it’s not just the previous event that is important), and the order of events is important, too.

My use-case is e-commerce (i.e., website navigation/browsing, where my event of interest is a transaction being made at the end of the customer journey, $y$). I guess this could generalise to many fields: words in a sentence, political events, component failure, personal development, etc.

I think I’m looking for an Association Rules Mining-type approach, but one where the order in which items are added to the basket matters.

Solution 1

Is there some method whereby I can find which sequences of consecutive events are more likely to end up with $y$ somewhere down the line? The event of interest, $y$, doesn’t have to come immediately after the chain of events. For example, if $[a, b]$ is important, then I don’t mind if the sequence is $[…, a, b, y]$ or $[a, b, …, y]$, etc.
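To make this first formulation concrete, here is a minimal sketch of what I mean, not a definitive method. It counts contiguous runs of `n` events that occur before the first $y$ and estimates how often a sequence containing that run ends up with $y$. The function name `ngram_conversion_rates`, its parameters, and the toy sequences are all hypothetical and only for illustration.

```python
from collections import Counter

def ngram_conversion_rates(sequences, target="y", n=2, min_support=1):
    """Estimate P(target occurs later | contiguous n-gram seen) per n-gram."""
    seen = Counter()       # number of sequences containing the n-gram
    converted = Counter()  # ...of which the target occurs afterwards
    for seq in sequences:
        has_target = target in seq
        # Only look at events before the first target, so every counted
        # n-gram is followed by the target whenever the target exists.
        prefix = seq[:seq.index(target)] if has_target else seq
        # A set, so each n-gram is counted at most once per sequence.
        grams = {tuple(prefix[i:i + n]) for i in range(len(prefix) - n + 1)}
        for gram in grams:
            seen[gram] += 1
            if has_target:
                converted[gram] += 1
    return {g: converted[g] / seen[g] for g in seen if seen[g] >= min_support}

# Hypothetical toy data (not real browsing logs).
toy_sequences = [
    ["a", "b", "c", "y"],
    ["a", "b", "b", "e", "a"],
    ["a", "a", "f", "y"],
    ["c", "a", "b", "y"],
]
ngram_rates = ngram_conversion_rates(toy_sequences, n=2)
for gram, rate in sorted(ngram_rates.items(), key=lambda kv: -kv[1]):
    print(gram, round(rate, 2))
```

Comparing each rate against the overall share of sequences that contain $y$ would give a lift-style score, and `min_support` can be raised to ignore runs that appear in only a handful of sequences.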

Solution 2

To make this even better, is there a method where the important events don’t even have to be ‘touching’ each other, as long as their order is preserved? For example, the $[a, b, y]$ sequence might be equivalent to $[a, …, b, y]$ or $[a, …, b, …, y]$, etc.
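For this gapped formulation, a rough sketch of the idea is to treat every ordered (not necessarily adjacent) `k`-tuple of events before the first $y$ as a candidate pattern. The brute-force enumeration below stands in for a proper sequential pattern mining algorithm such as PrefixSpan or SPADE, and it grows quickly with sequence length, so it is only meant for small examples. The name `gapped_conversion_rates` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def gapped_conversion_rates(sequences, target="y", k=2, min_support=1):
    """Estimate P(target occurs later | ordered, possibly gapped k-pattern seen)."""
    seen = Counter()       # number of sequences containing the pattern
    converted = Counter()  # ...of which the target occurs afterwards
    for seq in sequences:
        has_target = target in seq
        prefix = seq[:seq.index(target)] if has_target else seq
        # combinations() keeps input order, so each tuple is an ordered
        # subsequence of the prefix; gaps between its events are allowed.
        patterns = set(combinations(prefix, k))
        for pattern in patterns:
            seen[pattern] += 1
            if has_target:
                converted[pattern] += 1
    return {p: converted[p] / seen[p] for p in seen if seen[p] >= min_support}

# Hypothetical toy data (not real browsing logs).
gapped_toy = [
    ["a", "c", "b", "y"],   # a ... b with a gap, then y
    ["a", "b", "d", "y"],
    ["b", "a", "d"],
    ["a", "d", "d", "b"],
]
gapped_rates = gapped_conversion_rates(gapped_toy, k=2)
print(gapped_rates[("a", "b")])  # a before b (any gap), followed by y
print(gapped_rates[("b", "a")])  # same events, opposite order
```

Because the pattern `("a", "b")` and the pattern `("b", "a")` are counted separately, the order of events still matters even though the gaps between them do not.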
