Data Science Asked by Denis Gontcharov on November 28, 2020
I work with two datasets. The first dataset contains fluor values measured every minute. The second dataset contains certain events and their time. We know that these events cause peaks in fluor values shortly before and shortly after the event time.
A simplified reproducible example in R:
Here I provide a simplified version of the R code I use to relate the fluor values to events. I have a series of fluor values measured every minute. Next I have a second dataset with three events A, B and C that occur at three different times. The fluor values appear to show one large peak around each of the three events.
The objective is to classify each value as corresponding to an event whenever the time of that value falls within 15 minutes of the time of that event. Fluor values that fall outside every event window are categorized as NA.
Load packages and set seed
library(data.table)
library(lubridate)
library(ggplot2)
set.seed(10)
Generate fluor values dataset and events dataset
values <- data.table(date.time.value = seq(ymd_hms("2018-01-01 00:00:00"), by = "min", length.out = 200), fluor = 5 * sin(seq(0, 20, length.out = 200)) + abs(rnorm(200, 6, 1)) + 2)
events <- data.table(event=c("A","B","C"),date.time.event=c(ymd_hms("2018-01-01 00:20:00"),ymd_hms("2018-01-01 01:20:00"),ymd_hms("2018-01-01 02:20:00")))
Using the data.table package: I add an “event” variable to the values dataset. That variable takes on the name of an event (in this case A, B or C) whenever the date.time.value is within 15 minutes of a date.time.event.
values[, event:=events[.SD[, .(d_dn=date.time.value-15*60, d_up=date.time.value+15*60)], on=.(date.time.event>=d_dn, date.time.event<=d_up), event]]
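The same labelling can also be written as a non-equi update join, which some readers find easier to follow. The following is only a minimal sketch under stated assumptions: the toy data is rebuilt with base R `as.POSIXct()` so that the block runs without lubridate, and `window_secs`, `win_start` and `win_end` are hypothetical helper names that do not appear in the question. It also assumes the event windows do not overlap, as is the case here.

```r
library(data.table)

set.seed(10)

# Toy data mirroring the question: 200 one-minute readings and three events.
t0 <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
values <- data.table(
  date.time.value = seq(t0, by = "min", length.out = 200),
  fluor = 5 * sin(seq(0, 20, length.out = 200)) + abs(rnorm(200, 6, 1)) + 2
)
events <- data.table(
  event = c("A", "B", "C"),
  date.time.event = t0 + c(20, 80, 140) * 60
)

# Hypothetical helper columns: the 15-minute window around each event.
window_secs <- 15 * 60
events[, `:=`(win_start = date.time.event - window_secs,
              win_end   = date.time.event + window_secs)]

# Non-equi update join: values whose timestamp lies inside a window receive
# that event's name; all other rows keep NA in the new column.
values[events,
       on = .(date.time.value >= win_start, date.time.value <= win_end),
       event := i.event]

# Number of labelled readings per event (NA = outside every window).
print(values[, .N, by = event])
```

Putting the window on the events table keeps the values table free of temporary columns, and the window width lives in a single variable.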
Here’s what the result looks like. We see that we have one fluor value peak for each event A, B and C and that the other fluor values have NA as event:
ggplot(values, aes(date.time.value, fluor,col=event)) + geom_point()
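The plot can be complemented by a numeric cross-check of the 15-minute rule. The sketch below is not the data.table approach from the question but a plain base R restatement of the rule, assuming the same toy data; the names `gap_secs`, `nearest`, `label` and `peaks` are hypothetical. It computes the time gap between every value and every event, labels each value with its nearest event if that gap is at most 15 minutes, and then reports the number of labelled readings and the maximum fluor value per event.

```r
set.seed(10)

# Same toy data as in the question, held in plain vectors.
t0 <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC")
value_time <- seq(t0, by = "min", length.out = 200)
fluor <- 5 * sin(seq(0, 20, length.out = 200)) + abs(rnorm(200, 6, 1)) + 2
event_name <- c("A", "B", "C")
event_time <- t0 + c(20, 80, 140) * 60

# Absolute gap in seconds between every value (rows) and every event (columns).
gap_secs <- abs(outer(as.numeric(value_time), as.numeric(event_time), "-"))

# Closest event per value, kept only if it lies within 15 minutes.
nearest <- apply(gap_secs, 1, which.min)
within_window <- gap_secs[cbind(seq_along(nearest), nearest)] <= 15 * 60
label <- ifelse(within_window, event_name[nearest], NA_character_)

# Readings per label (including NA) and the peak fluor value per event.
print(table(label, useNA = "ifany"))
peaks <- tapply(fluor, label, max)
print(peaks)
```

Because this builds a values-by-events matrix, it only suits small data; for long series the join-based approach scales better.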
A real example:
I managed to relate the two datasets by coloring the fluor values that occur several minutes before or several minutes after an event. This is what the result looks like:
What I’m missing:
I’m almost certain that there must be some kind of R-package that’s specifically built for my problem here but I just can’t figure out which one…
You seem to be looking for a clustering mechanism for your event sequence data based on a dissimilarity/distance measure. I would recommend exploring the TraMineR package.
Answered by Mankind_008 on November 28, 2020