Stack Overflow Asked by naanan_ on January 5, 2022
I am trying to sum values whose time lag is 1, i.e. I would like to summarize the rows by adding up the frequencies of rows whose dates differ by exactly one day within a particular group. I used the lag function to get the difference, but I am not sure how to proceed from here.
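library(dplyr)
# Difference to the previous date within each group
df <- df %>%
  group_by(group) %>%
  mutate(diff = dt - lag(dt))
# Blank out differences of more than one day
df[!is.na(df$diff) & df$diff > 1, ]$diff <- NA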
For example:
group dt freq diff
groupA 2016-03-21 1 NA
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA
groupA 2016-03-28 1 NA
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA
groupB 2016-04-02 2 1
I need to group this into:
group dt freq diff duration
groupA 2016-03-21 1 NA 3 (1 + 1 + 1)
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA 2
groupA 2016-03-28 1 NA 12 (1 + 3 + 3 + 5)
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA 3 (1 + 2)
groupB 2016-04-02 2 1
I also referred to this, but a cumulative sum does not work, because I do not want to add across jumps of more than a single day. Is looping in a custom function the only way?
You can do this much more easily with the following approach (grouping rows that are one day apart). It creates a helper column gap, which is then used to sum freq for consecutive days in the same group:
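library(dplyr)

df %>%
  # gap increases by one whenever a row is not exactly one day after the previous row
  mutate(gap = cumsum(!c(TRUE, diff(as.Date(dt)) == 1))) %>%
  # grouping by group as well splits runs that continue across a group boundary
  group_by(gap, group) %>%
  mutate(duration = sum(freq, na.rm = TRUE)) %>%
  ungroup() %>%
  select(-gap) %>%
  as.data.frame()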
# group dt freq duration
# 1 groupA 2016-03-21 1 3
# 2 groupA 2016-03-22 1 3
# 3 groupA 2016-03-23 1 3
# 4 groupA 2016-03-26 2 2
# 5 groupA 2016-03-28 1 12
# 6 groupA 2016-03-29 3 12
# 7 groupA 2016-03-30 3 12
# 8 groupA 2016-03-31 5 12
# 9 groupB 2016-04-01 1 3
# 10 groupB 2016-04-02 2 3
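The helper-column idea can also be traced step by step in base R. This is only a sketch: the data frame below is re-typed by hand from the question's example, and ave() is used here as a stand-in for the group_by/mutate step, not as part of the answer above.

```r
# Example data re-typed from the question (dt stored as Date)
df <- data.frame(
  group = c(rep("groupA", 8), rep("groupB", 2)),
  dt    = as.Date(c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-26",
                    "2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31",
                    "2016-04-01", "2016-04-02")),
  freq  = c(1, 1, 1, 2, 1, 3, 3, 5, 1, 2)
)

# TRUE wherever a row is not exactly one day after the previous row
is_break <- c(FALSE, as.numeric(diff(df$dt)) != 1)

# Running count of breaks: rows in the same run of consecutive days share a value
gap <- cumsum(is_break)
gap
# 0 0 0 1 2 2 2 2 2 2

# Sum freq within each (group, gap) combination and repeat it on every row
duration <- ave(df$freq, df$group, gap, FUN = sum)
duration
# 3 3 3 2 12 12 12 12 3 3
```

Note that gap alone does not separate groupA from groupB here, because 2016-04-01 directly follows 2016-03-31. That is why the sum is taken over both gap and group.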
Answered by M-- on January 5, 2022
Here is a tidyverse solution using dplyr::lead:
library(tidyverse)

df %>%
  mutate(dt = as.POSIXct(dt)) %>%
  group_by(group) %>%
  mutate(
    # smaller of the gaps to the preceding and to the following date
    diff = pmin(c(1, diff(dt)), c(1, diff(lead(dt))), na.rm = TRUE),
    # start a new id whenever diff changes or exceeds one day
    id = cumsum(c(TRUE, diff(diff) != 0) | diff > 1)) %>%
  group_by(group, id) %>%
  mutate(duration = sum(freq)) %>%
  ungroup() %>%
  select(-diff, -id)
## A tibble: 10 x 4
# group dt freq duration
# <fct> <dttm> <int> <int>
# 1 groupA 2016-03-21 00:00:00 1 3
# 2 groupA 2016-03-22 00:00:00 1 3
# 3 groupA 2016-03-23 00:00:00 1 3
# 4 groupA 2016-03-26 00:00:00 2 2
# 5 groupA 2016-03-28 00:00:00 1 12
# 6 groupA 2016-03-29 00:00:00 3 12
# 7 groupA 2016-03-30 00:00:00 3 12
# 8 groupA 2016-03-31 00:00:00 5 12
# 9 groupB 2016-04-01 00:00:00 1 3
#10 groupB 2016-04-02 00:00:00 2 3
Explanation: diff chooses the minimum difference between the preceding and following date. We then look for changes in diff, and create a new grouping vector id by which we calculate the summary metric sum(freq).
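To see how diff and id come about, the two columns can be re-created outside the pipeline for a single group. The following is a rough base-R sketch, not the answer's own code: trace_ids is a hypothetical helper, it works on Date rather than POSIXct values, and c(dt[-1], NA) stands in for dplyr::lead(dt).

```r
# Hypothetical helper: base-R re-creation of the diff and id columns for one group.
# Assumes dt is a sorted Date vector.
trace_ids <- function(dt) {
  prev_gap <- c(1, as.numeric(diff(dt)))              # mirrors c(1, diff(dt))
  next_gap <- c(1, as.numeric(diff(c(dt[-1], NA))))   # mirrors c(1, diff(lead(dt)))
  d  <- pmin(prev_gap, next_gap, na.rm = TRUE)        # smaller of the two gaps
  id <- cumsum(c(TRUE, diff(d) != 0) | d > 1)         # new id on a change or a gap > 1
  data.frame(dt = dt, diff = d, id = id)
}

# Dates of groupA from the first example
groupA <- as.Date(c("2016-03-21", "2016-03-22", "2016-03-23", "2016-03-26",
                    "2016-03-28", "2016-03-29", "2016-03-30", "2016-03-31"))
trace_ids(groupA)
# diff: 1 1 1 2 1 1 1 1
# id:   1 1 1 2 3 3 3 3
```

In this re-creation the first row is padded with 1 on both sides, so an isolated first date (for example 2016-03-21 followed by 2016-03-25 and 2016-03-26) receives the same id as the rows after it. It may be worth checking that case against your own data.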
# Sample data
df <- read.table(text =
" group dt freq diff
groupA 2016-03-21 1 NA
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA
groupA 2016-03-28 1 NA
groupA 2016-03-29 3 1
groupA 2016-03-30 3 1
groupA 2016-03-31 5 1
groupB 2016-04-01 1 NA
groupB 2016-04-02 2 1 ", header = T)
For your second example:
# Sample data
df <- read.table(text =
" group dt freq diff
groupA 2016-03-21 1 NA
groupA 2016-03-22 1 1
groupA 2016-03-23 1 1
groupA 2016-03-26 2 NA
groupA 2016-03-28 1 NA
groupA 2016-04-01 3 1
groupA 2016-04-02 3 1
groupA 2016-04-03 5 1
groupB 2016-04-01 1 NA
groupB 2016-04-02 2 1 ", header = T)
df %>%
  mutate(dt = as.POSIXct(dt)) %>%
  group_by(group) %>%
  mutate(
    diff = pmin(c(1, diff(dt)), c(1, diff(lead(dt))), na.rm = TRUE),
    id = cumsum(c(TRUE, diff(diff) != 0) | diff > 1)) %>%
  group_by(group, id) %>%
  mutate(duration = sum(freq)) %>%
  ungroup() %>%
  select(-diff, -id)
## A tibble: 10 x 4
# group dt freq duration
# <fct> <dttm> <int> <int>
# 1 groupA 2016-03-21 00:00:00 1 3
# 2 groupA 2016-03-22 00:00:00 1 3
# 3 groupA 2016-03-23 00:00:00 1 3
# 4 groupA 2016-03-26 00:00:00 2 2
# 5 groupA 2016-03-28 00:00:00 1 1
# 6 groupA 2016-04-01 00:00:00 3 11
# 7 groupA 2016-04-02 00:00:00 3 11
# 8 groupA 2016-04-03 00:00:00 5 11
# 9 groupB 2016-04-01 00:00:00 1 3
#10 groupB 2016-04-02 00:00:00 2 3
Answered by Maurits Evers on January 5, 2022