Stack Overflow Asked by Näms on November 16, 2020
I have this data frame:
> df
date val cday
<date> <dbl> <dbl>
2019-12-01 1 NA
2019-12-02 0 NA
2019-12-03 1 NA
2019-12-04 0 1
2019-12-05 0 NA
2019-12-06 0 NA
2019-12-07 1 1
2019-12-08 2 NA
2019-12-09 3 NA
2019-12-10 3 NA
# … with 246 more rows
I would like to fill in df$cday by counting up continuously from each row where df$cday == 1, to a maximum of 30. If another df$cday == 1 occurs before 30 is reached, the count should start again from 1. All other NAs should be retained.
The result should look like this:
> df
date val cday
<date> <dbl> <dbl>
2019-12-01 1 NA
2019-12-02 0 NA
2019-12-03 1 NA
2019-12-04 0 1
2019-12-05 0 2
2019-12-06 0 3
2019-12-07 1 1
2019-12-08 2 2
2019-12-09 3 3
2019-12-10 3 4
# … with 246 more rows
There is probably an easy solution to this but I couldn’t find anything searching. I would be very thankful for some hints!
One way would be:
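library(dplyr)
df %>%
  # start a new group at every non-missing cday marker
  group_by(idx = cumsum(!is.na(cday))) %>%
  # number the rows within each group; the leading all-NA group stays NA
  mutate(cday = case_when(!all(is.na(cday)) ~ row_number())) %>%
  ungroup() %>%
  select(-idx)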
Output (with the visible part of your example):
# A tibble: 10 x 3
date val cday
<fct> <int> <int>
1 2019-12-01 1 NA
2 2019-12-02 0 NA
3 2019-12-03 1 NA
4 2019-12-04 0 1
5 2019-12-05 0 2
6 2019-12-06 0 3
7 2019-12-07 1 1
8 2019-12-08 2 2
9 2019-12-09 3 3
10 2019-12-10 3 4
The above code assumes that all your non-missing cases currently are 1. If sequences can start with other integers as well, you could adjust with:
df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(!all(is.na(cday)) ~ cday[1] + (row_number() - 1))) %>%
  ungroup() %>%
  select(-idx)
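Neither snippet above enforces the maximum of 30 mentioned in the question. A minimal base R sketch of one way to add that cap follows. The function name `fill_cday` and the parameter `max_len` are hypothetical, and the code assumes every marker is a non-missing value and that rows beyond the cap should stay NA.

```r
# Hypothetical helper: count up from each non-NA marker, at most max_len rows.
fill_cday <- function(cday, max_len = 30L) {
  # Run id: increases by 1 at every marker, 0 before the first marker
  idx <- cumsum(!is.na(cday))
  # Position of each row within its run (1, 2, 3, ...)
  pos <- ave(seq_along(cday), idx, FUN = seq_along)
  # Keep the count only inside a run and up to the cap; everything else is NA
  as.integer(ifelse(idx > 0 & pos <= max_len, pos, NA_integer_))
}

cday <- c(NA, NA, NA, 1, NA, NA, 1, NA, NA, NA)
fill_cday(cday)
# NA NA NA 1 2 3 1 2 3 4
fill_cday(cday, max_len = 3L)
# NA NA NA 1 2 3 1 2 3 NA
```

With the data frame from the question this could be applied as `df$cday <- fill_cday(df$cday)`, or inside `mutate()`.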
Correct answer by arg0naut91 on November 16, 2020
We can use rowid from data.table:
library(dplyr)
library(tidyr)       # provides replace_na()
library(data.table)  # provides rowid()
df %>%
  mutate(cday = replace(rowid(cumsum(replace_na(cday, 0))),
                        seq_len(which.max(!is.na(cday)) - 1), NA))
# date val cday
#1 2019-12-01 1 NA
#2 2019-12-02 0 NA
#3 2019-12-03 1 NA
#4 2019-12-04 0 1
#5 2019-12-05 0 2
#6 2019-12-06 0 3
#7 2019-12-07 1 1
#8 2019-12-08 2 2
#9 2019-12-09 3 3
#10 2019-12-10 3 4
# data
df <- structure(list(date = c("2019-12-01", "2019-12-02", "2019-12-03",
"2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08",
"2019-12-09", "2019-12-10"), val = c(1L, 0L, 1L, 0L, 0L, 0L,
1L, 2L, 3L, 3L), cday = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA,
NA)), class = "data.frame", row.names = c(NA, -10L))
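To see why the rowid one-liner works, it may help to look at the intermediate values. The sketch below uses base R only, with `ave()` as a stand-in for `rowid()` (an assumption made here to avoid dependencies, not part of the answer). Like the one-liner, it relies on every marker being 1 so that the cumulative sum changes at each marker.

```r
cday <- c(NA, NA, NA, 1, NA, NA, 1, NA, NA, NA)

# Same effect as replace_na(cday, 0): NAs become 0
filled <- ifelse(is.na(cday), 0, cday)

# The running sum jumps at every marker, so it labels each run
run_id <- cumsum(filled)
# 0 0 0 1 1 1 2 2 2 2

# Index of the first marker; rows before it must remain NA
first_marker <- which.max(!is.na(cday))
# 4
leading <- seq_len(first_marker - 1)
# 1 2 3

# Base R stand-in for rowid(run_id): position within each run
counts <- ave(seq_along(run_id), run_id, FUN = seq_along)
counts[leading] <- NA
counts
# NA NA NA 1 2 3 1 2 3 4
```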
Answered by akrun on November 16, 2020