Complete list continuously in an existing df

Question

I have this data frame:

> df
date         val  cday
2019-12-01     1    NA
2019-12-02     0    NA
2019-12-03     1    NA
2019-12-04     0     1
2019-12-05     0    NA
2019-12-06     0    NA
2019-12-07     1     1
2019-12-08     2    NA
2019-12-09     3    NA
2019-12-10     3    NA
# … with 246 more rows

I would like to complete df$cday continuously, counting up from each df$cday == 1 to a maximum of 30. If another df$cday == 1 occurs before 30 is reached, the count should start again from 1. All other NAs should be retained.

The result should look like this:

> df
date         val  cday
2019-12-01     1    NA
2019-12-02     0    NA
2019-12-03     1    NA
2019-12-04     0     1
2019-12-05     0     2
2019-12-06     0     3
2019-12-07     1     1
2019-12-08     2     2
2019-12-09     3     3
2019-12-10     3     4
# … with 246 more rows

There is probably an easy solution to this, but I couldn't find anything by searching. I would be very thankful for some hints!

arg0naut91 · Accepted Answer

One way would be:

library(dplyr)

df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(!all(is.na(cday)) ~ row_number())) %>%
  ungroup() %>%
  select(-idx)

Output (with the visible part of your example):

# A tibble: 10 x 3
   date         val  cday
 1 2019-12-01     1    NA
 2 2019-12-02     0    NA
 3 2019-12-03     1    NA
 4 2019-12-04     0     1
 5 2019-12-05     0     2
 6 2019-12-06     0     3
 7 2019-12-07     1     1
 8 2019-12-08     2     2
 9 2019-12-09     3     3
10 2019-12-10     3     4

The above code assumes that all your non-missing cases are currently 1. If sequences can start with other integers as well, you could adjust with:

df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(!all(is.na(cday)) ~ cday[1] + (row_number() - 1))) %>%
  ungroup() %>%
  select(-idx)
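The code above does not enforce the maximum of 30 mentioned in the question. A minimal base R sketch of one way to add it follows. It assumes that "a max of 30" means positions beyond the 30th in a run go back to NA, and that every non-missing cday marks a new start. The helper name fill_cday and its max_run argument are hypothetical and do not come from the answer.

```r
# Hypothetical helper: number the rows of each run and blank out
# anything past `max_run` (assumed reading of "a max of 30").
fill_cday <- function(cday, max_run = 30L) {
  # Run id: 0 before the first start, then 1, 2, ... at each non-NA value
  run_id <- cumsum(!is.na(cday))
  # Position of every row within its run
  pos <- as.integer(ave(seq_along(cday), run_id, FUN = seq_along))
  # Keep the leading NAs and drop positions beyond the cap
  pos[run_id == 0L | pos > max_run] <- NA_integer_
  pos
}

cday <- c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA)

fill_cday(cday)
# NA NA NA  1  2  3  1  2  3  4

fill_cday(cday, max_run = 3L)   # small cap, only to make the effect visible
# NA NA NA  1  2  3  1  2  3 NA
```

With the data frame from the question this would be applied as df$cday <- fill_cday(df$cday).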

akrun · Answer

We can use rowid from data.table (replace_na comes from tidyr, so that package is loaded as well):
library(dplyr)
library(tidyr)
library(data.table)
df %>% 
  mutate(cday = replace(rowid(cumsum(replace_na(cday, 0))), 
        seq_len(which.max(!is.na(cday))-1), NA))
#        date val cday
#1  2019-12-01   1   NA
#2  2019-12-02   0   NA
#3  2019-12-03   1   NA
#4  2019-12-04   0    1
#5  2019-12-05   0    2
#6  2019-12-06   0    3
#7  2019-12-07   1    1
#8  2019-12-08   2    2
#9  2019-12-09   3    3
#10 2019-12-10   3    4
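To see why the one-liner works, it may help to look at the intermediate vectors. The sketch below unpacks it on the cday column alone. The variable names are illustrative, and ifelse() is used here only as a base R stand-in for replace_na(cday, 0).

```r
library(data.table)  # for rowid()

cday <- c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, NA)

# Turn NA into 0 so that only the start markers contribute to the sum
starts <- ifelse(is.na(cday), 0L, cday)
# 0 0 0 1 0 0 1 0 0 0

# The running sum increases at every start, giving one id per run
grp <- cumsum(starts)
# 0 0 0 1 1 1 2 2 2 2

# rowid() numbers the rows within each id
counter <- rowid(grp)
# 1 2 3 1 2 3 1 2 3 4

# Rows before the first non-NA value must stay NA
leading <- seq_len(which.max(!is.na(cday)) - 1)
# 1 2 3

result <- replace(counter, leading, NA)
# NA NA NA  1  2  3  1  2  3  4
```

One caveat: if cday contains no non-missing value at all, which.max() returns 1, so leading is empty and nothing is reset to NA. A guard such as checking any(!is.na(cday)) first would be needed for that case.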

data
df <- structure(list(date = c("2019-12-01", "2019-12-02", "2019-12-03", 
"2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08", 
"2019-12-09", "2019-12-10"), val = c(1L, 0L, 1L, 0L, 0L, 0L, 
1L, 2L, 3L, 3L), cday = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, 
NA)), class = "data.frame", row.names = c(NA, -10L))
