
Complete list continuously in an existing df

Stack Overflow Asked by Näms on November 16, 2020

I have this data frame:

> df
   date         val  cday
   <date>     <dbl> <dbl>
  2019-12-01     1     NA
  2019-12-02     0     NA
  2019-12-03     1     NA
  2019-12-04     0     1
  2019-12-05     0     NA
  2019-12-06     0     NA
  2019-12-07     1     1
  2019-12-08     2     NA
  2019-12-09     3     NA
  2019-12-10     3     NA
# … with 246 more rows

I would like to fill in df$cday with a running count that starts at each df$cday == 1 and goes up to a maximum of 30. If another df$cday == 1 occurs before the count reaches 30, the count should start from 1 again. All other NAs should be retained.

The result should look like this:

> df
   date         val  cday
   <date>     <dbl> <dbl>
  2019-12-01     1     NA
  2019-12-02     0     NA
  2019-12-03     1     NA
  2019-12-04     0     1
  2019-12-05     0     2
  2019-12-06     0     3
  2019-12-07     1     1
  2019-12-08     2     2
  2019-12-09     3     3
  2019-12-10     3     4
# … with 246 more rows

There is probably an easy solution to this, but I couldn’t find anything by searching. I would be very thankful for some hints!

2 Answers

One way would be:

library(dplyr)

df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(!all(is.na(cday)) ~ row_number())) %>%
  ungroup() %>%
  select(-idx)

Output (with the visible part of your example):

# A tibble: 10 x 3
   date         val  cday
   <fct>      <int> <int>
 1 2019-12-01     1    NA
 2 2019-12-02     0    NA
 3 2019-12-03     1    NA
 4 2019-12-04     0     1
 5 2019-12-05     0     2
 6 2019-12-06     0     3
 7 2019-12-07     1     1
 8 2019-12-08     2     2
 9 2019-12-09     3     3
10 2019-12-10     3     4
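One way to see why the grouping works is to look at the running index on its own. The following is a minimal base R sketch, not the code from this answer: it swaps in ave() for the dplyr pipeline, the name filled is only illustrative, and it assumes the ten visible cday values from the question.

```r
# The ten visible cday values from the question
cday <- c(NA, NA, NA, 1, NA, NA, 1, NA, NA, NA)

# Each non-missing value bumps the index by one, so every start
# marker opens a new group; the leading NAs fall into group 0.
idx <- cumsum(!is.na(cday))

# Number the rows within each group (base R stand-in for row_number())
filled <- ave(seq_along(cday), idx, FUN = seq_along)

# Group 0 has no start marker, so it stays missing
filled[idx == 0] <- NA

print(idx)
print(filled)
```

Here idx plays the role of the temporary idx column in the dplyr version, and blanking out group 0 corresponds to the !all(is.na(cday)) condition.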

The above code assumes that all your non-missing values are currently 1. If sequences can start with other integers as well, you could adjust with:

df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(!all(is.na(cday)) ~ cday[1] + (row_number() - 1))) %>%
  ungroup() %>%
  select(-idx)
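The snippets above do not enforce the maximum of 30 mentioned in the question. A minimal sketch of one possible reading follows, in which the count stops at 30 and later rows stay NA until the next 1. The toy data and the name max_len are hypothetical and only serve to illustrate the idea.

```r
library(dplyr)

# Hypothetical toy data: one start marker followed by 34 missing rows
df <- data.frame(cday = c(NA, 1, rep(NA, 34)))

max_len <- 30  # assumed cap, taken from the wording of the question

res <- df %>%
  group_by(idx = cumsum(!is.na(cday))) %>%
  mutate(cday = case_when(
    # count only inside groups that have a start marker,
    # and only up to max_len rows; everything else becomes NA
    !all(is.na(cday)) & row_number() <= max_len ~ row_number()
  )) %>%
  ungroup() %>%
  select(-idx)
```

With this data, rows 2 to 31 would be numbered 1 to 30 and the remaining rows would stay NA. If the count should instead wrap around after 30, the condition would need to change accordingly.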

Correct answer by arg0naut91 on November 16, 2020

We can use rowid from data.table:

library(dplyr)
library(tidyr)       # provides replace_na()
library(data.table)  # provides rowid()
df %>% 
  mutate(cday = replace(rowid(cumsum(replace_na(cday, 0))), 
        seq_len(which.max(!is.na(cday))-1), NA))
#        date val cday
#1  2019-12-01   1   NA
#2  2019-12-02   0   NA
#3  2019-12-03   1   NA
#4  2019-12-04   0    1
#5  2019-12-05   0    2
#6  2019-12-06   0    3
#7  2019-12-07   1    1
#8  2019-12-08   2    2
#9  2019-12-09   3    3
#10 2019-12-10   3    4

data

df <- structure(list(date = c("2019-12-01", "2019-12-02", "2019-12-03", 
"2019-12-04", "2019-12-05", "2019-12-06", "2019-12-07", "2019-12-08", 
"2019-12-09", "2019-12-10"), val = c(1L, 0L, 1L, 0L, 0L, 0L, 
1L, 2L, 3L, 3L), cday = c(NA, NA, NA, 1L, NA, NA, 1L, NA, NA, 
NA)), class = "data.frame", row.names = c(NA, -10L))

Answered by akrun on November 16, 2020
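The one-liner in this answer can be easier to follow when split into steps. The sketch below is an unpacked variant rather than the original code: it works on a plain vector, uses ifelse() in place of replace_na(), and the names grp, counter and first_start are illustrative. Like the original, it assumes every non-missing cday is 1.

```r
library(data.table)

# The ten visible cday values from the question
cday <- c(NA, NA, NA, 1, NA, NA, 1, NA, NA, NA)

# Treat NA as 0 so the cumulative sum only increases at start markers
grp <- cumsum(ifelse(is.na(cday), 0, cday))

# rowid() numbers the occurrences of each distinct value in order
counter <- rowid(grp)

# Position of the first start marker; everything before it goes back to NA
first_start <- which.max(!is.na(cday))
counter[seq_len(first_start - 1)] <- NA

print(counter)
```

Note that which.max() returns 1 when there is no start marker at all, so in that case nothing is reset to NA; that edge case would need separate handling.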
