Setdiff within mutate function

Question

I have a data frame with three columns. Each row contains three unique numbers between 1 and 5 (inclusive).
df <- data.frame(a=c(1,4,2),
                 b=c(5,3,1),
                 c=c(3,1,5))

I want to use mutate to create two additional columns that, for each row, contain in ascending order the two numbers between 1 and 5 that do not appear in the initial three columns. The desired data frame in the example would be:
df2 <- data.frame(a=c(1,4,2),
                  b=c(5,3,1),
                  c=c(3,1,5),
                  d=c(2,2,3),
                  e=c(4,5,4))

I tried the mutate call below, using setdiff, to accomplish this, but it returned NAs rather than the values I was looking for:
df <- df %>% mutate(d=setdiff(c(a,b,c),c(1:5))[1],
                    e=setdiff(c(a,b,c),c(1:5))[2])

I can get around this by looping through each row (or using an apply function) but would prefer a mutate approach if possible.
Thank you for your help!

r2evans · Answer

Base R:
cbind(df, t(apply(df, 1, setdiff, x = 1:5)))
#   a b c 1 2
# 1 1 5 3 2 4
# 2 4 3 1 2 5
# 3 2 1 5 3 4

Warning: apply converts the data frame to a matrix internally, so if there are any non-numeric columns it will happily up-convert everything to a common type (typically character).
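To illustrate the warning, here is a minimal sketch that adds a hypothetical character column `id` (not part of the original data) and restricts apply to the three numeric columns. The column selection is just one way to sidestep the conversion, offered as an assumption rather than as part of the answer above.

```r
df <- data.frame(a = c(1, 4, 2),
                 b = c(5, 3, 1),
                 c = c(3, 1, 5))

# Hypothetical extra column, added only to show the conversion
df_chr <- cbind(df, id = c("x", "y", "z"))

# apply() calls as.matrix() on a data frame; with a character column
# present, the whole matrix becomes character
matrix_type <- typeof(as.matrix(df_chr))

# Pass only the numeric columns to apply() so the rows stay numeric
out <- cbind(df_chr, t(apply(df_chr[c("a", "b", "c")], 1, setdiff, x = 1:5)))
```

Here `matrix_type` is "character", while the two new columns in `out` are computed from the numeric columns only.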

akrun · Answer

We can use pmap to loop over the rows, create a list column, and then unnest it to create two new columns:
library(dplyr)
library(purrr)
library(tidyr)
df %>%
    mutate(out = pmap(., ~ setdiff(1:5, c(...)) %>%
                           as.list %>%
                           set_names(c('d', 'e')))) %>%
    unnest_wider(c(out))
# A tibble: 3 x 5
#      a     b     c     d     e
#1     1     5     3     2     4
#2     4     3     1     2     5
#3     2     1     5     3     4

Or using base R:
df[c('d', 'e')] <- do.call(rbind, lapply(asplit(df, 1), function(x) setdiff(1:5, x)))
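For reference, a sketch of why the attempt in the question returns NA, along with one possible way to stay inside mutate. Within mutate, a, b and c are whole columns, and setdiff(c(a, b, c), 1:5) asks for the values in those columns that are missing from 1:5. That set is empty, so indexing it with [1] or [2] gives NA. The version below assumes dplyr 1.0.0 or later for rowwise() and swaps the setdiff arguments. It is one option among several, not the method either answer proposes.

```r
library(dplyr)

df <- data.frame(a = c(1, 4, 2),
                 b = c(5, 3, 1),
                 c = c(3, 1, 5))

# Original attempt, reproduced outside mutate: every value of a, b, c
# is in 1:5, so the difference is empty and [1] / [2] yield NA
n_left <- length(setdiff(c(df$a, df$b, df$c), 1:5))

# Row-by-row version: a, b, c are single values inside rowwise(),
# and 1:5 comes first so the missing numbers are returned
res <- df %>%
    rowwise() %>%
    mutate(d = sort(setdiff(1:5, c(a, b, c)))[1],
           e = sort(setdiff(1:5, c(a, b, c)))[2]) %>%
    ungroup()
```

The sort() call is defensive: setdiff keeps the order of its first argument, so 1:5 already yields ascending values.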
