
Full join of several data frames inside a nested data frame

Stack Overflow Asked by Seyed Hosseini on February 9, 2021

I have a nested data frame (as below) that was created with a map call, and I have not unnested it yet. In the example below, four data frames are nested by year inside another data frame; in other words, the data frame I have looks like the nested one built below. How do I full join the data frames of each year separately (using map again) and then unnest them into a final data set? I want to full join the 2010 data frames (df1, df2) with each other, then full join the 2011 data frames (df3, df4), and finally append the two joined data sets.

library(dplyr)  # bind_rows, group_by, full_join, %>%
library(tidyr)  # nest

df1 <- data.frame(year = c(2010, 2010, 2010, 2010), id = c(1, 2, 3, 4), name = c("A", "B", "C", "D"))
df2 <- data.frame(year = c(2010, 2010, 2010, 2010), id = c(1, 2, 3, 4), age = c(21, 22, 25, 29))
df3 <- data.frame(year = c(2011, 2011, 2011, 2011), id = c(5, 6, 7, 8), name = c("W", "X", "Y", "Z"))
df4 <- data.frame(year = c(2011, 2011, 2011, 2011), id = c(5, 6, 7, 8), age = c(30, 35, 40, 50))

df_netsed <- bind_rows(df1,df2,df3,df4) %>%
  group_by(year) %>%
  nest()

Here is what I expect to see:

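# join on both shared columns so that year is not duplicated as year.x / year.y
df_expected <- full_join(df1, df2, by = c("year", "id")) %>%
  bind_rows(full_join(df3, df4, by = c("year", "id")))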

2 Answers

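After bind_rows, each id has two rows per year: one with name filled and age NA, the other with age filled and name NA. You can group_by id within each nested data frame and drop the NA values to collapse them into one row.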

library(tidyverse)

df_netsed %>%
  ungroup() %>%
  # within each year's data, collapse the rows of an id by dropping the NAs
  mutate(data = map(data,
                ~ .x %>% group_by(id) %>% summarise(across(everything(), na.omit)))) %>%
  unnest(data)

#   year    id name    age
#  <dbl> <dbl> <chr> <dbl>
#1  2010     1 A        21
#2  2010     2 B        22
#3  2010     3 C        25
#4  2010     4 D        29
#5  2011     5 W        30
#6  2011     6 X        35
#7  2011     7 Y        40
#8  2011     8 Z        50

Answered by Ronak Shah on February 9, 2021

Update per OP comments
You don't actually need to nest; group_by should be enough:

library(dplyr)

bind_rows(df1, df2, df3, df4) %>%
  group_by(year, id) %>%
  summarise(across(everything(), na.omit))

# Groups:   year [2]
   year    id name    age
  <dbl> <dbl> <fct> <dbl>
1  2010     1 A        21
2  2010     2 B        22
3  2010     3 C        25
4  2010     4 D        29
5  2011     5 W        30
6  2011     6 X        35
7  2011     7 Y        40
8  2011     8 Z        50
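
The na.omit approach above relies on every id having exactly one non-missing value per column. As a hedged sketch of what could be done when that does not hold, the helper below (first_non_na, a hypothetical name that is not part of dplyr) keeps the first non-missing value and falls back to NA when a group has none. The small data set is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: first non-missing value of x, or NA if there is none
first_non_na <- function(x) x[!is.na(x)][1]

# Illustrative data: id 2 has a name but no age row
long <- bind_rows(
  data.frame(year = 2010, id = c(1, 2), name = c("A", "B")),
  data.frame(year = 2010, id = 1, age = 21)
)

result <- long %>%
  group_by(year, id) %>%
  # one row per (year, id); columns without a value stay NA
  summarise(across(everything(), first_non_na), .groups = "drop")
```

With this helper, id 2 is kept with a missing age, which matches what a full join would return for an unmatched row.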

Previous
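With dplyr, you can join each pair of data frames first, then bind_rows. Without a by argument the join uses the shared columns (year and id); for these data inner_join and full_join return the same rows: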

library(dplyr)

inner_join(df1, df2) %>% bind_rows(inner_join(df3, df4))

  year id name age
1 2010  1    A  21
2 2010  2    B  22
3 2010  3    C  25
4 2010  4    D  29
5 2011  5    W  30
6 2011  6    X  35
7 2011  7    Y  40
8 2011  8    Z  50
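
The pairwise join above can be generalised with map, which is what the question asks about. This is a minimal sketch, assuming the per-year data frames are still available as separate objects; the list by_year is a hypothetical layout for illustration, not something taken from the question.

```r
library(dplyr)
library(purrr)

df1 <- data.frame(year = rep(2010, 4), id = 1:4, name = c("A", "B", "C", "D"))
df2 <- data.frame(year = rep(2010, 4), id = 1:4, age = c(21, 22, 25, 29))
df3 <- data.frame(year = rep(2011, 4), id = 5:8, name = c("W", "X", "Y", "Z"))
df4 <- data.frame(year = rep(2011, 4), id = 5:8, age = c(30, 35, 40, 50))

# Hypothetical layout: one list element per year, each holding that year's data frames
by_year <- list(`2010` = list(df1, df2), `2011` = list(df3, df4))

result <- by_year %>%
  # full join all data frames of a year, two at a time
  map(~ reduce(.x, function(a, b) full_join(a, b, by = c("year", "id")))) %>%
  # append the per-year results
  bind_rows()
```

Because reduce folds over the whole list, the same code would also handle a year that has more than two data frames.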

Answered by andrew_reece on February 9, 2021
