Stack Overflow Asked by Seyed Hosseini on February 9, 2021
I have a nested data frame (shown below) that was created with a map function, and I have not unnested it yet. In the example below, four data frames are nested by year inside another data frame, so the data frame I have in hand looks like df_netsed. How do I full join the data frames of each year separately (using map again) and then unnest them into a final data set? I am trying to full join the data frames for 2010 (df1, df2) with each other, then full join the data frames for 2011 (df3, df4), and finally append the two joined data sets.
library(tidyverse)  # provides bind_rows(), group_by(), nest() and %>%
df1 <- data.frame(year = c(2010, 2010, 2010, 2010), id = c(1, 2, 3, 4), name = c("A", "B", "C", "D"))
df2 <- data.frame(year = c(2010, 2010, 2010, 2010), id = c(1, 2, 3, 4), age = c(21, 22, 25, 29))
df3 <- data.frame(year = c(2011, 2011, 2011, 2011), id = c(5, 6, 7, 8), name = c("W", "X", "Y", "Z"))
df4 <- data.frame(year = c(2011, 2011, 2011, 2011), id = c(5, 6, 7, 8), age = c(30, 35, 40, 50))
# Stack the four frames, then nest them into one row per year
df_netsed <- bind_rows(df1, df2, df3, df4) %>%
  group_by(year) %>%
  nest()
Here is what I expect to see:
# Join on both keys so that year is not duplicated as year.x / year.y
df_expected <- full_join(df1, df2, by = c("year", "id")) %>%
  bind_rows(full_join(df3, df4, by = c("year", "id")))
You can try to group_by id and drop the NA values within each nested data frame.
library(tidyverse)
df_netsed %>%
  ungroup() %>%
  # Within each year's data frame, collapse the rows of every id by dropping NAs
  mutate(data = map(data,
    ~ .x %>% group_by(id) %>% summarise(across(everything(), na.omit)))) %>%
  unnest(data)
# year id name age
# <dbl> <dbl> <chr> <dbl>
#1 2010 1 A 21
#2 2010 2 B 22
#3 2010 3 C 25
#4 2010 4 D 29
#5 2011 5 W 30
#6 2011 6 X 35
#7 2011 7 Y 40
#8 2011 8 Z 50
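As a rough illustration of why collapsing with group_by works in place of a join: bind_rows fills the columns a frame lacks with NA, so each id ends up with one non-missing value per column. The sketch below uses a helper named first_non_na, which is hypothetical (not part of dplyr) and is an assumption of this example. It returns NA when an id is missing from one of the frames, which is closer to full join behaviour than na.omit, since na.omit would return a zero-length result in that case.

```r
library(dplyr)

df1 <- data.frame(year = rep(2010, 4), id = 1:4, name = c("A", "B", "C", "D"))
df2 <- data.frame(year = rep(2010, 4), id = 1:4, age = c(21, 22, 25, 29))

# bind_rows() stacks the frames; columns absent from a frame are filled with NA
stacked <- bind_rows(df1, df2)

# Hypothetical helper: first non-missing value, or NA if every value is missing
first_non_na <- function(x) x[!is.na(x)][1]

# One row per (year, id), keeping the non-missing value of each column
collapsed <- stacked %>%
  group_by(year, id) %>%
  summarise(across(everything(), first_non_na), .groups = "drop")
```

This assumes each id has at most one non-missing value per column; if an id can carry several different values, a real join is the safer choice.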
Answered by Ronak Shah on February 9, 2021
Update per OP comments
You don't actually need to nest; group_by should be enough:
library(dplyr)
bind_rows(df1, df2, df3, df4) %>%
  group_by(year, id) %>%
  summarise(across(everything(), na.omit))
# Groups: year [2]
year id name age
<dbl> <dbl> <fct> <dbl>
1 2010 1 A 21
2 2010 2 B 22
3 2010 3 C 25
4 2010 4 D 29
5 2011 5 W 30
6 2011 6 X 35
7 2011 7 Y 40
8 2011 8 Z 50
Previous
With dplyr, you can join each pair of data frames first, then bind_rows:
library(dplyr)
inner_join(df1, df2) %>% bind_rows(inner_join(df3, df4))
year id name age
1 2010 1 A 21
2 2010 2 B 22
3 2010 3 C 25
4 2010 4 D 29
5 2011 5 W 30
6 2011 6 X 35
7 2011 7 Y 40
8 2011 8 Z 50
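If there are more than two frames per year, the same join-then-bind idea can be sketched with purrr. This is only one possible approach, under the assumption that the frames are still available as a plain list (called frames here, a name chosen for this example) and that every frame holds a single year. It uses full_join, as the question asked, rather than inner_join.

```r
library(dplyr)
library(purrr)

df1 <- data.frame(year = rep(2010, 4), id = 1:4, name = c("A", "B", "C", "D"))
df2 <- data.frame(year = rep(2010, 4), id = 1:4, age = c(21, 22, 25, 29))
df3 <- data.frame(year = rep(2011, 4), id = 5:8, name = c("W", "X", "Y", "Z"))
df4 <- data.frame(year = rep(2011, 4), id = 5:8, age = c(30, 35, 40, 50))

frames <- list(df1, df2, df3, df4)

# Assumption: each frame covers exactly one year, so its first value identifies it
frame_year <- map_dbl(frames, function(d) d$year[1])

result <- frames %>%
  split(frame_year) %>%                                      # one list of frames per year
  map(function(fs) reduce(fs, full_join, by = c("year", "id"))) %>%  # join within a year
  bind_rows()                                                # append the yearly results
```

Joining on both year and id keeps a single year column in the result.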
Answered by andrew_reece on February 9, 2021