Stack Overflow Asked by ip2018 on February 21, 2021
I have a dataframe as:
example_df = data.frame(Genes=c("A", "B"),
Sequence = c("MAMAMAM", "ABABABAB"),
Domains = c("DOMAIN 90..122; /note=ABC transporter 1; DOMAIN 129..231; /note=ABC transporter 1",
"DOMAIN 10..12; /note=C2H2 ZNF"))
Ideally within a dplyr pipe, I want to find all instances of the word ‘DOMAIN’ in the ‘Domains’ column, split the string at each one, and add a new row for each piece while keeping the values of all other columns. The desired output is:
output_df = data.frame(Genes=c("A", "A", "B"),
Sequence = c("MAMAMAM", "MAMAMAM", "ABABABAB"),
Domains = c("DOMAIN 90..122; /note=ABC transporter 1;",
"DOMAIN 129..231; /note=ABC transporter 1",
"DOMAIN 10..12; /note=C2H2 ZNF"))
I have no idea how to tackle this problem. Any help will be much appreciated. Thanks
You can use separate_rows to split rows on 'DOMAIN'.
library(dplyr)
library(tidyr)
example_df %>%
separate_rows(Domains, sep = '(?=DOMAIN)') %>%
filter(Domains != '')
# Genes Sequence Domains
# <chr> <chr> <chr>
#1 A MAMAMAM "DOMAIN 90..122; /note=ABC transporter 1; "
#2 A MAMAMAM "DOMAIN 129..231; /note=ABC transporter 1"
#3 B ABABABAB "DOMAIN 10..12; /note=C2H2 ZNF"
Correct answer by Ronak Shah on February 21, 2021
We can do this in base R with strsplit:
lst1 <- strsplit(example_df$Domains, "\\s+(?=DOMAIN)", perl = TRUE)
out <- transform(example_df[rep(seq_len(nrow(example_df)),
lengths(lst1)),], Domains = unlist(lst1))
row.names(out) <- NULL
out
# Genes Sequence Domains
#1 A MAMAMAM DOMAIN 90..122; /note=ABC transporter 1;
#2 A MAMAMAM DOMAIN 129..231; /note=ABC transporter 1
#3 B ABABABAB DOMAIN 10..12; /note=C2H2 ZNF
Or with separate_rows, by specifying the sep as one or more whitespace characters (\s+) that precede the 'DOMAIN' keyword:
library(dplyr)
library(tidyr)
example_df %>%
separate_rows(Domains, sep = "\\s+(?=DOMAIN)")
# A tibble: 3 x 3
# Genes Sequence Domains
# <chr> <chr> <chr>
#1 A MAMAMAM DOMAIN 90..122; /note=ABC transporter 1;
#2 A MAMAMAM DOMAIN 129..231; /note=ABC transporter 1
#3 B ABABABAB DOMAIN 10..12; /note=C2H2 ZNF
Answered by akrun on February 21, 2021
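As a rough, self-contained check of the lookahead split used in the answers above, here is a minimal base R sketch. It rebuilds the question's data, splits on the whitespace before each later 'DOMAIN', and expands the rows. The names pieces and n_domains are illustrative only and do not appear in the original answers; stringsAsFactors = FALSE is added on the assumption that the code might run on an R version older than 4.0.0, where the column would otherwise become a factor and strsplit would reject it.

```r
# Rebuild the question's data. stringsAsFactors = FALSE keeps the columns as
# character vectors on R < 4.0.0 (assumption: older R may be in use).
example_df <- data.frame(
  Genes = c("A", "B"),
  Sequence = c("MAMAMAM", "ABABABAB"),
  Domains = c("DOMAIN 90..122; /note=ABC transporter 1; DOMAIN 129..231; /note=ABC transporter 1",
              "DOMAIN 10..12; /note=C2H2 ZNF"),
  stringsAsFactors = FALSE
)

# Split on the whitespace immediately before each later "DOMAIN".
# The lookahead (?=DOMAIN) is not consumed, so the keyword stays in each piece.
# Note the doubled backslash: "\s" alone is not a valid escape in an R string.
pieces <- strsplit(example_df$Domains, "\\s+(?=DOMAIN)", perl = TRUE)

# Repeat each original row once per piece, then overwrite Domains.
out <- example_df[rep(seq_len(nrow(example_df)), lengths(pieces)), ]
out$Domains <- unlist(pieces)
row.names(out) <- NULL

# Number of pieces found per gene (hypothetical helper, for inspection only).
n_domains <- setNames(lengths(pieces), example_df$Genes)

print(out)
print(n_domains)
```

Because the whitespace is part of the match, it is dropped from the pieces, so no trailing space is left on the first row and no empty strings need filtering afterwards.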