Extract or subset hundreds of columns from a data frame

Data Science Asked on March 6, 2021

I need to extract many columns from a dataset.
I have a very large csv file with thousands of columns and rows, and I read it into R using:

mydata <- read.csv(file = "file.csv",header = TRUE,sep = ",",row.names = 1)

Each column is a gene name. I know how to extract specific columns from my R data.frame using basic code like this:

mydata[ , c("GeneName1", "GeneName2")]

But my question is, how do I pull hundreds of gene names, too many to type in? They are listed in a txt file.
I’m new, so please go easy on jargon and abbreviations.

One Answer

Hopefully I've understood your question correctly.

Assuming your text file has one gene name per line, like this:

GeneName1
GeneName2

You can read that in using the readLines() function:

cols <- readLines("name_of_text_file")

This returns cols as a character vector of those names:

> cols
[1] "GeneName1" "GeneName2"
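
Hand-edited text files often contain stray spaces, blank lines, or repeated names, and any of these can stop a name from matching a column. A minimal sketch of tidying the vector after reading it, assuming a made-up gene list written to a temporary file (the file contents and the gene_file name are illustrative, not from the question):

```r
# Hypothetical gene list, written to a temporary file purely for illustration
gene_file <- tempfile(fileext = ".txt")
writeLines(c("GeneName1", "  GeneName2 ", "", "GeneName1"), gene_file)

cols <- readLines(gene_file)  # one element per line of the file
cols <- trimws(cols)          # remove leading and trailing spaces
cols <- cols[cols != ""]      # drop blank lines
cols <- unique(cols)          # drop repeated names, keeping the first occurrence
```

With your own file, replace gene_file with the path to the text file.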

That vector can then be used to subset the data frame as per your example:

mydata[ , cols]
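
If any name in cols is not a column of mydata, the subset stops with an "undefined columns selected" error. A minimal sketch of checking for that first, using a small made-up data frame in place of the real one (the values, sample names, and the missing_cols, found_cols, and subset_data names are assumptions for illustration):

```r
# Small made-up data frame standing in for mydata
mydata <- data.frame(
  GeneName1 = c(1.2, 3.4),
  GeneName2 = c(5.6, 7.8),
  GeneName3 = c(9.0, 1.1),
  row.names = c("sample1", "sample2")
)

# Names as they might come from the text file; GeneName9 is not in the data
cols <- c("GeneName1", "GeneName3", "GeneName9")

missing_cols <- setdiff(cols, colnames(mydata))  # requested but not present
found_cols <- intersect(cols, colnames(mydata))  # requested and present

# drop = FALSE keeps the result a data frame even if only one column matches
subset_data <- mydata[ , found_cols, drop = FALSE]
```

One likely cause of missing names: by default read.csv() rewrites column names that are not valid R names, so a gene such as "HLA-A" would become "HLA.A". Adding check.names = FALSE to the read.csv() call keeps the names exactly as they appear in the file.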

Answered by Jerb on March 6, 2021
