Bioinformatics Asked on April 3, 2021
I am stuck on how to compute correlations between two independent data sets with common row and column names.
A and B are datasets that contain as many rows as genes and as many columns as samples.
The rows in A and B represent a common set of genes but measured in two different tissues.
The columns represent measurements in the same 5 samples in both A and B.
I want to compute the correlation between the set of genes in A and B, to see whether the same genes are correlated between the two tissues.
Since the matrix would be big in my actual data, I only want to retain correlation coefficients higher than 0.5.
Here I simulate the data set.
set.seed(1)
A <- data.frame(rnorm(100),
                rnorm(100),
                rnorm(100),
                rnorm(100),
                rnorm(100))
row.names(A) <- paste0("G_", 1:100)
colnames(A) <- paste0("M_", 1:5)

set.seed(42)
B <- data.frame(rnorm(100),
                rnorm(100),
                rnorm(100),
                rnorm(100),
                rnorm(100))
row.names(B) <- paste0("G_", 1:100)
colnames(B) <- paste0("I_", 1:5)
Thank you!
You can use mapply(), with ta and tb being the transposed data frames of your A and B data frames, respectively:
> mapply(cor, ta, tb)[mapply(cor, ta, tb) > 0.5]
      G_3       G_5       G_9      G_10      G_11      G_15      G_20      G_23      G_25      G_26      G_33      G_40      G_43      G_48      G_57      G_60
0.5346591 0.8066507 0.8379777 0.6752681 0.7221359 0.5285787 0.7333045 0.5627962 0.6533379 0.7256878 0.5996492 0.6486557 0.5108215 0.7386332 0.6596823 0.6919915
     G_63      G_72      G_76      G_80      G_81      G_90      G_97      G_98      G_99
0.5589583 0.8391917 0.7608801 0.8003665 0.6364557 0.5030968 0.7298439 0.5693024 0.5709411
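The answer does not show how ta and tb are built, so here is a minimal sketch under one assumption: that they come from as.data.frame(t(...)). The conversion matters because t() alone returns a matrix, and mapply() would then iterate over single cells rather than over columns (genes). The sketch also stores the correlations so they are computed only once; the names r and keep are illustrative and not from the original answer.

```r
# Same simulation as in the question, written compactly.
set.seed(1)
A <- as.data.frame(replicate(5, rnorm(100)))
rownames(A) <- paste0("G_", 1:100)
colnames(A) <- paste0("M_", 1:5)

set.seed(42)
B <- as.data.frame(replicate(5, rnorm(100)))
rownames(B) <- paste0("G_", 1:100)
colnames(B) <- paste0("I_", 1:5)

# Assumption: ta/tb are transposed *data frames*, one column per gene.
ta <- as.data.frame(t(A))
tb <- as.data.frame(t(B))

# Pearson correlation of each gene in A with the same gene in B
# across the 5 samples, computed once and then filtered.
r <- mapply(cor, ta, tb)
keep <- r[r > 0.5]
```

To retain strong negative correlations as well, the filter could be r[abs(r) > 0.5]. Note that with only 5 samples, unrelated random data often passes this threshold by chance, as the 25 genes in the output above suggest.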
Correct answer by haci on April 3, 2021
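The question's mention of a big matrix suggests that all gene-by-gene pairs, not only matching genes, may be of interest. If so, one possible sketch (not part of the accepted answer) uses the fact that cor() accepts two matrices and correlates every column of the first with every column of the second. The names cross, hits, result, and same_gene are illustrative.

```r
# Same simulation as in the question, written compactly.
set.seed(1)
A <- as.data.frame(replicate(5, rnorm(100)))
rownames(A) <- paste0("G_", 1:100)

set.seed(42)
B <- as.data.frame(replicate(5, rnorm(100)))
rownames(B) <- paste0("G_", 1:100)

# 100 x 100 matrix: rows are genes in A, columns are genes in B.
cross <- cor(t(A), t(B))

# Row/column indices of every pair above the threshold.
hits <- which(cross > 0.5, arr.ind = TRUE)
result <- data.frame(
  gene_A = rownames(cross)[hits[, "row"]],
  gene_B = colnames(cross)[hits[, "col"]],
  r      = cross[hits]
)

# The diagonal holds the same-gene correlations.
same_gene <- diag(cross)
```

The full matrix has one entry per gene pair, so for real data with many thousands of genes it may be necessary to process the genes in blocks rather than all at once.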