How to perform quantile normalization on RNA-seq counts

Bioinformatics Asked on February 8, 2021

I have RNA-seq read count data and want to perform quantile normalization. Could you please help me do this? I tried some scripts in R, but they didn't work. I want the output as a matrix.

  gene_id           SRR896664   SRR896663   SRR896665
  ENSG00000000003   46106       36353       40614
  ENSG00000000005   198         399         1200
  ENSG00000000419   40364       37769       40849
  ENSG00000000457   18924       16211       16057
  ENSG00000000460   31040       28888       29901
  ENSG00000000938   200         0           394
  ENSG00000000971   14935       14353       12522

The script I tried is:

  data <- read.csv("testquantile.csv",header=T)
  head(data)
  rownames(data) <- data[,1]
  data_mat <- data.matrix(data[,-1]) 
  head(data_mat)
  data_norm <- normalize.quantiles(m, copy = TRUE)

3 Answers

There are many tutorials on quantile normalization online, for example here. That tutorial defines a function that performs quantile normalization. Here is an example of that function applied to your small data set.

data
          gene_id SRR896664 SRR896663 SRR896665
1 ENSG00000000003     46106     36353     40614
2 ENSG00000000005       198       399      1200
3 ENSG00000000419     40364     37769     40849
4 ENSG00000000457     18924     16211     16057
5 ENSG00000000460     31040     28888     29901
6 ENSG00000000938       200         0       394
7 ENSG00000000971     14935     14353     12522

rownames(data) <- data$gene_id

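quantile_normalisation <- function(df){
  # rank of each value within its column (ties share the lowest rank)
  df_rank <- apply(df,2,rank,ties.method="min")
  # sort each column, then average across columns at each rank
  df_sorted <- data.frame(apply(df, 2, sort))
  df_mean <- apply(df_sorted, 1, mean)

  # look up the mean that belongs to a given rank
  index_to_mean <- function(my_index, my_mean){
    return(my_mean[my_index])
  }

  # replace every value by the mean of its rank
  df_final <- apply(df_rank, 2, index_to_mean, my_mean=df_mean)
  rownames(df_final) <- rownames(df)
  return(df_final)
}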

new_data <- quantile_normalisation(data[,2:4])
new_data
                 SRR896664  SRR896663  SRR896665
ENSG00000000003 41574.6667 39110.3333 39110.3333
ENSG00000000005   197.3333   599.6667   599.6667
ENSG00000000419 39110.3333 41574.6667 41574.6667
ENSG00000000457 17064.0000 17064.0000 17064.0000
ENSG00000000460 29943.0000 29943.0000 29943.0000
ENSG00000000938   599.6667   197.3333   197.3333
ENSG00000000971 13936.6667 13936.6667 13936.6667
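
As a sanity check on the output above, the sketch below rebuilds the example counts inline and confirms that, after normalisation, all three samples hold exactly the same set of values, only in a different order. The names `counts`, `m`, `qn`, `norm` and `sorted_norm` are hypothetical, and `qn` is a condensed rewrite of `quantile_normalisation` for illustration, not code from the tutorial. It assumes, as in this data set, that no column contains tied counts.

```r
# Example counts from the question, rebuilt inline so the sketch is self-contained.
counts <- read.table(header = TRUE, row.names = 1, text = "
gene_id SRR896664 SRR896663 SRR896665
ENSG00000000003 46106 36353 40614
ENSG00000000005 198 399 1200
ENSG00000000419 40364 37769 40849
ENSG00000000457 18924 16211 16057
ENSG00000000460 31040 28888 29901
ENSG00000000938 200 0 394
ENSG00000000971 14935 14353 12522
")
m <- as.matrix(counts)

# Hypothetical condensed version of quantile_normalisation():
# average the sorted columns rank by rank, then map each value
# back to the mean that belongs to its rank.
qn <- function(m) {
  rank_means <- unname(rowMeans(apply(m, 2, sort)))
  out <- apply(m, 2, function(col) rank_means[rank(col, ties.method = "min")])
  dimnames(out) <- dimnames(m)
  out
}

norm <- qn(m)

# After normalisation every sorted column equals the first sorted column.
sorted_norm <- apply(norm, 2, sort)
all(sorted_norm == sorted_norm[, 1])
```

If the final line returns `TRUE`, the samples share one distribution, which is why the boxplots of the normalised data look identical.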

boxplot(data[,2:4])

[Image: boxplot of the raw counts for the three samples]

boxplot(new_data)

[Image: boxplot of the quantile-normalised counts for the three samples]
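
As for the script in the question, it most likely fails for two reasons: `normalize.quantiles()` comes from the Bioconductor package preprocessCore, which the script never loads, and it is called on `m`, an object that is never defined (the matrix is named `data_mat`). Below is a minimal corrected sketch. It assumes preprocessCore is installed, and it rebuilds the example counts inline instead of reading `testquantile.csv`, so the input format is an assumption as well.

```r
# Assumption: preprocessCore is installed, e.g. with
#   BiocManager::install("preprocessCore")

# Example counts from the question, rebuilt inline instead of read from a file.
data <- read.table(header = TRUE, text = "
gene_id SRR896664 SRR896663 SRR896665
ENSG00000000003 46106 36353 40614
ENSG00000000005 198 399 1200
ENSG00000000419 40364 37769 40849
ENSG00000000457 18924 16211 16057
ENSG00000000460 31040 28888 29901
ENSG00000000938 200 0 394
ENSG00000000971 14935 14353 12522
")

# Numeric matrix of counts with gene IDs as row names.
data_mat <- data.matrix(data[, -1])
rownames(data_mat) <- data$gene_id

# Only run the normalisation if the package is available.
if (requireNamespace("preprocessCore", quietly = TRUE)) {
  # Pass data_mat (not the undefined m) to normalize.quantiles().
  data_norm <- preprocessCore::normalize.quantiles(data_mat, copy = TRUE)
  # Restore the row and column names on the result.
  dimnames(data_norm) <- dimnames(data_mat)
  print(data_norm)
}
```

For data without ties, such as this example, the result should match `new_data`. When a column contains tied counts, the two approaches may differ, because the function above assigns tied values the lowest rank.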

Answered by benn on February 8, 2021

Answered by SmallChess on February 8, 2021

Maybe the cqn package (conditional quantile normalization) from Bioconductor will be useful, though it does more than plain quantile normalisation.

Answered by geek_y on February 8, 2021
