Bioinformatics Asked on July 19, 2021
I’m trying to understand why RPKM is not an appropriate way to normalize RNA-seq data (I understand the general idea, but I’d like to gain a deeper understanding). So I’m reading the paper by Wagner et al. (2012) that suggests TPM as an alternative. However, I’m struggling to understand the following paragraph, and was wondering if anyone can shed some light on what the authors are trying to say:
The reason for the inconsistency of RPKM across samples arises from the normalization by the total number of reads. While rmc as well as qPCR results are ratios of transcript concentrations, the RPKM normalizes a proxy for transcript number by $r_g \times 10^3 / fl_g$ the number of sequencing reads in millions, $R / 10^6$. The latter, however, is not a measure of total transcript number. The relationship between $R$ and the total number of transcripts sampled depends on the size distribution of RNA transcripts, which can differ between samples. In a sample with, on average, longer transcripts the same number of reads represents fewer transcripts.
One part that I’m particularly confused by is: "The RPKM normalizes a proxy for transcript number by $r_g \times 10^3/fl_g$ the number of sequencing reads in millions, $R/10^6$." What is the "proxy" here?
That is why: https://www.biostars.org/p/9465851/#9465854.
All these naive per-million scaling methods may fail to correct for library composition differences.
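To make the quoted paragraph concrete, here is a minimal sketch in Python. It assumes the usual reading of the formula, in which the "proxy" for transcript number is the length-normalized read count $r_g \times 10^3 / fl_g$, and RPKM divides that by $R / 10^6$. The function names, gene labels, and read counts are made up for illustration and do not come from the paper. Both toy samples contain gene A as exactly half of all transcripts; the only difference is that the other gene is four times longer in the second sample.

```python
def rpkm(counts, lengths):
    """RPKM_g = (r_g * 10^3 / fl_g) / (R / 10^6), with R the total read count."""
    total_reads = sum(counts)  # R
    return [(r * 1e3 / fl) / (total_reads / 1e6)
            for r, fl in zip(counts, lengths)]


def tpm(counts, lengths):
    """TPM_g = (r_g / fl_g) / sum_j(r_j / fl_j) * 10^6."""
    rates = [r / fl for r, fl in zip(counts, lengths)]  # reads per base
    total_rate = sum(rates)
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical toy data: reads are proportional to (transcripts x length),
# and each sample has 1000 reads in total.
# Sample 1: genes A and B, both 1000 bp, equal transcript numbers.
counts_1, lengths_1 = [500, 500], [1000, 1000]
# Sample 2: gene A (1000 bp) and gene C (4000 bp), equal transcript numbers.
counts_2, lengths_2 = [200, 800], [1000, 4000]

rpkm_1, rpkm_2 = rpkm(counts_1, lengths_1), rpkm(counts_2, lengths_2)
tpm_1, tpm_2 = tpm(counts_1, lengths_1), tpm(counts_2, lengths_2)

print("RPKM of gene A:", rpkm_1[0], "vs", rpkm_2[0])
print("TPM of gene A: ", tpm_1[0], "vs", tpm_2[0])
print("Sum of RPKM:   ", sum(rpkm_1), "vs", sum(rpkm_2))
print("Sum of TPM:    ", sum(tpm_1), "vs", sum(tpm_2))
```

In this toy case the RPKM of gene A drops from roughly 500,000 to roughly 200,000 even though its share of transcripts is unchanged, because the same 1000 reads represent fewer transcripts in the sample with the longer gene. TPM gives the same value in both samples, since it divides by the sum of length-normalized counts rather than by $R$. Note that TPM is still a per-million scaling, so it does not by itself address differences in library composition.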
Answered by ATpoint on July 19, 2021