Within and between sample count normalization

Question

Recently I came across a situation where RNA-seq sample quality was expressed through the DESeq size factor. The authors rated their samples as good or bad, relative to some publicly available datasets, according to whether each sample's scaling factor was equal to, higher than, or lower than 1. What is more, they computed it from Cufflinks FPKMs, taking the geometric mean over genes (the same calculation DESeq performs on log-transformed raw counts). The underlying logic was that FPKM normalizes the samples "within" and the additional DESeq step normalizes them "between", so the data are ready for any downstream analysis (differential expression, alternative splicing, etc.).
I haven't seen anything like that before, nor the use of the DESeq size factor to estimate the quality of a sample (all samples are from the same species/tissue/condition/etc.).
Can anyone comment on why this is or is not OK? I cannot wrap my head around the logic (then again, I am just trying to replicate the study, not contribute anything of my own).
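For reference, the DESeq-style median-of-ratios size factor can be sketched in a few lines of NumPy. This is a minimal illustration, not DESeq's own code: it assumes a genes x samples matrix of raw counts, and the function name `size_factors` is hypothetical. Each sample is compared to a pseudo-reference built from the per-gene geometric means, and genes with a zero in any sample are dropped because their log geometric mean is not finite.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors (a DESeq-style sketch).

    counts: genes x samples array of raw, non-negative counts.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes with a nonzero count in every sample
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log geometric mean of each gene across samples = pseudo-reference sample
    log_geo_means = log_counts.mean(axis=1)
    # Per sample: median ratio of its counts to the pseudo-reference
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

# Toy data: the second sample is exactly twice as deep as the first
sf = size_factors([[10, 20], [100, 200], [50, 100]])
print(sf)
```

Because every sample is measured against a reference derived from the same set of samples, the factors scatter around 1 by construction; here they are about 0.707 and 1.414.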
Thank you

PPK · Answer

The approach you are describing seems very strange.
Crucially, the DESeq2 vignette states that the model only works correctly with unnormalized counts as input:

It is important to provide count matrices as input for DESeq2’s
statistical model (Love, Huber, and Anders 2014) to hold, as only the
count values allow assessing the measurement precision correctly. The
DESeq2 model internally corrects for library size, so transformed or
normalized values such as counts scaled by library size should not be
used as input.

Normalized values like FPKM are not supported and should be converted to counts using the appropriate import function.
I don't really understand using the size factors for quality control, because the number of reads in a sample does not correlate well with its quality. Maybe you could link to the paper, in case something in the experimental design explains this.
If you are interested in quality control, a useful metric in my experience is the 3' sequencing bias, which can be calculated using, for example, RNA-SeQC.
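To illustrate why the size factor mostly reflects depth rather than quality, here is a small simulation sketch with made-up Poisson counts (all numbers are hypothetical). Two libraries share the same composition but differ threefold in depth. On raw counts the factors differ by about 3. After dividing out library size (CPM here; FPKM additionally divides by gene length, which is approximately the same for a gene in every sample and so cancels in the per-gene ratio), the factors collapse to roughly 1 regardless of how good either library is.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors (a DESeq-style sketch)."""
    counts = np.asarray(counts, dtype=float)
    expressed = np.all(counts > 0, axis=1)  # drop genes with any zero
    log_counts = np.log(counts[expressed])
    log_geo_means = log_counts.mean(axis=1)  # pseudo-reference
    return np.exp(np.median(log_counts - log_geo_means[:, None], axis=0))

rng = np.random.default_rng(0)
base = rng.integers(50, 500, size=1000)  # hypothetical expected counts per gene
# Same composition, sequenced to 1x and 3x depth
shallow = rng.poisson(base * 1.0)
deep = rng.poisson(base * 3.0)
counts = np.column_stack([shallow, deep])

sf_raw = size_factors(counts)
print(sf_raw, sf_raw[1] / sf_raw[0])  # ratio close to 3

# Divide out library size first: factors end up close to 1 for both samples
cpm = counts / counts.sum(axis=0) * 1e6
sf_cpm = size_factors(cpm)
print(sf_cpm)
```

On already-normalized values, a factor near 1 therefore says only that the sample's composition resembles the others, not that it is of good quality.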
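As a rough idea of what a 3' bias metric captures, here is a hypothetical, simplified version for a single transcript: mean coverage in the 3'-most window divided by mean coverage in the 5'-most window. This is an assumption for illustration only, not RNA-SeQC's exact definition; the function name `three_prime_bias` and the 100-base window are made up. Values well above 1 suggest coverage piling up at the 3' end, as is commonly seen with degraded RNA and poly(A) selection.

```python
import numpy as np

def three_prime_bias(coverage, window=100):
    """Hypothetical simplified 3'/5' coverage ratio for one transcript.

    coverage: per-base read depth along the transcript, ordered 5' -> 3'.
    """
    cov = np.asarray(coverage, dtype=float)
    if cov.size < 2 * window:
        raise ValueError("transcript shorter than two windows")
    five = cov[:window].mean()   # mean depth at the 5' end
    three = cov[-window:].mean()  # mean depth at the 3' end
    return three / five if five > 0 else np.inf

uniform = np.full(1000, 20.0)         # even coverage along the transcript
ramp = np.linspace(5.0, 50.0, 1000)   # coverage rising toward the 3' end
print(three_prime_bias(uniform))  # 1.0
print(three_prime_bias(ramp))
```

In practice one would summarize such ratios over many transcripts (e.g. with a median) to get a per-sample value.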
