Bioinformatics Asked on November 9, 2020
When correcting my data for a batch effect using removeBatchEffect, some of the gene expression values become negative.
When searching for differentially expressed genes, I do not use the corrected data above, but rather model the batch in DESeq2 (design = ~ Batch + Condition).
However, I started worrying. When DESeq2 introduces a batch into the model, does it allow for negative values "behind the scenes"?
If it does, I do not understand how that can make sense, in the context of RNA-Seq data.
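DESeq2 uses the batch information (and everything else in the design) as covariates in its GLM: each term gets a coefficient on the log scale, so a batch effect multiplies the expected count rather than being subtracted from it. For background, please check how linear models work, e.g. using the StatQuest series of statistics videos on YouTube.
It still operates on the raw counts, which are never modified. The same goes for the normalization (size) factors, which enter the model as scaling factors on the expected count.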
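To make the point about the model concrete, here is a minimal sketch of a log2-link mean, the general form DESeq2's GLM uses for the expected count. The function name `expected_count` and all coefficient values are made up for illustration; this is not DESeq2 code, only the arithmetic behind it.

```python
def expected_count(size_factor, intercept, batch_coef, condition_coef,
                   in_batch_b, treated):
    """Expected count under a log2-link model: mu = s * 2**(linear predictor)."""
    eta = intercept + batch_coef * in_batch_b + condition_coef * treated
    return size_factor * 2.0 ** eta


# Hypothetical log2-scale coefficients; the batch effect is strongly negative.
mu = expected_count(size_factor=0.8, intercept=3.0, batch_coef=-5.0,
                    condition_coef=1.0, in_batch_b=1, treated=0)
print(mu)  # small, but still greater than zero
```

However negative a coefficient gets, it only shrinks the expected count towards zero, so nothing negative appears "behind the scenes".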
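removeBatchEffect fits a linear model to the data including the batch information and then subtracts the batch component from the expression values (that is basically the baseline difference). It is meant for log-scale values such as log-CPM rather than raw counts, which is why values for lowly expressed genes can end up negative after the subtraction.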
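A simplified sketch of that subtraction for a single gene, assuming a balanced two-batch design with no other covariates, shows how a negative value can arise. The numbers are made up, and this only mimics the idea; it is not the limma implementation.

```python
from statistics import mean

# Hypothetical log2-expression values for one gene (made-up numbers).
log_expr = [0.2, 0.5, 0.1, 4.0, 5.0, 0.4]
batch = ["A", "A", "A", "B", "B", "B"]

# Per-batch means and their average (the baseline in a balanced design).
batch_means = {
    b: mean(v for v, bb in zip(log_expr, batch) if bb == b)
    for b in set(batch)
}
baseline = mean(batch_means.values())

# Subtract each sample's batch offset (batch mean minus baseline).
corrected = [v - (batch_means[b] - baseline) for v, b in zip(log_expr, batch)]
print([round(v, 3) for v in corrected])
```

The last sample sits far below the mean of its batch, so removing the batch offset pushes it below zero even though every input value was non-negative.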
If you are interested in preserving the integer nature of counts and keeping zero as the smallest value after explicit batch correction, you may want to check ComBat_seq() from the sva package. It operates on raw counts and returns batch-corrected raw counts, which you could then normalize and calculate CPMs from (if you need batch-corrected CPMs with no negative values). I find it useful and prefer it over removeBatchEffect, as it avoids the unfortunate negative values which sometimes mess up plotting scripts that expect zero as the smallest possible value.
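As a rough illustration of the last step, the sketch below computes plain counts per million from already batch-corrected integer counts. The helper `cpm` and the count values are hypothetical, and it scales by the raw library size only, without the normalization factors that edgeR or DESeq2 would apply.

```python
def cpm(counts_by_sample):
    """Counts per million per sample: count / library size * 1e6."""
    result = {}
    for sample, counts in counts_by_sample.items():
        library_size = sum(counts)
        result[sample] = [c / library_size * 1e6 for c in counts]
    return result


# Hypothetical batch-corrected integer counts for three genes in two samples.
corrected_counts = {"s1": [0, 150, 850], "s2": [10, 90, 900]}
cpm_values = cpm(corrected_counts)
print(cpm_values["s1"])
```

Because the corrected counts are non-negative integers, the resulting CPMs are non-negative as well, and a zero count stays at zero.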
Correct answer by ATpoint on November 9, 2020