DESeq2 multiple treatments, multiple time points, multiple cell lines

Question

Note: this question has also been asked on Bioconductor Support
I know there are a lot of questions asking similar things on this forum. I have checked the vignette and done a lot of other research, but I still have a hard time wrapping my head around it.
In my position, data is unfortunately just put in front of me and I'm asked for pretty pictures. Even making it clear to the wetlab people in the lab that we need replicates is an issue.
I have multiple cell lines, multiple time points and multiple treatments:
Cell lines: CL1, CL2, CL3
Time points: 6h, 24h
Treatments: T1, T2, T3, Control
For each cell line and each time point, there are 3 different treatments plus a control. 3 replicates for each sample -> 72 samples
What I want is actually quite simple; I want to measure control vs each treatment at each time point in each cell line. No testing across time points or across cell lines.
My first thought was to separate the data into 6 different data sets (CL1, 6h | CL1, 24h | CL2, 6h | CL2 24h | CL3, 6h | CL3, 24h) and simply do the DE analysis separately, but I read that this is not the way to go. In the end it would also be nice to get a normalized count matrix with all data normalized together, for PCAs and similar.
I hope I made it understandable. How do I design the DESeqDataSet?

gringer · Accepted Answer

As Mike Love alluded to on the Bioconductor pages, it's best to have specialised help for datasets that are this complex; doing analysis on anything which has multiple compounding factors is not simple. It's also a good idea to talk through the analysis before doing sequencing, so you can discuss things like having six replicates per sample (including controls).
In any case, what I did for something similar to this (eventually, after lots of discussion with biologists about possible models) was to combine all the non-tested factor columns into a single factor. Something like this:
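library(dplyr)

# Combine the factors that are not being tested (cell line, time point) into one label
meta.combined.df <- meta.df %>%
  mutate(lineTime = paste0(cellLine, "_", timePoint))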

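And then subset the data based on this combined factor:
# counts.mat is assumed to be the full count matrix (genes x samples),
# with columns in the same order as the rows of meta.combined.df
counts.sub.mat <-
   counts.mat[, meta.combined.df$lineTime == desired.lineTime]
meta.sub.df <-
   subset(meta.combined.df, lineTime == desired.lineTime)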

Then use ~ Treatment as the design formula (assuming "Treatment" is a field containing either the treatment type or "control").
But... this is explicitly treating each line/time point as a separate experiment, which may not be what you want. There are complications associated with batch effects and outliers that need to be talked through, because you could end up identifying "differential expression" that is not reproducible.
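The subset-then-test approach described above can be sketched end to end as follows. This is only an illustration under stated assumptions, not the exact code from the original analysis: the counts are simulated with makeExampleDESeqDataSet, and the column names (cellLine, timePoint, Treatment, replicate) and the value of desired.lineTime are placeholders for whatever your metadata actually contains.

```r
library(DESeq2)

# Hypothetical metadata mirroring the design in the question (3 x 2 x 4 x 3 = 72 samples)
meta.df <- expand.grid(
  replicate = 1:3,
  Treatment = c("Control", "T1", "T2", "T3"),
  timePoint = c("6h", "24h"),
  cellLine  = c("CL1", "CL2", "CL3"),
  stringsAsFactors = FALSE
)
meta.df$lineTime <- paste0(meta.df$cellLine, "_", meta.df$timePoint)
rownames(meta.df) <- paste(meta.df$lineTime, meta.df$Treatment,
                           meta.df$replicate, sep = "_")

# Simulated counts standing in for the real count matrix (assumption)
set.seed(1)
counts.mat <- counts(makeExampleDESeqDataSet(n = 2000, m = nrow(meta.df)))
colnames(counts.mat) <- rownames(meta.df)

# Keep only one cell line / time point combination
desired.lineTime <- "CL1_6h"
keep <- meta.df$lineTime == desired.lineTime
counts.sub.mat <- counts.mat[, keep]
meta.sub.df <- meta.df[keep, ]

# Make "Control" the reference level so fold changes read as treatment vs control
meta.sub.df$Treatment <- factor(meta.sub.df$Treatment,
                                levels = c("Control", "T1", "T2", "T3"))

dds.sub <- DESeqDataSetFromMatrix(countData = counts.sub.mat,
                                  colData   = meta.sub.df,
                                  design    = ~ Treatment)
dds.sub <- DESeq(dds.sub)

# One treatment vs control comparison; repeat for T2 and T3
res.T1 <- results(dds.sub, contrast = c("Treatment", "T1", "Control"))
```

Repeating this for each of the six lineTime values gives the 18 treatment-vs-control comparisons, but each run estimates dispersions and size factors from only its own 12 samples.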

swbarnes2 · Answer

To compare one subset of samples against another, add a single column to the column data that distinguishes all the groups you care about (probably by joining cell line, time point and treatment), and use that column as your design. Then, when you call results, specify the pair you want to contrast, one pair at a time.
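A minimal sketch of this single-factor approach is shown here, again on simulated counts. The column names (cellLine, timePoint, Treatment) and the group label format are assumptions chosen for illustration; DESeqDataSetFromMatrix, DESeq, results, counts and vst are the standard DESeq2 functions.

```r
library(DESeq2)

# Hypothetical metadata mirroring the design in the question (72 samples)
meta.df <- expand.grid(
  replicate = 1:3,
  Treatment = c("Control", "T1", "T2", "T3"),
  timePoint = c("6h", "24h"),
  cellLine  = c("CL1", "CL2", "CL3"),
  stringsAsFactors = FALSE
)
meta.df$lineTime <- paste0(meta.df$cellLine, "_", meta.df$timePoint)
# One factor that encodes cell line, time point and treatment together
meta.df$group <- factor(paste(meta.df$lineTime, meta.df$Treatment, sep = "_"))
rownames(meta.df) <- paste(meta.df$group, meta.df$replicate, sep = "_")

# Simulated counts standing in for the real count matrix (assumption)
set.seed(1)
counts.mat <- counts(makeExampleDESeqDataSet(n = 2000, m = nrow(meta.df)))
colnames(counts.mat) <- rownames(meta.df)

# Fit one model on all samples, with the combined factor as the design
dds <- DESeqDataSetFromMatrix(countData = counts.mat,
                              colData   = meta.df,
                              design    = ~ group)
dds <- DESeq(dds)

# Treatment vs control within each cell line / time point, one pair at a time
res.list <- list()
for (lt in unique(meta.df$lineTime)) {
  for (trt in c("T1", "T2", "T3")) {
    res.list[[paste(lt, trt, sep = "_")]] <- results(
      dds,
      contrast = c("group",
                   paste(lt, trt, sep = "_"),
                   paste(lt, "Control", sep = "_"))
    )
  }
}

# All samples normalized together, e.g. for PCA
norm.counts <- counts(dds, normalized = TRUE)
vsd <- vst(dds, blind = FALSE)
pca.plot <- plotPCA(vsd, intgroup = c("cellLine", "Treatment"))
```

Because everything is fitted in one model, dispersion estimates are shared across all 72 samples, and the same object yields a jointly normalized count matrix for PCA and similar plots.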

Answered by swbarnes2 on June 8, 2021
