DESeqDataSetFromTximport all(lengths > 0) is not TRUE

Question

I am suddenly running into an error when calling `DESeqDataSetFromTximport`:

```r
txi.rsem <- tximport(files, type = "rsem", txIn = FALSE, txOut = FALSE)
dds <- DESeqDataSetFromTximport(txi = txi.rsem,
                                colData = SampleFile,
                                design = ~ Compound)
```

```
using counts and average transcript lengths from tximport
Error in DESeqDataSetFromTximport(txi = txi.rsem, colData = SampleFile,  : 
  all(lengths > 0) is not TRUE
```

Any suggestions as to what causes this error?
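Judging from the message printed just before it, the error appears to come from a check that every entry of the `length` matrix returned by `tximport` is positive, so a single zero in any sample is enough to fail it. Before choosing a fix, it may help to see which rows have zero length and whether any of them carry reads. The sketch below uses base R only and runs on a small hypothetical stand-in for a `tximport` result; the gene names, values, and the helper `diagnose_zero_lengths` are made up for illustration, and in practice you would pass your own `txi.rsem`.

```r
# Hypothetical stand-in for a tximport() result (3 genes x 2 samples);
# replace it with the list returned by your own tximport() call.
samples <- c("s1", "s2")
genes <- c("geneA", "geneB", "geneC")
mk <- function(v) matrix(v, nrow = 3, dimnames = list(genes, samples))
txi <- list(
  abundance = mk(c(5, 0, 2, 3, 0, 0)),
  counts    = mk(c(50, 0, 20, 30, 0, 0)),
  length    = mk(c(1500, 0, 800, 1500, 0, 0)),
  countsFromAbundance = "no"
)

# Report which rows would fail a check like all(lengths > 0), and whether
# any of those rows actually carry reads.
diagnose_zero_lengths <- function(txi) {
  zero_len  <- rowSums(txi$length == 0) > 0  # length 0 in at least one sample
  expressed <- rowSums(txi$counts) > 0       # at least one read in any sample
  list(
    n_zero_length           = sum(zero_len),
    n_zero_length_expressed = sum(zero_len & expressed),
    zero_length_rows        = rownames(txi$length)[zero_len]
  )
}

report <- diagnose_zero_lengths(txi)
print(report)
```

If `n_zero_length_expressed` is 0, dropping the zero-length rows discards no reads. If it is greater than 0 (as for `geneC` in the toy data), a filter that removes only unexpressed rows will leave some zero lengths behind.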

winni2k · Answer

I had this issue as well. It looks like all zero-length genes also have zero expression across all samples. The code below excludes only genes that have both zero length and zero expression:
```r
txi = tximport(file_paths, type = "rsem", txIn = TRUE, txOut = TRUE)

zero_length_and_unexpressed = (apply(txi$abundance, 1, max) == 0) &
                              (apply(txi$length, 1, min) == 0)

txi$length = txi$length[!zero_length_and_unexpressed,]
txi$abundance = txi$abundance[!zero_length_and_unexpressed,]
txi$counts = txi$counts[!zero_length_and_unexpressed,]

dds = DESeqDataSetFromTximport(txi, sampleTable, ~ 1)
```
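The same idea can be wrapped in a small function that applies one row index to every matrix in the list, so the matrices cannot drift out of alignment. This is a minimal base-R sketch, assuming the `tximport` result is a list whose `abundance`, `counts`, and `length` elements are matrices with the same rows; the helper name and toy data are hypothetical. `drop = FALSE` keeps single-row or single-sample results as matrices, and the function warns if expressed rows with zero length remain, since those would still fail the length check.

```r
# Hypothetical stand-in for a tximport() result (3 genes x 2 samples);
# in practice, pass the list returned by your own tximport() call.
samples <- c("s1", "s2")
genes <- c("geneA", "geneB", "geneC")
mk <- function(v) matrix(v, nrow = 3, dimnames = list(genes, samples))
txi <- list(
  abundance = mk(c(5, 0, 2, 3, 0, 1)),
  counts    = mk(c(50, 0, 20, 30, 0, 10)),
  length    = mk(c(1500, 0, 800, 1500, 0, 800)),
  countsFromAbundance = "no"
)

# Drop rows with zero length in some sample AND zero abundance in every
# sample, using one logical index for all matrices so their rows stay aligned.
drop_zero_length_unexpressed <- function(txi) {
  unexpressed <- apply(txi$abundance, 1, max) == 0
  zero_length <- apply(txi$length, 1, min) == 0
  keep <- !(unexpressed & zero_length)
  # Subset only the matrix elements; scalar fields such as
  # countsFromAbundance are passed through unchanged.
  out <- lapply(txi, function(x) if (is.matrix(x)) x[keep, , drop = FALSE] else x)
  if (any(out$length == 0)) {
    warning("expressed rows with zero length remain; the length check will still fail")
  }
  out
}

txi_filtered <- drop_zero_length_unexpressed(txi)
stopifnot(all(txi_filtered$length > 0))  # same condition as in the error message
print(dim(txi_filtered$counts))
```

The filtered list could then be passed to `DESeqDataSetFromTximport(txi_filtered, sampleTable, ~ 1)` as in the answer above; that call is not run here because it requires DESeq2.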

TJ Butler · Answer

I ran into the same error today with this code:
```r
pad.gene.data <-
  tximport(files = file.list.for.gene.analysis,
           type = "rsem",
           txIn = FALSE,
           txOut = FALSE)

dds.gene.level <-
  DESeqDataSetFromTximport(txi = pad.gene.data,
                           colData = pad.characteristic.data,
                           design = ~ condition)
```

I had run this same differential gene expression analysis, using transcript counts aggregated to the gene level, on a Mac about 5 weeks ago. This past week I switched to a Windows machine and reinstalled R 3.6.3 together with whatever the current versions of tximport and DESeq2 are.
I was able to reproduce the results I had gotten on the Mac 5 weeks ago by keeping only the genes whose length is nonzero in every sample (i.e., dropping any gene with a length of 0 in at least one sample):
```r
pad.gene.data$abundance <-
  pad.gene.data$abundance[apply(pad.gene.data$length,
                                1,
                                function(row) all(row != 0)),]

pad.gene.data$counts <-
  pad.gene.data$counts[apply(pad.gene.data$length,
                             1,
                             function(row) all(row != 0)),]

pad.gene.data$length <-
  pad.gene.data$length[apply(pad.gene.data$length,
                             1,
                             function(row) all(row != 0)),]
```

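That is, I first remove every gene with a length of 0 in any sample from the abundance and counts matrices, and only then remove the same genes from the length matrix, so that all three matrices end up with the same rows. The length matrix has to be filtered last, because each line recomputes the row filter from it. With that change, `DESeqDataSetFromTximport` ran without error.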
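One way to remove the ordering constraint is to compute the row filter once, from the unmodified length matrix, and reuse it for every matrix. This is a minimal sketch, assuming `pad.gene.data` has the shape of a `tximport` result; the toy values below are hypothetical and only there so the snippet runs on its own. It also totals the reads in the dropped genes, because this criterion removes a gene even when it has counts in other samples.

```r
# Hypothetical stand-in for the tximport() result (3 genes x 2 samples).
samples <- c("s1", "s2")
genes <- c("geneA", "geneB", "geneC")
mk <- function(v) matrix(v, nrow = 3, dimnames = list(genes, samples))
pad.gene.data <- list(
  abundance = mk(c(5, 0, 2, 3, 0, 0)),
  counts    = mk(c(50, 0, 20, 30, 0, 0)),
  length    = mk(c(1500, 0, 800, 1500, 0, 0)),
  countsFromAbundance = "no"
)

# Compute the filter once, before any matrix is modified: keep a gene only
# if its length is nonzero in every sample.
keep <- rowSums(pad.gene.data$length == 0) == 0

# Reads that will be discarded along with the dropped genes.
lost_reads <- sum(pad.gene.data$counts[!keep, , drop = FALSE])

# With `keep` fixed up front, these three lines can run in any order.
pad.gene.data$abundance <- pad.gene.data$abundance[keep, , drop = FALSE]
pad.gene.data$counts    <- pad.gene.data$counts[keep, , drop = FALSE]
pad.gene.data$length    <- pad.gene.data$length[keep, , drop = FALSE]

stopifnot(all(pad.gene.data$length > 0))
cat(sum(!keep), "gene(s) removed,", lost_reads, "reads discarded\n")
```

In the toy data, `geneC` has reads in one sample but a zero length in the other, so this criterion drops it, whereas a zero-length-and-unexpressed filter would keep it.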
I am skeptical that this step should be necessary: if this is new behavior in a recent version, I haven't seen it documented that the data now need to be preprocessed beforehand. I hope someone who knows the code better can comment on this.
