Bioinformatics Asked by AmadeusDrZaius on April 27, 2021
If I have a sample sheet that contains both single-indexed and dual-indexed samples, I can split it up into two sample sheets and then run bcl2fastq on each one. However, when doing this, large Undetermined fastq files are generated. E.g., when processing the single-indexed samples, all the dual-indexed samples go to Undetermined. And when processing the dual-indexed samples, the single-indexed samples go to Undetermined.
Additionally, because they are being processed separately, if any single index A is part of a dual index A+B, then when processing the single indexes, an A+B read may be mistaken for an A read. It would seem, then, that they need to be processed simultaneously to avoid this mis-assignment.
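To illustrate the mis-assignment concern, here is a minimal sketch of assigning reads when both kinds of sample are considered at once. It is not how bcl2fastq or Picard work internally; the function name `assign_read`, the sample names, and the index sequences are all hypothetical, and only exact matches are considered (no mismatch tolerance).

```python
def assign_read(i7, i5, dual_samples, single_samples):
    """Return the sample name for one read, or "Undetermined".

    dual_samples maps (i7, i5) pairs to sample names.
    single_samples maps an i7 sequence alone to a sample name.
    """
    # Check dual-indexed samples first, so that an A+B read is never
    # claimed by a single-indexed sample that happens to use index A.
    sample = dual_samples.get((i7, i5))
    if sample is not None:
        return sample
    # Fall back to single-indexed samples, ignoring the second index read.
    return single_samples.get(i7, "Undetermined")


# Hypothetical indexes: single index A is also the first half of dual A+B.
dual_samples = {("ACGTACGT", "TTGGCCAA"): "dual_AB"}
single_samples = {"ACGTACGT": "single_A"}

print(assign_read("ACGTACGT", "TTGGCCAA", dual_samples, single_samples))
print(assign_read("ACGTACGT", "GGGGGGGG", dual_samples, single_samples))
print(assign_read("TTTTTTTT", "GGGGGGGG", dual_samples, single_samples))
```

Processing the two groups in separate runs loses this ordering: the single-index run never sees the dual table, so it cannot tell an A+B read from an A read.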
Given a sample sheet and directories of BCL files, how can such a set of sequencing data be demultiplexed correctly, either using bcl2fastq or the Picard tools?
To put it another way, I want to demultiplex a single sequencing run that contains both single-indexed and dual-indexed samples. It can be assumed that the indexes are sufficiently distinct such that any sample’s index configuration is different from any other. But assuming that the different index configurations are not segregated to particular lanes of the sequencer, the question is how to demultiplex the files correctly such that both the single-indexed and dual-indexed samples are recognized.
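The distinctness assumption can be checked before demultiplexing. The following is a rough sketch, with a hypothetical helper name and made-up index sequences, that reports single indexes which are also used as the first index of a dual-indexed sample.

```python
def shared_first_indexes(single_indexes, dual_indexes):
    """Return single indexes that also appear as the first index of a dual pair."""
    dual_first = {i7 for i7, _i5 in dual_indexes}
    return sorted(set(single_indexes) & dual_first)


# Hypothetical example: the first single index collides with the dual pair.
singles = ["ACGTACGT", "GGGGTTTT"]
duals = [("ACGTACGT", "TTGGCCAA"), ("CCCCAAAA", "TTGGCCAA")]

print(shared_first_indexes(singles, duals))
```

An empty result means no single index can be confused with the first half of a dual index under exact matching.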
If a sample sheet with a mix of indexes is given to bcl2fastq (v2.17) directly, it produces the error:
ERROR: bcl2fastq::common::Exception: Success (0): .../bcl2fastq2/src/cxx/lib/layout/BarcodeCollisionDetector.cpp(127):
Throw in function void bcl2fastq::layout::BarcodeCollisionDetector::validateNewBarcodeSizesAgainstExisting(const std::vector<long unsigned int>&) const
Dynamic exception type: boost::exception_detail::clone_impl<bcl2fastq::layout::BarcodeCollisionError>
std::exception::what: Barcodes have an unequal number of components.
Barcodes have an unequal number of components.
It seems that it should be possible using the Picard tools, but I have not found a way to set up the inputs to ExtractIlluminaBarcodes and IlluminaBasecallsToFastq that processes this configuration correctly.
The syntax for supporting multiple index configurations is not entirely clear. But when using various combinations of N and ‘*’ on multiplex_params.tsv and barcodes.txt required by the Picard tools, a large Undetermined file is still produced and actual sample fastq files are tiny, indicating it is not processing them correctly.
As indicated in the discussion above, splitting the sample sheet has the problem that all the dual-indexed samples go to the “Undetermined” file when processing the single-indexed samples, or vice versa. It also creates two output directories of reports and stats, which must be merged.
By padding, I mean converting ACGTACGT to ACGTACGT+NNNNNNNN so that the single-index samples in the sample sheet are “dual” as well. This strategy looks like it would work, except that a bug in bcl2fastq causes it to treat “N” literally instead of as a wildcard. See the release notes for details.
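The padding transformation itself is simple to express. This is only a sketch of the string conversion described above, with a hypothetical function name and an assumed second-index length of 8; it does not work around the literal handling of “N” in bcl2fastq.

```python
def pad_index(index, index2_length=8):
    """Pad a single index with N's so that it looks like a dual index.

    Indexes that already contain '+' are treated as dual and returned unchanged.
    index2_length is an assumption; it should match the second index read length.
    """
    if "+" in index:
        return index
    return index + "+" + "N" * index2_length


print(pad_index("ACGTACGT"))
print(pad_index("ACGTACGT+TTGGCCAA"))
```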
Have you just tried giving bcl2fastq one sample sheet with a mix of single and dual indices? I don't think what you are trying to do is a problem.
“if any single index A is part of a dual index A+B,”
Well, it might be a problem if you did that. That was poor planning.
Answered by swbarnes2 on April 27, 2021