Bioinformatics Asked by DumbledoreTheGrey on March 1, 2021
I am working on a project and used the following command:
vsearch --derep_fulllength filtered_merged.fa -sizeout -relabel Uniq -output dereplicated_filtered_merged.fa
and got the following output:
87373926 nt in 203453 seqs, min 310, max 480, avg 352
Sorting 100%
10981 unique sequences, avg cluster 2.0, median 1, max 1287
Writing output file 100%
The output tells me that 10981 unique sequences were identified, but I can't work out how many reads of the most common sequence were present in the input data.
Any suggestions will be kindly appreciated!
According to the VSEARCH docs, since you have specified --sizeout, your abundances have been written into the FASTA headers:
--sizein
Take into account the abundance annotations present in the input fasta file (search for the pattern '[>;]size=integer[;]' in sequence headers). That option is active by default when rereplicating.
--sizeout
Add abundance annotations to the output fasta file (add the pattern ';size=integer;' to sequence headers). If --sizein is specified, each unique sequence receives a new abundance value corresponding to its total abundance (sum of the abundances of its occurrences). If --sizein is not specified, input abundances are set to 1, and each unique sequence receives a new abundance value corresponding to its number of occurrences in the input file.
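To read the count of the most common sequence back out of the headers, a minimal Python sketch could look like the one below. It assumes the dereplicated file's headers carry the ';size=N' annotation described in the quoted manual text (e.g. '>Uniq1;size=42'); the helper names read_abundances and most_abundant are hypothetical and not part of VSEARCH.

```python
import re

# Matches the abundance annotation added by --sizeout, e.g. "Uniq1;size=42" or
# "Uniq1;size=42;" (assumed header format, based on the manual's pattern).
SIZE_RE = re.compile(r"[>;]size=(\d+)")


def read_abundances(lines):
    """Return a list of (header, size) pairs, one per annotated FASTA header."""
    records = []
    for line in lines:
        if line.startswith(">"):
            match = SIZE_RE.search(line)
            if match:
                # Keep the header without the leading '>' and trailing newline.
                records.append((line[1:].strip(), int(match.group(1))))
    return records


def most_abundant(records):
    """Return the (header, size) pair with the largest abundance."""
    return max(records, key=lambda rec: rec[1])
```

Usage: `with open("dereplicated_filtered_merged.fa") as fh: print(most_abundant(read_abundances(fh)))`. Because --sizein was not given, each size is a plain occurrence count, so the sizes should sum to the number of input sequences (203453 in the log above), which is a quick sanity check on the parsing.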
Correct answer by Maximilian Press on March 1, 2021