Bioinformatics Asked by 20 21 on June 7, 2021
I have a file containing multiple FASTA sequences. Given a specific consensus pattern, I extracted all matching subsequences from the target FASTA sequences with
grep -o -E "CC[GT]AAA[GC][AC]TT[GC]" input.fasta
However, this command prints only the matching subsequences, and I also want the FASTA header that corresponds to each match.
For example, if input.fasta looks like this:
>Gene 1
TGATGAAAAATGATAGAT
ATTGGGGGAAAAAAAAAT
>Gene 2
TTTCCTAAAGATTGT
AAATTTAAAAATGTTTTT
(Gene 2 contains the matching subsequence CCTAAAGATTG.)
Desired output:
CCTAAAGATTG Gene 2
I would prefer a solution using grep, but other approaches are also welcome.
Use this Perl one-liner:
perl -lne '$id = $1 if /^>(.+)/; ($m) = /(CC[GT]AAA[GC][AC]TT[GC])/; print join "\t", $id, $m if $m;' input.fasta
The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
Correct answer by Timur Shtatland on June 7, 2021
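Since the question asks for grep, one possible grep-based sketch (an alternative to the Perl answer) first uses awk to join each record's sequence lines into a single line, then lets grep -o print both the header lines and the motif matches in file order, and finally pairs each match with the preceding header. Joining the lines means a motif wrapped across a line break is still found; the third record, Gene 3, is a hypothetical addition to the question's input that exercises exactly that case.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Temporary input: the question's example plus a hypothetical Gene 3 whose
# motif (CCGAAACATTC) is split across a line break.
fasta="$(mktemp)"
trap 'rm -f "$fasta"' EXIT
cat > "$fasta" <<'EOF'
>Gene 1
TGATGAAAAATGATAGAT
ATTGGGGGAAAAAAAAAT
>Gene 2
TTTCCTAAAGATTGT
AAATTTAAAAATGTTTTT
>Gene 3
AACCGAAAC
ATTCAA
EOF

# 1. awk: print each header, then the record's sequence lines joined into one line.
# 2. grep -o: emit every header line (^>.*) and every motif match, in file order.
# 3. awk: remember the latest header and print "<match>\t<header>" for each match.
result="$(
  awk '/^>/ { if (NR > 1) printf "\n"; print; next } { printf "%s", $0 } END { printf "\n" }' "$fasta" \
    | grep -o -E '^>.*|CC[GT]AAA[GC][AC]TT[GC]' \
    | awk '/^>/ { id = substr($0, 2); next } { print $0 "\t" id }'
)"
printf '%s\n' "$result"
```

The output puts the match first and the header second, tab-separated, as in the question's desired output. Note that grep -o reports non-overlapping matches only, so two motifs that share bases would yield just the first one.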