How to get matching pattern along with ID in a single command in grep?

Question

I have a file containing multiple FASTA sequences. Given a specific consensus pattern as input, I extracted all the matching subsequences from the target FASTA sequences with
grep -o -E "CC[GT]AAA[GC][AC]TT[GC]" input.fasta

However, the above command retrieves only the matching subsequences, and I also want the corresponding FASTA header for each match.
For example, if the input.fasta file looks like this:
>Gene 1
TGATGAAAAATGATAGAT
ATTGGGGGAAAAAAAAAT

>Gene 2
TTTCCTAAAGATTGT
AAATTTAAAAATGTTTTT

(Gene 2 has matching subsequence CCTAAAGATTG)
Output:
CCTAAAGATTG   Gene 2

I would prefer a solution with grep, but other solutions would also be helpful.

Timur Shtatland · Accepted Answer

Use this Perl one-liner:
perl -lne '$id = $1 if /^>(.+)/; ($m) = /(CC[GT]AAA[GC][AC]TT[GC])/; print join "\t", $id, $m if $m;' input.fasta

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.
SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
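
For comparison, here is a minimal awk sketch of the same idea, printing the match first and the header second, as in the output requested in the question. The function name fasta_matches and the scratch file sample.fasta are made up for illustration. It assumes every match lies within a single sequence line (a hit that spans a line break will be missed), and it reports non-overlapping matches only, much like grep -o.

```shell
# Sample data from the question, written to a scratch file (hypothetical name).
cat > sample.fasta <<'EOF'
>Gene 1
TGATGAAAAATGATAGAT
ATTGGGGGAAAAAAAAAT

>Gene 2
TTTCCTAAAGATTGT
AAATTTAAAAATGTTTTT
EOF

# Hypothetical helper: print "<match><TAB><header>" for every hit of the
# extended regex $1 in the FASTA file $2.
fasta_matches() {
    awk -v pat="$1" '
        /^>/ { id = substr($0, 2); next }   # remember the current header, minus ">"
        {
            line = $0
            # match() sets RSTART and RLENGTH; loop to report every hit on the line.
            # The RLENGTH check stops a pattern that can match the empty string
            # from looping forever.
            while (match(line, pat) && RLENGTH > 0) {
                print substr(line, RSTART, RLENGTH) "\t" id
                line = substr(line, RSTART + RLENGTH)
            }
        }
    ' "$2"
}

fasta_matches 'CC[GT]AAA[GC][AC]TT[GC]' sample.fasta
```

Unlike the Perl one-liner above, which prints at most one match per line, the loop here reports every non-overlapping match on a line. If matches can span line breaks, the sequence lines of each record would have to be joined before searching.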
