Bioinformatics Asked by Eliran Turgeman on December 8, 2020
I want to write a script that filters a pileup file from a site lists file.
As an input I get a reference genome, pileup and site lists files.
Example of an output for this script:
Pileup File :
NC_000001.10 11456 A 0 * *
NC_000001.10 11457 C 0 * *
NC_000001.10 11458 T 0 * *
NC_000002.11 11460 G 1 , E
NC_000002.11 13914 G 1 , E
NC_000003.11 10503990 C 1 , E
Reference File (After using grep by ‘>’):
>NC_000001.10 Homo sapiens chromosome 1, GRCh37.p13 Primary Assembly
>NT_113878.1 Homo sapiens chromosome 1 unlocalized genomic contig, GRCh37.p13 Primary Assembly
>NT_167207.1 Homo sapiens chromosome 1 unlocalized genomic contig, GRCh37.p13 Primary Assembly
>NC_000002.11 Homo sapiens chromosome 2, GRCh37.p13 Primary Assembly
>NC_000003.11 Homo sapiens chromosome 3, GRCh37.p13 Primary Assembly
Lists File:
chr1 11456 C1orf186 - intronic intronic no no N N N
chr1 11457 C1orf186 - intronic intronic no no N N N
chr2 13914 intergenic - intergenic intergenic no no N N N
chr7 30504355 NOD1 - intronic intronic no no N N N
chr3 10503990 SSX2IP - Syn Gln->Gln no no N N N
chr1 13131320 MEF2A + intronic intronic no no N N N
Output File:
NC_000001.10 11456 A 0 * *
NC_000001.10 11457 C 0 * *
NC_000002.11 13914 G 1 , E
NC_000003.11 10503990 C 1 , E
So for every refseq in the pileup file I would convert it to the matching chromose in which is found in the reference file, and then overlap it with my lists file so I would get the lines with the same chromose and position.
The problem is that while there are many refseq coordinates that belong to chr1 for example, alignment to each of them will give a position, with respect to the specific coordinate. But on the list there is a full, single, chr1.
I need to be able to compare between:
EDIT: If I were to convert my pileup file into a bed file (by repeating the position twice) could I use somehow bedtools to solve my problem?
Get help from others!
Recent Questions
Recent Answers
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP