Bioinformatics Asked on January 22, 2021
I’m very new to Python and am having some difficulty getting the hang of some of the more complicated things.
I have multiple files which look like so:
hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03
hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02
hCoV-19/USA/WA-UW142/2020|EPI_ISL_416680|2020-03-11
Please note that the lines above all belong to a single file.
I want to extract the EPI_ISL_000000 identifier from each line so that the files can be compared easily.
Could someone please advise on:
A programme to extract this data into new files (there are many lines in each file, 1000+)
A programme to then give a % comparison between two or more files, comparing all lines in one file against all lines in each of the other files
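# Both names are assumed to already hold the header lines read from each file.
left_lineagelist = [x.split('_')[-1].split('|')[0]
                    for x in left_lineagelist]
# A set gives fast membership tests when comparing against the left list.
right_lineagelist = {x.split('_')[-1].split('|')[0]
                     for x in right_lineagelist}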
This extracts the six-digit EPI number from each line, provided the sequences have been removed from the file beforehand, for example like so:
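# lines is assumed to hold the lines of the original file, sequences included.
for line in lines:
    if line.startswith('>'):  # header lines start with '>'
        print(line[1:].rstrip())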
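Putting the two steps together, the extraction and the percentage comparison asked about in the question could be sketched roughly as below. This is only one possible approach, not necessarily what the snippets above were meant to grow into: it swaps the chained `split` calls for a regular expression, and the names `extract_epi_ids`, `percent_shared` and `read_ids` are made up for illustration. "Percentage" is assumed here to mean the share of IDs in the first file that also appear in the second.

```python
import re

# Matches accession IDs such as EPI_ISL_410535 (the digit count is not assumed fixed).
EPI_PATTERN = re.compile(r"EPI_ISL_\d+")


def extract_epi_ids(lines):
    """Return the set of EPI_ISL IDs found in an iterable of lines."""
    ids = set()
    for line in lines:
        match = EPI_PATTERN.search(line)
        if match:  # lines without an ID (e.g. sequence lines) are skipped
            ids.add(match.group(0))
    return ids


def percent_shared(left_ids, right_ids):
    """Percentage of IDs in left_ids that also occur in right_ids."""
    if not left_ids:
        return 0.0
    return 100.0 * len(left_ids & right_ids) / len(left_ids)


def read_ids(path):
    """Read one file (path supplied by the caller) and return its EPI IDs."""
    with open(path) as handle:
        return extract_epi_ids(handle)


if __name__ == "__main__":
    # Small in-memory demo using the lines from the question.
    left = extract_epi_ids([
        "hCoV-19/Singapore/4/2020|EPI_ISL_410535|2020-02-03",
        "hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02",
    ])
    right = extract_epi_ids([
        ">hCoV-19/USA/WA13-UW9/2020|EPI_ISL_413601|2020-03-02",
        "hCoV-19/USA/WA-UW142/2020|EPI_ISL_416680|2020-03-11",
    ])
    print(f"{percent_shared(left, right):.1f}% of left IDs are also in right")
```

Because the pattern searches anywhere in the line, it works whether or not the leading `>` has been stripped, and sequence lines are simply ignored. For real files, `read_ids` can be called once per file and the resulting sets compared pairwise; writing a set to a new file is then a matter of joining its sorted members with newlines.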
Correct answer by Theo Jones on January 22, 2021