Bioinformatics Asked on June 3, 2021
Are there best practices for loading bioinformatics file formats such as VCF, BED, GFF, and SAM into SQL databases? I am wondering how people do this efficiently.
All of these formats are tab-separated, so in principle the following should work. It feels odd, though, since most people I know don't use MySQL to work with these files.
LOAD DATA LOCAL INFILE 'bed.bed'
INTO TABLE `bed-file`
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n'
IGNORE 1 ROWS
(list of the columns)
SET creation_date = STR_TO_DATE(@creation_date, '%m/%d/%y');
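For comparison, the same kind of load can be sketched without a MySQL server, using Python's standard-library sqlite3 and csv modules. This is only a minimal sketch: the table name bed, the column names chrom, chrom_start and chrom_end, the helper load_bed, and the sample rows are all assumptions made for illustration, and only the first three BED columns are kept.

```python
import csv
import io
import sqlite3


def load_bed(conn, handle, table="bed"):
    """Load the first three BED columns from a text handle into SQLite.

    The table and column names are illustrative, not part of any standard schema.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(chrom TEXT NOT NULL, chrom_start INTEGER NOT NULL, chrom_end INTEGER NOT NULL)"
    )
    reader = csv.reader(handle, delimiter="\t")
    # Skip blank lines and the comment/track/browser lines some BED files carry.
    rows = (
        (rec[0], int(rec[1]), int(rec[2]))
        for rec in reader
        if rec and not rec[0].startswith(("#", "track", "browser"))
    )
    with conn:  # one transaction for the whole file, committed on success
        cur = conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
    return cur.rowcount


# Made-up sample data standing in for a real BED file.
sample = "chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n"
conn = sqlite3.connect(":memory:")
n_rows = load_bed(conn, io.StringIO(sample))
```

Wrapping all inserts in a single transaction is the main efficiency lever here; committing row by row is usually much slower.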
Answer from @liam-mcintyre converted from comment:
I don't use dask, as it (unfortunately) doesn't support enough pandas functionality. With pandas I do it with read_csv; if the file is big, I read it in chunks and send the chunks to separate threads. If you want to ask a specific question with example data, I can show code.
Answered by gringer on June 3, 2021
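The chunked read_csv approach mentioned in the answer might look roughly like the sketch below. It is not the answerer's code: the column names, the add_length step, the load_in_chunks helper, the SQLite target, and the sample data are assumptions for illustration, and the threading part is left out so that each chunk is simply processed and appended in turn.

```python
import io
import sqlite3

import pandas as pd

# Assumed three-column BED layout; real files may have more columns.
BED_COLUMNS = ["chrom", "chrom_start", "chrom_end"]


def add_length(chunk):
    """Example per-chunk processing step: add the interval length."""
    chunk = chunk.copy()
    chunk["length"] = chunk["chrom_end"] - chunk["chrom_start"]
    return chunk


def load_in_chunks(source, conn, table="bed", chunksize=2):
    """Read a tab-separated file chunk by chunk and append each chunk to SQLite."""
    chunks = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=BED_COLUMNS,
        comment="#",
        chunksize=chunksize,  # makes read_csv return an iterator of DataFrames
    )
    total = 0
    for chunk in chunks:
        processed = add_length(chunk)
        # if_exists="append" creates the table on the first chunk, then appends.
        processed.to_sql(table, conn, if_exists="append", index=False)
        total += len(processed)
    return total


# Made-up sample data; a tiny chunksize is used only to force several chunks.
sample = "chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\nchr2\t60\t90\nchr3\t5\t10\n"
conn = sqlite3.connect(":memory:")
total = load_in_chunks(io.StringIO(sample), conn)
```

For real files a chunksize in the tens or hundreds of thousands of rows is more typical, so that only one chunk needs to fit in memory at a time.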