TransWikia.com

How to convert DNA sequences in FASTA format to OneHot Encoded Pandas Dataframe for Neural Networks?

Data Science Asked by David293836 on July 8, 2021

DNA sequences in FASTA format look like:

CATGCATTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCA... 

I am trying to convert them into one-hot encoded data in a Pandas dataframe so that I can use various neural networks to analyze them. This has probably been done many times. Can someone point me to references or Python packages for it?

One Answer

I am not sure about any dedicated Python package, but you can simply integer-encode the sequence with scikit-learn's LabelEncoder and then one-hot encode the result with OneHotEncoder.

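from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# seq holds one DNA sequence, e.g. "CATGCATTAG"; list() splits it into single bases
label_encoder = LabelEncoder()
integer_encoded_seq = label_encoder.fit_transform(list(seq))  # classes are sorted, so A, C, G, T -> 0..3 if all occur
# sparse_output replaces the older sparse argument (scikit-learn >= 1.2)
onehot_encoder = OneHotEncoder(sparse_output=False)
integer_encoded_seq = integer_encoded_seq.reshape(-1, 1)  # one column, one row per base
onehot_encoded_seq = onehot_encoder.fit_transform(integer_encoded_seq)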

Refer to this.

Correct answer by prashant0598 on July 8, 2021
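
The encoders above learn their categories from the data, so a sequence that happens to lack one base would produce fewer than four columns. As a complement, here is a minimal sketch that reads FASTA text and one-hot encodes each sequence against a fixed alphabet. The helper names (read_fasta, one_hot, to_dataframe), the A, C, G, T column order, and the choice to map any other letter (such as N) to an all-zero row are assumptions made for illustration, not part of the original answer.

```python
from io import StringIO

# Assumed fixed column order; letters outside it (e.g. N) become all-zero rows.
ALPHABET = "ACGT"


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA-formatted text handle."""
    record_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            record_id = header[0] if header else ""
            chunks = []
        elif record_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line.upper())
    if record_id is not None:
        yield record_id, "".join(chunks)


def one_hot(sequence, alphabet=ALPHABET):
    """Return one row per base, with a 1 in the column of the matching letter."""
    return [[int(base == letter) for letter in alphabet] for base in sequence]


def to_dataframe(sequence, alphabet=ALPHABET):
    """Wrap the one-hot rows in a DataFrame (requires pandas to be installed)."""
    import pandas as pd  # imported lazily so the helpers above work without pandas

    return pd.DataFrame(one_hot(sequence, alphabet), columns=list(alphabet))


# Hypothetical two-record FASTA input, used only to exercise the helpers.
example = StringIO(">seq1 hypothetical record\nCATG\nCATT\n>seq2\nNA\n")
records = dict(read_fasta(example))
```

Calling to_dataframe(records["seq1"]) should give a DataFrame with one row per base and the four columns A, C, G, T. Sequences of different lengths would still need padding or truncation before being stacked into a single array for a neural network.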
