Data Science Asked by David293836 on July 8, 2021
DNA sequences in FASTA format look like:
CATGCATTAGTTATTAATAGTAATCAATTACGGGGTCATTAGTTCA...
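In a complete FASTA file, each sequence is preceded by a header line starting with >, and long sequences are usually wrapped across several lines. Biopython's Bio.SeqIO.parse(path, "fasta") reads this format. Below is a minimal dependency-free sketch instead. The helper name read_fasta and the two example records are hypothetical, and the sketch assumes well-formed input:

```python
from typing import Dict, Iterable, List, Optional


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence} (hypothetical helper)."""
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts; store the previous one first.
            if header is not None:
                records[header] = "".join(chunks)
            header = line[1:].strip()
            chunks = []
        else:
            # Sequence lines may wrap; join them and normalise to upper case.
            chunks.append(line.upper())
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical example: one wrapped record and one lower-case record.
example = """>seq1 hypothetical example
CATGCATTAGTTATTAATAG
TAATCAATTACGGGGTCATT
>seq2 lower-case example
ggcat
"""
print(read_fasta(example.splitlines()))
```

Because read_fasta accepts any iterable of lines, an open file handle (for example from a hypothetical open("sequences.fasta")) can be passed to it directly.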
I am trying to convert them into one-hot encoded data in a Pandas dataframe so that I can use various neural networks to analyze them. This has probably been done many times. Can someone point me to references or Python packages for it?
I am not sure about a dedicated Python package, but you can simply do integer encoding with scikit-learn's LabelEncoder() and then one-hot encoding with OneHotEncoder():
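from sklearn.preprocessing import LabelEncoder, OneHotEncoder
# seq is a DNA string, e.g. "CATGCATTAG"; split it into single nucleotides
label_encoder = LabelEncoder()
integer_encoded_seq = label_encoder.fit_transform(list(seq))  # classes sorted alphabetically, e.g. A=0, C=1, G=2, T=3
# OneHotEncoder expects a 2-D array with one row per nucleotide
integer_encoded_seq = integer_encoded_seq.reshape(-1, 1)
# sparse_output=False returns a dense array (scikit-learn >= 1.2; older versions use sparse=False)
onehot_encoder = OneHotEncoder(sparse_output=False)
onehot_encoded_seq = onehot_encoder.fit_transform(integer_encoded_seq)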
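Since the question asks for a Pandas DataFrame, here is a minimal sketch of the same idea in pandas alone. One design point: the encoders above are fit on each sequence separately, so a sequence that lacks one of the four bases yields fewer columns. The sketch fixes the alphabet to A, C, G, T (an assumption; the helper name one_hot_dataframe is hypothetical), so every sequence gets the same four columns:

```python
import pandas as pd

BASES = ["A", "C", "G", "T"]  # assumed fixed alphabet


def one_hot_dataframe(seq: str) -> pd.DataFrame:
    """One-hot encode a DNA string: one row per position, one column per base."""
    # Fixed categories keep all four columns even if a base is absent;
    # characters outside BASES (e.g. 'N') become NaN, i.e. all-zero rows.
    bases = pd.Categorical(list(seq.upper()), categories=BASES)
    return pd.get_dummies(bases, dtype=int)


df = one_hot_dataframe("CATG")
print(df)
```

For several sequences of equal length, the arrays from df.to_numpy() can be stacked with numpy.stack into shape (n_sequences, length, 4), which is a common input layout for convolutional or recurrent networks.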
Refer to this.
Correct answer by prashant0598 on July 8, 2021