One-Hot Encoded Matrix Input/Output for Autoencoder

Data Science Asked by aslconwnb on April 29, 2021

I am trying to write an autoencoder to reduce the dimensionality of my genomic data. Currently, my data is in the form of a $273278 \times 1$ vector. Each element of the vector indicates whether a position has no mutations (0), one mutation (1), or two mutations (2). As such, the input and output of my autoencoder look like this:

$$\begin{bmatrix}
0 \\
1 \\
0 \\
2 \\
\vdots
\end{bmatrix}$$

This uses label encoding to represent the categorical data. It works, but the autoencoder isn't very accurate, since the labels 0, 1, and 2 are categories rather than numerically related quantities.
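To make the concern concrete, here is a minimal NumPy sketch (the helper names are hypothetical) comparing how far apart the categories are when fed in as plain numbers versus as one-hot rows. With label encoding, 2 sits twice as far from 0 as 1 does; with one-hot encoding every pair of categories is the same Euclidean distance, $\sqrt{2}$, apart.

```python
import numpy as np

def label_distance(a: int, b: int) -> float:
    """Distance between two categories when they are fed to the network as plain numbers."""
    return abs(float(a) - float(b))

def one_hot_distance(a: int, b: int, n_categories: int = 3) -> float:
    """Euclidean distance between the one-hot vectors of two categories."""
    eye = np.eye(n_categories)
    return float(np.linalg.norm(eye[a] - eye[b]))

# Compare every pair of the three mutation categories.
for a, b in [(0, 1), (1, 2), (0, 2)]:
    print(a, b, label_distance(a, b), round(one_hot_distance(a, b), 3))
```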

I am considering using one-hot encoding to create a $273278 \times 3$ matrix in which each column corresponds to one of the values 0, 1, or 2. The vector above would then become:

$$\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
1 & 0 & 0 \\
0 & 0 & 1 \\
\vdots & \vdots & \vdots
\end{bmatrix}$$
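The conversion from the label vector to the one-hot matrix can be done by indexing the rows of an identity matrix. This is a minimal NumPy sketch; `one_hot_encode` is a hypothetical helper, and the toy input is just the first four entries shown above. Keras also ships `to_categorical` in `keras.utils`, which produces the same kind of matrix.

```python
import numpy as np

def one_hot_encode(labels, n_categories: int = 3) -> np.ndarray:
    """Map each label in {0, ..., n_categories - 1} to a one-hot row."""
    labels = np.asarray(labels, dtype=np.int64)
    if labels.min() < 0 or labels.max() >= n_categories:
        raise ValueError("labels must lie in [0, n_categories)")
    # Row i of the identity matrix is the one-hot vector for category i,
    # so fancy indexing turns an (n,) label vector into an (n, n_categories) matrix.
    return np.eye(n_categories, dtype=np.float32)[labels]

labels = np.array([0, 1, 0, 2])  # first four entries of the example vector
one_hot = one_hot_encode(labels)
print(one_hot)
```

Taking `one_hot.argmax(axis=1)` recovers the original labels, so no information is lost in either direction.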

However, I am unsure how to feed this matrix into a Keras neural network. Is there a function for this? Would flattening the matrix be mathematically appropriate? Is there another way to do it?
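One pattern worth considering, sketched here with NumPy only and assuming a Dense-based autoencoder: flattening the $n \times 3$ matrix is a lossless reshape, so a dense layer can consume it, and the decoder's flat output can be reshaped back to $n \times 3$ with a softmax applied per position and a per-position categorical cross-entropy as the loss. In Keras these steps would map onto the `Flatten` and `Reshape` layers plus a softmax over the last axis. The "decoder output" below is a hypothetical stand-in (a scaled copy of the input), not a trained network.

```python
import numpy as np

def flatten_one_hot(one_hot: np.ndarray) -> np.ndarray:
    """Flatten an (n_positions, 3) one-hot matrix into a vector of length 3 * n_positions."""
    return one_hot.reshape(-1)

def decode_logits(flat_logits: np.ndarray, n_categories: int = 3) -> np.ndarray:
    """Reshape a flat decoder output to (n_positions, n_categories) and softmax each row."""
    logits = flat_logits.reshape(-1, n_categories)
    logits = logits - logits.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def per_position_cross_entropy(one_hot: np.ndarray, probs: np.ndarray, eps: float = 1e-12) -> float:
    """Categorical cross-entropy computed per position, averaged over positions."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=1)))

labels = np.array([0, 1, 0, 2])
one_hot = np.eye(3)[labels]
flat = flatten_one_hot(one_hot)  # shape (12,): what a Dense input layer would see

# Hypothetical decoder output standing in for a trained network's logits.
flat_logits = 5.0 * flat
probs = decode_logits(flat_logits)
reconstructed = probs.argmax(axis=1)  # back to labels 0/1/2
loss = per_position_cross_entropy(one_hot, probs)
print(reconstructed, loss)
```

Note that for the full data the flattened input has $3 \times 273278 = 819834$ entries, so the size of the first dense layer grows accordingly.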

autoencoder deep learning keras neural network one hot encoding
