Neural Nets: unordered sets of ordered tuples as features of data

Question

I'm working on a very small scale pet project in which inputs are essentially sets of (x, y) pairs, and are to be classified into categories, using deep learning, specifically using Keras (I know this may not be the best for this, but it's more of a proof of concept / I want to try it out).

However, I'm not sure how to go about representing the data.

I'm starting with a simple classification problem (i.e. if (a, b) is a feature of a sample, and a and b are both within 5% of a certain (c, d), then they are a positive example, and not otherwise), but I'm not sure how to represent the data such that the network can learn this.

I was thinking of doing one-hot encoding, but then the dimensionality of the data may grow immensely (x and y both take on values in a continuous interval), and I'm worried that it would not generalize well to data augmentation (I would augment the data by adding noise to each x, y).

Any ideas?

Benji Albert · Answer

Your features are already numerical, so you can simply pass your array of numbers into a MLP.  For instance, if you have 5 sets of (x,y) coordinates per sample, your MLP will have 10 inputs.

One-hot encoding is used to convert categorical features into numerical values.  This image from Kaggle does a great job of illustrating how you might use one-hot encoding:

If you were to directly convert categorical values to numerical values, that would imply that one color is greater than another, so we need to use a boolean array instead.

As for the variable length feature vector, it is difficult to give an exact answer without knowing more details about your problem, but here are general possible solutions:

use a recurrent neural network
use distance-based regression/classification algorithms such as k nearest neighbors
transform your data into feature vectors of fixed length (which may not be reasonable depending on your data and what it represents)

A quick high level introduction to neural networks might be helpful.

Neural Nets: unordered sets of ordered tuples as features of data

One Answer

Add your own answers!

Ask a Question