Data Science Asked by JunjieChen on September 30, 2020
Suppose we have two kinds of input features, categorical and continuous. The categorical data may be represented as a one-hot code A, while the continuous data is just a vector B in N-dimensional space. Simply using concat(A, B) seems like a poor choice, because A and B are totally different kinds of data; for example, unlike B, there is no numerical order in A. So my question is: how should one combine these two kinds of data, and is there a conventional method for handling them?
In fact, I propose a naive structure as presented in the picture
As you can see, the first few layers map data A to an intermediate output in continuous space, which is then concatenated with data B to form a new input feature in continuous space for the later layers. I wonder whether this is reasonable or just a "trial and error" game. Thank you.
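The structure described above can be sketched in plain NumPy with untrained random weights (all layer sizes here are hypothetical, chosen only for illustration; a real model would learn the weight matrices with a framework such as Keras):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 10 categories one-hot encoded,
# 5 continuous features, a 4-dim learned representation for A.
n_cat, n_cont, d_embed, d_hidden = 10, 5, 4, 8

A = np.eye(n_cat)[rng.integers(0, n_cat, size=32)]  # one-hot batch, shape (32, 10)
B = rng.normal(size=(32, n_cont))                   # continuous batch, shape (32, 5)

# First few layers: map one-hot A into a continuous space.
W_embed = rng.normal(size=(n_cat, d_embed))
A_cont = np.maximum(A @ W_embed, 0)                 # ReLU, shape (32, 4)

# Concatenate with B to form the input for the later layers.
x = np.concatenate([A_cont, B], axis=1)             # shape (32, 9)

W_hidden = rng.normal(size=(d_embed + n_cont, d_hidden))
h = np.maximum(x @ W_hidden, 0)                     # shape (32, 8)
```

The key point is that the concatenation happens only after A has been projected into a continuous space, so the later layers never see the raw one-hot code.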
There are three main approaches to solving this:

1. Train separate models on A and B and combine their predictions in an ensemble.
2. Feed the raw concatenation concat(A, B) into a single network.
3. Use the architecture you proposed: map A into a continuous space first, then concatenate with B.
The ensemble approach is the most straightforward option and will yield decent results. However, it will not work as well as the architecture you proposed, because the ensemble network only receives class probabilities from the two networks as input, and therefore misses the more complex relationships between the two data types.
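To make the limitation concrete, here is a minimal sketch of the ensemble setup (the probability values are made up for illustration): the combiner only ever sees the two class-probability vectors, never the underlying features.

```python
import numpy as np

# Hypothetical outputs of two separately trained 3-class networks:
p_cat = np.array([[0.7, 0.2, 0.1]])   # from the network trained on A
p_cont = np.array([[0.4, 0.5, 0.1]])  # from the network trained on B

# The ensemble network's entire input is just these probabilities.
ensemble_input = np.concatenate([p_cat, p_cont], axis=1)  # shape (1, 6)

# Simplest possible combiner: average the two distributions.
p_avg = (p_cat + p_cont) / 2
```

Any interaction between a category in A and a feature in B that neither base network captured on its own is unrecoverable at this stage.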
The second approach is in theory not that different from your proposal; the difference is that it assumes the network will figure out on its own that the input consists of two types of data (since both live in the same vector/tensor). It takes a lot of training time for the network to learn that, and you might get stuck in a local minimum before it ever does.
Based on my personal experience, the network you proposed is the best option: it is likely to have the shortest training time, and once you get the architecture right it is easy to train and maintain (retrain) in production, since you only ever have to retrain a single model.
Answered by Tadej Magajna on September 30, 2020
As @Tadej Magajna already said, there are different solutions for this. I would like to add a post about how to implement the third approach in Keras:
Combining numerical and text features in (deep) neural networks
Answered by ixeption on September 30, 2020
I have been using the naive structure you proposed for quite some time now. In a well-framed problem with enough data, this type of architecture works quite well. However, here are a few things I have learnt:
The key design decision is the concatenation layer and where to place it in the architecture. Using embedding layers also gives you the additional benefit of reusing the learnt embeddings in other tasks or visualizations.
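An embedding layer is just an efficient lookup version of the one-hot-times-dense mapping discussed above, which is why the learnt rows can be reused elsewhere. A small sketch (dimensions hypothetical) showing the equivalence:

```python
import numpy as np

rng = np.random.default_rng(1)
n_cat, d_embed = 10, 4
E = rng.normal(size=(n_cat, d_embed))  # embedding matrix, one row per category

idx = np.array([3, 7, 3])              # integer-coded categorical batch
one_hot = np.eye(n_cat)[idx]           # same batch as one-hot vectors

# Row lookup by index and one-hot matrix multiplication
# produce exactly the same continuous vectors.
lookup = E[idx]                        # shape (3, 4)
matmul = one_hot @ E                   # shape (3, 4)
```

After training, the rows of E are the learnt category representations and can be plotted or fed into other models.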
This type of architecture has been used in Kaggle competitions and is also taught in the Fast.ai course by Prof. Jeremy Howard.
Answered by Soumyajit on September 30, 2020