
Tree-Based Classification (XGBoost, LightGBM, etc.) - Features from embeddings for sparse features?

Data Science: Asked by user1157751 on April 8, 2021

I’m wondering whether it is possible to use embeddings as inputs to tree-based classification models?

For example, suppose we have a field called "type of food", and there are many possible types of food.

One might transform this feature by one of the following (both options are sketched in code after the list):

  1. Hashing trick + OHE: hash the categories into a fixed number of buckets first, so that the subsequent one-hot encoding does not produce too many columns.
  2. Label encoding: map each category to a numeric value; this works best when the categories have some natural ordinality, but it can sometimes work reasonably even when they don’t.
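
For concreteness, here is a minimal sketch of both options using scikit-learn; the food values, bucket count, and printed shapes are made-up examples, not part of the original question.

```python
import numpy as np
from sklearn.feature_extraction import FeatureHasher
from sklearn.preprocessing import OrdinalEncoder

foods = ["pizza", "sushi", "tacos", "pho", "pizza", "curry"]

# 1. Hashing trick + OHE: hash each category into one of n_features buckets,
#    so the indicator matrix never grows beyond n_features columns no matter
#    how many distinct food types show up.
hasher = FeatureHasher(n_features=8, input_type="string", alternate_sign=False)
hashed_ohe = hasher.transform([[f] for f in foods]).toarray()
print(hashed_ohe.shape)  # (6, 8) regardless of the number of distinct foods

# 2. Label encoding: map each category to an integer id. Trees can split on
#    these ids, but the implied ordering is arbitrary for nominal data.
encoder = OrdinalEncoder()
label_encoded = encoder.fit_transform(np.array(foods).reshape(-1, 1))
print(label_encoded.ravel())
```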

However, in papers such as DeepFM, people take the one-hot-encoded values and pass them through an embedding layer, which acts essentially like an autoencoder bottleneck and generates dense features for the next layers.
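
As an illustration (my own sketch, not DeepFM’s actual architecture), a PyTorch embedding layer maps integer-encoded category ids to dense vectors; the vocabulary size and embedding dimension below are assumed values.

```python
import torch
import torch.nn as nn

n_food_types = 1000   # assumed vocabulary size
embedding_dim = 16    # assumed dense feature size

embedding = nn.Embedding(num_embeddings=n_food_types, embedding_dim=embedding_dim)

# A batch of label-encoded "type of food" ids.
food_ids = torch.tensor([3, 57, 912, 3])
dense_features = embedding(food_ids)   # shape: (4, 16)
print(dense_features.shape)
```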

But I’ve never seen anyone use autoencoder outputs as inputs to a tree model, nor does there seem to be a library that supports this out of the box; to keep things simple you would otherwise have to build it from scratch in PyTorch/TensorFlow.

On the other hand, if the autoencoder is trained on its own, it does not converge with respect to the classification loss, only the reconstruction loss. So I’m wondering whether this is even feasible, even though one could just try it out: build an autoencoder in PyTorch, take the encodings, and feed them to XGBoost, etc.
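
Below is a rough sketch of that "just try it out" route, using synthetic data and made-up sizes: an autoencoder trained purely on reconstruction loss, whose bottleneck encodings are then handed to XGBoost. It illustrates the setup (and the loss mismatch described above) rather than resolving it.

```python
import numpy as np
import torch
import torch.nn as nn
import xgboost as xgb

rng = np.random.default_rng(0)
n_samples, n_categories, bottleneck = 2000, 200, 16

# Synthetic one-hot "type of food" feature and a fake binary label.
ids = rng.integers(0, n_categories, size=n_samples)
X = np.eye(n_categories, dtype=np.float32)[ids]
y = (ids % 3 == 0).astype(int)

class AutoEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_categories, 64), nn.ReLU(),
                                     nn.Linear(64, bottleneck))
        self.decoder = nn.Sequential(nn.Linear(bottleneck, 64), nn.ReLU(),
                                     nn.Linear(64, n_categories))

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = AutoEncoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()
X_t = torch.from_numpy(X)

# Train on reconstruction loss only (no classification signal reaches the encoder).
for epoch in range(20):
    optimizer.zero_grad()
    recon, _ = model(X_t)
    loss = loss_fn(recon, X_t)
    loss.backward()
    optimizer.step()

# Extract the dense encodings and hand them to a tree model.
with torch.no_grad():
    _, Z = model(X_t)
clf = xgb.XGBClassifier(n_estimators=50, max_depth=4)
clf.fit(Z.numpy(), y)
print(clf.score(Z.numpy(), y))
```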
