Data Science: asked by Maeaex1 on April 28, 2021
I want to build a machine learning model (XGBoost and LightGBM) that has to handle streaming data arriving weekly. The models are retrained on a bi-weekly basis. The data contains order information, and I want to predict the likelihood that an order will be delivered. Orders are entered into the system, and one week later it is known whether they were actually delivered.
For nominal features such as supplier I use pd.get_dummies() for the transformation. However, let's say I receive the data for the orders arriving next week, and it contains a new supplier that the trained model does not know yet: the column supplier_new_unkown_supplier does not exist in the saved model's parameters.
Does anyone know how to deal with such cases?
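To make the problem concrete, here is a minimal sketch of what happens with pd.get_dummies(). The supplier names ("acme", "globex", "initech") are made up for illustration. It also shows one common workaround, which is different from the hashing approach suggested in the answer: save the training columns and reindex new data to them, so an unseen supplier simply becomes an all-zero row.

```python
import pandas as pd

# Hypothetical training data: the suppliers seen when the model was fitted.
train = pd.DataFrame({"supplier": ["acme", "globex", "acme"]})
# Hypothetical next-week orders, including a supplier never seen in training.
new_orders = pd.DataFrame({"supplier": ["globex", "initech"]})

X_train = pd.get_dummies(train, columns=["supplier"], dtype=int)
X_new = pd.get_dummies(new_orders, columns=["supplier"], dtype=int)

# The column sets differ, so the fitted model cannot score X_new directly.
print(list(X_train.columns))  # ['supplier_acme', 'supplier_globex']
print(list(X_new.columns))    # ['supplier_globex', 'supplier_initech']

# Workaround: align to the training columns (which must be saved with the model).
# Columns missing from X_new are added and filled with 0; unseen columns
# such as supplier_initech are dropped, leaving that row all zeros.
X_new_aligned = X_new.reindex(columns=X_train.columns, fill_value=0)
print(X_new_aligned)
```

The drawback of this alignment is that every unseen supplier collapses onto the same all-zero pattern until the next retraining adds its column.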
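If you want to run inference on an order whose categorical value (for example, the supplier) was not seen in the training data, you could train the model on a hash bucket representation of that variable: each value is hashed into one of a fixed number of buckets, so the set of feature columns stays the same no matter which values appear later.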
If you are using TensorFlow, you can leverage: https://www.tensorflow.org/api_docs/python/tf/feature_column/categorical_column_with_hash_bucket
Or implement it yourself.
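A minimal do-it-yourself sketch of the hashing idea, assuming a plain Python pipeline rather than TensorFlow. Each supplier string v is mapped to bucket H(v) mod B, where H is a stable hash and B is a fixed number of buckets, and then one-hot encoded over those B columns. The names hash_bucket, hashed_one_hot and NUM_BUCKETS, the bucket count of 16, and the supplier names are all hypothetical choices for illustration.

```python
import hashlib


def hash_bucket(value: str, num_buckets: int) -> int:
    """Map a category string to a bucket index in [0, num_buckets)."""
    # Python's built-in hash() is randomized per process for str, so a fixed
    # digest is used to get the same bucket at training and inference time.
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_buckets


def hashed_one_hot(values, num_buckets):
    """Return one row of num_buckets 0/1 indicators per input value."""
    rows = []
    for v in values:
        row = [0] * num_buckets
        row[hash_bucket(v, num_buckets)] = 1
        rows.append(row)
    return rows


NUM_BUCKETS = 16  # hypothetical size; pick it relative to the expected number of suppliers
train_suppliers = ["acme", "globex", "acme"]
new_suppliers = ["globex", "initech"]  # "initech" never appeared in training

X_train = hashed_one_hot(train_suppliers, NUM_BUCKETS)
X_new = hashed_one_hot(new_suppliers, NUM_BUCKETS)

# Both matrices have exactly NUM_BUCKETS columns, so a model fitted on
# X_train can score X_new without any schema change.
print(len(X_train[0]), len(X_new[0]))  # 16 16
```

The trade-off is collisions: an unseen supplier lands in some bucket that may be shared with a known supplier, and the model will treat it like whatever it learned for that bucket. A larger bucket count reduces collisions at the cost of more (sparser) columns.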
Answered by zfact0rial on April 28, 2021