TransWikia.com

Widening prediction intervals when data is missing

Data Science Asked on January 21, 2021

Question: How can I get a prediction interval that widens when data is missing?

Problem:

I’m trying to predict a likely range of customer spending. A prediction interval seems like an excellent answer; my feature distributions are unknown, and are probably unknowable. I have a reasonably rich dataset, but any given customer is likely to be missing some random features.

Intuitively, I really want my intervals to widen when data is missing: the less information that informs my prediction, the less precise my estimate can be, right?
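One way to make this intuition precise is the law of total variance. The notation here is an assumption introduced for illustration: $Y$ is spending, $X_o$ the observed features and $X_m$ the missing ones.

$$
\operatorname{Var}(Y \mid X_o) = \mathbb{E}\big[\operatorname{Var}(Y \mid X_o, X_m) \,\big|\, X_o\big] + \operatorname{Var}\big(\mathbb{E}[Y \mid X_o, X_m] \,\big|\, X_o\big)
$$

The second term is non-negative, so on average the spread of $Y$ given only the observed features is at least as large as the spread given all features. This is a statement about variance averaged over the missing features, not a guarantee about interval width for every individual customer.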

Attempts:

For this style of problem, I typically use ensemble tree models like LightGBM (LGB) or, more recently, NGBoost. Trees seem to do a good job of inferring the existence and behaviour of demographics of which I have no prior knowledge. These models deal with missing data by imputing, either (a) before prediction, or (b) in-model, by allocating missing values to the side of each split that better reduces loss. Both of these techniques have the opposite of the desired effect!
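The effect of imputing before prediction can be seen in a minimal sketch. Everything in it is an assumption made for illustration: the data is synthetic, the coefficients are arbitrary, and a plain least-squares fit stands in for the tree models mentioned above. It compares the coverage of a nominal 90% interval when all features are observed against the case where one feature is mean-imputed and the interval width is left unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Synthetic "spend" driven by two features plus noise (illustrative only).
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 * x1 + 3.0 * x2 + rng.normal(scale=1.0, size=n)

# Least-squares fit on complete rows; stands in for any point predictor.
X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Nominal 90% interval built from residual quantiles on complete data.
resid_full = y - X @ coef
lo_q, hi_q = np.quantile(resid_full, [0.05, 0.95])


def coverage(features, lo, hi):
    """Fraction of targets that fall inside [pred + lo, pred + hi]."""
    pred = features @ coef
    return np.mean((y >= pred + lo) & (y <= pred + hi))


# Case 1: every feature observed.
cov_full = coverage(X, lo_q, hi_q)

# Case 2: x2 is missing and mean-imputed; the interval width is unchanged.
X_imputed = X.copy()
X_imputed[:, 2] = x2.mean()
cov_imputed = coverage(X_imputed, lo_q, hi_q)

# For comparison: the width actually needed for 90% coverage when x2 is
# missing, measured from residuals of the imputed predictions.
resid_imputed = y - X_imputed @ coef
lo_w, hi_w = np.quantile(resid_imputed, [0.05, 0.95])
width_full = hi_q - lo_q
width_missing = hi_w - lo_w

print(f"coverage, all features observed: {cov_full:.2f}")
print(f"coverage, x2 imputed, same width: {cov_imputed:.2f}")
print(f"interval width, full vs. x2 missing: {width_full:.2f} vs. {width_missing:.2f}")
```

In this toy setup the imputed rows keep the narrow interval and fall well short of the nominal coverage, while the residual spread measured on those rows is several times wider. Whether a given tree library behaves the same way depends on how it handles missing values, so this should be read as a demonstration of the problem rather than of any particular model.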

I’ve also had a look at Bayesian models, thinking they should be able to do this just by eliminating the terms for the missing features, but… none of them seem to do so. I can’t find anything that works appropriately, and rolling my own from first principles seems like an incredibly deep rabbit hole.

TLDR: Can you suggest a quantile regression model that widens to account for missing data?
