Widening prediction intervals when data is missing

Question: How can I get a prediction interval that widens when data is missing?
Problem:
I'm trying to predict a likely range of customer spending. A prediction interval seems like an excellent answer; my feature distributions are unknown, and probably unknowable. I have a reasonably rich dataset, but any given customer is likely to be missing a random subset of features.
Intuitively, I really want my intervals to widen when data is missing: the less information my prediction is based on, the less precise my estimate can be, right?
Attempts:
For this style of problem, I typically use ensemble tree models like LightGBM or, more recently, NGBoost; trees seem to do a good job of inferring the existence and behaviour of demographic groups I have no prior knowledge of. These models deal with missing data by imputation, either (a) before prediction, or (b) in-model, by sending missing values down whichever side of each split better reduces the loss. Both of these techniques have the opposite of the desired effect: they treat the filled-in value as if it had actually been observed, so the interval never widens to reflect the lost information.
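A minimal numerical sketch of the effect, using a deliberately simple stand-in rather than LightGBM or NGBoost: hypothetical synthetic data from a linear model with Gaussian noise, fitted by ordinary least squares, with the second feature assumed independent of the first. The helpers `interval_known`, `interval_mean_imputed` and `interval_marginalized` are illustrative names, not library functions. The sketch compares a 90% interval when both features are observed, when the missing feature is mean-imputed, and when it is instead integrated out by Monte Carlo sampling from its empirical distribution.

```python
import numpy as np

# Hypothetical synthetic data: spending depends on two features plus noise.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)  # assumed independent of x1 for this sketch
y = 2.0 * x1 + 3.0 * x2 + rng.normal(scale=1.0, size=n)

# Fit ordinary least squares on the complete data (intercept, x1, x2).
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid_sd = np.std(y - X @ beta, ddof=3)

z = 1.6448536269514722  # standard normal 95th percentile -> 90% central interval


def interval_known(a, b):
    """90% interval when both features are observed (Gaussian residuals assumed)."""
    mu = beta[0] + beta[1] * a + beta[2] * b
    return mu - z * resid_sd, mu + z * resid_sd


def interval_mean_imputed(a):
    """x2 missing: plug in its training mean, as simple imputation would."""
    return interval_known(a, x2.mean())


def interval_marginalized(a, draws=20000):
    """x2 missing: integrate it out by sampling x2 from its empirical distribution."""
    b = rng.choice(x2, size=draws)
    samples = beta[0] + beta[1] * a + beta[2] * b + rng.normal(scale=resid_sd, size=draws)
    return tuple(np.quantile(samples, [0.05, 0.95]))


for name, (lo, hi) in [
    ("both observed", interval_known(0.5, 0.0)),
    ("x2 mean-imputed", interval_mean_imputed(0.5)),
    ("x2 marginalized", interval_marginalized(0.5)),
]:
    print(f"{name:16s} width = {hi - lo:.2f}")
```

Mean imputation gives exactly the same width as the fully observed case, while marginalizing over the missing feature gives a much wider interval. With correlated features, the draws would need to come from the distribution of the missing feature conditional on the observed ones rather than from its marginal.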
I've also looked at Bayesian models, thinking they should be able to do this just by eliminating the terms for the missing features, but none of the ones I've found seem to do so. I can't find anything that works appropriately, and rolling my own from first principles looks like an incredibly deep rabbit hole.
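A worked case under strong simplifying assumptions that the question itself does not make: a linear model with Gaussian noise $\varepsilon$ independent of the features, and missing features $x_m$ whose distribution given the observed features $x_o$ is Gaussian. Integrating $x_m$ out, rather than deleting its terms, gives:

```latex
\begin{aligned}
y &= \beta_o^\top x_o + \beta_m^\top x_m + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2), \\
x_m \mid x_o &\sim \mathcal{N}\!\left(\mu_{m\mid o},\, \Sigma_{m\mid o}\right), \\
\Rightarrow\quad y \mid x_o &\sim \mathcal{N}\!\left(\beta_o^\top x_o + \beta_m^\top \mu_{m\mid o},\;\; \sigma^2 + \beta_m^\top \Sigma_{m\mid o}\, \beta_m\right).
\end{aligned}
```

Simply dropping the $\beta_m^\top x_m$ term leaves the predictive variance at $\sigma^2$, so nothing widens. Marginalizing adds $\beta_m^\top \Sigma_{m\mid o} \beta_m \ge 0$, since a covariance matrix is positive semidefinite. This sketch ignores uncertainty in $\beta$ and $\sigma^2$, which a full Bayesian treatment would add on top.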
TL;DR: Can you suggest a quantile regression model whose intervals widen to account for missing data?