Cross Validated Asked by kh_one on December 13, 2020

Say I have a binary response variable, *Y*, that I model using a logistic model with four predictors, *A*, *B*, *C* and *D*. To make matters concrete, imagine that *Y* = 1 designates a respondent registering support for something, and 0 an absence of support.

Having estimated the relevant parameters on some sample, *S*, I then want to see what proportion of 1s (i.e., support) I likely *would* have seen, had all observations in *S* taken on a particular value on *A*. Assume conditions for causal inference are satisfied for *A*, so that changing its value can be thought of as a (hypothetical) intervention.

So I create a "new" sample, *S**, identical to *S* save for each observation taking on the desired value on *A*. I then use the fitted model to “predict” the probability of *Y* = 1 for each observation in that sample. Taking the mean of those predictions, I get an estimated proportion of support under the relevant intervention.
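For concreteness, here is a minimal R sketch of that procedure. The data are simulated, and the coefficient values, the sample size, and the names `S`, `S_star`, `fit`, `p_star` and `est_support` are all made up for illustration; *A* is assumed to be binary and set to 1.

```r
# Simulated stand-in for the sample S (all values here are arbitrary assumptions)
set.seed(1)
n <- 500
S <- data.frame(
  A = rbinom(n, 1, 0.5),
  B = rnorm(n),
  C = rnorm(n),
  D = rbinom(n, 1, 0.3)
)
S$Y <- rbinom(n, 1, plogis(-0.5 + 0.8 * S$A + 0.4 * S$B - 0.3 * S$C + 0.5 * S$D))

# Fit the logistic model on the observed sample
fit <- glm(Y ~ A + B + C + D, data = S, family = binomial)

# Build S*: identical to S except that every row takes the chosen value of A
S_star <- S
S_star$A <- 1

# Predicted probability of Y = 1 for each row of S*, then the mean of those predictions
p_star <- predict(fit, newdata = S_star, type = "response")
est_support <- mean(p_star)
est_support
```

The single number `est_support` is the point estimate whose uncertainty is in question.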

My question is: how should I quantify the uncertainty of that estimate? I can think of three ways, but am not sure which one (if any) makes sense:

- Resample from the predicted probabilities of the model and bootstrap a confidence interval for the relevant mean that way.
- Calculate a confidence interval for the prediction made on each observation (the probability that respondent 1 registers support, etc.) like here, and then create a confidence interval for mean support by taking the means of the upr and lwr values of the individual predictions.
- Resample from *S** to fit a large number of models, generate predictions on each model, and then bootstrap a confidence interval for the relevant mean from these predictions.
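
To make the third option concrete, here is a rough R sketch of a refit-and-predict bootstrap. One deliberate difference from the wording above: it resamples rows of the original sample *S* rather than *S**, because *A* is constant in *S** and its coefficient could not be estimated there. The simulated data, the helper `mean_support`, and the choice of 500 replicates and a 95% percentile interval are assumptions made for illustration.

```r
# Simulated stand-in for S (arbitrary assumed values)
set.seed(2)
n <- 300
S <- data.frame(A = rbinom(n, 1, 0.5), B = rnorm(n), C = rnorm(n), D = rbinom(n, 1, 0.3))
S$Y <- rbinom(n, 1, plogis(-0.5 + 0.8 * S$A + 0.4 * S$B - 0.3 * S$C + 0.5 * S$D))

# Hypothetical helper: fit the model on `dat`, set A to `a` for every row,
# and return the mean predicted probability of Y = 1
mean_support <- function(dat, a) {
  fit <- glm(Y ~ A + B + C + D, data = dat, family = binomial)
  dat$A <- a
  mean(predict(fit, newdata = dat, type = "response"))
}

# Resample rows of S with replacement, refit, and recompute the mean each time
n_boot <- 500
boot_est <- replicate(n_boot, {
  idx <- sample(nrow(S), replace = TRUE)
  mean_support(S[idx, ], a = 1)
})

# Point estimate on the full sample and a 95% percentile interval
point_est <- mean_support(S, a = 1)
ci_95 <- quantile(boot_est, probs = c(0.025, 0.975))
point_est
ci_95
```

Because the model is refitted in every replicate, the spread of `boot_est` reflects uncertainty in the estimated coefficients as well as in the covariate mix of the sample.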

Any advice here would be greatly appreciated.

If I am understanding you correctly, you want to create a confidence interval around the proportion of observations that would have resulted in $Y = 1$ given $A = a$. Your logistic regression already provides us with $P(Y = 1 | A = a, B = b, C = c, D = d)$. By the law of total probability, you are justified in taking the mean of these predictions as a point estimate of $P(Y = 1|A = a)$. Because the proportion of observations where $Y = 1$ is logically equivalent to the probability that an observation will yield $Y = 1$, you could use the normal approximation computed on the predicted probabilities themselves to create a confidence interval.
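
Written out, the averaging step amounts to the following, where $n$ is the number of observations in the sample and $b_i, c_i, d_i$ are the observed covariate values of observation $i$ (notation introduced here for illustration, with hats marking fitted quantities):

$$\hat{P}(Y = 1 \mid A = a) = \frac{1}{n}\sum_{i=1}^{n} \hat{P}(Y = 1 \mid A = a, B = b_i, C = c_i, D = d_i)$$

Each term is a fitted probability from the logistic model, so the average weights every covariate combination by how often it occurs in the sample.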

Here is some R code that would do the trick:

```
# fake data for demonstration
# 50 samples drawn from a uniform distribution between 0 and 1
predicted_probs <- runif(n = 50, min = 0, max = 1)
# estimate of the average probability
global_prob_est <- mean(predicted_probs)
# standard error of the estimate: sd divided by the square root of n
global_prob_se <- sd(predicted_probs) / sqrt(length(predicted_probs))
# 90% CI using the 5th and 95th percentiles of a normal distribution
qnorm(p = c(0.05, 0.95), mean = global_prob_est, sd = global_prob_se)
```

Answered by David Telson on December 13, 2020
