Cross Validated Asked by kh_one on December 13, 2020
Say I have a binary response variable, Y, that I model using a logistic model with four predictors, A, B, C and D. To make matters concrete, imagine that Y = 1 designates a respondent registering support for something, and 0 an absence of support.
Having estimated the relevant parameters on some sample, S, I then want to see what proportion of 1s (i.e., support) I likely would have seen, had all observations in S taken on a particular value on A. Assume conditions for causal inference are satisfied for A, so that changing its value can be thought of as a (hypothetical) intervention.
So I create a "new" sample, S*, identical to S, save for each observation taking on the desired value on A. I then use the fitted model to "predict" the probability of Y = 1 for each observation in that sample. Taking the mean of those predictions, I get an estimated proportion of support under the relevant intervention.
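The procedure just described can be sketched in R roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the coefficient values are made up, and the object names (`S`, `S_star`, `fit`, `p_star`, `est_support`) are hypothetical rather than taken from any real analysis.

```r
# Simulated stand-in for the sample S; all values here are made up.
set.seed(1)
n <- 500
S <- data.frame(
  A = rbinom(n, 1, 0.5),
  B = rnorm(n),
  C = rnorm(n),
  D = rbinom(n, 1, 0.3)
)
S$Y <- rbinom(n, 1, plogis(-0.5 + 0.8 * S$A + 0.5 * S$B - 0.3 * S$C + 0.4 * S$D))

# Logistic model with the four predictors
fit <- glm(Y ~ A + B + C + D, family = binomial, data = S)

# S*: identical to S, except every observation takes the desired value on A
S_star <- S
S_star$A <- 1

# Predicted probability of Y = 1 for each observation in S*
p_star <- predict(fit, newdata = S_star, type = "response")

# Estimated proportion of support under the hypothetical intervention
est_support <- mean(p_star)
est_support
```

Only the column `A` is overwritten, so each observation keeps its own values of B, C and D when the prediction is made.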
My question is: how should I quantify the uncertainty of that estimate? I can think of three ways, but am not sure which one (if any) makes sense:
Any advice here would be greatly appreciated.
If I understand you correctly, you want to construct a confidence interval for the proportion of observations that would have resulted in $Y = 1$ given $A = a$. Your logistic regression already provides $P(Y = 1 | A = a, B = b, C = c, D = d)$. By the law of total probability, you are justified in taking the mean of these predictions as a point estimate of $P(Y = 1|A = a)$. Because the proportion of observations where $Y = 1$ is logically equivalent to the probability that an observation will yield $Y = 1$, you could use the normal approximation, computed on the predicted probabilities themselves, to create a confidence interval.
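Written out, the averaging step amounts to the expression below. The notation is introduced here for illustration and is not from the original post: $n$ is the number of observations in the sample, $(b_i, c_i, d_i)$ are the observed values of B, C and D for observation $i$, and hats denote quantities computed from the fitted model.

$$\hat{p}(a) = \frac{1}{n} \sum_{i=1}^{n} \hat{P}(Y = 1 \mid A = a, B = b_i, C = c_i, D = d_i)$$

That is, the other predictors are averaged over their empirical distribution in the sample while $A$ is held at $a$.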
Here is some R code that would do the trick:
# fake data for demonstration
# 50 samples drawn from a uniform distribution between 0 and 1
predicted_probs <- runif(n = 50, min = 0, max = 1)
# estimate of the average probability
global_prob_est <- mean(predicted_probs)
# standard error of the estimate
global_prob_se <- sd(predicted_probs) / sqrt(length(predicted_probs))
# 90% CI using the 5th and 95th percentiles of a normal distribution.
qnorm(p = c(0.05, 0.95), mean = global_prob_est, sd = global_prob_se)
Answered by David Telson on December 13, 2020