Causal estimates have high correlation with naive estimates - what may this imply?

Question

In an observational study, suppose that individuals can choose from $N$ different treatments, and that the same confounders affect each treatment and the outcome. The naive probability for treatment $i$, $\frac{\text{successful outcomes}_i}{\text{cases of treatment}_i}$, is just a correlation and has no causal interpretation.
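As a minimal sketch of that naive estimate, the per-treatment ratio is just a grouped mean. The data format here, hypothetical `(treatment, outcome)` pairs with outcome coded 1 for success, is an assumption:

```python
from collections import defaultdict


def naive_success_rates(records):
    """Per-treatment success rate: successes_i / cases_i.

    `records` is an iterable of (treatment_label, outcome) pairs, where
    outcome is 1 for a successful outcome and 0 otherwise (an assumed format).
    """
    cases = defaultdict(int)
    successes = defaultdict(int)
    for treatment, outcome in records:
        cases[treatment] += 1
        successes[treatment] += outcome
    return {t: successes[t] / cases[t] for t in cases}


# Made-up toy data: treatment "A" succeeds in 2 of 4 cases, "B" in 3 of 4.
records = [("A", 1), ("A", 0), ("A", 1), ("A", 0),
           ("B", 1), ("B", 1), ("B", 0), ("B", 1)]
print(naive_success_rates(records))  # {'A': 0.5, 'B': 0.75}
```

Nothing in this ratio conditions on the confounders, which is why it carries no causal reading on its own.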

Given further assumptions, I run a causal inference model on each treatment and get a measure of effect for that treatment (I'm using a logistic model so I have $N$ coefficients, which I interpret causally).
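For reference, one common way to set up a confounder-adjusted logistic model of this kind (treatment indicators against a baseline, plus the confounders as covariates) can be sketched with NumPy alone. The simulated data, with three treatments and one binary confounder `u`, are assumptions for illustration, and the hand-written Newton-Raphson (IRLS) fit stands in for whatever routine is actually used:

```python
import numpy as np


def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # IRLS weights
        grad = X.T @ (y - p)                  # score vector
        hess = X.T @ (X * w[:, None])         # Fisher information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
n = 20_000
u = rng.binomial(1, 0.5, n)  # binary confounder (assumed)
# u shifts treatment choice: with u = 1, treatment 2 is the most likely pick.
choice_probs = np.where(u[:, None] == 1, [0.2, 0.3, 0.5], [0.5, 0.3, 0.2])
cum = choice_probs.cumsum(axis=1)
t = (rng.random(n)[:, None] > cum[:, :2]).sum(axis=1)  # treatment index 0, 1 or 2

# Design matrix: intercept, indicators for treatments 1 and 2 (treatment 0 is
# the baseline), and the confounder.
X = np.column_stack([np.ones(n), t == 1, t == 2, u]).astype(float)
true_beta = np.array([-1.0, 0.5, 1.0, 1.0])
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ true_beta)))

adjusted = fit_logistic(X, y)        # treatment coefficients adjusted for u
crude = fit_logistic(X[:, :3], y)    # same model without the confounder
print("adjusted:", adjusted.round(2))
print("crude:   ", crude.round(2))
```

In this simulation the crude coefficient for treatment 2 comes out larger than the adjusted one, because `u` both pushes people toward treatment 2 and raises their chance of success.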

However, the correlation between the naive probability estimates and the causal coefficients is quite high and positive. What does this mean? Some of my thoughts:

1) There is little confounding bias. I think it is true that the less confounding bias there is, the more correlated the two sets of estimates should be. Given that, how can I measure the confounding bias?

2) Hidden confounder. Maybe I am missing a confounder, though I don't think so given my problem.

3) Misspecified causal model. Maybe my (simple) causal model is just "window-dressing" and is not really giving me better estimates. Is there any way to show this?
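On point 1, one simple diagnostic (a swapped-in technique, not something from the question) is the change-in-estimate comparison: put the crude odds ratio for a treatment against the reference next to the confounder-adjusted one and look at the gap. With a discrete confounder, the Mantel-Haenszel pooled odds ratio gives the adjusted value in closed form. The counts below are invented so that the treatment has no effect within either confounder stratum:

```python
import math


def crude_odds_ratio(strata):
    """Odds ratio from the 2x2 table collapsed over all confounder strata."""
    a = sum(s[0] for s in strata)
    b = sum(s[1] for s in strata)
    c = sum(s[2] for s in strata)
    d = sum(s[3] for s in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel odds ratio pooled across confounder strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Each stratum (one level of the confounder) is (a, b, c, d):
# a = treated successes, b = treated failures,
# c = reference successes, d = reference failures. Counts are made up.
strata = [
    (20, 80, 60, 240),    # confounder absent: odds ratio 1 in this stratum
    (150, 150, 50, 50),   # confounder present: odds ratio 1 in this stratum
]
crude = crude_odds_ratio(strata)
adjusted = mantel_haenszel_odds_ratio(strata)
print(f"crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
print(f"change in log OR = {math.log(crude) - math.log(adjusted):.2f}")
```

Here the crude odds ratio is about 1.95 while the adjusted one is exactly 1, so the entire crude association is confounding. Repeating this for each treatment against the same reference gives one gap per treatment, which speaks to confounding more directly than a correlation across treatments.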

I'm looking for advice on what I could do to help interpret my results, and possibly narrow down where I should be looking.

Ed Rigdon · Answer

It could mean that the impact of the confounding is all in one direction. You indicate that all relations share the same confounders, so why shouldn't the direction of bias be consistent across relations?
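This can be illustrated with a small deterministic calculation under assumed numbers: five treatments with true conditional log-odds effects `beta`, a binary confounder that adds `gamma` to the log-odds of success, and a confounder prevalence `prev` that rises with the treatment effect. The naive log-odds contrasts against treatment 0 all come out inflated (by roughly 0.4 to 1.4 here), yet they remain almost perfectly correlated with the true conditional contrasts:

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


alpha, gamma = -1.0, 2.0                 # assumed baseline log-odds and confounder effect
beta = [-1.0, -0.5, 0.0, 0.5, 1.0]       # assumed true (conditional) treatment effects
prev = [0.1, 0.3, 0.5, 0.7, 0.9]         # assumed confounder prevalence per treatment

# Naive success probability per treatment: average over the confounder,
# weighted by how common it is among people who chose that treatment.
naive = [(1 - q) * expit(alpha + b) + q * expit(alpha + b + gamma)
         for b, q in zip(beta, prev)]
naive_contrast = [logit(p) - logit(naive[0]) for p in naive]
true_contrast = [b - beta[0] for b in beta]

bias = [n - t for n, t in zip(naive_contrast, true_contrast)]
print("bias vs treatment 0:", [round(x, 2) for x in bias])
print("correlation:", round(pearson(naive_contrast, true_contrast), 4))
```

Because the confounder is most common where the true effect is largest, every contrast is pushed upward. That widens the spread without reordering the treatments, so a high correlation coexists with sizable bias.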

Answered by Ed Rigdon on December 1, 2021

Marcello · Answer

To be clear, you ran a logistic regression on the data and got positive results showing that the treatment was associated with the outcome, right?
Let's leave the causation part aside, since statistics cannot show causation; that requires reasoning.

It is unclear whether you ran a multivariable or a univariable model. In any case, as you know, the coefficients from a logistic regression represent the change in the log odds (or logit) for a one-unit change in the predictor or, in your case, the difference in the logit between each treatment and the one you chose as the baseline.

Therefore, I would expect that there is a strong correlation between the probability of having the outcome with each treatment and the logit of developing the outcome with each treatment. Wouldn't you?

Just to illustrate, here is a graph of logit vs. probability [graph not reproduced]. In your case, the coefficients would be on the y-axis and would represent the change (on that same scale) between one treatment and another.
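The same relationship can be checked numerically. Because the logit is strictly increasing in the probability, the two always rank treatments identically (rank correlation 1), and their linear correlation stays high over moderate probabilities. The probability grid below is arbitrary:

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def ranks(values):
    """Rank of each value (0 = smallest), assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order):
        r[i] = rank
    return r


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


probs = [0.05, 0.1, 0.2, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95]
logits = [logit(p) for p in probs]
print(ranks(probs) == ranks(logits))    # True: same ordering
print(round(pearson(probs, logits), 3))
```

For a model containing only treatment indicators, each coefficient is exactly the logit of that treatment's observed success rate minus the logit of the baseline's, so a high correlation with the naive probabilities is built in.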

In my view, that tells you nothing about confounding. If you have true confounding that you did not include in your model, then that change in probability and the change in the log odds may not be because of the treatment at all.

If you did run a multivariable model, then the covariates you entered did not explain away the effect of the treatment. But you don't need to test the correlation to see that.

Statistics is not going to help you design a better causal model. You should try to draw a causal diagram to help you think about the problem.

In summary, the logit should be correlated with the probability: as the probability changes, so does the logit. If there are confounders and you add them to the model, then the correlation will go down. A high correlation could mean either a true strong relationship between treatment and outcome OR a confounder you did not add to the model.
