Cross Validated Asked by MoneyBall on November 9, 2021
I am working through some problems in All of Statistics by Wasserman, and I am not sure how to tackle this one.
Suppose you are given data $(X_1, Y_1), \dots, (X_n, Y_n)$ from an observational study where $X_i \in \{0, 1\}$ and $Y_i \in \{0, 1\}$. Although it is not possible to estimate the causal effect $\theta$, it is possible to put bounds on $\theta$. Find the upper and lower bounds on $\theta$ that can be consistently estimated from the data.
The hint says to use $\mathbb{E}(C_1) = \mathbb{E}(C_1 | X=1) \mathbb{P}(X=1) + \mathbb{E}(C_1 | X=0) \mathbb{P}(X=0)$. Then,
\begin{align*}
\theta &= \mathbb{E}(C_1) - \mathbb{E}(C_0) \\
&= \mathbb{E}(C_1 | X=1)\mathbb{P}(X=1) + \mathbb{E}(C_1 | X=0)\mathbb{P}(X=0) - \mathbb{E}(C_0 | X=1)\mathbb{P}(X=1) - \mathbb{E}(C_0 | X=0)\mathbb{P}(X=0)
\end{align*}
Not sure where I go from here…
EDIT:
Here's a table that summarizes the setup (starred entries are the unobserved counterfactuals):
\begin{array}{|c|c|c|c|}
\hline
X & Y & C_0 & C_1 \\
\hline
0 & 0 & 0 & 0^* \\
\hline
0 & 0 & 0 & 0^* \\
\hline
0 & 0 & 0 & 0^* \\
\hline
1 & 1 & 1^* & 1 \\
\hline
1 & 1 & 1^* & 1 \\
\hline
1 & 1 & 1^* & 1 \\
\hline
\end{array}
I'm impressed that a stats book has causality in it. This is Problem 16.6.3. The author defines $\theta$ as the average causal effect, or average treatment effect. In your toy data example, you could actually compute $\theta$ as $$\theta=\frac{0+0+0+1+1+1}{6}-\frac{0+0+0+1+1+1}{6}=0,$$ as the author does in Example 16.2. But in your actual problem, you don't have this data.
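As a quick check of that arithmetic, here is a small Python sketch. The lists simply transcribe the toy table (starred entries included, which is only possible because the toy example pretends they are known); the helper `mean` and the name `naive_diff` are my own, not the book's. It also prints the naive difference of observed group means, to show that it is not the same thing as $\theta$.

```python
# Toy table from the question. The starred counterfactual entries are
# included only because the toy example pretends they are known.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1]
c0 = [0, 0, 0, 1, 1, 1]
c1 = [0, 0, 0, 1, 1, 1]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Average causal effect: needs BOTH potential outcomes for every unit.
theta = mean(c1) - mean(c0)

# Naive comparison of observed outcomes between the two groups.
y_treated = [yi for xi, yi in zip(x, y) if xi == 1]
y_control = [yi for xi, yi in zip(x, y) if xi == 0]
naive_diff = mean(y_treated) - mean(y_control)

print(theta)       # 0.0
print(naive_diff)  # 1.0
```

With real observational data only `x` and `y` are available, so `theta` cannot be computed this way.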
Here's an idea for moving forward. Your expansion of $E(C_1)$ and $E(C_1)-E(C_0)$ is exactly right, so let's continue. First, note that every expectation you wrote down is non-negative; indeed, each lies in the interval $[0,1].$ For an upper bound, we can therefore drop the subtracted term $E(C_0|X=1)P(X=1)$, which is non-negative:
\begin{align*}
E(C_1)-E(C_0) &=E(C_1|X=1)P(X=1)+E(C_1|X=0)P(X=0)\\
&\quad-E(C_0|X=1)P(X=1)-E(C_0|X=0)P(X=0)\\
&\le E(C_1|X=1)P(X=1)+E(C_1|X=0)P(X=0)-E(C_0|X=0)P(X=0).
\end{align*}
We're not done yet, because this last expression still contains the counterfactual term $E(C_1|X=0)P(X=0),$ which we can't get from the data. The first and last terms we can get from the data, since $C_1=Y$ when $X=1$ and $C_0=Y$ when $X=0.$ So how big can the counterfactual term be? It can't be bigger than if all the missing entries were $1$'s, that is, $E(C_1|X=0)\le 1,$ so we can say that $$E(C_1)-E(C_0)\le E(C_1|X=1)P(X=1)+P(X=0)-E(C_0|X=0)P(X=0),$$ and every term on the right-hand side can be estimated from the data.
Similarly, for a lower bound, we drop the non-negative term $E(C_1|X=0)P(X=0)$ and then use $E(C_0|X=1)\le 1$:
\begin{align*}
E(C_1)-E(C_0) &\ge E(C_1|X=1)P(X=1)-E(C_0|X=1)P(X=1)-E(C_0|X=0)P(X=0)\\
&\ge E(C_1|X=1)P(X=1)-P(X=1)-E(C_0|X=0)P(X=0).
\end{align*}
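To make the two bounds concrete, here is a minimal Python sketch of plug-in estimates, assuming binary lists `x` and `y` and replacing each probability by its sample frequency. The function name `causal_effect_bounds` is just something I made up for illustration. It relies on $C_1 = Y$ when $X=1$ and $C_0 = Y$ when $X=0$, so that $E(C_1|X=1)P(X=1) = P(Y=1, X=1)$ and $E(C_0|X=0)P(X=0) = P(Y=1, X=0)$.

```python
from typing import Sequence, Tuple


def causal_effect_bounds(x: Sequence[int], y: Sequence[int]) -> Tuple[float, float]:
    """Plug-in estimates of the (lower, upper) bounds on theta for binary x, y."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    # Sample frequencies: E(Y | X=a) P(X=a) equals P(Y=1, X=a) for binary Y.
    p_y1_x1 = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1) / n
    p_y1_x0 = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1) / n
    p_x1 = sum(1 for xi in x if xi == 1) / n
    p_x0 = 1.0 - p_x1

    # Lower bound: unobserved E(C_1 | X=0) set to 0, E(C_0 | X=1) set to 1.
    lower = p_y1_x1 - p_x1 - p_y1_x0
    # Upper bound: unobserved E(C_1 | X=0) set to 1, E(C_0 | X=1) set to 0.
    upper = p_y1_x1 + p_x0 - p_y1_x0
    return lower, upper


# Observed columns of the toy table from the question.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 0, 1, 1, 1]
lower, upper = causal_effect_bounds(x, y)
print(lower, upper)  # 0.0 1.0
```

On the toy table this gives the interval $[0, 1]$, which does contain the true $\theta = 0$. Note that the upper bound minus the lower bound is $P(X=0)+P(X=1)=1$ no matter what the data look like, so these bounds always have width one.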
Answered by Adrian Keister on November 9, 2021