Cross Validated Asked on November 12, 2021
If I run the margins command on a dichotomous variable, what does the output tell me exactly? On a continuous variable I understand that it tells me the average value for a given category, but if I run it on a dichotomous outcome variable and a categorical independent variable, what does it tell me?
Is there a cut-off at 0.5 so if it’s 0.25 that means that the average value at that level of the categorical variable is closer to 0 than 1, so the result (if significant) says that the effect is significantly lower?
As an example say I’m looking at cancer sizes and seizures. Cancer sizes is a 4 level categorical ordinal variable.
I run a logistic regression for seizure no/yes and cancer sizes. I then run the margins command and see that size 2 has a "margin" of 0.25. Does that mean that people are more likely to NOT (no being 0) experience seizures at this level?
And also how is this different than running a logistic regression on each level of the categorical variable dichotomized to dummy variables?
It is hard to answer this precisely without seeing what you actually typed in Stata (both your logit specification and your margins command, and do note the correct spelling).
From the verbal description, it sounds like you are
This model says that you can expect 1 in 4 people with a cancer size of 2 to have a seizure. The predictions are on a scale of [0,1], so 1 in 4 is 0.25.
Here's a reproducible example demonstrating this calculation, where we model the probability of a low-weight birth given the quartile of the mother's age:
. webuse lbw, clear
(Hosmer & Lemeshow data)
. xtile age_qrt = age, nq(4)
. table age_qrt, c(min age max age)
----------------------------------
4         |
quantiles |
of age    |   min(age)    max(age)
----------+-----------------------
        1 |         14          19
        2 |         20          23
        3 |         24          26
        4 |         27          45
----------------------------------
. logit low i.age_qrt, nolog
Logistic regression Number of obs = 189
LR chi2(3) = 5.50
Prob > chi2 = 0.1383
Log likelihood = -114.58352 Pseudo R2 = 0.0235
------------------------------------------------------------------------------
low | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_qrt |
2 | .2876821 .4149967 0.69 0.488 -.5256964 1.101061
3 | .5389965 .45687 1.18 0.238 -.3564522 1.434445
4 | -.5382246 .4822682 -1.12 0.264 -1.483453 .4070036
|
_cons | -.8754687 .3073181 -2.85 0.004 -1.477801 -.2731362
------------------------------------------------------------------------------
. margins age_qrt
Adjusted predictions Number of obs = 189
Model VCE : OIM
Expression : Pr(low), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_qrt |
1 | .2941176 .0638031 4.61 0.000 .1690659 .4191694
2 | .3571429 .0640301 5.58 0.000 .2316462 .4826396
3 | .4166667 .0821678 5.07 0.000 .2556208 .5777125
4 | .1956522 .0584905 3.35 0.001 .0810129 .3102915
------------------------------------------------------------------------------
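With a single categorical regressor, each margin above can be recovered directly from the logit coefficients by applying the inverse logit to the linear predictor. The worked lines below are only a check on the arithmetic; the symbols $\hat\beta_0$ (the constant) and $\hat\beta_k$ (the coefficient on level $k$, with $\hat\beta_1 = 0$ for the base level) are notation introduced here, not part of the Stata output:

$$\hat p(x=k) = \frac{1}{1+\exp\left[-(\hat\beta_0+\hat\beta_k)\right]}$$
$$\hat p(x=1) = \frac{1}{1+\exp(0.8754687)} \approx 0.2941, \qquad \hat p(x=2) = \frac{1}{1+\exp(0.8754687-0.2876821)} \approx 0.3571$$

So a margin is a predicted probability, not a test against a 0.5 threshold.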
. /* margins by hand */
. forvalues v=1/4 {
2. replace age_qrt=`v'
3. predict double phat`v', pr
4. }
(138 real changes made)
(189 real changes made)
(189 real changes made)
(189 real changes made)
. sum phat*
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
phat1 | 189 .2941176 0 .2941176 .2941176
phat2 | 189 .3571429 0 .3571429 .3571429
phat3 | 189 .4166667 5.57e-17 .4166667 .4166667
phat4 | 189 .1956522 2.78e-17 .1956522 .1956522
Here the highest-risk group is the third age quartile, though the differences are probably not significant. Stata is calculating
$$AM_k = \frac{1}{N}\sum_{i=1}^N \left[ \hat p(x=k) \right].$$
What you are describing sounds more like marginal effects, which involve comparing how probabilities change as you alter cancer size. These can be calculated like this:
margins, dydx(age_qrt)
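/* margins, dydx() by hand */
capture drop phat1 phat3             /* clear leftovers from the earlier loop, if any */
replace age_qrt = 1                  /* set everyone to the base level */
predict phat1                        /* predicted Pr(low) at quartile 1 */
replace age_qrt = 3                  /* set everyone to quartile 3 */
predict phat3                        /* predicted Pr(low) at quartile 3 */
gen finite_diff3vs1 = phat3 - phat1  /* discrete change, 3 vs 1 */
sum phat3 phat1 finite_diff3vs1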
The output of the first command is:
. margins, dydx(age_qrt)
Conditional marginal effects Number of obs = 189
Model VCE : OIM
Expression : Pr(low), predict()
dy/dx w.r.t. : 2.age_qrt 3.age_qrt 4.age_qrt
------------------------------------------------------------------------------
| Delta-method
| dy/dx Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_qrt |
2 | .0630252 .0903919 0.70 0.486 -.1141396 .24019
3 | .122549 .1040306 1.18 0.239 -.0813473 .3264453
4 | -.0984655 .0865562 -1.14 0.255 -.2681125 .0711815
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.
This says that the expected change in probability associated with going from the lowest to the second-highest age quartile (from 1 to 3 of 4) is 0.122549, so giving birth to a low-weight baby becomes somewhat more likely. This is a 12 percentage point increase, which is a 42% relative increase over the baseline probability of 0.294. The note explains that this is a finite difference, and not really a derivative:
$$AME_k = \frac{1}{N}\sum_{i=1}^N \left[ \hat p(x=k) - \hat p(x=\text{baseline}) \right],$$
where $\hat p(\cdot)$ is the predicted probability from the logit model.
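For readers who want to check the numbers outside Stata, here is a minimal Python sketch of both calculations, using only the coefficients and the number of observations printed in the logit output above. It is not how Stata implements margins internally, and the function names (`invlogit`, `adjusted_prediction`, `discrete_change`) are made up for this illustration.

```python
import math

# Estimates copied from the logit output above; quartile 1 is the base level.
CONS = -0.8754687
COEF = {1: 0.0, 2: 0.2876821, 3: 0.5389965, 4: -0.5382246}
N_OBS = 189  # number of observations reported by Stata


def invlogit(xb):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-xb))


def adjusted_prediction(level, n_obs=N_OBS):
    """Set every observation to `level`, predict, then average.

    With age_qrt as the only regressor, every observation gets the same
    prediction, which is why the by-hand Std. Dev. in Stata is zero.
    """
    preds = [invlogit(CONS + COEF[level]) for _ in range(n_obs)]
    return sum(preds) / n_obs


def discrete_change(level, base=1):
    """Difference in average predicted probability versus the base level."""
    return adjusted_prediction(level) - adjusted_prediction(base)


margins = {k: adjusted_prediction(k) for k in COEF}
changes = {k: discrete_change(k) for k in COEF if k != 1}
relative_3_vs_1 = changes[3] / margins[1]  # relative change, not percentage points

for k in sorted(margins):
    print(f"margin at quartile {k}: {margins[k]:.4f}")
for k in sorted(changes):
    print(f"change {k} vs 1: {changes[k]:+.4f}")
print(f"relative change 3 vs 1: {relative_3_vs_1:.1%}")
```

The distinction in the last line matters when reporting: the discrete change is in percentage points, while dividing by the base-level margin gives the relative change.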
Answered by dimitriy on November 12, 2021