Cross Validated Asked by Jonas8 on January 9, 2021
My data consists of 1 dependent variable and 4 independent variables; all the IVs are continuous.
Some of my data is presented below to give an idea of how it is structured. The dependent variable represents the number of complaints filed against a school by parents during 1 year. Notice that the count of complaints is divided into two columns: one if the child whose parents filed the complaint is a boy (M) and one if the child is a girl (F).
The data consists of 170 schools. For 29 of these schools the count for both boys and girls is zero (see row 2, school B, as an example). For 11 schools the count for either boys or girls is zero while the other is >0 (see row 3, school C, as an example).
School | Count, M | Count, F | X1 | X2 | X3 | X4 |
---|---|---|---|---|---|---|
A | 6 | 3 | 43 | 31 | 201 | 44 |
B | 0 | 0 | 35 | 33 | 176 | 34 |
C | 0 | 7 | 33 | 44 | 163 | 42 |
D | 0 | 0 | 38 | 41 | 155 | 33 |
E | 4 | 5 | 41 | 39 | 161 | 38 |
I want to model the data in such a way that enables me to:
My first thought was to model this using a Poisson/negative binomial model by summing all the complaints for every school, which would give me a total count per school (complaints concerning boys + girls). The data seems to be overdispersed, so I ended up with a negative binomial model. I also experimented with a zero-inflated model, but the number of zeros seems to be too low for it to be needed. I did this in R using the packages countreg and MASS.
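For readers following along, the pooled approach described above could look roughly like the sketch below. It is not the code actually used in the question. The five rows come from the table above; the column names and the helper `fit_total_nb` are made up for illustration.

```r
# The five example rows from the table above (column names are my own choice).
wide <- data.frame(
  school  = c("A", "B", "C", "D", "E"),
  count_M = c(6, 0, 0, 0, 4),
  count_F = c(3, 0, 7, 0, 5),
  X1 = c(43, 35, 33, 38, 41),
  X2 = c(31, 33, 44, 41, 39),
  X3 = c(201, 176, 163, 155, 161),
  X4 = c(44, 34, 42, 33, 38)
)

# Total complaints per school = complaints concerning boys + girls.
wide$total <- wide$count_M + wide$count_F

# Hypothetical helper: negative binomial regression on the school totals.
# It is not called here, because five rows cannot support four predictors;
# it is meant to be run on the full 170-school data set.
fit_total_nb <- function(school_data) {
  MASS::glm.nb(total ~ X1 + X2 + X3 + X4, data = school_data)
}
```

Summing the two columns answers how X1-X4 relate to the total count, but it discards the boy/girl split, which is why this model cannot address the second question.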
However, I don’t know how to handle the part about being able to compare the effect X1-X4 has for boys and girls.
Therefore, my question is: How can I model my data to be able to answer my second research question (is there a difference in effect from X1-X4 on the number of complaints that concerns boys vs those that concern girls)?
One way suggested by a colleague of mine was to model the data using two separate regression models, one for the complaints concerning boys and one for the complaints concerning girls. I don’t believe this is the way to do it though since I would need a p-value to be able to compare the effects X1-X4 has for boys and girls, right? Running two separate models would not produce this.
I thought of modeling the data by using some multivariate method, something like (Count M, Count F) ~ X1 + X2 + X3 + X4, maybe using a multivariate Poisson/negative binomial regression (if that exists? hence the title).
Ignoring the fact that my dependent variable is a count and not normal, using a multivariate regression model, would that make it possible to compare the effect between boys and girls? I have very little experience using multivariate analysis though and I have no idea if it is possible in this situation or even a good idea, so I’m stuck at the moment.
Any help or guidance on how to model my data using R would be much appreciated!
You need to do the following:
1 - Reshape your data so as to have two rows for each school, one with the count for boys and one with the count for girls, plus a new variable for the sex of the child. All other columns (X1 to X4) keep the same values in both rows.
2 - Fit a mixed-effects Poisson or negative binomial regression with a random intercept for school. In that model the interactions between sex and each Xi tell you what I believe you want: whether the effect of Xi differs between boys and girls.
As an additional comment, if you have different numbers of boys and girls within a school and you know those numbers, it might be appropriate to add them (on the log scale) as offsets, although if they are similar this will not matter.
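A minimal sketch of those two steps in R, under some assumptions: the five rows are the ones shown in the question, the column names are my own, and `fit_sex_interaction` is a hypothetical helper that uses `lme4::glmer` as one possible mixed-model fitter (other packages would work too). The model is only defined here, not fitted, because five schools are far too few for it.

```r
# The five example rows from the question, in wide format
# (column names are my own choice).
wide <- data.frame(
  school  = c("A", "B", "C", "D", "E"),
  count_M = c(6, 0, 0, 0, 4),
  count_F = c(3, 0, 7, 0, 5),
  X1 = c(43, 35, 33, 38, 41),
  X2 = c(31, 33, 44, 41, 39),
  X3 = c(201, 176, 163, 155, 161),
  X4 = c(44, 34, 42, 33, 38)
)

# Step 1: stack boys and girls so that each school contributes two rows.
covars <- wide[, c("school", "X1", "X2", "X3", "X4")]
long <- rbind(
  cbind(covars, sex = "M", count = wide$count_M),
  cbind(covars, sex = "F", count = wide$count_F)
)
long$sex    <- factor(long$sex, levels = c("F", "M"))  # F is the reference level
long$school <- factor(long$school)
long <- long[order(long$school, long$sex), ]
rownames(long) <- NULL

# Step 2 (hypothetical helper, to be run on the full data set):
# Poisson mixed model with a random intercept per school.
# The sex:X1 ... sex:X4 terms estimate how each slope differs for boys
# relative to girls. lme4::glmer.nb() takes the same formula if a
# negative binomial model is needed.
# If enrolment counts were available in a column (say n_children, which is
# NOT in the original data), "+ offset(log(n_children))" could be added.
fit_sex_interaction <- function(long_data) {
  lme4::glmer(
    count ~ sex * (X1 + X2 + X3 + X4) + (1 | school),
    data = long_data, family = poisson
  )
}
```

On the real data this would be called as `fit <- fit_sex_interaction(long)`, and `summary(fit)` would then give a test for each interaction coefficient. With predictors on scales as different as X1 and X3, centring and scaling them first may help convergence.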
Answered by mdewey on January 9, 2021
It looks like you need to add an interaction term to assess whether the effect of the IVs on the DV differs by gender, something like Count ~ (X1 + X2 + X3 + X4)*GENDER
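To see what that formula asks for, base R can expand it without any data. This is only an illustration of the formula syntax; `Count` and `GENDER` are the placeholder names used above, and it assumes the data have already been put in long format with one row per school and gender.

```r
# Expand the formula into its individual terms (no data needed).
f <- Count ~ (X1 + X2 + X3 + X4) * GENDER
term_labels <- attr(terms(f), "term.labels")
print(term_labels)
# Nine terms in total: the four main effects, GENDER itself,
# and the four X:GENDER interactions.
```

The coefficients on the X:GENDER terms are the differences in slope between the two genders, so their tests (or a likelihood-ratio comparison against the model without the interactions) address whether the effects differ.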
Answered by N7N9 on January 9, 2021