Cross Validated Asked by kabindra shahi on November 2, 2021
I have data in which the response variable (attitude towards tourism) is a score ranging from -10 to +10, calculated by summing the scores of a few tourism-related questions. The independent variables are continuous, categorical (3-4 categories), and binary (yes/no). I could categorize the response (attitude) into Negative, Neutral, and Positive and run an ordinal logistic regression, though I am not sure even this is the correct way. However, I have come across literature relevant to my study that used multiple linear regression even when the dependent variable was categorical (Negative, Neutral, and Positive), and in some cases when the dependent variable was a scale score like mine. Can I use multiple linear regression in this case? If not, what about ordinal logistic regression?
First off, +1 to Robert's answer.
In particular, he makes a great point that you should not bin your data into "negative"/"neutral"/"positive" categories, because that just needlessly loses too much information. For instance, if you bin $[-10,-4]$ as "negative", $[-3,3]$ as "neutral" and $[4,10]$ as "positive", then this binning treats someone with a score of $-10$ exactly the same as someone with a score of $-4$ - but presumably the first respondent was quite a bit more negative than the second one. So we have lost information. Don't do it.
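To make the information loss concrete, here is a minimal sketch in Python. The function name `bin_attitude` is hypothetical, and the cut points are simply the ones from the example above, not a recommended coding.

```python
def bin_attitude(score):
    """Collapse a -10..+10 score into three labels (cut points from the example above)."""
    if score <= -4:
        return "negative"
    if score >= 4:
        return "positive"
    return "neutral"


scores = [-10, -4, -3, 0, 3, 4, 10]
labels = [bin_attitude(s) for s in scores]

# -10 and -4 share a bin although they are 6 points apart,
# while -4 and -3 are separated although they differ by only 1 point.
for s, lab in zip(scores, labels):
    print(f"{s:>4} -> {lab}")

# 21 possible score values shrink to 3 labels.
n_distinct_before = len(set(range(-10, 11)))
n_distinct_after = len({bin_attitude(s) for s in range(-10, 11)})
print(n_distinct_before, "values ->", n_distinct_after, "labels")
```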
As Robert writes, with 21 ordered categories you can most likely use Ordinary Least Squares (OLS) rather than ordered logistic regression, because the difference between the two approaches will be very small for such a large number of categories. In any case, this difference between models will be completely dominated by the fact that your measurements only imperfectly capture the underlying construct you are really interested in.
I would not put too much emphasis on the normality of residuals, whether assessed graphically or by formal tests. Normality is nice to have, but OLS parameter estimates are quite robust to departures from it. I would rather look at diagnostic plots of residuals against each of your predictors: if there is a pattern in such plots, it suggests some unmodeled nonlinearity. (Note that revising your model in response to such diagnostics will bias your p-values, so if p-values are what you are interested in, then don't overdo it.)
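As a rough illustration of what such a pattern looks like, the sketch below fits a straight-line OLS model to simulated data whose true relationship is curved, then summarizes the residuals within bands of the predictor. Everything here is an assumption made for illustration: the data are simulated, and the banded means are just a numeric stand-in for an actual residual-versus-predictor plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (an assumption for illustration): the true effect of x is curved.
n = 500
x = rng.uniform(-2, 2, size=n)
y = 1.0 + 0.5 * x + 1.5 * x**2 + rng.normal(scale=1.0, size=n)

# Fit a straight-line OLS model that (wrongly) omits the squared term.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Numeric stand-in for a residual-vs-predictor plot:
# the mean residual within five equal-width bands of x.
edges = np.linspace(-2, 2, 6)
band = np.digitize(x, edges[1:-1])  # band index 0..4
band_means = np.array([resid[band == b].mean() for b in range(5)])
print(np.round(band_means, 2))
```

If the model were adequate, the band means would all hover around zero. Here the outer bands come out positive and the middle band negative, which is the U-shaped pattern that points to a missing nonlinear term.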
If you are really interested in deciding between OLS and ordered logistic regression, and if you have enough data, then consider cross-validating both approaches and seeing which one yields the lower out-of-sample mean squared error. If the two errors are within one standard error of each other, then go with the simpler model, which here is the OLS one.
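The mechanics of that comparison can be sketched as follows, under some loud assumptions. The data are simulated, the helper names (`kfold_mse`, `make_ols`) are hypothetical, and the "more complex" candidate is a cubic OLS model used purely as a stand-in. An ordered logistic fit would plug into the same `fit`/`predict` interface, for example by predicting the probability-weighted mean score. The standard error here is taken from the fold-wise errors of the better-scoring model, which is one common convention.

```python
import numpy as np


def kfold_mse(fit, predict, X, y, k=5, seed=0):
    """Return the held-out mean squared error of each of k folds for one model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = fit(X[train_mask], y[train_mask])
        pred = predict(model, X[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return np.array(errors)


def make_ols(design):
    """Build fit/predict functions for OLS on the design matrix `design(X)`."""
    def fit(X, y):
        return np.linalg.lstsq(design(X), y, rcond=None)[0]

    def predict(beta, X):
        return design(X) @ beta

    return fit, predict


def linear(X):
    return np.column_stack([np.ones(len(X)), X])


def cubic(X):
    # Stand-in for "the more complex model"; not an ordered logistic regression.
    return np.column_stack([np.ones(len(X)), X, X**2, X**3])


# Simulated data (assumption): a rounded score clipped to -10..+10.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
latent = 2 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=2, size=300)
y = np.clip(np.round(latent), -10, 10)

# The same seed gives both models identical folds.
mse_simple = kfold_mse(*make_ols(linear), X, y)
mse_complex = kfold_mse(*make_ols(cubic), X, y)

# One-standard-error rule: keep the simpler model unless it is worse
# than the best model by more than one standard error.
best_errors = mse_simple if mse_simple.mean() <= mse_complex.mean() else mse_complex
se = best_errors.std(ddof=1) / np.sqrt(len(best_errors))
chosen = "simple" if mse_simple.mean() <= best_errors.mean() + se else "complex"
print(mse_simple.mean(), mse_complex.mean(), se, chosen)
```

Because the simulated relationship is linear, one would expect the rule to favour the simpler model here. With real data the outcome depends on the sample and on how the folds fall.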
Answered by Stephan Kolassa on November 2, 2021
With a response variable on an ordinal scale of -10 to +10, you have a valid reason for treating it as numeric and running a standard multivariable regression. If you intend to make inferences that rely on normality, you will need to inspect residual plots to assess whether the residuals are plausibly normally distributed.
This will likely be much better than categorizing the response into 3 levels, which would result in a lot of information loss.
Answered by Robert Long on November 2, 2021