Cross Validated Asked by Matthias on January 17, 2021
I am currently evaluating the results of a 72-question survey with response levels from “strongly disagree” to “strongly agree”. I would like to cluster the questions by response patterns using R (I’m using “clara”).
Here’s the rub: There is also a response option “N/A”, because some questions are not relevant for some of the respondents, so they couldn’t have an opinion on whether to agree or disagree with the premise of the question.
Currently, I have coded the agreement levels from -2 to 2, and “N/A” got a -3 just to see if everything works. It does, so this is not a coding question.
My question is: Do you know of a clever distance function that I could use in this situation to calculate more meaningful clusters? The goal would be to compare only the responses of those for whom the question is relevant. I don’t think I can just drop the “N/A” because that would give me points of different dimensions, so neither the Euclidean nor the taxicab metric would be happy.
EDIT 1: One possibility would be to apply a chi-squaresque metric that only compares the distributions of the desired responses, but that strikes me as too crude.
EDIT 2: Another possibility would be to adapt the Euclidean or taxicab metric with appropriate weights so that “proper” responses would be given higher consideration.
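One way to read the weighting idea in EDIT 2 is to give “N/A” components weight zero and rescale the remaining sum. As a rough sketch, assuming “N/A” is recoded as `NA` rather than -3, base R's `dist()` already behaves this way: components that are missing in either row are skipped, and the sum is scaled up in proportion to the number of components actually used. The toy matrix `responses` below is made up purely for illustration.

```r
# Hypothetical toy data: rows = questions, columns = respondents,
# agreement coded -2..2 and "N/A" coded as NA (an assumption, not the -3 coding).
responses <- rbind(
  q1 = c( 2,  1, NA, -1,  0),
  q2 = c( 1,  1,  2, NA,  0),
  q3 = c(-2, -1, NA,  2,  1)
)

# Taxicab distance between questions. For each pair, only respondents who
# answered both questions contribute; dist() then rescales the sum by
# (number of columns) / (number of columns used).
d_manhattan <- dist(responses, method = "manhattan")

print(d_manhattan)
```

Whether this rescaling is appropriate depends on how many respondents overlap for each pair of questions; pairs with very few shared respondents get a distance that rests on little data.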
PS: No, I cannot eliminate the N/As because I need to be able to calculate a “distance” between the response patterns of any two questions, so I can find out which groups of questions tended to be answered similarly throughout.
The following measure gave good results in practice. I have not checked its mathematical properties:
Let $x_1, x_2$ be two response vectors in $\mathbb{R}^N$, where $N$ is the number of responses (one component per respondent). Let $n$ be the number of components where both $x_1$ and $x_2$ are not equal to "N/A". Let $k$ be the number of such components with absolute difference at most 1 (say).
Then define the measure of similarity between $x_1$ and $x_2$ as $\frac{k}{n}$, so that $1-\frac{k}{n}$ appears to give a reasonably good "metric." (As I said, I haven't checked how far this violates the axioms for a metric.)
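A minimal sketch of the quantity $1-\frac{k}{n}$ in R, assuming “N/A” is coded as `NA` (not -3). The function name `agreement_dissim`, the `tol` argument, and the toy matrix `responses` are hypothetical and only serve the illustration.

```r
# 1 - k/n for two response vectors, where n counts components answered in
# both vectors and k counts those whose absolute difference is at most tol.
agreement_dissim <- function(x1, x2, tol = 1) {
  both <- !is.na(x1) & !is.na(x2)      # respondents who answered both questions
  n <- sum(both)
  if (n == 0) return(NA_real_)         # no overlap: the measure is undefined
  k <- sum(abs(x1[both] - x2[both]) <= tol)
  1 - k / n
}

# Hypothetical toy data: rows = questions, columns = respondents.
responses <- rbind(
  q1 = c( 2,  1, NA, -1,  0),
  q2 = c( 1,  1,  2, NA,  0),
  q3 = c(-2, -1, NA,  2,  1)
)

# Fill a square matrix with all pairwise values, then convert to a dist object.
nq <- nrow(responses)
d <- matrix(0, nq, nq,
            dimnames = list(rownames(responses), rownames(responses)))
for (i in seq_len(nq)) {
  for (j in seq_len(nq)) {
    d[i, j] <- agreement_dissim(responses[i, ], responses[j, ])
  }
}
d_obj <- as.dist(d)

print(d_obj)
```

In this toy example q1 and q2 end up at distance 0 although their responses are not identical, so the measure does not separate distinct points and is at best a pseudo-dissimilarity. Note also that “clara” works on a data matrix rather than a precomputed dissimilarity, so a result like `d_obj` would probably need a method that accepts dissimilarities, for example `pam` from the same package or `hclust`.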
Answered by Matthias on January 17, 2021