Data Science Asked by bazooka720 on December 22, 2020
For something we are working on, we are looking for a simple way to analyze review/feedback data given in response to a question (each question has multiple responses from multiple people) and identify the following:
What are the common things (phrases/sentences) they are saying, ideally with some way to quantify the commonality? The point is to identify the areas of agreement in their reviews.
What are the things that are not common (basically, which one-off sentences/phrases appear that are very uncommon)?
Where is there disagreement (i.e. are there sentences/phrases on which the responses potentially disagree)?
The goal is to find a simple solution to this, not necessarily a model-driven one (there is a paucity of data). Also, it only needs to be directional at this time, as the goal is to prove that this can work and can produce reasonable results.
Any help or advice?
Thanks much!
PS: We would need some ‘intelligence’ in identifying what is common and what is not (i.e. different words with the same meaning within a phrase should be considered common).
Note: The asker requested a simple solution. Although the approach below is not simple, it is an attempt to offer a perspective that could help.
Understanding the problem stated:
At a conceptual level, there are arguably 3 concepts in this question as given below.
Evaluating approaches:
There could be many potential solutions. The following steps are one such attempt.
Step 1: Use cosine similarity to measure how similar the responses are to one another. To bring this a step closer to semantic similarity, use WordNet to build the features on which the cosine similarity is computed. This way, a token such as "path" is treated as close to the token "road".
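A minimal sketch of Step 1, under loud assumptions: the `SYNONYMS` dictionary below is a hypothetical, hand-written stand-in for a real WordNet lookup, and the function names `features`, `cosine` and `similarity` are illustrative only. The idea it shows is that mapping synonyms to one canonical token before counting words lets cosine similarity treat "road" and "path" as the same feature.

```python
import math
from collections import Counter

# Hypothetical stand-in for WordNet: maps a word to a canonical synonym.
# A real setup would derive this mapping from WordNet synsets instead.
SYNONYMS = {"road": "path", "route": "path", "quick": "fast", "rapid": "fast"}

def features(text):
    """Bag-of-words counts after lowercasing and synonym normalization."""
    tokens = [w.strip(".,!?;:").lower() for w in text.split()]
    return Counter(SYNONYMS.get(t, t) for t in tokens if t)

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def similarity(x, y):
    return cosine(features(x), features(y))

# Identical after synonym mapping, so the score is at its maximum.
same = similarity("The road is fast", "the path is quick")
# No shared tokens, so the score is zero.
different = similarity("good food", "slow service")
```

With so little data, a hand-curated synonym list like this may even be a reasonable first pass before bringing in WordNet itself.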
Step 2: Group responses (to the same question) whose cosine similarity exceeds a threshold (for example, 0.75) as similar responses. The resulting groups can loosely be considered the set of distinct responses to the question.
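The grouping in Step 2 could be sketched as follows. This is one possible greedy scheme, not necessarily what the answer intends: each response joins the first group whose first member it resembles closely enough, otherwise it starts a new group. The `cosine` helper here is a plain word-count version included only to keep the example self-contained, and `group_responses` is a hypothetical name. Group size then gives a rough measure of commonality, and groups of size one are the one-off responses.

```python
import math
from collections import Counter

def cosine(x, y):
    """Cosine similarity over raw word counts (no synonym handling)."""
    a, b = Counter(x.lower().split()), Counter(y.lower().split())
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def group_responses(responses, threshold=0.75, sim=cosine):
    """Greedy grouping: compare each response to the first member of each group."""
    groups = []
    for r in responses:
        for g in groups:
            if sim(r, g[0]) >= threshold:
                g.append(r)
                break
        else:
            # No existing group was similar enough, so start a new one.
            groups.append([r])
    # Largest groups first: these are the most common things being said.
    return sorted(groups, key=len, reverse=True)

responses = [
    "the delivery was very slow",
    "the delivery was slow",
    "delivery was very slow",
    "great customer support",
]
groups = group_responses(responses)
common = [g for g in groups if len(g) > 1]     # areas of agreement
one_offs = [g[0] for g in groups if len(g) == 1]  # uncommon responses
```

The result of a greedy pass depends on the order of the responses; the 0.75 threshold is the example value from the step above and would need tuning on real data.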
Step 3: Train a model to identify agreement or disagreement between responses. This can be based on a classification algorithm or, at the very least, on a bag-of-words approach (hard-coded tokens such as "i dont agree", "it is incorrect", etc.). This step is perhaps the least scientific one, but it is the only pragmatic one that the author can think of.
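The hard-coded variant of Step 3 might look like the sketch below. The cue list extends the two phrases mentioned above with a couple of assumed extras, and `normalize` and `flags_disagreement` are hypothetical names. It only flags explicit disagreement wording; it does not detect two responses that contradict each other without using such phrases.

```python
import re

# Hard-coded cues; the first and last come from the step above,
# the other two are assumed additions for illustration.
DISAGREEMENT_CUES = (
    "i dont agree",
    "i do not agree",
    "i disagree",
    "it is incorrect",
)

def normalize(text):
    """Lowercase, drop apostrophes, and collapse punctuation and whitespace."""
    text = text.lower().replace("'", "").replace("\u2019", "")
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    return " ".join(text.split())

def flags_disagreement(text):
    """True if the response contains any hard-coded disagreement cue."""
    norm = normalize(text)
    return any(cue in norm for cue in DISAGREEMENT_CUES)

examples = [
    "I don't agree with this.",
    "It is incorrect to say the app is slow.",
    "I agree completely!",
]
flags = [flags_disagreement(e) for e in examples]
```

Plain substring matching is crude: it misses paraphrases and sarcasm, so the flags are best treated as directional hints rather than labels.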
Answered by SidharthMacherla on December 22, 2020