TransWikia.com

NLP - Simple approach to identify commonalities in text comments between people

Data Science Asked by bazooka720 on December 22, 2020

For something we are working on, we are looking for a simple way to compare review/feedback data collected against a question (for which there are multiple responses from multiple people) and identify the following:

  1. What are the common things (defined as phrases/sentences) they are saying? Some way to quantify the commonality would help too, if possible. The point is to identify what seem to be areas of agreement in their reviews.

  2. What are the things that are not common (basically, what are the one-off sentences/phrases that have been said but are very uncommon)?

  3. Where is there disagreement (i.e. are there sentences/phrases on which the responses potentially disagree)?

The goal is to find a simple solution to this, not necessarily a model-driven one (there is a paucity of data). Also, it only needs to be directional at this time, as the goal is to prove that this can work and can produce reasonable results.

Any help or advice?

Thanks much!

PS: We would need some ‘intelligence’ in identifying the commonality or vice versa (i.e. different words with the same meaning within a phrase should be considered common).

One Answer

Note: The questioner asks for a simple approach. Although the approach below is not simple, it is an attempt to provide a perspective that could help.


Understanding the problem stated:
At a conceptual level, there are arguably three concepts in this question, as given below.

  1. Semantic similarity, i.e. how similar the responses are (in meaning) to one another. In a way, similar meaning could be loosely read as the responses making a similar proposition.
  2. Syntactic similarity, i.e. which aspects (tokens, entities or chunks) overlap between the responses.
  3. Text classification, i.e. "is one piece of feedback in agreement with another, neutral, or against it?".
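
As a rough illustration of the second concept (token overlap), here is a minimal sketch, assuming plain word tokens and Jaccard overlap as the measure. The function names and the two sample responses are made up for illustration; they are not from the question.

```python
import re


def tokens(text):
    """Lowercase a response and return its distinct word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of distinct tokens that two responses have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical responses to the same question.
r1 = "The checkout page is slow"
r2 = "The checkout page is confusing"

print(sorted(tokens(r1) & tokens(r2)))  # the overlapping tokens
print(jaccard(r1, r2))                  # a rough overlap score
```

This only captures literal overlap, so it would miss "path" versus "road"; that gap is what the semantic similarity concept is meant to cover.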

Evaluating approaches:
There could be many potential solutions. The following steps are one such attempt.

  1. Step 1: Use cosine similarity to measure the similarity between responses. To bring it a step closer to semantic similarity, use WordNet to build the features for computing cosine similarity. This helps ensure that a token such as "path" is treated as close to a token such as "road".

  2. Step 2: Group responses (to the same question) whose cosine similarity exceeds a threshold (for example, 0.75) as similar responses. The resulting groups can be loosely considered the set of different responses to the question.

  3. Step 3: Train a model to identify agreement or disagreement between responses. This can be based on a classification algorithm or, at the least, on a bag-of-words approach (hard-coded tokens such as "i dont agree", "it is incorrect", etc.). This step is perhaps the least scientific one, but it is the only pragmatic one that the author can think of.
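
The three steps could be sketched roughly as follows. This is only a sketch under stated assumptions: the small `SYNONYMS` dictionary is a hypothetical stand-in for a real WordNet lookup, the grouping is a simple greedy pass, Step 3 uses the hard-coded phrase fallback rather than a trained classifier, and all names and sample responses are invented for illustration.

```python
import math
import re
from collections import Counter

# ASSUMPTION: tiny hand-written synonym map standing in for a WordNet lookup.
SYNONYMS = {"road": "path", "route": "path", "sluggish": "slow"}

# ASSUMPTION: hard-coded disagreement cues (the bag-of-words fallback in Step 3).
DISAGREEMENT_CUES = ("i dont agree", "i don't agree", "i disagree", "it is incorrect")


def vectorize(text):
    """Step 1a: count tokens after mapping each word to a canonical synonym."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(SYNONYMS.get(w, w) for w in words)


def cosine(u, v):
    """Step 1b: cosine similarity between two token-count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def group_responses(responses, threshold=0.75):
    """Step 2: greedily add each response to the first group whose first
    member it resembles at or above the threshold; otherwise start a new group."""
    groups = []
    for response in responses:
        for group in groups:
            if cosine(vectorize(response), vectorize(group[0])) >= threshold:
                group.append(response)
                break
        else:
            groups.append([response])
    return groups


def signals_disagreement(text):
    """Step 3 (fallback): flag a response containing a hard-coded cue phrase."""
    lowered = text.lower()
    return any(cue in lowered for cue in DISAGREEMENT_CUES)


# Hypothetical responses to a single question.
responses = [
    "The path to the office is slow",
    "The road to the office is sluggish",
    "I dont agree, parking is the real problem",
]

groups = group_responses(responses)
for group in groups:
    print(len(group), group)
print([r for r in responses if signals_disagreement(r)])
```

Under these assumptions, the size of each group gives a rough measure of how common a point is, and groups with a single member are candidates for the one-off remarks the question asks about. With more data, the synonym map could be replaced by an actual WordNet lookup and the cue list by a trained classifier.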

Answered by SidharthMacherla on December 22, 2020
