TransWikia.com

How to compare two datasets of lexical density?

Data Science Asked by Samir Ahmane on February 11, 2021

I have two datasets from two different texts, each representing lexical density as a proportion based on a corpus. Both datasets are shown in the images below. Now, suppose I want to know which text has more uncommon vocabulary. How should I proceed? What statistics should I use? Should it be a Student's t-test or a Wilcoxon signed-rank test? I'm lost on this one, and I don't want to apply inference blindly. I am using the Python library wordfreq to get word frequency data.

Dataset

[Image: the two datasets]
