Data Science Asked by horcle_buzz on September 27, 2021
I have a corpus of manually annotated (aka "gold standard") documents and a collection of an NLP system's annotations on the text from the corpus. I want to do bootstrap sampling of the system and gold standard to approximate a mean and standard error for various measures, so that I can run a series of hypothesis tests, possibly using ANOVA.
The issue is how to do the sampling. I have 40 documents in the corpus with ~44K manual annotations in the gold standard. I was thinking of using each document as a sampling unit and taking 60% of the documents for each sample (i.e., 24 documents per sample). However, each manually annotated document does not have the same number of annotations, which violates the assumption of equal sample sizes across samples.
Any suggestions on how to achieve this bootstrap?
It simply depends on what you count as your object of interest: from your description, the unit can be either the document or the annotation. Your method uses the document as the unit, which is fine as long as the tests you plan to do are compatible with this.
Another option is to use the annotation as the unit: in this case you would pick 60% of the 44k annotations every time, so you would have a mix of annotations from multiple documents. Depending on what exactly you test, this might be an issue; in particular, I don't see how you would count False Negative cases this way.
Since you have text documents of varying size (I assume), you could also consider different options: sentence, paragraph, block of N sentences, etc.
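A minimal sketch of the two sampling units discussed above, using synthetic data (the per-annotation "scores" are placeholders; in practice each annotation would be compared against the system output to produce a metric). Note that the classical bootstrap resamples with replacement, which also sidesteps the unequal-annotation-count concern at the document level, since each replicate's metric is computed over whatever annotations its documents contain:

```python
import random

random.seed(0)

# Hypothetical corpus: 40 documents, each with a varying number of
# annotations. Each annotation is reduced to a single numeric score here.
docs = [[random.random() for _ in range(random.randint(50, 200))]
        for _ in range(40)]

def bootstrap_doc_level(docs, n_reps=1000, frac=0.6):
    """Document as unit: resample documents with replacement, then
    compute the metric over all annotations in the replicate."""
    k = int(len(docs) * frac)
    means = []
    for _ in range(n_reps):
        sample = random.choices(docs, k=k)          # with replacement
        anns = [a for d in sample for a in d]       # pool their annotations
        means.append(sum(anns) / len(anns))
    return means

def bootstrap_annotation_level(docs, n_reps=1000, frac=0.6):
    """Annotation as unit: pool all 44k-style annotations, then
    resample from the pool, mixing documents freely."""
    pool = [a for d in docs for a in d]
    k = int(len(pool) * frac)
    return [sum(s) / k for s in
            (random.choices(pool, k=k) for _ in range(n_reps))]

# Bootstrap estimate of the mean and its standard error (doc level).
reps = bootstrap_doc_level(docs)
boot_mean = sum(reps) / len(reps)
boot_se = (sum((m - boot_mean) ** 2 for m in reps)
           / (len(reps) - 1)) ** 0.5
```

The resulting replicate distributions feed directly into the hypothesis tests mentioned in the question; the choice between the two functions is exactly the choice of unit discussed above.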
Answered by Erwan on September 27, 2021