Cross Validated Asked by user3136 on January 12, 2021
I am looking for some suggestions about assessing the representativeness of a particular dataset I am analyzing.
In this dataset I am looking at the relationship between two variables (e.g., X and Y) in a population that is split into five distinct blocks. The main problem is that the data is based upon reports from the public, so some blocks have much more data than others.
The goal is to assess whether the relationship between X and Y differs between the blocks, but also to determine how reliable such estimates are given that we do not have a truly random sample of the overall population.
Any suggestions appreciated.
Thanks
In survey sampling for commercial and government studies, the orthodox approach is to compare the characteristics of the sample with those of the population, for example the % female, % under 24, etc. The closer the correspondence between the sample and known data for the entire population, the more confidence one can have in the sample. Conversely, the greater the difference between the sample statistics and the known population parameters, the greater the uncertainty.
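As a minimal sketch of that comparison, the snippet below computes, for one characteristic, the gap between each category's share in the sample and its known share in the population. The function name `share_gaps` and all the numbers are hypothetical and chosen only for illustration; real population shares would come from a census or similar source.

```python
from collections import Counter

def share_gaps(sample_values, population_shares):
    """Return sample share minus population share for each category."""
    counts = Counter(sample_values)
    n = len(sample_values)
    return {
        category: counts.get(category, 0) / n - pop_share
        for category, pop_share in population_shares.items()
    }

# Hypothetical sample: 70 reports from women, 30 from men.
sample_sex = ["F"] * 70 + ["M"] * 30
# Hypothetical known population shares.
population_sex = {"F": 0.5, "M": 0.5}

gaps = share_gaps(sample_sex, population_sex)
for category, gap in gaps.items():
    print(f"{category}: sample share differs from population by {gap:+.2f}")
```

Large gaps on characteristics that plausibly relate to X or Y are the ones to worry about; the same check can be run separately within each block.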
Typically, researchers taking this approach weight the data to remove any obvious biases.
This approach has been used to justify the move of most commercial research from phone samples to online samples over the past 15 years.
Of course, while this approach is the orthodoxy, it has no real support in the academic literature, as its theoretical rigor can best be characterized as: "looks like a duck, walks like a duck, I'm going to call it a duck". Nevertheless, it remains the orthodox approach because there are no alternatives.
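One common way to do such weighting is post-stratification, where each respondent's weight is the population share of their group divided by that group's share in the sample. The sketch below assumes a single grouping variable with known population shares; the function names and the data are hypothetical, and this is only one of several weighting schemes rather than necessarily the one used in any particular study.

```python
from collections import Counter

def poststrat_weights(groups, population_shares):
    """Weight each observation by population share / sample share of its group."""
    n = len(groups)
    counts = Counter(groups)
    return [population_shares[g] / (counts[g] / n) for g in groups]

def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: women are over-represented (70 of 100 reports).
groups = ["F"] * 70 + ["M"] * 30
y = [1.0] * 70 + [3.0] * 30          # hypothetical outcome values
population_shares = {"F": 0.5, "M": 0.5}

weights = poststrat_weights(groups, population_shares)
unweighted = sum(y) / len(y)
weighted = weighted_mean(y, weights)
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

Weighting corrects only for the characteristics used to build the weights; if people who submit reports differ from the rest of the population in ways that are not measured, the weighted estimate can still be biased.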
Answered by Tim on January 12, 2021