Open Data Asked by philophilosophia on September 29, 2021
Background:
I’m teaching an introductory statistics course, and this term I have decided to demonstrate the methods on real-world public data sets instead of synthetic data. I was surprised that I couldn’t find basic data such as the height/weight/IQ of men and women (which are famously well approximated by a Gaussian). I do find parameters (the mean/variance of the weight of Americans, for example), but I don’t want to synthesize a Gaussian from parameters. Rather, I’m looking for actual data, so that the students experience the noisiness of real data and see how approximations work. I have the same problem finding non-Normal data, e.g., wealth distributions and other heavy-tailed ones. Parameters exist, but I cannot find actual data sets.
TLDR:
For an introductory statistics course, I’m looking for publicly available data sets with medium sample sizes, i.e., $N=O(10^3)$ or $O(10^4)$, preferably with close-to-Gaussian distributions, but anything is useful.
You can find many publicly available datasets on Kaggle, along with kernels/notebooks for reference. It is one of the best places to find relevant data for your teaching. You need to sign up to download the datasets.
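Once you have downloaded a CSV, something like the following could help show students how close a real column is to a Gaussian. This is only a rough sketch using the Python standard library: the helper names `load_column` and `gaussian_check` are made up for illustration, and it assumes the file has a header row and one numeric column of interest. It compares the observed share of values within 1, 2 and 3 standard deviations against the Gaussian values (about 68%, 95% and 99.7%) and reports simple moment estimates of skewness and excess kurtosis, both of which are near 0 for Gaussian data.

```python
import csv
import math
import statistics


def load_column(path, column):
    """Read one numeric column from a CSV file with a header row.

    Blank, non-numeric, and non-finite cells are skipped.
    A column name that is not in the header raises KeyError.
    """
    values = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            try:
                x = float(row[column])
            except (TypeError, ValueError):
                continue  # blank or non-numeric cell
            if math.isfinite(x):
                values.append(x)
    return values


def gaussian_check(values):
    """Compare a sample against what a Gaussian with the same mean/sd predicts."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    z = [(x - mean) / sd for x in values]  # standardized values

    report = {"n": n, "mean": mean, "sd": sd}
    for k in (1, 2, 3):
        observed = sum(abs(v) <= k for v in z) / n
        expected = math.erf(k / math.sqrt(2))  # P(|Z| <= k) for a standard normal
        report[f"within_{k}_sd"] = (observed, expected)

    # Simple moment estimates; both are close to 0 for Gaussian data.
    report["skewness"] = sum(v ** 3 for v in z) / n
    report["excess_kurtosis"] = sum(v ** 4 for v in z) / n - 3.0
    return report
```

For example, with a hypothetical file `heights.csv` containing a `height` column, you would call `gaussian_check(load_column("heights.csv", "height"))`. Heavy-tailed data such as wealth should show a large positive excess kurtosis and more mass beyond 3 standard deviations than the Gaussian predicts, which makes for a nice contrast in class.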
Answered by Pluviophile on September 29, 2021