Difference between data distribution and frequency distribution

Question

I have a dataset with 'n' features and corresponding labels (binary in nature). How can I calculate the data distribution and the frequency distribution of this dataset? What is the difference between the two?

Pieter21 · Answer

If you don't know the definitions, how would you calculate either one?

I looked at a few reliable definitions:

https://www.spss-tutorials.com/frequency-distribution-what-is-it/
https://www.statisticshowto.datasciencecentral.com/data-distribution/
http://makemeanalyst.com/observational-studies-and-experiments/population-distribution-sample-distribution-and-sampling-distribution/

The differences are subtle, and sometimes depend on who you ask.

What I conclude is that the frequency (or sample) distribution is a summary of an actual sample: the number of observations counted per bin or category, possibly with percentages added.

The (population) data distribution is the distribution that you'd expect from the whole population.

For fair coin tosses, the data distribution would be 50/50, though the frequency distribution of a sample of 10 tosses could come out 6/4.

My advice: either use the textbook definitions, or present the statistics that you see fit for your analysis. Repeat the definition if necessary.

If you have a large enough random sample, the frequency distribution becomes an estimate of the data distribution anyway (but sometimes you have to show that your sample really is random).
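
To make the coin example concrete, here is a minimal Python sketch. It assumes a simulated fair coin, and the helper names `frequency_distribution` and `toss_coins`, the seed and the sample sizes are my own choices for illustration. It counts the outcomes in samples of growing size, so you can see the sample proportions move toward the 50/50 population value.

```python
import random
from collections import Counter


def frequency_distribution(sample):
    """Return {value: (count, proportion)} for the observed sample."""
    counts = Counter(sample)
    total = len(sample)
    return {value: (count, count / total) for value, count in counts.items()}


def toss_coins(n, rng):
    """Simulate n tosses of a fair coin (population distribution: 50/50)."""
    return [rng.choice(["heads", "tails"]) for _ in range(n)]


rng = random.Random(0)  # fixed seed, chosen arbitrarily, so runs are repeatable
for n in (10, 1_000, 100_000):
    dist = frequency_distribution(toss_coins(n, rng))
    # .get() guards against a small sample that happens to contain no heads
    heads_share = dist.get("heads", (0, 0.0))[1]
    print(f"n={n:>6}: share of heads = {heads_share:.3f} (population value: 0.500)")
```

A small sample can easily land on something like 6/4. The larger samples should sit much closer to 0.5, which is the sense in which the frequency distribution estimates the data distribution.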

When you have $n$ features, you repeat this for each feature.
E.g. when you have people's 'gender', 'married', 'smokes' and 'employed' features, you compute a frequency distribution for each of them.
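
As a rough sketch of what "repeat for all features" could look like in Python: the records below are made up purely for illustration, and the function names are hypothetical. It counts the values of each feature, and optionally splits the counts by the binary label.

```python
from collections import Counter

# Hypothetical toy records; the values are invented for illustration only.
rows = [
    {"gender": "F", "married": "yes", "smokes": "no",  "employed": "yes", "label": 1},
    {"gender": "M", "married": "no",  "smokes": "yes", "employed": "yes", "label": 0},
    {"gender": "F", "married": "no",  "smokes": "no",  "employed": "no",  "label": 1},
    {"gender": "M", "married": "yes", "smokes": "no",  "employed": "yes", "label": 0},
    {"gender": "F", "married": "yes", "smokes": "yes", "employed": "no",  "label": 0},
    {"gender": "M", "married": "no",  "smokes": "no",  "employed": "yes", "label": 1},
]
features = ["gender", "married", "smokes", "employed"]


def per_feature_frequencies(rows, features):
    """One frequency distribution (value -> count) per feature."""
    return {f: Counter(row[f] for row in rows) for f in features}


def per_feature_frequencies_by_label(rows, features, label_key="label"):
    """Same counts, but keyed by (feature value, label) pairs."""
    return {f: Counter((row[f], row[label_key]) for row in rows) for f in features}


freqs = per_feature_frequencies(rows, features)
by_label = per_feature_frequencies_by_label(rows, features)

for f in features:
    total = sum(freqs[f].values())
    for value, count in sorted(freqs[f].items()):
        # Show the raw count plus the percentage of the sample
        print(f"{f:>8} = {value:<3}: {count} ({100 * count / total:.0f}%)")
```

Splitting by label is not required by the definitions above, but with binary labels it is often the more useful table to present.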
