Data Science Asked by anishtain4 on February 26, 2021
I have two sets of samples (A, B) with a relatively high number (~10,000) and I want to see if a factor has affected sample B or not. Naturally, I should use A/B testing. The problem is, the distributions are not normal and I’m interested in the maximum change, not the mean values! So if all you know is how CLT is gonna make everything Gaussian, this is a good point to stop and move on to the next question.
The data are distances, so there’s a minimum of 0, but there’s no max and no guarantee what the distribution is going to look like. As an example, the histograms look like this:
My gut feeling tells me that the maximum of orange sample is just randomly higher than the blue one, but gut feelings are usually wrong. So I want to have some method of testing. I would appreciate any input.
PS: Welch’s t-test tells me that with 100.000% confidence, these two distributions are different, but are they?
Welch's t-test assumes a normal distribution. I'd assume your sample size is big enough to see that these two distributions are different, based on the differences in mean, variance and range.
Answered by ripintheblue on February 26, 2021
Yeah, looks like it. Perhaps your data is dependent on first random variables, which in turn affect the overall distribution.
Answered by ripintheblue on February 26, 2021
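One option is a permutation test. A permutation test does not assume any particular distribution for the data (only that the labels are exchangeable under the null hypothesis), and it works with any test statistic, so it allows for testing the maximum change.
For a permutation test, you pool the data, randomly reassign the data points to the two labels, and recalculate the maximum change. Each shuffle gives one value of the statistic under the null hypothesis. Repeat many times, then compare the observed difference with the shuffled ones: the fraction of shuffles that are at least as extreme as the observed value tells you how likely the observed difference is to happen by chance.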
Answered by Brian Spiering on February 26, 2021
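A minimal sketch of the permutation test described above, assuming Python with NumPy and taking "maximum change" to mean the difference between the two sample maxima. The function name `permutation_test_max`, its parameters, and the synthetic gamma-distributed distances are all hypothetical and only for illustration; they are not from the question or the answers.

```python
import numpy as np


def permutation_test_max(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of sample maxima.

    Returns (observed difference max(b) - max(a), estimated p-value).
    """
    rng = np.random.default_rng(seed)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)

    observed = b.max() - a.max()
    pooled = np.concatenate([a, b])
    n_a = a.size

    count = 0
    for _ in range(n_permutations):
        # Shuffle the pooled data, i.e. randomly reassign the labels.
        shuffled = rng.permutation(pooled)
        diff = shuffled[n_a:].max() - shuffled[:n_a].max()
        # Count shuffles at least as extreme as the observed difference.
        if abs(diff) >= abs(observed):
            count += 1

    # The add-one correction keeps the estimated p-value above zero.
    p_value = (count + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic stand-in for the distance data (assumption: gamma-shaped).
    rng = np.random.default_rng(42)
    sample_a = rng.gamma(shape=2.0, scale=1.0, size=2000)
    sample_b = rng.gamma(shape=2.0, scale=1.0, size=2000)
    obs, p = permutation_test_max(sample_a, sample_b, n_permutations=1000)
    print(f"observed difference in maxima: {obs:.3f}, p-value: {p:.3f}")
```

A small p-value here says the observed gap between the maxima would be unusual if the labels were interchangeable. Keep in mind the null hypothesis is that both samples come from the same distribution, so a rejection does not by itself show that only the tail changed.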