Cross Validated Asked by BiGYaN on November 2, 2021
There is a metric which has a natural cyclic pattern. We want to measure the effect of a treatment on this metric through an A/B test.
Examples:
An underlying cyclic pattern in the metric violates the normality assumption and results in a high SD when the samples are assumed to be i.i.d. This in turn leads to extremely large sample sizes for measuring small lifts. A paired t-test alleviates this somewhat, but all paired t-test examples seem to be centered around the idea of "multiple measurements of the same subject".
My understanding is that the independent-samples t-test is wrong simply because the samples are not i.i.d. (the mean shifts with time). This rules out most tests, even the permutation test, which does not assume a known distribution. A paired t-test seems like a plausible idea, but so far I have not encountered a similar recommendation.
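To make the sample-size concern concrete, here is a rough sketch using the standard normal-approximation formula for a two-sample comparison, n per arm ≈ 2 (z_{1-α/2} + z_{power})² σ² / δ². The sine pattern, noise SD and lift mirror the synthetic example in this question; the helper name `n_per_arm` and the 5% level and 80% power are my own assumptions, and the second case assumes the cyclic component can be removed completely.

```python
import numpy as np
from scipy import stats

def n_per_arm(sd, lift, alpha=0.05, power=0.8):
    """Approximate per-arm sample size for a two-sided two-sample z-test (hypothetical helper)."""
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    return int(np.ceil(2 * (z * sd / lift) ** 2))

x = np.linspace(0, 1, 101)
cyclic = np.sin(3 * 2 * np.pi * x) + 1   # same cyclic shape as the synthetic example
sd_cyclic = cyclic.std(ddof=1)           # spread contributed by the cycle alone
noise_sd = 0.05                          # assumed measurement noise
lift = 0.05                              # assumed effect to detect

# Treating observations as i.i.d.: the cycle is counted as noise
n_raw = n_per_arm(np.hypot(sd_cyclic, noise_sd), lift)
# If the cycle could be removed entirely, only the noise remains
n_detrended = n_per_arm(noise_sd, lift)

print(n_raw, n_detrended)
```

Under these assumptions the required sample size differs by a couple of orders of magnitude, which is the gap that pairing (or any other way of removing the cycle) is meant to close.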
Here's a synthetic example in Python:
```python
import numpy as np
from scipy import stats

x_data = np.linspace(0, 1, 101)
num_period = 3
treatment1 = np.sin(num_period * 2 * np.pi * x_data) + 1  # cyclic data
treatment2 = treatment1 + np.random.normal(0.05, 0.05, len(treatment1))  # T1 + N(0.05, 0.05)

# No seed is set, so the exact values below vary from run to run.

# Independent-samples t-test: the cyclic variation is treated as noise
stats.ttest_ind(treatment1, treatment2)
# Ttest_indResult(statistic=-0.5252661250185608, pvalue=0.5999800249755889)

# Paired t-test: works on the pointwise differences
stats.ttest_rel(treatment1, treatment2)
# Ttest_relResult(statistic=-10.13042526535737, pvalue=5.12638080641741e-17)
```
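Since the question mentions permutation tests, one possible paired variant is a sign-flip permutation test on the pointwise differences: differencing removes the shared cyclic component, and under the null hypothesis of no lift the sign of each difference is arbitrary. This is only a sketch on the same synthetic setup; the seed, the number of permutations and the variable names are assumptions of mine.

```python
import numpy as np

rng = np.random.default_rng(0)           # seed chosen only for reproducibility

x_data = np.linspace(0, 1, 101)
treatment1 = np.sin(3 * 2 * np.pi * x_data) + 1
treatment2 = treatment1 + rng.normal(0.05, 0.05, len(treatment1))

# Pairing by time point: the cyclic term cancels in the differences
diffs = treatment2 - treatment1
observed = diffs.mean()

# Under H0 each difference is equally likely to be positive or negative
n_perm = 10_000
signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
perm_means = (signs * diffs).mean(axis=1)

# Two-sided p-value, with the +1 correction so it is never exactly zero
p_value = (np.sum(np.abs(perm_means) >= abs(observed)) + 1) / (n_perm + 1)
print(observed, p_value)
```

Unlike the unpaired permutation test, this version never shuffles observations across time points, so the time-varying mean does not enter the null distribution.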
One approach might be to use a mixed model with an indicator for day plus a random effect for truck ID. This way, you can account for any truck-level variation and assess the effect of the treatment via an indicator. This sounds feasible, especially if you have enough data to make up for the degrees of freedom used by the indicators.
Here is an example of how this might be performed. I have 10 trucks, and each truck's sales are measured over the course of a week. We assume that each truck has some differences due to the driver (or something else; maybe one truck is newer and more attractive than the older ones, who knows). The hypothesized intervention increases sales by 2 units. Here is a plot of the data where each line is for a specific truck, with colors indicating treatment group.
A linear mixed effect model for this data may look like
model = lmer(sales ~ factor(ndays) + trt + (1|truck), data = design )
The test you care about is the test for the trt variable, assuming you hypothesize additive effects (sales increase by the same amount on each day, not just on weekends). Here is a plot of the model for each truck, with the data plotted over the model fit with some opacity.
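For readers working in Python like the question, a rough analogue of the model above can be sketched with statsmodels (assumed to be installed along with pandas). The simulated data only imitates the setup described here (10 trucks, one week, a lift of 2 units); the day effects, variances, seed and treatment assignment are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n_trucks, n_days = 10, 7

truck = np.repeat(np.arange(n_trucks), n_days)
ndays = np.tile(np.arange(n_days), n_trucks)
trt = (truck >= n_trucks // 2).astype(int)          # hypothetical: trucks 5-9 are treated

day_effect = np.array([8.0, 0.0, 1.0, 0.0, 2.0, 3.0, 9.0])  # made-up weekly cycle
truck_effect = rng.normal(0.0, 1.0, n_trucks)               # truck-level variation
sales = (20.0 + day_effect[ndays] + truck_effect[truck]
         + 2.0 * trt + rng.normal(0.0, 1.0, truck.size))

design = pd.DataFrame({"sales": sales, "ndays": ndays, "trt": trt, "truck": truck})

# Day indicators + treatment indicator as fixed effects, random intercept per truck
fit = smf.mixedlm("sales ~ C(ndays) + trt", data=design, groups=design["truck"]).fit()

trt_estimate = fit.params["trt"]
trt_pvalue = fit.pvalues["trt"]
print(trt_estimate, trt_pvalue)
```

The row of interest is again the `trt` coefficient; with only five trucks per arm its estimate is noisy, so a single simulated week should not be expected to recover the lift of 2 exactly.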
Finally, I'm sure there is a way to do this without mixed effect models. In my own opinion, regression is a natural way to think of these sorts of comparisons, but a cleverly computed t-test is likely capable of accomplishing the same thing. Think of this approach as the most straightforward (insofar as it directly considers the generative process) but perhaps not the easiest or even the best.
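One candidate for such a t-test, offered as an illustration and not necessarily what was meant here: if every truck is observed on every day and treatment is assigned per truck, averaging each truck over the week makes the day effects cancel between groups, and an ordinary two-sample t-test can then be run on the truck means. The simulated numbers below are assumptions chosen to imitate the 10-truck, one-week setup.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n_trucks, n_days = 10, 7

day_effect = np.array([8.0, 0.0, 1.0, 0.0, 2.0, 3.0, 9.0])   # made-up weekly cycle
truck_effect = rng.normal(0.0, 1.0, n_trucks)                 # truck-level variation
treated = np.arange(n_trucks) >= n_trucks // 2                # hypothetical assignment

# sales[i, j] = sales of truck i on day j; treated trucks get a lift of 2 units
sales = (20.0 + day_effect[None, :] + truck_effect[:, None]
         + 2.0 * treated[:, None] + rng.normal(0.0, 1.0, (n_trucks, n_days)))

# One number per truck: every truck sees the same days, so the cycle averages out equally
truck_means = sales.mean(axis=1)

result = stats.ttest_ind(truck_means[treated], truck_means[~treated])
lift_estimate = truck_means[treated].mean() - truck_means[~treated].mean()
print(lift_estimate, result.pvalue)
```

This relies on the design being balanced; with missing days or unequal exposure the truck means would no longer share the same day mix, and the regression approach is the safer choice.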
Answered by Demetri Pananos on November 2, 2021
Pairing of some kind seems crucial because you want to compare Truck A on Wednesdays with Truck B on Wednesdays. However, as you say, a cyclic sales pattern may tend to be non-normal (but see Note at end). In order to have pairing without concern over normality, you might use a paired Wilcoxon test. It seems especially appropriate because the weekly distributional pattern will be similar for the two trucks.
Fake data for just one week and paired Wilcoxon test, in R:
x1 = c(120, 75, 80, 70, 85, 82, 130)
x2 = c(130, 89, 91, 79, 93, 99, 142) # consistently higher
wilcox.test(x1, x2, paired = TRUE)
Wilcoxon signed rank test
data: x1 and x2
V = 0, p-value = 0.01563
alternative hypothesis: true location shift is not equal to 0
The null hypothesis that the two trucks have similar sales is rejected with P-value 0.016 < 0.05, even though there is a weekly trend of higher sales on Sun and Sat.
A two-sample Wilcoxon test without pairing does not detect that the second truck has consistently higher sales. [There is a warning message about ties (not shown here), so the P-value may not be exactly correct.]
wilcox.test(x1,x2)$p.val
[1] 0.1792339
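For anyone following along in Python, as in the question, roughly the same two comparisons can be sketched with SciPy. `scipy.stats.wilcoxon` is the signed-rank (paired) test and `scipy.stats.mannwhitneyu` is the unpaired rank-sum test; exact p-values may differ slightly from R depending on the SciPy version and how ties are handled.

```python
from scipy import stats

# Same fake one-week data as in the R example above
x1 = [120, 75, 80, 70, 85, 82, 130]
x2 = [130, 89, 91, 79, 93, 99, 142]   # consistently higher

# Paired: ranks the day-by-day differences, so the weekly pattern drops out
paired = stats.wilcoxon(x1, x2)

# Unpaired: pools all 14 values, so the weekly pattern acts as noise
unpaired = stats.mannwhitneyu(x1, x2, alternative="two-sided")

print(paired.pvalue, unpaired.pvalue)
```

The paired p-value should land near the 0.016 reported by R, while the unpaired one stays well above 0.05.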
Note: In judging normality for a paired t test, it is the paired differences that should be tested for normality. They may not show as aggressive a weekly pattern as do sales by individual trucks.
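As a small illustration of this note, again in Python, the paired differences from the fake data can be checked with a Shapiro-Wilk test before running a paired t test. With only seven differences the normality test has very little power, so this is a sanity check at best.

```python
import numpy as np
from scipy import stats

x1 = np.array([120, 75, 80, 70, 85, 82, 130])
x2 = np.array([130, 89, 91, 79, 93, 99, 142])

# Normality is judged on the differences, not on each truck's raw sales
diffs = x2 - x1
shapiro_stat, shapiro_p = stats.shapiro(diffs)

# Paired t test on the same pairs (equivalent to a one-sample t test on diffs)
t_result = stats.ttest_rel(x2, x1)

print(diffs, shapiro_p, t_result.pvalue)
```

The raw sales range from 70 to 142 because of the weekend peaks, whereas the differences only range from 8 to 17, which is why the weekly pattern matters much less for the paired test.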
Answered by BruceET on November 2, 2021