Cross Validated Asked on November 22, 2020
There are two samples (sufficiently large and independent). One, size $n_1$ has the mean $m_1$ and the standard deviation $s_1$, the other, size $n_2$ with the mean $m_2$ and the standard deviation $s_2$. Is there a procedure to normalize the second sample so that it has the mean and standard deviation of the first?
Consider the following two samples, from R:
set.seed(2020)
x1 = rnorm(20, 100, 15)
m1 = mean(x1); m1; s1 = sd(x1); s1
[1] 98.46448
[1] 21.39371
x2 = rnorm(30, 50, 10)
m2 = mean(x2); m2; s2 = sd(x2); s2
[1] 52.77616
[1] 8.347496
Step 1: Standardize x2
z = (x2 - m2)/s2
mz = mean(z); mz; sz = sd(z); sz
[1] 1.494302e-16 # essentially 0
[1] 1
Step 2: Rescale z2
(called y2
) to match sample mean and SD of x1
.
y2 = s1*z + m1
mean(y2); sd(y2)
[1] 98.46448 ## compare 98.46448 above
[1] 21.39371 ## compare 21.39371
Stripchart (bottom to top) of original x1
and x2
and y2
(original
x2
rescaled to match sample mean and SD of x1
.
stripchart(list(x1, x2, y2), ylim = c(.7,3.3), pch="|",
group.names=c("x1","x2","y2"))
abline(v=mean(y2), col="green2") # means of `x1` and `y2`.
Notes: (1) If x1
and x2
are rounded to only a few places, and
then y2
is similarly rounded, then the mean and SD of y2
will
typically not match those of x1
exactly. Minor adjustments may help.
(2) I am aware that the procedure shown above can be 'collapsed' into one more complicated step, but I find the two-step method shown easier to remember.
(3) When OPs on this site give only means and variances (not the whole dataset) it is sometimes useful to use something like this to contrive a dataset to use in R that is very similar to OP's. By contrast to R, some procedures in other software (e.g, Minitab) will perform various tests based only on summary statistics.
Answered by BruceET on November 22, 2020
Get help from others!
Recent Answers
Recent Questions
© 2024 TransWikia.com. All rights reserved. Sites we Love: PCI Database, UKBizDB, Menu Kuliner, Sharing RPP