Cross Validated Asked on December 15, 2021
I have a normally distributed variable (var2) with a mean of 10 and sd of 3:
mean <- 10
sd <- 3
var2 <- rnorm(n = 1000, mean = mean, sd = sd)
I want to simulate a second variable whose correlation with var2 is known, for example:
r <- .83
The second variable (y) is known to be normally distributed with a mean of 10 and sd of 3. I found one solution that seemed relevant, in the thread "Tool for generating correlated data sets", which suggests using an independent, normally distributed variable with the same variance (var1):
var1 <- rnorm(n = 1000, mean = mean, sd = sd)
y <- scale(var2) * r + scale(residuals(lm(var1 ~ var2))) * sqrt(1 - r * r)
y <- mean + (y - 0) * (sd/1) # Convert to mean and sd of original variable
cor(y,var2)
[,1]
[1,] 0.83
I then want to simulate a third variable (var3) whose correlation with the second variable (y) is known:
r <- .91
var3 <- scale(y) * r + scale(residuals(lm(var1 ~ y))) * sqrt(1 - r * r)
var3 <- mean + (var3 - 0) * (sd/1) # Convert to mean and sd of original variable
cor(var3,y)
[,1]
[1,] 0.91
A practical example of this is test 1 (var2), with a predicted score on test 2 (y) and a subsequent predicted score on test 3 (var3). I have known correlations between var2 and y, and between y and var3, and I want to know the resulting correlation between var2 (test 1) and var3 (test 3) based on this simulation:
cor(var3,var2)
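As a side note on what the two known correlations can and cannot pin down: the correlation matrix of any three variables must be positive semidefinite, and that only bounds cor(var3, var2) rather than fixing it. Writing r_12 = cor(var2, y), r_23 = cor(y, var3) and r_13 = cor(var2, var3) (labels introduced here purely for illustration), the constraint is:

$$
r_{12}\,r_{23} - \sqrt{(1 - r_{12}^2)(1 - r_{23}^2)} \;\le\; r_{13} \;\le\; r_{12}\,r_{23} + \sqrt{(1 - r_{12}^2)(1 - r_{23}^2)}
$$

With r_12 = .83 and r_23 = .91 this gives roughly .52 ≤ r_13 ≤ .99, so any single value that comes out of a simulation reflects an extra assumption built into how var3 was generated.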
I am unsure whether I have misinterpreted or misapplied the method discussed in "Tool for generating correlated data sets". Or is there perhaps a more convenient way to simulate the scores on test 3 that I am overlooking?
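One possible variation, offered as a sketch under stated assumptions rather than a definitive answer: the code above reuses var1 as the noise source in both steps, which may tie cor(var3, var2) to that choice. The version below instead draws fresh noise for each step and makes the noise for var3 orthogonal to both var2 and y. That encodes the assumption that test 3 depends on test 1 only through test 2, in which case cor(var3, var2) equals the product of the two known correlations. The helper sim_cor, the names mu and sigma, and the seed are hypothetical and introduced only for this illustration.

```r
set.seed(1)                      # assumed seed, only for reproducibility
n <- 1000; mu <- 10; sigma <- 3  # mu/sigma avoid masking mean() and sd()

# Hypothetical helper: returns a standardised variable whose sample
# correlation with x is exactly r. The noise is residualised on every
# column of `given`, so it is orthogonal to those columns in-sample.
sim_cor <- function(x, r, noise, given = x) {
  res <- residuals(lm(noise ~ given))
  as.numeric(scale(as.numeric(x))) * r +
    as.numeric(scale(res)) * sqrt(1 - r^2)
}

var2 <- rnorm(n, mean = mu, sd = sigma)

# Step 1: y correlates .83 with var2, then is rescaled to mean 10, sd 3
y <- mu + sigma * sim_cor(var2, 0.83, rnorm(n))

# Step 2: var3 correlates .91 with y; the fresh noise is orthogonal to
# BOTH var2 and y, so var3 is related to var2 only through y
var3 <- mu + sigma * sim_cor(y, 0.91, rnorm(n), given = cbind(var2, y))

cor(y, var2)     # .83 by construction
cor(var3, y)     # .91 by construction
cor(var3, var2)  # .83 * .91 = .7553 under the assumption above
```

If test 3 is believed to depend on test 1 beyond what test 2 captures, this assumption does not hold and the correlation between var2 and var3 would have to be specified separately.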