Cross Validated Asked by WetlabStudent on January 19, 2021
I’d like to predict the probability of success as an unknown function of predictor variables. For example, consider the following simulated data:
#simulate fake data
n=100
x1 = runif(n)/2
x2 = runif(n)/2
ptrue = x1^1.4 + x2
trials = rpois(n,100)
successes = rbinom(n, prob = ptrue, size = trials)
data = data.frame(successes, trials, x1,x2)
I would like to fit a GAM with a binomial family (the functional form of the predictors is unknown and likely quite nonlinear), but I can’t figure out how to incorporate the known number of trials. Based on my reading about GAMs, one might be able to do something like this in R:
mod <- gam(successes/trials ~ x1 + x2, data = data, family = binomial(link = "logit"))
But that doesn’t account for the number of trials in the fitting. I’ve tried to google examples of GAMs like this in R, but I haven’t had much luck.
This is documented in ?glm. One way to specify a binomial GLM is to pass it a two-column matrix of successes and failures:
gam(cbind(successes, trials - successes) ~ s(x1) + s(x2), data = data,
method = "REML", family = binomial("logit"))
You can also proceed as you did, with the proportion of successes as the response, but provide the number of trials via the weights argument, as in
gam(successes/trials ~ s(x1) + s(x2), data = data,
method = "REML", family = binomial("logit"),
weights = trials)
There is also the option of supplying the response as a factor indicating success or failure for each individual trial (be sure to code the first level as the failures, since every other level is treated as a success).
For more, see the Details section of ?glm.
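As a rough check that the matrix form and the weights form describe the same model, the sketch below refits both on data simulated as in the question and compares the fitted probabilities. The seed and the object names m_cbind and m_wts are assumptions added for illustration; they are not part of the original post.

```r
library(mgcv)

# Simulate data as in the question (the seed is an added assumption)
set.seed(1)
n <- 100
x1 <- runif(n) / 2
x2 <- runif(n) / 2
ptrue <- x1^1.4 + x2
trials <- rpois(n, 100)
successes <- rbinom(n, prob = ptrue, size = trials)
data <- data.frame(successes, trials, x1, x2)

# Form 1: two-column matrix of successes and failures
m_cbind <- gam(cbind(successes, trials - successes) ~ s(x1) + s(x2),
               data = data, method = "REML", family = binomial("logit"))

# Form 2: proportion of successes as the response, trials as prior weights
m_wts <- gam(successes / trials ~ s(x1) + s(x2),
             data = data, method = "REML", family = binomial("logit"),
             weights = trials)

# Largest absolute difference between the two sets of fitted probabilities
max(abs(fitted(m_cbind) - fitted(m_wts)))
```

If the two specifications are equivalent, the reported difference should be at the level of numerical noise.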
If you want to fit a GAM, you want smooth functions of the covariates; the model in your question includes only parametric (linear) terms. Be sure to use the s() or te() functions to indicate which covariates should be represented by penalised spline terms.
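To illustrate the difference between the two kinds of smooth term, here is a minimal sketch that fits an additive model with s() and a tensor-product smooth of both covariates with te(), then predicts success probabilities on the response scale. The seed, the object names (m_add, m_te, newdata, p_hat) and the prediction points are assumptions chosen for illustration.

```r
library(mgcv)

# Simulate data as in the question (the seed is an added assumption)
set.seed(1)
n <- 100
x1 <- runif(n) / 2
x2 <- runif(n) / 2
ptrue <- x1^1.4 + x2
trials <- rpois(n, 100)
successes <- rbinom(n, prob = ptrue, size = trials)
data <- data.frame(successes, trials, x1, x2)

# Additive model: a separate penalised smooth for each covariate
m_add <- gam(cbind(successes, trials - successes) ~ s(x1) + s(x2),
             data = data, method = "REML", family = binomial("logit"))

# Tensor-product smooth: lets the effect of x1 vary with x2
m_te <- gam(cbind(successes, trials - successes) ~ te(x1, x2),
            data = data, method = "REML", family = binomial("logit"))

# Predicted success probabilities at two hypothetical covariate settings
newdata <- data.frame(x1 = c(0.1, 0.4), x2 = c(0.1, 0.4))
p_hat <- predict(m_add, newdata = newdata, type = "response")
p_hat

# One way to compare the two fits
AIC(m_add, m_te)
```

Calling summary() or plot() on either fitted model shows the estimated smooths and their effective degrees of freedom, which helps judge how nonlinear each effect is.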
Correct answer by Gavin Simpson on January 19, 2021