Cross Validated Asked on November 21, 2021
The regression model is
y = b0 + b1*x1 + b2*x2 + b3*x3 + e
and I want to test whether b1 + b2 > 0.
The R package car has a function linearHypothesis that can test whether b1 + b2 = 0 via
linearHypothesis(lm_model, "b1 + b2 = 0")
but this is a two-sided test, and I want a one-sided test.
linearHypothesis(lm_model, "b1 + b2 > 0")
does not work; it seems that linearHypothesis can only do two-sided tests. Are there R packages that can do this one-sided test?
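For context, such a one-sided test can also be computed by hand from the fitted model: the standard error of b1 + b2 follows from the coefficient covariance matrix, and the one-sided p-value from the t distribution. A minimal sketch (assuming lm_model is the fitted lm object and the predictors are named x1 and x2):
est <- sum(coef(lm_model)[c("x1", "x2")])                      # point estimate of b1 + b2
V <- vcov(lm_model)                                            # covariance matrix of the coefficients
se <- sqrt(V["x1", "x1"] + V["x2", "x2"] + 2 * V["x1", "x2"])  # standard error of b1 + b2
pt(est / se, df.residual(lm_model), lower.tail = FALSE)        # one-sided p-value for H0: b1 + b2 <= 0 vs HA: > 0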
multcomp
The multcomp package offers the capability of testing linear equalities or inequalities of coefficients. Here, I'll use the same data generated by @conjugateprior to illustrate the package:
library(multcomp)
# Generate data and fit model
set.seed(1234)
d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)
mod0 <- lm(Y ~ X1 + X2 + X3, data = d)
# Specify the linear hypothesis
glht_mod <- glht(
  model = mod0,
  linfct = c("X1 + X2 >= 0")
)
# Inspect summary and confidence interval
summary(glht_mod)
confint(glht_mod)
This produces the following output (truncated):
Linear Hypotheses:
Estimate Std. Error t value Pr(<t)
X1 + X2 >= 0 -0.07211 0.06284 -1.148 0.126
Linear Hypotheses:
Estimate lwr upr
X1 + X2 >= 0 -0.07211 -Inf 0.0314
We see that the p-value for the null hypothesis that $\beta_{X1} + \beta_{X2} \geq 0$ is $0.126$, so the test provides little evidence against the null hypothesis.
The one-sided 95% confidence interval is $(-\infty, 0.0314]$, so the data are compatible, at the 95% level, with the sum of the two coefficients being as large as $0.03144$. Because the upper limit is larger than $0$, the data indicate that $0$ is among the plausible values for the sum. This confidence limit is in good agreement with the confidence interval obtained by @conjugateprior using the bootstrap.
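As an aside (a small sketch, not part of the original output above): if the null is instead stated in the other direction, $H_0: \beta_{X1} + \beta_{X2} \leq 0$ versus the alternative $> 0$, the inequality in the hypothesis string can simply be flipped:
glht_mod2 <- glht(model = mod0, linfct = c("X1 + X2 <= 0"))
summary(glht_mod2)  # one-sided test with the alternative in the "greater" direction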
restriktor
The restriktor package allows you to fit linear equality or inequality restrictions on parameters and effects. The package has its own website with a tutorial and other explanations. Again, using the same data as above:
library(restriktor)
# Generate data
set.seed(1234)
d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)
# Fit the model
mod0 <- lm(Y ~ X1 + X2 + X3, data = d)
Now we have to specify the constraints, which we can do easily using a text-based description (important: the constraint syntax has to be enclosed within single quotes). Then, refit the model using the restriktor function:
myconstr <- '(X1 + X2) > 0'
constr_mod <- restriktor(
  object = mod0,
  constraints = myconstr
)
To test the hypothesis, let's use the iht function:
iht(constr_mod)
This generates the following output (truncated):
Overview of all available hypothesis tests:
Global test: H0: all parameters are restricted to be equal (==)
vs. HA: at least one inequality restriction is strictly true (>)
Test statistic: 818.0010, p-value: <0.0001
Type A test: H0: all restrictions are equalities (==)
vs. HA: at least one inequality restriction is strictly true (>)
Test statistic: 0.0000, p-value: 1
Type B test: H0: all restrictions hold in the population
vs. HA: at least one restriction is violated
Test statistic: 1.3169, p-value: 0.1259
Type C test: H0: at least one restriction is false or active (==)
vs. HA: all restrictions are strictly true (>)
Test statistic: -1.1475, p-value: 0.8741
The relevant hypothesis in your case is the Type B test, which features a p-value of $0.1259$ and thus offers little evidence against the null hypothesis that $\beta_{X1} + \beta_{X2} > 0$.
The package offers several different methods to calculate the standard errors of the coefficients, including the bootstrap or heteroscedasticity-consistent standard errors.
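For example (a sketch only; the se option name and value here are assumptions to be checked against the restriktor documentation), heteroscedasticity-consistent standard errors might be requested when fitting the restricted model:
# assumed argument and option; see ?restriktor for the available choices
constr_mod_hc <- restriktor(object = mod0, constraints = myconstr, se = "HC0")
iht(constr_mod_hc)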
Answered by COOLSerdash on November 21, 2021
Doubtless there are packages to do it, but you could also try using the bootstrap, which will work for nearly arbitrary contrasts. Here's a simple implementation that gets a bootstrap sampling distribution for b2 + b1, from which the p-value can be computed by asking what proportion of estimates are positive. To change the statistic(s) you're interested in, just change the second line of the stat function and rerun.
We'll first set up some fake data
set.seed(1234)
d <- data.frame(X1 = rnorm(500), X2 = rnorm(500), X3 = rnorm(500))
d$Y <- 1 + d$X1 * -1 + d$X2 * 0.9 + d$X3*0.2 + rnorm(500)
noting that the 'true' population value is -0.1.
Now fit a regular (correctly specified, in this case) model and get a point estimate for b2 + b1.
mod0 <- lm(Y ~ X1 + X2 + X3, data = d)
cfs <- coef(mod0)
cfs['X2'] + cfs['X1'] # About -0.072
That's the statistic we're interested in getting a sampling distribution and thereby a p-value for, so we wrap up this process in a function:
stat <- function(dat, inds){
  cfs <- coef(lm(Y ~ X1 + X2 + X3, data = dat[inds, ]))
  cfs['X2'] + cfs['X1'] # return the statistic in this sample
}
where inds are the indices of rows in dat in a single bootstrap sample. The boot function will provide those, so we just have to be sure to use them in the call to lm.
Now to get a lot of bootstrap samples of b2 + b1:
library(boot)
cores <- parallel::detectCores() # because we are impatient
res <- boot(d, stat, R = 10000, parallel = "multicore", ncpus = cores)
(If the parallelization has problems, just remove the parallel and ncpus arguments to boot and be prepared to wait a bit longer.) For more precision, increase R, the number of (re)samples.
Anyway, here's the whole bootstrap sampling distribution
hist(res$t, breaks = 100, xlab = "Estimated b2 + b1",
main = "Bootstrap Sampling Distribution of b2 + b1")
Now we might ask how often b2 + b1 is greater than 0 (a.k.a. the bootstrap one-sided p-value)
sum(as.numeric(res$t > 0)) / length(res$t) # about 15% of the time
Alternatively, we might ask whether 0 is outside the lower 95% of estimates
quantile(as.numeric(res$t), prob = 0.95) # about .038 which is greater than 0 obvs.
or we could just use the bootstrap samples to get a standard error and do things more parametrically, but in any case it looks like we should 'fail to reject' that b2 + b1 > 0.
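A rough sketch of that more parametric route (a plain normal approximation using the bootstrap standard error; not part of the original code):
boot_se <- sd(res$t)                    # bootstrap standard error of b2 + b1
z <- (cfs['X2'] + cfs['X1']) / boot_se  # standardized point estimate
pnorm(z)                                # approximate one-sided p-value against H0: b2 + b1 >= 0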
This answer has been a bit fast and loose on the inferential side of things, e.g. glossing over some bootstrap decisions, with the aim of getting the machinery in place. Hereafter, and as noted above, you can just keep adjusting stat to get bootstrap sampling distributions for increasingly weird and wonderful quantities. Though if you do that, you might want to read up on the limits, assumptions, and strategies of bootstrapping.
Answered by conjugateprior on November 21, 2021
You could bootstrap this. Example:
fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = iris)
sum(coef(fit)[c("Petal.Length", "Petal.Width")])
library(boot)
bootfun <- function(DF, ind, origfit) {
  # bootstrap residuals:
  DF$Sepal.Length <- fitted(origfit) + residuals(origfit)[ind]
  fit <- lm(Sepal.Length ~ Petal.Length + Petal.Width, data = DF)
  sum(coef(fit)[c("Petal.Length", "Petal.Width")])
}
set.seed(42)
myboot <- boot(iris, bootfun, 9999, origfit = fit)
mean(myboot$t <= 0)
#[1] 0.00970097
#--> reject H0: alpha_1 + alpha_2 <= 0
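As a complementary check (a small sketch, not part of the original answer), a one-sided lower percentile bound for the sum can be read off the same bootstrap replicates:
quantile(myboot$t, probs = 0.05)  # 95% lower percentile bound; a value above 0 points the same way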
Answered by Roland on November 21, 2021