Cross Validated Asked on November 20, 2021
Suppose I have a logistic regression model $Y_i=\mathbf{1}(X_i\beta>\epsilon_i)$ to estimate, where the distribution of $\epsilon_i$ is known and $X_i$ follows a distribution $F_{\theta}$ with an unknown scalar parameter $\theta$. Suppose I only have 40 observations: $\{Y_i,X_i\}_{i=1}^{40}$. I’m wondering if there are any formal studies on the properties of the following estimator:
Step 1. I estimate $\beta$ and $\theta$ by maximum likelihood and get $\widehat{\beta},\widehat{\theta}$.
Step 2. I simulate 160 new data points $\{Y^*_i,X^*_i\}_{i=1}^{160}$ from $Y_i=\mathbf{1}(X_i\widehat{\beta}>\epsilon_i)$ and $F_{\widehat{\theta}}$.
Step 3. I re-estimate $\beta$ and $\theta$ using the 200 observations $\{Y_i,X_i\}_{i=1}^{40}\cup \{Y^*_i,X^*_i\}_{i=1}^{160}$, and obtain new estimates $\widetilde{\beta},\widetilde{\theta}$.
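To make the three steps concrete, here is a minimal sketch in R. It is only an illustration under assumptions that are not part of the question: $F_{\theta}$ is taken to be $N(\theta, 1)$, $\epsilon_i$ is taken to be standard logistic, and the "true" values $\beta = 1$, $\theta = 0.5$ and the seed are arbitrary choices.

```r
# Illustrative sketch only. Assumptions not in the question:
# F_theta = N(theta, 1), epsilon ~ standard logistic, beta = 1, theta = 0.5.
set.seed(1)
n <- 40; B <- 160
beta <- 1; theta <- 0.5

# Original data: P(Y = 1 | X) = P(epsilon < X * beta) = plogis(X * beta)
x <- rnorm(n, mean = theta, sd = 1)
y <- rbinom(n, 1, plogis(x * beta))

# Step 1: maximum likelihood on the n real observations
beta.hat  <- unname(coef(glm(y ~ x - 1, family = binomial)))
theta.hat <- mean(x)                      # MLE of the mean of N(theta, 1)

# Step 2: simulate B observations from the fitted model
x.star <- rnorm(B, mean = theta.hat, sd = 1)
y.star <- rbinom(B, 1, plogis(x.star * beta.hat))

# Step 3: re-estimate on the pooled n + B observations
x.all <- c(x, x.star)
y.all <- c(y, y.star)
beta.tilde  <- unname(coef(glm(y.all ~ x.all - 1, family = binomial)))
theta.tilde <- mean(x.all)

c(beta.hat = beta.hat, beta.tilde = beta.tilde,
  theta.hat = theta.hat, theta.tilde = theta.tilde)
```

Wrapping this in a loop over many simulated original samples would give an empirical comparison of the two estimators under these assumed distributions.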
Intuitively, this procedure seems consistent. In finite samples, it might have smaller variance (because we used more data) but larger bias (because we are not generating data from the true parameter value).
However, I would like to see a more rigorous theoretical justification for using $\widetilde{\beta},\widetilde{\theta}$. My questions are:
1. Suppose the simulation sample size is $B$ and the original sample size is $n$. How can one formally prove that $\widetilde{\beta},\widetilde{\theta}$ is consistent, in the sense that it converges in probability to $\beta,\theta$ as $n$ (or $n$ together with $B$) goes to infinity?
2. Is there any criterion (such as MSE) under which $\widetilde{\beta},\widetilde{\theta}$ is better than $\widehat{\beta},\widehat{\theta}$?
Thanks!
The "procedure" outlined below is fully analogous to the one you suggest. I have chosen a simpler estimation procedure, with only one parameter, to make the computations easier.
Real experiment, actual data. An urn contains 1000 red balls and 1100 green balls. The true proportion of red balls in the urn is $\theta = 10/21 = 0.4761905.$
Sampling with replacement $n = 40$ times from the urn, I see 15 red balls in 40, so my estimate of $\theta$ is $\hat\theta = 15/40 = 0.375.$ (I'm asking you to pretend I have a real urn from which I drew actual balls.)
urn = c(rep(1,1000), rep(0,1100))    # 1 = red ball, 0 = green ball
x = sample(urn, 40, replace=TRUE)    # 40 draws with replacement
sum(x)                               # number of red balls drawn
[1] 15
One kind of 95% confidence interval for $\theta$ based on the 40 observations is the Jeffreys interval $(0.238, 0.529).$ It does happen to include the true $\theta = 0.4761905.$ [But in an actual experiment, I wouldn't know that.]
qbeta(c(.025,.975), 15.5, 25.5)
[1] 0.2379065 0.5294649
Simulated data. Rightly realizing that sampling from the urn is like observing independent Bernoulli trials, I (foolishly) decide to 'augment' my sample with 160 simulated Bernoulli trials having "red-ball" probability $\hat\theta = 0.375.$ [All simulations and computations are done in R.]
set.seed(2020)
r.a = sum(rbinom(160, 1, 0.375)); r.a
[1] 55
So now, I pretend to have observed $15 + 55 = 70$ red balls in $200.$ My re-estimated value of $\theta$ is the 'improved' $\tilde\theta = 70/200 = 0.35.$ The Jeffreys 95% CI based on this 'improved' estimate is $(0.286, 0.418).$ I am delighted by my new interval because, based on 200 fake 'observations', it is shorter than my original CI. [Of course, in an actual experiment, I wouldn't know that it no longer includes the true value of $\theta$.]
qbeta(c(.025, .975), 70.5, 130.5)
[1] 0.2864262 0.4178799
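A short calculation shows why the augmentation cannot help in this urn setting. The notation $S^*$ (the number of simulated red balls) and $B$ (the number of simulated draws, here 160) is introduced only for this sketch. Given the real data, $S^* \mid \hat\theta \sim \mathrm{Binomial}(B, \hat\theta)$ and $\tilde\theta = (n\hat\theta + S^*)/(n+B)$, so

$$E[\tilde\theta \mid \hat\theta] = \frac{n\hat\theta + B\hat\theta}{n+B} = \hat\theta, \qquad \mathrm{Var}(\tilde\theta \mid \hat\theta) = \frac{B\,\hat\theta(1-\hat\theta)}{(n+B)^2}.$$

Using the law of total variance together with $E[\hat\theta(1-\hat\theta)] = \theta(1-\theta)(n-1)/n$,

$$\mathrm{Var}(\tilde\theta) = \mathrm{Var}(\hat\theta) + E\big[\mathrm{Var}(\tilde\theta \mid \hat\theta)\big] = \frac{\theta(1-\theta)}{n}\left[1 + \frac{B(n-1)}{(n+B)^2}\right].$$

So $\tilde\theta$ has the same mean as $\hat\theta$ but a strictly larger variance whenever $n > 1$ and $B > 0$. With $n = 40$ and $B = 160$ the factor in brackets is $1 + 6240/40000 = 1.156.$ An interval computed as if there were $n + B = 200$ real observations assumes a variance near $\theta(1-\theta)/200,$ which is far too small.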
However, drawing from an urn requires having an urn with balls in it and messing around with drawing and counting and replacing. And simulation is quick and easy. So, elated by my (delusional) 'success' with fake data, I decide to simulate another 1000 fake draws, now based on my new estimate $\tilde\theta = 70/200.$
The result of this extended simulation is the updated estimate $\tilde{\tilde\theta} = 0.3508$ and the even shorter CI $(0.3242, 0.3782),$ which by now is based mainly on my pseudo-random number generator and has very little to do with an actual urn and balls.
set.seed(1066)
r.aa = sum(rbinom(1000, 1, 70/200)); r.aa
[1] 351
(70 + 351)/(200+1000)
[1] 0.3508333
qbeta(c(.025,.975), 70+351+.5, 1200-70-351+.5)
[1] 0.3242170 0.3781682
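The single run above can be repeated many times to see how the 'augmented' procedure behaves on average. The following is a minimal Monte Carlo sketch; the number of replications, the seed, and all variable names are arbitrary choices made for this illustration. It compares the mean squared error of $\hat\theta$ and $\tilde\theta$ and the coverage of the corresponding Jeffreys 95% intervals.

```r
# Monte Carlo sketch (assumed settings: 20000 replications, seed 123).
set.seed(123)
theta <- 10/21     # true proportion of red balls
n <- 40            # real draws
B <- 160           # simulated ('fake') draws
reps <- 20000

red.real    <- rbinom(reps, n, theta)       # red balls in the real samples
theta.hat   <- red.real / n
red.fake    <- rbinom(reps, B, theta.hat)   # fake draws use theta.hat, not theta
red.aug     <- red.real + red.fake
theta.tilde <- red.aug / (n + B)

# Mean squared error of each estimator about the true theta
mse.hat   <- mean((theta.hat   - theta)^2)
mse.tilde <- mean((theta.tilde - theta)^2)

# Jeffreys 95% intervals from real data only, and from augmented data
lo.hat <- qbeta(.025, red.real + .5, n - red.real + .5)
hi.hat <- qbeta(.975, red.real + .5, n - red.real + .5)
lo.aug <- qbeta(.025, red.aug + .5, n + B - red.aug + .5)
hi.aug <- qbeta(.975, red.aug + .5, n + B - red.aug + .5)

# Fraction of intervals that contain the true theta
cover.hat <- mean(lo.hat <= theta & theta <= hi.hat)
cover.aug <- mean(lo.aug <= theta & theta <= hi.aug)

c(mse.hat = mse.hat, mse.tilde = mse.tilde)
c(cover.hat = cover.hat, cover.aug = cover.aug)
```

Under these settings one would expect the augmented estimator to show the larger MSE and its shorter intervals to cover the true $\theta$ much less often than the nominal 95%.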
Note: I have used Jeffreys CIs here because they have very good coverage properties and are very easy to compute using R. Although Jeffreys intervals are based on a Bayesian argument, they have excellent frequentist properties and are not used in a Bayesian context here. Their endpoints are often similar to those of Agresti CIs: the Agresti version of the final CI above is $(0.3246, 0.3786).$
Answered by BruceET on November 20, 2021