# Calculating bias of ML estimate of AR(1) coefficient

Cross Validated Asked by Andrew Kirk on December 9, 2020

I am trying to develop adjustment factors for maximum-likelihood estimates of the auto-regression coefficient in an AR(1) process. By simulation I have discovered that the estimates are positively biased but consistent. I would like to produce a table of bias factors by number of observations N in the time series and population coefficient, which I would then try to convert into a table of correction factors by N and estimate value.

Looking at related posts on here, I see confirmation that the estimate is biased but consistent, and there is a formula for the asymptotic variance. There is also a thread about the bias from using MA(1) to approximate AR(1). But I could not find any formula for the bias of an MLE of the AR(1) coefficient.

If no formula is available, I can use simulation to produce my table. But I would rather do it analytically if possible. The estimator is:

$$\hat\theta = \frac{\sum_t y_t y_{t-1}}{\sum_t y_{t-1}^2}$$

where $$\theta$$ is the AR coefficient and $$y_1,\ldots,y_N$$ are the observed time series values. What I am looking for is how to calculate $$E[\hat\theta]$$ in terms of $$\theta$$ and $$N$$. For the time series I am working with, the instantaneous volatility $$\sigma$$ is always 1.

Any suggestions on how I could develop a formula for $$E[\hat\theta]$$, $$E[\hat\theta] - \theta$$ or $$E[\hat\theta]/\theta$$ would be gratefully received. Ditto if you can advise that there is no known solution, in which case I will just proceed to use simulation.
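If simulation turns out to be the only route, a minimal sketch of such a bias table might look like the following (all function names here are illustrative, not from any particular package):

```python
import numpy as np

def simulate_ar1(n, theta, sigma=1.0, rng=None):
    """Simulate n observations from a stationary AR(1) process."""
    rng = np.random.default_rng() if rng is None else rng
    y = np.empty(n)
    # draw y_1 from the stationary distribution N(0, sigma^2 / (1 - theta^2))
    y[0] = rng.normal(0.0, sigma / np.sqrt(1.0 - theta**2))
    for t in range(1, n):
        y[t] = theta * y[t - 1] + rng.normal(0.0, sigma)
    return y

def estimate(y):
    """The estimator from the question: sum(y_t y_{t-1}) / sum(y_{t-1}^2)."""
    return np.dot(y[1:], y[:-1]) / np.dot(y[:-1], y[:-1])

def bias_table(ns, thetas, reps=20_000, seed=0):
    """Monte Carlo estimate of E[theta_hat] - theta for each (N, theta) pair."""
    rng = np.random.default_rng(seed)
    table = np.empty((len(ns), len(thetas)))
    for i, n in enumerate(ns):
        for j, th in enumerate(thetas):
            ests = [estimate(simulate_ar1(n, th, rng=rng)) for _ in range(reps)]
            table[i, j] = np.mean(ests) - th
    return table
```

For comparison, a commonly quoted small-sample approximation for the centred (no-intercept) model is $$E[\hat\theta] \approx \theta - 2\theta/N$$ (Marriott and Pope, 1954), which a table like this should roughly reproduce.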

Thank you

For this type of estimate, it is quite difficult to get a useful expression for the distribution of the estimator, and hence, for the expected value. The behaviour of the estimator can be related back to a normal quadratic form, but this gives quite a complicated expression.

Distribution of the time series vector: Suppose we use the centred $$\text{AR}(1)$$ model $$y_t = \phi y_{t-1} + \varepsilon_t$$ with homoscedastic normal error terms and $$|\phi|<1$$ for stationarity. We can write the distribution of the observable time series vector $$\mathbf{y} = (y_1,...,y_n)$$ as:

$$\mathbf{y} \sim \text{N}( \mathbf{0},\mathbf{\Sigma}(\phi)) \quad \quad \quad \mathbf{\Sigma}(\phi) = \frac{\sigma^2}{1-\phi^2} \begin{bmatrix} 1 & \phi & \phi^2 & \cdots & \phi^{n-2} & \phi^{n-1} \\ \phi & 1 & \phi & \cdots & \phi^{n-3} & \phi^{n-2} \\ \phi^2 & \phi & 1 & \cdots & \phi^{n-4} & \phi^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ \phi^{n-2} & \phi^{n-3} & \phi^{n-4} & \cdots & 1 & \phi \\ \phi^{n-1} & \phi^{n-2} & \phi^{n-3} & \cdots & \phi & 1 \end{bmatrix}.$$

Taking the principal square root of the variance matrix gives the alternative form:

$$\mathbf{y} = \mathbf{\Delta}(\phi) \mathbf{z} \quad \quad \quad \mathbf{z} \sim \text{N}( \mathbf{0},\boldsymbol{I}) \quad \quad \quad \mathbf{\Delta}(\phi) = \mathbf{\Sigma}(\phi)^{1/2}.$$

(The matrix $$\mathbf{\Delta}(\phi)$$ can easily be obtained using the eigendecomposition of the variance matrix. The variance matrix and its principal square root are symmetric Toeplitz matrices, whose eigenvectors and eigenvalues are functions of $$\phi$$ and are readily computed numerically.)
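For concreteness, here is a short NumPy sketch (function names are illustrative) that builds $$\mathbf{\Sigma}(\phi)$$ and its principal square root $$\mathbf{\Delta}(\phi)$$ via the symmetric eigendecomposition:

```python
import numpy as np

def ar1_cov(n, phi, sigma=1.0):
    """Stationary AR(1) covariance Sigma(phi): entry (t, s) is
    sigma^2 / (1 - phi^2) * phi^|t - s| (a symmetric Toeplitz matrix)."""
    idx = np.arange(n)
    return (sigma**2 / (1.0 - phi**2)) * phi ** np.abs(idx[:, None] - idx[None, :])

def ar1_cov_sqrt(n, phi, sigma=1.0):
    """Principal (symmetric) square root Delta(phi) via eigendecomposition:
    Sigma = V diag(vals) V^T  =>  Sigma^{1/2} = V diag(sqrt(vals)) V^T."""
    vals, vecs = np.linalg.eigh(ar1_cov(n, phi, sigma))
    return vecs @ np.diag(np.sqrt(vals)) @ vecs.T
```

A quick check that the construction behaves as claimed: `D = ar1_cov_sqrt(6, 0.6)` should be symmetric and satisfy `D @ D ≈ ar1_cov(6, 0.6)`.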

Distribution of the OLS estimator: For this model (clarifying your use of subscripts in the sums) the OLS estimator is:

$$\hat{\theta} = \frac{\sum_{t=2}^n y_t y_{t-1}}{\sum_{t=2}^n y_t^2}.$$

(Note that the OLS estimator for this model is not equal to the maximum-likelihood estimator, since it effectively ignores a logarithmic term involving the variance matrix that appears in the likelihood function.) To find the distribution of the OLS estimator, define the $$n \times n$$ matrix:

$$\mathbf{M}(r) \equiv \begin{bmatrix} 0 & 0 & 0 & \cdots & 0 & 0 \\ -1 & r & 0 & \cdots & 0 & 0 \\ 0 & -1 & r & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & r & 0 \\ 0 & 0 & 0 & \cdots & -1 & r \end{bmatrix}.$$
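A small numerical sketch (names illustrative) of this construction: the quadratic form induced by $$\mathbf{M}(r)$$ is $$r \sum_{t=2}^n y_t^2 - \sum_{t=2}^n y_t y_{t-1}$$, which vanishes exactly at $$r = \hat{\theta}$$ and is increasing in $$r$$:

```python
import numpy as np

def M(r, n):
    """M(r): r on the diagonal of rows 2..n, -1 on the subdiagonal."""
    m = np.zeros((n, n))
    rows = np.arange(1, n)
    m[rows, rows] = r
    m[rows, rows - 1] = -1.0
    return m

# Check on an arbitrary series: y^T M(r) y = r * sum_{t>=2} y_t^2 - sum y_t y_{t-1},
# so the form is zero at r = theta_hat, negative below it, positive above it.
rng = np.random.default_rng(0)
y = rng.normal(size=8)
theta_hat = np.dot(y[1:], y[:-1]) / np.dot(y[1:], y[1:])  # denominator as in the answer
qform = lambda r: y @ M(r, len(y)) @ y
```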

It can be shown that the event $$\hat{\theta} \leqslant r$$ is equivalent to the event $$\mathbf{y}^\text{T} \mathbf{M}(r) \mathbf{y} \geqslant 0$$ (since $$\mathbf{y}^\text{T} \mathbf{M}(r) \mathbf{y} = r \sum_{t=2}^n y_t^2 - \sum_{t=2}^n y_t y_{t-1}$$, and the denominator of $$\hat{\theta}$$ is positive), so we have:

\begin{equation} \begin{aligned} F_{\hat{\theta}}(r) \equiv \mathbb{P}(\hat{\theta} \leqslant r) &= \mathbb{P}(\mathbf{y}^\text{T} \mathbf{M}(r) \mathbf{y} \geqslant 0) \\[6pt] &= \mathbb{P}((\mathbf{\Delta}(\phi) \mathbf{z})^\text{T} \mathbf{M}(r) (\mathbf{\Delta}(\phi) \mathbf{z}) \geqslant 0) \\[6pt] &= \mathbb{P}(\mathbf{z}^\text{T} \mathbf{\Delta}(\phi) \mathbf{M}(r) \mathbf{\Delta}(\phi) \mathbf{z} \geqslant 0) \\[6pt] &= \mathbb{P}(\mathbf{z}^\text{T} \mathbf{H}(\phi, r) \mathbf{z} \geqslant 0), \end{aligned} \end{equation}

where $$\mathbf{H}(\phi, r) \equiv \mathbf{\Delta}(\phi) \mathbf{M}(r) \mathbf{\Delta}(\phi)$$ (the third step above uses the symmetry of $$\mathbf{\Delta}(\phi)$$). Hence, the OLS estimator has the density function:

$$f_{\hat{\theta}}(r) = P_2(\phi,r) \quad \quad \quad P(\phi,r) \equiv \mathbb{P}(\mathbf{z}^\text{T} \mathbf{H}(\phi, r) \mathbf{z} \geqslant 0),$$

where $$P_2$$ denotes the partial derivative of $$P$$ with respect to its second argument.

Thus, the expected value is $$\mathbb{E}(\hat{\theta}) = \int r \, P_2(\phi,r) \, dr$$.
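As a numerical sketch of this route (names illustrative): since replacing $$\mathbf{H}$$ by its symmetric part leaves the quadratic form unchanged, $$\mathbf{z}^\text{T} \mathbf{H} \mathbf{z}$$ has the law of a weighted sum of independent $$\chi^2_1$$ variables, with weights given by the eigenvalues of $$(\mathbf{H} + \mathbf{H}^\text{T})/2$$. The CDF $$F_{\hat{\theta}}(r) = P(\phi, r)$$ can then be evaluated, here by Monte Carlo:

```python
import numpy as np

def cdf_theta_hat(r, phi, n, sigma=1.0, reps=200_000, seed=0):
    """Monte Carlo evaluation of F(r) = P(z^T H(phi, r) z >= 0)."""
    # stationary AR(1) covariance Sigma(phi) and its principal square root
    idx = np.arange(n)
    Sigma = (sigma**2 / (1 - phi**2)) * phi ** np.abs(idx[:, None] - idx[None, :])
    vals, vecs = np.linalg.eigh(Sigma)
    Delta = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
    # M(r): r on the diagonal of rows 2..n, -1 on the subdiagonal
    Mr = np.zeros((n, n))
    Mr[idx[1:], idx[1:]] = r
    Mr[idx[1:], idx[1:] - 1] = -1.0
    H = Delta @ Mr @ Delta
    # z^T H z = z^T ((H + H^T)/2) z  ~  sum_i lam_i * chi^2_1 (independent terms)
    lam = np.linalg.eigvalsh(0.5 * (H + H.T))
    rng = np.random.default_rng(seed)
    q = rng.chisquare(1.0, size=(reps, n)) @ lam
    return np.mean(q >= 0.0)
```

From here, $$\mathbb{E}(\hat{\theta})$$ could be recovered by numerically differentiating $$F$$ on a grid of $$r$$ values and integrating $$r f(r)$$ (or via $$\mathbb{E}(\hat{\theta}) = \int_0^\infty (1-F(r))\,dr - \int_{-\infty}^0 F(r)\,dr$$); for exact rather than Monte Carlo evaluation of $$P$$, Imhof's (1961) method for quadratic forms in normal variables applies.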

Answered by Ben on December 9, 2020