Data Science Asked on March 5, 2021
I am currently studying *The Elements of Statistical Learning*. The following equation appears on page 120.
It gives the Hessian matrix of the log-likelihood function as follows:
\begin{equation}
\frac{\partial^2 \ell(\beta)}{\partial\beta\,\partial\beta^T} = -\sum_{i=1}^{N} x_i x_i^T\, p(x_i;\beta)\bigl(1-p(x_i;\beta)\bigr)
\end{equation}
But this calculation appears to produce only the $\frac{\partial^2\ell(\beta)}{\partial\beta_i^2}$ terms, whereas the Hessian matrix should also contain the mixed terms $\frac{\partial^2\ell(\beta)}{\partial\beta_i\,\partial\beta_j}$ with $i\neq j$.
Please explain why these terms seem to be missing.
$\beta$ is a vector of parameters, therefore:
$\frac{\partial \ell(\beta)}{\partial\beta}= \left[\frac{\partial \ell(\beta)}{\partial\beta_1}\quad\frac{\partial \ell(\beta)}{\partial\beta_2}\quad\frac{\partial \ell(\beta)}{\partial\beta_3}\quad\cdots\quad\frac{\partial \ell(\beta)}{\partial\beta_n}\right]$ and so
$\frac{\partial}{\partial\beta^T}\left(\frac{\partial \ell(\beta)}{\partial\beta}\right)= \begin{bmatrix} \frac{\partial^2 \ell(\beta)}{\partial\beta_1^2} & \frac{\partial^2 \ell(\beta)}{\partial\beta_1\,\partial\beta_2} & \cdots & \frac{\partial^2 \ell(\beta)}{\partial\beta_1\,\partial\beta_n} \\ \frac{\partial^2 \ell(\beta)}{\partial\beta_2\,\partial\beta_1} & \frac{\partial^2 \ell(\beta)}{\partial\beta_2^2} & \cdots & \frac{\partial^2 \ell(\beta)}{\partial\beta_2\,\partial\beta_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 \ell(\beta)}{\partial\beta_n\,\partial\beta_1} & \frac{\partial^2 \ell(\beta)}{\partial\beta_n\,\partial\beta_2} & \cdots & \frac{\partial^2 \ell(\beta)}{\partial\beta_n^2} \end{bmatrix}$, which is your Hessian.
The term on the right-hand side of your equation is also a matrix, because it contains an outer product of vectors, $x_i x_i^T$, which gives an $n \times n$ matrix; the factor $p(x_i;\beta)(1-p(x_i;\beta))$ is just a scalar multiplying it. The off-diagonal entries of $x_i x_i^T$ supply exactly the mixed terms $\frac{\partial^2\ell(\beta)}{\partial\beta_i\,\partial\beta_j}$ you were looking for.
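This is easy to check numerically. The sketch below (with made-up data and $\beta$, purely for illustration) builds the Hessian term by term as a sum of scaled outer products $x_i x_i^T\,p_i(1-p_i)$, and verifies that it matches the equivalent vectorized form $-X^T W X$ with $W = \mathrm{diag}(p_i(1-p_i))$, and that it is a full $n \times n$ matrix with nonzero off-diagonal entries:

```python
import numpy as np

# Illustrative data, not from the book: N samples, n parameters.
rng = np.random.default_rng(0)
N, n = 5, 3
X = rng.normal(size=(N, n))        # row i is x_i^T
beta = rng.normal(size=n)

# Logistic probabilities p(x_i; beta)
p = 1.0 / (1.0 + np.exp(-X @ beta))

# Hessian as the sum over i of the scaled outer products x_i x_i^T
H = -sum(p[i] * (1 - p[i]) * np.outer(X[i], X[i]) for i in range(N))

# Equivalent vectorized form: -X^T W X with W = diag(p(1-p))
W = np.diag(p * (1 - p))
H_vec = -X.T @ W @ X

assert H.shape == (n, n)           # a full n x n matrix, not a vector
assert np.allclose(H, H_vec)       # both forms agree
```

Because each term $x_i x_i^T$ is an outer product, the resulting Hessian generically has nonzero off-diagonal entries, confirming that the mixed partials are present.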
Answered by Michał Kardach on March 5, 2021