Mathematics Asked by xedg on December 3, 2021
I am trying to learn how to replicate the matrix calculus done in the following paper: https://arxiv.org/pdf/1811.11433.pdf. To learn how to do this, I am using the following book I found (https://www.mobt3ath.com/uplode/book/book-33765.pdf), by Karim Abadir and Jan Magnus.
I attempted to start by finding the differential of the function $H$ given below. However, it does not look like I am on the right track. Can someone tell me if my calculations below are correct so far? Or at least whether I am using the right book to understand the paper I listed? I noticed that the book uses the ‘vec’ operator to treat the Hessian of a matrix function as a matrix, while the paper uses an order-4 tensor, so I am not sure if I am using the right approach. Thanks for the help.
My work so far:
Let $H(B)=\log\det(BCB^T)$, where $B$ and $C$ are square matrices of dimension $n$ and $C$ is symmetric. Let $F(B)=BCB^T$ and $G(R)=\log\det R$, so that $H(B)=G(F(B))$.
\begin{align*}
dF &= d(B)CB^T + BCd(B^T) \hspace{0.4cm} dG(R) = \operatorname{Tr}[R^{-1} dR] \\
\\
dH &= \operatorname{Tr}[(BCB^T)^{-1} (d(B)CB^T + BCd(B^T))] \quad \textbf{Take transpose}\\
&= \operatorname{Tr}[(BCd(B)^T+d(B)CB^T)(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BCd(B)^T(BCB^T)^{-1}] + \operatorname{Tr}[d(B)CB^T(BCB^T)^{-1}] \\
&= \operatorname{Tr}[BCd(B)^T(B^T)^{-1}C^{-1}B^{-1}] + \operatorname{Tr}[d(B)CB^T(B^T)^{-1}C^{-1}B^{-1}] \quad \textbf{Use cyclic property}\\
&= \operatorname{Tr}[(B^T)^{-1} d(B)^T] + \operatorname{Tr}[B^{-1} d(B)] = 2\operatorname{Tr}[B^{-1}d(B)]
\end{align*}
The corresponding total derivative is then $DH=2(\operatorname{vec}(B^{-1}))^T$ in the book’s notation. I assume I would then just ‘unvectorize’ this to get the derivative in the paper’s notation? Is this a good start to calculating the gradient of the loss function in the paper I listed? Thanks.
First, calculate the gradient for the full matrix.
$$\eqalign{
X &= BCB^T = X^T \\
\phi &= \log\det X \\
d\phi &= X^{-T}:dX \\
&= X^{-1}:2\operatorname{sym}(dB\,CB^T) \\
&= 2X^{-1}BC:dB \\
\frac{\partial\phi}{\partial B} &= 2X^{-1}BC \\
}$$
Repeat the calculation for the diagonalized matrix.
$$\eqalign{
Y &= (I\odot X) = Y^T \\
\psi &= \log\det(Y) \\
d\psi &= 2Y^{-1}BC:dB \\
\frac{\partial\psi}{\partial B} &= 2Y^{-1}BC \\
}$$
The Pham cost function is a linear combination of these functions.
$$\eqalign{
{\cal L} &= \frac{\psi - \phi}{2} \\
\frac{\partial{\cal L}}{\partial B} &= \Big(Y^{-1}-X^{-1}\Big)BC \;\doteq\; G_{std} \qquad&\big({\rm standard\;gradient}\big) \\
}$$
However, rather than the standard gradient, the linked paper utilizes the relative gradient, which is defined in terms of a small perturbation matrix $(E)$.
$$\eqalign{
d{\cal L} &= {\cal L}(B+EB) - {\cal L}(B) \\
&= G_{std}:EB \\
&= G_{std}B^T:E \\
&= G:E \\
\\
G &= \Big(Y^{-1}-X^{-1}\Big)BCB^T \\
&= \Big(Y^{-1}-X^{-1}\Big)X \\
&= (Y^{-1}X-I) \\
}$$
This is the content of the first part of Eq (3) on the second page, except it is written in component form, i.e.
$$\eqalign{
G_{ab} &= \frac{X_{ab}}{X_{aa}} - \delta_{ab} \\
}$$
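The formulas above can be sanity-checked numerically. The following is only a minimal NumPy sketch, not code from the paper: the function names (`loss`, `standard_gradient`, `relative_gradient`, `finite_difference`), the test matrices, and the step size are all assumptions made for illustration. It reads the colon as the trace (Frobenius) product $A:B=\operatorname{Tr}(A^TB)$, takes $C$ to be symmetric positive definite, and takes $B$ to be invertible so that every logarithm is defined.

```python
import numpy as np


def loss(B, C):
    """L(B) = (log det(I ⊙ X) - log det X) / 2 with X = B C B^T."""
    X = B @ C @ B.T
    _, logdet_X = np.linalg.slogdet(X)
    logdet_Y = np.sum(np.log(np.diag(X)))  # Y = I ⊙ X is diagonal
    return 0.5 * (logdet_Y - logdet_X)


def standard_gradient(B, C):
    """G_std = (Y^{-1} - X^{-1}) B C."""
    X = B @ C @ B.T
    Y_inv = np.diag(1.0 / np.diag(X))
    return (Y_inv - np.linalg.inv(X)) @ B @ C


def relative_gradient(B, C):
    """G = Y^{-1} X - I, i.e. G_ab = X_ab / X_aa - delta_ab."""
    X = B @ C @ B.T
    return X / np.diag(X)[:, None] - np.eye(X.shape[0])


def finite_difference(f, M, eps=1e-6):
    """Entry-wise central-difference gradient of a scalar function f at M."""
    G = np.zeros_like(M)
    for i in range(M.shape[0]):
        for j in range(M.shape[1]):
            step = np.zeros_like(M)
            step[i, j] = eps
            G[i, j] = (f(M + step) - f(M - step)) / (2 * eps)
    return G


# Hypothetical test data: a diagonally shifted random B and an SPD matrix C.
rng = np.random.default_rng(0)
n = 4
B = rng.standard_normal((n, n)) + n * np.eye(n)
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

G_std = standard_gradient(B, C)
G_rel = relative_gradient(B, C)

# Standard gradient: differentiate L with respect to the entries of B.
G_std_num = finite_difference(lambda M: loss(M, C), B)

# Relative gradient: differentiate E -> L(B + E B) at E = 0.
G_rel_num = finite_difference(lambda E: loss(B + E @ B, C), np.zeros((n, n)))

print("max error, standard gradient:", np.abs(G_std - G_std_num).max())
print("max error, relative gradient:", np.abs(G_rel - G_rel_num).max())
print("max error, G vs G_std B^T:   ", np.abs(G_rel - G_std @ B.T).max())
```

All three printed errors should be small compared with the gradient entries if the derivation holds. Note also that when $B$ and $C$ are both invertible, $2X^{-1}BC$ reduces to $2B^{-T}$, which agrees with the differential $2\operatorname{Tr}[B^{-1}d(B)]$ obtained in the question.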
Answered by greg on December 3, 2021