Computing the matrix differential/derivative of the matrix$\rightarrow$scalar function $\log \det(BCB^T)$

Question

I am trying to learn how to replicate the matrix calculus done in the following paper: https://arxiv.org/pdf/1811.11433.pdf.  To learn how to do this, I am using the following book I found (https://www.mobt3ath.com/uplode/book/book-33765.pdf), by Karim Abadir and Jan Magnus.
I attempted to start by finding the differential of the function $H$ given below.  However, it does not look like I am on the right track.  Can someone tell me if my calculations below are correct so far?  Or at least if I am using the correct book to be able to understand the paper I listed?  I noticed that the book uses the 'vec' operator to treat the Hessian of a matrix function as a matrix, while the paper uses an order-4 tensor, so I am not sure if I am using the right approach.  Thanks for the help.
My work so far:
Let $H(B)=\log\det(BCB^T)$ where $B$ and $C$ are square matrices of dimension $n$ and $C$ is symmetric.  Let $F(B)=BCB^T$ and $G(R)=\log\det R$ so that $H(B)=G(F(B))$.
\begin{align*}
      dF &= d(B)CB^T + BCd(B^T) \hspace{0.4cm} dG(R) = Tr[R^{-1} dR] \\
      \\
      dH &= Tr[(BCB^T)^{-1} (d(B)CB^T + BCd(B^T))] \textbf{ Take transpose}\\
      &= Tr[(BCd(B)^T+d(B)CB^T)(BCB^T)^{-1}] \\
      &= Tr[BCd(B)^T(BCB^T)^{-1}] + Tr[d(B)CB^T(BCB^T)^{-1}] \\
      &= Tr[BCd(B)^T(B^T)^{-1}C^{-1}B^{-1}] + Tr[d(B)CB^T(B^T)^{-1}C^{-1}B^{-1}] \textbf{ Use cyclic property}\\
      &= Tr[(B^T)^{-1} d(B)^T] + Tr[B^{-1} d(B)] = 2\, Tr[B^{-1}d(B)]
\end{align*}
The corresponding total derivative is then $DH=2\,(\operatorname{vec}(B^{-1}))^T$ by the book's notation.  Then I assume I would just 'unvectorize' this to get the derivative in the paper's notation?  Is this a good start to calculating the gradient of the loss function in the paper I listed?  Thanks.

greg · Answer

First, calculate the gradient for the full matrix.
$$\eqalign{
X &= BCB^T = X^T \\
\phi &= \log\det X \\
d\phi &= X^{-T}:dX \\
  &= X^{-1}:2\operatorname{sym}(dB\,CB^T) \\
  &= 2X^{-1}BC:dB \\
\frac{\partial\phi}{\partial B}
  &= 2X^{-1}BC \\
}$$
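If you want to convince yourself of this result numerically, a finite-difference check along the following lines should work. This is only a sketch: the test matrices, the seed, and the helper name `phi` are my own choices for illustration, and $C$ is taken to be symmetric positive definite so that $\det(BCB^T)>0$.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Assumed test data: a well-conditioned B and a symmetric positive definite C
B = 2 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

def phi(B):
    # log det(B C B^T); slogdet returns (sign, log|det|)
    return np.linalg.slogdet(B @ C @ B.T)[1]

X = B @ C @ B.T
grad_analytic = 2 * np.linalg.solve(X, B @ C)   # 2 X^{-1} B C

# Central finite differences, one entry of B at a time
eps = 1e-6
grad_numeric = np.zeros_like(B)
for i in range(n):
    for j in range(n):
        dB = np.zeros_like(B)
        dB[i, j] = eps
        grad_numeric[i, j] = (phi(B + dB) - phi(B - dB)) / (2 * eps)

max_err = np.max(np.abs(grad_analytic - grad_numeric))
print("max abs difference:", max_err)
```

When $B$ and $C$ are square and invertible, $2X^{-1}BC$ collapses to $2B^{-T}$, which is the matrix form of the $2\,{\rm Tr}[B^{-1}dB]$ differential.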
Repeat the calculation for the diagonalized matrix.
$$\eqalign{
Y &= (I\odot X) = Y^T \\
\psi &= \log\det(Y) \\
d\psi &= 2Y^{-1}BC:dB \\
\frac{\partial\psi}{\partial B} &= 2Y^{-1}BC \\
}$$
The Pham cost function is a linear combination of these functions.
$$\eqalign{
{\cal L} &= \frac{\psi - \phi}{2} \\
\frac{\partial{\cal L}}{\partial B} &= \Big(Y^{-1}-X^{-1}\Big)BC
\;\doteq\; G_{std}
 \qquad&\big({\rm standard\;gradient}\big) \\
}$$
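The standard gradient can be checked the same way, by comparing $G_{std}:D$ against a directional finite difference of ${\cal L}$ along a random direction $D$. Again this is just a sketch with made-up test data; `cost` and `standard_gradient` are hypothetical helper names, and $C$ is assumed positive definite so that the diagonal of $X$ is positive.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3
# Assumed test data: well-conditioned B, symmetric positive definite C
B = 2 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)

def cost(B):
    X = B @ C @ B.T
    psi = np.sum(np.log(np.diag(X)))   # log det(Y) with Y = I ⊙ X (diagonal)
    phi = np.linalg.slogdet(X)[1]      # log det(X)
    return 0.5 * (psi - phi)

def standard_gradient(B):
    X = B @ C @ B.T
    Y_inv = np.diag(1.0 / np.diag(X))
    return (Y_inv - np.linalg.inv(X)) @ B @ C   # (Y^{-1} - X^{-1}) B C

# Directional derivative along a random direction D
D = rng.standard_normal((n, n))
eps = 1e-6
fd = (cost(B + eps * D) - cost(B - eps * D)) / (2 * eps)
predicted = np.sum(standard_gradient(B) * D)    # Frobenius product G_std : D

print(fd, predicted)
```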
However, rather than the standard gradient, the linked paper utilizes the relative gradient, which is defined in terms of a small perturbation matrix $(E)$.
$$\eqalign{
d{\cal L} &= {\cal L}(B+EB) - {\cal L}(B) \\
 &= G_{std}:EB \\
 &= G_{std}B^T:E \\
 &= G:E \\
\\
G
 &= \Big(Y^{-1}-X^{-1}\Big)BCB^T \\
 &= \Big(Y^{-1}-X^{-1}\Big)X \\
 &= (Y^{-1}X-I) \\
}$$
This is the content of the first part of Eq (3) on the second page, except that the paper
writes it in component form, i.e.
$$\eqalign{
G_{ab} &= \frac{X_{ab}}{X_{aa}} - \delta_{ab} \\
}$$
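To see the relative gradient at work, the sketch below builds $G$ from the component formula, compares it with the matrix form $(Y^{-1}-X^{-1})X$, and checks $d{\cal L}\approx G:E$ for a small multiplicative perturbation $B\to(I+E)B$. The test data and the helper names `cost` and `relative_gradient` are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
# Assumed test data: well-conditioned B, symmetric positive definite C
B = 2 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)
I = np.eye(n)

def cost(B):
    X = B @ C @ B.T
    return 0.5 * (np.sum(np.log(np.diag(X))) - np.linalg.slogdet(X)[1])

def relative_gradient(B):
    X = B @ C @ B.T
    # Component form: G_ab = X_ab / X_aa - delta_ab (row a divided by X_aa)
    return X / np.diag(X)[:, None] - I

X = B @ C @ B.T
G = relative_gradient(B)
G_matrix_form = (np.diag(1.0 / np.diag(X)) - np.linalg.inv(X)) @ X

# Perturb B multiplicatively: B -> (I + eps*D) B, so E = eps*D
D = rng.standard_normal((n, n))
eps = 1e-5
fd = (cost((I + eps * D) @ B) - cost((I - eps * D) @ B)) / (2 * eps)
predicted = np.sum(G * D)   # Frobenius product G : D

print(fd, predicted)
```

The diagonal of $G$ is identically zero, which reflects the fact that ${\cal L}$ does not change when the rows of $B$ are rescaled.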

NB:   The paper uses bra-ket notation for the Frobenius product, whereas I use a colon, e.g.
$$A:B = \langle A|B\rangle = {\rm Tr}(A^TB)$$
because it's a lot easier to type (and it looks better).

The Kronecker-vec operation can flatten a matrix expression into a vector
$${\rm vec}(AXB)=(B^T\otimes A)\,{\rm vec}(X) \;=\; Mx$$
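In NumPy terms, the identity can be sketched as follows. The one thing to watch is that ${\rm vec}$ stacks columns, so the flattening has to be column-major (`order="F"`); the shapes and the helper name `vec` are arbitrary choices for this illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
p, m, n, q = 2, 3, 4, 5
A = rng.standard_normal((p, m))
X = rng.standard_normal((m, n))
B = rng.standard_normal((n, q))

def vec(M):
    # Stack the columns of M into one long vector (column-major order)
    return M.reshape(-1, order="F")

M = np.kron(B.T, A)      # (B^T ⊗ A), shape (p*q, m*n)
lhs = vec(A @ X @ B)     # vec(AXB)
rhs = M @ vec(X)         # M x

print(np.allclose(lhs, rhs))
```

With NumPy's default row-major flattening the identity would not hold in this form, which is a common source of confusion.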
Using the vec operation, 
a gradient matrix can be flattened to a long vector
$$\eqalign{
\frac{\partial\phi}{\partial X}  &= G \quad&({\rm matrix}) \\
d\phi &= G:dX \\
  &= {\rm vec}(G)&:{\rm vec}(dX) \\
  &= g:dx \\
\frac{\partial\phi}{\partial x}  &= g \quad&({\rm vector})  \\
\\
G,X &\in{\mathbb R}^{m\times n} \\
g,x &\in {\mathbb R}^{mn\times 1} \\
}$$
Similarly, a 4th order Hessian tensor 
can be flattened into a large matrix 
$$\eqalign{
{\cal H} &= \frac{\partial G}{\partial X}
  \in{\mathbb R}^{m\times n\times m\times n} \quad&({\rm tensor}) \\
H &= \frac{\partial g}{\partial x}
  \in {\mathbb R}^{mn\times mn}  \quad&({\rm matrix}) \\
}$$
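As a concrete illustration of both flattenings, take $\phi=\log|\det X|$ for a square $X$ (my choice of example, not something taken from the paper), for which $G=X^{-T}$ and ${\cal H}_{ijkl}=\partial G_{ij}/\partial X_{kl}=-(X^{-1})_{jk}(X^{-1})_{li}$. The sketch below flattens the tensor with a column-major reshape and compares the result with a finite-difference Jacobian of $g$ with respect to $x$; the helper names `vec` and `grad` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 3
# Assumed test data: a well-conditioned square matrix
X = 2 * np.eye(n) + 0.3 * rng.standard_normal((n, n))

def vec(M):
    # Column-major flattening, matching the vec operator
    return M.reshape(-1, order="F")

def grad(X):
    # Gradient of log|det X| with respect to X
    return np.linalg.inv(X).T

X_inv = np.linalg.inv(X)
G = grad(X)
g = vec(G)

# The Frobenius product of matrices equals the dot product of their vecs
dX = rng.standard_normal((n, n))
frob = np.sum(G * dX)
dot = g @ vec(dX)

# 4th-order Hessian tensor: H[i, j, k, l] = -X_inv[j, k] * X_inv[l, i]
H_tensor = -np.einsum("jk,li->ijkl", X_inv, X_inv)
# Flatten index pairs (i, j) -> rows and (k, l) -> columns, column-major
H_matrix = H_tensor.reshape(n * n, n * n, order="F")

# Finite-difference Jacobian of g with respect to x, one column at a time
eps = 1e-6
x = vec(X)
H_numeric = np.zeros((n * n, n * n))
for c in range(n * n):
    step = np.zeros(n * n)
    step[c] = eps
    Xp = (x + step).reshape(n, n, order="F")
    Xm = (x - step).reshape(n, n, order="F")
    H_numeric[:, c] = (vec(grad(Xp)) - vec(grad(Xm))) / (2 * eps)

print(np.max(np.abs(H_matrix - H_numeric)))
```

The tensor and the matrix hold exactly the same numbers; only the indexing differs, which is why the book's vec-based Hessian and the paper's order-4 tensor describe the same object.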
