# How to impose restrictions on a random matrix via its prior distribution?

Cross Validated Asked by SOULed_Outt on November 27, 2020

I am reading the paper Factor analysis and outliers: A Bayesian approach. The author starts with a factor analysis model given by
$${\bf y}_i = {\bf \Lambda} {\bf z}_i + {\bf e}_i, \quad i = 1, \ldots, n,$$
where each $${\bf y}_i$$ is a $$p$$-dimensional observation vector, each $${\bf z}_i$$ is a $$K$$-dimensional latent factor vector, and $${\bf \Lambda}$$ is a $$p \times K$$ full-rank matrix of factor loadings. The author assumes that the factors and the error term are Normal:
$${\bf z}_i \sim \mathcal{N} ({\bf 0}, {\bf \Phi})$$
$${\bf e}_i \sim \mathcal{N} ({\bf 0}, {\bf \Psi})$$

The author assigns Wishart priors to $${\bf \Phi}^{-1}$$ and $${\bf \Psi}^{-1}$$:
$${\bf \Phi}^{-1} \sim \mathcal{W}_K \left( {\bf \Phi}_{*}, \nu_{*} \right)$$
$${\bf \Psi}^{-1} \sim \mathcal{W}_p \left( {\bf \Psi}_{*}, n_{*} \right)$$

In the paper the author writes something I found to be quite interesting:

While classical factor analysis sets $${\bf \Phi} = {\bf I}$$ and uses a diagonal $${\bf \Psi}$$ matrix, we impose these restrictions via the prior information matrices $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$.

Question: What should the values of $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ be in order to do what the author is suggesting?

The author does not seem to state exactly how this can be done, but I may have missed it so I will continue reading it. My own research on this matter pointed me to these seemingly similar unanswered questions here and here.

UPDATE: I did some research on the Wishart distribution and if you specify that $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ are two diagonal matrices, then $$\mathbb{E} [{\bf \Psi}]$$ and $$\mathbb{E} [{\bf \Phi}]$$ will be two diagonal mean matrices. Perhaps this is what the author is referring to. Still unsure, though.

UPDATE 2: I set $${\bf \Psi}_{*}$$ and $${\bf \Phi}_{*}$$ to diagonal matrices and ran simulations in R, but the results aren’t what I expected. The simulated matrices I obtained are not diagonal, so I think I misinterpreted the author’s statement. I thought that if you formulate the factor analysis model with the prior distributions above, you could recover the classical factor analysis model by choosing certain hyper-parameter values. But it seems that this formulation does not produce the classical factor analysis model.

UPDATE 3: The classical factor analysis model sets $${\bf \Phi} = {\bf I}$$ (i.e. non-random), sets $${\bf \Psi}$$ to be a diagonal matrix (i.e. a random diagonal matrix) and assigns prior distributions to only the diagonal elements. What I understand the author’s statement to mean is that I can do the aforementioned things by using Wishart priors on $${\bf \Phi}^{-1}$$ and $${\bf \Psi}^{-1}$$ with special scale matrices $${\bf \Phi}_{*}$$ and $${\bf \Psi}_{*}$$.

The inverse Wishart (which is what the article effectively uses, since it places Wishart priors on the inverse matrices) is a common prior for the covariance matrix of a multivariate Normal random variable.

This choice is based on the fact that it is a conjugate prior for the covariance matrix in this scenario.

If the columns of $$\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$$ are i.i.d. $$\mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$$, with a prior $$\mathbf{\Sigma} \sim \mathcal{W}^{-1}(\mathbf{\Psi}, \nu)$$, then the posterior $$\mathbf{\Sigma} \mid \mathbf{X} \sim \mathcal{W}^{-1}(\mathbf{A}+\mathbf{\Psi}, n+\nu)$$ is also inverse-Wishart distributed (here $$\mathbf{A}=\mathbf{X}\mathbf{X}^t$$ and $$n$$ is the number of observations in $$\mathbf{X}$$).
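The conjugate update above is just an addition of matrices and of degrees of freedom. A minimal sketch in Python with NumPy, assuming zero-mean data stored column-wise; the function name `inverse_wishart_posterior` and the simulated data are illustrative and not taken from the paper:

```python
import numpy as np

def inverse_wishart_posterior(X, prior_scale, prior_df):
    """Return (posterior_scale, posterior_df) for zero-mean Gaussian data.

    X is a p x n array whose columns are the observations x_i ~ N(0, Sigma).
    The prior is Sigma ~ inverse-Wishart(prior_scale, prior_df).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    A = X @ X.T  # scatter matrix A = X X^t
    return prior_scale + A, prior_df + n

# Illustrative data: p = 3 variables, n = 50 observations (assumed values).
rng = np.random.default_rng(0)
p, n = 3, 50
X = rng.standard_normal((p, n))

prior_scale = np.eye(p)  # diagonal prior scale matrix
prior_df = p + 1

post_scale, post_df = inverse_wishart_posterior(X, prior_scale, prior_df)

# The posterior mean exists because post_df > p + 1.
post_mean = post_scale / (post_df - p - 1)
```

Note that even with a diagonal prior scale, `post_scale` picks up off-diagonal entries from the scatter matrix, so the posterior is not restricted to diagonal covariance matrices.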

That said, one can impose structure on the prior for the covariance matrix by choosing the prior scale matrix $$\mathbf{\Psi}$$ appropriately. In the article, the authors set $$\mathbf{\Psi}=\mathbf{\Psi}_{*}$$ to be diagonal.
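To make precise what a diagonal scale matrix does and does not impose, recall the standard expression for the mean of an inverse-Wishart variable (with $$p$$ the dimension of $$\mathbf{\Sigma}$$):

$$\mathbb{E}[\mathbf{\Sigma}] = \frac{\mathbf{\Psi}}{\nu - p - 1}, \quad \nu > p + 1.$$

So a diagonal $$\mathbf{\Psi}$$ gives a diagonal prior mean, but individual draws of $$\mathbf{\Sigma}$$ are still full matrices with non-zero off-diagonal entries. The restriction holds in expectation, not draw by draw.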

An alternative approach would have been to force the $$p$$ variables to be independently Normal-distributed. In that case, the conjugate prior for the variance of each dimension would have been the Inverse Gamma.
The limitation of the latter is that it forces the $$p$$ variables to be independent in the posterior as well, while with an Inverse Wishart the off-diagonal elements of the covariance matrix are non-zero with positive probability.

When the scale matrix $$\mathbf{\Psi}_{*}$$ is diagonal and $$\nu=p+1$$, each correlation in $$\mathbf{\Sigma}$$ has a marginal uniform distribution on $$(-1, 1)$$ (Section 2.1 of https://arxiv.org/pdf/1408.4050.pdf). This corresponds to a non-informative prior for the correlations, so any non-zero correlation in the posterior has to be supported by the data $$\mathbf{X}$$.
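The uniform-correlation property can be checked by simulation. The sketch below uses only NumPy and builds inverse-Wishart draws by inverting Wishart draws, which works for integer degrees of freedom; the helper `sample_inverse_wishart`, the particular diagonal scale matrix and the sample size are assumptions made for illustration:

```python
import numpy as np

def sample_inverse_wishart(scale, df, size, rng):
    """Draw `size` matrices from inverse-Wishart(scale, df), integer df >= p.

    If the rows of G are i.i.d. N(0, scale^{-1}), then G^T G is
    Wishart(scale^{-1}, df) and its inverse is inverse-Wishart(scale, df).
    """
    p = scale.shape[0]
    L = np.linalg.cholesky(np.linalg.inv(scale))
    Z = rng.standard_normal((size, df, p))
    G = Z @ L.T                          # rows have covariance scale^{-1}
    W = np.transpose(G, (0, 2, 1)) @ G   # stacked Wishart draws
    return np.linalg.inv(W)

rng = np.random.default_rng(1)
p = 3
scale = np.diag([1.0, 4.0, 0.25])        # diagonal scale matrix (assumed)
draws = sample_inverse_wishart(scale, df=p + 1, size=20000, rng=rng)

# Convert each covariance draw into a correlation matrix.
sd = np.sqrt(np.einsum("kii->ki", draws))
corr = draws / (sd[:, :, None] * sd[:, None, :])
r12 = corr[:, 0, 1]

# A Uniform(-1, 1) variable has mean 0 and variance 1/3.
print("mean of r12:", r12.mean())
print("variance of r12:", r12.var())
print("share of draws with |r12| > 0.5:", np.mean(np.abs(r12) > 0.5))
```

Every simulated matrix has non-zero off-diagonal entries, which matches what UPDATE 2 in the question observed: a diagonal scale matrix centres the prior on diagonal matrices without forcing each draw to be diagonal.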

An interesting alternative, suggested by Gelman, is to use Half-Cauchy priors (the linked article focuses on 1-dimensional hierarchical models):

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

Correct answer by ping on November 27, 2020
