Cross Validated Asked by SOULed_Outt on November 27, 2020

I am reading the paper Factor analysis and outliers: A Bayesian approach. The author starts with a factor analysis model given by

$${\bf y}_i = {\bf \Lambda} {\bf z}_i + {\bf e}_i, \quad i = 1, \ldots, n,$$

where each ${\bf y}_i$ is a $p$-dimensional observation vector, each ${\bf z}_i$ is a $K$-dimensional latent factor vector, and ${\bf \Lambda}$ is a $p \times K$ full-rank matrix of factor loadings. The author assumes that the factors and the error term are Normal:

$${\bf z}_i \sim \mathcal{N} ({\bf 0}, {\bf \Phi})$$

$${\bf e}_i \sim \mathcal{N} ({\bf 0}, {\bf \Psi})$$

The author assigns Wishart priors to ${\bf \Phi}^{-1}$ and ${\bf \Psi}^{-1}$:

$${\bf \Phi}^{-1} \sim \mathcal{W}_K \left( {\bf \Phi}_{*}, \nu_{*} \right)$$

$${\bf \Psi}^{-1} \sim \mathcal{W}_p \left( {\bf \Psi}_{*}, n_{*} \right)$$

In the paper the author writes something I found to be quite interesting:

> While classical factor analysis sets ${\bf \Phi} = {\bf I}$ and uses a diagonal ${\bf \Psi}$ matrix, we impose these restrictions via the prior information matrices ${\bf \Psi}_{*}$ and ${\bf \Phi}_{*}$.

**Question:** What should the values of ${\bf \Psi}_{*}$ and ${\bf \Phi}_{*}$ be in order to do what the author is suggesting?

The author does not seem to state exactly how this can be done, but I may have missed it so I will continue reading it. My own research on this matter pointed me to these seemingly similar unanswered questions here and here.

**UPDATE:** I did some research on the Wishart distribution, and if you specify that ${\bf \Psi}_{*}$ and ${\bf \Phi}_{*}$ are two diagonal matrices, then $\mathbb{E} [{\bf \Psi}]$ and $\mathbb{E} [{\bf \Phi}]$ will be two diagonal mean matrices. Perhaps this is what the author is referring to. Still unsure, though.

**UPDATE 2:** I set ${\bf \Psi}_{*}$ and ${\bf \Phi}_{*}$ to diagonal matrices and ran simulations in R, but the results aren’t what I expected. The simulated matrices I obtained are not diagonal, so I think I misinterpreted the author’s statement. I thought that, with the prior distributions above, the model could be reduced to the classical factor analysis model by choosing certain hyperparameter values. But it seems that this formulation does not produce the classical factor analysis model.
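To illustrate the observation in this update, here is a minimal sketch in Python (not the R code that was actually run). It assumes an integer number of degrees of freedom so that a Wishart draw can be built as a sum of outer products of normal vectors. The helper names `wishart_draw` and `simulate`, and the numeric values of the scale and degrees of freedom, are hypothetical choices for illustration only.

```python
import random


def wishart_draw(scale_diag, df, rng):
    """Draw W ~ Wishart(diag(scale_diag), df) as a sum of df outer products.

    Assumes df is a positive integer and the scale matrix is diagonal.
    """
    p = len(scale_diag)
    w = [[0.0] * p for _ in range(p)]
    for _ in range(df):
        # x ~ N(0, diag(scale_diag)): independent normals with the given variances
        x = [rng.gauss(0.0, s ** 0.5) for s in scale_diag]
        for i in range(p):
            for j in range(p):
                w[i][j] += x[i] * x[j]
    return w


def simulate(scale_diag, df, n_draws, seed=0):
    """Return the first draw and the element-wise average over n_draws draws."""
    rng = random.Random(seed)
    p = len(scale_diag)
    mean = [[0.0] * p for _ in range(p)]
    first = None
    for _ in range(n_draws):
        w = wishart_draw(scale_diag, df, rng)
        if first is None:
            first = w
        for i in range(p):
            for j in range(p):
                mean[i][j] += w[i][j] / n_draws
    return first, mean


# Hypothetical settings: a 3x3 diagonal scale matrix and 5 degrees of freedom.
first_draw, mean_draw = simulate([1.0, 2.0, 0.5], df=5, n_draws=5000)

print("one draw:")
for row in first_draw:
    print([round(v, 3) for v in row])
print("average of draws:")
for row in mean_draw:
    print([round(v, 3) for v in row])
```

Under these assumptions, an individual draw has non-zero off-diagonal entries, while the average over many draws approaches `df` times the diagonal scale matrix. A diagonal scale matrix therefore makes the prior mean diagonal, not each realization.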

**UPDATE 3:** The classical factor analysis model sets ${\bf \Phi} = {\bf I}$ (i.e. non-random), sets ${\bf \Psi}$ to be a diagonal matrix (i.e. random diagonal matrix) and assigns prior distributions to only the diagonal elements. What I understand the author’s statement to mean is that I can do the aforementioned things by using Wishart priors on ${\bf \Phi}$ and ${\bf \Psi}$ with special scale matrices ${\bf \Phi}_{*}$ and ${\bf \Psi}_{*}$.

The inverse Wishart distribution (which is used in the article mentioned above) is commonly used as a prior for the covariance matrix of a multivariate Normal random variable.

This choice is based on the fact that it is a conjugate prior for the covariance matrix in this scenario.

If the columns of $\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n)$ are distributed as $\mathcal{N}(\mathbf{0}, \mathbf{\Sigma})$, with a prior $\mathbf{\Sigma} \sim \mathcal{W}^{-1}(\mathbf{\Psi}, \nu)$, then the posterior $\mathbf{\Sigma} \mid \mathbf{X} \sim \mathcal{W}^{-1}(\mathbf{A}+\mathbf{\Psi}, n+\nu)$ is also inverse-Wishart distributed (here $\mathbf{A}=\mathbf{X}\mathbf{X}^t$ and $n$ is the number of observations in $\mathbf{X}$).
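The update above amounts to adding the scatter matrix to the prior scale and the sample size to the degrees of freedom. A minimal sketch in plain Python, assuming zero-mean observations stored as a list of vectors; the function name `inverse_wishart_posterior` and the toy numbers are hypothetical:

```python
def inverse_wishart_posterior(prior_scale, prior_df, observations):
    """Conjugate update for Sigma with zero-mean Normal data.

    prior_scale : p x p list of lists (the prior scale matrix Psi)
    prior_df    : prior degrees of freedom nu
    observations: list of n vectors of length p (the columns of X)
    Returns (Psi + A, nu + n) with A = sum_i x_i x_i^t.
    """
    p = len(prior_scale)
    post_scale = [row[:] for row in prior_scale]  # copy, do not mutate the prior
    for x in observations:
        for i in range(p):
            for j in range(p):
                post_scale[i][j] += x[i] * x[j]
    return post_scale, prior_df + len(observations)


# Toy example: diagonal (identity) prior scale, three 2-dimensional observations.
prior_scale = [[1.0, 0.0], [0.0, 1.0]]
data = [[1.0, 2.0], [0.0, -1.0], [2.0, 1.0]]
post_scale, post_df = inverse_wishart_posterior(prior_scale, 3, data)
print(post_scale, post_df)  # [[6.0, 4.0], [4.0, 7.0]] 6
```

Even though the prior scale is diagonal in this toy example, the posterior scale picks up off-diagonal entries from the data.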

That said, one can impose structure on the prior for the covariance matrix by choosing the prior scale matrix $\mathbf{\Psi}$ appropriately. In the article, the authors set $\mathbf{\Psi}=\mathbf{\Psi}^*$ to be diagonal.

An alternative approach would have been forcing the $p$ variables to be independently Normal-distributed. In that case, the conjugate prior for the variance of each dimension would have been the *Inverse Gamma*.

The limitation of the latter is that it forces the $p$ variables to be independent in the posterior as well, while in the case of an *Inverse Wishart*, the off-diagonal elements of the covariance matrix can be non-zero with non-zero probability.
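For comparison, the independent alternative updates each variance separately. A minimal sketch, assuming zero-mean data and the same inverse-gamma prior $\mathrm{IG}(a, b)$ (shape $a$, scale $b$) on every dimension; the function name and the numbers are hypothetical:

```python
def inverse_gamma_posteriors(shape, scale, observations):
    """Per-dimension conjugate update for variances of zero-mean Normal data.

    Prior: sigma_j^2 ~ InvGamma(shape, scale) independently for each dimension j.
    Posterior: InvGamma(shape + n/2, scale + 0.5 * sum_i x_ij^2).
    Returns a list of (posterior_shape, posterior_scale), one pair per dimension.
    """
    n = len(observations)
    p = len(observations[0])
    posteriors = []
    for j in range(p):
        sum_sq = sum(x[j] ** 2 for x in observations)
        posteriors.append((shape + n / 2.0, scale + 0.5 * sum_sq))
    return posteriors


data = [[1.0, 2.0], [0.0, -1.0], [2.0, 1.0]]
print(inverse_gamma_posteriors(2.0, 1.0, data))  # [(3.5, 3.5), (3.5, 4.0)]
```

Only the per-dimension sums of squares enter the update, so the cross-products between dimensions are never used and no posterior correlation can appear.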

When the scale matrix $\mathbf{\Psi}^*$ is diagonal and $\nu=p+1$, the correlations in $\mathbf{\Sigma}$ have a marginal uniform distribution (see Section 2.1 of https://arxiv.org/pdf/1408.4050.pdf). This corresponds to a non-informative prior for the correlations, implying that non-zero correlations require strong evidence from the data $\mathbf{X}$.
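The uniform marginal can be checked by simulation in the smallest case. The sketch below assumes $p=2$, $\nu=3$ and an identity scale matrix (correlations are unchanged by rescaling the individual variables, so identity stands in for any diagonal scale), and uses the fact that inverting a $2 \times 2$ matrix only flips the sign of the correlation. The function name and the sample size are hypothetical.

```python
import random


def inverse_wishart_correlation(rng, df=3):
    """Correlation implied by one draw of Sigma ~ InvWishart(I_2, df).

    W = Sigma^{-1} ~ Wishart(I_2, df) is built as a sum of df outer products
    of standard normal vectors (df must be a positive integer).
    """
    w11 = w22 = w12 = 0.0
    for _ in range(df):
        a, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        w11 += a * a
        w22 += b * b
        w12 += a * b
    # For a 2x2 matrix, Sigma = W^{-1} has correlation -w12 / sqrt(w11 * w22).
    return -w12 / (w11 * w22) ** 0.5


rng = random.Random(1)
rhos = [inverse_wishart_correlation(rng) for _ in range(20000)]

mean_rho = sum(rhos) / len(rhos)
var_rho = sum((r - mean_rho) ** 2 for r in rhos) / len(rhos)
frac_in_half = sum(1 for r in rhos if abs(r) < 0.5) / len(rhos)

# A Uniform(-1, 1) variable has mean 0, variance 1/3, and P(|rho| < 0.5) = 0.5.
print(round(mean_rho, 3), round(var_rho, 3), round(frac_in_half, 3))
```

If the claim holds, the three printed summaries should be close to 0, 1/3 and 0.5 respectively, up to Monte Carlo error.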

An interesting alternative, suggested by Gelman, is to use *Half-Cauchy* priors (the linked article focuses on 1-dimensional hierarchical models):

http://www.stat.columbia.edu/~gelman/research/published/taumain.pdf

Correct answer by ping on November 27, 2020
