Computational Science Asked on August 22, 2021
I am working on forecasts in astrophysics. Don't be too harsh if my questions seem stupid or naive; I am just trying to better understand all these numerical methods: Monte Carlo on its own, Monte Carlo coupled with Markov chains, and the difference between a sampler and an estimator. It is all a bit of a mess in my head when I try to grasp the subtleties.
1. Using the covariance matrix at each step
In the figure below, especially in the central box, I don't understand why I have to use the covariance matrix at each proposal of a point that will be accepted or rejected into the posterior distribution. Is it done to compute the $\chi^2$ each time and accept/reject based on some threshold, and if so, on which criterion?
In my code, I generate the matter power spectrum (in cosmology; upper left of the figure). Up to this point, there is no random process. As I understand it, the randomness occurs in the central box, with the computation of a posterior distribution via the formula:
$P(\Theta \mid \text{data}) = \dfrac{P(\text{data} \mid \Theta) \times P(\Theta)}{P(\text{data})}$
As you can see, I need the likelihood, which directly depends on the theoretical model, doesn't it?
Then, do I generate a sample from the likelihood by drawing random data from it? As you can see, I am a bit lost, mixing the two concepts and unsure where the random processes occur.
2. Monte Carlo and Metropolis-Hastings
Should I consider the term "Monte Carlo" as a general way to generate distributions (or samples; I don't know which of the two terms to use), even though, with Monte Carlo, I can also compute and thus estimate the expectation of a random variable, knowing its PDF, via an integral?
And coupled with Metropolis-Hastings, the result is a distribution of the posterior, from which we can extract, for example, the mean (the peak of the distribution)?
3. Link between likelihood and chi-squared: what is the deep link between the likelihood and the $\chi^2$ in Markov Chain Monte Carlo?
4. Fisher formalism
One last question: I have heard that the Fisher formalism can only be applied under the assumption that the posterior/likelihood is Gaussian.
Could anyone explain why? And, above all, how can it be demonstrated from a mathematical point of view?
And if, by bad luck, the likelihood produced by a theoretical model is not Gaussian, what other alternatives are available to estimate a set of parameters? Are Markov Chain Monte Carlo methods the only ones that can circumvent a non-Gaussian likelihood?
PS: I have asked several questions, but they are all linked, in the sense that they are connected from an estimation and sampling point of view.
So don't be too harsh; I am just trying to better understand and grasp the subtleties of all these concepts.
Even an answer to just one of my questions would make me grateful.
As I understand it, your ultimate goal is to solve an inverse problem (i.e., to infer some parameters from given data/observations). To this end, you want to apply Bayesian inference, which relates the posterior (i.e., the probability distribution of the unknown parameters) to the likelihood (i.e., the probability model of observing some values given the parameters) and the prior (i.e., the probability distribution encoding your belief that the parameters attain some values). The evidence is only used for normalization in order to obtain a valid probability distribution (there are more use cases, e.g., model selection).
Since you're mentioning $\chi^2$, I'm supposing the likelihood looks like $$ p(\vec{y} \mid \vec{p}) \propto \exp\left( -\frac{1}{2} (\vec{y} - \vec{p})^T \Sigma^{-1} (\vec{y} - \vec{p}) \right), $$ which means that the data/observations $\vec{y}$ follow a normal distribution $\vec{y} \sim \mathcal{N}(\vec{p}, \Sigma)$, where the parameters $\vec{p}$ are the mean and the covariance $\Sigma$ is fixed. Note that the likelihood is just some function that can be (numerically) evaluated given the inputs $\vec{y}$ and $\vec{p}$.
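As an illustration, here is a minimal Python sketch of evaluating such a likelihood; the data vector `y`, parameter vector `p`, and covariance `Sigma` below are made-up placeholders:

```python
import numpy as np

def log_likelihood(y, p, Sigma):
    """Gaussian log-likelihood, i.e. -0.5 * chi^2, up to an additive constant."""
    r = y - p                              # residuals
    chi2 = r @ np.linalg.solve(Sigma, r)   # (y - p)^T Sigma^{-1} (y - p)
    return -0.5 * chi2

# Made-up numbers: three data points, fixed diagonal covariance
y = np.array([1.2, 0.9, 1.1])
Sigma = np.diag([0.1, 0.1, 0.1])
print(log_likelihood(y, p=np.ones(3), Sigma=Sigma))
```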
Now, to infer the parameters, we are often interested in some functionals of the posterior: for example, the mean, mode, standard deviation, quantiles, highest-posterior-density regions, etc. Note that, to fully appreciate the Bayesian framework, parameter inference should not be reduced to a single value (e.g., the mean of the posterior).
In this context, the Monte Carlo method essentially means drawing samples from the posterior and using a statistical estimator to infer some quantity (functionals such as the mean, quantiles, etc.) of the distribution. That is, using the Monte Carlo method, we would simply draw random samples from the posterior and use them to estimate the parameters (e.g., take the sample mean to approximate the mean of the distribution). However, directly sampling from the posterior is usually not possible. In the example above (the likelihood is a normal distribution), it depends on the choice of the prior whether we obtain a known distribution for the posterior that can be sampled from directly (see conjugate priors).
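To see the plain Monte Carlo step in isolation, here is a toy sketch that pretends the posterior is a known 1-D normal we can sample directly (the numbers are made up):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Pretend the posterior is a known 1-D normal we can sample from directly
samples = rng.normal(loc=2.0, scale=0.5, size=100_000)

# Monte Carlo estimators of posterior functionals
print("mean     :", samples.mean())                           # approximates E[theta]
print("std      :", samples.std())                            # posterior spread
print("16/50/84%:", np.quantile(samples, [0.16, 0.5, 0.84]))  # credible interval
```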
As the name implies, Markov Chain Monte Carlo (MCMC) methods are a subset of Monte Carlo methods. MCMC is a special way to generate samples from the posterior distribution, which can subsequently be used in a Monte Carlo estimator. The "standard" MCMC method is Metropolis-Hastings, which works like this:
Given some initial state $\vec{p}_i$, perform the following steps:

1. Propose a new point $\vec{x}$ by drawing from a proposal distribution $q(\vec{x} \mid \vec{p}_i)$ (for example, a Gaussian centered at $\vec{p}_i$).
2. Compute the acceptance probability $\alpha = \min\left(1, \frac{p(\vec{x} \mid \vec{y}) \, q(\vec{p}_i \mid \vec{x})}{p(\vec{p}_i \mid \vec{y}) \, q(\vec{x} \mid \vec{p}_i)}\right)$.
3. Draw $u$ uniformly from $[0, 1]$. If $u \le \alpha$, accept the proposal and set $\vec{p}_{i+1} = \vec{x}$; otherwise, set $\vec{p}_{i+1} = \vec{p}_i$.
4. Repeat from step 1 with $i+1$.
In this algorithm, the unnormalized posterior density $$ p(\vec{p} \mid \vec{y}) \propto p(\vec{y} \mid \vec{p})\, p(\vec{p}) $$ is used. This involves evaluating the likelihood and the prior at the proposed point $\vec{x}$, which, in turn, requires the inverse covariance matrix in the evaluation of the likelihood; this is why the covariance matrix appears at every step.
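Here is a minimal Python sketch of these steps, assuming a symmetric Gaussian random-walk proposal (so the $q$ ratio cancels); `log_posterior` is a hypothetical placeholder for the unnormalized log-posterior (log-likelihood plus log-prior):

```python
import numpy as np

def metropolis_hastings(log_posterior, p0, n_steps=10_000, step=1.0, seed=0):
    """Random-walk Metropolis: the proposal is symmetric, so the q-ratio cancels."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p0, dtype=float)
    logp = log_posterior(p)
    chain = np.empty((n_steps, p.size))
    for i in range(n_steps):
        x = p + step * rng.standard_normal(p.size)  # 1. propose a new point
        logx = log_posterior(x)                     # 2. unnormalized log-posterior
        if np.log(rng.uniform()) < logx - logp:     # 3. accept with prob min(1, ratio)
            p, logp = x, logx
        chain[i] = p                                # 4. record current state
    return chain

# Toy target: a standard normal "posterior" in one dimension
chain = metropolis_hastings(lambda q: -0.5 * float(q @ q), p0=[0.0])
print(chain.mean(axis=0), chain.std(axis=0))  # should be near 0 and 1
```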
This should answer your first two questions.
- Link between Likelihood and chi-squared
This really depends on the modeling assumptions and the form of the likelihood. In the model used above, it is assumed that $$ \vec{y} = \vec{p} + \varepsilon, \qquad \varepsilon \sim \mathcal{N}(\vec{0}, \Sigma). $$ If the errors are not assumed to be Gaussian, the $\chi^2$ term would not appear in the likelihood.
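To make the link explicit, take the logarithm of the Gaussian likelihood above: $$ \ln p(\vec{y} \mid \vec{p}) = -\frac{1}{2} \underbrace{(\vec{y} - \vec{p})^T \Sigma^{-1} (\vec{y} - \vec{p})}_{\chi^2} + \text{const}. $$ So the log-likelihood is just $-\chi^2/2$ up to a constant: maximizing the likelihood is equivalent to minimizing $\chi^2$, and the Metropolis-Hastings acceptance ratio depends only on differences of $\chi^2$ values (plus prior terms).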
- Fisher formalism
As far as I know, the maximum likelihood theory and the Fisher information do not depend on Gaussian distributions. They are fully generic.
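For reference, the Fisher information matrix is defined for any sufficiently regular likelihood as $$ \mathcal{I}(\vec{p})_{ij} = \mathrm{E}\left[ \frac{\partial \ln p(\vec{y} \mid \vec{p})}{\partial p_i} \, \frac{\partial \ln p(\vec{y} \mid \vec{p})}{\partial p_j} \right], $$ and the Cramér-Rao bound (the covariance of any unbiased estimator is at least $\mathcal{I}^{-1}$) holds without any Gaussianity assumption. Gaussianity only enters if one additionally approximates the posterior by a Gaussian with covariance $\mathcal{I}^{-1}$, as is common in forecasting.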
And if, by bad luck, the likelihood produced by a theoretical model is not Gaussian, what other alternatives are available to estimate a set of parameters?
Besides Monte Carlo methods (including MCMC), you can still apply maximum likelihood estimators for the model parameters.
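As a sketch of that route, one can minimize the negative log-likelihood numerically; the linear `model`, the data `y_obs`, and the covariance `Sigma` below are hypothetical stand-ins, and the same pattern works for non-Gaussian likelihoods by swapping the objective:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical setup: noisy data from a linear model y = a*x + b
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 20)
y_obs = 2.0 * x + 0.5 + 0.05 * rng.standard_normal(x.size)
Sigma = 0.05**2 * np.eye(x.size)

def model(params):
    a, b = params
    return a * x + b

def neg_log_likelihood(params):
    """Negative Gaussian log-likelihood; replace with any other likelihood."""
    r = y_obs - model(params)
    return 0.5 * r @ np.linalg.solve(Sigma, r)

result = minimize(neg_log_likelihood, x0=[1.0, 0.0])
print("MLE:", result.x)  # should land near (2.0, 0.5)
```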
Correct answer by cos_theta on August 22, 2021
The previous answer pretty much sums up my understanding of this problem. I just want to add two solid references in this regard (both from an astrophysics context).
The paper by Hogg et al. provides a pretty hands-on approach, while the paper by Sharma is more of a survey of MCMC usage in astrophysics.
I am not from the astrophysics community, but I learned a lot about Bayesian inference with MCMC from these two. Hope this can be helpful.
Answered by Roxy on August 22, 2021