Computational Science Asked by Daniel Sela on April 4, 2021
I’m working on a Monte Carlo project similar to the Ising model. I’ve found many examples on which I’ve based my code.
From some papers I've read on binning analysis, the error estimates are supposed to converge as the binning level increases. Mine instead start oscillating after a few binning steps, and so I'm getting negative autocorrelation times.
I was hoping someone could either verify my procedure is correct, or explain a good procedure for dealing with correlated sampling.
The algorithm I am using is as follows:
I sample after each time step.
Then, recursively (the function in my code is binning()), I create a list of lists: each inner list is obtained from the previous one by averaging successive non-overlapping pairs of samples (pairs never share a sample).
At each binning level I calculate the standard deviation following a formula from a paper, which gives a sequence of error estimates, one per binning step. A simplified sketch of this procedure is below.
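To make this concrete, here is a simplified sketch of what binning() does (not my actual code; it assumes the samples are in a 1-D NumPy array and uses the naive standard error at each level, which may differ in detail from the paper's formula):

```python
import numpy as np

def binning(samples, min_bins=32):
    """Pairwise binning: repeatedly average non-overlapping pairs of
    successive samples, recording an error estimate at each level."""
    x = np.asarray(samples, dtype=float)
    errors = []
    while len(x) >= min_bins:
        n = len(x)
        # Naive standard error of the mean at this binning level
        errors.append(np.std(x, ddof=1) / np.sqrt(n))
        # Average successive non-overlapping pairs (drop any leftover sample)
        x = 0.5 * (x[0 : 2 * (n // 2) : 2] + x[1 : 2 * (n // 2) : 2])
    return errors
```

If the samples were correlated, I expected the errors to grow and then plateau; instead the tail of the sequence oscillates.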
Section IV.D of this paper was my reference. I assumed the last error in the sequence would be a good estimate of the limiting value; since the sequence doesn't converge, I get a negative autocorrelation time, which is my problem.
I'm not sure whether I am doing this correctly, and if I am, what could be going wrong to make the error oscillate from one binning step to the next?
I recently went through a similar experience, so you can take this suggestion as coming from a fellow learner rather than an expert. I was implementing an MCMC-type algorithm, and I wanted to know what the effective sample size was. So I tried to calculate the empirical autocorrelation $\hat\rho_k$ of the samples, but the sum
$$\tau = 1 + 2\sum_{k = 1}^N \hat\rho_k$$
was close to zero or negative. This would imply a really large effective sample size (ESS) if you took that estimate seriously, which was obviously absurd. The reason this works poorly is that it's difficult to get a very good estimate for the autocorrelation function at large lags from the samples themselves.
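For concreteness, here's a sketch of the naive estimator I was using (my own simplified code, assuming the samples are in a 1-D NumPy array); the tail of $\hat\rho_k$ is essentially noise, and summing it can easily drive $\tau$ to zero or below:

```python
import numpy as np

def naive_correlation_time(samples):
    """Naive tau = 1 + 2 * sum of empirical autocorrelations.
    Unreliable: rho_k is poorly estimated at large lags k."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Empirical autocovariance at lags 0..n-1, normalized to rho_k
    acov = np.correlate(x, x, mode="full")[n - 1 :] / n
    rho = acov / acov[0]
    return 1.0 + 2.0 * rho[1:].sum()
```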
I found an arXiv preprint that compared several different methods for estimating the ESS and/or correlation time for functions of the samples. The easiest method was a binning approach similar to, but I think simpler than, the one in the paper you reference. First, let $f$ be some function of the samples whose mean you're trying to estimate, let
$$\hat\mu = \frac{1}{N}\sum_{k = 1}^N f(x_k)$$
be the sample mean of $f$, and
$$\hat s^2 = \frac{1}{N}\sum_{k = 1}^N (f(x_k) - \hat\mu)^2$$
the sample variance. Now divide the $N$ samples into batches of size $M$. (Usually $M \propto N^{1/3}$, but all you really need is for both $M$ and $N / M$ to go to infinity in the limit of large sample size.) Now let
$$\hat\mu_m = \frac{1}{M}\sum_{k = M\cdot m + 1}^{M\cdot(m + 1)} f(x_k)$$
be the sample mean of batch $m$, and let
$$\hat s_M^2 = \frac{M}{N}\sum_{m = 1}^{N / M} (\hat\mu_m - \hat\mu)^2$$
be the variance between the batches. Then the correlation time is approximately
$$\tau = M \hat s_M^2 / \hat s^2.$$
I found this approach to be pretty foolproof to implement and the results are guaranteed to be positive, whereas the empirical correlation function can be poorly sampled and thus might sum to all sorts of crazy things.
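Here's a minimal sketch of the batch means estimator (my own code, not from the preprint; it assumes the values $f(x_k)$ are already stored in a 1-D array):

```python
import numpy as np

def batch_means_tau(fx, batch_size=None):
    """Batch-means estimate of the correlation time tau = M * s_M^2 / s^2."""
    fx = np.asarray(fx, dtype=float)
    N = len(fx)
    M = batch_size or max(1, round(N ** (1 / 3)))  # batch size ~ N^(1/3)
    num_batches = N // M
    fx = fx[: num_batches * M]   # drop leftovers so all batches have size M
    mu = fx.mean()
    s2 = np.mean((fx - mu) ** 2)                # sample variance s^2
    batch_means = fx.reshape(num_batches, M).mean(axis=1)
    s2_M = np.mean((batch_means - mu) ** 2)     # between-batch variance s_M^2
    return M * s2_M / s2
```

The effective sample size is then roughly $N / \tau$.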
Section IV.D of the paper you linked is similar to the method described in section 2.3 of the Thompson preprint, the initial positive sequence or IPS estimator. The idea is that the sequence
$$\textrm{IPS}_k = \rho_{2k} + \rho_{2k + 1}$$
is always positive for the true, ideal autocorrelation function $\rho_k$ of a reversible Markov chain. The idea of the IPS estimator is to take the correlation time to be
$$\tau \approx 1 + 2\sum_{k = 1}^{N^*}\textrm{IPS}_k$$
but this sum is truncated at the first index $N^*$ where $\textrm{IPS}_{N^* + 1} < 0$. I also tried this approach for my MCMC simulation and found that it gave about the same estimate of the correlation time and thus the effective sample size.
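For completeness, here's a sketch of the IPS estimator following the formula above (again my own simplified code, using the same empirical $\rho_k$ as before):

```python
import numpy as np

def ips_correlation_time(samples):
    """Initial positive sequence (IPS) estimate of the correlation time:
    sum rho_{2k} + rho_{2k+1}, truncating at the first negative pair."""
    x = np.asarray(samples, dtype=float)
    x = x - x.mean()
    n = len(x)
    acov = np.correlate(x, x, mode="full")[n - 1 :] / n
    rho = acov / acov[0]
    tau = 1.0
    for k in range(1, (n - 1) // 2):
        ips_k = rho[2 * k] + rho[2 * k + 1]
        if ips_k < 0:
            break  # truncate the sum at the first negative pair
        tau += 2.0 * ips_k
    return tau
```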
Finally, you might have better luck asking this kind of question on the Statistics stack exchange as it's a little more their bailiwick.
Correct answer by Daniel Shapero on April 4, 2021