
How to derive the Potts model as well as the gauge transformation *directly* from the principle of maximum entropy?


$
\newcommand{\delim}[3]{\left#1#3\right#2}
\newcommand{\delimzero}[3]{#1#3#2}
\newcommand{\delimone}[3]{\bigl#1#3\bigr#2}
\newcommand{\delimtwo}[3]{\Bigl#1#3\Bigr#2}
\newcommand{\delimthree}[3]{\biggl#1#3\biggr#2}
\newcommand{\delimfour}[3]{\Biggl#1#3\Biggr#2}
$
Suppose we have a random vector $s = \delim(){s_1, \dotsc, s_N}$, each component of which takes one of the integer values $\delim\{\}{1, \dotsc, q}$.
In other words, we are considering a system of $N$ Potts spins.
Given $M$ independent samples of the system, we can write the empirical one- and two-site statistics as
$$
\begin{align}
f_i(a) &= \frac 1 M \sum_{m=1}^M \delta\delim(){a,s_i^m},
\\
f_{ij}(a,b) &= \frac 1 M \sum_{m=1}^M \delta\delim(){a,s_i^m} \, \delta\delim(){b,s_j^m}.
\end{align}
$$

I want to derive the functional form of the maximum-entropy probability distribution consistent with $\delimone\{\}{f_i}$ and $\delimone\{\}{f_{ij}}$.
This functional form is the solution of the optimization problem
$$
\underset{P(s)}{\textrm{arg max}} \; \sum_s -P(s) \log P(s)
$$

with constraints
$$
\begin{align}
\textrm{normalization: }
&\sum_s P(s) = 1,
\\
\textrm{one-site statistics: }
&\sum_s P(s)\, \delta\delim(){a,s_i} = f_i(a) \quad \delim(){1 \le i \le N \textrm{ and } 1 \le a \le q},
\\
\textrm{two-site statistics: }
&\sum_s P(s)\, \delta\delim(){a,s_i}\, \delta\delim(){b,s_j} = f_{ij}(a,b) \quad \delim(){1 \le i,j \le N \textrm{ and } 1 \le a,b \le q}.
\end{align}
$$
Since the (Shannon) entropy $S[P] = \sum_{s} -P(s) \log P(s)$ is concave with respect to the configuration probabilities $P(s)$, this constrained maximization can be solved with Lagrange multipliers:
$$
P(s) = \frac 1 Z \exp\delimthree(){-\sum_{i=1}^N h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j)}. \tag{1}
$$

where $Z = \sum_s \exp\delim(){-\sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j)}$ is the partition function.
The distribution (1) is also called ‘the generalized Potts model’.
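
For completeness, here is a sketch of the Lagrange-multiplier step I have in mind (the multipliers of the one- and two-site constraints are named $h_i(a)$ and $J_{ij}(a,b)$, with signs chosen so that the result matches (1), and $\lambda$ is the multiplier of the normalization constraint; the positivity constraint $P(s)\ge 0$ is satisfied automatically by the exponential form):
$$
\begin{align}
\mathcal{L}[P]
&= -\sum_s P(s)\log P(s)
 - \lambda \delimtwo(){\sum_s P(s) - 1}
 - \sum_{i,a} h_i(a) \delimtwo(){\sum_s P(s)\,\delta\delim(){a,s_i} - f_i(a)}
\\
&\quad
 - \sum_{i<j}\sum_{a,b} J_{ij}(a,b) \delimtwo(){\sum_s P(s)\,\delta\delim(){a,s_i}\,\delta\delim(){b,s_j} - f_{ij}(a,b)},
\\
0 = \frac{\partial\mathcal{L}}{\partial P(s)}
&= -\log P(s) - 1 - \lambda - \sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j),
\end{align}
$$
which gives $P(s) \propto \exp\delim(){-\sum_i h_i(s_i) - \sum_{i<j} J_{ij}(s_i,s_j)}$; the constant $e^{-1-\lambda}$ is absorbed into $1/Z$.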

There are $q N + q^2 N(N-1)/2$ parameters in eq. (1).
But the constraints other than normalization are not all independent, since
$$
\sum_{a=1}^q f_i(a) = 1, \qquad \sum_{b=1}^q f_{ij}(a,b) = f_i(a).
$$

Thus the number of independent constraints is $(q-1)N + (q-1)^2 N(N-1)/2$.
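
To make the counting concrete, take for example the toy case $q = 3$ and $N = 2$:
$$
qN + q^2\frac{N(N-1)}{2} = 6 + 9 = 15
\qquad\textrm{vs.}\qquad
(q-1)N + (q-1)^2\frac{N(N-1)}{2} = 4 + 4 = 8,
$$
so eq. (1) carries $15 - 8 = 7$ more parameters than there are independent constraints besides normalization.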
This mismatch between the number of parameters and the number of independent constraints leads to the so-called gauge invariance: the distribution (1) is invariant under the transformation
$$
\begin{split}
J_{ij}(a,b) &\to J_{ij}(a,b) + K_{ij}(a) + K_{ji}(b) ,
\\
h_i(a) &\to h_i(a) + g_i - \sum_{j \neq i} K_{ij}(a) ,
\end{split}
\tag{2}
$$
where the $\delim\{\}{g_i}$ and $\delim\{\}{K_{ij}(a)}$ are arbitrary constants.
Allow me to explain a bit further: the $\delim\{\}{g_i}$ merely add an overall constant to the exponent and therefore have no effect after normalization; adding $K_{ij}(a)$ to $J_{ij}(a,b)$ while subtracting $K_{ij}(a)$ from $h_i(a)$ moves part of the contribution of the edge $(i,j)$ onto the vertex $i$.
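
For concreteness, here is the quick check that (2) leaves (1) unchanged. The terms added to the pair couplings re-sum site by site,
$$
\sum_{i<j} \delim[]{K_{ij}(s_i) + K_{ji}(s_j)} = \sum_i \sum_{j\neq i} K_{ij}(s_i),
$$
which is exactly cancelled by the $-\sum_{j\neq i} K_{ij}(s_i)$ coming from the transformed fields, so the exponent of (1) shifts only by the configuration-independent constant $-\sum_i g_i$; this constant is absorbed by $Z$, and $P(s)$ is unchanged.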

In the literature, the distribution is usually given first, and the over-parameterization is then exhibited through the gauge transformation.
However, I think a proper application of the principle of maximum entropy should account for the redundancy of the constraints from the very beginning and derive the so-called gauge transformation from it.

So my question is actually:

How does one solve an optimization problem with redundant constraints while keeping the redundancy in place?
In other words, how can the transformation (2) be derived from the redundancy of the constraints?

This question arises in the context of applying the principle of maximum entropy to the statistical inference of protein/DNA/RNA sequence data (where each locus has more than two possible states and is therefore usually not modeled as an Ising spin).
I add this context in the hope that other people confused about this point can be enlightened by the answers here.

I have made a fairly extensive literature survey but failed to find a satisfactory derivation of the transformation (2).
In the literature I have surveyed, authors either mention the redundancy and immediately choose one gauge to eliminate it, or elaborate a little further on where the redundancy comes from.
But everyone writes down the transformation (2) directly; no one gives a derivation.
I have little knowledge of gauge theory; does the origin lie there?
Anyway, any hint would be appreciated.
Thanks in advance.
