Justification of the $U(1)$ gauge for electromagnetism?

Question

Why should we expect or require that there is a $U(1)$-gauge symmetry in the theory of a charged particle (such as QED), namely that its physical properties should not change under local changes of the wavefunction $\psi(x) \rightarrow \psi(x) e^{i \alpha(x)}$? For example, demanding this $U(1)$-symmetry be satisfied justifies the usage of a covariant derivative $D_\mu(x) = \partial_\mu + i q B_\mu(x)$, while $B_\mu$ transforms by $B_\mu(x) \rightarrow B_\mu(x) - \frac1q \partial_\mu \alpha(x)$. Letting $B_\mu$ be the vector potential $A_\mu$ in electromagnetism allows an interaction/coupling between the fermion wavefunction $\psi$ and electromagnetism, although I don't understand why that is indeed the correct thing to do. I heard that this is precisely called the minimal coupling, but I don't understand the exact details.
In summary I have two questions:
Question 1. What is the physical reason why we should require a $U(1)$-gauge symmetry to be present for the wavefunction of a charged fermion?
Question 2. After we impose this gauge symmetry on the Dirac Lagrangian, what is the physical justification that we should let the connection 1-form term be precisely the vector potential $A_\mu$, and not other terms from electromagnetism (say, $F_{\mu\nu}A^\nu$)?
There are (many) other questions on Physics SE that address a similar concern (such as this and this), but not exactly this.

J. Murray · Accepted Answer

Ultimately, the physical reason for doing this is that it works. There’s a fairly natural line of reasoning which leads to this procedure - that’s not a proof, because proofs don’t exist in physics, but it’s suggestive motivation. I'll run through this reasoning, breaking up the narrative with some elaboration that may be helpful.

If you impose a local symmetry under the action of a Lie group $G$, then you are immediately led to the need for a connection, which can be represented by a one-form $\mathbf A$ which takes its values in the Lie algebra $\mathfrak{g}$ associated to $G$, and a covariant derivative $D = \partial + A$.

Let $\Psi : \mathbb R^3\mapsto \mathbb C^n$ be an $n$-component wavefunction. We choose a basis $\{\hat e_1,\hat e_2,\ldots,\hat e_n\}$ for $\mathbb C^n$ and express our wavefunction in component form $\Psi = \psi^a \hat e_a$. If we allow the basis to be position-dependent, then differentiation of the wavefunction yields
$$\partial_{\color{red}{\mu}} \Psi = (\partial_{\color{red}{\mu}} \psi^a)\hat e_a + \psi^a\partial_{\color{red}{\mu}}(\hat e_a)$$
where I use the red, Greek subscript to denote the spatial index and Latin super/subscripts to denote the $\mathbb C^n$ indices. The expression $\partial_{\color{red}{\mu}}(\hat e_a)$ will be some element of $\mathbb C^n$, so we can express it in the local basis as $\partial_{\color{red}{\mu}}(\hat e_a) = {A_{\color{red}{\mu}}}^b_{\ \ a} \hat e_b$. Plugging this back in and relabeling indices yields
$$\partial_{\color{red}{\mu}} \Psi = (\partial_{\color{red}{\mu}} \psi^a + {A_{\color{red}{\mu}}}^a_{\ \ b} \psi^b)\hat e_a$$
This motivates the definition $D_{\color{red}{\mu}} \equiv \partial_{\color{red}{\mu}} + A_{\color{red}{\mu}}$ (where, for each $\color{red}{\mu}$, $A_{\color{red}{\mu}}$ is interpreted as an $n\times n$ complex matrix), so $\partial_{\color{red}{\mu}} \Psi = (D_{\color{red}{\mu}}\psi)^a \hat e_a$. Under change of basis via some $\Omega \in G$, $\psi^a \mapsto \Omega^a_{\ \ b} \psi^b$ and $\hat e_a \mapsto (\Omega^{-1})^b_{\ \ a} \hat e_b$ to preserve the value of $\Psi$ itself. Note: From here on, I'll drop the spatial index, because it just sits there and comes along for the ride. You can always put it back in if you want.

Exercise for the reader: Using the definition $\partial(\hat e_a) = A^b_{\ \ a}\hat e_b$, show that under change of basis, $A \mapsto \Omega(A + \partial) \Omega^{-1}$. Further, argue that if $\Omega = e^\theta$ for some $\theta \in \mathfrak{g}$, consistency requires that $A \in \mathfrak{g}$.
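
As a sanity check on this transformation law, it may help to specialize to the abelian case. The following is a sketch under stated assumptions: $\Omega = e^{i\alpha(x)} \in U(1)$, and the connection is written as $A_\mu = i q B_\mu$ with $B_\mu$ real, to match the notation used in the question.

$$
\begin{aligned}
\Omega\,\partial_\mu \Omega^{-1} &= e^{i\alpha}\,(-i\,\partial_\mu\alpha)\,e^{-i\alpha} = -i\,\partial_\mu\alpha,\\
A_\mu &\mapsto \Omega A_\mu \Omega^{-1} + \Omega\,\partial_\mu\Omega^{-1} = A_\mu - i\,\partial_\mu\alpha,\\
A_\mu = i q B_\mu \;&\Longrightarrow\; B_\mu \mapsto B_\mu - \frac{1}{q}\,\partial_\mu\alpha .
\end{aligned}
$$

This reproduces the transformation rule quoted in the question, and $i q B_\mu$ is purely imaginary, i.e. it takes values in the Lie algebra of $U(1)$, as the second part of the exercise requires.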

We now ask whether this connection has any physical meaning. If it can be uniformly set to zero by appropriate change of gauge, then we can perform all of our calculations in that gauge; since all gauges are physically equivalent, this implies that $A$ cannot manifest any physical effects. Setting the connection to zero means that $\Omega(A + \partial)\Omega^{-1} = 0 \iff A = -(\partial \Omega^{-1})\Omega = \Omega^{-1} \partial \Omega$ for some $\Omega \in G$.

Exercise for the reader: Show that if $A_\mu = \Omega^{-1} \partial_\mu \Omega$, then
$$(dA)_{\mu\nu} \equiv \partial_\mu A_\nu - \partial_\nu A_\mu = -[A_\mu,A_\nu]$$
so the $\mathfrak{g}$-valued(!) 2-form $F$ with components $F_{\mu\nu} \equiv (dA)_{\mu\nu} + [A_\mu,A_\nu]$ vanishes. Furthermore, show that under change of gauge, $F \mapsto \Omega F \Omega^{-1}$.
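
For readers who want a numerical sanity check of this exercise, here is a minimal sketch in plain Python. The particular $SU(2)$-valued field `omega`, the two-dimensional base space, the step size `H`, and all helper names are illustrative assumptions, not part of the argument above. It builds $A_\mu = \Omega^{-1}\partial_\mu\Omega$ by central differences and evaluates $F_{01}$, which should vanish up to discretization error even though the commutator on its own does not.

```python
import cmath
import math

H = 1e-4  # finite-difference step (an assumed value)


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def combine(a, b, s=1.0):
    """Return a + s*b entrywise."""
    return [[a[i][j] + s * b[i][j] for j in range(2)] for i in range(2)]


def dagger(a):
    """Conjugate transpose; equals the inverse for a unitary matrix."""
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]


def max_abs(a):
    return max(abs(a[i][j]) for i in range(2) for j in range(2))


def omega(x, y):
    """Hypothetical smooth SU(2) field: exp(i*theta*sigma_3) exp(i*phi*sigma_1)."""
    theta = x * y + 0.3 * x
    phi = math.sin(x) + 0.5 * y
    u3 = [[cmath.exp(1j * theta), 0j], [0j, cmath.exp(-1j * theta)]]
    c, s = complex(math.cos(phi)), 1j * math.sin(phi)
    u1 = [[c, s], [s, c]]
    return matmul(u3, u1)


def partial(f, x, y, mu):
    """Central difference of a matrix-valued f(x, y) along direction mu (0 or 1)."""
    dx, dy = (H, 0.0) if mu == 0 else (0.0, H)
    diff = combine(f(x + dx, y + dy), f(x - dx, y - dy), -1.0)
    return [[entry / (2 * H) for entry in row] for row in diff]


def connection(mu):
    """Return the function (x, y) -> A_mu = Omega^{-1} d_mu Omega."""
    def a_mu(x, y):
        return matmul(dagger(omega(x, y)), partial(omega, x, y, mu))
    return a_mu


def curvature(x, y):
    """Return (F_01, [A_0, A_1]) at the point (x, y)."""
    a0, a1 = connection(0)(x, y), connection(1)(x, y)
    comm = combine(matmul(a0, a1), matmul(a1, a0), -1.0)
    d_a = combine(partial(connection(1), x, y, 0),
                  partial(connection(0), x, y, 1), -1.0)
    return combine(d_a, comm), comm


f01, comm = curvature(0.4, 0.7)
print("max |F_01|       =", max_abs(f01))
print("max |[A_0, A_1]| =", max_abs(comm))
```

The commutator is printed alongside $F_{01}$ to show that the cancellation is not trivial: $dA$ and $[A_0, A_1]$ are each non-zero for this field, and only their sum is consistent with zero.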

That the connection can be set to zero implies that $F$ (called the curvature form of $A$) vanishes.  The reverse is also true (at least locally), but that's considerably harder to show.
If $F$ doesn't vanish, then we need some way to determine what it should be. One way to do this is to make a scalar (density) out of $\mathbf F$ and use it as a Lagrangian density. Recall that $F$ has two spatial indices which need to be taken care of. Generically$^\dagger$, the simplest scalar one can make out of $\mathbf F$ is $-\frac{1}{4}\operatorname{Tr}(F^2)$, where $F^2=F_{\mu\nu}F^{\mu\nu}$ and the numerical factor is added for conventional reasons.

Note that $g^{\mu\nu}F_{\mu\nu}$ vanishes identically, so that's no good. However, $F^2 \equiv g^{\mu\alpha}g^{\nu \beta} F_{\mu \nu} F_{\alpha \beta}$ does not. But we must be careful - each $F_{\mu\nu}$ is a matrix. Written properly with all of the relevant indices,
$$F^2 = g^{\mu\alpha}g^{\nu\beta}\left( F^a_{\ \ b}\right)_{\mu\nu} \left( F^b_{\ \ c}\right)_{\alpha\beta} = (F^2)^a_{\ \ c}$$
We still need to get rid of those vector space indices, so we can just trace over them to get a real scalar:
$$\operatorname{Tr}(F^2) = (F^2)^a_{\ \ a} = g^{\mu\alpha}g^{\nu\beta}\left( F^a_{\ \ b}\right)_{\mu\nu} \left( F^b_{\ \ a}\right)_{\alpha\beta}$$

Demanding that the Lagrangian be gauge-invariant rules out terms like $A_\mu A^\mu$, which would give the $A$ fields a mass; as a result, all of the auxiliary fields obtained in this way are massless.
At the end of the day, the coupling to the matter in the theory arises naturally from the presence of the connection in the covariant derivative. The dynamics arise from using the simplest gauge-invariant scalar as a Lagrangian density.
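
To make the last two sentences concrete for $U(1)$, here is a short worked sketch. It assumes the question's notation $D_\mu = \partial_\mu + i q B_\mu$, the real field strength $f_{\mu\nu} \equiv \partial_\mu B_\nu - \partial_\nu B_\mu$, and the conventional normalization $-\frac14 f_{\mu\nu}f^{\mu\nu}$ for the kinetic term; the overall sign and constants relating this to $-\frac14\operatorname{Tr}(F^2)$ depend on conventions (for instance, whether $q$ is absorbed into the field).

$$
\begin{aligned}
\mathcal L &= \bar\psi\,(i\gamma^\mu D_\mu - m)\,\psi - \tfrac14 f_{\mu\nu}f^{\mu\nu}
= \bar\psi\,(i\gamma^\mu \partial_\mu - m)\,\psi - j^\mu B_\mu - \tfrac14 f_{\mu\nu}f^{\mu\nu},
\qquad j^\mu \equiv q\,\bar\psi\gamma^\mu\psi,\\
\frac{\partial \mathcal L}{\partial(\partial_\mu B_\nu)} &= -f^{\mu\nu},
\qquad
\frac{\partial \mathcal L}{\partial B_\nu} = -j^\nu,\\
\partial_\mu\!\left(\frac{\partial \mathcal L}{\partial(\partial_\mu B_\nu)}\right) - \frac{\partial \mathcal L}{\partial B_\nu} = 0
\;&\Longrightarrow\; \partial_\mu f^{\mu\nu} = j^\nu .
\end{aligned}
$$

The coupling term $-j^\mu B_\mu$ comes entirely from the covariant derivative, and the resulting field equation has the form of the inhomogeneous Maxwell equations. The homogeneous ones, $\partial_{[\lambda} f_{\mu\nu]} = 0$, hold identically because $f_{\mu\nu}$ is built from derivatives of $B_\mu$.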

Question 1. What is the physical reason why we should require a $U(1)$-gauge symmetry to be present for the wavefunction of a charged fermion?

There’s no particular reason why we should - other than that if we do, then electromagnetism falls into our lap. If we repeat the procedure with different gauge groups, we arrive at different theories, some of which seem to be manifested in reality and others which do not.

Question 2. After we impose this gauge symmetry on the Dirac Lagrangian, what is the physical justification that we should let the connection 1-form term be precisely the vector potential $A_\mu$, and not other terms from electromagnetism (say, $F_{\mu\nu}A^\nu$)?

I think you are imagining that $U(1)$ symmetry is imposed and subsequently married to electromagnetism, but that's not quite right. $U(1)$ symmetry is imposed and then becomes electromagnetism. It's not that we find ourselves searching for a connection 1-form and decide that it should be the vector potential; it's that if you follow the procedure above, then the connection 1-form automatically obeys the Maxwell equations and exerts a Lorentz force on charged matter.
In other words, you can call it what you like but the imposition of a local $U(1)$ symmetry requires the introduction of an auxiliary field which behaves exactly like the electromagnetic 4-potential and has exactly the same effect on matter.  If it walks like a duck and quacks like a duck...

$^\dagger$ A fun exception to this rule is if the dimension of the underlying space and the dimension of the vector space on which $G$ acts are the same; then the Greek and Latin indices run over precisely the same values. This is true when we consider $d$-dimensional spacetime and its $d$-dimensional tangent spaces, for example.
In that case, we can contract one Latin index with one Greek index in the expression $(F^a_{\ \ b})_{\mu\nu}$, and then contract the result with the metric. The only non-trivial way to do this is
$$ g^{\mu \nu} (F^a_{\ \ \mu})_{a \nu}$$
This term is linear in $F$ rather than quadratic, and appears in general relativity; the connection is the Christoffel connection, $F$ is the Riemann curvature tensor, and the above-obtained scalar is the Ricci scalar.
