Why does Hartree-Fock work so well?

Question

Why does the Hartree-Fock method for electronic structure work so well for atoms?
More specifically, why is the "correlation energy" a relatively small component of an atom's (ground state) energy?
I might also ask why electron-electron interaction appears to be such a small effect. For example:

The true ground state energy of the helium atom is −2.903 hartree
The energy of the Hartree-Fock solution is −2.862 hartree
The energy of the wavefunction obtained by ignoring electron-electron interaction altogether (that is, the expectation value of the full Hamiltonian in that wavefunction) is −2.750 hartree

(See F. W. Byron, Jr. and C. J. Joachain, Phys. Rev. 146, 1.)
If we quantify the "effect" of different interactions using the interaction's contribution to the ground state energy, we find

electron-electron interaction (−2.750 − (−2.903) = 0.153) is almost 20x smaller than electron-nucleus interaction (2.903), and
electron correlation (−2.862 − (−2.903) = 0.041) is 70x smaller than electron-nucleus interaction, and almost 4x smaller than electronic mean field interaction (0.153).
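The arithmetic behind these ratios can be checked with a few lines of Python. This is only a sketch that reuses the three energies quoted above; the variable names are made up for illustration.

```python
# Helium energies quoted in the list above, in hartree.
E_EXACT = -2.903  # true ground state energy
E_HF = -2.862     # Hartree-Fock energy
E_THIRD = -2.750  # third value in the list above

# Contributions as defined in the question (differences of the quoted energies).
e_nucleus = abs(E_EXACT)       # used as the size of the electron-nucleus effect
e_mean_field = E_THIRD - E_EXACT  # the 0.153 figure
e_correlation = E_HF - E_EXACT    # the 0.041 figure

print(f"0.153 figure: {e_mean_field:.3f}")
print(f"0.041 figure: {e_correlation:.3f}")
print(f"nucleus / 0.153 figure: {e_nucleus / e_mean_field:.1f}")
print(f"nucleus / correlation:  {e_nucleus / e_correlation:.1f}")
print(f"0.153 figure / correlation: {e_mean_field / e_correlation:.1f}")
```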

Given the basic ingredients for atomic physics (Coulomb interaction, Schrödinger equation, Pauli exclusion), it does not seem like there is any reason a priori that one electron in a helium atom should be so indifferent to the location of the other.
What reasons (other than desperation) did theorists have to expect the mean-field approximation to yield meaningful results?
Were they based on experimental observations or some deeper physical argument?

Chiral Anomaly · Accepted Answer

I don't know why (or if) people originally expected Hartree-Fock to work as well as it does, but after thinking about it for a while, I'm personally a little less surprised by it. It seems surprising at first because most wavefunctions are not Slater determinants, but in view of the constraints that I'll derive below, finding any wavefunction that does better than the best Slater determinant seems challenging. "Seems challenging" could be due to my lack of imagination, but it still makes me a little less surprised by how well a single Slater determinant works.

Conventional formulation of the question

Consider a model of $N$ non-relativistic spin-$1/2$ electrons. The Hamiltonian is $$ H = K + V + W \tag{1} $$ where

$K$ is the kinetic-energy term,
$V$ is the attractive Coulomb interaction of each electron with the fixed nucleus,
$W$ is the repulsive Coulomb interaction between electrons.

The Hilbert space $\mathcal{H}$ consists of all antisymmetric wavefunctions $\psi$, which means that $\psi$ changes sign whenever two of the $N$ location-and-spin arguments are exchanged. A wavefunction is called a Slater determinant if it can be written as the antisymmetrized product of $N$ single-electron wavefunctions. Let $\mathcal{S}\subset\mathcal{H}$ be the set of Slater determinants.

Let $\psi(A)$ denote the expectation value of an operator $A$ with respect to a wavefunction $\psi$. The true ground state is the wavefunction $\psi\in\mathcal{H}$ that minimizes the quantity $\psi(H)$, which is then the energy $E_0$ of the ground state: $$ E_0\equiv \min_{\psi\in\mathcal{H}}\psi(H). \tag{2} $$ The Hartree-Fock method uses the minimum of $\psi(H)$ among all Slater determinants $\psi\in\mathcal{S}\subset\mathcal{H}$ as an approximation to $E_0$: $$ E_\text{HF}\equiv\min_{\psi\in\mathcal{S}}\psi(H). \tag{3} $$ The subscript HF stands for Hartree-Fock. The question is: why is (3) such a good approximation to (2)?

Dilating the wavefunction

What properties of the terms $K$, $V$, and $W$ might be important?

The signs are important. For any state $\psi$, the quantities $\psi(K)$ and $\psi(W)$ are always positive, whereas the quantity $\psi(V)$ is always negative.
The spatial scale is important. Consider the quantities $\psi(K)$, $\psi(V)$, and $\psi(W)$, for any wavefunction $\psi$. We can make the magnitudes of all of these quantities smaller by dilating $\psi$ in space. This makes $\psi(K)$ smaller because it reduces the electrons' momenta (by reducing the magnitude of the wavefunction's gradients), and it makes the magnitudes of $\psi(V)$ and $\psi(W)$ smaller by moving the electrons farther away from each other and from the nucleus. This assumes that the system is an atom, with one nucleus at the origin, and the dilation (or dilatation?) leaves the origin fixed.

We can make this more specific. Let $\psi_\lambda$ be the wavefunction obtained by applying a spatial scale factor $\lambda>0$, where $\lambda<1$ dilates the wavefunction, and $\lambda>1$ compresses the wavefunction. The kinetic term $K$ scales like the gradient-squared (because momentum $\sim$ gradient), and the Coulomb interaction scales like $r^{-1}$ where $r$ is the distance between charges. Therefore, \begin{align} \psi_\lambda(K) &= \lambda^2\psi(K) \\ \psi_\lambda(V) &= \lambda\psi(V) \\ \psi_\lambda(W) &= \lambda\psi(W). \tag{5} \end{align} Combine these to get $$ \psi_\lambda(H) = \lambda^2\psi(K) + \lambda\big(\psi(V)+\psi(W)\big). \tag{6} $$ Now, for any given $\psi$, let $\Lambda$ denote the value of $\lambda$ that minimizes the quantity (6). By taking the derivative of (6) with respect to $\lambda$ and requiring that the result be equal to zero, we get $$ \Lambda = -\frac{\psi(V)+\psi(W)}{2\psi(K)}. \tag{7} $$ By construction, $\psi_\Lambda(H)$ is the minimum possible expectation value of $H$ among all wavefunctions that can be obtained from $\psi_\Lambda$ by rescaling in space, so if the original wavefunction happened to already be $\psi=\psi_\Lambda$, then we would get $\Lambda=1$. This implies $$ \psi_\Lambda(V)+\psi_\Lambda(W) = -2\psi_\Lambda(K). \tag{8} $$ This looks like the virial theorem, and it implies $$ \psi_\Lambda(H) = -\psi_\Lambda(K) < 0. \tag{9} $$ This is the minimum expectation value of $H$ that can be achieved by dilating (or compressing) the wavefunction in space, starting with an arbitrary wavefunction $\psi$.
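To see the scaling argument with numbers, here is a small Python sketch. It is not part of the original argument: it assumes the standard textbook expectation values for helium in the product wavefunction $e^{-(r_1+r_2)}$ (both electrons in a hydrogen-like 1s orbital with unit exponent, in hartree atomic units), namely $\psi(K)=1$, $\psi(V)=-4$, $\psi(W)=5/8$. The function name `optimal_scale` is made up for illustration.

```python
def optimal_scale(k, v, w):
    """Minimize E(lam) = lam**2 * k + lam * (v + w) over lam > 0.

    k, v, w are the expectation values of K, V, W in the unscaled state.
    Returns (Lambda, minimized energy), following equations (6) and (7).
    """
    if k <= 0:
        raise ValueError("kinetic expectation value must be positive")
    lam = -(v + w) / (2.0 * k)           # equation (7)
    energy = lam**2 * k + lam * (v + w)  # equation (6) evaluated at Lambda
    return lam, energy


# Assumed textbook inputs for helium with the trial state exp(-(r1 + r2)).
K, V, W = 1.0, -4.0, 0.625

lam, e_min = optimal_scale(K, V, W)

# Expectation values in the rescaled state, using the scaling laws (5).
k_opt, v_opt, w_opt = lam**2 * K, lam * V, lam * W

print(f"Lambda = {lam}")
print(f"minimized energy = {e_min} hartree")
print(f"virial check: V + W = {v_opt + w_opt}, -2K = {-2.0 * k_opt}")
```

With these inputs the sketch gives $\Lambda = 27/16$ and an energy of about $-2.848$ hartree, and the rescaled state satisfies (8) and (9). Since the rescaled product state is still a single Slater determinant, this energy cannot lie below $E_\text{HF}$, which is consistent with the $-2.862$ hartree quoted in the question.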

How good can a single Slater determinant be?

A Slater determinant is an antisymmetrized product of single-electron wavefunctions, which I'll call orbitals. What might the optimal Slater determinant look like?

Start with some generic Slater determinant whose overall scale has already been optimized as described above. To try to reduce the energy further, we could:

Increase $|\psi(V)|$ by squeezing one or more orbitals closer to the nucleus.
Decrease $\psi(W)$ by moving two or more orbitals farther away from each other.

Consider how we might move two orbitals farther away from each other without also moving them farther from the nucleus (which would be counterproductive). One way to do this is to concentrate one of the orbitals on one side of the nucleus and to concentrate the other one on the other side. We can do this without moving either one farther from the nucleus. Therefore, this should decrease $\psi(W)$ without changing $\psi(V)$. On the other hand, it will increase $\psi(K)$, because now each orbital is concentrated in a smaller volume (which forces the momenta to be larger). After making such a change, we might be able to optimize its effect a little more by adjusting the overall scale as explained above. If the net effect of these changes is to reduce the energy overall, then the optimal Slater determinant must already exploit something like this.

Such a configuration seems asymmetric, but that's not necessarily a problem. Even if we expect the true ground state to have some special symmetry, the optimal Slater determinant does not necessarily need to have that same symmetry. It only needs to belong to a family of equally-optimal Slater determinants that collectively have that symmetry. Given one member of that family, we can average over rotations to construct a more symmetric state (which will no longer be a single Slater determinant), which may then be a better approximation to the true ground state, but the intuition described below suggests that this averaging might not change the energy much.

How much better can a superposition of Slater determinants be?

Despite the compact notation, an expectation value $\psi(\cdots)$ is quadratic in the wavefunction. Using bra-ket notation, we can write $$ \psi(\cdots)\equiv\frac{\langle\psi|\cdots|\psi\rangle}{\langle\psi|\psi\rangle}. \tag{11} $$ To do better than $|\psi_\text{HF}\rangle$, we need to consider wavefunctions that are not Slater determinants. Any $N$-electron wavefunction can be written as a linear combination of Slater determinants, so we can think of a general wavefunction as a linear combination of different $N$-orbital configurations. To make the intuition easier, we can avoid having a variable number of terms in the superposition by choosing a fixed number $J$ and writing $$ |\psi_\text{HF}\rangle = \frac{1}{J}\sum_{j=1}^J |\psi_\text{HF}\rangle \tag{12} $$ so that the optimal Slater determinant $|\psi_\text{HF}\rangle$ is expressed as a superposition of $J$ (identical) terms. Then we can consider how we might vary different terms in the superposition in different ways in order to do better than a single Slater determinant.

Since $|\psi_\text{HF}\rangle$ is already the optimal Slater determinant, any changes we make to the terms in (12) will necessarily make the "diagonal" terms worse, meaning that the contribution of the diagonal terms to the energy will increase. Whatever happens in the cross-terms must overcompensate for the degradation in the diagonal terms. I don't know how to orchestrate different changes in different terms of (12) to accomplish this feat. At the very least, this is yet another factor that limits our ability to improve on $\psi_\text{HF}$.

One way to change the cross-terms without changing the diagonal terms is to average over rotations, as described above, assuming that the optimal Slater determinant is not rotationally symmetric. Suppose this helps. How much might it help? Even though the Hilbert space is infinite-dimensional, the set of states below a given energy and localized within a given finite region is essentially finite-dimensional. And in the present case, where $\psi_\text{HF}$ is already squeezing things about as tightly as Pauli exclusion allows, finite-dimensional might mean not-very-many-dimensional. This limits the number of different terms in (12) that can be orthogonal to each other. In other words, even though we may consider a large number of terms in (12), many of them won't differ much, so their cross-terms will look essentially like diagonal terms, which means that those cross-terms can't help decrease the overall energy significantly, because we've already chosen $\psi_\text{HF}$ to optimize the diagonal terms.
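A toy model, which is my own illustration and not a model of any atom, shows how the gain from cross-terms shrinks as the terms become more alike. Assume a two-dimensional state space with $H=\mathrm{diag}(e_0,e_1)$ and two normalized states $a=(\cos\phi,\sin\phi)$ and $b=(\cos\phi,-\sin\phi)$ that have the same diagonal energy. All names and parameter values below are hypothetical.

```python
import math


def toy_gain(phi, e0=0.0, e1=1.0):
    """Return (overlap, energy gain) for the superposition a + b.

    a = (cos phi, sin phi), b = (cos phi, -sin phi), H = diag(e0, e1).
    The gain is the diagonal energy <a|H|a> minus the energy of a + b.
    """
    c, s = math.cos(phi), math.sin(phi)
    diag = e0 * c * c + e1 * s * s   # <a|H|a> = <b|H|b>
    cross = e0 * c * c - e1 * s * s  # <a|H|b>
    overlap = c * c - s * s          # <a|b>
    # Expectation value of H in a + b: diagonal plus cross terms, normalized.
    superposed = (2.0 * diag + 2.0 * cross) / (2.0 + 2.0 * overlap)
    return overlap, diag - superposed


for phi in (0.5, 0.2, 0.05):
    overlap, gain = toy_gain(phi)
    print(f"phi = {phi}: overlap = {overlap:.4f}, gain = {gain:.6f}")
```

In this toy model the gain works out to $(e_1-e_0)(1-\langle a|b\rangle)/2$, so it vanishes as the two terms approach each other. That matches the intuition above, but it says nothing quantitative about real atoms.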

That's nowhere near being a proof that Hartree-Fock is a good approximation, but it makes me less surprised that $\psi_\text{HF}(H)$ is already close to optimal for some atoms. A few examples are tabulated here.

As a side note: a wavefunction that gives a good approximation to the energy of the ground state is not necessarily such a good approximation to the true ground state wavefunction itself. The fractional error in the former tends to be of order $\epsilon^2$ whenever the error in the latter is of order $\epsilon$. This general result is derived in the first section of Goodisman and Klemperer, "On errors in Hartree-Fock calculations," The Journal of Chemical Physics 38, 721 (1963).
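The quadratic relation is easy to see in a two-level toy model. This sketch is my own illustration, not taken from the cited paper. It assumes an exact ground state $|0\rangle$ with energy $e_0$, an excited state $|1\rangle$ with energy $e_1$, and a normalized trial state $\cos\epsilon\,|0\rangle+\sin\epsilon\,|1\rangle$, whose distance from $|0\rangle$ is $2\sin(\epsilon/2)\approx\epsilon$.

```python
import math


def trial_energy(eps, e0=-1.0, e1=0.0):
    """Energy of the normalized trial state cos(eps)|0> + sin(eps)|1>."""
    return e0 * math.cos(eps) ** 2 + e1 * math.sin(eps) ** 2


E0 = -1.0  # exact ground state energy in this toy model (hypothetical value)

for eps in (0.1, 0.01, 0.001):
    wavefunction_error = 2.0 * math.sin(eps / 2.0)  # norm of trial minus exact
    energy_error = trial_energy(eps) - E0           # equals sin(eps)**2 here
    ratio = energy_error / eps**2
    print(f"eps = {eps}: wavefunction error = {wavefunction_error:.6f}, "
          f"energy error = {energy_error:.3e}, ratio to eps^2 = {ratio:.4f}")
```

The ratio of the energy error to $\epsilon^2$ stays close to $e_1-e_0$ as $\epsilon$ shrinks, while the wavefunction error shrinks only linearly.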
