# Is it required to sample the full reaction coordinate in umbrella sampling?

Matter Modeling Asked by BND on January 1, 2022

Below is the free energy profile obtained after a histogram analysis of an umbrella sampling simulation. Generally, one is interested in the free energy difference (highest energy minus lowest energy). Let’s assume that we have a good estimate of where the lowest and highest free energies lie along the reaction coordinate.

• Should one run an umbrella simulation on the entire reaction coordinate?
• Isn’t sampling the lowest and highest energy region enough to derive the free energy difference?

The short answer is: yes, you need to sample the whole reaction coordinate if you want accurate free energy values.

If you are simply interested in qualitative behaviour, then you might (key word being might) get away with fewer windows, as long as they are still distributed fairly uniformly, but I would only recommend this for a dry/test run and never for a production run. For a production run, you need to be absolutely sure you get good phase space overlap between all neighbouring distributions, or you will either get huge deviations from the correct free energy values or your analysis will crash. The reason for this is quite simple: the method most commonly used to analyse umbrella sampling data, the weighted histogram analysis method (WHAM), relies on data from all windows in order to calculate relative populations. If you have gaps in your phase space, these in turn create gaps in the PMF (potential of mean force). WHAM then cannot determine the relative heights of the segments on either side of a gap, meaning that it will either converge to a strange result or not converge at all.
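A quick way to spot such gaps before running WHAM is to histogram each window and measure how much neighbouring histograms overlap. The following is only a minimal sketch on synthetic Gaussian data: the function name `adjacent_overlaps`, the window centres, the width `sigma` and the 0.05 warning threshold are all illustrative assumptions, not values from the question or from any particular WHAM implementation.

```python
import numpy as np

def adjacent_overlaps(samples_per_window, bins=100):
    """Overlap coefficient (0 = disjoint, 1 = identical) for each pair of
    neighbouring windows. Windows must be ordered along the coordinate."""
    all_samples = np.concatenate(samples_per_window)
    # Use one common set of bin edges so the histograms are comparable.
    edges = np.linspace(all_samples.min(), all_samples.max(), bins + 1)
    hists = []
    for s in samples_per_window:
        counts, _ = np.histogram(s, bins=edges)
        hists.append(counts / counts.sum())  # normalise to probabilities
    # Shared probability mass of two histograms: sum of bin-wise minima.
    return [float(np.minimum(a, b).sum()) for a, b in zip(hists[:-1], hists[1:])]

# Synthetic stand-in for the sampled reaction coordinate in each window
# (hypothetical values; a window is deliberately missing between 0.7 and 1.0).
rng = np.random.default_rng(0)
sigma = 0.05                          # assumed width of each window, in nm
centers = [0.5, 0.6, 0.7, 1.0, 1.1]   # assumed window centres, in nm
windows = [rng.normal(c, sigma, 5000) for c in centers]

overlaps = adjacent_overlaps(windows)
threshold = 0.05                      # arbitrary warning level, tune as needed
for left, right, o in zip(centers[:-1], centers[1:], overlaps):
    flag = "  <-- poor overlap, add a window here" if o < threshold else ""
    print(f"{left:.1f} nm / {right:.1f} nm: overlap = {o:.3f}{flag}")
```

With real data you would replace `windows` with the reaction coordinate time series from each umbrella window. A pair flagged here is where the PMF segments cannot be reliably connected.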

As a final note, I would say that one of the biggest disadvantages of umbrella sampling (apart from the cost) is that its efficiency depends directly on the shape of the PMF, bearing in mind that the WHAM estimator is exponentially sensitive to the phase space overlap. For example, if you have a very high kinetic barrier, you will need to concentrate more windows around this point in order to get some overlap. This means that while you don't "need to know the answer to get the answer", you do "need to know the answer to be certain of your convergence". In other words, you might have to add extra windows after your analysis to make sure your distributions overlap, and then check that the extra windows don't change your results, because they shouldn't.
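To see why the shape of the PMF matters, here is a rough sketch that assumes a harmonic bias $w_i(\xi) = \tfrac{k}{2}(\xi - \xi_i)^2$ in window $i$ and a PMF $F(\xi)$ that is well described by a second-order expansion around the window centre $\xi_i$ (the symbols $k$, $\xi_i$, $F'$ and $F''$ are my notation, not the tutorial's). The biased distribution in that window is

$$
p_i^{b}(\xi) \propto \exp\left\{-\beta\left[F(\xi) + \tfrac{k}{2}(\xi - \xi_i)^2\right]\right\}
\approx \exp\left\{-\beta\left[F'(\xi_i)(\xi - \xi_i) + \tfrac{1}{2}\left(k + F''(\xi_i)\right)(\xi - \xi_i)^2\right]\right\},
$$

which, provided $k + F''(\xi_i) > 0$, is a Gaussian with

$$
\langle \xi \rangle_i \approx \xi_i - \frac{F'(\xi_i)}{k + F''(\xi_i)},
\qquad
\sigma_i^2 \approx \frac{k_B T}{k + F''(\xi_i)}.
$$

On the steep flanks of a barrier, $F'$ is large, so the sampled distribution is pushed away from the nominal centre $\xi_i$ and gaps open up between neighbouring windows. Near the top of the barrier, $F'' < 0$, so $k$ has to exceed $|F''|$ for the window to stay localised at all. A stiffer spring fixes both problems, but it also narrows each distribution to roughly $\sqrt{k_B T / k}$, which is why more windows are needed in exactly that region.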

Edit: I just noticed that the artifact at around 0.8 nm can be readily explained by the lack of phase space overlap, so this gives you quite a good idea of what to expect when you have insufficient data. The authors of the tutorial also seem to acknowledge this, and they state that you should always make sure you have adequate overlap.

Answered by Godzilla on January 1, 2022