What is the difference between maximum likelihood hypothesis and maximum a posteriori hypothesis?

Question

I am a student and I am studying machine learning. I am focusing on the concept of Bayesian learning and I have studied the maximum likelihood hypothesis and the maximum a posteriori hypothesis.
I have seen that the maximum likelihood hypothesis  is the hyposesis that maximizes the likelihood of seeng the data, and it is defined as:
$h_{ML}=arg_h max P(D|h)$
while maximum a posteriori hypothesis is the hypothesis that maximizes the posterir probability of seeng the data, and it is defined as:
$h_{MAP}=arg_h max P(D|h)P(h)$
I am really confused by these two definitions, since I can't grasp what is the difference between the two.
I have understood that the maximum likelihood hypothesis is the one that, given some observed data, finds the parameters of the distribution such that I am most likly to understand the data.
But I cannot unserstand what is the MAP hypothesis.
I have tried to read some interpretations and definitions, mìbut I can't understand the difference between the two.
So, what is the difference between maximum likelihood hypothesis and maximum a posteriori hypothesis?

Brandon Donehoo · Accepted Answer

I'll try and provide some intuition for you here, instead of focusing on the mechanics of the math behind the methods.
Imagine you are evaluating whether a coin is fair or not, so you collect a sequence of heads and tails as your data set. In MLE, we simply look at the data we collected and find the maximum likelihood... this works well when we have no prior knowledge to leverage (i.e. we have no idea if the coin is fair or not).
By contrast, in MAP we take the same likelihood we used in MLE but now multiply by our prior knowledge. For instance, we may strongly suspect that our coin is biased and so we can influence our estimate with that knowledge through a prior distribution. This new estimate is a mixture of what we believe (our prior) and what we measured (our likelihood).
Thinking of two extreme cases here would be 1) if we very strongly believe in our prior then we would need to collect a lot of data to influence the resulting estimate away from the prior. Conversely, if we know very little up front (i.e. we have an uninformative prior) then finding the MAP estimate is equivalent to the MLE estimate because our prior did not influence the result.
Perhaps a visual representation can help too:

MLE is finding the maximum of the green curve.
MAP is 1) multiplying the blue curve by the green curve to create the red curve and 2) finding the maximum of the newly created red curve.

What is the difference between maximum likelihood hypothesis and maximum a posteriori hypothesis?

One Answer

Add your own answers!

Ask a Question