How to get the maximum likelihood estimate of the categorical distribution parameters using Lagrange optimization?

Question

Let's say our data is discrete-valued and belongs to one of $K$ classes.
The underlying probability distribution is assumed to be a categorical (multinoulli) distribution, $p(\mathbf{x}) = \prod_{k = 1}^{K} \mu_{k}^{x_{k}}$, where $\mathbf{x} = [x_{1}\ x_{2}\ \dots\ x_{K}]^{T}$ is a one-hot vector and $\boldsymbol{\mu} = [\mu_{1}\ \dots\ \mu_{K}]^{T}$ are the parameters.
Suppose $D = \{\mathbf{x}_{1}, \mathbf{x}_{2}, \dots, \mathbf{x}_{N}\}$ is our data.

The log-likelihood is:
$\log p(D \mid \boldsymbol{\mu}) = \sum_{k = 1}^{K} m_{k} \log \mu_{k}$
where $m_{k} = \sum_{n = 1}^{N} x_{nk}$.
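
As a concrete illustration of these quantities, here is a minimal Python sketch that computes the counts $m_k$ and the log-likelihood from one-hot data. The function names `class_counts` and `log_likelihood` and the toy data are assumptions made for illustration, not part of the original setup.

```python
import math

def class_counts(data):
    """Return m_k = sum_n x_nk for a list of one-hot vectors."""
    K = len(data[0])
    return [sum(x[k] for x in data) for k in range(K)]

def log_likelihood(counts, mu):
    """Return sum_k m_k * log(mu_k).

    Classes with m_k = 0 are skipped (0 * log(mu_k) is treated as 0).
    math.log raises ValueError if an observed class has mu_k = 0.
    """
    return sum(m * math.log(p) for m, p in zip(counts, mu) if m > 0)

# Toy data (hypothetical): N = 5 one-hot observations over K = 3 classes.
D = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [1, 0, 0]]
m = class_counts(D)
print(m)  # [3, 1, 1]
print(log_likelihood(m, [0.5, 0.25, 0.25]))
```

The log-likelihood depends on the data only through the counts, so the individual one-hot vectors are not needed after this step.
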
To get the MLE solution, we have to solve the following optimization problem:
$\max_{\boldsymbol{\mu}} \sum_{k = 1}^{K} m_{k} \log \mu_{k} \quad \text{such that} \quad \mu_{k} \geq 0, \; \sum_{k = 1}^{K} \mu_{k} = 1$
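
To make the constrained problem concrete, the sketch below evaluates the objective at randomly drawn feasible points and compares them with the normalized counts $m_k / N$, which is the well-known closed-form maximizer. It is only a numerical sanity check under assumed toy counts, not the Lagrangian derivation asked about here; `objective`, `random_simplex_point`, and the counts are hypothetical.

```python
import math
import random

def objective(counts, mu):
    """Log-likelihood sum_k m_k * log(mu_k); -inf if an observed class has mu_k = 0."""
    total = 0.0
    for m, p in zip(counts, mu):
        if m > 0:
            if p <= 0.0:
                return float("-inf")
            total += m * math.log(p)
    return total

def random_simplex_point(K, rng):
    """Draw a feasible point (mu_k >= 0, sum_k mu_k = 1) by normalizing exponential draws."""
    draws = [rng.expovariate(1.0) for _ in range(K)]
    s = sum(draws)
    return [d / s for d in draws]

counts = [3, 1, 1]                       # hypothetical counts m_k
N = sum(counts)
candidate = [m / N for m in counts]      # normalized counts m_k / N

rng = random.Random(0)                   # fixed seed for reproducibility
best_random = max(
    objective(counts, random_simplex_point(len(counts), rng))
    for _ in range(10000)
)

# Check that no sampled feasible point beats the normalized counts.
print(objective(counts, candidate) >= best_random)
```

A random search like this cannot prove optimality; it only shows that no sampled feasible point does better than the normalized counts.
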
To solve this, we rewrite it as a minimization of the negative log-likelihood and form the following Lagrangian:
$L(\boldsymbol{\mu}, \mathbf{u}, v) = -\sum_{k = 1}^{K} m_{k} \log \mu_{k} - \sum_{k = 1}^{K} u_{k}\mu_{k} + v\left( \sum_{k = 1}^{K}\mu_{k} - 1\right)$
The primal problem formulation is then
$\hat{\boldsymbol{\mu}} = \arg\inf_{\boldsymbol{\mu}} \sup_{u_{k} \geq 0,\, v} L(\boldsymbol{\mu}, \mathbf{u}, v)$
I don't know how to proceed from here. How do I solve this primal problem?
