Data Science: asked by Akira on August 8, 2021
I’m reading the paper Algorithms for nonnegative matrix factorization with the β-divergence by Cédric Févotte and Jérôme Idier. The scikit-learn package uses their algorithm in its sklearn.decomposition.NMF module (a minimal usage sketch appears at the end of this post). In Section 4.1, they write:
An MM algorithm can be derived by minimizing the auxiliary function $G(\mathbf{h} \mid \tilde{\mathbf{h}})$ w.r.t. $\mathbf{h}$. Given the convexity and the separability of the auxiliary function, the optimum is obtained by canceling the gradient given by Eq. (36). This is trivially done and leads to the following update:
$$
h_{k}^{\mathrm{MM}} = \tilde{h}_{k}\left(\frac{\sum_{f} w_{f k} v_{f} \tilde{v}_{f}^{\beta-2}}{\sum_{f} w_{f k} \tilde{v}_{f}^{\beta-1}}\right)^{\gamma(\beta)}.
$$
The gradient in Eq. (36) depends on our choice of the decomposition of the $\beta$-divergence into convex, concave, and constant parts. I don’t get how the authors obtain such an explicit formula for $h_{k}^{\mathrm{MM}}$ by canceling this gradient. Could you please elaborate on this issue?
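For the convex case $1 \le \beta \le 2$, where only Jensen's inequality is used and the concave part of the decomposition vanishes, I can sketch the step as follows (my own reconstruction, so the notation may not match Eq. (36) exactly). The derivative of the $\beta$-divergence with respect to its second argument is $\partial_y d(x \mid y) = y^{\beta-1} - x y^{\beta-2}$, so the gradient of the auxiliary function is
$$
\nabla_{h_k} G(\mathbf{h} \mid \tilde{\mathbf{h}}) = \sum_{f} w_{f k}\, \partial_y d\!\left(v_f \mid \tilde{v}_f \frac{h_k}{\tilde{h}_k}\right) = \left(\frac{h_k}{\tilde{h}_k}\right)^{\beta-1} \sum_{f} w_{f k} \tilde{v}_{f}^{\beta-1} - \left(\frac{h_k}{\tilde{h}_k}\right)^{\beta-2} \sum_{f} w_{f k} v_{f} \tilde{v}_{f}^{\beta-2}.
$$
Setting this to zero and solving for $h_k$ gives
$$
\frac{h_k}{\tilde{h}_k} = \frac{\sum_{f} w_{f k} v_{f} \tilde{v}_{f}^{\beta-2}}{\sum_{f} w_{f k} \tilde{v}_{f}^{\beta-1}},
$$
which is the stated update with $\gamma(\beta) = 1$. My understanding is that for $\beta < 1$ and $\beta > 2$ only the convex part of the decomposition keeps the ratio $h_k / \tilde{h}_k$, while the concave part is majorized by its tangent at $\tilde{\mathbf{h}}$, and solving the resulting equation produces the exponents $\gamma(\beta) = 1/(2-\beta)$ and $\gamma(\beta) = 1/(\beta-1)$, but I would like to see this spelled out.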
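For reference, this is the scikit-learn entry point I mean. The sketch below is my own illustration with arbitrary data and parameter values; the multiplicative-update solver is what selects these β-divergence updates.

```python
# Minimal illustration (arbitrary data and parameter values):
# scikit-learn's NMF with the multiplicative-update solver, which implements
# the beta-divergence updates of Fevotte & Idier.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.random((100, 40))           # nonnegative data matrix

model = NMF(
    n_components=5,                 # rank of the factorization V ~ W H
    solver="mu",                    # multiplicative updates (the MM algorithm)
    beta_loss="kullback-leibler",   # beta = 1; "itakura-saito" is beta = 0,
                                    # "frobenius" is beta = 2, or pass a float
    init="random",
    max_iter=500,
    random_state=0,
)
W = model.fit_transform(V)          # shape (100, 5), nonnegative
H = model.components_               # shape (5, 40), nonnegative
```

Note that any `beta_loss` other than `"frobenius"` requires `solver="mu"`; the coordinate-descent solver only handles the Frobenius case.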