
What is a distribution-wise asymmetric measure?

Data Science Asked by InAFlash on May 22, 2021

I was trying to understand the KL-divergence, $$D_{KL}\langle P(X) \Vert P(Y) \rangle,$$ and was going through its Wikipedia article. It says the following:

In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric of spread – it also does not satisfy the triangle inequality.

What is the meaning of distribution-wise asymmetric measure? Is there a symmetric measure? What are the rules that a quantity should follow to be qualified as a statistical metric of spread?

One Answer

What is the meaning of distribution-wise asymmetric measure?

The (forward) KL-divergence is distribution-wise asymmetric: if you calculate it as $$D_{KL}\langle P(X) \Vert P(Y) \rangle,$$ where $P(X)$ and $P(Y)$ are two different probability distributions with the latter being the reference distribution, then in general $$D_{KL}\langle P(Y) \Vert P(X) \rangle \neq D_{KL}\langle P(X) \Vert P(Y) \rangle.$$ In other words, the reverse KL-divergence is not equal to the forward KL-divergence. If the KL-divergence were symmetric, the two quantities above would always be equal.
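
As a quick numerical check (a minimal sketch, assuming two made-up discrete distributions over three outcomes; scipy.stats.entropy(pk, qk) computes the KL-divergence of pk from qk):

```python
import numpy as np
from scipy.stats import entropy  # entropy(pk, qk) = D_KL(pk || qk), in nats by default

# Two hypothetical discrete distributions over the same three outcomes
p = np.array([0.6, 0.3, 0.1])
q = np.array([0.2, 0.5, 0.3])

forward_kl = entropy(p, q)  # D_KL(P || Q)
reverse_kl = entropy(q, p)  # D_KL(Q || P)

# The two values differ (roughly 0.40 vs 0.37 nats), illustrating the asymmetry
print(forward_kl, reverse_kl)
```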

Is there a symmetric measure?

A distribution-wise symmetric measure would, for example, be mutual information:

$$I(X;Y) = H(X)+H(Y)-H(X,Y) = D_{KL}\langle P(X,Y) \Vert P(X) \cdot P(Y) \rangle,$$ where $H(X)$ is the entropy of the variable $X$'s probability distribution, since $I(Y;X) = I(X;Y)$. Mutual information is a special case of the KL-divergence in which the joint distribution is measured against the product of the marginal distributions.
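
Both identities can be checked numerically. A minimal sketch, assuming a hypothetical 2x2 joint distribution (the probabilities are made up for illustration):

```python
import numpy as np
from scipy.stats import entropy

# Hypothetical 2x2 joint distribution P(X, Y)
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])
px = joint.sum(axis=1)  # marginal P(X)
py = joint.sum(axis=0)  # marginal P(Y)

# I(X;Y) = H(X) + H(Y) - H(X,Y)
mi_entropies = entropy(px) + entropy(py) - entropy(joint.ravel())

# I(X;Y) = D_KL( P(X,Y) || P(X) * P(Y) )
mi_kl = entropy(joint.ravel(), np.outer(px, py).ravel())

print(np.isclose(mi_entropies, mi_kl))  # True: both formulas give the same value

# Symmetry: swapping the roles of X and Y (transposing the joint) leaves I unchanged
mi_swapped = entropy(py) + entropy(px) - entropy(joint.T.ravel())
print(np.isclose(mi_entropies, mi_swapped))  # True
```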

What are the rules that a quantity should follow to be qualified as a statistical metric of spread?

The three axioms that a distance metric $d$ should meet (written out formally after the list) are:

  1. Identity of indiscernibles
  2. Symmetry
  3. Sub-additivity or triangle inequality
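
For a candidate distance $d$ and arbitrary points $x$, $y$, $z$, these read (a formal restatement, not part of the original answer):

$$d(x,y) = 0 \iff x = y, \qquad d(x,y) = d(y,x), \qquad d(x,z) \leq d(x,y) + d(y,z).$$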

Since mutual information does not obey the triangle inequality, it does not meet the full criteria for being a distance metric. Variation of information, on the other hand, does satisfy all of the above requirements and is a true metric:

$$VI(X;Y) = H(X,Y) - I(X;Y)$$ where $H(X,Y)$ is the joint entropy.
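
A minimal sketch of computing the variation of information from a discrete joint distribution, reusing the hypothetical 2x2 joint from the mutual-information example above:

```python
import numpy as np
from scipy.stats import entropy

def variation_of_information(joint):
    """VI(X;Y) = H(X,Y) - I(X;Y), computed from a discrete joint distribution."""
    px, py = joint.sum(axis=1), joint.sum(axis=0)
    h_joint = entropy(joint.ravel())          # H(X,Y)
    mi = entropy(px) + entropy(py) - h_joint  # I(X;Y)
    return h_joint - mi

# Hypothetical joint distribution P(X, Y) from the previous example
joint = np.array([[0.30, 0.10],
                  [0.15, 0.45]])

print(variation_of_information(joint))    # non-negative
print(variation_of_information(joint.T))  # same value: VI is symmetric

# A variable paired with itself has a diagonal joint distribution, and VI(X;X) = 0
px = joint.sum(axis=1)
print(variation_of_information(np.diag(px)))  # ~0 (identity of indiscernibles)
```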

Correct answer by develarist on May 22, 2021
