Data Science Asked by Vijetha Gattupalli on January 7, 2021
Given two sets of samples drawn from two different distributions, is it computationally feasible to estimate the KL divergence between the two distributions from these samples?
Here I am assuming the two distributions are high-dimensional (say, of dimension $d$). To compute the estimate, we first need to discretize the entire space and then estimate probabilities from the bin frequencies. Suppose we discretize each dimension into $p$ bins. Then the total number of grid cells in the space is $p^d$. So we need to estimate the probabilities of both distributions over $p^d$ cells, which takes time exponential in $d$. Hence I assume we cannot compute an estimate of the KL divergence from samples for any practical problem.
I wanted to check whether this reasoning is correct or whether I am missing anything. Could someone confirm?
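To make the $p^d$ count concrete, here is a small sketch. The values $p = 10$ and a sample size of one million are illustrative assumptions, and `grid_cells` is a hypothetical helper name. It prints how many cells the grid has and how many samples would land in each cell on average.

```python
def grid_cells(p: int, d: int) -> int:
    """Number of cells when each of d dimensions is split into p bins."""
    return p ** d


n_samples = 1_000_000  # assumed sample size, for illustration only
p = 10                 # assumed number of bins per dimension

for d in (2, 5, 10, 20):
    cells = grid_cells(p, d)
    # Average samples per cell; far below 1 means most cells are empty.
    print(f"d={d:2d}  cells={cells:.3e}  samples per cell={n_samples / cells:.3e}")
```

Once the average count per cell falls well below one, almost every cell is empty, so frequency-based probability estimates become unreliable regardless of the time it takes to compute them.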
Check this article. They use k-NN to interpolate the values of P(x) and Q(x), so that you can use the KL-divergence formula with 'approximated histograms'.
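A minimal sketch of one k-NN-based estimator follows. It is not necessarily the method in the linked article: it is the commonly cited nearest-neighbour distance-ratio form, which avoids any grid. The function names, the brute-force neighbour search, and the Gaussian test data are assumptions made for illustration. It also assumes the samples contain no duplicate points, since a zero nearest-neighbour distance would break the logarithm.

```python
import math
import random


def kth_nn_distance(point, data, k, skip_self=False):
    """Euclidean distance from `point` to its k-th nearest neighbour in `data`."""
    dists = sorted(math.dist(point, other) for other in data)
    # If `point` is itself in `data`, dists[0] is the zero self-distance,
    # so the k-th genuine neighbour sits at index k instead of k - 1.
    return dists[k] if skip_self else dists[k - 1]


def knn_kl_divergence(x, y, k=1):
    """Estimate D_KL(P || Q) in nats from samples x ~ P and y ~ Q."""
    n, m, d = len(x), len(y), len(x[0])
    total = 0.0
    for xi in x:
        rho = kth_nn_distance(xi, x, k, skip_self=True)  # distance within x
        nu = kth_nn_distance(xi, y, k)                   # distance into y
        total += math.log(nu / rho)
    return d * total / n + math.log(m / (n - 1))


def gaussian_sample(rng, n, mean):
    """n points from an isotropic unit-variance Gaussian centred at `mean`."""
    return [[rng.gauss(mu, 1.0) for mu in mean] for _ in range(n)]


rng = random.Random(0)
x = gaussian_sample(rng, 200, [0.0, 0.0, 0.0])
y_same = gaussian_sample(rng, 200, [0.0, 0.0, 0.0])
y_shifted = gaussian_sample(rng, 200, [2.0, 2.0, 2.0])

kl_same = knn_kl_divergence(x, y_same)     # true value is 0
kl_diff = knn_kl_divergence(x, y_shifted)  # true value is 0.5 * 12 = 6
print(kl_same, kl_diff)
```

The brute-force search costs on the order of $n(n + m)d$ operations, which is polynomial in the dimension rather than exponential. For larger samples, a spatial index such as a KD-tree would be the usual replacement for the sorted-distance search.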
Answered by Carlos Pinzón on January 7, 2021
There is no need to discretize the space since KL divergence can be calculated for continuous spaces.
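For reference, the standard definition for continuous distributions with densities $p$ and $q$ (notation introduced here, not taken from the question) is:

$$D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx$$

This is an expectation of $\log \frac{p(x)}{q(x)}$ under $P$, so it needs no binning, but evaluating it from samples still requires some estimate of the density ratio.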
Yes - you can calculate the difference between samples using KL divergence.
Estimating a possible difference between populations from the differences between their samples is the core of statistical inference, and it is a complex problem.
Answered by Brian Spiering on January 7, 2021