Data Science Asked on June 23, 2021
Is there any index that measures the similarity between two Gaussian distributions of 1-D data (possibly with slightly different numbers of points), taking into account their mean shift, variance shift, differences in shape (e.g. one is symmetric and the other is skewed), etc., and that gives a similarity in [0, 1]?
I am using Hedges' index for this, but it does not give a similarity index between 0 and 1. It can also be greater than 1, so it is difficult to interpret.
Also, nothing is known about the pattern of the data beforehand, in case that helps with the answer.
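One method is the Kolmogorov-Smirnov test. The two-sample Kolmogorov-Smirnov test checks whether two samples are drawn from the same continuous distribution, and the sample sizes can be different. Its p-value is close to 0 when the two samples are unlikely to come from the same distribution, and larger when the data are consistent with a common distribution. So the p-value itself (not 1 - p-value) can serve as a rough similarity score in [0, 1], keeping in mind that it also shrinks as the sample sizes grow. The KS statistic, which is the largest gap between the two empirical CDFs, also lies in [0, 1], so 1 - statistic is another option.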
import numpy as np
from scipy.stats import ks_2samp
np.random.seed(52)
n1 = 200
n2 = 300
mu_1 = 5
mu_2 = 5.1
sigma_1 = 0.3
sigma_2 = 0.2
sample_1 = np.random.normal(mu_1, sigma_1, n1)
sample_2 = np.random.normal(mu_2, sigma_2, n2)
result = ks_2samp(sample_1, sample_2)
print(result.pvalue)
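Output: 1.4998994601889137e-08 (a very small p-value, so these two samples are unlikely to come from the same distribution)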
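If a score bounded in [0, 1] that does not shrink with sample size is preferred, one possible sketch is to use the KS statistic rather than the p-value. The helper name `ks_similarity` below is made up for illustration and is not part of SciPy; it only wraps the existing `ks_2samp` call and returns 1 minus the statistic, so 1 means identical empirical CDFs and 0 means the samples do not overlap at all.

```python
import numpy as np
from scipy.stats import ks_2samp


def ks_similarity(a, b):
    """Hypothetical helper: 1 - D, where D is the two-sample KS statistic.

    D is the largest vertical gap between the two empirical CDFs,
    so it always lies in [0, 1], and so does 1 - D.
    """
    return 1.0 - float(ks_2samp(a, b).statistic)


rng = np.random.default_rng(52)
sample_1 = rng.normal(5.0, 0.3, 200)
sample_2 = rng.normal(5.1, 0.2, 300)

# Comparing a sample with itself gives D = 0, i.e. similarity 1.
print(ks_similarity(sample_1, sample_1))
# Samples with shifted mean and variance give a value strictly below 1.
print(ks_similarity(sample_1, sample_2))
```

Because the statistic looks at the whole empirical CDF, it reacts to mean shifts, variance shifts, and shape differences alike, but it does not say which of these caused the gap.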
Note that there are also other methods, such as the Bhattacharyya distance and the Kullback–Leibler divergence. Some implementations of the Kullback–Leibler divergence can also be found here.
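As a rough illustration of the Bhattacharyya approach, the coefficient BC = sum over bins of sqrt(p_i * q_i) lies in [0, 1], and the Bhattacharyya distance is -ln(BC). The sketch below estimates it from histograms on shared bin edges, so it does not assume the data are Gaussian. The function name `bhattacharyya_coefficient` and the default of 30 bins are assumptions made for this example, and the result does depend on the bin count.

```python
import numpy as np


def bhattacharyya_coefficient(a, b, bins=30):
    """Histogram estimate of the Bhattacharyya coefficient of two 1-D samples.

    Returns a value in [0, 1]: 1 for identical histograms, 0 for no overlap.
    The bin count is an assumption of this sketch and affects the estimate.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Use the same bin edges for both samples so the histograms are comparable.
    edges = np.histogram_bin_edges(np.concatenate([a, b]), bins=bins)
    p, _ = np.histogram(a, bins=edges)
    q, _ = np.histogram(b, bins=edges)
    # Normalise counts to probabilities; sample sizes may differ.
    p = p / p.sum()
    q = q / q.sum()
    return float(np.sum(np.sqrt(p * q)))


rng = np.random.default_rng(52)
sample_1 = rng.normal(5.0, 0.3, 200)
sample_2 = rng.normal(5.1, 0.2, 300)

print(bhattacharyya_coefficient(sample_1, sample_2))
```

If both samples really are Gaussian, a closed-form expression in terms of the two means and variances exists as well; the histogram version is shown here because the question states that the shape of the data is not known in advance.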
Answered by Orkun Berk Yuzbasioglu on June 23, 2021