TransWikia.com

Find the non-overlapping area between two KDE plots in Python

Data Science Asked by astro123 on October 21, 2020

I am trying to determine whether a feature is important based on its KDE distribution for each value of the target variable. I know how to draw the KDE plots and judge by eye, but is there a more formal way of doing this? For example, can we calculate the non-overlapping area between the two curves?

When I searched for the area between two curves I found many links, but none of them solved my exact problem.

NOTE:
The main aim of this plot is to find out whether the feature is important. Please let me know if I am missing any underlying concepts here.

What I am trying to do is set a threshold such as 0.2: if the non-overlapping area is greater than 0.2, treat the feature as important, otherwise not.

MWE:

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = sns.load_dataset('titanic')

x0 = df.loc[df['survived']==0,'fare']
x1 = df.loc[df['survived']==1,'fare']

# fill=True replaces the deprecated shade argument in recent seaborn versions
sns.kdeplot(x0, fill=True)
sns.kdeplot(x1, fill=True)

Output

[Figure: overlaid KDE plots of fare for survived = 0 and survived = 1]

One Answer

There are different ways to measure the similarity between two functions.

One option is to define the overlap between both functions as their dot-product:

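# x0, x1: values of the two functions (e.g. the KDE curves) on the same grid
# normalize so that each function's self-overlap is 1
x0 = x0 / np.sqrt(np.dot(x0, x0))
x1 = x1 / np.sqrt(np.dot(x1, x1))
overlap = np.dot(x0, x1)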
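
To apply the dot-product idea to the two KDE curves from the question, the densities first have to be evaluated on a shared grid. The following is a minimal sketch, assuming `scipy.stats.gaussian_kde` with its default bandwidth stands in for the curves that seaborn draws. The function name `normalized_overlap`, the grid size, and the synthetic samples are illustrative choices, not part of the original answer.

```python
import numpy as np
from scipy.stats import gaussian_kde

def normalized_overlap(a, b, num=512):
    """Dot-product overlap of two KDEs on a shared grid (1 means identical shape)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # shared grid spanning both samples (assumption: 512 points is fine enough)
    grid = np.linspace(min(a.min(), b.min()), max(a.max(), b.max()), num)
    f0 = gaussian_kde(a)(grid)
    f1 = gaussian_kde(b)(grid)
    # divide by both norms so that a curve's overlap with itself is exactly 1
    return float(np.dot(f0, f1) / np.sqrt(np.dot(f0, f0) * np.dot(f1, f1)))

# synthetic samples, used only to illustrate the behaviour
rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, 500)
near = rng.normal(0.5, 1.0, 500)
far = rng.normal(8.0, 1.0, 500)

print(normalized_overlap(same, same))  # identical samples
print(normalized_overlap(same, near))  # slightly shifted distribution
print(normalized_overlap(same, far))   # well-separated distribution
```

Note that this quantity measures similarity of shape rather than shared area, so a threshold chosen for an area will not carry over to it directly.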

Instead of multiplying the individual function values as above, you may calculate their difference and take, for example, the mean. This is similar to a loss function in machine learning:

d = np.absolute(x0 - x1)
mae = np.mean(d)    # mean absolute error
mse = np.mean(d**2) # mean square error
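
The difference-based idea can also be turned into the area the question asks about: integrating the absolute difference of two densities gives the area between the curves, and half of that value is the non-overlapping fraction (the total variation distance), which lies between 0 and 1. Below is a minimal sketch under stated assumptions: `scipy.stats.gaussian_kde` with default bandwidth approximates the plotted curves, and the function name `non_overlap_area`, the padding, and the grid size are hypothetical choices.

```python
import numpy as np
from scipy.stats import gaussian_kde

def non_overlap_area(a, b, num=1024, pad=3.0):
    """Half the area between two KDE curves: 0 = identical, 1 = fully separated."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # extend the grid beyond the data so that the KDE tails are included
    margin = pad * max(a.std(), b.std())
    grid = np.linspace(min(a.min(), b.min()) - margin,
                       max(a.max(), b.max()) + margin, num)
    diff = np.abs(gaussian_kde(a)(grid) - gaussian_kde(b)(grid))
    # trapezoidal rule for the integral of |f0 - f1| over the grid
    area_between = np.sum(0.5 * (diff[1:] + diff[:-1]) * np.diff(grid))
    return float(0.5 * area_between)

# synthetic samples, used only to illustrate the behaviour
rng = np.random.default_rng(0)
same = rng.normal(0.0, 1.0, 500)
near = rng.normal(0.5, 1.0, 500)
far = rng.normal(8.0, 1.0, 500)

print(non_overlap_area(same, same))  # identical samples
print(non_overlap_area(same, near))  # partly overlapping
print(non_overlap_area(same, far))   # almost no overlap
```

With the question's data this could be called as `non_overlap_area(x0.dropna(), x1.dropna()) > 0.2`, where the 0.2 threshold is the asker's own heuristic. The result depends on the KDE bandwidth, so it may be worth checking how stable it is before relying on a fixed cutoff.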

If the two functions are represented on different grids, this approach won't work directly. You can, however, interpolate both functions and represent them on a new, common grid. A basic example is available in the SciPy documentation. The interpolated data can then be used in the code snippets above.
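
As a rough illustration of the interpolation step, the sketch below uses NumPy's linear interpolation, `np.interp`, instead of the SciPy routines mentioned above. The two curves and their grids are made-up examples, and treating each curve as zero outside its own range is an assumption that suits density-like functions.

```python
import numpy as np

# two curves sampled on different grids (hypothetical example data)
grid_a = np.linspace(0.0, 10.0, 50)
grid_b = np.linspace(2.0, 12.0, 80)
ya = np.exp(-0.5 * (grid_a - 4.0) ** 2)
yb = np.exp(-0.5 * (grid_b - 6.0) ** 2)

# common grid covering the union of both ranges
common = np.linspace(0.0, 12.0, 200)

# linear interpolation; outside a curve's own range its value is set to 0
ya_c = np.interp(common, grid_a, ya, left=0.0, right=0.0)
yb_c = np.interp(common, grid_b, yb, left=0.0, right=0.0)

# both curves now share a grid, so element-wise comparisons are valid
d = np.abs(ya_c - yb_c)
mae = float(np.mean(d))       # mean absolute error
mse = float(np.mean(d ** 2))  # mean square error
print(mae, mse)
```

When the raw samples are available, as in the question, evaluating both KDEs directly on one grid avoids the interpolation step altogether.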

Answered by Feodoran on October 21, 2020
