Cross Validated Asked by crimson_idiot on November 18, 2021
I’ve generated a PDF of binned data using the Python package binsmooth. The PDF is plotted in the following image:
[Image: plot of the PDF generated by binsmooth]
I am trying to smooth the PDF so as to provide a more intuitive interpretation of how a particular distribution (in this case, an income distribution) changes over time. I’d like for the existing kinks / discontinuities to be smoothed over, but the general shape of the distribution to be preserved.
I tried using scipy.interpolate.make_interp_spline, but this failed to generate a smoothed chart, even after experimenting with different values of k. I also tried using a polynomial smoother, but this failed to capture the asymptotic nature of the right tail.
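A small sketch of why changing k alone may not help, assuming the sampled PDF values are passed to make_interp_spline as (x, y) pairs: make_interp_spline builds an interpolating B-spline, which by construction passes through every input point. Kinks present in the input values are therefore reproduced rather than smoothed away, whatever the degree. The tent-shaped curve below is a hypothetical stand-in, not the PDF from the question.

```python
import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical kinked curve (NOT the PDF above): a "tent" with a sharp corner at x = 0.
x = np.linspace(-1.0, 1.0, 21)
y = 1.0 - np.abs(x)

max_errors = {}
for k in (1, 3, 5):
    spline = make_interp_spline(x, y, k=k)
    # Largest gap between the spline and the data, evaluated at the data points.
    max_errors[k] = float(np.max(np.abs(spline(x) - y)))

# For every degree the gap is at rounding level: the spline honours each input value,
# so it follows the corner instead of flattening it.
print(max_errors)
```

With densely spaced inputs (such as 200,000 evaluation points), a cubic interpolant is technically smooth between nodes but visually traces the original kinks.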
Here’s the code that was used to generate the PDF:
```python
import numpy as np
import matplotlib.pyplot as plt
from binsmooth import BinSmooth

bin_edges = np.array([0, 5000, 10000, 15000, 20000, 25000, 35000, 50000,
                      75000, 100000, 150000])
test_counts = np.array([0, 2557937, 3391469, 3943782, 3413864, 3314342, 5480426,
                        6113560, 5939968, 2702192, 1909488, 927019])
# create binsmooth object
bs = BinSmooth()
# fit binsmooth object, supplying sample mean through m
bs.fit(bin_edges, test_counts, m=41179)
# generate the x-axis grid (every integer from 0 to 199,999)
test_x_values = np.arange(0, 200000, 1)
# extract the pdf
pdf_values = bs.pdf(test_x_values)
# plot
f = plt.figure(figsize=(20, 5))
plt.plot(test_x_values, pdf_values)
plt.show()
```
Any help / insight would be greatly appreciated!
Kernel density estimation (KDE) is a non-parametric technique for estimating the PDF of a random variable. When applying KDE, you choose a bandwidth parameter, which controls how smooth the estimate is. Note that there is a bias-variance tradeoff: a larger bandwidth gives a smoother curve with lower variance, but at the cost of higher bias (sharp features such as peaks get flattened).
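A minimal NumPy-only sketch of a weighted Gaussian KDE, to make the bandwidth tradeoff concrete. The function name gaussian_kde_1d, the lognormal sample, and the bandwidth choices are illustrative assumptions, not the data from the question; scipy.stats.gaussian_kde is a maintained implementation that also accepts a weights argument.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth, weights=None):
    """Evaluate a weighted 1-D Gaussian kernel density estimate at each point of `grid`.

    Every sample contributes a normal bump with standard deviation `bandwidth`;
    `weights` (e.g. counts) are normalised to sum to 1 so the estimate integrates to ~1.
    """
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if weights is None:
        weights = np.ones_like(samples)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Standardised distances, shape (len(grid), len(samples))
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2.0 * np.pi))
    # Weighted sum of the bumps at each grid point
    return kernels @ weights


# Hypothetical right-skewed "income" sample (lognormal); NOT the data from the question.
rng = np.random.default_rng(0)
samples = rng.lognormal(mean=10.5, sigma=0.7, size=1000)

# Normal-reference rule of thumb as a starting point, then a narrower and a wider bandwidth.
h_ref = 1.06 * samples.std(ddof=1) * samples.size ** (-1 / 5)
h_narrow, h_wide = 0.5 * h_ref, 2.0 * h_ref

grid = np.linspace(samples.min() - 5 * h_wide, samples.max() + 5 * h_wide, 2000)
dx = grid[1] - grid[0]

density_narrow = gaussian_kde_1d(samples, grid, h_narrow)
density_wide = gaussian_kde_1d(samples, grid, h_wide)

# Both estimates integrate to ~1; the wider bandwidth gives a smoother curve
# (lower variance) with a flattened peak, which is where the extra bias shows up.
print(density_narrow.sum() * dx, density_wide.sum() * dx)
print(density_narrow.max(), density_wide.max())
```

For binned data, one crude option is to pass bin midpoints as samples and bin counts as weights; this ignores the spread within each bin, and any open-ended top bin needs a representative value chosen by hand. A Gaussian kernel also leaks some probability below zero for non-negative quantities such as incomes.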
Answered by Akylas Stratigakos on November 18, 2021