Data Science Asked by Elimination on February 24, 2021
I have a non-negative variable and I’d like to plot it on a log scale.
I’m trying to understand how to deal with 0-values. One naive idea I had in mind is just to add 1 to all values (or some very low number greater than 0).
What other options are available?
Thanks
Your suggestion is a valid one: it encodes the zeros with a known outcome once the scaling is applied. log(1) is zero, so keep that in mind for your next stage. You can use clip or replace for this:
df.clip(lower=1)
(note that this also raises any value between 0 and 1 up to 1), or try replacing the zeros with NaN:
df.replace(0, np.nan)
Alternatively, you could drop the rows that contain zeros:
df = df[df['column'] != 0]
but then you lose some data. Whichever method you choose depends on your use case and on compatibility with the plotting function you use.
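As a rough sketch of how the three options compare once the log is taken (the sample values are made up for illustration, and 'column' is just the placeholder name used above):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'column' is the placeholder name from the answer.
df = pd.DataFrame({"column": [0.0, 0.5, 1.0, 10.0, 100.0]})

# Option 1: clip at 1, so zeros (and anything below 1) end up at log10(1) = 0.
clipped = np.log10(df["column"].clip(lower=1))

# Option 2: replace zeros with NaN; the log of NaN stays NaN,
# so those points are simply missing after the transform.
with_nan = np.log10(df["column"].replace(0, np.nan))

# Option 3: drop the zero rows entirely before taking the log.
nonzero = df[df["column"] != 0]
dropped = np.log10(nonzero["column"])
```

Option 1 changes the 0.5 as well as the 0, option 2 keeps the row count but leaves a gap, and option 3 shortens the data.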
Answered by WBM on February 24, 2021
For this type of issue, I typically add the reciprocal of the log base. For data that's being log10-scaled, this results in adding 0.1 to all values. For data that's being log2-scaled, this results in adding 0.5 to all values. This has the nice property of mapping all of your 0 values to -1 in the log scale, regardless of what log base you use. If your data are very small numerically, you may want to use a higher power of the reciprocal, so that the offset does not change your actual values by several fold. If the data are all between 0 and 0.01, for example, I might add an offset of 0.0001 when log10 scaling.
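A minimal sketch of this offset approach, assuming NumPy; the function name log_with_offset and its power parameter are made up for illustration. With power=1 the zeros map to -1 in any base, and with a higher power (such as the 0.0001 offset, i.e. power=4 in base 10) they map to -power instead:

```python
import numpy as np

def log_with_offset(values, base=10.0, power=1):
    """Add base**(-power) to every value, then take the log in that base.

    Hypothetical helper: power=1 adds the reciprocal of the base
    (0.1 for log10, 0.5 for log2); larger powers add a smaller offset.
    """
    values = np.asarray(values, dtype=float)
    offset = base ** (-power)
    # Change-of-base formula: log_base(x) = ln(x) / ln(base).
    return np.log(values + offset) / np.log(base)

# Zeros land on -1 for both bases when power=1.
zeros_log10 = log_with_offset([0.0], base=10.0)
zeros_log2 = log_with_offset([0.0], base=2.0)

# Small-valued data: a 0.0001 offset (power=4) sends zeros to -4.
small = log_with_offset([0.0, 0.001, 0.01], base=10.0, power=4)
```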
Answered by Nuclear Hoagie on February 24, 2021