TransWikia.com

Dealing with zeros when plotting log-scaled data

Data Science Asked by Elimination on February 24, 2021

I have a non-negative variable and I’d like to plot it on a log scale.

I’m trying to understand how to deal with 0-values. One naive idea I had in mind is just to add 1 to all values (or some very small number greater than 0).
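A minimal sketch of that add-a-constant idea, assuming NumPy and a small hypothetical array x (the data are made up for illustration):

```python
import numpy as np

# Hypothetical non-negative data, including a zero
x = np.array([0.0, 0.5, 1.0, 9.0, 99.0])

# Shift every value by 1 before taking log10, so zeros map to log10(1) = 0
shifted = np.log10(x + 1)

# For the natural log, np.log1p(x) computes log(1 + x) directly
natural = np.log1p(x)
```

With an offset of 1, every value between 0 and 1 is squeezed into the range 0 to log10(2), roughly 0.3, so the choice of constant matters when much of the data is small.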

What other options are available?

Thanks

2 Answers

Your suggestion is a valid one: it encodes the zeros with a known outcome once the scaling is applied. Since log(1) = 0, your zeros will end up at zero on the log scale, so keep that in mind for your next stage. You can use clip or replace for this:

df.clip(lower=1)

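Note that clip raises every value below 1 to 1, not only the zeros. Alternatively, replace the zeros with NaN, which plotting functions such as matplotlib's leave out of the plot: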

import numpy as np
df.replace(0, np.nan)
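For illustration, a minimal sketch comparing both options on a hypothetical single-column DataFrame (the column name value and the data are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical example column with a zero and a value below 1
df = pd.DataFrame({"value": [0.0, 0.5, 2.0, 10.0, 100.0]})

# Option 1: raise everything below 1 to 1; zeros then sit at log10(1) = 0.
# Note that clip also raises 0.5 to 1, not just the zero.
clipped = df.clip(lower=1)

# Option 2: turn zeros into NaN; they stay NaN after the log transform.
masked = df.replace(0, np.nan)

log_clipped = np.log10(clipped["value"])
log_masked = np.log10(masked["value"])
```

The clipped version keeps every row but merges the zero and the 0.5 into the same point; the masked version keeps 0.5 distinct (at about -0.3) and simply has no point for the zero.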

Alternatively you could do one of the following:

  1. Drop the rows with zero values, e.g. df = df[df['column'] != 0], though then you lose some data.
  2. Fill the zero values with a statistically representative value, e.g. by interpolation. See the pandas DataFrame.interpolate method; since it only fills NaN values, replace the zeros with NaN first.
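A short sketch of both alternatives, assuming pandas and a hypothetical DataFrame that reuses the column name 'column' from option 1 (the data are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical data with zeros between non-zero neighbours
df = pd.DataFrame({"column": [1.0, 0.0, 4.0, 0.0, 16.0]})

# Option 1: drop the rows whose value is zero (two rows are lost here)
dropped = df[df["column"] != 0]

# Option 2: interpolate() only fills NaN, so mark the zeros as NaN first,
# then fill them linearly from their neighbours (the default method)
filled = df.replace(0, np.nan).interpolate()
```

Here the default linear interpolation fills the zeros with 2.5 and 10.0. A zero in the first row would stay NaN, because by default interpolate only fills forward from an earlier value.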

Which method you choose depends on your use case and on its compatibility with the plotting function you use.

Answered by WBM on February 24, 2021

For this type of issue, I typically add the reciprocal of the log base. For data that's being log10-scaled, this means adding 0.1 to all values; for data that's being log2-scaled, it means adding 0.5. This has the nice property of mapping all of your 0 values to -1 on the log scale, regardless of which log base you use. If your data are very small numerically, you may want to add a higher power of the reciprocal instead, so that the offset is not large enough to change your actual values several-fold. If the data are all between 0 and 0.01, for example, I might add 0.0001 (that is, 10^-4, which maps the zeros to -4) when log10 scaling.
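A minimal sketch of this offset, assuming NumPy; offset_log and its power parameter are hypothetical names chosen for illustration:

```python
import numpy as np

def offset_log(x, base=10.0, power=1):
    """Log-transform non-negative data after adding base**-power.

    Hypothetical helper: zeros map to -power on the log scale,
    whatever the base.
    """
    x = np.asarray(x, dtype=float)
    offset = base ** -power  # e.g. 0.1 for base 10, 0.5 for base 2
    # Change of base: log_b(y) = ln(y) / ln(b)
    return np.log(x + offset) / np.log(base)

# Data between 0 and 0.01, as in the example above
coarse = offset_log([0.0, 0.01], base=10.0, power=1)
fine = offset_log([0.0, 0.01], base=10.0, power=4)
```

With power=1 the value 0.01 lands at about -0.96, barely separated from the zeros at -1; with power=4 (adding 0.0001) the zeros sit at -4 and 0.01 at about -2, so the real values keep their spread.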

Answered by Nuclear Hoagie on February 24, 2021
