TransWikia.com

ValueError: Input contains NaN, infinity or a value too large for dtype('float32')

Data Science Asked on February 26, 2021

I got a ValueError when predicting test data using a RandomForest model.

My code:

clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2)
clf.fit(X_fit, y_fit)

df_test.fillna(df_test.mean())
X_test = df_test.values  
y_pred = clf.predict(X_test)

The error:

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

How do I find the bad values in the test dataset? Also, I do not want to drop these records. Can I just replace them with the mean or median?

Thanks.

10 Answers

With np.isnan(X) you get a boolean mask back with True for positions containing NaNs.

With np.where(np.isnan(X)) you get back a tuple with i, j coordinates of NaNs.

Finally, with np.nan_to_num(X) you "replace nan with zero and inf with finite numbers".

Alternatively, you can use:

  • sklearn.impute.SimpleImputer for mean / median imputation of missing values, or
  • pandas' pd.DataFrame(X).fillna(), if you need something other than filling it with zeros.
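As a minimal sketch of the two steps above (locating the NaNs, then imputing them), assuming small made-up arrays named X_train and X_test and a median strategy chosen only for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical training and test matrices; NaN marks a missing value.
X_train = np.array([[1.0, 10.0], [3.0, np.nan], [5.0, 30.0]])
X_test = np.array([[np.nan, 20.0], [2.0, np.nan]])

# Locate the NaNs in the test data: row indices and column indices.
rows, cols = np.where(np.isnan(X_test))

# Learn the column medians on the training data, then apply them to the test data.
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)
X_test_filled = imputer.transform(X_test)
```

Fitting the imputer on the training data and only transforming the test data keeps the fill values consistent with what the model saw during training.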

Correct answer by fernando on February 26, 2021

Assuming X_test is a pandas DataFrame, you can use DataFrame.fillna to replace the NaN values with the mean. fillna returns a new DataFrame rather than changing X_test, so assign the result:

X_test = X_test.fillna(X_test.mean())

Answered by kmandov on February 26, 2021

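For anybody happening across this, to modify the original in place:

X_test.fillna(X_train.mean(), inplace=True)

Or to rebind the name to the filled copy:

X_test = X_test.fillna(X_train.mean())

To check whether you're working with a view rather than a copy (note that _is_view is a private pandas attribute and may change between versions):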

X_test._is_view
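A small sketch of filling the test set with the training means, assuming made-up frames with hypothetical columns f1 and f2:

```python
import numpy as np
import pandas as pd

# Hypothetical training and test frames sharing the same columns.
X_train = pd.DataFrame({"f1": [1.0, 2.0, 3.0], "f2": [10.0, np.nan, 30.0]})
X_test = pd.DataFrame({"f1": [np.nan, 5.0], "f2": [7.0, np.nan]})

# DataFrame.mean skips NaN by default, so each column mean uses only known values.
train_means = X_train.mean()

# fillna with a Series fills each column with the value under the matching label.
X_test = X_test.fillna(train_means)
```

Using the training means rather than the test means avoids letting statistics from the test set influence the data you predict on.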

Answered by CommonSurname on February 26, 2021

I faced a similar problem and saw that numpy handles NaN and Inf differently.
In case your data has Inf, try this:

np.where(x.values >= np.finfo(np.float64).max)
where x is my pandas DataFrame.

This gives a tuple with the locations of the positive Inf values. To catch -Inf as well, use np.where(np.isinf(x.values)).

In case your data has NaN, try this:

np.isnan(x.values).any()
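Since the error message names three separate conditions, it can help to check for all of them at once. The following is a sketch on a made-up frame x with hypothetical columns a and b; it assumes all columns are numeric:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one NaN, one Inf and one finite value beyond float32.
x = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.inf, 2.0, 1e40]})

values = x.to_numpy(dtype=np.float64)
f32_max = np.finfo(np.float32).max

nan_mask = np.isnan(values)
inf_mask = np.isinf(values)  # catches both +Inf and -Inf
# Finite in float64, but too large in magnitude to represent as float32.
too_large_mask = np.isfinite(values) & (np.abs(values) > f32_max)

# Combine the three checks and report (row position, column label) pairs.
bad_mask = nan_mask | inf_mask | too_large_mask
bad_rows, bad_cols = np.where(bad_mask)
bad_positions = [(int(r), x.columns[c]) for r, c in zip(bad_rows, bad_cols)]
```

Here bad_positions lists every offending cell, which answers the "how do I find the bad values" part of the question before deciding how to replace them.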

Answered by Prakash Vanapalli on February 26, 2021

Don't forget

col_mask=df.isnull().any(axis=0) 

which returns a boolean mask flagging the columns that contain np.nan values.

row_mask=df.isnull().any(axis=1)

which returns a boolean mask flagging the rows where np.nan appears. Then by simple indexing you can select all of the rows and columns that contain np.nan.

df.loc[row_mask,col_mask]
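For illustration, here is how the two masks behave on a small made-up frame (the column names a, b and c are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical frame: column "b" is complete, "a" and "c" each have one NaN.
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [4.0, 5.0, 6.0],
    "c": [np.nan, 8.0, 9.0],
})

col_mask = df.isnull().any(axis=0)  # True for columns containing a NaN
row_mask = df.isnull().any(axis=1)  # True for rows containing a NaN

# Keep only the rows and columns in which at least one NaN occurs.
affected = df.loc[row_mask, col_mask]
```

Note that affected can still contain non-NaN cells, because a row is kept if any of its columns has a NaN.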

Answered by bmc on February 26, 2021

Do not forget to check for inf values as well. The only thing that worked for me:

df[df==np.inf]=np.nan
df.fillna(df.mean(), inplace=True)

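And even better if you are using sklearn (SimpleImputer replaces the older Imputer class, which was removed in scikit-learn 0.22):

import pandas as pd
from sklearn.impute import SimpleImputer

def replace_missing_value(df, number_features):
    # Impute missing values in the numeric columns with the column median.
    imputer = SimpleImputer(strategy="median")
    df_num = df[number_features]
    imputer.fit(df_num)
    X = imputer.transform(df_num)
    # Keep the original index so the result lines up with df.
    res_def = pd.DataFrame(X, columns=df_num.columns, index=df_num.index)
    return res_def

Here number_features is a list of the numeric feature labels, for example: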

number_features = ['median_income', 'gdp']

Answered by Kohn1001 on February 26, 2021

Here is the code for how to "Replace NaN with zero and infinity with large finite numbers." using numpy.nan_to_num.

df[:] = np.nan_to_num(df)

Also see fernando's answer.

Answered by Domi W on February 26, 2021

In most cases, getting rid of infinite and null values solves this problem.

First, replace the infinite values with NaN:

df.replace([np.inf, -np.inf], np.nan, inplace=True)

Then fill the null values however you like: with a specific value such as 999, with the mean, or with your own imputation function.

df.fillna(999, inplace=True)

or

df.fillna(df.mean(), inplace=True)
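The order of the two steps matters: if the infinities are still present, the column mean itself becomes infinite. A short sketch on a made-up frame (hypothetical columns a and b), using reassignment rather than inplace=True:

```python
import numpy as np
import pandas as pd

# Hypothetical frame mixing NaN, +Inf and -Inf.
df = pd.DataFrame({"a": [1.0, np.inf, 3.0], "b": [np.nan, -np.inf, 4.0]})

# Step 1: turn infinities into NaN so they no longer distort the column means.
df = df.replace([np.inf, -np.inf], np.nan)

# Step 2: fill every NaN with its column mean (mean skips NaN by default).
df = df.fillna(df.mean())
```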

Answered by Natheer Alabsi on February 26, 2021

If your values exceed the float32 range (roughly 3.4e38 in magnitude), try running a scaler first. It would be rather unusual for real data to span more than the float32 range.
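As a rough sketch of the idea, the snippet below uses a hand-rolled per-column max-abs scaling in NumPy as a stand-in for a library scaler. The array X is made up, and the code assumes no column is entirely zero:

```python
import numpy as np

# Hypothetical data: the first column is finite in float64 but beyond float32.
X = np.array([[1e39, 1.0], [2e39, 2.0], [4e39, 4.0]])
f32_max = np.finfo(np.float32).max

overflows_before = bool((np.abs(X) > f32_max).any())

# Divide each column by its largest absolute value, mapping it into [-1, 1].
scale = np.abs(X).max(axis=0)
X_scaled = X / scale

overflows_after = bool((np.abs(X_scaled) > f32_max).any())
```

Whichever scaler you use, fit it on the training data and apply the same scaling to the test data.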

Answered by Piotr Rarus on February 26, 2021

You can count the NaN values in each column with this call:

df.isnull().sum()

and then you can fill these NaN values in your dataset file (CSV or Excel file).

Answered by Busra Dogan on February 26, 2021
