Data Science Asked on February 26, 2021
I got a ValueError when predicting on test data using a RandomForest model.
My code:
clf = RandomForestClassifier(n_estimators=10, max_depth=6, n_jobs=1, verbose=2)
clf.fit(X_fit, y_fit)
df_test.fillna(df_test.mean())
X_test = df_test.values
y_pred = clf.predict(X_test)
The error:
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
How do I find the bad values in the test dataset? Also, I do not want to drop these records; can I just replace them with the mean or median?
Thanks.
With np.isnan(X) you get a boolean mask back with True for positions containing NaNs.
With np.where(np.isnan(X)) you get back a tuple with the i, j coordinates of the NaNs.
Finally, with np.nan_to_num(X) you "replace NaN with zero and inf with finite numbers".
Alternatively, you can use pd.DataFrame(X).fillna(value) if you need something other than filling with zeros.
Correct answer by fernando on February 26, 2021
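For illustration, a minimal sketch with made-up values showing what each of these calls does:
import numpy as np
import pandas as pd

# Toy array with a NaN, purely for illustration
X = np.array([[1.0, np.nan], [3.0, 4.0]])

mask = np.isnan(X)                             # boolean mask, True where NaN
rows, cols = np.where(mask)                    # i, j coordinates of the NaNs
X_zeroed = np.nan_to_num(X)                    # NaN -> 0.0, inf -> large finite numbers
X_filled = pd.DataFrame(X).fillna(-1).values   # fill with something other than zero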
Assuming X_test is a pandas DataFrame, you can use DataFrame.fillna to replace the NaN values with the mean:
X_test.fillna(X_test.mean())
Answered by kmandov on February 26, 2021
For anybody happening across this: fillna returns a new DataFrame unless you keep the result. To modify the original in place:
X_test.fillna(X_train.mean(), inplace=True)
Or to overwrite the original by reassigning:
X_test = X_test.fillna(X_train.mean())
To check whether you're working with a copy or a view:
X_test._is_view
Answered by CommonSurname on February 26, 2021
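As a sketch of that pattern, with hypothetical X_train and X_test frames (the fill uses the training-set mean, which is generally what you want for test data):
import numpy as np
import pandas as pd

# Hypothetical train/test frames, just for illustration
X_train = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
X_test = pd.DataFrame({"a": [np.nan, 4.0], "b": [40.0, np.nan]})

# Fill the test-set gaps with the training-set means and keep the result
X_test = X_test.fillna(X_train.mean())
print(X_test)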
I faced a similar problem and saw that numpy handles NaN and Inf differently.
In case your data has Inf, try this:
np.where(x.values >= np.finfo(np.float64).max)
where x is the pandas DataFrame. This gives back a tuple with the locations of the infinite (too-large) values.
In case your data has NaN, try this:
np.isnan(x.values).any()
Answered by Prakash Vanapalli on February 26, 2021
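For illustration, both checks on a toy DataFrame with invented values:
import numpy as np
import pandas as pd

x = pd.DataFrame({"a": [1.0, np.inf], "b": [np.nan, 2.0]})

inf_locs = np.where(x.values >= np.finfo(np.float64).max)  # locations of infinite values
has_nan = np.isnan(x.values).any()                         # True if any NaN is present
print(inf_locs, has_nan)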
Don't forget
col_mask = df.isnull().any(axis=0)
which returns a boolean mask indicating which columns contain np.nan values, and
row_mask = df.isnull().any(axis=1)
which returns a boolean mask indicating which rows contain np.nan. Then by simple indexing you can inspect all of the points that are np.nan:
df.loc[row_mask, col_mask]
Answered by bmc on February 26, 2021
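As a sketch on an invented frame, this narrows the view down to just the rows and columns that actually contain missing values:
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan, 3.0],
                   "b": [4.0, 5.0, 6.0],
                   "c": [np.nan, 8.0, 9.0]})

col_mask = df.isnull().any(axis=0)  # columns a and c contain NaN
row_mask = df.isnull().any(axis=1)  # rows 0 and 1 contain NaN
print(df.loc[row_mask, col_mask])   # only the affected rows and columns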
Do not forget to check for inf values as well. The only thing that worked for me:
df[df == np.inf] = np.nan
df.fillna(df.mean(), inplace=True)
And even better if you are using sklearn (SimpleImputer replaces the older Imputer in recent versions):
from sklearn.impute import SimpleImputer

def replace_missing_value(df, number_features):
    imputer = SimpleImputer(strategy="median")
    df_num = df[number_features]
    imputer.fit(df_num)
    X = imputer.transform(df_num)
    res_def = pd.DataFrame(X, columns=df_num.columns)
    return res_def

where number_features is a list of the numerical column labels, for example:
number_features = ['median_income', 'gdp']
Answered by Kohn1001 on February 26, 2021
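A hypothetical usage sketch, assuming the replace_missing_value function above and invented column values:
import numpy as np
import pandas as pd

df = pd.DataFrame({"median_income": [1.5, np.nan, 3.2],
                   "gdp": [2.1, 4.4, np.nan]})

df_clean = replace_missing_value(df, ['median_income', 'gdp'])
print(df_clean)  # NaNs replaced by each column's median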
Here is how to "replace NaN with zero and infinity with large finite numbers" using numpy.nan_to_num:
df[:] = np.nan_to_num(df)
Also see fernando's answer.
Answered by Domi W on February 26, 2021
In most cases, getting rid of infinite and null values solves this problem.
Get rid of infinite values:
df.replace([np.inf, -np.inf], np.nan, inplace=True)
Then get rid of null values however you like: a specific value such as 999, the mean, or your own imputation function:
df.fillna(999, inplace=True)
or
df.fillna(df.mean(), inplace=True)
Answered by Natheer Alabsi on February 26, 2021
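Putting the two steps together on a hypothetical df_test (remember that fillna only takes effect if you reassign the result or pass inplace=True):
import numpy as np
import pandas as pd

# Hypothetical test frame containing both kinds of bad values
df_test = pd.DataFrame({"a": [1.0, np.inf, np.nan], "b": [2.0, 3.0, 4.0]})

df_test.replace([np.inf, -np.inf], np.nan, inplace=True)
df_test.fillna(df_test.mean(), inplace=True)

X_test = df_test.values  # now safe to pass to clf.predict(X_test)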
If your values are larger than float32 can represent, try running a scaler first. It'd be rather unusual for the data to span more than the float32 range.
Answered by Piotr Rarus on February 26, 2021
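As a sketch of that idea, assuming scikit-learn's StandardScaler and invented values that exceed the float32 range; fit the scaler on the training data and reuse it on the test data:
import numpy as np
from sklearn.preprocessing import StandardScaler

# Invented data whose magnitudes exceed float32's maximum (~3.4e38)
X_fit = np.array([[1e40, 2.0], [3e40, 4.0]])
X_test = np.array([[2e40, 3.0]])

scaler = StandardScaler()
X_fit_scaled = scaler.fit_transform(X_fit)  # fit on training data only
X_test_scaled = scaler.transform(X_test)    # apply the same scaling to test data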
You can list the columns that contain NaN with
df.isnull().sum()
and then fill these NaN values in your dataset file (CSV or Excel).
Answered by Busra Dogan on February 26, 2021