Data Science Asked by Faozan Indresputra on June 5, 2021
I am still a newbie to Python in Jupyter Notebook.
I'd like to ask how to solve the error "ValueError: Input contains NaN, infinity or a value too large for dtype('float32')".
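This message comes from scikit-learn's input validation, which rejects any NaN, positive or negative infinity, or number too large to fit in float32. The following is a minimal diagnostic sketch. It assumes the prediction data is a pandas DataFrame with numeric feature columns and counts each kind of offending value per column. `count_non_finite` and the `example` frame are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd


def count_non_finite(frame: pd.DataFrame) -> pd.DataFrame:
    """Per numeric column, count NaN, +/-inf, and values too large for float32."""
    numeric = frame.select_dtypes(include=[np.number])
    limit = np.finfo(np.float32).max  # roughly 3.4e38
    finite = np.isfinite(numeric)
    report = pd.DataFrame({
        "nan": numeric.isna().sum(),
        "inf": np.isinf(numeric).sum(),
        # Finite in float64, but would overflow to inf when cast to float32.
        "too_large": (finite & (numeric.abs() > limit)).sum(),
    })
    # Keep only the columns that would fail the finiteness check.
    return report[(report > 0).any(axis=1)]


# Hypothetical frame; replace it with the DataFrame passed to predict().
example = pd.DataFrame({
    "a": [1.0, np.nan, 3.0],
    "b": [np.inf, 2.0, 5.0],
    "c": [1.0, 2.0, 3.0],
})
print(count_non_finite(example))
```

An empty result means no column would trip the check.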
First, I made predictions with this code:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, LabelEncoder, StandardScaler
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, roc_curve, auc, confusion_matrix
from xgboost import XGBClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
dec = DecisionTreeClassifier()
ran = RandomForestClassifier(n_estimators=100)
ran2 = RandomForestClassifier(criterion='gini',
                              n_estimators=1750,
                              max_depth=7,
                              min_samples_split=6,
                              min_samples_leaf=6,
                              max_features='sqrt',  # 'auto' meant 'sqrt' for classifiers and is deprecated in newer scikit-learn
                              oob_score=True,
                              random_state=42,
                              n_jobs=-1,
                              verbose=1)
knn = KNeighborsClassifier(n_neighbors=50)  # adjust to the number of samples we have: don't write 100 if we only have 83 samples
sgd = SGDClassifier(max_iter=1000, tol=1e-3)
xgb = XGBClassifier()
naive = GaussianNB()
log = LogisticRegression(random_state=0)
svc_lin = SVC(kernel='linear', random_state=0)
svc_rbf = SVC(kernel='rbf', random_state=0)
models = {"Decision tree": dec,
          "Random forest": ran,
          "Random forest Tuning": ran2,
          "KNN": knn,
          "SGD": sgd,
          "Gaussian Naive bayes": naive,
          "XGBoost": xgb,
          "Logistic Regression": log,
          "Linear Classifier": svc_lin,
          "RBF Classifier": svc_rbf}
scores = {}
for key, model in models.items():
    model.fit(x_train, y_train)
    scores[key] = model.score(x_test, y_test)
Then I checked the accuracy scores with:
scores_frame1 = pd.DataFrame(scores, index=["Accuracy Score"]).T
scores_frame1.sort_values(by=["Accuracy Score"], axis=0, ascending=False, inplace=True)
scores_frame1
The decision tree had the highest accuracy score, so I used it to make predictions on my own dataset:
preds = dec.predict(df)
preds
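A more defensive version of this prediction step could make the input finite before calling `predict`. This is a minimal sketch under a few assumptions: all features are numeric, and `df` has the same columns as `x_train`. The small `x_train`, `y_train`, and `df` frames are hypothetical stand-ins, and `make_finite` is a hypothetical helper. Filling with medians is only one possible choice; dropping the affected rows is another.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier


def make_finite(frame: pd.DataFrame, fill_values: pd.Series) -> pd.DataFrame:
    """Replace +/-inf with NaN, then fill every NaN with a per-column value."""
    cleaned = frame.replace([np.inf, -np.inf], np.nan)
    return cleaned.fillna(fill_values)


# Hypothetical training data standing in for x_train / y_train.
x_train = pd.DataFrame({"f1": [1.0, 2.0, 3.0, 4.0], "f2": [10.0, 20.0, 30.0, 40.0]})
y_train = [0, 0, 1, 1]
dec = DecisionTreeClassifier(random_state=0).fit(x_train, y_train)

# Hypothetical prediction data containing values that trigger the error.
df = pd.DataFrame({"f1": [np.nan, 2.5], "f2": [np.inf, 35.0]})

# Fill with statistics from the training data so both sets are treated alike,
# and select the training columns so the column order matches.
train_medians = x_train.median()
df_clean = make_finite(df[x_train.columns], train_medians)
preds = dec.predict(df_clean)
print(preds)
```

Using the training medians rather than `df`'s own statistics keeps the filled values consistent with what the model saw during fitting.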
However, I got this error:
> ---------------------------------------------------------------------------
> ValueError                                Traceback (most recent call last)
> <ipython-input-34-4034b7c264f2> in <module>
> ----> 1 preds = dec.predict(hasil1_1)
> 2 preds
>
> ~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in predict(self, X, check_input)
> 425 """
> 426 check_is_fitted(self)
> --> 427 X = self._validate_X_predict(X, check_input)
> 428 proba = self.tree_.predict(X)
> 429 n_samples = X.shape[0]
>
> ~\anaconda3\lib\site-packages\sklearn\tree\_classes.py in _validate_X_predict(self, X, check_input)
> 386 """Validate X whenever one tries to predict, apply, predict_proba"""
> 387 if check_input:
> --> 388 X = check_array(X, dtype=DTYPE, accept_sparse="csr")
> 389 if issparse(X) and (X.indices.dtype != np.intc or
> 390 X.indptr.dtype != np.intc):
>
> ~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
> 71 FutureWarning)
> 72 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
> ---> 73 return f(**kwargs)
> 74 return inner_f
> 75
>
> ~\anaconda3\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
> 643
> 644 if force_all_finite:
> --> 645 _assert_all_finite(array,
> 646 allow_nan=force_all_finite == 'allow-nan')
> 647
>
> ~\anaconda3\lib\site-packages\sklearn\utils\validation.py in _assert_all_finite(X, allow_nan, msg_dtype)
> 95 not allow_nan and not np.isfinite(X).all()):
> 96 type_err = 'infinity' if allow_nan else 'NaN, infinity'
> ---> 97 raise ValueError(
> 98 msg_err.format
> 99 (type_err,
>
> ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
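The traceback shows the tree validating `X` with `check_array(X, dtype=DTYPE, ...)` and the error naming `dtype('float32')`. This suggests the values are converted to float32 before the finiteness check. If so, a number that is finite in float64 but larger than float32's maximum (about 3.4e38) would also trigger the error. The short sketch below uses made-up values to illustrate that overflow.

```python
import numpy as np

# Made-up values: all finite as float64, but 1e39 exceeds the float32 range.
values = np.array([1.0, 1e30, 1e39], dtype=np.float64)
float32_max = np.finfo(np.float32).max  # roughly 3.4e38

print(np.isfinite(values).all())  # True: every value is finite in float64

# Reproduce the float32 cast; silence NumPy's overflow warning for the demo.
with np.errstate(over="ignore"):
    as_float32 = values.astype(np.float32)
print(np.isfinite(as_float32))  # the last entry has overflowed to inf

# Values that will overflow can be flagged before calling predict().
print(np.abs(values) > float32_max)
```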
The data for prediction is just 110 out of 1278 total rows. However, if I divide it into two datasets, rows 1-685 and rows 686-1278, both run smoothly.
Is it because too much data causes the error?
Please help.
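The check in the traceback (`np.isfinite(X).all()`) depends on the values rather than on the number of rows, so the amount of data alone should not cause this error. The following is a minimal sketch for listing which rows hold offending values. It assumes numeric feature columns; `bad_rows` and the `example` frame are hypothetical.

```python
import numpy as np
import pandas as pd


def bad_rows(frame: pd.DataFrame) -> pd.Index:
    """Index labels of rows holding NaN, +/-inf, or values too large for float32."""
    values = frame.select_dtypes(include=[np.number]).to_numpy(dtype=np.float64)
    limit = np.finfo(np.float32).max
    with np.errstate(invalid="ignore"):
        # A row is flagged if any of its numeric values would fail the check.
        mask = ~np.isfinite(values) | (np.abs(values) > limit)
    return frame.index[mask.any(axis=1)]


# Hypothetical stand-in for the full prediction frame.
example = pd.DataFrame({
    "f1": [1.0, np.nan, 3.0, 4.0],
    "f2": [1.0, 2.0, 1e39, -np.inf],
})
print(list(bad_rows(example)))
```

You can run `bad_rows` on the full frame passed to `predict` and on each of the two parts (rows 1-685 and 686-1278). This shows whether the same rows are flagged in every case.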