Data Science Asked by pragun on October 4, 2021
I keep reading in recent questions and blog posts that, since scikit-learn version 0.20, OneHotEncoder can handle string features.
The documentation, however, looks ambiguous. Here are its first two lines:
Encode categorical integer features as a one-hot numeric array. The input to this transformer should be an array-like of integers or
strings, denoting the values taken on by categorical (discrete)
features.
The first line says it "encodes categorical integer features", and the next line says the input should be an "array-like of integers or strings".
When I tried it, I still got a ValueError:
print(X.columns)
encoder = OneHotEncoder(categorical_features=[1,4,5])
encoder.fit(X)

Output:

Index(['age', 'sex', 'bmi', 'children', 'smoker', 'region'], dtype='object')
ValueError: could not convert string to float: 'female'
I am aware of other ways to encode string features, such as LabelEncoder, ColumnTransformer and pd.get_dummies(), but I specifically want to understand this behaviour.
This seems to fail only when you're using categorical_features, which was deprecated at the same time the encoder was extended to strings. Using the now-recommended ColumnTransformer to specify which columns to encode works with strings (as does applying the encoder to the entire frame, though that's not what you want with features like bmi).
E.g.,
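from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

onehot = OneHotEncoder(...)  # "..." stands for whatever encoder options you need
cat_cols = [1, 4, 5]  # positions of sex, smoker, region
preproc = ColumnTransformer(transformers=[('onehot', onehot, cat_cols)],
                            remainder='passthrough')
preproc.fit_transform(X)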
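For a version that runs end to end, here is a minimal sketch of the same idea. The four rows of data are made up for illustration (only the column names come from the question), and sparse_threshold=0 is my addition so that the result comes back as a dense array that is easy to inspect.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy rows; only the column names match the question's frame.
X = pd.DataFrame({
    "age": [19, 18, 28, 33],
    "sex": ["female", "male", "male", "female"],
    "bmi": [27.9, 33.8, 33.0, 22.7],
    "children": [0, 1, 3, 0],
    "smoker": ["yes", "no", "no", "no"],
    "region": ["southwest", "southeast", "southeast", "northwest"],
})

cat_cols = [1, 4, 5]  # positional indices of sex, smoker, region

preproc = ColumnTransformer(
    transformers=[("onehot", OneHotEncoder(), cat_cols)],
    remainder="passthrough",  # keep age, bmi, children unchanged
    sparse_threshold=0,       # assumption: always return a dense array
)
Xt = preproc.fit_transform(X)

# One array of learned categories per encoded column, sorted alphabetically.
categories = preproc.named_transformers_["onehot"].categories_
print(Xt.shape)
print([list(c) for c in categories])
```

The one-hot columns come first (two for sex, two for smoker, three for region), followed by the passed-through age, bmi and children columns. No string-to-float conversion is attempted, so the ValueError from the question does not occur.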
Answered by Ben Reiniger on October 4, 2021