Data Science Asked by Jinglesting on January 22, 2021
Let us say we have a dataset with a feature such as Surname.
arr['Surname'] = ['Smith', 'Jones', 'Johnson', 'Smith']
And I want to encode this categorical info as a new feature like
arr['Surname_Count'] = [2, 1, 1, 2]
with the caveat that it must be done within an Sklearn pipeline. Is there a quick way to do this that does not involve rolling my own partition-counting transformer?
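For reference, the encoding being asked for can be sketched in plain Python. This only illustrates the target output and is not a pipeline step; the variable names `surnames`, `counts` and `surname_count` are made up for the example.

```python
from collections import Counter

# Example values taken from the question.
surnames = ["Smith", "Jones", "Johnson", "Smith"]

# Count how often each surname occurs in the whole column.
counts = Counter(surnames)

# Map every row back to the count of its own surname.
surname_count = [counts[name] for name in surnames]
print(surname_count)  # [2, 1, 1, 2]
```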
You can check out Featuretools, which is an open-source Python framework for automated feature engineering. For your case, it can generate aggregation features such as a count for your dataset.
After generating the new feature matrix with the desired column, you can use the matrix as you normally would within an Sklearn pipeline.
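A minimal sketch of that workflow, under some assumptions: the Featuretools calls themselves are not shown, and a pandas `groupby(...).transform("count")` stands in for the feature-generation step. The `StandardScaler` step is only a placeholder for whatever the real pipeline contains.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example data from the question.
df = pd.DataFrame({"Surname": ["Smith", "Jones", "Johnson", "Smith"]})

# Stand-in for the feature-generation step (assumption: pandas instead of
# Featuretools). Each row receives the number of rows sharing its surname.
df["Surname_Count"] = df.groupby("Surname")["Surname"].transform("count")

# The generated column is then passed to an ordinary sklearn pipeline.
# StandardScaler is a placeholder step chosen for illustration only.
pipeline = Pipeline([("scale", StandardScaler())])
scaled = pipeline.fit_transform(df[["Surname_Count"]])

print(df["Surname_Count"].tolist())
```

Note that the counts here are computed outside the pipeline, on the full dataset. If they need to be learned from the training split only, the counting would have to happen inside a fitted transformer instead.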
Answered by Alexander Wang on January 22, 2021