TransWikia.com

How can I count the number of occurrences of a category in a dataset as part of an Sklearn Pipeline?

Data Science Asked by Jinglesting on January 22, 2021

Let us say we have a dataset with a feature such as Surname.

arr['Surname'] = ['Smith', 'Jones', 'Johnson', 'Smith']

And I want to encode this categorical info as a new feature like

arr['Surname_Count'] = [2, 1, 1, 2]

with the caveat that this is done within an Sklearn pipeline. Are there quick ways to do this that do not involve rolling my own partition-counting transformer?

One Answer

You can check out Featuretools, which is an open-source Python framework for automated feature engineering. For your case specifically, it can generate aggregation features, such as a count, for your dataset.

After generating the new feature matrix with the desired column, you can use the matrix as you normally would within an Sklearn pipeline.
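
For comparison, here is a minimal sketch of the same idea without Featuretools, using only pandas and scikit-learn's FunctionTransformer. This is an illustrative alternative, not the Featuretools approach described above. The helper name add_surname_count and the step name "surname_count" are assumptions made up for this example. Note that the function is stateless: the counts are recomputed on whatever data is passed through the pipeline, so counts on a test set will reflect the test set rather than the training set.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def add_surname_count(df):
    """Hypothetical helper: append a Surname_Count column to a copy of df."""
    df = df.copy()
    # For each row, count how many rows share the same Surname value.
    df["Surname_Count"] = df.groupby("Surname")["Surname"].transform("count")
    return df


df = pd.DataFrame({"Surname": ["Smith", "Jones", "Johnson", "Smith"]})

# The counting step sits in the pipeline like any other transformer;
# further steps (encoders, estimators) could be appended after it.
pipe = Pipeline([("surname_count", FunctionTransformer(add_surname_count))])

result = pipe.fit_transform(df)
print(result["Surname_Count"].tolist())  # [2, 1, 1, 2]
```

If the counts need to be learned on the training data and reused at prediction time, a stateful transformer with separate fit and transform steps would be needed instead of this stateless function.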

Answered by Alexander Wang on January 22, 2021
