TransWikia.com

How to treat data transformation choices as hyperparameters?

Data Science Asked by callmeanythingyouwant on July 20, 2021

While reading the book Hands-On ML by Aurélien Géron, I came across this line:

Treat your data transformation choices as hyperparameters, especially
when you are not sure about them (e.g., if you’re not sure whether to
replace missing values with zeros or with the median value, or to just
drop the rows).

How exactly do I do that? Is there a way to do it via sklearn, or do I have to manually keep several datasets (each with a different transformation) and then fit a model on each of them?

2 Answers

So, the question is about how to treat transformation choices as hyperparameters.

How I would go about it is the following:

Use one baseline model architecture for the data and then repeat the following:

  1. Instantiate the baseline model (that is, make sure all of the weights are freshly initialised)
  2. Create the transformed dataset
  3. Train the model
  4. Compute generalisation performance measures (AUC, precision, recall, whatever).

Then compare the generalisation performance across all of the data transforms to find the "best" transformation, i.e. the one that gives the best value of a generalisation metric appropriate for your task.
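A minimal sketch of that loop with scikit-learn, using the three options from the book quote (fill with zeros, fill with the median, drop the rows). The synthetic data, the logistic-regression baseline and the helper names `make_model` and `evaluate` are my own assumptions for illustration, not something from the book:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic data with roughly 10% of the values knocked out (illustrative only).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X[rng.random(X.shape) < 0.1] = np.nan


def make_model():
    # Step 1: a fresh, untrained baseline model for every transformation.
    return LogisticRegression(max_iter=1000)


def evaluate(transformation):
    """Return the mean cross-validated ROC AUC for one transformation choice."""
    if transformation == "drop_rows":
        # Step 2: build the transformed dataset by dropping incomplete rows.
        keep = ~np.isnan(X).any(axis=1)
        X_t, y_t, estimator = X[keep], y[keep], make_model()
    elif transformation in ("zero", "median"):
        # Step 2: impute inside a pipeline so the statistics are learned per CV fold.
        if transformation == "zero":
            imputer = SimpleImputer(strategy="constant", fill_value=0)
        else:
            imputer = SimpleImputer(strategy="median")
        X_t, y_t, estimator = X, y, make_pipeline(imputer, make_model())
    else:
        raise ValueError(f"unknown transformation: {transformation}")
    # Steps 3 and 4: train and score with 5-fold cross-validation.
    return cross_val_score(estimator, X_t, y_t, cv=5, scoring="roc_auc").mean()


scores = {name: evaluate(name) for name in ["zero", "median", "drop_rows"]}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```

One caveat: the "drop_rows" score is computed on fewer rows than the other two, so the comparison is only approximate. A shared held-out test set would make it fairer.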

Answered by shepan6 on July 20, 2021

What shepan6 is suggesting is basically to manually search for the best "transformation choice hyperparameters" by trying them all and seeing what performs best.

This is a good idea (I upvoted), but if you want to go further, you can use a package like hyperopt and manually define an "objective" function that accepts a parameter that decides on which transformation to use.
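Here is a rough sketch of what such an objective function could look like. The data, the `CANDIDATES` dictionary and the "scale" option are assumptions of mine, added for illustration. To keep the block free of extra dependencies, it simply evaluates the objective on every combination. With hyperopt you would instead hand `objective` to `fmin` (with `algo=tpe.suggest`) and build the search space from `hp.choice`.

```python
from itertools import product

import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with roughly 10% of the values missing (illustrative only).
rng = np.random.default_rng(0)
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X[rng.random(X.shape) < 0.1] = np.nan

# Hypothetical search space: every key is one transformation choice.
CANDIDATES = {
    "imputation": ["zero", "median"],
    "scale": [False, True],
}


def objective(params):
    """Return a loss (lower is better) for one combination of choices."""
    if params["imputation"] == "zero":
        steps = [SimpleImputer(strategy="constant", fill_value=0)]
    else:
        steps = [SimpleImputer(strategy="median")]
    if params["scale"]:
        steps.append(StandardScaler())
    steps.append(LogisticRegression(max_iter=1000))
    model = make_pipeline(*steps)
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    return 1.0 - auc  # optimisers such as hyperopt minimise the objective


# Stand-in for the optimiser: try every combination exhaustively.
trials = [dict(zip(CANDIDATES, values)) for values in product(*CANDIDATES.values())]
losses = [objective(p) for p in trials]
best_params = trials[int(np.argmin(losses))]
print(best_params, min(losses))
```

The point is that the transformation is chosen by an ordinary parameter, so any optimiser that can pick values for `params` can search over it together with the model's own hyperparameters.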

Answered by Itamar Mushkin on July 20, 2021
