Data Science Asked by callmeanythingyouwant on July 20, 2021
While reading the book Hands-On Machine Learning by Aurélien Géron, I came across this line:
Treat your data transformation choices as hyperparameters, especially
when you are not sure about them (e.g., if you’re not sure whether to
replace missing values with zeros or with the median value, or to just
drop the rows).
How exactly do I do that? Is there a way to do it via sklearn, or do I have to manually keep several datasets (each with a different transformation) and then fit models on all of them?
The question is about how to treat transformation choices as hyperparameters.
I would go about it as follows:
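Use one baseline model architecture for the data and then repeat the following for each candidate transformation: apply the transformation to the data, train the model on the transformed data, and evaluate it on held-out data.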
Then compare the generalisation performance across all of the data transforms to find the "best" transformation, that is, the one that most improves a generalisation metric appropriate for your task.
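A minimal sketch of that loop with sklearn, using the three options from the book quote (zeros, median, dropping rows). The synthetic dataset, the Ridge baseline, the R² metric and names such as `candidates` and `with_imputer` are my own assumptions for illustration, not part of the original answer:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Illustrative data only: a synthetic regression set with ~10% of values missing.
rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X[rng.random(X.shape) < 0.1] = np.nan


def with_imputer(imputer):
    """Keep all rows; the imputer is refit inside every CV fold."""
    return X, y, make_pipeline(imputer, Ridge())


def drop_rows():
    """Drop every row that contains at least one missing value."""
    keep = ~np.isnan(X).any(axis=1)
    return X[keep], y[keep], make_pipeline(Ridge())


# One entry per transformation choice (hypothetical names).
candidates = {
    "zeros": lambda: with_imputer(SimpleImputer(strategy="constant", fill_value=0)),
    "median": lambda: with_imputer(SimpleImputer(strategy="median")),
    "drop_rows": drop_rows,
}

# Same baseline model for every choice; only the transformation changes.
scores = {}
for name, build in candidates.items():
    X_t, y_t, model = build()
    scores[name] = cross_val_score(model, X_t, y_t, cv=5, scoring="r2").mean()

best = max(scores, key=scores.get)
print(scores, "->", best)
```

Note that "drop_rows" is scored on fewer rows than the other two, so its score is not strictly comparable. When every option is an sklearn transformer (as with the two imputers), the same comparison can also be written as a `Pipeline` searched with `GridSearchCV`, with the imputer step or its `strategy` listed in the parameter grid, so no separate datasets have to be kept.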
Answered by shepan6 on July 20, 2021
What shepan6 is suggesting is basically to search manually for the best "transformation choice hyperparameters" by trying them all and seeing what performs best.
This is a good idea (I upvoted), but if you want to go further, you can use a package like hyperopt and define an "objective" function that accepts a parameter deciding which transformation to use.
Answered by Itamar Mushkin on July 20, 2021