Data Science Asked by Pieterism on September 2, 2021
So I have a dataset containing the results of executing problem instances with different given solver strategies. Simplified example:
| Problem_instance | Problem_Size | Used_Solver | Cost |
| --- | --- | --- | --- |
| P1 | 50 | A | 75 |
| P1 | 50 | B | 125 |
| P1 | 50 | C | 225 |
| P1 | 50 | D | 100 |
| P2 | 150 | A | 165 |
| P2 | 150 | B | 360 |
| P2 | 150 | C | 275 |
| P2 | 150 | D | 45 |
| P3 | 25 | A | 35 |
| P3 | 25 | B | 65 |
| ... | ... | ... | ... |
I’m trying to use machine learning to predict the best-performing solver for a given problem instance. In the data preprocessing stage, I need to standardize or scale my data, but I’m not sure how best to do this.
Firstly, I’m not sure which of sklearn’s scalers to use (StandardScaler / MinMaxScaler / ...).
Secondly, I’m confused about how to handle the different records for each instance. When I first group the data by Problem_instance and then apply a MinMaxScaler, the record with Cost = 0 would be the best solution for that problem and the record with Cost = 1 the worst. But if I use the same strategy to scale Problem_Size, it would be equal to 0 everywhere, since the size is constant within an instance. On the other hand, if I use global scaling, the information about which solver is best for each instance is lost.
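To make the two strategies concrete, here is a minimal sketch using only the complete rows of the example table (P1 and P2). It min-max scales Cost within each Problem_instance and scales Problem_Size globally. The helper `minmax` and the column names `Cost_scaled` and `Size_scaled` are illustrative assumptions, not part of the original dataset.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Complete rows from the example table (P3 is omitted because its rows are truncated).
df = pd.DataFrame({
    "Problem_instance": ["P1"] * 4 + ["P2"] * 4,
    "Problem_Size": [50] * 4 + [150] * 4,
    "Used_Solver": list("ABCD") * 2,
    "Cost": [75, 125, 225, 100, 165, 360, 275, 45],
})

def minmax(s):
    """Min-max scale one group; return zeros if every value in the group is equal."""
    span = s.max() - s.min()
    return (s - s.min()) / span if span > 0 else s * 0.0

# Per-instance scaling: within each instance, 0 is the cheapest solver and 1 the most expensive.
df["Cost_scaled"] = df.groupby("Problem_instance")["Cost"].transform(minmax)

# Global scaling: Problem_Size is constant within an instance, so it is scaled across all rows.
df["Size_scaled"] = MinMaxScaler().fit_transform(df[["Problem_Size"]]).ravel()

# The cheapest solver per instance can also be read directly from the raw Cost column.
best = df.loc[df.groupby("Problem_instance")["Cost"].idxmin(),
              ["Problem_instance", "Used_Solver"]]

print(df)
print(best)
```

Whether mixing per-instance and global scaling like this is appropriate depends on the model, so treat it as one option to evaluate rather than a recommendation.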
Can someone help me with the data preprocessing for this problem?
There's no single right answer to this question, because which scaler works best depends on the data and on the algorithm you use to make the prediction. Try different scalers combined with different algorithms, and compare the cross-validation results of each pipeline to decide which preprocessing is best.
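A rough sketch of such a comparison is below. The data is a synthetic four-class stand-in generated with `make_classification` (an assumption, since the real feature matrix isn't shown), and the particular scalers and models are only examples.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, RobustScaler, StandardScaler

# Synthetic placeholder data: 4 classes stand in for the four solvers A-D.
X, y = make_classification(
    n_samples=200, n_features=5, n_informative=3, n_redundant=0,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Candidate preprocessing steps and models (factories, so each pipeline gets fresh objects).
scalers = {
    "standard": StandardScaler,
    "minmax": MinMaxScaler,
    "robust": RobustScaler,
}
models = {
    "logreg": lambda: LogisticRegression(max_iter=1000),
    "knn": lambda: KNeighborsClassifier(n_neighbors=5),
}

# Cross-validate every scaler/model pair; the scaler is fit inside each fold
# because it is part of the pipeline, which avoids leaking test-fold statistics.
results = {}
for scaler_name, make_scaler in scalers.items():
    for model_name, make_model in models.items():
        pipe = make_pipeline(make_scaler(), make_model())
        scores = cross_val_score(pipe, X, y, cv=5)
        results[(scaler_name, model_name)] = scores.mean()

# Print the combinations from best to worst mean accuracy.
for combo, score in sorted(results.items(), key=lambda kv: kv[1], reverse=True):
    print(combo, round(score, 3))
```

With the real data, rows from the same problem instance should probably stay in the same fold (for example with `GroupKFold`), otherwise the scores may be optimistic.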
Of course, you don't have unlimited time. I would:
You'll eventually run out of time or patience, but I think going in this order will make the most efficient use of your time.
Remember, there's no single best way to do it for every problem.
Answered by Josh on September 2, 2021