Data Science Asked on September 5, 2021
I have a data set that has a few columns such as:
Total cost: mean = 3,000,000
Percent complete: mean = 50
final profit %: mean = 14
I know that, with such different orders of magnitude, I should standardize the data before I fit a linear regression (using Python and sklearn). The problem is that the data contains negative values that I need to keep, so I don't know which type of standardization to use. The only two I am familiar with are log transformations and StandardScaler, both of which I think get rid of negatives.
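You can use z-score normalization (usually called standardization). It rescales each column to a mean of 0 and a standard deviation of 1, so the result contains both positive and negative values: anything below the column mean becomes negative and anything above it becomes positive.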
$X_{Normalised} = \frac{X - \mu}{\sigma}$
Here $\mu$ is your original mean and $\sigma$ is your standard deviation.
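As a rough illustration of the formula, here is a minimal NumPy sketch. The `profit_pct` values are made up for the example (chosen so the mean is 14, like the profit column in the question); they are not from the asker's data.

```python
import numpy as np

# Hypothetical profit percentages, including a negative value (made-up data).
profit_pct = np.array([-5.0, 10.0, 14.0, 20.0, 31.0])

mu = profit_pct.mean()    # original mean
sigma = profit_pct.std()  # population standard deviation (ddof=0)

# Apply (X - mu) / sigma element-wise.
z = (profit_pct - mu) / sigma

print(z)         # values below the mean are negative, values above are positive
print(z.mean())  # approximately 0
print(z.std())   # approximately 1
```

The transformation is linear, so the ordering of the values and their relative spacing are preserved. Only the location and scale change.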
Answered by SrJ on September 5, 2021
You can still use StandardScaler() as it will keep the negative values. If you think you have a few outliers, and want to reduce their influence, you can also look at RobustScaler().
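A small sketch of both scalers, assuming scikit-learn is installed. The array `X` is invented for illustration, with columns loosely mirroring total cost, percent complete, and final profit % from the question.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, RobustScaler

# Hypothetical rows: [total cost, percent complete, final profit %].
X = np.array([
    [2_500_000.0, 40.0, -5.0],
    [3_000_000.0, 50.0, 14.0],
    [3_500_000.0, 60.0, 20.0],
    [2_000_000.0, 30.0, 10.0],
    [4_000_000.0, 70.0, 31.0],
])

# StandardScaler: subtract each column's mean, divide by its standard deviation.
X_std = StandardScaler().fit_transform(X)

# RobustScaler: by default, subtract each column's median and divide by its
# interquartile range, which makes it less sensitive to outliers.
X_rob = RobustScaler().fit_transform(X)

print(X_std)  # each column has mean ~0 and std ~1; negative entries are present
print(X_rob)  # each column has median 0
```

In a real workflow you would typically call `fit` on the training data only and then `transform` both the training and test sets with the same fitted scaler.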
Answered by Donald S on September 5, 2021