Data Science Asked by Mario Tormo on April 26, 2021
I have a feature that describes a characteristic of the instances. That characteristic can be present or not. If present, its values follow an almost normal distribution (actually a bit skewed to the right, but after a log transformation it becomes approximately normal). When the characteristic is not present in the instance, the value of the feature is just 0.
So in the end, I have a distribution with a lot of instances at value 0 and, some distance to the right of it, the almost-normal distribution. I would like to split it into two different features: one that shows the absence/presence of the characteristic (easy), and a second that shows only the normal distribution without the annoying peak at zero.
Aren't you providing the answer? You can split the feature in two: if feature_to_split is the feature you're talking about, you can create feature_to_split_ispresent, which will take either 1 or 0 depending on the presence or absence of that specific characteristic, and feature_to_split_value, which will take the actual value of that characteristic.
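A minimal sketch of that split with pandas, assuming the data lives in a DataFrame and that a value of exactly 0 always means "absent". The sample values are made up, and two choices here are assumptions rather than part of the answer: marking absent rows as NaN in feature_to_split_value, and adding a log-transformed column named feature_to_split_logvalue.

```python
import numpy as np
import pandas as pd

# Made-up sample data: 0 means the characteristic is absent.
df = pd.DataFrame({"feature_to_split": [0.0, 12.5, 0.0, 8.1, 20.3, 0.0]})

present = df["feature_to_split"] != 0

# Indicator feature: 1 if the characteristic is present, 0 otherwise.
df["feature_to_split_ispresent"] = present.astype(int)

# Value feature: the actual value where present, NaN where absent
# (assumption: NaN keeps the zeros out of the value distribution).
df["feature_to_split_value"] = df["feature_to_split"].where(present)

# Optional: log-transform the present values to reduce the right skew
# mentioned in the question (hypothetical extra column).
df["feature_to_split_logvalue"] = np.log(df["feature_to_split_value"])

print(df)
```

Many models do not accept NaN inputs, so the missing entries in the value column may need to be filled (for example with the mean of the present values) before training; which fill value is appropriate depends on the model.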
Answered by Francesco Alongi on April 26, 2021
I don't have a precise answer to that because it depends on what you want to do with that data. Assuming that your task is supervised learning, since that is the most common case, just extracting that feature will be enough for a model to discriminate between the different cases.
EDIT:
Models like linear regression or neural networks work better when the inputs are approximately normally distributed; in this case I would try these options:
Answered by Mikedev on April 26, 2021