Data Science Asked by Elvin Ugonna on August 9, 2020
I have a dataset which has "Speed" as one of the columns (features). The column contains both zero and non-zero values. I want to randomly set 10% of the non-zero values to zero. This will also change the corresponding "Class" label to zero: for any row whose Speed is set to zero, the Class value must be set to zero as well. I have tried the code below, but it gives me errors, and because of the errors I cannot tell whether it produces the result I want.
import pandas as pd

file_path = 'Processed_data/data1.csv'
df = pd.read_csv(file_path)
per_change = 0.1
attr = 'Speed'
target = 'Class'
# Rows whose speed is currently non-zero
df_spd = df[df[attr] > 0.]
# 10% of the whole dataset; use df_spd.shape[0] instead for 10% of the non-zero rows
num_rows_to_change = int(df.shape[0] * per_change)
num_with_zero_initial = df[df[attr] == 0].shape[0]
assert df_spd.shape[0] > num_rows_to_change, (
    'Number of rows with non-zero speed is less than 10% of the original dataset.')
# Randomly pick rows to zero out; .copy() avoids SettingWithCopyWarning
df_update = df_spd.sample(num_rows_to_change).copy()
df_update[attr] = 0.
df_update[target] = 0.
# Write the zeroed rows back into df, aligned on index labels
df.update(df_update)
update_list = df_update.index.tolist()
num_with_zero_final = df[df[attr] == 0].shape[0]
assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, (
    'Number of rows needed to change not equal to number of rows changed.')
df.to_csv('changed.csv')
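The same task can also be done without `df.update`, by assigning directly with `.loc` on the chosen row labels. The sketch below is one way to do it, under these assumptions: `zero_out_fraction` is a hypothetical helper name, the fraction is taken relative to the non-zero rows (as the question text describes), and a small synthetic DataFrame stands in for `Processed_data/data1.csv`.

```python
import numpy as np
import pandas as pd


def zero_out_fraction(df, attr="Speed", target="Class", frac=0.1, seed=None):
    """Hypothetical helper: return a copy of df in which a random `frac` of
    the rows with positive `attr` have both `attr` and `target` set to 0,
    together with the labels of the rows that were changed."""
    out = df.copy()
    # Same "> 0" filter as in the question's code
    nonzero_idx = out.index[out[attr] > 0]
    n = int(len(nonzero_idx) * frac)
    rng = np.random.default_rng(seed)
    # Sample row labels without replacement, only among non-zero rows
    chosen = rng.choice(nonzero_idx.to_numpy(), size=n, replace=False)
    # Label-based assignment on one frame: no chained indexing, no update()
    out.loc[chosen, [attr, target]] = 0
    return out, chosen


# Synthetic data: 5 rows already at zero, 20 rows with non-zero speed
df = pd.DataFrame({
    "Speed": [0.0] * 5 + [float(s) for s in range(1, 21)],
    "Class": [0] * 5 + [1] * 20,
})
changed, chosen = zero_out_fraction(df, frac=0.1, seed=0)
print(sorted(chosen))
```

Returning a modified copy keeps the original frame intact, which makes the before/after zero counts easy to compare, much like the asker's assertions.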
FYI, I did not go through your code, as the task is fairly straightforward, assuming I have understood it correctly.
>> import numpy as np
>> import pandas as pd
>> import random
>> df = pd.DataFrame({'a': np.random.rand(10), 'b': np.random.rand(10)})
>> print(df)
a b
0 0.127409 0.508811
1 0.345239 0.674797
2 0.824521 0.381567
3 0.893538 0.062142
4 0.307070 0.769546
5 0.872883 0.175192
6 0.046671 0.592971
7 0.799977 0.632761
8 0.932829 0.456906
9 0.188867 0.470296
>> idx = random.sample(list(df.index), int(len(df) * 0.2))  # randomly pick 20% (0.2) of the row labels; random.sample needs a sequence, hence list()
>> print(idx)
[6, 0]
>> df[df.index.isin(idx)] = [0, 0]
>> df
a b
0 0.000000 0.000000
1 0.345239 0.674797
2 0.824521 0.381567
3 0.893538 0.062142
4 0.307070 0.769546
5 0.872883 0.175192
6 0.000000 0.000000
7 0.799977 0.632761
8 0.932829 0.456906
9 0.188867 0.470296
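The transcript above samples from all rows (including rows that are already zero) and zeroes every column. A sketch adapting the same `random.sample` idea to the question might restrict the candidates to rows with non-zero Speed and zero only the `Speed` and `Class` columns. The toy data below is made up for illustration, and the 0.2 fraction mirrors the transcript (the question would use 0.1 on a larger dataset).

```python
import random

import pandas as pd

random.seed(1)  # fixed seed only so the run is repeatable
df = pd.DataFrame({"Speed": [0.0, 4.2, 0.0, 3.1, 5.5, 0.0, 2.7, 6.3, 1.9, 4.8],
                   "Class": [0, 1, 0, 1, 1, 0, 1, 1, 1, 1]})

# Candidate labels: only rows whose Speed is currently non-zero
candidates = df.index[df["Speed"] > 0].tolist()
# Pick 20% of the candidates at random (random.sample needs a list/sequence)
idx = random.sample(candidates, int(len(candidates) * 0.2))

# Zero only the two relevant columns of the chosen rows
df.loc[idx, ["Speed", "Class"]] = 0
print(df)
```

Selecting columns in the `.loc` assignment leaves any other feature columns untouched, unlike assigning `[0, 0]` to whole rows.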
Hope it helps.
Answered by Kiritee Gak on August 9, 2020