TransWikia.com

My small script to change values in a DataFrame column is not working

Data Science Asked by Elvin Ugonna on August 9, 2020

I have a dataset that has “Speed” as one of its columns (features). The column contains both zero and non-zero values. I want to randomly set 10% of the non-zero values to zero, and every row whose Speed is set to zero must also have its “Class” label set to zero. I have tried the script below, but it gives me errors, so I cannot tell whether it produces the result I want.

import pandas as pd

file_path = 'Processed_data/data1.csv'
df = pd.read_csv(file_path)
per_change = 0.1
attr = 'Speed'
target = 'Class'
df_spd = df[df[attr] > 0.]  # candidate rows: non-zero speed

# 10% of the whole dataset, as originally written; use df_spd.shape[0]
# instead if the goal is 10% of the non-zero rows
num_rows_to_change = int(df.shape[0] * per_change)
num_with_zero_initial = df[df[attr] == 0].shape[0]
# Parentheses let the assert message continue on the next line
assert df_spd.shape[0] > num_rows_to_change, (
    'Number of rows with non-zero speed is less than 10% of the original dataset.')
update_list = df_spd.sample(num_rows_to_change).index.tolist()
# Zero the feature and its label directly in df, by row label
df.loc[update_list, [attr, target]] = 0
num_with_zero_final = df[df[attr] == 0].shape[0]
assert num_with_zero_final == num_with_zero_initial + num_rows_to_change, (
    'Number of rows needed to change not equal to number of rows changed.')
df.to_csv('changed.csv')
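As the script is reproduced here, both assert statements end in a comma followed by a line break, with no parentheses or backslash to continue the statement. Python ends the statement at the line break, so the file is rejected before any line runs, which likely accounts for at least one of the errors regardless of the data. This assumes the posted text matches the file (a backslash may have been lost when the code was copied). The sketch checks three variants with the built-in compile(), which parses without executing; compiles() is a hypothetical helper name.

```python
def compiles(src: str) -> bool:
    """Return True if src parses as Python source, False on SyntaxError.

    compile() only parses and compiles the code; nothing is executed.
    """
    try:
        compile(src, "<snippet>", "exec")
    except SyntaxError:
        return False
    return True


# Same shape as the asserts in the question: comma, then a bare line break
broken = "assert n > 0,\n    'message'\n"
# Fix 1: wrap the message in parentheses so the statement continues
fixed_parens = "assert n > 0, (\n    'message')\n"
# Fix 2: explicit line continuation with a trailing backslash
fixed_backslash = "assert n > 0, \\\n    'message'\n"

for name, src in [("broken", broken), ("parens", fixed_parens), ("backslash", fixed_backslash)]:
    print(name, "compiles" if compiles(src) else "raises SyntaxError")
```

Running it should report that only the first variant fails to compile.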

One Answer

FYI, I did not go through your code; assuming I have understood the problem correctly, it is fairly straightforward.

>>> import numpy as np
>>> import pandas as pd
>>> import random
>>> df = pd.DataFrame({'a': np.random.rand(10), 'b': np.random.rand(10)})
>>> print(df)
          a         b
0  0.127409  0.508811
1  0.345239  0.674797
2  0.824521  0.381567
3  0.893538  0.062142
4  0.307070  0.769546
5  0.872883  0.175192
6  0.046671  0.592971
7  0.799977  0.632761
8  0.932829  0.456906
9  0.188867  0.470296
>>> idx = random.sample(list(df.index), int(len(df) * 0.2))  # pick 20% of the row labels at random
>>> print(idx)
[6, 0]
>>> df.loc[idx] = 0  # zero every column of the selected rows
>>> df
          a         b
0  0.000000  0.000000
1  0.345239  0.674797
2  0.824521  0.381567
3  0.893538  0.062142
4  0.307070  0.769546
5  0.872883  0.175192
6  0.000000  0.000000
7  0.799977  0.632761
8  0.932829  0.456906
9  0.188867  0.470296

Hope it helps.

Answered by Kiritee Gak on August 9, 2020
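Combining the answer's idea (pick row labels at random, then assign by label) with the question's requirements, a minimal sketch might look like this. It swaps random.sample for pandas' DataFrame.sample with random_state so the selection is repeatable, draws only from rows with non-zero Speed, and zeroes only Speed and Class. The helper name zero_random_nonzero_rows, its defaults, and the demo data are made up for illustration, and it assumes a unique row index (such as the default RangeIndex that read_csv produces).

```python
import numpy as np
import pandas as pd


def zero_random_nonzero_rows(df, attr="Speed", target="Class", frac=0.1, seed=None):
    """Hypothetical helper: return a copy of df in which a random `frac` of the
    rows with a non-zero `attr` have both `attr` and `target` set to 0,
    together with the list of changed row labels."""
    out = df.copy()
    nonzero = out[out[attr] != 0]
    n = int(len(nonzero) * frac)  # fraction of the non-zero rows, per the question text
    # Sample row labels without replacement; a fixed seed makes the choice repeatable
    chosen = nonzero.sample(n=n, random_state=seed).index.tolist()
    out.loc[chosen, [attr, target]] = 0
    return out, chosen


# Made-up data: 20 rows with non-zero speed followed by 5 rows with zero speed
demo = pd.DataFrame({
    "Speed": np.r_[np.arange(1.0, 21.0), np.zeros(5)],
    "Class": np.r_[np.ones(20, dtype=int), np.zeros(5, dtype=int)],
})
changed, labels = zero_random_nonzero_rows(demo, frac=0.1, seed=0)
print(labels)                         # two labels drawn from 0..19
print((changed["Speed"] == 0).sum())  # 7: the 5 original zeros plus 2 new ones
```

The original script takes 10% of all rows rather than of the non-zero rows; replace len(nonzero) with len(out) if that is the intended base.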
