Data Science Asked on August 20, 2021
I am working with typing data that has timing features (unit: ms), and some of the features are based on keyboard keyCodes (positive integers, range: [8, 222]). Currently, I use StandardScaler()
from scikit-learn to scale all the features, so that my learning models do not overweight the keyCode-based features. I would like to discretize the keyCode-based features and run StandardScaler()
on the timing features only. How can I go about this?
Not sure about your question, but maybe something like the following could help:
import random
import pandas as pd
from sklearn.preprocessing import StandardScaler
# Let's assume the following dataframe
data = {
    'KeyCodes': [random.randrange(8, 223) for _ in range(10000)],  # keyCodes in [8, 222]
    'age': [random.randrange(1, 100) for _ in range(10000)],
    'id': list(range(10000)),
}
df = pd.DataFrame.from_dict(data)
df.head()
# First you need to create the dummy variables based on KeyCodes
# (dtype=float keeps the dummies numeric instead of boolean)
dummies = pd.get_dummies(df['KeyCodes'], dtype=float).rename(columns=lambda x: f'KeyCode_{x}')
df = pd.concat([df, dummies], axis=1)
df.drop(['KeyCodes'], inplace=True, axis=1)
# Then you can apply the normalization to the subset of features you wish to normalize.
# Here the subset is the dummy columns; to scale other columns instead, list them in col_names.
normalized_df = df.copy()
col_names = list(dummies.columns)  # only the KeyCode_* columns that actually occur
scaler = StandardScaler().fit(df[col_names].values)
normalized_df[col_names] = scaler.transform(df[col_names].values)
# Explore the output
normalized_df['age'][:10]  # you can see it was not normalized
set(normalized_df['KeyCode_8'])  # normalized version of the feature
set(df['KeyCode_8'])  # not normalized version of the feature
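The code above scales the dummy columns and leaves the other columns alone. If the goal is the reverse, as the question describes (scale only the timing features and keep the keyCodes as discrete categories), one possible sketch uses scikit-learn's ColumnTransformer. The column names hold_time_ms, flight_time_ms and key_code and the synthetic data are assumptions made up for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical typing data; the column names and distributions are assumptions.
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "hold_time_ms": rng.normal(100, 20, n),
    "flight_time_ms": rng.normal(150, 40, n),
    "key_code": rng.integers(8, 223, n),  # integers in [8, 222]
})

timing_cols = ["hold_time_ms", "flight_time_ms"]
key_cols = ["key_code"]

preprocess = ColumnTransformer(
    transformers=[
        # Standardize only the timing features
        ("timing", StandardScaler(), timing_cols),
        # One-hot encode the keyCodes; unseen codes at predict time become all zeros
        ("keys", OneHotEncoder(handle_unknown="ignore"), key_cols),
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

X = preprocess.fit_transform(df)
# Output columns: the scaled timing features first, then one 0/1 column per observed keyCode
```

The fitted preprocess object can be reused on new data with preprocess.transform(...), or placed as the first step of a Pipeline in front of the model, so the scaler statistics come from the training set only.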
Answered by glhuilli on August 20, 2021