Data Science Asked on August 8, 2021
Currently, I have an imbalanced data set with class proportions of 84% and 16%. I want to use a VAE as an oversampling method and determine which class proportions give the best metrics, using cross-validation. The code below is the algorithm I am using. The problem is that in some cases num_samples can be negative, because after dividing the data the class ratio changes. For example, the full data set is split 84%/16%, but after dividing the data the ratio may become something like 80%/20%. How should I apply this algorithm so that the results are fair and comparable?
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import StratifiedKFold

# VOS (the variational oversampler), X_train and y_train are assumed to be
# imported/defined earlier.

kfold = StratifiedKFold(n_splits=10, shuffle=False)

# Class counts used to size the oversampling (computed once, before the CV loop)
maj = len(y_val_train[y_val_train == 0])
mino = len(y_val_train[y_val_train == 1])

# Target minority fractions to try, and one row of summed metrics per fraction
thresholds = np.arange(0.16, 0.505, 0.02)
measures = np.zeros((np.size(thresholds), 3))
no_replication = 10

for train, test in kfold.split(X_train.values, y_train.values):
    X_val_train, y_val_train = X_train.values[train], y_train.values[train]
    X_val_test, y_val_test = X_train.values[test], y_train.values[test]

    counter = 0
    for frac in thresholds:
        # Number of samples to generate so the minority share becomes `frac`
        num_samples = int(round(1 / (1 / frac - 1) * maj - mino))

        # Variational oversampling
        vos = VOS(hidden_dim=1000,
                  latent_dim=2,
                  minority_class_id=1,
                  verbose=1,
                  epochs=15,
                  num_samples_to_generate=num_samples,
                  random_state=0,
                  optimizer="Adam")

        # Fit the VAE oversampling model and get the resampled training fold
        X_res_val, y_res_val = vos.fit_oversample(X_val_train, y_val_train)

        # Fit a random forest on the resampled fold, predict on the held-out fold
        rf_vae = RandomForestClassifier(n_estimators=50, random_state=0)
        rf_vae.fit(X_res_val, y_res_val)
        pred_rf_vae_val = rf_vae.predict(X_val_test)

        # Accumulate F1, recall and accuracy for this fraction across folds
        measures[counter, 0] += f1_score(y_val_test, pred_rf_vae_val)
        measures[counter, 1] += recall_score(y_val_test, pred_rf_vae_val)
        measures[counter, 2] += accuracy_score(y_val_test, pred_rf_vae_val)
        counter += 1
```
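For reference, the expression for num_samples comes from solving (mino + n) / (maj + mino + n) = frac for n, which gives n = frac / (1 - frac) * maj - mino. This goes negative whenever the training fold already has a larger minority share than frac. One possible way to keep the folds comparable, sketched below, is to recompute the class counts from each training fold and clamp the result at zero. The helper name `samples_needed` and the toy label array are hypothetical and not part of the original code; treat this as a minimal sketch under those assumptions rather than a definitive fix.

```python
import numpy as np

def samples_needed(y, target_frac, minority_label=1):
    """Synthetic minority rows needed so that the minority class makes up
    `target_frac` of the resampled fold. Never returns a negative number."""
    y = np.asarray(y)
    n_min = int(np.sum(y == minority_label))
    n_maj = y.size - n_min
    # Solve (n_min + n) / (n_maj + n_min + n) = target_frac for n.
    n = target_frac / (1.0 - target_frac) * n_maj - n_min
    # Clamp: a target below the fold's current ratio means "generate nothing".
    return max(0, int(round(n)))

# Toy training fold: 840 majority rows and 160 minority rows (16% minority).
y_fold = np.array([0] * 840 + [1] * 160)

for frac in (0.10, 0.16, 0.30):
    print(frac, samples_needed(y_fold, frac))
```

Inside the cross-validation loop this would be called as `samples_needed(y_val_train, frac)` for each fold, so the count always reflects the fold actually being oversampled. Clamping is only one choice; skipping target fractions below the fold's natural ratio would be another.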