How to apply variational autoencoder for oversampling with cross-validation?

Question

I have an imbalanced data set with class proportions of 84% and 16%. I want to use a VAE as the oversampling method and determine which minority-class proportion after oversampling gives the best metrics, using cross-validation. The code below is the algorithm I am using. The problem is that in some cases num_samples can be negative, because the class ratio changes after the data is split. For example, the full data set is 84% / 16%, but after splitting the ratio may change to something like 80% / 20%. How should I apply this algorithm so that the results are fair and comparable?

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, recall_score
from sklearn.model_selection import StratifiedKFold
# VOS (the variational oversampler), X_train and y_train are assumed to be defined already

n_splits = 10
kfold = StratifiedKFold(n_splits=n_splits, shuffle=False)

# Target minority-class fractions after oversampling
thresholds = np.arange(0.16, 0.505, 0.02)
# Columns: F1, recall, accuracy (summed over folds, averaged at the end)
measures = np.zeros((np.size(thresholds), 3))

for train, test in kfold.split(X_train.values, y_train.values):

    X_val_train, y_val_train = X_train.values[train], y_train.values[train]
    X_val_test, y_val_test = X_train.values[test], y_train.values[test]

    # Class counts in the training part of this fold
    maj = len(y_val_train[y_val_train == 0])
    mino = len(y_val_train[y_val_train == 1])

    for counter, frac in enumerate(thresholds):

        # Number of samples to generate; this can be negative when the
        # training fold already has a minority fraction above frac
        num_samples = int(round(1 / (1 / frac - 1) * maj - mino))

        # Variational oversampling
        vos = VOS(hidden_dim=1000,
                  latent_dim=2,
                  minority_class_id=1,
                  verbose=1,
                  epochs=15,
                  num_samples_to_generate=num_samples,
                  random_state=0,
                  optimizer="Adam")

        # Fit the VAE oversampling model on the training fold only
        X_res_val, y_res_val = vos.fit_oversample(X_val_train, y_val_train)

        # Fit a random forest on the oversampled training fold
        rf_vae = RandomForestClassifier(n_estimators=50, random_state=0)
        rf_vae.fit(X_res_val, y_res_val)

        # Predict on the untouched validation fold
        pred_rf_vae_val = rf_vae.predict(X_val_test)

        # Accumulate F1, recall and accuracy for this target fraction
        measures[counter, 0] += f1_score(y_val_test, pred_rf_vae_val)
        measures[counter, 1] += recall_score(y_val_test, pred_rf_vae_val)
        measures[counter, 2] += accuracy_score(y_val_test, pred_rf_vae_val)

# Average the accumulated scores over the folds
measures /= n_splits
```
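
One possible way to avoid a negative num_samples, offered only as a sketch: compute the count from the labels of the current training fold and clamp it at zero, so a fold that already has at least the target minority share gets no synthetic samples. The helper name `samples_to_generate` and its parameters are hypothetical and not part of any library. It uses f / (1 - f), which is algebraically the same as the 1 / (1 / frac - 1) factor in the code above.

```python
import numpy as np


def samples_to_generate(y, target_frac, minority_label=1):
    """Synthetic minority samples needed so the minority share reaches target_frac.

    Hypothetical helper: returns 0 when the fold is already at or above the target.
    """
    if not 0 < target_frac < 1:
        raise ValueError("target_frac must be strictly between 0 and 1")
    y = np.asarray(y)
    n_min = int(np.sum(y == minority_label))
    n_maj = y.size - n_min
    # Solve (n_min + n) / (n_maj + n_min + n) = target_frac for n
    needed = target_frac / (1 - target_frac) * n_maj - n_min
    # Clamp at zero: never ask the oversampler for a negative count
    return max(0, int(round(needed)))


# An 80/20 fold evaluated against a 16% target needs no extra samples
y_fold = np.array([0] * 80 + [1] * 20)
print(samples_to_generate(y_fold, 0.16))
print(samples_to_generate(y_fold, 0.30))
```

Inside the cross-validation loop this would be called with `y_val_train` for each value of `frac`. Whether `VOS` accepts `num_samples_to_generate=0` has not been checked here, so skipping the oversampling step and training on the original fold when the result is 0 may be the safer choice.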
