Stratified K-Fold Cross-Validation in Orange: Python script

Data Science Asked on March 18, 2021

I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc.

As my problem is imbalanced (10% churn, 90% not churn), I want to oversample. However, Orange does not let me do the oversampling inside the cross-validation (the Test & Score widget).

Therefore, starting from my input data, I want to first generate 10 stratified folds, so that the 10% churn / 90% not churn distribution is preserved in each fold. Then I want to oversample within each fold to get a 50/50 distribution, and add the fold number to each instance as a feature. Lastly, in the Test & Score widget, I want to do cross-validation by feature, using the fold number. I think I have to implement this myself with a Python script. Could anyone help me with this?

Thank you!
Emma
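
The steps described in the question can be sketched roughly as follows. This is only a minimal illustration in plain Python (standard library only), not Orange's own API: the function names `stratified_folds` and `oversample_within_folds` are hypothetical, the toy `labels` list stands in for the churn column, and turning the result back into an Orange table inside a Python Script widget is not shown.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=0):
    """Assign each instance a fold index so class ratios are kept per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    fold_of = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled instances of each class round-robin over the folds.
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % n_folds
    return fold_of


def oversample_within_folds(labels, fold_of, seed=0):
    """Return (original_index, fold) rows, duplicating minority rows per fold."""
    rng = random.Random(seed)
    per_fold = defaultdict(lambda: defaultdict(list))
    for idx, (label, fold) in enumerate(zip(labels, fold_of)):
        per_fold[fold][label].append(idx)
    rows = []
    for fold in sorted(per_fold):
        classes = per_fold[fold]
        # Bring every class in this fold up to the size of its largest class.
        target = max(len(members) for members in classes.values())
        for members in classes.values():
            extra = [rng.choice(members) for _ in range(target - len(members))]
            rows.extend((idx, fold) for idx in members + extra)
    return rows


# Toy stand-in for the churn data: 10% churn (1), 90% not churn (0).
labels = [1] * 100 + [0] * 900
fold_of = stratified_folds(labels, n_folds=10)
rows = oversample_within_folds(labels, fold_of)
```

Each entry of `rows` pairs an original row index with its fold number, so the fold number can be attached to the (possibly duplicated) instance as an extra column. Because duplicates are drawn only from the same fold, no copy of an instance ends up in a different fold. Note that with this scheme the fold used for testing is oversampled as well, so the scores are measured on a 50/50 distribution rather than the original 10/90 one.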

cross validation imbalance orange python
