Data Science Asked on March 18, 2021
I am using Orange to predict customer churn and compare different learners based on accuracy, F1, etc.
As my problem is imbalanced (10% churn, 90% not churn), I want to oversample. However, in Orange it is not possible to do the oversampling within the cross-validation (the Test & Score widget).
Therefore, based on my input data, I want to first generate 10 stratified folds, so that the 10% churn / 90% not churn distribution is preserved in each fold. Then I want to oversample within each fold to get a 50/50 distribution, and add the fold number to each instance as a feature. Lastly, within the Test & Score widget, I would do cross-validation by feature, using the fold number. I think I have to implement this myself in a Python script. Could anyone help me do this?
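To make the steps concrete, here is a minimal sketch of what I have in mind, in plain Python with no Orange calls. The function names `assign_stratified_folds` and `oversample_within_folds` are made up for illustration, the data is a toy stand-in for my churn table, and I am assuming simple random duplication of minority rows as the oversampling method. Getting the result back into Orange (for example through the Python Script widget) is not covered here.

```python
import random
from collections import Counter, defaultdict


def assign_stratified_folds(labels, n_folds=10, seed=0):
    """Return one fold index per instance, spreading each class evenly over the folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [None] * len(labels)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled instances of this class round-robin over the folds.
        for position, idx in enumerate(indices):
            folds[idx] = position % n_folds
    return folds


def oversample_within_folds(rows, labels, folds, seed=0):
    """Randomly duplicate minority rows inside each fold until its classes are balanced.

    Returns a list of (row, label, fold) tuples; the fold number is the extra feature.
    """
    rng = random.Random(seed)
    grouped = defaultdict(lambda: defaultdict(list))
    for row, label, fold in zip(rows, labels, folds):
        grouped[fold][label].append(row)

    balanced = []
    for fold in sorted(grouped):
        classes = grouped[fold]
        target = max(len(members) for members in classes.values())
        for label, members in classes.items():
            # Draw duplicates with replacement to reach the majority class size.
            extra = [rng.choice(members) for _ in range(target - len(members))]
            for row in members + extra:
                balanced.append((row, label, fold))
    return balanced


# Toy data: 1000 customers, 10% churn (label 1), 90% not churn (label 0).
labels = [1] * 100 + [0] * 900
rows = list(range(len(labels)))  # stand-in for the real feature rows

folds = assign_stratified_folds(labels, n_folds=10)
balanced = oversample_within_folds(rows, labels, folds)

# Class counts per fold after oversampling, keyed by (fold, label).
per_fold = Counter((fold, label) for _, label, fold in balanced)
```

One thing I am unsure about with this design: in cross-validation by feature each fold is also used once as the test set, so the duplicated rows end up in the test data as well, and the scores would be computed on a 50/50 distribution rather than the original 90/10 one.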
Thank you!
Emma