TransWikia.com

Hypothesis Testing on Prediction from Random Forest Classifier

Data Science Asked by user12840093 on July 27, 2020

I used a random forest classifier to predict a binary label: 0 indicates the disease is not prevalent, and 1 indicates it is prevalent. I want to test whether having a particular feature (a facility) increases the chance of zero prevalence.
H0: The prevalence of having a facility = The prevalence of not having the facility
H1: The prevalence of having a facility > The prevalence of not having a facility

If we reject H0, is that sufficient to say that having a particular facility significantly lowers the risk of the disease?

import pandas as pd
from scipy import stats

# Two-sample t-test of the predicted result for each dummy feature (0 vs 1)
rows = []
for column in dummies_test.columns[0:-2]:
    zero = dummies_test[dummies_test[column] == 0].result
    one = dummies_test[dummies_test[column] == 1].result
    test = stats.ttest_ind(zero, one)
    status = 'Reject H0' if test.pvalue < 0.01 else 'Insufficient to Reject H0'
    rows.append({'factor': column, 'pval': test.pvalue, 'status': status})
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
stat = pd.DataFrame(rows, columns=['factor', 'pval', 'status'])
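
Note that stats.ttest_ind is two-sided by default, while the question asks about one direction (the facility lowering the risk), and the result column is binary. One possible alternative, sketched below under those assumptions, is a one-sided Fisher exact test on the 2x2 table of facility versus predicted result. The function name one_sided_tests, the frame demo, and the columns facility_a and facility_b are hypothetical and made up for illustration; they are not from the original data. A small p-value here would indicate an association in the predictions, which is not by itself evidence that the facility causes the lower risk.

```python
import pandas as pd
from scipy import stats


def one_sided_tests(df, feature_cols, outcome="result", alpha=0.01):
    """For each 0/1 feature, test whether the outcome is less likely when the feature is 1."""
    rows = []
    for col in feature_cols:
        with_f = df.loc[df[col] == 1, outcome]
        without_f = df.loc[df[col] == 0, outcome]
        # 2x2 table: rows = feature present / absent, columns = outcome 1 / outcome 0
        table = [
            [int((with_f == 1).sum()), int((with_f == 0).sum())],
            [int((without_f == 1).sum()), int((without_f == 0).sum())],
        ]
        # alternative="less": odds of outcome 1 are lower when the feature is present
        _, pval = stats.fisher_exact(table, alternative="less")
        status = "Reject H0" if pval < alpha else "Insufficient to Reject H0"
        rows.append({"factor": col, "pval": pval, "status": status})
    return pd.DataFrame(rows, columns=["factor", "pval", "status"])


# Synthetic data for illustration only (not the asker's data)
demo = pd.DataFrame({
    "facility_a": [1] * 50 + [0] * 50,
    "facility_b": [1, 0] * 50,
    "result": [0] * 45 + [1] * 5 + [0] * 20 + [1] * 30,
})

print(one_sided_tests(demo, ["facility_a", "facility_b"]))
```

With the real data, the call would presumably be one_sided_tests(dummies_test, dummies_test.columns[0:-2]), assuming every selected column is a 0/1 dummy.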
