Hypothesis Testing on Prediction from Random Forest Classifier

Data Science Asked by user12840093 on July 27, 2020

I used a random forest classifier to predict a binary label, where 0 indicates non-prevalence of a disease and 1 indicates that the disease is prevalent. I want to test whether having a particular feature (here, a facility) increases the chance of a 0 (non-prevalence) prediction.
H0: The prevalence with the facility = the prevalence without the facility
H1: The prevalence with the facility > the prevalence without the facility

If we reject H0, is that sufficient to say that having a particular facility significantly lowers the risk of the disease?

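import pandas as pd
from scipy import stats

# For each dummy feature, compare the predicted result between rows without (0) and with (1) it.
rows = []
for column in dummies_test.columns[0:-2]:
    zero = dummies_test[dummies_test[column] == 0].result
    one = dummies_test[dummies_test[column] == 1].result
    # Note: ttest_ind is two-sided by default.
    test = stats.ttest_ind(zero, one)
    # Reject H0 at the 1% significance level.
    status = 'Reject H0' if test.pvalue < 0.01 else 'Insufficient to Reject H0'
    rows.append({'factor': column, 'pval': test.pvalue, 'status': status})
# DataFrame.append was removed in pandas 2.0, so build the frame once from a list of rows.
stat = pd.DataFrame(rows, columns=['factor', 'pval', 'status'])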
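Since H1 above is directional and the result is a 0/1 label, a one-sided two-proportion z-test may be a closer match to the stated hypotheses than a two-sided t-test. The following is only a minimal sketch under assumptions: the function name `two_proportion_ztest`, its `alternative` parameter, and the toy arrays are hypothetical and not part of the original code, and the direction to test ("less" or "greater") has to be chosen to match the intended H1.

```python
import numpy as np
from scipy import stats


def two_proportion_ztest(with_facility, without_facility, alternative="less"):
    """Pooled two-proportion z-test on 0/1 outcomes (hypothetical helper).

    alternative="less":    H1 is P(1 | facility) < P(1 | no facility)
    alternative="greater": H1 is P(1 | facility) > P(1 | no facility)
    alternative="two-sided": H1 is that the two proportions differ
    """
    x1, n1 = np.sum(with_facility), len(with_facility)
    x0, n0 = np.sum(without_facility), len(without_facility)
    p1, p0 = x1 / n1, x0 / n0

    # Under H0 both groups share one proportion, so pool them for the standard error.
    pooled = (x1 + x0) / (n1 + n0)
    se = np.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n0))
    z = (p1 - p0) / se

    if alternative == "less":
        pval = stats.norm.cdf(z)
    elif alternative == "greater":
        pval = stats.norm.sf(z)
    elif alternative == "two-sided":
        pval = 2 * stats.norm.sf(abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return z, pval


# Toy data (made up for illustration): 10% predicted prevalence with the
# facility versus 30% without it.
with_facility = np.array([1] * 10 + [0] * 90)
without_facility = np.array([1] * 30 + [0] * 70)

z, pval = two_proportion_ztest(with_facility, without_facility, alternative="less")
print(f"z = {z:.3f}, one-sided p-value = {pval:.5f}")
```

In the loop above, `one` and `zero` would play the roles of `with_facility` and `without_facility`. Because the loop runs one test per column, the 0.01 threshold applies to each test separately, not to the set of tests as a whole.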

hypothesis testing research
