Chi-Square Goodness-of-Fit Test

Question

I want to use a chi-square goodness-of-fit test, but I'm unsure whether I'm using it correctly. The Kickstarter website publishes the number of projects in each main category and updates it once a day. I have a data set of Kickstarter projects from 2009 to 2016. I filter the data by year, keeping only projects launched between January and June, and count the projects in each main category. I then run one test per year against the counts Kickstarter posted.
Every test gives me a p-value of 0. My goal is to check how well the data set I got from Kaggle samples Kickstarter's projects by comparing it with Kickstarter's own statistics. Any advice?
Kickstarter Projects
Kickstarter Stats
from scipy.stats import chisquare

# clean_df is assumed to be the Kaggle data set loaded as a pandas DataFrame,
# with the 'launched' column already parsed as datetime.

# Data was collected 6/19/2020
# category: (number of projects, success rate in percent)
kickstarter_stats = {
    'Art': (39310, 44.41),
    'Design': (41510, 38.37),
    'Technology': (42993, 20.65),
    'Film & Video': (74760, 37.59),
    'Music': (62545, 49.99),
    'Fashion': (31840, 28.16),
    'Publishing': (50215, 33.30),
    'Food': (30106, 25.15),
    'Comics': (16280, 59.24),
    'Photography': (12448, 32.39),
    'Theater': (12282, 60.01),
    'Crafts': (1149, 25.20),
    'Journalism': (5762, 22.81),
    'Dance': (4265, 61.71),
    'Games': (53029, 41.10),
}

# Project counts per main category as posted by Kickstarter,
# sorted alphabetically by category name
expected_counts = [kickstarter_stats[category][0] for category in sorted(kickstarter_stats)]

for year in range(2010, 2017):  # first six months (Jan - Jun) of each year
    first_half = clean_df[(clean_df['launched'].dt.year == year) & (clean_df['launched'].dt.month <= 6)]
    # value_counts().sort_index() orders the categories alphabetically, matching expected_counts
    observed_counts = list(first_half.main_category.value_counts().sort_index())
    print(year, chisquare(f_obs=observed_counts, f_exp=expected_counts))
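One likely contributor to the p-values of 0: `scipy.stats.chisquare` expects `f_exp` to hold expected frequencies that sum to the same total as `f_obs`, but the snippet passes Kickstarter's raw totals, which are far larger than any half-year sample (recent SciPy versions raise an error when the sums disagree). A minimal sketch of one way to handle this is to rescale the reference counts to the observed sample size before testing. The helper name `scaled_chisquare` and the toy numbers are hypothetical and only illustrate the idea. Because samples this large make even small differences significant, the sketch also returns Cohen's w as a rough effect size.

```python
import numpy as np
from scipy.stats import chisquare


def scaled_chisquare(observed, reference):
    """Goodness-of-fit test after rescaling reference counts to the observed total.

    `scaled_chisquare` is a hypothetical helper, not part of SciPy.
    """
    observed = np.asarray(observed, dtype=float)
    reference = np.asarray(reference, dtype=float)

    # Turn the reference counts into proportions, then into expected
    # frequencies that sum to the same total as the observed counts.
    expected = reference / reference.sum() * observed.sum()

    result = chisquare(f_obs=observed, f_exp=expected)

    # Cohen's w: an effect size that does not grow with sample size.
    effect_size = np.sqrt(result.statistic / observed.sum())
    return result.statistic, result.pvalue, effect_size


# Toy numbers, made up purely for illustration (not Kickstarter data).
toy_reference = [5000, 3000, 2000]   # published totals per category
toy_observed = [260, 140, 100]       # sample counts per category

statistic, p_value, w = scaled_chisquare(toy_observed, toy_reference)
print(f"chi2={statistic:.3f}, p={p_value:.4f}, w={w:.3f}")
```

In the loop from the question, this would mean passing `observed_counts` and `expected_counts` for each year, after checking that both lists cover the same categories in the same order. Even with correct scaling the p-values may still be very small, since comparing 2010-2016 launches with totals collected in 2020 mixes different time periods.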
