Data Science Asked on July 7, 2021
I want to use a chi-square test, but I'm unsure whether I'm using it correctly. The Kickstarter website shows the frequency of projects per main category and is updated once a day. I have a data set of Kickstarter projects from 2009 to 2016. I wanted to filter the data by year, keeping only projects launched between January and June, and count the frequency of each category. I would then run one test per year against the figures Kickstarter posted.
All of the tests give me p-values of 0. My goal is to check how well the data I got from Kaggle samples Kickstarter's projects by comparing it with Kickstarter's published statistics. Any advice?
Kickstarter Projects
Kickstarter Stats
from scipy.stats import chisquare

# Data was collected 6/19/2020
# Main category: (number of projects, success rate in percent)
kickstarter_stats = {
    'Art': (39310, 44.41),
    'Design': (41510, 38.37),
    'Technology': (42993, 20.65),
    'Film & Video': (74760, 37.59),
    'Music': (62545, 49.99),
    'Fashion': (31840, 28.16),
    'Publishing': (50215, 33.30),
    'Food': (30106, 25.15),
    'Comics': (16280, 59.24),
    'Photography': (12448, 32.39),
    'Theater': (12282, 60.01),
    'Crafts': (1149, 25.20),
    'Journalism': (5762, 22.81),
    'Dance': (4265, 61.71),
    'Games': (53029, 41.10),
}
# Frequency of main categories for 2020 (six months, Jan - Jun), in alphabetical order of category
site_counts = [kickstarter_stats[category][0] for category in sorted(kickstarter_stats)]

# clean_df is assumed to be the Kaggle data set with 'launched' already parsed as datetime
for year in range(2010, 2017):  # Get the first 6 months of every year
    first_half = clean_df[(clean_df['launched'].dt.year == year) & (clean_df['launched'].dt.month <= 6)]
    # sort_index() puts the categories in the same alphabetical order as site_counts
    year_counts = list(first_half.main_category.value_counts().sort_index())
    # Note: both arguments are raw counts, so their totals are not matched
    print(year, chisquare(f_obs=year_counts, f_exp=site_counts))
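One thing that may be worth checking: `scipy.stats.chisquare` compares `f_obs` and `f_exp` as counts, so the expected values need to sum to the same total as the observed ones. Recent SciPy versions raise an error when the totals differ. The sketch below is not the code from the question. It is a minimal illustration, with a hypothetical helper `goodness_of_fit` and made-up counts, of converting the reference counts to proportions and rescaling them to the observed total before testing.

```python
import numpy as np
from scipy.stats import chisquare


def goodness_of_fit(observed_counts, reference_counts):
    """Chi-square goodness-of-fit of observed counts against reference proportions."""
    observed = np.asarray(observed_counts, dtype=float)
    reference = np.asarray(reference_counts, dtype=float)
    # Convert the reference counts to proportions, then scale them to the
    # observed total so both vectors sum to the same number.
    expected = reference / reference.sum() * observed.sum()
    return chisquare(f_obs=observed, f_exp=expected)


# Hypothetical counts for three categories, for illustration only.
reference = [4000, 3000, 3000]  # e.g. counts published by the site
sample = [410, 290, 300]        # e.g. counts found in the downloaded data
result = goodness_of_fit(sample, reference)
print(result.statistic, result.pvalue)
```

Even with matched totals, very large samples tend to produce tiny p-values for small differences in proportions, so it can help to look at the per-category proportions alongside the test result.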