Code Review Asked by Moo10000 on November 11, 2021
I have been writing code to automate some weekly reports, with help from Stack Overflow. The code works for the most part, but there are a few things that I just can’t seem to fix.
In short, I loop through the data and create a dictionary of dataframes keyed by the unique values of ‘location’. I can use that dictionary to make summary reports for each location. I wanted to make a second dictionary from it based on ‘sublocation’. Instead, following some advice, I make a list of the sublocations, access each item in the dictionary of dataframes, loop to find the corresponding sublocations, and make plots.
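As a rough sketch of the approach described above (the sample rows are invented, and the column names `location` and `site` are assumptions borrowed from the code further down), grouping once per location and then walking through each location's sublocations might look like this:

```python
import pandas as pd

# Invented sample data; the real report is read from a CSV file.
df = pd.DataFrame({
    "location": ["A", "A", "A", "B", "B"],
    "site": ["A1", "A1", "A2", "B1", "B1"],
    "ethnic": ["x", "y", "x", "y", "y"],
})

# One DataFrame per location, keyed by the location value.
by_location = {loc: frame for loc, frame in df.groupby("location")}

# For each location, list its sublocations and pull out the matching rows.
rows_per_site = {}
for loc, frame in by_location.items():
    for site in frame["site"].unique():
        subset = frame[frame["site"] == site]
        rows_per_site[(loc, site)] = len(subset)

print(rows_per_site)
```

In the real script, the body of the inner loop would hold the plotting calls instead of counting rows.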
My problems are as follows:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

f = 'path'
d = pd.read_csv(f)

# One DataFrame per location, keyed by the location value.
dfs = dict(tuple(d.groupby('location')))

for key, value in dfs.items():
    try:
        fig, axs = plt.subplots(2, 3)
        sns.countplot(y='ethnic', data=value, palette='colorblind', ax=axs[0, 0])
        sns.countplot(y='Ratio', data=value, palette='colorblind', ax=axs[1, 0])
        sns.countplot(y='site', data=value, ax=axs[0, 1])
        sns.countplot(y='STATUS', data=value, ax=axs[1, 1])
        sns.countplot(y='Assessment', data=value, ax=axs[0, 2])
        #pth = os.path.join(tmppath, '{0}'.format(key))
        # axs is an array of Axes, so the bars are labelled one subplot at a time.
        # With y=..., the bars are horizontal and their width is the count.
        for ax in axs.flat:
            for p in ax.patches:
                ax.text(p.get_width(), p.get_y() + p.get_height() / 2.,
                        '%d' % int(p.get_width()),
                        fontsize=12, color='red', ha='left', va='center')
        # pyplot has no set_title; a figure-level title is set with suptitle.
        fig.suptitle('{0}'.format(key) + ' Summary')
        plt.tight_layout(pad=2.0, w_pad=1.0, h_pad=2.0)
        plt.savefig("basepath/testing123/{0}/{1}.pdf".format(key, key), bbox_inches='tight')
        plt.clf()
        #plt.show()
    except Exception:
        # Fallback location if anything above fails.
        plt.savefig("basepath/{0}/{1}.pdf".format(key, key), bbox_inches='tight')
        #plt.savefig("{0}.pdf".format(key), bbox_inches = 'tight');
##### Now for sublocations
dfss = dict(tuple(d.groupby('site')))
#%%
for key, value in dfss.items():
    a = repr(value['school_dbn'][:1])
    # Assumption: each site belongs to one location, so its name is read from the first row.
    # (value['location'][-6:] is the last six rows of the column, not a string.)
    loc = str(value['location'].iloc[0])
    try:
        fig, axs = plt.subplots(2, 3)
        #tmppath = 'basepath/{0}'.format(key);
        sns.countplot(y='ethnic', data=value, palette='colorblind', ax=axs[0, 0])
        sns.countplot(y='Program', data=value, palette='colorblind', ax=axs[1, 0])
        sns.countplot(y='AltAssessment', data=value, ax=axs[0, 2])
        #pth = os.path.join(tmppath, '{0}'.format(key))  # tmppath is not defined above
        # pyplot has no set_title; a figure-level title is set with suptitle.
        fig.suptitle('{0}'.format(key) + ' Summary')
        plt.tight_layout(pad=2.0, w_pad=1.0, h_pad=2.0)
        plt.savefig("basepath/{0}/{1}_{2}.pdf".format(loc, loc, key), bbox_inches='tight')
        plt.clf()
        #plt.show()
    except Exception:
        # Fallback location if anything above fails.
        plt.savefig("basepath/testing123/{0}/{1}_{2}.pdf".format(loc, loc, key), bbox_inches='tight')
        #plt.savefig("{0}.pdf".format(key), bbox_inches = 'tight');
I want to save the files this way because each location has a folder with the same name. Each sublocation belongs to exactly one location, so I want to save each file as ‘location_sublocation.pdf’.
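A minimal sketch of that naming scheme, assuming a base folder with one subfolder per location. The helper name `report_path` and the sample values are hypothetical, not part of the original script:

```python
import tempfile
from pathlib import Path

def report_path(base, location, sublocation):
    """Return base/location/location_sublocation.pdf, creating the folder if needed."""
    folder = Path(base) / str(location)
    folder.mkdir(parents=True, exist_ok=True)
    return folder / f"{location}_{sublocation}.pdf"

# Demonstration in a throwaway directory.
base = tempfile.mkdtemp()
target = report_path(base, "A", "A1")
print(target.name)
```

The returned path could then be handed to `plt.savefig(target, bbox_inches='tight')`. Creating the folder up front would also avoid a save failing only because the directory is missing.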
I got this done by making a second dictionary that maps each location (the key) to a list of its sublocations (the value).
dfs = dict(tuple(data.groupby('location')))
dfss = dict(tuple(data.groupby('sublocation')))
dd = {}
for key, value in dfs.items():  # dictionary is made from the groupby object: key is
                                # the location, value is its dataframe
    a = []
    for i in value['sublocation']:
        s = str(i)
        if s not in a:  # compare the string form, since that is what gets stored
            a.append(s)
    dd[key] = a
for key, value in dfss.items():
    for k, v in dd.items():
        if str(key) in v:
            dur = str(k)  # location that this sublocation belongs to
Then in the next cell,
for key, value in dfss.items():
    for k, v in dd.items():
        if str(key) in v:
            dur = str(k)
    #tmp = value[value['sublocation']==i]
    # sns.palplot only draws a palette and returns None, so the palette itself is passed here.
    sns.set(style='white', palette=sns.color_palette(ui), font='sans-serif')
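The lookup loop in the two cells above could also be written as a direct sublocation-to-location mapping. This is only a sketch with invented sample data, and it relies on each sublocation belonging to exactly one location, as stated earlier:

```python
import pandas as pd

# Invented sample data using the column names from the cells above.
data = pd.DataFrame({
    "location": ["A", "A", "A", "B"],
    "sublocation": ["A1", "A1", "A2", "B1"],
})

# Keep one row per sublocation, then map sublocation -> location.
sub_to_loc = (
    data.drop_duplicates("sublocation")
        .set_index("sublocation")["location"]
        .to_dict()
)

labels = []
for sub, frame in data.groupby("sublocation"):
    dur = str(sub_to_loc[sub])  # location for this sublocation
    labels.append(f"{dur} @ {sub}")

print(labels)
```

A single dictionary lookup per sublocation replaces the inner loop over every location.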
I think I can make the overall script run even faster by using more regular expressions to filter the dataframe at various steps.
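For illustration, regex-based filtering in pandas can be done with `Series.str.contains`. The sample data and the pattern below are invented, and whether this is actually faster than the existing filtering steps would need to be measured:

```python
import pandas as pd

# Invented sample data; the None stands in for a missing location.
df = pd.DataFrame({
    "location": ["North-01", "North-02", "South-01", None],
    "STATUS": ["open", "closed", "open", "open"],
})

# Boolean mask: rows whose location is "North-" followed by digits.
# na=False treats missing values as non-matches.
mask = df["location"].str.contains(r"^North-\d+$", regex=True, na=False)
north = df[mask]

print(north["location"].tolist())
```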
This setup works because I can save the files according to the keys from the two dictionaries. It lets me save the nearly 375 files automatically. I use another script to move the files to their respective folders.
plt.savefig("path/{0}/{1} @ {2}.pdf".format(dur,dur,key), bbox_inches = 'tight')
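The move script itself is not shown in the post. A hypothetical version, assuming the files sit in one folder and follow the `location @ sublocation.pdf` naming used above, might look like this (the function name `sort_reports` is made up for the example):

```python
import shutil
import tempfile
from pathlib import Path

def sort_reports(folder):
    """Move each 'location @ sublocation.pdf' in folder into folder/location/."""
    folder = Path(folder)
    moved = []
    # sorted() materialises the file list before anything is moved.
    for pdf in sorted(folder.glob("* @ *.pdf")):
        location = pdf.name.split(" @ ", 1)[0]
        target_dir = folder / location
        target_dir.mkdir(exist_ok=True)
        shutil.move(str(pdf), str(target_dir / pdf.name))
        moved.append(f"{location}/{pdf.name}")
    return moved

# Demonstration in a throwaway directory with empty placeholder files.
demo = Path(tempfile.mkdtemp())
for name in ["A @ A1.pdf", "A @ A2.pdf", "B @ B1.pdf"]:
    (demo / name).touch()

moved = sort_reports(demo)
print(moved)
```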
A slightly different case: take three data sets and split each into mini data sets based on some column, such as location.
oct_dict = dict(tuple(oct.groupby('location')))
oct2_dict = dict(tuple(oct2.groupby('location')))
for k, v in oct_dict.items():
    #try:
    #    v2 = stu_dict[k]  # sometimes using this try/except method works better
    #except KeyError:
    #    v2 = pd.DataFrame()
    #try:
    #    v3 = oct2_dict[k]
    #except KeyError:
    #    v3 = pd.DataFrame()
    for k2, v2 in stu_dict.items():  # replace with v2 = stu_dict[k] if you know for sure it exists
        for k3, v3 in oct2_dict.items():  # replace with v3 = oct2_dict[k] if you know for sure it exists
            if k == k2 and k == k3:  # can delete this if not needed
                plt.close('all')
                with PdfPages(r'path{}.pdf'.format(k)) as pdf:
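The nested loops above only do work when the same key appears in all three dictionaries. One alternative is to compute the shared keys first. The dictionaries below are invented stand-ins holding plain numbers instead of dataframes:

```python
# Invented stand-ins for the three per-location dictionaries.
oct_dict = {"A": 1, "B": 2, "C": 3}
stu_dict = {"A": 10, "C": 30, "D": 40}
oct2_dict = {"A": 100, "B": 200, "C": 300}

# Dictionary key views support set operations, so the shared keys
# can be found without nesting three loops.
common = sorted(oct_dict.keys() & stu_dict.keys() & oct2_dict.keys())

combined = {}
for k in common:
    v, v2, v3 = oct_dict[k], stu_dict[k], oct2_dict[k]
    combined[k] = (v, v2, v3)

print(combined)
```

With dataframes as values, the loop body would open the `PdfPages` file for `k` and plot from `v`, `v2` and `v3`.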
Answered by Moo10000 on November 11, 2021