Automating a set of weekly reports, including graphs and delivery of reports

Question

I have been writing code to automate some weekly reports, with some help over on Stack Overflow. The code works for the most part, but there are a few things that I just can't seem to fix.

In short, I loop through the data and create a dictionary of DataFrames keyed by the unique values of 'location'. I can use the dictionary to make summary reports for each location. I wanted to make another dictionary from this based on 'sublocation'. Instead, following some advice, I make a list of the sublocations, go through each item in the DataFrame dictionary, loop to find the corresponding sublocations, and make plots.

My problems are as follows:

1. The code is slow.
2. The graphs are not formatted properly (they overlap even with tight_layout).
3. For the sublocation reports, I am having a hard time saving to the right folder. I think this has to do with the way I format the string passed to savefig. For each sublocation I want to reference the location name using value['location'], but I think this is updated on every loop, so it doesn't work.
4. I have the exception handler because, when matching sublocations to locations, not every sublocation will appear in the DataFrame stored as the dictionary value.

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

f = 'path'
d = pd.read_csv(f)
dfs = dict(tuple(d.groupby('location')))
for key, value in dfs.items():
    try:
        fig, axs = plt.subplots(2, 3)
        # passing y= draws horizontal bars, so no orient argument is needed
        sns.countplot(y='ethnic', data=value, palette='colorblind', ax=axs[0, 0])
        sns.countplot(y='Ratio', data=value, palette='colorblind', ax=axs[1, 0])
        sns.countplot(y='site', data=value, ax=axs[0, 1])
        sns.countplot(y='STATUS', data=value, ax=axs[1, 1])
        sns.countplot(y='Assessment', data=value, ax=axs[0, 2])
        #pth = os.path.join(tmppath, '{0}'.format(key))
        # axs is a 2-D array of Axes, so label the bars one Axes at a time
        for ax in axs.flat:
            for p in ax.patches:
                # the bars are horizontal: the count is the bar's width
                ax.text(p.get_width(), p.get_y() + p.get_height() / 2.,
                        '%d' % int(p.get_width()),
                        fontsize=12, color='red', ha='left', va='center')
        plt.tight_layout(pad=2.0, w_pad=1.0, h_pad=2.0)
        # a figure-level title is set with fig.suptitle; plt has no set_title
        fig.suptitle('{0} Summary'.format(key))
        plt.savefig("basepath/testing123/{0}/{1}.pdf".format(key, key), bbox_inches='tight')
        plt.clf()
        #plt.show()
    except:
        plt.savefig("basepath/{0}/{1}.pdf".format(key, key), bbox_inches='tight')
        #plt.savefig("{0}.pdf".format(key), bbox_inches = 'tight');
        pass

##### Now for sublocations

dfss = dict(tuple(d.groupby('site')))

#%%

for key, value in dfss.items():
    a = repr(value['school_dbn'][:1])

    try:
        fig, axs = plt.subplots(2, 3)
        #tmppath = 'basepath/{0}'.format(key);
        sns.countplot(y='ethnic', data=value, palette='colorblind', ax=axs[0, 0])
        sns.countplot(y='Program]', data=value, palette='colorblind', ax=axs[1, 0])
        sns.countplot(y='AltAssessment', data=value, ax=axs[0, 2])
        # tmppath is only defined in the commented-out line above, so this stays disabled
        #pth = os.path.join(tmppath, '{0}'.format(key))
        plt.tight_layout(pad=2.0, w_pad=1.0, h_pad=2.0)
        fig.suptitle('{0} Summary'.format(key))
        # value['location'][-6:] is a Series (the last six rows), not a single location name
        plt.savefig("basepath/{0}/{1}_{2}.pdf".format(value['location'][-6:], value['location'][-6:], key), bbox_inches='tight')
        plt.clf()
        #plt.show()
    except:
        plt.savefig("basepath/testing123/{0}/{1}_{2}.pdf".format(value['location'][-6:], value['location'][-6:], key), bbox_inches='tight')
        #plt.savefig("{0}.pdf".format(key), bbox_inches = 'tight');
        pass

I want to save this way because each location has a folder with the same name. Each sublocation belongs to only one location, so I want to save as 'location_sublocation.pdf'.

Moo10000 · Answer

I got this done by making a second dictionary, which has locations as keys and lists of sublocations as values:
dfs = dict(tuple(data.groupby('location')))
dfss = dict(tuple(data.groupby('sublocation')))

dd = {}

# dfs comes from a groupby: key is the location, value is its DataFrame
for key, value in dfs.items():
    a = []
    for i in value['sublocation']:
        # compare as strings so that non-string labels are not appended twice
        if str(i) not in a:
            a.append(str(i))
    dd[key] = a

# for each sublocation, look up the location whose list contains it
for key, value in dfss.items():
    for k, v in dd.items():
        if str(key) in v:
            dur = str(k)
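
The same location-to-sublocations mapping can probably be built without the explicit loops. This is a minimal sketch, assuming the column names `location` and `sublocation` from the code above and a small made-up frame standing in for `data`. `sub_to_loc` is a hypothetical name for a reverse lookup that would replace the search for `dur`:

```python
import pandas as pd

# Made-up sample standing in for the real data
data = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B'],
    'sublocation': ['a1', 'a1', 'a2', 'b1', 'b1'],
})

# location -> list of unique sublocations (same shape as dd above)
dd = {
    loc: [str(s) for s in subs.unique()]
    for loc, subs in data.groupby('location')['sublocation']
}

# sublocation -> location; only valid if each sublocation belongs to one location
sub_to_loc = (
    data.drop_duplicates('sublocation')
        .set_index('sublocation')['location']
        .to_dict()
)

print(dd)
print(sub_to_loc)
```

With `sub_to_loc` in hand, `dur = sub_to_loc[key]` is a single dictionary lookup per sublocation instead of a scan over every location.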

Then in the next cell,
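for key, value in dfss.items():
    try:
        for k, v in dd.items():
            if str(key) in v:
                dur = str(k)
        #tmp = value[value['sublocation']==i]
        # ui is defined elsewhere in the script (not shown);
        # sns.palplot only displays a palette and returns None, so pass the palette itself
        sns.set(style='white', palette=sns.color_palette(ui), font='sans-serif')
        # (the plotting calls are not shown in this excerpt)
    except Exception:
        pass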

I think I can make the overall script run even faster by using more regular expressions to filter the DataFrame in various steps.
This setup works because I can save the files according to the keys from the two dictionaries. It lets me save the nearly 375 files automatically. I use another script to move the files to their respective folders.
plt.savefig("path/{0}/{1} @ {2}.pdf".format(dur,dur,key), bbox_inches = 'tight')
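
The save step could also create each location folder on the fly, which may remove the need for a separate script to move the files. This is a sketch under assumptions: `report_path` is a hypothetical helper, `base` stands in for the elided path, and the file name follows the `{location} @ {sublocation}.pdf` pattern used above:

```python
import os
import tempfile


def report_path(base, location, sublocation):
    """Return base/<location>/<location> @ <sublocation>.pdf, creating the folder."""
    folder = os.path.join(base, str(location))
    os.makedirs(folder, exist_ok=True)  # no error if the folder already exists
    return os.path.join(folder, '{0} @ {1}.pdf'.format(location, sublocation))


# Demonstration in a temporary directory instead of the real report tree
base = tempfile.mkdtemp()
out = report_path(base, 'A', 'a1')
print(out)

# Inside the plotting loop this would be used as:
# plt.savefig(report_path(base, dur, key), bbox_inches='tight')
```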

A slightly different case: take three data sets and split each into smaller data sets based on some column, such as location.
from matplotlib.backends.backend_pdf import PdfPages

oct_dict = dict(tuple(oct.groupby('location')))
oct2_dict = dict(tuple(oct2.groupby('location')))
# stu_dict is assumed to be built the same way from the third data set (not shown)
for k, v in oct_dict.items():
    #try:
        #v2 = stu_dict[k]      #sometimes using this try/except method works better
    #except KeyError:
        #v2 = pd.DataFrame()
    #try:
        #v3 = oct2_dict[k]
    #except KeyError:
        #v3 = pd.DataFrame()
    for k2, v2 in stu_dict.items(): #replace with v2 = stu_dict[k] if you know for sure it exists
        for k3, v3 in oct2_dict.items(): #replace with v3 = oct2_dict[k] if you know for sure it exists
            if k == k2 and k == k3: #can delete this if not needed
                plt.close('all')
                with PdfPages(r'path{}.pdf'.format(k)) as pdf:
                    pass  # the plotting and pdf.savefig() calls are not shown in this excerpt
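
Since the three dictionaries are keyed by the same column, the nested loops can likely be replaced by direct lookups. This is a sketch assuming three small made-up frames in place of the real data sets. The names `oct_df` and `stu`, the column `n`, and the output folder are placeholders, and `dict.get` supplies an empty DataFrame when a location is missing:

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages

# Made-up stand-ins for the three data sets
oct_df = pd.DataFrame({'location': ['A', 'B'], 'n': [1, 2]})
stu = pd.DataFrame({'location': ['A'], 'n': [3]})
oct2 = pd.DataFrame({'location': ['A', 'B'], 'n': [4, 5]})

oct_dict = dict(tuple(oct_df.groupby('location')))
stu_dict = dict(tuple(stu.groupby('location')))
oct2_dict = dict(tuple(oct2.groupby('location')))

out_dir = tempfile.mkdtemp()  # placeholder for the real report folder
written = []
for k, v in oct_dict.items():
    v2 = stu_dict.get(k, pd.DataFrame())
    v3 = oct2_dict.get(k, pd.DataFrame())
    if v2.empty or v3.empty:
        continue  # skip locations missing from any data set
    path = os.path.join(out_dir, '{}.pdf'.format(k))
    with PdfPages(path) as pdf:
        for frame in (v, v2, v3):
            fig, ax = plt.subplots()
            ax.bar(frame['location'], frame['n'])
            pdf.savefig(fig)  # one page per data set
            plt.close(fig)    # free the figure's memory
    written.append(k)

print(written)
```

Each location then costs two dictionary lookups rather than a pass over every key of the other two dictionaries, and closing each figure after saving keeps memory use flat across hundreds of reports.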
