python - Modifying Code to work for Month and Week instead of Year -
i making stacked bar plot on year time span x-axis company names, y-axis number of calls, , stacks months.
i want able make plot run time span of month, stacks days, , time span of week, stacks days. having trouble doing since code built around year time span.
my input original input csv file. pulling 2 rows this:
companyname recvd_dttm company1 6/5/2015 18:28:50 pm company2 6/5/2015 14:25:43 pm company3 9/10/2015 21:45:12 pm company4 6/5/2015 14:30:43 pm company5 6/5/2015 14:32:33 pm
then make datatable looks this
pivot_table.head(3) out[12]: month 1 2 3 4 5 6 7 8 9 10 11 12 companyname customer1 17 30 29 39 15 26 24 12 36 21 18 15 customer2 4 11 13 22 35 29 15 18 29 31 17 14 customer3 11 8 25 24 7 15 20 0 21 12 12 17
and code far.
first grab years worth of data (i change month or week question)
# filter countries @ least 1 medal , sort df['recvd_dttm'] = pd.to_datetime(df['recvd_dttm']) #only retrieve data before (ignore typos future dates) mask = df['recvd_dttm'] <= datetime.datetime.now() df = df.loc[mask] # first , last datetime final week of data range_max = df['recvd_dttm'].max() range_min = range_max - pd.dateoffset(years=1) # take slice final week of data df = df[(df['recvd_dttm'] >= range_min) & (df['recvd_dttm'] <= range_max)]
then create pivot_table shown above.
########################################################### #create dataframe ########################################################### df = df.set_index('recvd_dttm') df.index = pd.to_datetime(df.index, format='%m/%d/%y %h:%m') result = df.groupby([lambda idx: idx.month, 'companyname']).agg(len).reset_index() result.columns = ['month', 'companyname', 'numbercalls'] pivot_table = result.pivot(index='month', columns='companyname', values='numbercalls').fillna(0) s = pivot_table.sum().sort(ascending=false,inplace=false) pivot_table = pivot_table.ix[:,s.index[:30]] pivot_table = pivot_table.transpose() pivot_table = pivot_table.reset_index() pivot_table['companyname'] = [str(x) x in pivot_table['companyname']] companies = list(pivot_table['companyname']) pivot_table = pivot_table.set_index('companyname') pivot_table.to_csv('pivot_table.csv')
then use pivot table create ordereddict plotting
########################################################### #create ordereddict plotting ########################################################### months = [pivot_table[(m)].astype(float).values m in range(1, 13)] names = ["jan", "feb", "mar","apr","may","jun","jul","aug","sep","oct","nov", "dec"] months_dict = ordereddict(list(zip(names, months))) ########################################################### #plot! ########################################################### palette = brewer["rdylgn"][8] hover = hovertool( tooltips = [ ("month", "@months"), ("number of calls", "@numbercalls"), ] ) output_file("stacked_bar.html") bar = bar(months_dict, companies, title="number of calls each month", palette = palette, legend = "top_right", width = 1200, height=900, stacked=true) bar.add_tools(hover) show(bar)
does have ideas on how approach modifying code can work shorter time spans? thinking modification in ordereddict section. possibly making len(recvd_dttm) iterate over?
for days in month ('2015-07' say) change
result = df.groupby([lambda idx: idx.month, 'companyname']).agg(len).reset_index()
to like
month = '2015-07' result = df.loc[month].groupby([lambda idx: idx.day, 'companyname']).agg(len).reset_index()
and replace 'month'
'day'
below. wouldn't have bother ordereddict etc. in case ints. week
start, end = '2015-07-06', '2015-07-12' result = df.loc[start: end].groupby( [lambda idx: idx.dayofweek, 'companyname']).agg(len).reset_index()
Comments
Post a Comment