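I have created a dictionary of dictionaries with the following structure: the key is the department ("ABC"), then the date (01.08) is the key, and the values are {product name (A), units (0), revenue (0)}. This structure continues for several departments. See the dictionary printout below.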
'ABC': 01.08 \
A. Units 0
Revenue 0
B. Units 0
Revenue 0
C. Units 0
Revenue 0
D. Units 0
Revenue 0
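In addition, I have created a dataframe using groupby and an aggregation function (sum) to get the total units and revenue per day for each department (this is an aggregation over two levels, as opposed to the three levels in the dictionary: date, department, product).
Printing the df, which is the aggregation of total units and total revenue, results in: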
print df.ix['ABC']
Total Overall Units \
dates
2016-08-01 2
2016-08-02 0
2016-08-03 2
2016-08-04 1
2016-08-22 2
Total Overall Revenue \
dates
2016-08-01 20
2016-08-02 500
2016-08-03 39
2016-08-04 50
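I currently end up with two separate objects, which I want to merge/append so that the total units and total revenue are added to the end of the dictionary in the right place (i.e. mapped to the correct department and date). At the moment I print the dictionary by "department" and then print the dataframe separately with pd.to_html, so I am left with two separate tables. Not only are they separate, but the table created from the df also has one column fewer, because the two are grouped differently. What I am after is something like this: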
'ABC':
01.08 | 02.08 | 03.08 | 04.08
A Total Units 0 0 0 0
Total Revenue 0 0 0 0
B Total Units 0 0 0 0
Total Revenue 0 0 0 0
C Total Units 0 0 0 0
Total Revenue 0 0 0 0
D Total Units 0 0 0 0
Total Revenue 0 0 0 0
Total Overall Units 0 0 0 0
Total Overall Revenue 0 0 0 0
Any ideas?
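Skipping to problem #2: I would suggest using a single dataframe to store all of your information. It will be much easier to work with than keeping columnar data in a dictionary of dictionaries. Set the date as the primary index and either use a separate column for each field ("deptA-revenue") or use a MultiIndex. You can then store the daily totals as columns in the same dataframe.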
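A minimal sketch of that layout, assuming the data can first be put into one row per (date, department, product). The sales frame, the per-product numbers, and the level names "metric", "dept" and "product" are all made up for illustration; only the daily totals (2 and 0 units, 20 and 500 revenue) are taken from your printout.

```python
import pandas as pd

# Hypothetical long-format records: one row per (date, department, product).
sales = pd.DataFrame(
    [
        ("2016-08-01", "ABC", "A", 1, 10),
        ("2016-08-01", "ABC", "B", 1, 10),
        ("2016-08-02", "ABC", "A", 0, 500),
        ("2016-08-02", "ABC", "B", 0, 0),
    ],
    columns=["date", "dept", "product", "units", "revenue"],
)
sales["date"] = pd.to_datetime(sales["date"])

# Dates become the index; the columns become a (metric, dept, product) MultiIndex.
wide = sales.pivot_table(index="date", columns=["dept", "product"],
                         values=["units", "revenue"], aggfunc="sum")
wide.columns.names = ["metric", "dept", "product"]

# Daily totals per (metric, dept), summed across products.
totals = wide.T.groupby(level=["metric", "dept"]).sum().T

# Give the totals a third column level so they fit next to the product columns.
totals.columns = pd.MultiIndex.from_tuples(
    [(metric, dept, "Total Overall") for metric, dept in totals.columns],
    names=wide.columns.names,
)
wide = pd.concat([wide, totals], axis=1).sort_index(axis=1)

# One department, transposed so that dates run across the top.
abc = wide.xs("ABC", axis=1, level="dept").T
print(abc)
```

The transposed abc frame has one row per metric and product (including the "Total Overall" rows) and one column per date, so a single call to its to_html method should give one table instead of two.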
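To print in the order you want, you need to transpose the rows and columns in the date dictionaries. Totaling the rows is probably easiest to do at the same time, which makes the second object you mention unnecessary. Formatting aside, something like this should work: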
# df is assumed here to be the dictionary of dictionaries, keyed by department.
for dept, dates in df.items():
    # Transpose the rows and columns into two new dictionaries
    # called units and revenues. At the same time, total the
    # units and revenue into two new "zztotal" entries
    # (the "zz" prefix makes them sort after the product names).
    units = {"zztotal": {}}
    revenues = {"zztotal": {}}
    for date, products in dates.items():
        for product, stats in products.items():
            name = stats["name"]
            if name not in units:
                units[name] = {}
                revenues[name] = {}
            units[name][date] = stats["units"]
            revenues[name][date] = stats["revenue"]
            if date not in units["zztotal"]:
                units["zztotal"][date] = 0
                revenues["zztotal"][date] = 0
            units["zztotal"][date] += stats["units"]
            revenues["zztotal"][date] += stats["revenue"]
    # At this point we are ready to print the transposed
    # dictionaries. Work is needed to line up the columns
    # so the printout is attractive.
    print(dept)
    print(sorted(dates.keys()))
    for name, by_date in sorted(units.items()):
        if name != "zztotal":
            print(name, "Total Units", [
                units[name][date] for date in sorted(by_date)])
            print("Total Revenue", [
                revenues[name][date] for date in sorted(by_date)])
        else:
            print("Total Overall Units", [
                units[name][date] for date in sorted(by_date)])
            print("Total Overall Revenue", [
                revenues[name][date] for date in sorted(by_date)])
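
For lining up the columns, one option is to pad every cell to a fixed width with str.format instead of printing raw lists. This is only a sketch: format_row is a hypothetical helper, and the widths are arbitrary defaults you would tune to your data.

```python
def format_row(label, values, label_width=24, cell_width=8):
    """Return one line: a left-aligned label followed by right-aligned cells."""
    cells = "".join("{:>{w}}".format(value, w=cell_width) for value in values)
    return "{:<{w}}".format(label, w=label_width) + cells


# Example with the dates from the question as the header row.
dates = ["01.08", "02.08", "03.08", "04.08"]
print(format_row("", dates))
print(format_row("A Total Units", [0, 0, 0, 0]))
print(format_row("Total Overall Revenue", [20, 500, 39, 50]))
```

In the loop above, each print of a label plus list could then become a single print(format_row(label, values)) call.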