I have a dictionary in which each value is a list of tuples. For example, this is the value I extracted for one key:
[('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]
I have also sorted the list. Now I want to aggregate the values like this:
[('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
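In other words, I want to group my tuples by month and sum the integers for each month. I have read about groupby, but I don't think it can help with my data structure, because I don't know in advance what my list will contain. So I am looking for a way to say: if the year-month part of the first element, i[0][:7], is the same, then sum the i[1] values. But I am having trouble implementing this idea.
for i in List:
    if i[0][:7]  # *problem* I don't know how to express my condition
        s = sum(i[1])  # ?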
I'd appreciate any suggestions, since I'm new to Python!
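For reference, the condition described in the question (keep summing while the year-month prefix stays the same, start a new total when it changes) can be sketched directly. This is a minimal sketch that assumes the list is already sorted by date, as stated; the names `result` and `month` are just for illustration.

```python
# Walk the (already sorted) list and start a new (month, total) pair
# whenever the 'YYYY-MM' prefix changes.
data = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8),
        ('1999-07-15', 7), ('1999-07-21', 1)]

result = []
for date, value in data:
    month = date[:7]  # 'YYYY-MM' is the first 7 characters
    if result and result[-1][0] == month:
        # Same month as the previous tuple: add to its running total
        result[-1] = (month, result[-1][1] + value)
    else:
        # New month: start a new running total
        result.append((month, value))

print(result)
# [('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
```

If the list were not sorted, the same month could show up in several separate entries, which is why the answers below either rely on the sort or use a dictionary.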
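Here is another answer, different from the ones already given. You can simply create a new dictionary whose keys are the year-month combinations. Looping over the dates in the list and using dictionary.get(key, defaultvalue) should do the trick: it adds the current value to the total stored in the new dictionary, and if the key does not exist yet, get returns the default 0 and the assignment then creates the key.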
data = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8), ('1999-07-15', 7), ('1999-07-21', 1)]
dictionary = dict()
for (mydate, val) in data:
    ym = mydate[0:7]  # the key is only the year-month combination (e.g. '1998-01')
    dictionary[ym] = dictionary.get(ym, 0) + val  # get() returns the current total or the default 0; the assignment creates the key if needed
data_aggregated = list(dictionary.items())  # if you need it back in the old list-of-tuples format
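To illustrate the point about `get()` above: it only reads from the dictionary, so the key is actually created by the assignment. A minimal sketch (the name `counts` is just for illustration):

```python
counts = {}

# get() returns the default but does NOT insert the key
print(counts.get('1998-01', 0))  # 0
print('1998-01' in counts)       # False

# The assignment is what creates (and then updates) the key
counts['1998-01'] = counts.get('1998-01', 0) + 8
counts['1998-01'] = counts.get('1998-01', 0) + 4
print(counts)                    # {'1998-01': 12}
```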
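Try using itertools.groupby to aggregate the values by month. Note that groupby only groups consecutive items with the same key, so this relies on the list already being sorted by date: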
from itertools import groupby
a = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8),
('1999-07-15', 7), ('1999-07-21', 1)]
for key, group in groupby(a, key=lambda x: x[0][:7]):
    print(key, sum(j for i, j in group))
# Output
1998-01 12
1998-06 8
1999-07 8
Here is a one-line version:
print([(key, sum(j for i, j in group)) for key, group in groupby(a, key=lambda x: x[0][:7])])
# Output
[('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
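Because the question's list is already sorted, the groupby snippets above produce one group per month. If the input were not sorted, a sketch like the following shows why sorting by the same key first matters. The unsorted list and the helper name `month_key` are made up for illustration.

```python
from itertools import groupby

def month_key(item):
    # 'YYYY-MM' prefix of the date string
    return item[0][:7]

# Deliberately unsorted input: the two January 1998 entries are not adjacent
unsorted = [('1998-01-20', 8), ('1999-07-15', 7), ('1998-01-22', 4),
            ('1998-06-18', 8), ('1999-07-21', 1)]

# Without sorting, groupby starts a new group at every key change
naive = [(k, sum(v for _, v in g)) for k, g in groupby(unsorted, key=month_key)]
print(naive)
# [('1998-01', 8), ('1999-07', 7), ('1998-01', 4), ('1998-06', 8), ('1999-07', 1)]

# Sorting by the same key first makes every month a single run
totals = [(k, sum(v for _, v in g))
          for k, g in groupby(sorted(unsorted, key=month_key), key=month_key)]
print(totals)
# [('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
```

Sorting costs O(n log n); the dictionary-based answers avoid the sort because they do not depend on the order of the input.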
Just use defaultdict:
from collections import defaultdict
DATA = [
('1998-01-20', 8),
('1998-01-22', 4),
('1998-06-18', 8),
('1999-07-15', 7),
('1999-07-21', 1),
]
groups = defaultdict(int)
for date, value in DATA:
    groups[date[:7]] += value
from pprint import pprint
pprint(groups)
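pprint shows the defaultdict itself. To get back the list-of-tuples format asked for in the question, one option is to sort the items; this is a minimal sketch that assumes the dates are zero-padded 'YYYY-MM-DD' strings, so plain string order is chronological order.

```python
from collections import defaultdict

DATA = [('1998-01-20', 8), ('1998-01-22', 4), ('1998-06-18', 8),
        ('1999-07-15', 7), ('1999-07-21', 1)]

groups = defaultdict(int)  # missing keys start at 0
for date, value in DATA:
    groups[date[:7]] += value

# sorted() over the items yields (month, total) tuples ordered by 'YYYY-MM'
aggregated = sorted(groups.items())
print(aggregated)
# [('1998-01', 12), ('1998-06', 8), ('1999-07', 8)]
```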
I like to use defaultdict for counting:
from collections import defaultdict
lst = [('1998-01-20',8) , ('1998-01-22',4) , ('1998-06-18',8 ) , ('1999-07-15' , 7), ('1999-07-21',1)]
result = defaultdict(int)
for date, cnt in lst:
    year, month, day = date.split('-')
    result['-'.join([year, month])] += cnt
print(result)