I have a 4-level dictionary of ice cream sales for different countries (country, month, age group, revenue):
import pandas as pd
from operator import add
d1 = {
    'Sweden': {
        'jan': {
            '0-5': 5,
            '6-8': 8,
            '9-10': 19,
            '11-15': 14,
            '16-18': 24},
        'march': {
            '0-5': 5,
            '6-8': 18,
            '9-10': 9,
            '11-15': 14,
            '16-18': 24},
        'feb': {
            '0-5': 5,
            '6-8': 7,
            '9-10': 3,
            '11-15': 14,
            '16-18': 24}},
    'Norway': {
        'jan': {
            '0-5': 25,
            '6-8': 8,
            '9-10': 45,
            '11-15': 14,
            '16-18': 24},
        'march': {
            '0-5': 2,
            '6-8': 8,
            '9-10': 88,
            '11-15': 14,
            '16-18': 24},
        'feb': {
            '0-5': 5,
            '6-8': 48,
            '9-10': 9,
            '11-15': 39,
            '16-18': 24}}
}
I can unpack it into the DataFrame I want using nested for loops:
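colnames = ['country', 'month', 'age', 'revenue']
lst = []
for i in d1.keys():            # i is the country
    for j in d1[i].keys():     # j is the month
        # (age, revenue) pairs for this country and month
        revenue = list(d1[i][j].items())
        # prepend (country, month) to each pair; 5 is the number of age groups
        l1 = list(map(add, [(i, j)] * 5, revenue))
        lst = lst + l1
df = pd.DataFrame.from_records(lst, columns=colnames)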
This gives a DataFrame of shape (30, 4).
Does pandas have a built-in function that does this better or faster, without the for loops? What is the fastest way?
You can reshape with pandas functions, but this will likely be less efficient:
out = (pd.concat({k: pd.DataFrame(d).rename_axis(index='age', columns='month')
                  for k, d in d1.items()},
                 names=['country'])
         .stack().reset_index(name='revenue')
      )
Or:
s = pd.DataFrame(d1).stack()
out = (pd.DataFrame(s.tolist(), index=s.index).stack()
         .rename_axis(['month', 'country', 'age']).reset_index(name='revenue')
      )
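A variant of your code using a comprehension over the nested dictionaries (strictly speaking, a list comprehension), which is faster than the pandas reshaping: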
out = pd.DataFrame([(k1, k2, k3, v3) for k1, d in d1.items()
                    for k2, d2 in d.items()
                    for k3, v3 in d2.items()],
                   columns=['country', 'month', 'age', 'revenue'])
Output:
    country  month    age  revenue
 0   Sweden    jan    0-5        5
 1   Sweden    jan    6-8        8
 2   Sweden    jan   9-10       19
 3   Sweden    jan  11-15       14
 4   Sweden    jan  16-18       24
 5   Sweden  march    0-5        5
 6   Sweden  march    6-8       18
 7   Sweden  march   9-10        9
 8   Sweden  march  11-15       14
 9   Sweden  march  16-18       24
10   Sweden    feb    0-5        5
11   Sweden    feb    6-8        7
12   Sweden    feb   9-10        3
13   Sweden    feb  11-15       14
14   Sweden    feb  16-18       24
15   Norway    jan    0-5       25
16   Norway    jan    6-8        8
17   Norway    jan   9-10       45
18   Norway    jan  11-15       14
19   Norway    jan  16-18       24
20   Norway  march    0-5        2
21   Norway  march    6-8        8
22   Norway  march   9-10       88
23   Norway  march  11-15       14
24   Norway  march  16-18       24
25   Norway    feb    0-5        5
26   Norway    feb    6-8       48
27   Norway    feb   9-10        9
28   Norway    feb  11-15       39
29   Norway    feb  16-18       24
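If the nesting depth is not fixed at three levels, a small recursive generator is one possible way to produce the same kind of rows. This is only a sketch: `flatten` and `sample` are hypothetical names that do not come from the question, and the code assumes every leaf sits at the same depth so that each row has the same number of fields.

```python
import pandas as pd

def flatten(nested, prefix=()):
    """Yield one tuple per leaf: the path of keys followed by the leaf value."""
    for key, value in nested.items():
        if isinstance(value, dict):
            # Descend one level, remembering the key that led here.
            yield from flatten(value, prefix + (key,))
        else:
            yield prefix + (key, value)

# Hypothetical, smaller stand-in for d1 (same country -> month -> age nesting).
sample = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}

out = pd.DataFrame(list(flatten(sample)),
                   columns=['country', 'month', 'age', 'revenue'])
print(out)
```

The recursion adds function-call overhead, so for a known, fixed depth the flat comprehension is probably still the faster choice.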
Timings:
# comprehension over the nested dictionaries
148 µs ± 4.28 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
# pandas reshaping (1)
1.54 ms ± 21.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
# pandas reshaping (2)
1.43 ms ± 27.7 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
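The figures above read like IPython `%timeit` output. Outside IPython, a comparable measurement could be sketched with the standard-library `timeit` module as below. The function names, the small `sample` dictionary, and the repeat counts are assumptions chosen for illustration, and the absolute numbers will depend on the machine and the pandas version.

```python
import timeit
import pandas as pd

# Hypothetical, smaller stand-in for d1 (same country -> month -> age nesting).
sample = {
    'Sweden': {'jan': {'0-5': 5, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 7}},
    'Norway': {'jan': {'0-5': 25, '6-8': 8}, 'feb': {'0-5': 5, '6-8': 48}},
}
cols = ['country', 'month', 'age', 'revenue']

def with_comprehension():
    # Flat comprehension over the three dictionary levels.
    return pd.DataFrame([(k1, k2, k3, v3) for k1, d in sample.items()
                         for k2, d2 in d.items()
                         for k3, v3 in d2.items()],
                        columns=cols)

def with_reshaping():
    # pandas reshaping, second variant: stack, expand inner dicts, stack again.
    s = pd.DataFrame(sample).stack()
    return (pd.DataFrame(s.tolist(), index=s.index).stack()
              .rename_axis(['month', 'country', 'age'])
              .reset_index(name='revenue'))

timings = {}
for func in (with_comprehension, with_reshaping):
    # Best of 3 repeats of 100 calls each, converted to seconds per call.
    best = min(timeit.repeat(func, number=100, repeat=3)) / 100
    timings[func.__name__] = best
    print(f"{func.__name__}: {best * 1e6:.0f} µs per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes; `%timeit` reports the mean and standard deviation instead, so the two are not directly comparable.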