I have this dataset:
import pandas as pd
data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 2, 3, 1],
    'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
    'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0]
})
names_index = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
    'state1': ['db', 'db', 'db', 'db', 'db', 'db', 'db', 'db'],
    'state2': ['cr', 'cr', 'cr', 'cr', 'cr', 'cr', 'cr', 'cr'],
})
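I want to sum both columns per ID and, depending on the sign of the net value, show that value in the corresponding column (Debit or Credit).
What I am trying is: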
balance = data.groupby('ID')[['Debit', 'Credit']].sum()
result = balance.merge(names_index, on='ID', how='inner')
result = result[['ID', 'names', 'Debit', 'Credit', 'state1', 'state2']]
result['state'] = result.pop('state1').where(result['Debit'].ne(0), result.pop('state2'))
print(result)
Result:
ID names Debit Credit state
0 1 A 1000 -100 db
1 2 B 5000 -8000 db
2 3 C 2000 -700 db
3 4 D 5000 0 db
4 5 E 3000 0 db
The result I want is:
ID names Debit Credit state
0 1 A 900 0 db
1 2 B 0 -3000 cr
2 3 C 1300 0 db
3 4 D 5000 0 db
4 5 E 3000 0 db
I tried pandas' melt function, but it only gives me a single value column like this:
balance = data.melt('ID').groupby('ID').value.sum().reset_index()
Result:
ID value
0 1 900
1 2 -3000
2 3 1300
3 4 5000
4 5 3000
Please help me sort this out.
If I understand correctly, aggregate with sum after the groupby, build a DataFrame from the result, and merge:
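import numpy as np

# Net balance per ID: sum Debit and Credit per ID, then add the two columns
s = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)
# True where the net balance is positive (a debit balance)
m = s > 0
out = (names_index[['ID', 'names']]
       .merge(
           pd.DataFrame({'state1': s.where(m, 0),   # net amount if positive, else 0
                         'state2': s.mask(m, 0),    # net amount if not positive, else 0
                         'state': np.where(m, 'db', 'cr')})
           .reset_index()
       )
      )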
Output:
ID names state1 state2 state
0 1 A 900 0 db
1 2 B 0 -3000 cr
2 3 C 1300 0 db
3 4 D 5000 0 db
4 5 E 3000 0 db
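
The output above carries the amounts in columns named state1 and state2, whereas the question asks for Debit and Credit. As a minimal sketch of a variant (not the answerer's code), the same net balance can be split with Series.clip and written straight into the requested column names. The names net, balance and result are chosen here for illustration only, and labelling a net of exactly 0 as 'cr' is an assumption carried over from the s > 0 test above.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 2, 3, 1],
    'Debit': [0, 5000, 0, 5000, 3000, 0, 2000, 1000],
    'Credit': [-100, 0, -700, 0, 0, -8000, 0, 0],
})
names_index = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5, 6, 7, 8],
    'names': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
})

# Net balance per ID (Debit is positive and Credit is negative in this data)
net = data.groupby('ID')[['Debit', 'Credit']].sum().sum(axis=1)

balance = pd.DataFrame({
    'Debit': net.clip(lower=0),   # positive net amounts, otherwise 0
    'Credit': net.clip(upper=0),  # negative net amounts, otherwise 0
    # Assumption: a net of exactly 0 is labelled 'cr', as with s > 0 above
    'state': np.where(net > 0, 'db', 'cr'),
}).reset_index()

# Inner merge keeps only the IDs that appear in data (1 to 5)
result = names_index.merge(balance, on='ID', how='inner')
print(result)
```

With the sample data this should give the Debit, Credit and state columns requested in the question, without needing the state1 and state2 helper columns in names_index.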