I want to apply value_counts() to multiple columns and then reuse the same dataframe to add more columns. I have the following dataframe as an example.
id shop type status
0 1 mac A open
1 1 mac B close
2 1 ikea B open
3 1 ikea A open
4 1 meta A open
5 1 meta B close
6 2 meta B open
7 2 ikea B open
8 2 ikea B close
9 3 ikea A close
10 3 apple B close
11 3 apple B open
12 3 apple A open
13 4 denim A close
14 4 denim A close
I want to get grouped counts of the type and status values for each id and shop combination, as shown below.
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
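So far I have tried the following. It works, but I feel it is inefficient, especially if I have more data and want to apply two additional aggregation functions to the same groupby. Also, in some rare cases the merge may not always work.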
import pandas as pd
from functools import reduce
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})
df = df.groupby(['id', 'shop'])
df_type = df['type'].value_counts().unstack().reset_index()
df_status = df['status'].value_counts().unstack().reset_index()
df = reduce(lambda df1, df2: pd.merge(df1, df2, how='left', on=['id', 'shop']), [df_type, df_status])
You can use groupby() and value_counts():
groups = df.groupby(['id','shop'])
pd.concat([groups['type'].value_counts().unstack(fill_value=0),
           groups['status'].value_counts().unstack(fill_value=0)],
          axis=1).reset_index()
Or, a bit more dynamically:
groups = df.groupby(['id','shop'])
count_cols = ['type','status']
out = pd.concat([groups[c].value_counts().unstack(fill_value=0)
                 for c in count_cols], axis=1).reset_index()
Or with crosstab:
count_cols = ['type','status']
out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in count_cols], axis=1).reset_index()
Output:
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
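The question also mentions applying further aggregations to the same groupby. One possible sketch is to compute them from the same `groups` object and concatenate them alongside the count blocks, since everything shares the (id, shop) index. The extra aggregations and the column names `n_rows` and `n_types` are hypothetical choices for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

groups = df.groupby(['id', 'shop'])
count_cols = ['type', 'status']

# One block of value counts per column, each indexed by (id, shop)
counts = [groups[c].value_counts().unstack(fill_value=0) for c in count_cols]

# Extra aggregations taken from the same groupby object.
# 'n_rows' and 'n_types' are hypothetical names chosen for this sketch.
extra = pd.DataFrame({
    'n_rows': groups.size(),              # rows per (id, shop)
    'n_types': groups['type'].nunique(),  # distinct type values per (id, shop)
})

# All pieces share the (id, shop) index, so concat aligns them column-wise
out = pd.concat(counts + [extra], axis=1).reset_index()
print(out)
```

Because the pieces are aligned on their index rather than merged on key columns, no merge step is needed when more aggregations are added later.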
Using crosstab:
out = pd.concat([pd.crosstab([df['id'], df['shop']], df[c])
                 for c in ['type', 'status']],
                axis=1).reset_index()
Using melt + crosstab:
df2 = df.melt(['id', 'shop'])
out = (pd.crosstab([df2['id'], df2['shop']], df2['value'])
         .reset_index()
      )
Output:
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
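One caveat with the melt + crosstab variant: all melted values end up in a single value column, so it relies on type and status having no values in common. A minimal sketch, assuming you want to guard against such a collision, is to cross-tabulate on both the variable and value columns (the default names that melt produces) and then flatten the resulting two-level header. The `_` separator is an arbitrary choice for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

# Long format with columns: id, shop, variable, value
df2 = df.melt(['id', 'shop'])

# Count per (id, shop) and per (source column, value) pair
out = pd.crosstab([df2['id'], df2['shop']],
                  [df2['variable'], df2['value']])

# Flatten the two-level columns into names such as 'type_A'
out.columns = ['_'.join(col) for col in out.columns]
out = out.reset_index()
print(out)
```

The counts are the same as before, but each column name now records which source column the value came from.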
Here is one way to do it using pd.get_dummies:
(pd.concat(
    [df[['id', 'shop']],  # grouping keys only, so no string columns get summed
     pd.get_dummies(df[['type', 'status']], prefix='', prefix_sep='', dtype=int)  # one 0/1 column per value of type and status
    ], axis=1)
 .groupby(['id', 'shop'])  # group the data
 .sum()  # summing the 0/1 indicators gives the counts
 .reset_index())
id shop A B close open
0 1 ikea 1 1 0 2
1 1 mac 1 1 1 1
2 1 meta 1 1 1 1
3 2 ikea 0 2 1 1
4 2 meta 0 1 0 1
5 3 apple 1 2 1 2
6 3 ikea 1 0 1 0
7 4 denim 2 0 2 0
Here is my whole process; you can run it on your own platform.
# Module import
import pandas as pd
import numpy as np
# Data import
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea', 'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open', 'open', 'close', 'close', 'close', 'open', 'open', 'close', 'close']
})
# Data Pre-process
df_unique = df[['id','shop']].groupby(['id','shop']).count().reset_index()
df_AB = df.groupby(['id','shop','type']).count().reset_index()
df_A = df_AB.loc[df_AB['type'] =='A'].rename(columns={'status':'A'})
df_B = df_AB.loc[df_AB['type'] =='B'].rename(columns={'status':'B'})
df_OC = df.groupby(['id','shop','status']).count().reset_index()
df_O = df_OC.loc[df_OC['status'] =='open'].rename(columns={'type':'open'})
df_C = df_OC.loc[df_OC['status'] =='close'].rename(columns={'type':'close'})
# Merging for your final output
df_final = pd.merge(df_unique,df_A[['id','shop','A']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_B[['id','shop','B']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_C[['id','shop','close']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
df_final = pd.merge(df_final,df_O[['id','shop','open']],how='left', left_on = ['id','shop'], right_on = ['id','shop'])
# Data Cleaning
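# The left merges leave NaN (and float columns) where a value never occurs,
# so fill with 0 and convert back to integers
df_final['A'] = df_final['A'].fillna(0).astype(int)
df_final['B'] = df_final['B'].fillna(0).astype(int)
df_final['open'] = df_final['open'].fillna(0).astype(int)
df_final['close'] = df_final['close'].fillna(0).astype(int)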
# Output Display
df_final
A picture of my output is attached.
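If you prefer to keep this merge-based idea, the per-value steps can be written as a loop so that new values do not need hand-written lines. This is only a sketch under the assumption that every distinct value of type and status should become its own count column; the variable names `keys`, `counts` and `count_names` are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4],
    'shop': ['mac', 'mac', 'ikea', 'ikea', 'meta', 'meta', 'meta', 'ikea',
             'ikea', 'ikea', 'apple', 'apple', 'apple', 'denim', 'denim'],
    'type': ['A', 'B', 'B', 'A', 'A', 'B', 'B', 'B', 'B', 'A', 'B', 'B',
             'A', 'A', 'A'],
    'status': ['open', 'close', 'open', 'open', 'open', 'close', 'open',
               'open', 'close', 'close', 'close', 'open', 'open', 'close',
               'close'],
})

keys = ['id', 'shop']

# Start from the unique (id, shop) pairs
df_final = df[keys].drop_duplicates().sort_values(keys).reset_index(drop=True)

for col in ['type', 'status']:
    for value in sorted(df[col].unique()):
        # Rows per (id, shop) where this column equals the current value
        counts = (df[df[col] == value]
                  .groupby(keys).size()
                  .rename(value)
                  .reset_index())
        df_final = df_final.merge(counts, how='left', on=keys)

# Pairs where a value never occurs come back as NaN from the left merge
count_names = [c for c in df_final.columns if c not in keys]
df_final[count_names] = df_final[count_names].fillna(0).astype(int)
print(df_final)
```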