Groupby: output multiple columns

Question

I'm trying to output multiple columns from a groupby operation.

My input file is:

Expected output:

Code used:

import pandas as pd

df = pd.read_csv('testPCI.csv')

# to_excel() writes the file and returns None, so `output` ends up as None
output = df.groupby(['Frequency'])['PCI'].count().to_excel("output.xlsx")

I got this:

How can I get the expected output, which includes a count column as well as a % count column?

Answer 1

Try:

(dfc:=df.groupby(['Frequency','PCI'])['PCI'].count().rename('count').reset_index())\
   .assign(count_pct=dfc['count']/dfc['count'].sum()*100).round(2)

Output:

   Frequency  PCI  count  count_pct
0        123    5      5      21.74
1        456    7      8      34.78
2        999    9     10      43.48

Input dataframe:

df = pd.DataFrame({'Frequency':[123]*5+[456]*8+[999]*10,
                   'PCI':[5]*5+[7]*8+[9]*10})

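Details:

The "walrus" operator (:=, available from Python 3.8) stores the dataframe produced by the groupby, which counts the rows for each Frequency/PCI pair. That stored dataframe is then used to assign a new column, count_pct, computed as count divided by the sum of count, multiplied by 100. The result is rounded to two decimal places.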
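If the walrus operator is not available, or the one-liner feels hard to read, the same steps can be sketched by passing a callable to assign. This is only an illustrative variant, not the answer's own code: the function name count_with_pct is hypothetical, and size() is swapped in for count(), which gives the same numbers here because the sample data has no missing values.

```python
import pandas as pd

# Sample data copied from the "Input dataframe" above
df = pd.DataFrame({'Frequency': [123]*5 + [456]*8 + [999]*10,
                   'PCI': [5]*5 + [7]*8 + [9]*10})


def count_with_pct(frame):
    """Count rows per (Frequency, PCI) pair and add a percentage column.

    Hypothetical helper name, used only for this illustration.
    """
    return (
        frame.groupby(['Frequency', 'PCI'])
             .size()                 # number of rows in each group
             .rename('count')
             .reset_index()
             # the lambda receives the intermediate frame, so no walrus is needed
             .assign(count_pct=lambda d: d['count'] / d['count'].sum() * 100)
             .round(2)
    )


result = count_with_pct(df)
print(result)
```

The resulting frame can then be written out with to_excel, as in the question.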

Answer 2

Here is one approach:

out = (
    pd.concat([
        df.value_counts(sort=False),
        df.value_counts(normalize=True)
          .mul(100)
          .round(2)
          .astype(str) + '%'
    ], axis=1)
    .reset_index()
)

Output:

   Frequency  PCI  count proportion
0        123    5      5     21.74%
1        456    7      8     34.78%
2        999    9     10     43.48%

Explanation

Use `df.value_counts` with `sort=False` for the "count".

Use `df.value_counts` with `normalize=True` for the "proportion", then `Series.mul` + `Series.round` + `Series.astype` for formatting.

Use `pd.concat` with `axis=1` to join the two Series side by side, then chain `df.reset_index`. Chain `df.rename` if you want different column labels.
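As a rough sketch of the relabeling step: the default names of the two Series depend on the pandas version (as far as I know, "count" and "proportion" only appear from pandas 2.0 onward), so the variant below names each Series explicitly with Series.rename before concatenating instead of relying on df.rename afterwards. The label count_pct is just an assumption chosen for illustration, and sort=False is used on both calls so the two indexes line up in the same order.

```python
import pandas as pd

# Same sample data as in Answer 1
df = pd.DataFrame({'Frequency': [123]*5 + [456]*8 + [999]*10,
                   'PCI': [5]*5 + [7]*8 + [9]*10})

# Raw counts per (Frequency, PCI) pair, explicitly labelled
counts = df.value_counts(sort=False).rename('count')

# Percentages formatted as strings such as '21.74%'
pct = (
    df.value_counts(normalize=True, sort=False)
      .mul(100)
      .round(2)
      .astype(str)
      .add('%')
      .rename('count_pct')   # illustrative label, not from the original answer
)

out = pd.concat([counts, pct], axis=1).reset_index()
print(out)
```

Because the percentage column holds strings, keep a numeric copy if you need to do further arithmetic on it.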
