How to extract certain rows from a DataFrame based on a condition (Python)?


I have the following dataset:

import pandas as pd

A = pd.DataFrame({
    'vol_num': 1.0,
    'vol_name': pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    'lat': [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    'lon': [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

Each vol_name always comes with the same lat and lon values.
I want to extract the lat and lon of the 3 most frequent vol_name values in the DataFrame.
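Both solutions below rely on the stated assumption that a name never maps to two different coordinate pairs. A minimal sketch for checking that assumption on the sample data, counting distinct lat and lon values per name with groupby and nunique (the names distinct and consistent are illustrative only):

```python
import pandas as pd

A = pd.DataFrame({
    "vol_num": 1.0,
    "vol_name": pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    "lat": [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    "lon": [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

# Number of distinct lat/lon values per name; observed=True skips unused categories
distinct = A.groupby("vol_name", observed=True)[["lat", "lon"]].nunique()

# True only if every name has exactly one lat and exactly one lon
consistent = bool((distinct == 1).all().all())
print(consistent)  # True for the sample data
```

If this prints False on the real data, keeping the first row per name would silently pick one of several coordinate pairs.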

The code below gives me those 3 values:

A['vol_name'].value_counts().head(3)

tt       3
train    3
test     2
Name: vol_name, dtype: int64
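value_counts returns a Series whose index holds the names and whose values hold the counts, so the labels needed for a lookup are available through .index. A small sketch on the same sample data (top and names are illustrative variable names):

```python
import pandas as pd

A = pd.DataFrame({
    "vol_num": 1.0,
    "vol_name": pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    "lat": [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    "lon": [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

top = A["vol_name"].value_counts().head(3)  # counts sorted in descending order
names = list(top.index)                     # just the labels, e.g. to look up lat/lon later

# tt and train are tied at 3, so their relative order is not something to rely on
print(names)
```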

However, I don't know how to get the lat and lon for each of them.

How can I get the following result, as a DataFrame with 3 columns?

tt      0.087331    0.961500
train   0.818803    0.678901
test    0.188319    0.959698

Thank you.

*My real dataset has more than 500 rows.

python pandas dataframe group-by
1 Answer

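First drop duplicate rows by vol_name, then set vol_name as the index and reindex it with idx (which both keeps only the top 3 names and puts them in order of frequency), and finally drop the vol_num column: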

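# Names of the 3 most frequent vol_name values, most frequent first
idx = A["vol_name"].value_counts().head(3).index

A = (
    A.drop_duplicates("vol_name")  # one row per name; lat/lon are identical within a name
    .set_index("vol_name")
    .reindex(idx)                  # keep only the top 3 names, in frequency order
    .reset_index()
    .drop(columns="vol_num")       # keyword form; positional axis was removed in pandas 2.0
)

print(A)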
  vol_name       lat       lon
0       tt  0.087331  0.961500
1    train  0.818803  0.678901
2     test  0.188319  0.959698
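Since the question is tagged group-by, here is an alternative sketch (not the answerer's method) that does the counting and the lookup in a single groupby: count the rows per name, take the first lat and lon of each group, then keep the 3 largest counts with nlargest. The helper column name count is an arbitrary choice.

```python
import pandas as pd

A = pd.DataFrame({
    "vol_num": 1.0,
    "vol_name": pd.Categorical(["test", "train", "tt", "tn", "se", "train", "tt", "test", "train", "tt"]),
    "lat": [0.188319, 0.818803, 0.087331, 0.305681, 0.871307, 0.818803, 0.087331, 0.188319, 0.818803, 0.087331],
    "lon": [0.959698, 0.678901, 0.961500, 0.229158, 0.947383, 0.678901, 0.961500, 0.959698, 0.678901, 0.961500],
})

# One row per name: how often it occurs plus its (single) coordinate pair
summary = A.groupby("vol_name", observed=True).agg(
    count=("lat", "size"),  # number of rows in the group
    lat=("lat", "first"),
    lon=("lon", "first"),
)

# Keep the 3 most frequent names and drop the helper column
top3 = summary.nlargest(3, "count").reset_index()[["vol_name", "lat", "lon"]]
print(top3)
```

Because groups are ordered by category here, names tied on count (tt and train) may come out in a different order than value_counts lists them.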
© www.soinside.com 2019 - 2024. All rights reserved.