I have the following dataset:
import pandas as pd

A = pd.DataFrame({
    'vol_num': 1.,
    'vol_name': pd.Categorical(["test","train","tt","tn","se","train","tt","test","train","tt"]),
    'lat': [0.188319,0.818803,0.087331,0.305681,0.871307,0.818803,0.087331,0.188319,0.818803,0.087331],
    'lon': [0.959698,0.678901,0.961500,0.229158,0.947383,0.678901,0.961500,0.959698,0.678901,0.961500],
})
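All rows with the same "vol_name" have the same "lat" and "lon".
I want to extract the "lat" and "lon" of the 3 most frequent "vol_name" values in the dataframe.
The code below gives me those 3 values: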
A['vol_name'].value_counts().head(3)
tt 3
train 3
test 2
Name: vol_name, dtype: int64
However, I don't know how to get the "lat" and "lon" for each of them.
How can I get the following result, as a dataframe with 3 columns?
tt 0.087331 0.961500
train 0.818803 0.678901
test 0.188319 0.959698
Thank you.
*My real dataset has more than 500 rows.
First drop duplicates by vol_name, then reorder the rows using the index idx, and finally drop the column vol_num:
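# the 3 most frequent names, most frequent first
idx = A["vol_name"].value_counts().head(3).index
A = (
    A.drop_duplicates("vol_name")   # one row per vol_name
     .set_index("vol_name")
     .reindex(idx)                  # keep only the top 3, in frequency order
     .reset_index()
     .drop(columns="vol_num")       # keyword form; the positional axis argument was removed in pandas 2.0
)
print(A)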
vol_name lat lon
0 tt 0.087331 0.961500
1 train 0.818803 0.678901
2 test 0.188319 0.959698
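
If you also want the number of occurrences next to the coordinates, a variant along the following lines might work. This is only a sketch: the names counts, top_names, coords, result and the extra "count" column are my own and not part of the question, and it assumes that "lat" and "lon" really are identical within each "vol_name". It also converts "vol_name" to plain strings, which is a simplification to sidestep categorical index handling rather than something the approach above requires.

```python
import pandas as pd

A = pd.DataFrame({
    'vol_num': 1.,
    'vol_name': pd.Categorical(["test","train","tt","tn","se","train","tt","test","train","tt"]),
    'lat': [0.188319,0.818803,0.087331,0.305681,0.871307,0.818803,0.087331,0.188319,0.818803,0.087331],
    'lon': [0.959698,0.678901,0.961500,0.229158,0.947383,0.678901,0.961500,0.959698,0.678901,0.961500],
})

# Count rows per name and keep the three most frequent names.
counts = A["vol_name"].value_counts().head(3)
top_names = counts.index.tolist()

# One row per name; assumes lat/lon are identical within a name.
coords = (
    A.drop_duplicates("vol_name")
     .drop(columns="vol_num")
     .assign(vol_name=lambda d: d["vol_name"].astype(str))
)

# Select the top names in frequency order and attach their counts.
result = coords.set_index("vol_name").loc[top_names].reset_index()
result["count"] = counts.to_numpy()

print(result)
```

Note that "tt" and "train" both occur 3 times, so which of the two comes first can depend on the pandas version.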