I am performing some operations on a DataFrame:
df
Node     Interface  Speed  carrier  1-May  9-May  2-Jun  21-Jun
Server1  internet1  10     ATT      20     30     50     90
Server1  wan3.0     20     Comcast  NaN    NaN    NaN    100
Server1  wan3.0     50     Comcast  30     40     40     NaN
Server2  wan2       100    Sprint   90     70     NaN    NaN
Server2  wan2       20     Sprint   NaN    NaN    88     70
Server2  Internet2  40     Verizon  10     60     90     70
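I need to merge the rows within each Node and Interface group, fill the NaN values from the other rows in the same group, and keep the maximum Speed for each interface.
The expected DataFrame should look like this: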
df1
Node     Interface  Speed  carrier  1-May  9-May  2-Jun  21-Jun
Server1  internet1  10     ATT      20     30     50     90
Server1  wan3.0     50     Comcast  30     40     40     100
Server2  wan2       100    Sprint   90     70     88     70
Server2  Internet2  40     Verizon  10     60     90     70
I tried this:
df2 = df.groupby(['Node', 'Interface', 'carrier']).agg({'Speed': 'max'}).reset_index()
df3=df.drop('Speed', axis=1)
df4=df3.ffill().drop_duplicates()
It doesn't quite work. Is there a simple way to merge the rows, replace the NaN values with values from the other rows, and pick the maximum Speed?
Code:
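# columns whose NaN values should be filled from the other rows in the group
cols = ['carrier', '1-May', '9-May', '2-Jun', '21-Jun']
g = df.groupby(['Node', 'Interface'], sort=False, as_index=False)
# 'max' for Speed; 'first' takes the first non-null value per group for the rest
out = g.agg({**{'Speed': 'max'}, **dict.fromkeys(cols, 'first')})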
Output:
      Node  Interface  Speed  carrier  1-May  9-May  2-Jun  21-Jun
0  Server1  internet1     10      ATT   20.0   30.0   50.0    90.0
1  Server1     wan3.0     50  Comcast   30.0   40.0   40.0   100.0
2  Server2       wan2    100   Sprint   90.0   70.0   88.0    70.0
3  Server2  Internet2     40  Verizon   10.0   60.0   90.0    70.0
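
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The DataFrame construction is an assumption reconstructed from the table in the question (the date columns end up as floats because they contain NaN). It relies on the groupby 'first' aggregation returning the first non-null value in each group, which is what fills the gaps.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the question's table (assumed layout).
df = pd.DataFrame({
    'Node': ['Server1', 'Server1', 'Server1', 'Server2', 'Server2', 'Server2'],
    'Interface': ['internet1', 'wan3.0', 'wan3.0', 'wan2', 'wan2', 'Internet2'],
    'Speed': [10, 20, 50, 100, 20, 40],
    'carrier': ['ATT', 'Comcast', 'Comcast', 'Sprint', 'Sprint', 'Verizon'],
    '1-May': [20, np.nan, 30, 90, np.nan, 10],
    '9-May': [30, np.nan, 40, 70, np.nan, 60],
    '2-Jun': [50, np.nan, 40, np.nan, 88, 90],
    '21-Jun': [90, 100, np.nan, np.nan, 70, 70],
})

cols = ['carrier', '1-May', '9-May', '2-Jun', '21-Jun']

# sort=False keeps groups in order of first appearance;
# as_index=False keeps Node and Interface as regular columns.
g = df.groupby(['Node', 'Interface'], sort=False, as_index=False)

# Maximum Speed per group; first non-null value for every other column.
out = g.agg({'Speed': 'max', **dict.fromkeys(cols, 'first')})
print(out)
```

Note that 'first' picks the earliest non-null value, so if two rows in a group both have a value for the same date, the later one is silently dropped. That is fine for this data, where the rows complement each other.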