Getting the 95th percentile for each group of records in a DataFrame


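I am trying to get the 95th percentile for each group of rows in a DataFrame. I tried: mdf = mdf.groupby('GroupID').quantile(.95)

But the interpreter returns an error: ValueError: 'GroupID' is both an index level and a column label, which is ambiguous.

I have three columns (GroupID, Timestamp, Util), and I want the 95th percentile of Util for each group.

The code is below: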

#pandas 95th percentile calculator

import pandas as pd
import numpy as np

#pd.set_option('display.max_columns', 8)

cfile = "path"
rfile = "path"

#define columns in corereport dataframe
cdf = pd.read_csv(cfile, skiprows = 1, names = ['ID','Device','Bundle','IsPolled','Status','SpeedIn','SpeedOut','Timestamp','MaxIn','MaxOut'])

#drop specified columns from dataframe
to_drop = ['Device', 'Bundle', 'IsPolled', 'Status', 'SpeedIn', 'SpeedOut']
cdf.drop(to_drop, inplace=True, axis=1)

#define columns in relationship dataframe
rdf = pd.read_csv(rfile, skiprows = 1, names = ['GroupID', 'ID', 'Path', 'LowestBW', 'TotalBW'])

#merge the two dataframes together on the ID field
mdf = pd.merge(cdf, rdf, left_on='ID', right_on='ID', how = 'left')

#print(mdf.head())

#Add a column with the larger of the two values MaxIn and MaxOut for each row
#(np.maximum also covers rows where MaxIn == MaxOut, which the two strict comparisons left as NaN)
mdf['Util'] = np.maximum(mdf['MaxIn'], mdf['MaxOut'])

#drop specified columns from data frame
to_drop = ['ID', 'MaxIn', 'MaxOut', 'Path', 'LowestBW', 'TotalBW']
mdf.drop(to_drop, inplace=True, axis=1)

#print(mdf.head().values)

#Group by the GroupID and Timestamp Columns and sum the value in Util
mdf = mdf.groupby(['GroupID', 'Timestamp'])['Util'].sum().reset_index()

#Grouping by GroupID and then sorting ascending
mdf = mdf.groupby(['GroupID']).apply(lambda x: x.sort_values(['Util']))

mdf = mdf.groupby('GroupID').quantile(.95)

#Write new dataframe out to a csv
ofile = 'path'
mdf.to_csv(ofile, encoding='utf-8', index=False)

pandas quantile
1 Answer

The problem is here:

mdf = mdf.groupby(['GroupID']).apply(lambda x: x.sort_values(['Util']))

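This line makes 'GroupID' an index level of mdf while it is still a column, which is why the later groupby('GroupID') is ambiguous. Try this instead: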

mdf = (mdf.groupby(['GroupID'])[['Timestamp', 'Util']]
       .apply(lambda x: x.sort_values(['Util'])))

or

mdf.sort_values(['GroupID', 'Util'], inplace=True)
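To see why the original line fails, here is a minimal sketch that recreates the same state on toy data. The frame, its values, and the variable names are made up for illustration; only the 'GroupID' and 'Util' column names come from the question. It assumes a reasonably recent pandas, where this situation raises ValueError (very old releases only emitted a warning).

```python
import pandas as pd

# Hypothetical toy data standing in for the merged frame from the question.
df = pd.DataFrame({
    "GroupID": ["a", "a", "b", "b"],
    "Util": [10.0, 30.0, 20.0, 40.0],
})

# Recreate the problematic state: 'GroupID' becomes an index level
# while also remaining a regular column.
ambiguous = df.set_index("GroupID", drop=False)

try:
    ambiguous.groupby("GroupID")["Util"].max()
    error_message = None
except ValueError as exc:
    # pandas cannot tell whether "GroupID" means the index level or the column.
    error_message = str(exc)

print(error_message)

# Dropping the duplicated index level removes the ambiguity.
fixed = ambiguous.reset_index(drop=True)
group_max = fixed.groupby("GroupID")["Util"].max()
print(group_max)
```

Both fixes shown above work the same way: they avoid ending up with 'GroupID' in the index and in the columns at once.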

That said, I believe you do not need to sort the values before calling quantile.
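As a rough check of that last point, the sketch below computes the per-group 95th percentile on toy data with and without sorting first and compares the two results. The data and the 'Util95' output column name are assumptions made for illustration; selecting only the 'Util' column before calling quantile also keeps 'Timestamp' out of the calculation.

```python
import pandas as pd

# Hypothetical toy data with the three columns named in the question.
df = pd.DataFrame({
    "GroupID": ["a"] * 5 + ["b"] * 5,
    "Timestamp": list(range(5)) * 2,
    "Util": [50.0, 10.0, 40.0, 20.0, 30.0, 5.0, 1.0, 4.0, 2.0, 3.0],
})

# 95th percentile of Util per group, computed directly on unsorted rows.
p95 = (
    df.groupby("GroupID")["Util"]
    .quantile(0.95)
    .reset_index(name="Util95")
)

# The same computation after explicitly sorting, for comparison only.
sorted_p95 = (
    df.sort_values(["GroupID", "Util"])
    .groupby("GroupID")["Util"]
    .quantile(0.95)
    .reset_index(name="Util95")
)

# Largest absolute difference between the two results.
max_diff = (p95["Util95"] - sorted_p95["Util95"]).abs().max()
print(p95)
print(max_diff)
```

Because 'GroupID' comes back as an ordinary column after reset_index, the result can be written out with to_csv(ofile, index=False) without losing the group labels.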
