Count based on criteria in pandas

Question (votes: 0, answers: 1)

I have a pandas `DataFrame` like this:

```python
import pandas as pd

# 'diff' has only 18 values (index 0-17), so row 18 of wk gets NaN in 'diff'
d = {'gen': ['A','A','A','A','B','B','B','B','C','D','D','D','D','D','D','D','D','D','D'],
     'diff': pd.Series([1,1,1,1,2,1,1,1,1,1,1,1,1,2,2,1,1,1],
                       index=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17])}
wk = pd.DataFrame(data=d, index=[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18])
```

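My goal is to count how many times each `gen` occurs, based on these criteria:

1. count only if `diff` is 1, and
2. `gen` at index `i+1` equals `gen` at index `i`, and
3. if there are consecutive 1s, count them as follows: if (number of consecutive 1s) % 2 == 0: count = (number of consecutive 1s) / 2, otherwise: count = (number of consecutive 1s - 1) / 2.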
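The rule in step 3 is floor division by 2: for a run of `n` ones, `n/2` when `n` is even and `(n-1)/2` when `n` is odd both equal `n // 2`. A quick check in Python, where `pairs_in_run` is a hypothetical helper name used only for illustration:

```python
def pairs_in_run(n: int) -> int:
    """Count contributed by a run of n consecutive 1s (hypothetical helper)."""
    return n // 2  # floor division: n/2 for even n, (n - 1)/2 for odd n


# Compare against the rule exactly as stated in the question
for n in range(10):
    stated = n / 2 if n % 2 == 0 else (n - 1) / 2
    assert pairs_in_run(n) == stated, n

print([pairs_in_run(n) for n in range(1, 6)])  # [0, 1, 1, 2, 2]
```

This is why `floordiv(2)` can stand in for the even/odd branch.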
With this code, I can achieve what I want:

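```python
k = 0   # count for the current gen
j = 0   # position within the current run of 1s
z = {}  # result: gen -> count
for i in range(wk.shape[0]):
    if wk['diff'][i] == 1:
        if wk['gen'][i] == wk['gen'][i+1]:
            if j == 0:
                j += 2
            if j % 2 == 0:
                k += 1
            if j >= 2:
                j += 1
            z[wk['gen'][i]] = k
        if wk['gen'][i] != wk['gen'][i+1]:
            # reset when the next row starts a different gen
            j = 0
            k = 0
```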

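The resulting dictionary `z` is:

```
{'A': 2, 'B': 1, 'D': 4}
```

But when I use larger data (more than 410,000 records), the counter does not always restart from 0 when `gen` at index `i+1` differs from `gen` at index `i`. Is there something wrong with my code?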
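Tracing the loop on the sample data, with the nesting that reproduces `{'A': 2, 'B': 1, 'D': 4}`, points to one likely cause: the reset `j = 0; k = 0` sits inside the `if wk['diff'][i] == 1` branch, so it only runs when the last row of a `gen` block has `diff == 1`. If that row holds any other value, `j` and `k` carry over into the next `gen`. The run counter `j` is also not reset when a 2 interrupts a run of 1s. The sketch below is a minimal alternative loop, not the author's code: `count_pairs` is a hypothetical name, and it assumes each run of consecutive 1s within a `gen` contributes `run_length // 2`, which is the reading the answer below uses.

```python
import pandas as pd


def count_pairs(wk: pd.DataFrame) -> dict:
    """Sum run_length // 2 over every run of consecutive 1s within each 'gen'.

    Hypothetical helper: a run ends when 'gen' changes or 'diff' is not 1.
    """
    z = {}
    run = 0          # length of the current run of 1s
    prev_gen = None  # 'gen' value of the previous row
    for gen, diff in zip(wk['gen'], wk['diff']):
        if gen != prev_gen or diff != 1:
            # Close the current run before a new gen starts or a non-1 value appears
            if run >= 2:
                z[prev_gen] = z.get(prev_gen, 0) + run // 2
            run = 0
        if diff == 1:
            run += 1
        prev_gen = gen
    if run >= 2:  # close a run that reaches the last row
        z[prev_gen] = z.get(prev_gen, 0) + run // 2
    return z


# Block 'A' ends with diff == 2, so the question's loop skips its reset there
# and reports 'B' as 2, although each block holds a single run of two 1s
toy = pd.DataFrame({'gen': ['A', 'A', 'A', 'B', 'B', 'B'],
                    'diff': [1, 1, 2, 1, 1, 2]})
print(count_pairs(toy))  # {'A': 1, 'B': 1}
```

On the question's `wk` this sketch gives `{'A': 2, 'B': 1, 'D': 3}`. `D` differs from the loop's 4 because its two runs of 1s (rows 9-12 and 15-17) are counted separately. A NaN in `diff` also ends a run, since `NaN != 1`.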
    

Tags: python, pandas, dataframe

1 answer

Answer (votes: 1)

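To count each run of consecutive 1s, use two `groupby` steps: a first `groupby.count` per run, followed by `floordiv` (equivalent to your `x/2 if x%2==0 else (x-1)/2`), then aggregate again with `groupby.sum` before converting with `to_dict`: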

```python
# new id each time 'diff' changes within the same 'gen'
group = wk['diff'].ne(wk.groupby('gen')['diff'].shift()).cumsum()
# rows where diff == 1
m = wk['diff'].eq(1)

out = (wk[m].groupby(['gen', group])       # keep only 1s and group
         ['diff'].count().floordiv(2)      # count and floor division
       .groupby(level='gen').sum()         # sum per "gen" group
       .loc[lambda x: x > 0].to_dict()     # only counts > 0 and convert to dict
      )
```

Output:

```
{'A': 2, 'B': 1, 'D': 3}
```

Intermediates (`group` and `m`):
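A self-contained sketch that rebuilds the question's `wk` and prints the two intermediates next to the data; `wk.assign(...)` is used here only for display and is not part of the answer's code:

```python
import pandas as pd

# Rebuild the question's sample data; 'diff' has no value for row 18, so it becomes NaN
diff = pd.Series([1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1], index=range(18))
wk = pd.DataFrame({'gen': list('AAAABBBBC') + ['D'] * 10, 'diff': diff}, index=range(19))

# New id whenever 'diff' differs from the previous row of the same 'gen'
group = wk['diff'].ne(wk.groupby('gen')['diff'].shift()).cumsum()
# Rows that take part in the count
m = wk['diff'].eq(1)

# Show both intermediates next to the data
print(wk.assign(group=group, m=m))
```

Each (`gen`, `group`) pair among the rows where `m` is true is one run of 1s: `(A, 1)` has 4 rows, `(B, 3)` 3, `(C, 4)` 1, `(D, 5)` 4 and `(D, 7)` 3. These floor-divide to 2, 1, 0, 2 and 1, and summing per `gen` then dropping zeros gives `{'A': 2, 'B': 1, 'D': 3}`.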

© www.soinside.com 2019 - 2024. All rights reserved.