I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'label': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
    'condition1': ['c','c','f','f','c','c','f','f','c','c','f','f','c','c','f','f'],
    'condition2': ['c','f','c','f','c','f','c','f','c','f','c','f','c','f','c','f']})
I have already sorted df with the following code:
df = df.sort_values(by=['label', 'condition1'], ascending=[True, True])
I also want to sort on "condition2" so that the result looks like this:
df = pd.DataFrame({
    'label': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
    'condition1': ['c','c','f','f','c','c','f','f','c','c','f','f','c','c','f','f'],
    'condition2': ['c','f','f','c','c','f','f','c','c','f','f','c','c','f','f','c']})
How can I achieve this? I tried adding condition2 to sort_values, but nothing happened.
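A likely reason that adding condition2 to sort_values appeared to do nothing is that the sample frame is already in ascending order on all three columns. The sketch below rebuilds the sample data with the same values (the names `resorted` and `unchanged` are only illustrative) and checks this:

```python
import pandas as pd

# Same values as the sample frame in the question
df = pd.DataFrame({
    'label': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
    'condition1': list('ccff') * 4,
    'condition2': list('cfcf') * 4,
})

# Sort on all three columns in ascending order
resorted = df.sort_values(by=['label', 'condition1', 'condition2'])

# Compare the row order before and after the sort
unchanged = resorted.index.tolist() == df.index.tolist()
print(unchanged)
```

If `unchanged` is true, a plain ascending sort cannot produce the alternating c, f, f, c pattern, so the order has to come from a derived key rather than from condition2 itself.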
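The logic still needs clarifying, but assuming you want the first condition2 value in each condition1 group to match condition1, you can compute the sort order from whether the two columns are equal: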
import numpy as np

# np.lexsort uses the last key as the primary key: label, then condition1,
# then whether condition2 differs from condition1 (matching rows come first).
order = np.lexsort([df['condition2'].ne(df['condition1']), df['condition1'], df['label']])
out = df.iloc[order]
Output:
    label  condition1  condition2
0       1           c           c
1       1           c           f
3       1           f           f
2       1           f           c
4       2           c           c
5       2           c           f
7       2           f           f
6       2           f           c
8       3           c           c
9       3           c           f
11      3           f           f
10      3           f           c
12      4           c           c
13      4           c           f
15      4           f           f
14      4           f           c
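The same ordering can probably also be reached with pandas alone, by sorting on a temporary boolean column instead of calling np.lexsort. This is a minimal sketch under the same assumption as above (rows where condition2 equals condition1 come first within each group); the column name `mismatch` is a hypothetical helper, not something from the question:

```python
import pandas as pd

# Same values as the sample frame in the question
df = pd.DataFrame({
    'label': [1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4],
    'condition1': list('ccff') * 4,
    'condition2': list('cfcf') * 4,
})

out = (
    # Hypothetical helper column: False where condition2 equals condition1
    df.assign(mismatch=df['condition2'].ne(df['condition1']))
      # False sorts before True, so matching rows lead each group
      .sort_values(by=['label', 'condition1', 'mismatch'])
      # Drop the helper so only the original columns remain
      .drop(columns='mismatch')
)

print(out)
```

This keeps everything inside a single method chain and leaves the original index labels in place, at the cost of creating a short-lived extra column.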