Sorting algorithm for swapping rows in a DataFrame

Question

I have the following dummy df:

import pandas as pd
data = {
    'address': [1234, 24389, 4384, 4484, 1234, 24389, 4384, 188],
    'old_account': [200, 200, 200, 300, 200, 494, 400, 100],
    'new_account': [300, 100, 494, 200, 400, 200, 200, 200]
}

df = pd.DataFrame(data)
print(df)

   address  old_account  new_account
0     1234          200          300
1    24389          200          100
2     4384          200          494
3     4484          300          200
4     1234          200          400
5    24389          494          200
6     4384          400          200
7      188          100          200

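A) I want to sort it so that 200 appears in old_account and then, in the row directly below, in new_account:

200 xxx
xxx 200

B) I also want to sort the non-200 values, so that I start with one of them, go through the whole df looking for it, and do the swap. Only when there are no more occurrences of that value do I move on to the next one, and so on.

The df above should end up like this: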

   address  old_account  new_account
0     1234          200          300
1     4484          300          200
2    24389          200          100
3      188          100          200
4     4384          200          494
5    24389          494          200
6     1234          200          400
7     4384          400          200

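As you can see, the 200s sit diagonally to each other, and so do the non-200 values.

The following code only works for A); I have not managed to take B) into account as well: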

import pandas as pd

# Create the initial DataFrame
df= pd.read_csv('dummy_data.csv', sep=';')

# Initiate sorted df
sorted_df = pd.DataFrame(columns=df.columns)

while not df.empty:
    # Find the first row where '200' is in 'old_account'
    idx_old = df.index[df['old_account'] == 200].min()
    
    if pd.notna(idx_old):
        # Add the corresponding row to the sorted result
        sorted_df = pd.concat([sorted_df, df.loc[[idx_old]]], ignore_index=True)
        
        # Remove the row from the original DataFrame
        df = df.drop(index=idx_old)
        
        # Find the matching row where '200' is in 'new_account'
        idx_new = df.index[df['new_account'] == 200].min()
        
        if pd.notna(idx_new):
            # Add the corresponding row to the sorted result
            sorted_df = pd.concat([sorted_df, df.loc[[idx_new]]], ignore_index=True)
            
            # Remove the row from the original DataFrame
            df = df.drop(index=idx_new)
        else:
            break  # If no matching row is found, exit the loop
    else:
        break  # If no more '200' in 'old_account' is found, exit the loop

# Reset the index of the sorted DataFrame
sorted_df.reset_index(drop=True, inplace=True)

print(sorted_df)

Answer 1

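It looks like you are trying to find an Eulerian path in the directed graph whose edges run from old_account to new_account.

You might want to use networkx: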

import networkx as nx

G = nx.from_pandas_edgelist(df, source='old_account', target='new_account',
                            create_using=nx.MultiDiGraph)

tmp = pd.DataFrame(nx.eulerian_circuit(G, keys=True),
                   columns=['old_account', 'new_account', 'n'])


out = (tmp.merge(df.assign(n=df.groupby(['old_account', 'new_account']).cumcount()))
       [df.columns]
       )

Output:

   address  old_account  new_account
0     1234          200          400
1     4384          400          200
2     4384          200          494
3    24389          494          200
4    24389          200          100
5      188          100          200
6     1234          200          300
7     4484          300          200
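
The approach above can be sanity-checked with a short, self-contained sketch. It rebuilds the dummy df from the question, checks with nx.is_eulerian that a circuit exists before asking for one, and then verifies that each row's new_account matches the next row's old_account. Starting the walk at account 200 (via the source argument) and raising a ValueError when no circuit exists are assumptions made for illustration, not part of the original answer.

```python
import networkx as nx
import pandas as pd

# Dummy data copied from the question
df = pd.DataFrame({
    'address': [1234, 24389, 4384, 4484, 1234, 24389, 4384, 188],
    'old_account': [200, 200, 200, 300, 200, 494, 400, 100],
    'new_account': [300, 100, 494, 200, 400, 200, 200, 200],
})

# One directed edge per row: old_account -> new_account
G = nx.from_pandas_edgelist(df, source='old_account', target='new_account',
                            create_using=nx.MultiDiGraph)

# eulerian_circuit raises an error when no circuit exists, so check first
if not nx.is_eulerian(G):
    raise ValueError('the account transfers do not form an Eulerian circuit')

# Assumption: start the walk at account 200, as in the question's example
tmp = pd.DataFrame(nx.eulerian_circuit(G, source=200, keys=True),
                   columns=['old_account', 'new_account', 'n'])

# Map each edge of the walk back to its original row (n separates duplicate edges)
out = tmp.merge(
    df.assign(n=df.groupby(['old_account', 'new_account']).cumcount())
)[df.columns]

# Each row's new_account should equal the next row's old_account
chained = (out['new_account'].iloc[:-1].to_numpy()
           == out['old_account'].iloc[1:].to_numpy()).all()
print(chained)
```

The order in which the non-200 accounts are visited depends on how networkx traverses the graph, so it may differ from the order requested in B) while still keeping every pair adjacent.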
