Sorting algorithm that alternates rows in a DataFrame

Question

I have the following dummy df:

import pandas as pd
data = {
    'address': [1234, 24389, 4384, 4484, 1234, 24389, 4384, 188],
    'old_account': [200, 200, 200, 300, 200, 494, 400, 100],
    'new_account': [300, 100, 494, 200, 400, 200, 200, 200]
}

df = pd.DataFrame(data)
print(df)

   address  old_account  new_account
0     1234          200          300
1    24389          200          100
2     4384          200          494
3     4484          300          200
4     1234          200          400
5    24389          494          200
6     4384          400          200
7      188          100          200

A) I want to sort it so that 200 appears in old_account on one row and in new_account on the very next row:

200 xxx
xxx 200

B) I also want to sort the non-200 values: starting from 300, I go through the whole df looking for 300 and alternate the rows:

200 300
300 200
200 300
...

Only when there are no more 300s left do I move on to the next value, say 400:

200 300
300 200
200 300
...
200 400
400 200
200 400
...
The df above should then look like this:

   address  old_account  new_account
0     1234          200          300
1     4484          300          200
2    24389          200          100
3      188          100          200
4     4384          200          494
5    24389          494          200
6     1234          200          400
7     4384          400          200

As you can see, the 200s sit diagonally from one row to the next, and so do the non-200 values.
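To pin down the target layout, here is a minimal checker for rules A) and B). It is a sketch that assumes each pair of rows has the form (200 -> x, x -> 200) with the same x on both rows, as in the expected output; the function name follows_pattern and the hub parameter are hypothetical.

```python
import pandas as pd


def follows_pattern(df: pd.DataFrame, hub: int = 200) -> bool:
    """Return True if df satisfies rule A (alternation) and rule B (contiguous blocks)."""
    old = df["old_account"].tolist()
    new = df["new_account"].tolist()
    if len(old) % 2:
        return False  # rows must come in (hub -> x, x -> hub) pairs
    finished = set()  # partner values whose block has already ended
    current = None    # partner value of the block being read
    for i in range(0, len(old), 2):
        # Rule A: hub in old_account, then hub in new_account on the next row,
        # with the same partner account on both rows (the "diagonal").
        if old[i] != hub or new[i + 1] != hub or new[i] != old[i + 1]:
            return False
        partner = new[i]
        # Rule B: all pairs for one partner value must be contiguous.
        if partner != current:
            if partner in finished:
                return False
            if current is not None:
                finished.add(current)
            current = partner
    return True
```

Applied to the expected output above it returns True; applied to the original df it returns False, because row 1 (200 -> 100) does not have 200 in new_account.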

I have the following code, but it only handles A); I did not manage to take B) into account as well:

import pandas as pd

# Load the initial DataFrame
df= pd.read_csv('dummy_data.csv', sep=';')

# Initiate sorted df
sorted_df = pd.DataFrame(columns=df.columns)

while not df.empty:
    # Find the first row where '200' is in 'old_account'
    idx_old = df.index[df['old_account'] == 200].min()
    
    if pd.notna(idx_old):
        # Add the corresponding row to the sorted result
        sorted_df = pd.concat([sorted_df, df.loc[[idx_old]]], ignore_index=True)
        
        # Remove the row from the original DataFrame
        df = df.drop(index=idx_old)
        
        # Find the matching row where '200' is in 'new_account'
        idx_new = df.index[df['new_account'] == 200].min()
        
        if pd.notna(idx_new):
            # Add the corresponding row to the sorted result
            sorted_df = pd.concat([sorted_df, df.loc[[idx_new]]], ignore_index=True)
            
            # Remove the row from the original DataFrame
            df = df.drop(index=idx_new)
        else:
            break  # If no matching row is found, exit the loop
    else:
        break  # If no more '200' in 'old_account' is found, exit the loop

# Reset the index of the sorted DataFrame
sorted_df.reset_index(drop=True, inplace=True)

print(sorted_df)
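One way to get both A) and B) without a while loop is to build sort keys and sort once. This is a swapped-in technique, not the approach used in the answer below, and it is a sketch under two assumptions: every row has 200 on exactly one side, and the non-200 values are visited in order of first appearance (300, 100, 494, 400), which is what reproduces the expected output above. The names HUB, partner and is_return are illustrative.

```python
import pandas as pd

data = {
    'address': [1234, 24389, 4384, 4484, 1234, 24389, 4384, 188],
    'old_account': [200, 200, 200, 300, 200, 494, 400, 100],
    'new_account': [300, 100, 494, 200, 400, 200, 200, 200],
}
df = pd.DataFrame(data)

HUB = 200  # the account that must alternate between the two columns

# Assumption: HUB is on exactly one side of every row.
is_return = df['old_account'].ne(HUB).rename('is_return')  # True for "x -> 200" rows
# The non-200 account of each row: new_account for "200 -> x", old_account for "x -> 200".
partner = df['new_account'].where(~is_return, df['old_account']).rename('partner')

keys = pd.DataFrame({
    # B) one block per partner value, in order of first appearance
    'block': pd.factorize(partner)[0],
    # the n-th "200 -> x" row pairs with the n-th "x -> 200" row
    'pair': df.groupby([partner, is_return]).cumcount(),
    # A) within a pair, the "200 -> x" row comes first
    'is_return': is_return.astype(int),
})

order = keys.sort_values(['block', 'pair', 'is_return']).index
out = df.loc[order].reset_index(drop=True)
print(out)
```

If a value has more outgoing than returning rows (or vice versa), the unmatched rows simply end up at the end of that value's block. To visit the values in another order, replace the block key, for example with the value itself to sort numerically.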
Answer

It looks like you are trying to find an Eulerian path in a directed graph.
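To see why this framing fits: treat each row as a directed edge old_account -> new_account. The wanted order chains the rows so that each row's new_account is the next row's old_account, using every row exactly once, which is an Eulerian path. A connected directed multigraph has an Eulerian circuit exactly when every node's in-degree equals its out-degree. The pure-Python sketch below checks that condition for the dummy data; it assumes connectivity, which holds here because every edge touches 200.

```python
from collections import Counter

# Edges old_account -> new_account from the question's dummy df.
edges = list(zip([200, 200, 200, 300, 200, 494, 400, 100],
                 [300, 100, 494, 200, 400, 200, 200, 200]))

out_deg = Counter(src for src, _ in edges)
in_deg = Counter(dst for _, dst in edges)
nodes = set(out_deg) | set(in_deg)

# Balanced in/out degree at every node (plus connectivity) => Eulerian circuit exists.
balanced = all(in_deg[n] == out_deg[n] for n in nodes)
print(balanced)                   # True
print(out_deg[200], in_deg[200])  # 4 4
```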

You might want to use networkx:

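import networkx as nx
import pandas as pd

# df is the DataFrame from the question.
# Each row becomes a directed edge old_account -> new_account; a MultiDiGraph keeps repeated pairs.
G = nx.from_pandas_edgelist(df, source='old_account', target='new_account',
                            create_using=nx.MultiDiGraph)

# Walk every edge exactly once; keys=True also yields the key that tells parallel edges apart.
tmp = pd.DataFrame(nx.eulerian_circuit(G, keys=True),
                   columns=['old_account', 'new_account', 'n'])

# Map each edge back to its row: the n-th occurrence of an (old, new) pair has key n.
out = (tmp.merge(df.assign(n=df.groupby(['old_account', 'new_account']).cumcount()))
       [df.columns]
       )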

Output:

   address  old_account  new_account
0     1234          200          400
1     4384          400          200
2     4384          200          494
3    24389          494          200
4    24389          200          100
5      188          100          200
6     1234          200          300
7     4484          300          200
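The merge step matters when the same (old_account, new_account) pair occurs more than once. The sketch below shows the mechanism in plain pandas on hypothetical data with duplicated pairs. The tmp frame is written by hand as a stand-in for what nx.eulerian_circuit(G, keys=True) yields; it assumes networkx numbers parallel edges 0, 1, ... in row order when no edge_key is given, so that cumcount reproduces the same numbering.

```python
import pandas as pd

# Hypothetical data in which each (old, new) pair occurs twice.
df = pd.DataFrame({
    'address': [11, 22, 33, 44],
    'old_account': [200, 300, 200, 300],
    'new_account': [300, 200, 300, 200],
})

# Hand-written stand-in for the (u, v, key) triples of an Eulerian circuit.
tmp = pd.DataFrame([(200, 300, 1), (300, 200, 0), (200, 300, 0), (300, 200, 1)],
                   columns=['old_account', 'new_account', 'n'])

# Number each occurrence of an (old, new) pair in row order, like the edge keys.
keyed = df.assign(n=df.groupby(['old_account', 'new_account']).cumcount())

# Inner merge on old_account, new_account and n keeps the order of tmp.
out = tmp.merge(keyed)[df.columns]
print(out['address'].tolist())  # [33, 22, 11, 44]

# Without n, every edge matches both rows with the same pair: 8 rows instead of 4.
print(len(tmp.drop(columns='n').merge(df)))  # 8
```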