Speeding up NetworkX performance when building a management hierarchy


I have a dataset like this:

  Emp Mgr
0  E1  M1
1  M1  M2
2  M3  M5
3  M2  M5

For each user (row), I need the management hierarchy in this form:

  Emp Mgr Level_01 Level_02 Level_03 Level_04
0  E1  M1       M5       M2       M1       E1
1  M1  M2       M5       M2       M1
2  M3  M5       M5       M3
3  M2  M5       M5       M2

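The output should look something like:
Emp > Manager (the highest level number being the employee's direct manager).
For example, for EmpA: Mgr1 (CEO) - Mgr2 (Director) - M3 (Senior Manager) - M4 (direct manager of Emp)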

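I am using networkx, as described in this answer. There are 177K records with 2 root nodes. Generating the hierarchy takes more than 6 hours in total. How can I significantly reduce the time the script takes?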

import networkx as nx
import pandas as pd

G = nx.from_pandas_edgelist(df, source='Mgr', target='Emp',
                            create_using=nx.DiGraph)

# find roots (= top managers)
roots = [n for n,d in G.in_degree() if d==0]
    
df2 = (pd.DataFrame([next((p for root in roots for p in nx.all_simple_paths(G, root, node)), [])[:-1]
                     for node in df['Emp']], index=df.index)
         .rename(columns=lambda x: f'Level_{x+1:02d}')
      )
Tags: python, pandas, networkx, graph-theory, hierarchy
1 Answer

I rewrote the logic using a combination of networkx and pure Python with a recursive function:

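import networkx as nx
import pandas as pd

df = pd.read_csv('demodata2.csv', skiprows=3, usecols=[0, 1])

# build a manager -> employee graph, ignoring rows without a manager
G = nx.from_pandas_edgelist(df.dropna(subset=['MgrUPN']), source='MgrUPN', target='EmpUPN',
                            create_using=nx.DiGraph)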
G.remove_edges_from(nx.selfloop_edges(G))

parent = {}

# uncomment the prints to see which nodes have no or multiple parents
for n in nx.dfs_postorder_nodes(G):
    p = list(G.predecessors(n))
    if len(p) == 0:
        #print(f'"{n}" has no parent')
        pass
    else:
        if len(p)>1:
            #print(f'"{n}" has multiple parents ({p}), picking "{p[0]}"')
            pass
        parent[n] = p[0]

def get_parents(n):
    try:
        yield from get_parents(parent[n])
    except KeyError:
        pass
    yield n

out = df.join(pd.DataFrame([list(get_parents(node)) for node in df['EmpUPN']], index=df.index)
                .rename(columns=lambda x: f'Level_{x+1:02d}')
             )
print(out)

Output:

                                                  EmpUPN                         MgrUPN                                        Level_01                 Level_02                   Level_03                     Level_04                    Level_05                       Level_06                 Level_07                         Level_08                                          Level_09 Level_10 Level_11
0                                    [email protected]     [email protected]                                 [email protected]            [email protected]       [email protected]      [email protected]    [email protected]     [email protected]      [email protected]                             None                                              None     None     None
1                               [email protected]         [email protected]                                 [email protected]            [email protected]    [email protected]     [email protected]        [email protected]        [email protected]   [email protected]         [email protected]                                              None     None     None
2                                  [email protected]      [email protected]                          [email protected]  [email protected]  [email protected]        [email protected]                        None                           None                     None                             None                                              None     None     None
3                                      [email protected]  [email protected]                                 [email protected]            [email protected]      [email protected]          [email protected]      [email protected]  [email protected]        [email protected]                             None                                              None     None     None
4                                      [email protected]          [email protected]                                 [email protected]            [email protected]    [email protected]      [email protected]  [email protected]          [email protected]        [email protected]                             None                                              None     None     None
...                                                  ...                            ...                                             ...                      ...                        ...                          ...                         ...                            ...                      ...                              ...                                               ...      ...      ...
177920  [email protected]   [email protected]                                 [email protected]            [email protected]    [email protected]  [email protected]      [email protected]      [email protected]  [email protected]     [email protected]  [email protected]     None     None
177921                    [email protected]                            NaN                  [email protected]                     None                       None                         None                        None                           None                     None                             None                                              None     None     None
177922    [email protected]                            NaN  [email protected]                     None                       None                         None                        None                           None                     None                             None                                              None     None     None
177923                   [email protected]        [email protected]                                 [email protected]            [email protected]    [email protected]     [email protected]    [email protected]          [email protected]  [email protected]  [email protected]                                              None     None     None
177924                 [email protected]                            NaN               [email protected]                     None                       None                         None                        None                           None                     None                             None                                              None     None     None

[177925 rows x 13 columns]

Runtime for 177K rows:

1.44 s ± 255 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
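
To make the idea easier to check by hand, here is a minimal pure-Python sketch of the same parent-lookup approach, run on the four sample rows from the question. It is an illustration under a few assumptions, not the code timed above: the names `rows`, `chain_from_root` and `hierarchy` are made up for this example, the walk is iterative instead of recursive, and a `seen` set is added as a guard in case the data contains a cycle. If an employee appeared in several rows with different managers, the dict would simply keep the last one.

```python
# Sample (Emp, Mgr) pairs from the question; None would mark a missing manager.
rows = [("E1", "M1"), ("M1", "M2"), ("M3", "M5"), ("M2", "M5")]

# Employee -> direct manager lookup; self-references and missing managers are skipped.
parent = {emp: mgr for emp, mgr in rows if mgr is not None and mgr != emp}


def chain_from_root(node, parent):
    """Return [top manager, ..., direct manager, node] by following parent links."""
    chain = [node]
    seen = {node}
    while chain[-1] in parent:
        boss = parent[chain[-1]]
        if boss in seen:  # guard against cyclic data (A manages B, B manages A)
            break
        seen.add(boss)
        chain.append(boss)
    chain.reverse()  # the walk goes bottom-up, the result should be top-down
    return chain


hierarchy = {emp: chain_from_root(emp, parent) for emp, _ in rows}
for emp, chain in hierarchy.items():
    print(emp, chain)
```

Each employee costs one dictionary lookup per level of depth, so the work grows with rows times hierarchy depth rather than with a path search from every root to every node, which is probably why the `all_simple_paths` version was so slow.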