df = pd.DataFrame({
'from': ['Start', '21', '73', 'Start', '55', '1', '2', '3'],
'to': ['21', '73', '55', '1', '54', '2', '3', '4']
})
来自 | 到 | |
---|---|---|
0 | 开始 | 21 |
1 | 21 | 73 |
2 | 73 | 55 |
3 | 开始 | 1 |
4 | 55 | 54 |
5 | 1 | 2 |
6 | 2 | 3 |
7 | 3 | 4 |
所需输出
[['Start', '21', '73', '55', '54'], ['Start', '1', '2', '3', '4']]
行的顺序不必正确,如 55、54。
networkx
来解决它。
weakly_connected_components
,然后得到dag_longest_path
:
import networkx as nx
# deduplicate the Start
m = df['from'].eq('Start')
from_ = df['from'].mask(m, 'Start'+m.cumsum().astype(str))
# ['Start1', '21', '73', 'Start2', '55', '1', '2', '3']
# create a graph
G = nx.from_edgelist(zip(from_, df['to']),
create_using=nx.DiGraph)
# get isolated paths
out = [nx.dag_longest_path(G.subgraph(c))
for c in nx.weakly_connected_components(G)]
输出:
[['Start1', '21', '73', '55', '54'], ['Start2', '1', '2', '3', '4']]