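I have built a graph in NetworkX. The graph has two types of nodes, A and B. A-nodes can have directed edges to B-nodes; B-nodes have no outgoing edges. Essentially, the graph represents references, so one A-node can reference many B-nodes, and two A-nodes can have edges to the same B-node. I want a metric that measures the "similarity" between two A-nodes. I found that bibliographic coupling works well. Co-citation is not possible because A-nodes have no incoming edges. Dispersion is not possible either, because there are no edges between B-nodes.
I would like to try more metrics and compare them. What kinds of measures could I use?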
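Try Jaccard similarity. It takes the number of child nodes shared by the two parent nodes and divides it by the total number of unique child nodes across both.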
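Written out as a formula, with N(u) denoting the set of B-nodes that A-node u points to (the symbol N is notation introduced here for illustration, not something from NetworkX), the definition and the value for A1 and A2 in the example graph used in this answer are:

```latex
J(u, v) = \frac{|N(u) \cap N(v)|}{|N(u) \cup N(v)|}

% Example graph: N(A1) = \{B1, B2\}, \quad N(A2) = \{B2, B3, B4\}
J(A1, A2) = \frac{|\{B2\}|}{|\{B1, B2, B3, B4\}|} = \frac{1}{4} = 0.25
```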
import networkx as nx
# create a new graph
G = nx.DiGraph()
# add nodes of type A
G.add_nodes_from(['A1', 'A2', 'A3'])
# add nodes of type B
G.add_nodes_from(['B1', 'B2', 'B3', 'B4'])
# add edges from A-nodes to B-nodes
G.add_edges_from([('A1', 'B1'), ('A1', 'B2'), ('A2', 'B2'), ('A2', 'B3'), ('A2', 'B4'), ('A3', 'B3'), ('A3', 'B4')])
# define function to compute Jaccard similarity between two A-nodes based on the B-nodes they are connected to
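def jaccard_similarity(node1, node2):
    # B-nodes referenced by each A-node
    set1 = set(G.successors(node1))
    set2 = set(G.successors(node2))
    intersection = set1.intersection(set2)
    union = set1.union(set2)
    # if neither node references anything, return 0 instead of dividing by zero
    return len(intersection) / len(union) if union else 0.0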
# compute the Jaccard similarity between A1 and A2
similarity = jaccard_similarity('A1', 'A2')
print(f"The Jaccard similarity between A1 and A2 is: {similarity}")
# draw the plot
import matplotlib.pyplot as plt
# define the layout algorithm for the graph
pos = nx.circular_layout(G)
# draw the nodes and edges of the graph using the layout
nx.draw_networkx_nodes(G, pos, nodelist=['A1', 'A2', 'A3'], node_color='r')
nx.draw_networkx_nodes(G, pos, nodelist=['B1', 'B2', 'B3', 'B4'], node_color='b')
nx.draw_networkx_edges(G, pos, edgelist=G.edges(), edge_color='black', arrows=True)
# add labels to the nodes
labels = {node: node for node in G.nodes()}
nx.draw_networkx_labels(G, pos, labels)
# show the graph
plt.axis('off')
plt.show()
Output:
The Jaccard similarity between A1 and A2 is: 0.25
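Since the goal is to compare metrics across A-nodes, it may help to compute the score for every pair at once. The following is a minimal sketch on the same example graph. The `a_nodes` list, the extra `graph` parameter, and the convention of returning 0 for an empty union are assumptions added here, not part of the code above. It also cross-checks the result against NetworkX's built-in `nx.jaccard_coefficient`, which is defined only for undirected graphs, so the graph is converted first. The two agree here only because A-nodes have no incoming edges, so their undirected neighbors are exactly their successors.

```python
from itertools import combinations

import networkx as nx

# Same example graph as above
G = nx.DiGraph()
G.add_edges_from([('A1', 'B1'), ('A1', 'B2'), ('A2', 'B2'), ('A2', 'B3'),
                  ('A2', 'B4'), ('A3', 'B3'), ('A3', 'B4')])

# Assumption: the A-nodes are known up front
a_nodes = ['A1', 'A2', 'A3']


def jaccard_similarity(graph, node1, node2):
    """Jaccard similarity of the successor sets of two nodes."""
    set1 = set(graph.successors(node1))
    set2 = set(graph.successors(node2))
    union = set1 | set2
    # Assumed convention: 0.0 when neither node references anything
    return len(set1 & set2) / len(union) if union else 0.0


# Score every unordered pair of A-nodes
pairs = list(combinations(a_nodes, 2))
pairwise = {(u, v): jaccard_similarity(G, u, v) for u, v in pairs}

# Cross-check with the built-in, which requires an undirected graph
H = G.to_undirected()
builtin = {(u, v): p for u, v, p in nx.jaccard_coefficient(H, pairs)}

for (u, v), score in pairwise.items():
    print(f"{u} - {v}: {score:.3f} (built-in: {builtin[(u, v)]:.3f})")
```

For this graph the pairs score 0.250 (A1, A2), 0.000 (A1, A3), and 0.667 (A2, A3). Keeping the scores in a dictionary keyed by node pair should make it straightforward to put other metrics side by side later.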