Calculating graph efficiency metrics for very large networks


What are strategies for calculating the nodal and global efficiency of very large graphs in R?

I am trying to compute the global efficiency of a very large igraph graph with brainGraph::efficiency(my_graph, type = "global").

library(igraph); library(brainGraph)  
g <- readRDS("path_to_my_graph.rds")  

> ecount(g); vcount(g) # count of edges and vertices
[1] 715758
[1] 290190

It reliably crashes R every time. Global efficiency is the mean of all nodal efficiencies, so I tried computing it that way instead, without success. Every edge in my graph has weight 1, so I omitted the weights, but R still crashes.

# all of these cause R to crash
efficiency(g, type = "global")
efficiency(g, type = "nodal")
efficiency(g, type = "global", weights = NA)
efficiency(g, type = "nodal",  weights = NA)

For anyone who wants to test against the data, my graph (~37 MB) is available here on Google Drive as an .rds file.

r igraph network-efficiency
1 Answer

R crashes because brainGraph::efficiency() tries to compute a huge, dense distance matrix that overwhelms my machine's memory (32 GB). But there is a workaround: chunk the operation and run it in parallel.
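A quick back-of-the-envelope calculation shows why the dense approach fails: a full double-precision distance matrix for a 290,190-vertex graph needs far more memory than 32 GB.

```r
n <- 290190        # vertices in the graph (from vcount(g) above)
n^2 * 8 / 2^30     # size of a dense double matrix in GiB: ~627 GiB
```

That is roughly twenty times the available RAM, so any approach that materializes the whole matrix at once is doomed on this machine.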

Global efficiency is the mean of the nodal efficiencies of all vertices in the graph. The nodal efficiency of vertex i is:

E(i) = (1 / (N - 1)) * Σ_{j ≠ i} 1 / d(i, j)

where N is the number of vertices and d(i, j) is the shortest-path distance between vertices i and j.
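As a sanity check, the formula can be evaluated directly with igraph's distances() on a small graph (a 5-vertex ring used purely for illustration, not the question's data):

```r
library(igraph)

g_small <- make_ring(5)                  # toy example graph
N <- vcount(g_small)
i <- 1
d <- distances(g_small, v = i)[-i]       # shortest paths from vertex i to all other vertices
eff_i <- sum(1 / d) / (N - 1)            # nodal efficiency of vertex i
eff_i                                    # 0.75 for every vertex of a 5-ring
```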

We can compute the nodal efficiency of each vertex one at a time, breaking the distance-matrix computation into small, manageable pieces. Because each vertex's efficiency is independent of the others, we can parallelize the operation so it doesn't take forever.

library(igraph)  
library(doParallel)

# nodal efficiency of vertex i: mean of inverse shortest-path
# distances from i to every other vertex
get_eff <- function(i) {
  sum(1 / distances(g, V(g)[i])[-i]) / (vcount(g) - 1)
}

no_cores <- detectCores() - 1 
cl       <- makeCluster(no_cores)
registerDoParallel(cl)  

# .export = "g" copies the graph to each worker; without it the
# workers cannot see g and the loop fails
result <- foreach(i = seq_len(vcount(g)), .combine = c,
                  .packages = "igraph", .export = "g") %dopar% get_eff(i)

stopCluster(cl)
rm(cl)

global_eff <- mean(result)
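A variant worth considering (a sketch, not part of the original answer): distances() accepts a vector of source vertices, so the rows of the distance matrix can be computed in batches instead of one vertex per task, which cuts per-call overhead while keeping memory bounded by the batch size.

```r
library(igraph)

# batched nodal efficiency: process batch_size source vertices per
# distances() call; memory use is ~ batch_size * vcount(g) doubles
nodal_eff_batched <- function(g, batch_size = 1000) {
  N <- vcount(g)
  batches <- split(seq_len(N), ceiling(seq_len(N) / batch_size))
  unlist(lapply(batches, function(vs) {
    d <- distances(g, v = vs)            # length(vs) x N distance matrix
    d[cbind(seq_along(vs), vs)] <- Inf   # self-distances -> Inf, so 1/Inf = 0
    rowSums(1 / d) / (N - 1)
  }), use.names = FALSE)
}

# toy check on a 5-vertex ring: every vertex has efficiency 0.75
nodal_eff_batched(make_ring(5), batch_size = 2)
```

Each batch could also be farmed out with foreach, exactly as above, if a single core is too slow.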

We can also plot the distribution of nodal efficiencies, together with the global (mean) efficiency, which gives us a better feel for the network.

library(ggplot2)
library(dplyr)  # for %>%

data.frame(x = result) %>%
  ggplot(aes(x)) +
  geom_histogram() +
  geom_vline(xintercept = mean(result), col = "red") +  # global efficiency
  theme_minimal() +
  labs(title = "Nodal efficiencies", x = "", y = "Count")

(Resulting plot: histogram of nodal efficiencies, with the global efficiency marked by a red vertical line.)
