sf_distance in foreach parallelization

Question

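In general, I have a data frame of buildings with a spatial variable attached. I also have another file, for example of forests, built the same way.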

Total_df:

id	variable	sfc point object	zipcode
1	10	POINT (543611.8 6389285)	2324
2	15	POINT (513611.8 6349285)	2324
3	12	POINT (533611.8 6359285)	2329

About 2 million observations

forest_distance:

id	variable	sfc polygon object
1	10	POLYGON Z ((455302.7 6252026 9.09, 455292.6 6252034 9.09, 455274.8 6252036 9.9, 455246 6252113 14.25, 455286.1 6252124 14.15, 455293.5 6252126 14.13, 455317.8 6252068 14.13, 455331.5 6252073 14.13, 455345.5 6252044 14.78, 455302.7 6252026 9.09))

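forest_distance is stored in a list, in which the original forest_distance has been split into 10 equal parts.

I have already calculated the distance between the two, and I also split Total_df so that the calculation runs on smaller subsets determined by zip code.

Now, to speed up the computation, I want to parallelize it, and I have also subdivided forest_distance into smaller files.

I expect parallelization to be faster if each session processes one part of the subdivided forest_distance.

Also, is it possible to print from the different sessions, to see the progress?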

registerDoParallel(cores = 6)

# Use foreach to loop over list.dfs in parallel
foreach(d = 1:length(list.dfs), .packages = "sf", .combine = 'c') %dopar% {
  # Get the data frame at position 'd' in the list
  df <- list.dfs[[d]]
  
  # Open a list to store combined inner results 
  grand_list <- list()
  
  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()
  
  # zip_code 
  zipcode <- sort(unique(Total_df$zipcode))
  

  # Use a regular for loop to iterate over zipcode
  for(i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()
    
    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]
    
    if(nrow(subset_df) > 0) {
      # Calculate distances
      distances <- sf::st_distance(subset_df, df)
      
      # Define the 'miin' function, or replace it with an appropriate function
      miin <- function(x) min(x, na.rm = TRUE)
      
      # Calculate minimum distances
      min_distances <- apply(distances, 1, miin)
      
      # Store minimum distances in a new column
      subset_df$min_distances <- min_distances
    }
    
    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))
    
    # Store the updated subset_df in the inner_results list
    inner_results[[i]] <- subset_df
  }
  
  # Combine the results of the inner loop using do.call
  grand_list[[d]] <- do.call(rbind, inner_results)
  
}

It ran for several hours and I had to stop it, but no results had been saved along the way.
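One way to avoid losing hours of work when a run is interrupted is to write each chunk's result to disk as soon as it is finished, and to skip chunks whose file already exists when the run is restarted. The following is only a sketch of that pattern: `process_chunk`, `run_with_checkpoints` and the `chunk_XX.rds` file naming are hypothetical, and the toy computation stands in for the real `st_distance` step.

```r
# Hypothetical stand-in for the expensive per-chunk distance computation.
process_chunk <- function(chunk) {
  data.frame(id = chunk$id, min_distances = chunk$x * 2)
}

# Process every chunk, saving each result immediately so that an
# interrupted run can be resumed without redoing finished chunks.
run_with_checkpoints <- function(chunks, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  for (d in seq_along(chunks)) {
    out_file <- file.path(out_dir, sprintf("chunk_%02d.rds", d))
    if (file.exists(out_file)) next  # finished in an earlier run
    saveRDS(process_chunk(chunks[[d]]), out_file)
  }
  # Read everything back in chunk order and combine into one data frame.
  files <- sort(list.files(out_dir, pattern = "^chunk_[0-9]+\\.rds$",
                           full.names = TRUE))
  do.call(rbind, lapply(files, readRDS))
}

# Toy input: three small chunks of two rows each.
chunks <- split(data.frame(id = 1:6, x = c(1, 2, 3, 4, 5, 6)),
                rep(1:3, each = 2))
result <- run_with_checkpoints(chunks, file.path(tempdir(), "checkpoints_demo"))
```

The same `saveRDS()` call could presumably sit at the end of each `foreach` iteration, so that whatever has finished survives a stopped session.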

Answer 1

This is untested, but rewriting it along these lines might work:


library(doParallel)  # also attaches foreach and parallel
library(sf)

registerDoParallel(cores = 6)

# Use foreach to loop over list.dfs in parallel
grand_list <- foreach(df = list.dfs, .packages = "sf") %dopar% {

  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()

  # zip_code
  zipcode <- sort(unique(Total_df$zipcode))


  # Use a regular for loop to iterate over zipcode
  for(i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()

    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]

    if(nrow(subset_df) > 0) {
      # Calculate distances
      distances <- sf::st_distance(subset_df, df)

      # Define the 'miin' function, or replace it with an appropriate function
      miin <- function(x) min(x, na.rm = TRUE)

      # Calculate minimum distances
      min_distances <- apply(distances, 1, miin)

      # Store minimum distances in a new column
      subset_df$min_distances <- min_distances
    }

    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))

    # Store the updated subset_df in the inner_results list
    inner_results[[i]] <- subset_df
  }

  # Combine the results of the inner loop using do.call
  do.call(rbind, inner_results)

}
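Note that each element of `grand_list` then holds the minimum distance to the forests of one chunk only, so a final step is still needed to get the distance to the nearest forest overall. A minimal sketch of that step, assuming every chunk result carries an `id` column identifying the building (the toy data frames below are made up for illustration):

```r
# Toy stand-in for grand_list: one data frame per forest chunk, each holding
# the distance from every building to the nearest forest in that chunk.
grand_list <- list(
  data.frame(id = c(1, 2, 3), min_distances = c(120, 45, 300)),
  data.frame(id = c(1, 2, 3), min_distances = c(80, 60, 310)),
  data.frame(id = c(3, 1, 2), min_distances = c(250, 95, 70))  # other row order
)

# Align every chunk to the row order of the first one, matching on `id`.
ref_ids <- grand_list[[1]]$id
aligned <- lapply(grand_list, function(chunk) {
  chunk$min_distances[match(ref_ids, chunk$id)]
})

# Element-wise minimum across chunks = distance to the nearest forest overall.
overall <- data.frame(id = ref_ids, min_distance = do.call(pmin, aligned))
print(overall)
```

Matching on `id` rather than relying on row position guards against chunks that come back in a different order.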

(The printing you do probably won't show up, though.)
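If progress output from the workers is needed, one workaround is to have each worker append a line to a shared log file instead of printing. This is a sketch under assumptions: the log location, the two-worker cluster and the toy loop body are placeholders, not part of the original code.

```r
library(doParallel)  # also attaches foreach and parallel

log_file <- tempfile(fileext = ".log")  # hypothetical log location

cl <- makeCluster(2)
registerDoParallel(cl)

res <- foreach(d = 1:4, .combine = c) %dopar% {
  # Append one line per finished task; all workers write to the same file.
  cat(sprintf("%s worker %d finished chunk %d\n",
              format(Sys.time(), "%H:%M:%S"), Sys.getpid(), d),
      file = log_file, append = TRUE)
  d^2
}

stopCluster(cl)

# During a long run the file can be inspected from another session.
print(readLines(log_file))
```

Creating the cluster with `makeCluster(2, outfile = "")` is another option: it leaves worker output unredirected, which can make `cat()` and `print()` calls visible when R runs in a terminal.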

Tip: debug your code with %do% instead of %dopar%, and run only the first two values:

grand_list <- foreach(df = list.dfs[1:2], .packages = "sf") %do% { ... }

Fill in debug statements etc. to your liking. When it works, remove the [1:2] and change it to %dopar%.
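To switch between the two modes without editing the loop itself, the operator can be chosen once up front. This is only a sketch: `%op%`, `debug` and the toy `chunks` list are hypothetical names used for illustration.

```r
library(foreach)

debug <- TRUE  # flip to FALSE once the loop body works

# Pick the operator once; %do% runs sequentially in the current session.
`%op%` <- if (debug) `%do%` else `%dopar%`

chunks <- list(a = 1:3, b = 4:6, c = 7:9)
n_run  <- if (debug) 2 else length(chunks)  # only the first two while debugging

res <- foreach(x = chunks[seq_len(n_run)]) %op% {
  sum(x)
}
```

With `debug <- FALSE`, a parallel backend still has to be registered first (for example with `registerDoParallel()`), otherwise `%dopar%` falls back to sequential execution with a warning.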
