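Overall, I have a data frame of buildings with spatial variables joined to it. I also have another file, for example forests, with the same kind of structure.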
Total_df:

ID | Variable | sfc POINT object | zipcode |
---|---|---|---|
1 | 10 | POINT (543611.8 6389285) | 2324 |
2 | 15 | POINT (513611.8 6349285) | 2324 |
3 | 12 | POINT (533611.8 6359285) | 2329 |

About 2 million observations.
forest_distance:

ID | Variable | sfc POLYGON object |
---|---|---|
1 | 10 | POLYGON Z ((455302.7 6252026 9.09, 455292.6 6252034 9.09, 455274.8 6252036 9.9, 455246 6252113 14.25, 455286.1 6252124 14.15, 455293.5 6252126 14.13, 455317.8 6252068 14.13, 455331.5 6252073 14.13, 455345.5 6252044 14.78, 455302.7 6252026 9.09)) |
`forest_distance` is stored in a list (`list.dfs` in the code below), where the original `forest_distance` has been split into 10 equal parts.
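One way to build such a list of 10 nearly equal chunks in base R is to assign consecutive rows to groups with `cut()` and then `split()` the data frame by that group index. This is a sketch with a toy data frame standing in for `forest_distance`; the real object is an sf data frame, and since `split()` subsets rows with `[`, it should keep the geometry column, but treat that as an assumption.

```r
# Toy stand-in for forest_distance (the real one is an sf object of polygons).
forest_distance <- data.frame(ID = 1:25, var = seq(10, 250, by = 10))

n_parts <- 10
# Assign consecutive rows to n_parts groups of (nearly) equal size ...
part <- cut(seq_len(nrow(forest_distance)), breaks = n_parts, labels = FALSE)
# ... and split the data frame by that group index: a list of 10 data frames.
list.dfs <- split(forest_distance, part)

# Rows per chunk (2 or 3 here, since 25 rows do not divide evenly by 10).
vapply(list.dfs, nrow, integer(1))
```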
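I have already set up the distance calculation between the two, and I have also split `Total_df` so that the calculation runs on smaller subsets defined by zipcode.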
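Subsetting with `Total_df[Total_df$zipcode == i, ]` inside the loop compares all ~2 million zipcodes once for every zipcode value. A sketch of an alternative, assuming only the grouping is needed: `split()` makes a single pass and returns a named list with one data frame per zipcode, sorted by zipcode, which the inner loop could iterate over with `for (i in names(by_zip))`. The toy `Total_df` and the name `by_zip` are made up for illustration.

```r
# Toy stand-in for Total_df (the real one has ~2 million rows and a geometry column).
Total_df <- data.frame(ID = 1:6, zipcode = c(2324, 2329, 2324, 2400, 2329, 2324))

# One pass over the data: a named list with one data frame per zipcode.
# Group order follows the sorted zipcodes, like sort(unique(Total_df$zipcode)).
by_zip <- split(Total_df, Total_df$zipcode)

# Each element can now be processed on its own, e.g. in the inner loop.
sizes <- vapply(by_zip, nrow, integer(1))
sizes
```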
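But now, to speed up the computation, I want to parallelize it, and I have also split `forest_distance` into smaller files.
I expect parallelizing to be faster, with each session handling one part of the subdivided `forest_distance`.
Also, is it possible to print from the different sessions to see the progress?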
```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 6)
# Use foreach to loop over list.dfs in parallel
foreach(d = 1:length(list.dfs), .packages = "sf", .combine = 'c') %dopar% {
  # Get the data frame at position 'd' in the list
  df <- list.dfs[[d]]
  # Open a list to store combined inner results
  grand_list <- list()
  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()
  # zip_code
  zipcode <- sort(unique(Total_df$zipcode))
  # Use a regular for loop to iterate over zipcode
  for (i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()
    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]
    if (nrow(subset_df) > 0) {
      # Calculate distances
      distances <- sf::st_distance(subset_df, df)
      # Define the 'miin' function, or replace it with an appropriate function
      miin <- function(x) min(x, na.rm = TRUE)
      # Calculate minimum distances
      min_distances <- apply(distances, 1, miin)
      # Store minimum distances in a new column
      subset_df$min_distances <- min_distances
    }
    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))
    # Store the updated subset_df in the inner_results list
    inner_results[[i]] <- subset_df
  }
  # Combine the results of the inner loop using do.call
  grand_list[[d]] <- do.call(rbind, inner_results)
}
```
It ran for several hours and I had to stop it, but no results had been saved in the meantime.
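Because `foreach` only hands back its results after every iteration has finished, stopping the job early leaves nothing assigned. A minimal sketch of checkpointing, assuming you can write files to disk: each chunk's result is written with `saveRDS()` as soon as it is computed, and a rerun reloads finished chunks instead of recomputing them. `run_chunk_checkpointed()`, the toy chunks and `slow_sum()` are hypothetical; in the real job the same call would sit inside the `%dopar%` body, with `compute` doing the per-zipcode distance loop. The toy run uses a sequential `lapply()` only so the call counter can be checked.

```r
# Hypothetical helper: compute one chunk and write it to disk immediately,
# or reuse the file if an earlier (interrupted) run already produced it.
run_chunk_checkpointed <- function(d, chunk, out_dir, compute) {
  out_file <- file.path(out_dir, sprintf("chunk_%02d.rds", d))
  if (file.exists(out_file)) {
    return(readRDS(out_file))        # finished in a previous run
  }
  result <- compute(chunk)
  saveRDS(result, out_file)          # persist before returning
  result
}

out_dir <- file.path(tempdir(), "chunk_results")
unlink(out_dir, recursive = TRUE)
dir.create(out_dir)

chunks <- split(1:20, rep(1:4, each = 5))   # toy stand-in for list.dfs
n_computed <- 0
slow_sum <- function(x) {                    # toy stand-in for the distance work
  n_computed <<- n_computed + 1
  sum(x)
}

# First run computes and saves all 4 chunks; the second run only reads files.
first  <- lapply(seq_along(chunks), function(d) run_chunk_checkpointed(d, chunks[[d]], out_dir, slow_sum))
second <- lapply(seq_along(chunks), function(d) run_chunk_checkpointed(d, chunks[[d]], out_dir, slow_sum))
```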
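This is untested, but rewriting it to something like the following might work. The main change is that the value of the last expression in each iteration is collected by `foreach` itself and assigned to `grand_list`; assigning to `grand_list` inside the loop body only modifies a copy in the worker, which is discarded when the iteration ends.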
```r
library(foreach)
library(doParallel)

registerDoParallel(cores = 6)

# Row-wise minimum helper, defined once outside the loop
miin <- function(x) min(x, na.rm = TRUE)

# Use foreach to loop over list.dfs in parallel; the value of the last
# expression of each iteration is collected into grand_list
grand_list <- foreach(df = list.dfs, .packages = "sf") %dopar% {
  # Initialize an empty list to store the results of the inner loop
  inner_results <- list()
  # Sorted zipcodes
  zipcode <- sort(unique(Total_df$zipcode))
  # Use a regular for loop to iterate over zipcode
  for (i in zipcode) {
    cat(i, "\n")
    start_time <- Sys.time()
    # Subset the data
    subset_df <- Total_df[Total_df$zipcode == i, ]
    if (nrow(subset_df) > 0) {
      # Distance matrix: one row per building, one column per forest polygon
      distances <- sf::st_distance(subset_df, df)
      # Minimum distance from each building to the polygons in this chunk
      subset_df$min_distances <- apply(distances, 1, miin)
    }
    end_time <- Sys.time()
    print(paste("Time for municipality Forest", i, ": ", end_time - start_time))
    # Name the entry by zipcode; a numeric index such as 2324 would
    # pad the list with thousands of NULL entries
    inner_results[[as.character(i)]] <- subset_df
  }
  # Combine the results of the inner loop; this is the iteration's return value
  do.call(rbind, inner_results)
}
```
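(The printing inside the loop may not work, though: output from the parallel workers usually does not appear in your console.)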
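Two common workarounds for progress output: create the cluster yourself with `makeCluster(6, outfile = "")` and pass it to `registerDoParallel()`, which typically sends worker output to the terminal R was started from (not the RStudio console), or have each worker append progress lines to a log file that you watch while the job runs. The sketch below shows the log-file approach with base R's `parallel` package; `log_progress()` and the log directory are hypothetical, and the same `log_progress()` call could replace the `print()` inside the `%dopar%` body. One file per worker process avoids two workers writing to the same file at once.

```r
library(parallel)

# Hypothetical helper: append one timestamped line to a log file named
# after the worker's process id.
log_progress <- function(msg, log_dir) {
  log_file <- file.path(log_dir, paste0("worker_", Sys.getpid(), ".log"))
  cat(format(Sys.time(), "%H:%M:%S"), " ", msg, "\n",
      sep = "", file = log_file, append = TRUE)
}

log_dir <- file.path(tempdir(), "progress_logs")
unlink(log_dir, recursive = TRUE)
dir.create(log_dir)

cl <- makeCluster(2)
clusterExport(cl, "log_progress")    # make the helper available on the workers
results <- parLapply(cl, 1:6, function(chunk, log_dir) {
  Sys.sleep(0.1)                     # stand-in for the real distance work
  log_progress(paste("chunk", chunk, "done"), log_dir)
  chunk^2
}, log_dir = log_dir)
stopCluster(cl)

# Collect all worker logs; during a long job you would instead watch them
# live, e.g. with `tail -f` in a terminal.
progress <- unlist(lapply(list.files(log_dir, full.names = TRUE), readLines))
progress
```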
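Tip: debug the code with `%do%` instead of `%dopar%`, and run it on only the first two elements:

```r
grand_list <- foreach(df = list.dfs[1:2], .packages = "sf") %do% { ... }
```

Add debugging statements etc. as you like. Once it works, remove the `[1:2]` and switch back to `%dopar%`.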
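Because `forest_distance` was split into chunks, each element of `grand_list` holds every building's distance to the nearest forest within one chunk only. Every element lists the same buildings in the same row order (each iteration loops over the same sorted zipcodes), so the distance to the nearest forest overall is the element-wise minimum across chunks. A base R sketch with a made-up three-chunk `grand_list`:

```r
# Toy stand-in for grand_list: one data frame per forest chunk, same buildings
# in the same row order, each holding the minimum distance to that chunk only.
grand_list <- list(
  data.frame(ID = 1:3, min_distances = c(120, 40, 300)),
  data.frame(ID = 1:3, min_distances = c( 90, 75, 310)),
  data.frame(ID = 1:3, min_distances = c(200, 60, 250))
)

# The buildings must line up row by row before taking element-wise minima.
stopifnot(all(sapply(grand_list, function(x) identical(x$ID, grand_list[[1]]$ID))))

# Keep the first chunk's rows and replace the distance with the minimum over all chunks.
result <- grand_list[[1]]
result$min_distances <- do.call(pmin, lapply(grand_list, `[[`, "min_distances"))
result
```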