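I'm analyzing how neighboring organizations influence one another in terms of innovation. I have a dataset that includes, among other things, innovation index values and latitude/longitude data. For each case, I need the mean of the index over all cases within a certain distance (e.g., 50 km).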
My dataset looks like this:
df <- data.frame("Name" = c("A","B","C","D","E","F"),
                 "Index" = c(5,2,8,3,5,9),
                 "Lat" = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214),
                 "Lon" = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))
df
> Name Index Lat Lon
> 1 A 5 42.1234 26.5462
> 2 B 2 41.0192 25.9967
> 3 C 8 40.9988 27.0001
> 4 D 3 51.0175 31.1542
> 5 E 5 50.6523 31.8924
> 6 F 9 50.9214 32.1025
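What I'd like to get is a data frame like the one below, where the "NearbyIndex" variable holds the mean "Index" value of the cases that are relatively close to each one: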
df2
> Name Index Lat Lon NearbyIndex
> 1 A 5 42.1234 26.5462 5.0
> 2 B 2 41.0192 25.9967 6.5
> 3 C 8 40.9988 27.0001 3.5
> 4 D 3 51.0175 31.1542 7.0
> 5 E 5 50.6523 31.8924 6.0
> 6 F 9 50.9214 32.1025 4.0
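Unfortunately, I haven't worked with spatial data before, so I'm not sure how best to approach this. I can find plenty of information on how to calculate distances, but I don't know how to get from there to the values I'm after.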
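Well, if your df has many rows, my solution may not be the fastest (it builds the full n-by-n distance matrix, so time and memory grow quadratically with the number of rows), but as a first attempt it might be useful.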
# Your data
df <- data.frame("Name" = c("A","B","C","D","E","F"),
                 "Index" = c(5,2,8,3,5,9),
                 "Lat" = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214),
                 "Lon" = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))
# Setting the distance threshold (I changed it to 80 km because 50 km was
# too small to illustrate the example properly)
dist_threshold_km <- 80
# Install the terra package if you don't have it yet
# install.packages("terra")
# Creating a matrix of distances
# 'lonlat = TRUE' computes great-circle (geodesic) distances on the WGS84
# ellipsoid.
# 'unit = "km"' does exist, but it was not working properly at the time of
# writing, so it is safer to request the values in meters and divide by 1e3
distMat <- terra::distance(x = as.matrix(df[, c("Lon", "Lat")]),
                           y = as.matrix(df[, c("Lon", "Lat")]),
                           lonlat = TRUE, unit = "m") / 1e3
# 'distMat <= dist_threshold_km' turns the distance matrix into a logical
# matrix in which only the distances less than or equal to the threshold are
# TRUE. Each row is then used to index df$Index, and the mean is computed.
# Note that the diagonal (each case's distance to itself) is 0, so every case
# is included in its own mean.
# The '\(x, df)' lambda shorthand requires R >= 4.1.
df$NearbyIndex <- apply(X = distMat <= dist_threshold_km, MARGIN = 1,
                        FUN = \(x, df) mean(df$Index[x]), df = df)
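If you'd rather avoid the terra dependency, or want each case left out of its own average (the question's df2 appears to do that: B's 6.5 is the mean of A's 5 and C's 8), here is a minimal base-R sketch. It is an assumption-laden variant, not the approach above: it uses the haversine formula on a sphere of radius 6371 km instead of terra's WGS84 ellipsoid, so distances can differ slightly (roughly up to 0.5%), and it sets the diagonal of the logical matrix to FALSE. The helper names `haversine_matrix()` and `nearby_index()` are hypothetical, made up for this example; cases with no other case within the threshold get NA.

```r
df <- data.frame(Name  = c("A", "B", "C", "D", "E", "F"),
                 Index = c(5, 2, 8, 3, 5, 9),
                 Lat   = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214),
                 Lon   = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))

# Hypothetical helper: pairwise great-circle distances (km) using the
# haversine formula on a sphere of radius 6371 km (an approximation of WGS84)
haversine_matrix <- function(lat, lon, radius_km = 6371) {
  phi    <- lat * pi / 180
  lambda <- lon * pi / 180
  # outer() builds every pairwise difference at once (n x n matrices)
  dphi    <- outer(phi, phi, "-")
  dlambda <- outer(lambda, lambda, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1;
  # the matrix is passed first so its dim attribute is kept
  2 * radius_km * asin(pmin(sqrt(a), 1))
}

# Hypothetical helper: mean Index of the *other* cases within threshold_km
nearby_index <- function(df, threshold_km) {
  near <- haversine_matrix(df$Lat, df$Lon) <= threshold_km
  diag(near) <- FALSE  # exclude each case from its own average
  apply(near, 1, function(x) if (any(x)) mean(df$Index[x]) else NA_real_)
}

df$NearbyIndex <- nearby_index(df, threshold_km = 80)
df
```

With the 80 km threshold, D, E and F get 7, 6 and 4, matching the question's df2, while A, B and C get NA because no other case lies within 80 km of them. If you want to keep terra's ellipsoidal distances, the same `diag(...) <- FALSE` step can be applied to `distMat <= dist_threshold_km` before calling `apply()`.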