How to create a new column from features of other items selected by geographical distance based on Lat/Lon?

Question

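I am analysing how neighbouring organisations influence each other in terms of innovation. I have a dataset that contains an innovation Index value together with Lat/Lon data. I need the mean Index of all cases within a certain distance (for example, 50 km).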

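How can I create, for each entry of a data frame, a new column whose value is derived from the other entries that lie within a given geographical distance, based on Lat/Lon?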

As a concrete example, my dataset looks like this:

df <- data.frame("Name" = c("A","B","C","D","E","F"), 
                 "Index" = c(5,2,8,3,5,9), 
                 "Lat" = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214), 
                 "Lon" = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))

df
>   Name Index     Lat     Lon
> 1    A     5 42.1234 26.5462
> 2    B     2 41.0192 25.9967
> 3    C     8 40.9988 27.0001
> 4    D     3 51.0175 31.1542
> 5    E     5 50.6523 31.8924
> 6    F     9 50.9214 32.1025

What I would like to get is a data frame that looks like this, where the NearbyIndex variable shows the mean Index value of the nearby cases:

df2
>   Name Index     Lat     Lon NearbyIndex
> 1    A     5 42.1234 26.5462         5.0
> 2    B     2 41.0192 25.9967         6.5
> 3    C     8 40.9988 27.0001         3.5
> 4    D     3 51.0175 31.1542         7.0
> 5    E     5 50.6523 31.8924         6.0
> 6    F     9 50.9214 32.1025         4.0

Answer 1

Well, if your df has many rows, my solution may not be the best in terms of efficiency, but it could be useful as a first attempt.

# Your data
df <- data.frame("Name" = c("A","B","C","D","E","F"), 
                 "Index" = c(5,2,8,3,5,9), 
                 "Lat" = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214), 
                 "Lon" = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))

# Setting the distance threshold (I changed it to 80 km because 50 km was 
# too low to illustrate the example properly)
dist_threshold_km <- 80


# Installing terra package
# install.packages("terra")

# Creating a matrix of distances
# 'lonlat = TRUE' treats the coordinates as longitude/latitude, so the
# distances are geographic (WGS84 ellipsoid) rather than planar.
# 'unit = "km"' exists, but it was not working properly when this was written,
# so it is safer to request the values in meters and divide them by 1e3
distMat <- terra::distance(x = as.matrix(df[,c("Lon", "Lat")]),
                           y = as.matrix(df[,c("Lon", "Lat")]),
                           lonlat = TRUE, unit = "m")/1e3

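# 'distMat <= dist_threshold_km' converts the distance matrix into a logical
# matrix that is TRUE only where the distance is at most the threshold.
# The diagonal is set to FALSE so that a row is not counted as its own
# neighbour (the expected output in the question excludes it).
near <- distMat <= dist_threshold_km
diag(near) <- FALSE

# Each row of 'near' then indexes df$Index and the mean is taken; rows with no
# neighbour within the threshold get NaN.
df$NearbyIndex <- apply(X = near, MARGIN = 1, 
                        FUN = \(x, df) mean(df$Index[x]), df = df)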
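If installing terra is not an option, the same idea can be sketched in base R with a haversine distance. This is only an illustration, not part of the original answer: the helper names haversine_km and nearby_mean are made up here, the Earth is assumed to be a sphere of radius 6371 km (so distances differ slightly from the WGS84 values), and each row is excluded from its own neighbourhood, which is what the expected df2 in the question appears to imply.

```r
# Example data from the question
df <- data.frame("Name" = c("A","B","C","D","E","F"),
                 "Index" = c(5,2,8,3,5,9),
                 "Lat" = c(42.1234, 41.0192, 40.9988, 51.0175, 50.6523, 50.9214),
                 "Lon" = c(26.5462, 25.9967, 27.0001, 31.1542, 31.8924, 32.1025))

# Hypothetical helper: great-circle distance in km on a sphere (assumed
# radius 6371 km). Vectorised over all four coordinate arguments.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: mean Index of all OTHER rows within threshold_km.
# Rows without any neighbour inside the threshold get NaN.
nearby_mean <- function(data, threshold_km) {
  idx <- seq_len(nrow(data))
  # Full pairwise distance matrix (n x n), fine for small data sets
  dist_km <- outer(idx, idx, function(i, j) {
    haversine_km(data$Lat[i], data$Lon[i], data$Lat[j], data$Lon[j])
  })
  near <- dist_km <= threshold_km
  diag(near) <- FALSE  # do not count a row as its own neighbour
  apply(near, 1, function(x) mean(data$Index[x]))
}

df$NearbyIndex <- nearby_mean(df, threshold_km = 150)
df
```

With a 150 km threshold, A, B and C fall into one group and D, E and F into another, so the NearbyIndex column matches the df2 shown in the question. With the 80 km threshold used above, A, B and C have no neighbour in range and get NaN. Like the terra version, this builds the full n x n distance matrix, so it scales quadratically with the number of rows.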
