Most efficient way to calculate geospatial distances between many points?

Question

I have two datasets: one describes locations, and the second contains various points:

locations.head()
  latitude  longitude  geobounds_lon1  geobounds_lat1  geobounds_lon2  geobounds_lat2
0  52.5054   13.33320        13.08830         52.6755         13.7611         52.3382      
1  54.6192    9.99778         7.86496         55.0581         11.3129         53.3608     
2  41.6671  -71.27420       -71.90730         42.0188        -71.0886         41.0958    
3  25.9859  -80.12280       -87.81370         30.9964        -78.9917         24.5071   
4  43.7004   11.51330         9.63364         44.5102         12.4104         42.1654     

points.head()
   category        lat        lon
0       161  47.923132  11.507743 
1       161  47.926479  11.531736 
2       161  47.943670  11.576099   
3       161  57.617577  12.040591  
4        23  52.124071  -0.491918

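I need to calculate the distance from each location (based on locations.latitude and locations.longitude) to every point of each category (e.g. 161). Only the points that are not far from the location matter to me, so I thought the location bounds could help: that way I wouldn't need to calculate all the distances and then filter them.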

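The biggest problem for me is how to efficiently filter the points for each location (by category and bounds) and compute the distances from the location to those points, because the data is very large (almost 9 million rows in locations, and more than 10 million rows in points).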

To calculate the distances, I tried a BallTree:

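import numpy as np
from sklearn.neighbors import BallTree

RADIANT_TO_KM_CONSTANT = 6367  # Earth radius in km, converts km <-> radians

class BallTreeIndex:
    def __init__(self, lat_longs):
        # lat_longs: array of shape (n, 2) with [latitude, longitude] in degrees;
        # the haversine metric expects [lat, lon] in radians
        self.lat_longs = np.radians(lat_longs)
        self.ball_tree_index = BallTree(self.lat_longs, leaf_size=40, metric='haversine')

    def query_radius(self, query, radius):
        # Convert the search radius from km to an angular distance in radians
        radius_radiant = radius / RADIANT_TO_KM_CONSTANT
        query = np.radians(np.array([query]))
        ind, dist = self.ball_tree_index.query_radius(query, r=radius_radiant,
                                                      return_distance=True)
        # Distances (in radians) from the single query point to all points within radius
        return dist[0]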
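The class above issues one query per location, which means millions of separate Python calls. As a sketch (assuming scikit-learn's `BallTree`, which the question already uses), `query_radius` also accepts a whole array of query points, so all locations can be searched in a single call. The helper name `distances_within_radius` and the 6371 km mean Earth radius are illustrative assumptions, not part of the question.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius (the question uses 6367)

def distances_within_radius(location_latlon, point_latlon, radius_km):
    """For every location, find the points within radius_km and their distances.

    Both inputs are arrays of shape (n, 2) with [latitude, longitude] in degrees,
    the order the haversine metric expects. Returns (ind, dist_km), where ind[i]
    and dist_km[i] belong to location i.
    """
    tree = BallTree(np.radians(point_latlon), metric="haversine")
    # One call for all locations; r is an angular distance in radians
    ind, dist = tree.query_radius(
        np.radians(location_latlon),
        r=radius_km / EARTH_RADIUS_KM,
        return_distance=True,
    )
    # Convert each per-location array of angular distances to kilometres
    return ind, [d * EARTH_RADIUS_KM for d in dist]
```

Building the tree once per category and querying all locations together keeps the loop inside scikit-learn instead of Python.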

And to filter the points:

condition = (points.category == c) & (points.lat > lat2) & (points.lat < lat1) & (points.lon < lon2) & (points.lon > lon1)
tmp = points[condition]

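where c is a specific category and lat1, lat2, lon1, lon2 are the location bounds. However, this takes a very long time, so I'd like to know whether there is any way to make it faster.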
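The boolean mask scans all ~10 million points for every location. A sketch of one way to avoid that, assuming the points can be grouped by category once up front: sort each category's points by latitude, find the latitude band with `np.searchsorted`, and test longitude only inside that band. The helper names are hypothetical. The arguments follow the question's mask (lat1 is the northern bound, lat2 the southern, lon1 the western, lon2 the eastern), and, like that mask, it does not handle boxes that cross the ±180° meridian.

```python
import numpy as np

def build_category_index(category, lat, lon):
    """Group points by category, each group sorted by latitude.

    Returns a dict: category -> (sorted_lat, lon_in_same_order, original_row_index).
    """
    category = np.asarray(category)
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    index = {}
    for c in np.unique(category):
        idx = np.flatnonzero(category == c)
        order = idx[np.argsort(lat[idx], kind="stable")]
        index[c] = (lat[order], lon[order], order)
    return index

def points_in_bounds(index, c, lat_south, lat_north, lon_west, lon_east):
    """Return original row positions of category-c points strictly inside the box."""
    if c not in index:
        return np.empty(0, dtype=np.intp)
    s_lat, s_lon, orig = index[c]
    # Binary search: rows lo..hi-1 satisfy lat_south < lat < lat_north
    lo = np.searchsorted(s_lat, lat_south, side="right")
    hi = np.searchsorted(s_lat, lat_north, side="left")
    band_lon = s_lon[lo:hi]
    # Only the (usually much smaller) latitude band is scanned for longitude
    keep = (band_lon > lon_west) & (band_lon < lon_east)
    return orig[lo:hi][keep]
```

With the question's variables this would be called as `points_in_bounds(index, c, lat2, lat1, lon1, lon2)`, and the returned positions can select rows with `points.iloc[...]` before computing distances.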

For example, I would like to add a new column to the locations DataFrame:

                    distances_161
0 [distance0_0, distance0_1, ...]
1 [distance1_0, distance1_1, ...]
2 [distance2_1, distance2_2, ...]
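A sketch of building `distances_161`-style columns for every category, assuming scikit-learn's `BallTree` and a fixed search radius in km in place of the per-location bounds (a swapped-in simplification, not the asker's bounding-box logic). One tree is built per category and all locations are queried in one `query_radius` call. The function name `add_distance_columns` and the 6371 km radius are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius

def add_distance_columns(locations, points, radius_km):
    """Return a copy of locations with one 'distances_<category>' column per category.

    Each cell is a list of distances (km, ascending) to that category's points
    within radius_km of the location.
    """
    out = locations.copy()
    query = np.radians(out[["latitude", "longitude"]].to_numpy())
    for c, grp in points.groupby("category"):
        tree = BallTree(np.radians(grp[["lat", "lon"]].to_numpy()), metric="haversine")
        _, dist = tree.query_radius(query, r=radius_km / EARTH_RADIUS_KM,
                                    return_distance=True, sort_results=True)
        # Build an object array explicitly so ragged lists are stored per row
        col = np.empty(len(dist), dtype=object)
        for i, d in enumerate(dist):
            col[i] = (d * EARTH_RADIUS_KM).tolist()
        out[f"distances_{c}"] = col
    return out
```

Storing a Python list per row for ~9 million locations may use a lot of memory; keeping the raw `ind`/`dist` arrays from `query_radius`, or processing locations in chunks, might be preferable at that scale.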

Answer 1

I know this is an old post, but since nobody has answered it, I'll take a stab at it. I'm not 100% sure this is what you want, but it seems to make sense to me.

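import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance (in km) between two points
    on the earth (specified in decimal degrees)

    All args must be of equal length (scalars broadcast against arrays).

    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2

    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c  # 6367 km is the Earth radius, so the result is in kilometres
    return km


# Two points in New York City: latitude is about 40.7, longitude about -74.0
df = pd.DataFrame({'lon1': [-73.9536799],
                   'lat1': [40.7454513],
                   'lon2': [-74.0110188],
                   'lat2': [40.7060268]})

df['distance'] = haversine_np(df['lon1'], df['lat1'], df['lon2'], df['lat2'])
print(df['distance'])

Result: about 6.52 for this pair.

So Python gives about 6.52 km (roughly 4.05 miles); the unit is kilometres because the function multiplies by an Earth radius of 6367 km. That is close to the roughly 6.5 that Google gives. Does that make sense? Does it help you?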
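`haversine_np` computes element-wise distances between paired points. To get distances from one or several locations to many candidate points, NumPy broadcasting can produce the full distance matrix. A sketch follows; the name `haversine_matrix` is hypothetical, and it uses the same 6367 km radius and lon/lat argument order as `haversine_np`. The matrix has len(A) × len(B) entries, so it is only practical for small blocks, for example the candidates left after a bounding-box filter.

```python
import numpy as np

EARTH_RADIUS_KM = 6367.0  # same radius as haversine_np

def haversine_matrix(lon_a, lat_a, lon_b, lat_b):
    """Pairwise great-circle distances in km: result[i, j] is from A[i] to B[j]."""
    # Column vectors for A and row vectors for B, so operations broadcast to (len(A), len(B))
    lon_a = np.radians(np.asarray(lon_a, dtype=float))[:, np.newaxis]
    lat_a = np.radians(np.asarray(lat_a, dtype=float))[:, np.newaxis]
    lon_b = np.radians(np.asarray(lon_b, dtype=float))[np.newaxis, :]
    lat_b = np.radians(np.asarray(lat_b, dtype=float))[np.newaxis, :]
    a = (np.sin((lat_b - lat_a) / 2.0) ** 2
         + np.cos(lat_a) * np.cos(lat_b) * np.sin((lon_b - lon_a) / 2.0) ** 2)
    # Clip guards against tiny floating-point overshoot above 1
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
```

Each row of the result holds one location's distances to all candidate points, which matches the list-per-row layout the question asks for.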