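I have two datasets whose point data sit in latitude and longitude columns. A set of points of interest was mapped by two different people at two different points in time, and I want to check the two for consistency. Many points in the first dataset therefore correspond to a point in the second dataset (to within a few meters or so). For a small sample this is easy to do visually by plotting one on top of the other, but for hundreds of points I need a spatial join. I found the sjoin_nearest method in geopandas.
I was originally on geopandas 0.9, and sjoin_nearest was added in 0.10, so I updated using Conda: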
The following packages will be UPDATED:
ca-certificates 2023.05.30-hecd8cb5_0 --> 2024.3.11-hecd8cb5_0
certifi 2023.5.7-py39hecd8cb5_0 --> 2024.6.2-py39hecd8cb5_0
geopandas conda-forge/noarch::geopandas-0.9.0-p~ --> pkgs/main/osx-64::geopandas-0.12.2-py39hecd8cb5_0
geopandas-base conda-forge/noarch::geopandas-base-0.~ --> pkgs/main/osx-64::geopandas-base-0.12.2-py39hecd8cb5_0
openssl 1.1.1u-hca72f7f_0 --> 1.1.1w-hca72f7f_0
Now my code raises the following error:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
df1 = pd.read_stata('df1.dta')
df1 = gpd.GeoDataFrame(df1, geometry = [Point(xy) for xy in zip(df1['lon'], df1['lat'])], crs="EPSG:4326")
df2 = pd.read_stata('df2.dta')
df2 = gpd.GeoDataFrame(df2, geometry = [Point(xy) for xy in zip(df2['lon'], df2['lat'])], crs="EPSG:4326")
df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')
Traceback (most recent call last):
File "<ipython-input-15-d6ba3cf8492f>", line 6, in <module>
df = df1.sjoin_nearest(df2,how='inner',max_distance=0.01,distance_col='dist')
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/geodataframe.py", line 2173, in sjoin_nearest
return geopandas.sjoin_nearest(
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
_basic_checks(left_df, right_df, how, lsuffix, rsuffix)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
raise ValueError(
ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>
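A side note on the call above, separate from the error: with crs="EPSG:4326", max_distance=0.01 is interpreted in the units of the CRS, that is degrees, not meters. The stdlib-only sketch below estimates what 0.01 degrees corresponds to on the ground. The haversine_m helper, the spherical Earth radius, and the sample coordinate (50°N, 8°E) are assumptions for illustration, not values taken from the datasets.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation (assumption)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Hypothetical reference point; replace with a coordinate from the real data.
lat, lon = 50.0, 8.0

north_south = haversine_m(lat, lon, lat + 0.01, lon)  # 0.01 degrees of latitude
east_west = haversine_m(lat, lon, lat, lon + 0.01)    # 0.01 degrees of longitude

print(f"0.01 deg north-south is roughly {north_south:.0f} m")
print(f"0.01 deg east-west at {lat} deg latitude is roughly {east_west:.0f} m")
```

Both values come out far larger than a few meters. If matches within a few meters are the goal, reprojecting both frames to a projected CRS with meter units (for example via to_crs) before calling sjoin_nearest lets max_distance be given in meters.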
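I have seen similarly titled questions here and here, but in both cases what the user passed to the routine was not a GeoDataFrame (and the ValueError correctly pointed that out). In my case, both are GeoDataFrames, so the ValueError seems self-contradictory. Checks such as type(df1) and type(df2) confirm that both are GeoDataFrames, and ordinary GeoDataFrame tasks like df1.plot() work fine. I also checked the sjoin_nearest() documentation for any deprecation warnings, but came up empty-handed. Any leads are appreciated.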
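An error like this can arise when two distinct class objects share the same name, for instance if a running session still holds a class from a module loaded before an update while the check uses a freshly loaded copy. Whether that is exactly what happened here is an assumption. The sketch below only reproduces the symptom, using a made-up in-memory module (fakegeo) and a simplified stand-in for the check (basic_check); neither is geopandas code.

```python
import types

# Source of a tiny stand-in module; "fakegeo" is a hypothetical name.
SOURCE = "class GeoDataFrame:\n    pass\n"


def load_fake_module():
    """Execute SOURCE in a fresh module object, as a new import would."""
    module = types.ModuleType("fakegeo")
    exec(SOURCE, module.__dict__)
    return module


def basic_check(obj, expected_cls):
    """Simplified stand-in for an isinstance-based argument check."""
    if not isinstance(obj, expected_cls):
        raise ValueError(f"'left_df' should be GeoDataFrame, got {type(obj)}")


old_module = load_fake_module()  # what the long-running session loaded first
new_module = load_fake_module()  # what gets loaded after the update

left_df = old_module.GeoDataFrame()  # object built from the old class

try:
    basic_check(left_df, new_module.GeoDataFrame)
    message = None
except ValueError as err:
    message = str(err)

print(message)
# 'left_df' should be GeoDataFrame, got <class 'fakegeo.GeoDataFrame'>
```

The two classes print identically, yet isinstance compares class identity rather than names, so the check fails even though type() shows the expected name.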
Edit: MRE (from the sjoin_nearest documentation):
import geopandas as gpd
import geodatasets
groceries = gpd.read_file(
geodatasets.get_path("geoda.groceries")
)
chicago = gpd.read_file(
geodatasets.get_path("geoda.chicago_health")
).to_crs(groceries.crs)
groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)
Traceback (most recent call last):
File "<ipython-input-25-75e25adffbfd>", line 9, in <module>
groceries_w_communities = gpd.sjoin_nearest(groceries, chicago)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 516, in sjoin_nearest
_basic_checks(left_df, right_df, how, lsuffix, rsuffix)
File "/Users/moritz/Documents/GitHub/networkcoordination/python-environment/lib/python3.9/site-packages/geopandas/tools/sjoin.py", line 150, in _basic_checks
raise ValueError(
ValueError: 'left_df' should be GeoDataFrame, got <class 'geopandas.geodataframe.GeoDataFrame'>
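This turned out to be a hiccup of conda update. I had not restarted Spyder after the installation, because the sjoin_nearest command was already recognized after the update. Closing Spyder, deactivating the environment, activating it again, restarting Spyder, and then running the same code resolved the error.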
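For anyone hitting the same thing, a quick way to spot a stale session is to compare the version of the module already loaded in memory with the version installed on disk. This is a minimal sketch using only the standard library; the helper name loaded_vs_installed is made up for illustration, and it assumes the package exposes a __version__ attribute (geopandas does) and ships metadata that importlib.metadata can read.

```python
import sys
from importlib import metadata


def loaded_vs_installed(package_name):
    """Return (version loaded in this session, version installed on disk).

    Either element is None if the package is not loaded or not installed.
    """
    module = sys.modules.get(package_name)
    loaded = getattr(module, "__version__", None) if module is not None else None
    try:
        installed = metadata.version(package_name)
    except metadata.PackageNotFoundError:
        installed = None
    return loaded, installed


loaded, installed = loaded_vs_installed("geopandas")
if loaded and installed and loaded != installed:
    print(f"Stale session: loaded {loaded}, installed {installed}. Restart the kernel.")
else:
    print(f"loaded={loaded}, installed={installed}")
```

If the two versions differ, restarting the interpreter (here: Spyder and its kernel) is the safe fix, since objects created before the update keep pointing at the old classes.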