Python/PySpark: return GeoJSON records that have a Z value / truncate Z

Question (votes: 0, answers: 1)

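I have a DataFrame with a column named "geometry" that contains MultiPolygon and Polygon values. Using PySpark or Python, I want to find and return the rows whose coordinates contain [X, Y, Z]. I also want to write a code block that removes the Z value when it is present. How can I do this? In the example below, I want to return the first coordinate value and none of the others.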

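I want to do something like the following, but I don't know how to append a new column to the DataFrame, or what code is needed to find and return only the rows whose geometry has X, Y, Z: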

I was thinking of using this code to find the Z values, but it doesn't work:

for row in source_df.collect():
    z_val = len(row.geometry["coordinates"]) =3 for x in row.geometry["coordinates"]]

Sample data:

{
    "type": "MultiPolygon",
    "coordinates": [
        [-120.92484404138442,35.54577502278743,0.0], 
        [-120.92484170835023,35.545764670080004], 
        [-120.92470946198651,35.54517811398435], 
        [-120.92373579577058,35.54476080459215], 
        [-120.92224560209857,35.544644824151], 
        [-120.91471743922112,35.54405891151482], 
        [-120.9137131887035,35.541405607829184], 
        [-120.91370267246779,35.54138005556737], 
        [-120.91368022915093,35.54133577314701], 
        [-120.91365314934913,35.54129325687539], 
        [-120.91364620938849,35.541283659095036], 
        [-120.91019544280519,35.53661949063082], 
        [-120.91016692865233,35.536584105321104], 
        [-120.91013516362523,35.53655061941634], 
        [-120.9101046793985,35.53652289241281], 
        [-120.90545581970368,35.53257237955164], 
        [-120.90540343303125,35.53253236763702]
    ]
}
python pyspark geojson
1 Answer
0 votes

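One option is to apply a udf to the Spark DataFrame. The udf parses the JSON string, which makes it easy to check whether the geometry is a 'Polygon' or a 'MultiPolygon' and to iterate over the coordinates looking for any point that has three values (X, Y, Z). You can then store the result in a boolean column (z_column) and filter df to keep only the rows where that column is True.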
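The coordinate check itself does not depend on Spark, so it can be sketched and tested in plain Python first. The version below is only a sketch: has_z and geometry_has_z are hypothetical helper names, not part of any library, and the recursion is an assumption made so that the check works at any nesting depth, including the flat coordinate list shown in the question's sample data.

```python
import json


def has_z(coords):
    """Return True if any position in a nested coordinate array has 3+ values."""
    if not isinstance(coords, list):
        return False
    # A position is a list that starts with a number, e.g. [x, y] or [x, y, z].
    if coords and isinstance(coords[0], (int, float)):
        return len(coords) >= 3
    # Otherwise this is a list of positions, rings, or polygons: check each item.
    return any(has_z(item) for item in coords)


def geometry_has_z(geometry_json):
    """Parse a GeoJSON geometry string and report whether it carries a Z value."""
    try:
        geometry = json.loads(geometry_json)
        return has_z(geometry["coordinates"])
    except (KeyError, TypeError, json.JSONDecodeError):
        return False


# Usage with the first two points of the question's sample data (flat nesting).
sample = (
    '{"type": "MultiPolygon", "coordinates": ['
    "[-120.92484404138442, 35.54577502278743, 0.0], "
    "[-120.92484170835023, 35.545764670080004]]}"
)
print(geometry_has_z(sample))  # True
```

If this behaves as intended on your data, geometry_has_z could presumably be wrapped with udf(..., BooleanType()) in the same way as the function in the snippet that follows.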

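# imports needed by this snippet
import json

from pyspark.sql.functions import col, udf
from pyspark.sql.types import BooleanType

# create df (assumes an existing SparkSession `spark` and rows in `data`
# with a string column "geometry" holding the GeoJSON text)
df = spark.createDataFrame(data)

# define a UDF to check for Z coordinates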
def contains_z_coordinate(geometry):
    try:
        geometry_object = json.loads(geometry)
        # Check if it's a Polygon or MultiPolygon
        if geometry_object["type"] in ["Polygon", "MultiPolygon"]:
            # Get the coordinate array
            coords = geometry_object["coordinates"]
            if geometry_object["type"] == "Polygon":
                coords = [coords]  # Wrap Polygon rings to match MultiPolygon nesting
            # Check if any point has a z-coordinate
            for polygon_coords in coords:
                for ring_coordinates in polygon_coords:
                    for point in ring_coordinates:
                        if len(point) == 3:
                            return True
    except (KeyError, TypeError, json.JSONDecodeError):
        return False
    return False


# Register the UDF
z_udf = udf(contains_z_coordinate, BooleanType())

# Add a boolean column with the UDF and keep only the rows where it is True
result_df_using_filter = df.withColumn("z_column", z_udf(col("geometry"))).filter(
    col("z_column")
)

# results
result_df_using_filter.show(truncate=False)

Result

                                                                                
+--------------------------------------------------------------------------------------------------------------------------------+--------+
|geometry                                                                                                                        |z_column|
+--------------------------------------------------------------------------------------------------------------------------------+--------+
|{"type":"MultiPolygon","coordinates":[[[[-120.92484404138442,35.54577502278743,0.0],[-120.92484170835023,35.545764670080004]]]]}|true    |
+--------------------------------------------------------------------------------------------------------------------------------+--------+
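The question also asks how to remove the Z value when it is present. One possible approach, sketched here in plain Python, is to walk the coordinate array and cut every position down to its first two values. The names drop_z and strip_z are hypothetical, and the sketch assumes the geometry is stored as a GeoJSON string, as in the result shown above.

```python
import json


def drop_z(coords):
    """Return a copy of a nested coordinate array with each position cut to [x, y]."""
    if not isinstance(coords, list):
        return coords
    # A position is a list that starts with a number; keep only X and Y.
    # Note: this discards everything after Y, not just a third value.
    if coords and isinstance(coords[0], (int, float)):
        return coords[:2]
    # Otherwise recurse into each nested position, ring, or polygon.
    return [drop_z(item) for item in coords]


def strip_z(geometry_json):
    """Return the geometry as a JSON string without Z values.

    Input that cannot be parsed is returned unchanged.
    """
    try:
        geometry = json.loads(geometry_json)
        geometry["coordinates"] = drop_z(geometry["coordinates"])
        return json.dumps(geometry)
    except (KeyError, TypeError, json.JSONDecodeError):
        return geometry_json


# Usage with the geometry string from the result table.
sample = (
    '{"type":"MultiPolygon","coordinates":[[[['
    "-120.92484404138442,35.54577502278743,0.0],"
    "[-120.92484170835023,35.545764670080004]]]]}"
)
print(strip_z(sample))
```

In Spark, strip_z could presumably be registered with udf(strip_z, StringType()) and applied through withColumn to overwrite or copy the geometry column. Whether to return unparseable input unchanged or replace it with null is a design choice that depends on how bad records should be handled downstream.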