PostGIS clustering problem


I am trying to write a function that uses PostGIS to cluster point geometries that fall within a bounding box passed to the function.

I created the following table:

CREATE TABLE items (
    id SERIAL PRIMARY KEY,
    location GEOMETRY(Point, 4326)  -- SRID 4326 for geographic coordinates
);

and populated it with random points inside a fixed bounding box using the following code:

DO
$$
DECLARE
    i INT := 0;
    min_lat FLOAT := 54.542;
    max_lat FLOAT := 54.559;
    min_lon FLOAT := -5.761;
    max_lon FLOAT := -5.729;
    rand_lat FLOAT;
    rand_lon FLOAT;
BEGIN
    WHILE i < 1000 LOOP
        -- Generate random latitude and longitude within the bounding box
        rand_lat := min_lat + (max_lat - min_lat) * random();
        rand_lon := min_lon + (max_lon - min_lon) * random();
        
        -- Insert a new row with the generated location
        INSERT INTO items (location)
        VALUES (ST_SetSRID(ST_MakePoint(rand_lon, rand_lat), 4326));

        i := i + 1;
    END LOOP;
END
$$;

The function that performs the clustering looks like this:

CREATE OR REPLACE FUNCTION get_items_within_bbox(
    min_x FLOAT, 
    min_y FLOAT, 
    max_x FLOAT, 
    max_y FLOAT, 
    cluster_distance FLOAT
)
RETURNS TABLE (
    item_id INT, 
    cluster_id BIGINT, 
    cluster_geom GEOMETRY, 
    cluster_count BIGINT
) AS $$
DECLARE
    bbox GEOMETRY;
BEGIN
    -- Create the bounding box geometry
    bbox := ST_MakeEnvelope(min_x, min_y, max_x, max_y, 4326);

    RETURN QUERY
    WITH items_in_bbox AS (
        SELECT 
            i.id AS item_id, 
            i.location AS item_location
        FROM items i
        WHERE ST_Intersects(i.location, bbox) 
    ),
    clusters AS (
        SELECT 
            unnest(ST_ClusterWithin(item_location, cluster_distance)) AS cluster_geom
        FROM items_in_bbox
    ),
    cluster_points AS (
        SELECT 
            i.item_id AS point_id,
            ROW_NUMBER() OVER () AS point_cluster_id, 
            c.cluster_geom AS cluster_geom
        FROM items_in_bbox i
        JOIN clusters c ON ST_Intersects(i.item_location, c.cluster_geom) 
    ),
    cluster_counts AS (
        SELECT 
            cp.point_cluster_id AS cluster_id, 
            COUNT(*) AS cluster_count 
        FROM cluster_points cp
        GROUP BY cp.point_cluster_id
    )
    SELECT 
        MIN(cp.point_id) AS item_id,  
        cp.point_cluster_id AS cluster_id, 
        ST_Collect(cp.cluster_geom) AS cluster_geom,  
        cc.cluster_count AS cluster_count
    FROM cluster_points cp
    JOIN cluster_counts cc ON cp.point_cluster_id = cc.cluster_id 
    GROUP BY cp.point_cluster_id, cc.cluster_count; 
END;
$$ LANGUAGE plpgsql;

I then try to call the function like this:

SELECT * FROM get_items_within_bbox(
    -5.761,  -- min_x (West Longitude)
    54.542,  -- min_y (South Latitude)
    -5.729,  -- max_x (East Longitude)
    54.559,  -- max_y (North Latitude)
    1609.34  -- cluster_distance in meters
);

No matter what value I pass for cluster_distance, I always get 1000 rows back, and cluster_count is always 1 for every item.

1 Answer

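You are using the geometry type, so PostGIS computes every distance as a Cartesian distance. With SRID 4326 that means a distance in degrees, which is not a meaningful unit of length. As a result, all of your points are far closer together than the 1609 you passed in (PostGIS does not interpret that value as meters, but in the same units as the coordinates you passed to ST_MakePoint):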

# select st_distance(st_setsrid(st_makepoint(-5.761, 54.542), 4326),
                     st_setsrid(st_makepoint(-5.729,  54.559), 4326));
     st_distance      
----------------------
 0.036235341863984985
(1 row)

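For PostGIS, the SRID only matters when you use ST_Transform to convert to another projection; it plays no role in the distance calculation itself. The ST_ClusterWithin documentation mentions this specifically: the distance is the Cartesian distance in the units of the SRID.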
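To make the unit difference concrete, here is a minimal sketch that measures the same two bounding-box corners three ways. The choice of EPSG:32630 (WGS 84 / UTM zone 30N) is my assumption, picked only because the sample longitudes fall in that zone; any meter-based projection suited to the area would do. The geography cast is included purely as a cross-check.

```sql
-- Same two points, three distance calculations.
SELECT
    -- geometry in SRID 4326: plain Cartesian distance, in degrees
    ST_Distance(a, b) AS dist_degrees,
    -- geometry reprojected to a meter-based CRS (assumed: EPSG:32630)
    ST_Distance(ST_Transform(a, 32630), ST_Transform(b, 32630)) AS dist_projected_m,
    -- geography: geodesic distance, in meters
    ST_Distance(a::geography, b::geography) AS dist_geodesic_m
FROM (
    SELECT ST_SetSRID(ST_MakePoint(-5.761, 54.542), 4326) AS a,
           ST_SetSRID(ST_MakePoint(-5.729, 54.559), 4326) AS b
) AS corners;
```

The first column should match the 0.036 shown above, while the other two should be on the order of a few kilometers and close to each other.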

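PostGIS does not implement the clustering algorithms for the geography type (which does perform proper distance calculations). So to cluster by a distance in meters in PostGIS, you will probably need to transform the data to a suitable projection first.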
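A minimal sketch of that approach, assuming EPSG:32630 (WGS 84 / UTM zone 30N) is an acceptable projection for this area. That SRID is my choice for illustration, not something taken from the question. The points are reprojected before ST_ClusterWithin is called, so the distance argument is in meters, and the member count is read directly from each returned GeometryCollection.

```sql
WITH clusters AS (
    -- ST_ClusterWithin returns an array of GeometryCollections,
    -- one per cluster; unnest turns it into one row per cluster.
    SELECT unnest(
               ST_ClusterWithin(
                   ST_Transform(i.location, 32630),  -- assumed meter-based CRS
                   1609.34                           -- now interpreted as meters
               )
           ) AS cluster_geom_m
    FROM items AS i
    WHERE ST_Intersects(
              i.location,
              ST_MakeEnvelope(-5.761, 54.542, -5.729, 54.559, 4326)
          )
)
SELECT
    ST_NumGeometries(cluster_geom_m) AS cluster_count,  -- points in the cluster
    -- report the cluster centre back in the original SRID
    ST_Transform(ST_Centroid(cluster_geom_m), 4326) AS cluster_center
FROM clusters;
```

With 1000 points packed into a box only a couple of kilometers across, a distance of 1609.34 m may well merge everything into a single cluster; a smaller value will probably be needed to see several clusters.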
