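I'm trying to write a function that uses PostGIS to cluster point geometries that fall within a bounding box passed to the function.
I created the following table: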
CREATE TABLE items (
id SERIAL PRIMARY KEY,
location GEOMETRY(Point, 4326) -- SRID 4326 for geographic coordinates
);
and populated it with random data inside a given bounding box using the following code:
DO
$$
DECLARE
i INT := 0;
min_lat FLOAT := 54.542;
max_lat FLOAT := 54.559;
min_lon FLOAT := -5.761;
max_lon FLOAT := -5.729;
rand_lat FLOAT;
rand_lon FLOAT;
BEGIN
WHILE i < 1000 LOOP
-- Generate random latitude and longitude within the bounding box
rand_lat := min_lat + (max_lat - min_lat) * random();
rand_lon := min_lon + (max_lon - min_lon) * random();
-- Insert a new row with the generated location
INSERT INTO items (location)
VALUES (ST_SetSRID(ST_MakePoint(rand_lon, rand_lat), 4326));
i := i + 1;
END LOOP;
END
$$;
The function that performs the clustering looks like this:
CREATE OR REPLACE FUNCTION get_items_within_bbox(
min_x FLOAT,
min_y FLOAT,
max_x FLOAT,
max_y FLOAT,
cluster_distance FLOAT
)
RETURNS TABLE (
item_id INT,
cluster_id BIGINT,
cluster_geom GEOMETRY,
cluster_count BIGINT
) AS $$
DECLARE
bbox GEOMETRY;
BEGIN
-- Create the bounding box geometry
bbox := ST_MakeEnvelope(min_x, min_y, max_x, max_y, 4326);
RETURN QUERY
WITH items_in_bbox AS (
SELECT
i.id AS item_id,
i.location AS item_location
FROM items i
WHERE ST_Intersects(i.location, bbox)
),
clusters AS (
SELECT
unnest(ST_ClusterWithin(item_location, cluster_distance)) AS cluster_geom
FROM items_in_bbox
),
cluster_points AS (
SELECT
i.item_id AS point_id,
ROW_NUMBER() OVER () AS point_cluster_id,
c.cluster_geom AS cluster_geom
FROM items_in_bbox i
JOIN clusters c ON ST_Intersects(i.item_location, c.cluster_geom)
),
cluster_counts AS (
SELECT
cp.point_cluster_id AS cluster_id,
COUNT(*) AS cluster_count
FROM cluster_points cp
GROUP BY cp.point_cluster_id
)
SELECT
MIN(cp.point_id) AS item_id,
cp.point_cluster_id AS cluster_id,
ST_Collect(cp.cluster_geom) AS cluster_geom,
cc.cluster_count AS cluster_count
FROM cluster_points cp
JOIN cluster_counts cc ON cp.point_cluster_id = cc.cluster_id
GROUP BY cp.point_cluster_id, cc.cluster_count;
END;
$$ LANGUAGE plpgsql;
I then tried calling the function like this:
SELECT * FROM get_items_within_bbox(
-5.761, -- min_x (West Longitude)
54.542, -- min_y (South Latitude)
-5.729, -- max_x (East Longitude)
54.559, -- max_y (North Latitude)
1609.34 -- cluster_distance in meters
);
No matter what value I supply for cluster_distance, I always get 1000 rows back, and cluster_count is always 1 for every item.
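You are using the geometry type, so PostGIS computes every distance as a Cartesian distance, which here means a not very meaningful distance in degrees. All of the distances are therefore far below the 1609 you gave it (PostGIS does not interpret that value as meters, but in the same units as the coordinates you passed to ST_MakePoint):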
# select st_distance(st_setsrid(st_makepoint(-5.761, 54.542), 4326),
st_setsrid(st_makepoint(-5.729, 54.559), 4326));
st_distance
----------------------
0.036235341863984985
(1 row)
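For the geometry type, the SRID only really matters when you reproject to another coordinate system with ST_Transform; it has no effect on how distances are computed. The ST_ClusterWithin documentation mentions this specifically: the distance is the Cartesian distance in the units of the SRID.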
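To make the unit problem concrete, the two corner points from the query above can be measured three ways. This is only a sketch: the choice of EPSG:32630 (UTM zone 30N) as the projected system is my assumption for this area, and any metric projection that covers your data would serve the same purpose.

```sql
-- Compare the Cartesian distance in degrees with two distances in meters
-- for the corners of the bounding box used in the question.
SELECT
    ST_Distance(a, b)                       AS dist_degrees,    -- geometry: SRID units (degrees)
    ST_Distance(a::geography, b::geography) AS dist_geog_m,     -- geography: meters on the spheroid
    ST_Distance(ST_Transform(a, 32630),
                ST_Transform(b, 32630))     AS dist_utm_m       -- projected geometry: meters (assumed EPSG:32630)
FROM (
    SELECT
        ST_SetSRID(ST_MakePoint(-5.761, 54.542), 4326) AS a,
        ST_SetSRID(ST_MakePoint(-5.729, 54.559), 4326) AS b
) AS corners;
```

The first column should match the value shown above, while the other two should come out in the low thousands of meters, which is why a threshold of 1609.34 swallows everything when it is read as degrees.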
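PostGIS does not implement the clustering algorithms for the geography type (the type that does proper distance calculations). So to cluster in meters in PostGIS, you will probably need to transform the data to a suitable projection first.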
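A minimal sketch of that approach follows, assuming EPSG:32630 (UTM zone 30N) is an acceptable projection for these coordinates and using an illustrative 100 m threshold; both are my assumptions, not values from the question. It uses the ST_ClusterDBSCAN window function with minpoints := 1 so that every point receives a cluster id directly, instead of joining points back to the collections returned by ST_ClusterWithin. Note also that the function in the question derives its cluster id from ROW_NUMBER() OVER () on the joined rows, which gives every point its own id, so the counts would stay at 1 even with correct units.

```sql
WITH clustered AS (
    SELECT
        i.id,
        i.location,
        -- Cluster on the projected geometry so that eps is in meters.
        -- 32630 and eps := 100 are assumptions chosen for illustration.
        ST_ClusterDBSCAN(ST_Transform(i.location, 32630),
                         eps := 100, minpoints := 1) OVER () AS cluster_id
    FROM items i
    -- The bounding box filter is still done in the original SRID (4326).
    WHERE ST_Intersects(i.location,
                        ST_MakeEnvelope(-5.761, 54.542, -5.729, 54.559, 4326))
)
SELECT
    cluster_id,
    MIN(id)                           AS item_id,        -- one representative item per cluster
    COUNT(*)                          AS cluster_count,  -- number of points in the cluster
    ST_Centroid(ST_Collect(location)) AS cluster_geom    -- cluster center, still in SRID 4326
FROM clustered
GROUP BY cluster_id
ORDER BY cluster_count DESC;
```

With 1000 points in a box only a couple of kilometers across, a large threshold such as 1609.34 m would most likely merge everything into a single cluster, so expect to tune eps to the point density.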