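I am simplifying zip code polygons in a MySQL database (v8.0) by reducing the number of coordinates in each polygon. I have a table named zip_city with a column named boundary, which holds the original multipolygons, and I created another column, boundary_simplified, with the simplified polygons. Both have SRID 4326 (I have included the location column because it may be important):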
+---------------------+--------------------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+---------------------+--------------------------------+------+-----+---------+----------------+
| boundary | multipolygon | NO | MUL | NULL | |
| is_point | tinyint unsigned | NO | MUL | 0 | |
| boundary_simplified | multipolygon | NO | MUL | NULL | |
+---------------------+--------------------------------+------+-----+---------+----------------+
Running SHOW INDEXES gives me this:
mysql> SHOW INDEXES FROM zip_city;
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | Expression |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
| zip_city | 1 | idx_is_point | 1 | is_point | A | 2 | NULL | NULL | | BTREE | | | YES | NULL |
| zip_city | 1 | boundary | 1 | boundary | A | 34287 | 32 | NULL | | SPATIAL | | | YES | NULL |
| zip_city | 1 | boundary_simplified | 1 | boundary_simplified | A | 34287 | 32 | NULL | | SPATIAL | | | YES | NULL |
+----------+------------+---------------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+---------------+---------+------------+
The two indexes look exactly the same, but when I run a query using ST_CONTAINS, it behaves differently for the two columns. For example:
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND is_point = 0 LIMIT 1;
+-------+
| zip |
+-------+
| 99901 |
+-------+
1 row in set (0.03 sec)
mysql> SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+-------+
| zip |
+-------+
| 99901 |
+-------+
1 row in set (4.84 sec)
When I EXPLAIN both queries, I see that the one using boundary_simplified does not use the spatial index:
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
| 1 | SIMPLE | zip_city | NULL | range | idx_is_point,boundary | boundary | 34 | NULL | 1 | 50.00 | Using where |
+----+-------------+----------+------------+-------+-----------------------+----------+---------+------+------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
mysql> EXPLAIN SELECT zip FROM zip_city
WHERE
ST_CONTAINS(boundary_simplified, ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}'))
AND
is_point = 0 LIMIT 1;
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
| 1 | SIMPLE | zip_city | NULL | ref | idx_is_point | idx_is_point | 1 | const | 17143 | 100.00 | Using where |
+----+-------------+----------+------------+------+---------------+--------------+---------+-------+-------+----------+-------------+
1 row in set, 1 warning (0.00 sec)
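Any clues about this? I feel I am missing something simple, but I cannot find any information on it. Also, creating the index takes ~23.25 seconds for the boundary column but only ~0.75 seconds for boundary_simplified, which is odd. Do the coordinates affect the efficiency of the index?
I tried dropping both indexes and creating them separately, and the behavior of the indexes did not change. I also tried FORCE INDEX and USE INDEX in the query, which gave the same or worse behavior.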
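The exact statement used for the index hint is not shown in the post. As a rough sketch only, assuming the hint named the spatial index boundary_simplified from the SHOW INDEXES output above, the attempt would have looked something like this:

```sql
-- Hypothetical reconstruction of the FORCE INDEX attempt; the original
-- statement is not shown in the post. The index name is taken from the
-- SHOW INDEXES output above.
SELECT zip
FROM zip_city FORCE INDEX (boundary_simplified)
WHERE ST_CONTAINS(
        boundary_simplified,
        ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}')
      )
  AND is_point = 0
LIMIT 1;
```

Prefixing the same statement with EXPLAIN shows whether the hinted index actually appears in the key column of the plan.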
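Edit: thanks to user1191247's observation, I fixed the indexes shown. Also, I had not shown the full table information because it did not seem useful.
Thanks to user1191247's comment, I looked up the information he asked about and found this: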
| zip_city | CREATE TABLE `zip_city` (
`id` int unsigned NOT NULL AUTO_INCREMENT,
`state_id` int unsigned NOT NULL,
`zip` mediumint(5) unsigned zerofill NOT NULL,
`city` varchar(64) NOT NULL,
`slug` varchar(64) NOT NULL,
`location` point NOT NULL /*!80003 SRID 4326 */,
`boundary` multipolygon NOT NULL /*!80003 SRID 4326 */,
`is_point` tinyint unsigned NOT NULL DEFAULT '0',
`fit_market` tinyint unsigned NOT NULL DEFAULT '0',
`boundary_simplified` multipolygon NOT NULL,
PRIMARY KEY (`id`),
KEY `fk_zip_to_city_state1_idx` (`state_id`),
KEY `idx_zip` (`zip`),
KEY `idx_slug` (`slug`),
KEY `idx_city` (`city`),
SPATIAL KEY `idx_location` (`location`),
SPATIAL KEY `boundary` (`boundary`),
SPATIAL KEY `boundary_simplified` (`boundary_simplified`),
CONSTRAINT `fk_zip_to_city_state1` FOREIGN KEY (`state_id`) REFERENCES `state` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=41381 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
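As you can see, boundary_simplified is missing the SRID attribute in its column definition, which is essential for the index to work properly: the MySQL 8.0 optimizer ignores a SPATIAL index on a column that has no SRID attribute. (SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city; returned SRID 4326, so I did not think this was the problem, but the attribute was missing from the column definition.) I fixed it by running these queries: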
DROP INDEX boundary_simplified ON zip_city;
ALTER TABLE zip_city MODIFY COLUMN boundary_simplified MULTIPOLYGON NOT NULL SRID 4326;
(took ~53 seconds)
ALTER TABLE zip_city ADD SPATIAL INDEX idx_boundary_simplified (boundary_simplified);
(this now takes about 24 seconds, which is already good news)
After that, the index works perfectly :)
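For anyone hitting the same thing, here is a minimal sketch of how the mismatch can be spotted and the fix confirmed. It assumes the table and column names used above and that zip_city lives in the currently selected schema. The INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS view reports the SRID declared on the column, which is separate from the SRID stored in each value.

```sql
-- SRID declared on each geometry column. SRS_ID is expected to be NULL
-- for a column defined without an SRID attribute.
SELECT COLUMN_NAME, GEOMETRY_TYPE_NAME, SRS_ID
FROM INFORMATION_SCHEMA.ST_GEOMETRY_COLUMNS
WHERE TABLE_SCHEMA = DATABASE()
  AND TABLE_NAME = 'zip_city';

-- SRID stored in the values themselves. This reported 4326 even before
-- the fix, so it cannot reveal a missing column attribute on its own.
SELECT DISTINCT ST_SRID(boundary_simplified) FROM zip_city;

-- After the ALTER TABLE and the new index, the plan should list the
-- spatial index (idx_boundary_simplified) in the key column.
EXPLAIN SELECT zip
FROM zip_city
WHERE ST_CONTAINS(
        boundary_simplified,
        ST_GeomFromGeoJSON('{"type": "Point", "coordinates": [-131.64, 55.34]}')
      )
  AND is_point = 0
LIMIT 1;
```

Comparing the first two results is the useful part: a column whose values all carry SRID 4326 can still lack the SRID attribute, which is the situation described above.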