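I have two databases, each containing data and "labels" (denoted with the *_p notation). In one database the labels are embedded with the data (stored in the same table); in the other, the labels are stored in separate tables, so a join is needed to access them. For most queries the embedded-label variant is faster, with one exception. I would like to know whether anyone can give me some insight into why that is; I am not very familiar with the internals of Postgres, SQL, or databases myself. The 'EXPLAIN ANALYZE' output is listed below. Thanks in advance.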
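The databases consist of locations and the users associated with those locations, and the data is annotated using *_p fields. The difference between the two databases is that in one the labels are embedded in the same table as the data, while in the other the labels are stored in separate tables, which requires an extra join. Apart from this difference, the two queries do exactly the same thing. For most queries we see that the embedded-label approach is faster, but for this particular query it is slower, and I would like to know why, if possible. I am not really looking for a detailed answer; I just want to know whether there is an obvious reason why one query plan is slower than the other.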
External-label query:
SELECT firstname, lastname, latitude, longitude
FROM locations
INNER JOIN users ON locations.userid = users.id
JOIN users_p users0x ON users.id = users0x.users_id
JOIN locations_p locations0x ON locations.id = locations0x.locations_id
WHERE country = ? AND date_part('year', age(birthdate)) > 18
AND (date_part('year', to_date(timestamp, 'YYYY-MM-DD')) BETWEEN 2010 AND 2019)
AND (locations0x.longitude_p & '2') != 0 AND (users0x.lastname_p & '2') != 0 AND (users0x.firstname_p & '2') != 0 AND (users0x.birthdate_p & '2') != 0 AND (locations0x.timestamp_p & '2') != 0
AND (users0x.country_p & '2') != 0 AND (locations0x.latitude_p & '2') != 0
EXPLAIN ANALYZE output:
Nested Loop (cost=0.42..10116.71 rows=12 width=22) (actual time=317.811..331.306 rows=954 loops=1)
-> Nested Loop (cost=0.00..10051.48 rows=12 width=26) (actual time=317.793..328.701 rows=954 loops=1)
Join Filter: (locations.userid = users0x.users_id)
Rows Removed by Join Filter: 94446
-> Seq Scan on users_p users0x (cost=0.00..4.00 rows=96 width=4) (actual time=0.007..0.051 rows=100 loops=1)
Filter: (((country_p & 2) <> 0) AND ((birthdate_p & 2) <> 0) AND ((lastname_p & 2) <> 0) AND ((firstname_p & 2) <> 0))
-> Materialize (cost=0.00..10030.23 rows=12 width=34) (actual time=0.000..3.209 rows=954 loops=100)
-> Nested Loop (cost=0.00..10030.17 rows=12 width=34) (actual time=0.034..317.266 rows=954 loops=1)
Join Filter: (locations.userid = users.id)
Rows Removed by Join Filter: 98382
-> Seq Scan on users (cost=0.00..6.25 rows=1 width=18) (actual time=0.019..0.043 rows=2 loops=1)
Filter: ((country = 'Colombia'::text) AND (date_part('year'::text, age((('now'::cstring)::date)::timestamp with time zone, (to_date(birthdate, 'YYYY-MM-DD'::text))::timestamp with time zone)) > '18'::double precision))
Rows Removed by Filter: 98
-> Seq Scan on locations (cost=0.00..10008.30 rows=1250 width=16) (actual time=0.007..155.286 rows=49668 loops=2)
Filter: ((date_part('year'::text, (to_date("timestamp", 'YYYY-MM-DD'::text))::timestamp without time zone) >= '2010'::double precision) AND (date_part('year'::text, (to_date("timestamp", 'YYYY-MM-DD'::text))::timestamp without time zone) <= '2019'::double precision))
Rows Removed by Filter: 200332
-> Index Scan using locations_purpose_index on locations_p locations0x (cost=0.42..5.43 rows=1 width=4) (actual time=0.002..0.002 rows=1 loops=954)
Index Cond: (locations_id = locations.id)
Filter: (((latitude_p & 2) <> 0) AND ((longitude_p & 2) <> 0) AND ((timestamp_p & 2) <> 0))
Planning time: 0.555 ms
Execution time: 331.449 ms
Embedded-label query:
SELECT firstname, lastname, latitude, longitude
FROM locations INNER JOIN users ON locations.userid = users.id
WHERE country = ? AND date_part('year', age(birthdate)) > 18
AND (date_part('year', to_date(timestamp, 'YYYY-MM-DD')) BETWEEN 2010 AND 2019)
AND (users.firstname_p & '2') != 0
AND (locations.timestamp_p & '2') != 0
AND (users.country_p & '2') != 0 AND (locations.longitude_p & '2') != 0
AND (locations.latitude_p & '2') != 0
AND (users.lastname_p & '2') != 0 AND (users.birthdate_p & '2') != 0
EXPLAIN ANALYZE output:
Nested Loop (cost=0.00..13782.09 rows=12 width=22) (actual time=0.113..421.690 rows=954 loops=1)
Join Filter: (locations.userid = users.id)
Rows Removed by Join Filter: 98382
-> Seq Scan on users (cost=0.00..8.25 rows=1 width=18) (actual time=0.062..0.087 rows=2 loops=1)
Filter: ((country = 'Colombia'::text) AND ((country_p & 2) <> 0) AND ((birthdate_p & 2) <> 0) AND ((lastname_p & 2) <> 0) AND ((firstname_p & 2) <> 0) AND (date_part('year'::text, age((('now'::cstring)::date)::timestamp with time zone, (to_date(birthdate, 'YYYY-MM-DD'::text))::timestamp with time zone)) > '18'::double precision))
Rows Removed by Filter: 98
-> Seq Scan on locations (cost=0.00..13758.45 rows=1231 width=12) (actual time=0.018..207.065 rows=49668 loops=2)
Filter: (((latitude_p & 2) <> 0) AND ((longitude_p & 2) <> 0) AND ((timestamp_p & 2) <> 0) AND (date_part('year'::text, (to_date("timestamp", 'YYYY-MM-DD'::text))::timestamp without time zone) >= '2010'::double precision) AND (date_part('year'::text, (to_date("timestamp", 'YYYY-MM-DD'::text))::timestamp without time zone) <= '2019'::double precision))
Rows Removed by Filter: 200332
Planning time: 0.811 ms
Execution time: 421.820 ms
date_part('year', age(birthdate)) > 18
(date_part('year', to_date(timestamp, 'YYYY-MM-DD')) BETWEEN 2010 AND 2019)
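Is the first one "correct"? It seems to ignore the month and day. Consider taking the current date (CURRENT_DATE in PostgreSQL), subtracting 18 years, and comparing the result against birthdate. See https://en.wikipedia.org/wiki/Sargable.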
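The sargable form of the age check could look roughly like the sketch below. It assumes users.birthdate is (or has been converted to) a DATE column; the plan above calls to_date(birthdate, ...), which suggests the column is currently text, so treat this as an illustration rather than a drop-in replacement.

```sql
-- Sketch only: assumes users.birthdate is a DATE column.
SELECT id, firstname, lastname
FROM users
WHERE country = 'Colombia'
  -- The column stands alone on the left, so an index on it can be used.
  -- This keeps users who are at least 18 today; INTERVAL '19 years'
  -- would be closer to date_part('year', age(birthdate)) > 18.
  AND birthdate <= CURRENT_DATE - INTERVAL '18 years';
```

The point of the rewrite is that nothing wraps the column: the cutoff is computed once per query instead of evaluating age() for every row.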
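The second should not need to convert the timestamp at all. Instead, compare the column directly against date range literals that are compatible with it.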
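A minimal sketch of that range comparison follows, assuming locations."timestamp" is a DATE or TIMESTAMP column. If it is text in 'YYYY-MM-DD' format, as the to_date() call in the plan suggests, plain string literals with the same bounds should order the same way, but that is an assumption about the stored format.

```sql
-- Sketch only: assumes locations."timestamp" is a DATE or TIMESTAMP column.
SELECT id, userid, latitude, longitude
FROM locations
-- Half-open range covering the years 2010 through 2019 inclusive,
-- with no function applied to the column.
WHERE "timestamp" >= DATE '2010-01-01'
  AND "timestamp" <  DATE '2020-01-01';
```

The half-open upper bound avoids having to decide what the last instant of 2019 is.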
Why use & 2? That smells like a performance killer.
Then consider these composite indexes:
(country, birthdate)
(country, timestamp)
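As a rough sketch, the first suggestion might be created as shown below. The index name is hypothetical, and birthdate is again assumed to be a DATE column. In the queries above, country is filtered on users while timestamp lives in locations, so the second pair would need both columns in one table; the sketch therefore covers only the first.

```sql
-- Sketch only: the index name is hypothetical.
CREATE INDEX users_country_birthdate_idx
    ON users (country, birthdate);

-- Re-check the plan with a sargable predicate to see whether the index is used.
EXPLAIN ANALYZE
SELECT id
FROM users
WHERE country = 'Colombia'
  AND birthdate <= CURRENT_DATE - INTERVAL '18 years';
```

On a table as small as users in the plans above (about 100 rows), the planner may still choose a sequential scan, so the effect is worth checking with EXPLAIN ANALYZE rather than assuming.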