I wrote this query:
select * from table
where exists (select 1 from table1 where table.column = table1.column)
If I change it to
select * from table
where exists (select 1 from table1 where table.column = table1.column limit 1)
does that change the logic?
I ask because the planner's estimated cost changed (17000 -> 2400). I am using Postgres 9.4.
Update: EXPLAIN ANALYZE output for both queries:
explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1)
limit 30
"Limit (cost=62.06..74.63 rows=30 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
" -> Nested Loop (cost=62.06..287707.96 rows=686607 width=157) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
" -> Bitmap Heap Scan on public.sr_res_group rg (cost=61.62..10093.63 rows=2734 width=4) (actual time=0.017..0.017 rows=0 loops=1)"
" Output: rg.id, rg.bdate, rg.edate, rg.is_system, rg.name, rg.department_id, rg.org_id, rg.responsible_id, rg.is_available_in_electronic_queue, rg.label_id, rg.ignore_regclinic_check, rg.note, rg.blocked, rg.block_comment, rg.template_res_grou (...)"
" Recheck Cond: (rg.responsible_id = 1)"
" -> Bitmap Index Scan on responsible_fk (cost=0.00..60.94 rows=2734 width=0) (actual time=0.015..0.015 rows=0 loops=1)"
" Index Cond: (rg.responsible_id = 1)"
" -> Index Scan using fkb95967dd9f6b119a on public.sr_srv_rendered r (cost=0.43..99.03 rows=251 width=157) (never executed)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_ (...)"
" Index Cond: (r.res_group_id = rg.id)"
"Planning time: 0.931 ms"
"Execution time: 0.355 ms"
explain (analyze, verbose)
select * from sr_srv_rendered r
where exists (select 1 from sr_res_group rg where rg.id = r.res_group_id and rg.responsible_id = 1 limit 1)
limit 30
"Limit (cost=0.00..509.03 rows=30 width=157) (actual time=49392.352..49392.352 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r.price_ (...)"
" -> Seq Scan on public.sr_srv_rendered r (cost=0.00..100177996.03 rows=5904050 width=157) (actual time=49392.340..49392.340 rows=0 loops=1)"
" Output: r.id, r.bdate, r.comment, r.cost, r.duration, r.edate, r.is_rendered, r.quantity, r.total_cost, r.contract_id, r.customer_id, r.funding_id, r.res_group_id, r.service_id, r.duration_measure_unit_id, r.begin_time, r.prototype_id, r.org_id, r. (...)"
" Filter: (SubPlan 1)"
" Rows Removed by Filter: 11062881"
" SubPlan 1"
" -> Limit (cost=0.43..8.46 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=11062881)"
" Output: (1)"
" -> Index Scan using sr_res_group_pk on public.sr_res_group rg (cost=0.43..8.46 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=11062881)"
" Output: 1"
" Index Cond: (rg.id = r.res_group_id)"
" Filter: (rg.responsible_id = 1)"
" Rows Removed by Filter: 1"
"Planning time: 0.694 ms"
"Execution time: 49392.495 ms"
The query without the LIMIT argument runs faster.
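Based on these results, the high-level logic does not change (both queries return the same empty set), but the plan does, and that causes the large performance difference.
What seems to be happening is that PostgreSQL understands the first case (no LIMIT inside the EXISTS) and is happy to turn it into a Nested Loop join. In the second case (LIMIT inside the EXISTS), PostgreSQL does not know how to turn the subquery into a join because of the LIMIT, so it falls back to the naive approach: a sequential scan over the table, running the subquery once for every row (11,062,881 times in the plan above).
PostgreSQL understands how EXISTS works and knows it only needs to find one row. Adding "LIMIT 1" is unnecessary and, in this case, actually ends up being harmful.
PostgreSQL could perhaps be improved to recognize that LIMIT 1 inside EXISTS is just noise and should have no effect, but that would add to the time needed to plan queries, and it is not clear that this would be time well spent.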
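As a sketch of the advice above, using the table and column names from the question: the first form is the plain EXISTS that the planner can turn into a join. The second is an IN rewrite that should return the same rows when used as a WHERE filter. Whether the planner picks the same plan for both is an assumption worth checking with EXPLAIN on your own data.

```sql
-- Plain EXISTS, no LIMIT inside the subquery:
-- the planner is free to convert this into a join.
select *
from sr_srv_rendered r
where exists (
    select 1
    from sr_res_group rg
    where rg.id = r.res_group_id
      and rg.responsible_id = 1
)
limit 30;

-- Assumed-equivalent IN form (same rows as a WHERE filter);
-- compare its plan with EXPLAIN before relying on it.
select *
from sr_srv_rendered r
where r.res_group_id in (
    select rg.id
    from sr_res_group rg
    where rg.responsible_id = 1
)
limit 30;
```

In both forms the inner query carries no LIMIT, which is the part that matters for the plan shown in the question.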
select EXISTS(select * from test LIMIT 1) -- ~50 ms
select EXISTS(select * from test) -- ~50 ms
select * from test -- ~2000 ms
The LIMIT does not affect the time.
("test" holds roughly 10,000,000 integers generated by "generate_series(0,10000000)")
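A minimal sketch for reproducing this timing test in psql. The column name `n` is a hypothetical choice, since the answer does not say how the table was defined, and timings will vary by machine.

```sql
-- Build the test table; the column name "n" is an assumption.
create table test as
select generate_series(0, 10000000) as n;

-- psql meta-command: print the elapsed time of each statement.
\timing on

select exists (select * from test limit 1);
select exists (select * from test);
```

Note that the subquery here is uncorrelated, so it runs once rather than once per outer row. That makes this test different from the correlated case in the question, where the LIMIT prevented the conversion to a join.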