Can PostgreSQL 12 prune partitions at execution time using a list returned by a subquery?

Question (votes: 3, answers: 2)

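I am trying to take advantage of partitioning in the following case: I have a table "events" that is list-partitioned on the column "dt_pk", which is a foreign key to the table "dates".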

-- Schema
drop schema if exists test cascade;
create schema test;

-- Tables
create table if not exists test.dates (
  id bigint primary key,
  dt date   not null
);

create sequence test.seq_events_id;

create table if not exists test.events
(
  id          bigint  not null,
  dt_pk       bigint  not null, 
  content_int bigint,

  foreign key (dt_pk) references test.dates(id) on delete cascade,
  primary key (dt_pk, id)
)
partition by list (dt_pk);

-- Partitions
create table test.events_1 partition of test.events for values in (1);
create table test.events_2 partition of test.events for values in (2);
create table test.events_3 partition of test.events for values in (3);

-- Fill tables
insert into test.dates (id, dt)
select id, dt
from (
  select 1 id, '2020-01-01'::date as dt
union all
  select 2 id, '2020-01-02'::date as dt
union all
  select 3 id, '2020-01-03'::date as dt
) t;

do $$
declare
  dts record;
begin  
  for dts in (
    select id
    from test.dates
  ) loop
    for k in 1..10000 loop    
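      -- random_between() is not built in; assuming no such user-defined helper exists, use random()
      insert into test.events (id, dt_pk, content_int)
      values (nextval('test.seq_events_id'), dts.id, floor(random() * 1000000)::bigint + 1);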
    end loop;
    commit;
  end loop;
end;
$$;

vacuum analyze test.dates, test.events;

I want to run a select like this:

select *
from test.events e
  join test.dates d on e.dt_pk = d.id
where d.dt between '2020-01-02'::date and '2020-01-03'::date;

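In this case, however, partition pruning does not happen. That is understandable, since the query contains no constant for the partition key. But from the documentation I understand that partition pruning can also take place at execution time, and that it works with values obtained from a subquery: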

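Partition pruning can be performed not only during the planning of a given query, but also during its execution. This is useful as it can allow more partitions to be pruned when clauses contain expressions whose values are not known at query planning time, for example, parameters defined in a PREPARE statement, using a value obtained from a subquery, or using a parameterized value on the inner side of a nested loop join.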

So I rewrote the query like this, expecting partition pruning to kick in:

select *
from test.events e
where e.dt_pk in (
  select d.id
  from test.dates d
  where d.dt between '2020-01-02'::date and '2020-01-03'::date
);

But the explain output for this select says:

Hash Join  (cost=1.07..833.07 rows=20000 width=24) (actual time=3.581..15.989 rows=20000 loops=1)
  Hash Cond: (e.dt_pk = d.id)
  ->  Append  (cost=0.00..642.00 rows=30000 width=24) (actual time=0.005..6.361 rows=30000 loops=1)
        ->  Seq Scan on events_1 e  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.104 rows=10000 loops=1)
        ->  Seq Scan on events_2 e_1  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.005..1.127 rows=10000 loops=1)
        ->  Seq Scan on events_3 e_2  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.097 rows=10000 loops=1)
  ->  Hash  (cost=1.04..1.04 rows=2 width=8) (actual time=0.006..0.006 rows=2 loops=1)
        Buckets: 1024  Batches: 1  Memory Usage: 9kB
        ->  Seq Scan on dates d  (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
              Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
              Rows Removed by Filter: 1
Planning Time: 0.206 ms
Execution Time: 17.237 ms

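So all partitions are read. I even tried to force the planner to use a nested loop join, because the documentation mentions "a parameterized value on the inner side of a nested loop join", but that did not help: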

set enable_hashjoin to off;
set enable_mergejoin to off;

The plan again scans every partition:

Nested Loop  (cost=0.00..1443.05 rows=20000 width=24) (actual time=9.160..25.252 rows=20000 loops=1)
  Join Filter: (e.dt_pk = d.id)
  Rows Removed by Join Filter: 30000
  ->  Append  (cost=0.00..642.00 rows=30000 width=24) (actual time=0.008..6.280 rows=30000 loops=1)
        ->  Seq Scan on events_1 e  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.105 rows=10000 loops=1)
        ->  Seq Scan on events_2 e_1  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.008..1.047 rows=10000 loops=1)
        ->  Seq Scan on events_3 e_2  (cost=0.00..164.00 rows=10000 width=24) (actual time=0.007..1.082 rows=10000 loops=1)
  ->  Materialize  (cost=0.00..1.05 rows=2 width=8) (actual time=0.000..0.000 rows=2 loops=30000)
        ->  Seq Scan on dates d  (cost=0.00..1.04 rows=2 width=8) (actual time=0.004..0.004 rows=2 loops=1)
              Filter: ((dt >= '2020-01-02'::date) AND (dt <= '2020-01-03'::date))
              Rows Removed by Filter: 1
Planning Time: 0.202 ms
Execution Time: 26.516 ms

Then I noticed that every example of execution-time partition pruning I have seen uses only an = condition, never in. And with = it does work:

explain (analyze) select * from test.events e where e.dt_pk = (select id from test.dates where id = 2);

Append  (cost=1.04..718.04 rows=30000 width=24) (actual time=0.014..3.018 rows=10000 loops=1)
  InitPlan 1 (returns $0)
    ->  Seq Scan on dates  (cost=0.00..1.04 rows=1 width=8) (actual time=0.007..0.008 rows=1 loops=1)
          Filter: (id = 2)
          Rows Removed by Filter: 2
  ->  Seq Scan on events_1 e  (cost=0.00..189.00 rows=10000 width=24) (never executed)
        Filter: (dt_pk = $0)
  ->  Seq Scan on events_2 e_1  (cost=0.00..189.00 rows=10000 width=24) (actual time=0.004..2.009 rows=10000 loops=1)
        Filter: (dt_pk = $0)
  ->  Seq Scan on events_3 e_2  (cost=0.00..189.00 rows=10000 width=24) (never executed)
        Filter: (dt_pk = $0)
Planning Time: 0.135 ms
Execution Time: 3.639 ms

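So here is my final question: does execution-time partition pruning work only when the subquery returns a single item, or is there a way to get partition pruning with a subquery that returns a list?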

And why does it not work with a nested loop join? Am I misreading this sentence from the documentation:

This includes values from subqueries and values from execution-time parameters such as those from parameterized nested loop joins.

Or is a "parameterized nested loop join" something different from a regular nested loop join?

postgresql partitioning database-partitioning pruning
2 Answers
1
vote

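There is no partition pruning in your nested loop join because the partitioned table is on the outer side, which is always scanned completely. The inner side is scanned with the join key from the outer side as a parameter (hence "parameterized scan"), so partition pruning could happen if the partitioned table were on the inner side of the nested loop join.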
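
As a rough sketch of what "partitioned table on the inner side" could look like with the schema from the question: the query below is an assumption, not part of the original answer. It uses a LATERAL subquery with OFFSET 0, on the assumption that this keeps the planner from flattening the subquery into a plain join, so that test.dates has to be the outer side and test.events is rescanned once per matching date with d.id as a parameter. Whether partitions are actually skipped is best checked in the EXPLAIN (ANALYZE) output, where a pruned partition would be expected to appear as "(never executed)".

```sql
-- Sketch only: the LATERAL + OFFSET 0 form is an assumption, not the answerer's query.
-- OFFSET 0 is used here as an optimization fence so the subquery is not pulled up;
-- the lateral reference to d.id then has to be evaluated once per outer row,
-- i.e. on the inner side of a nested loop.
explain (analyze)
select e.*
from test.dates d
  cross join lateral (
    select *
    from test.events ev
    where ev.dt_pk = d.id
    offset 0
  ) e
where d.dt between '2020-01-02'::date and '2020-01-03'::date;
```

The planner's choice still depends on cost estimates and server version, so the resulting plan may differ from what is described here.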


0
votes

If you care less about the details and more about whether it works, and you have not tried it yet: you can rewrite the query to something like
