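I have a table with three columns (Oracle 19c): two of them represent a parent-child relationship and the third holds a text description. I use CONNECT BY without START WITH, followed by GROUP BY. If the description column is a NOT NULL column with a unique index and it is included in the GROUP BY list, Oracle decides not to apply the UNIQUE (deduplication) step. As a result, the rows that are duplicated because of the missing START WITH clause end up in the final output.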
create table test_hie (id int, parent int, name varchar2(64) not null);
insert into test_hie (id, parent, name) values (0, null, 'ABC');
insert into test_hie (id, parent, name) values (1, 0, 'DEF');
create unique index test_hie_idx_name on test_hie (name);
alter session set statistics_level = all;
select id from test_hie connect by prior id = parent group by id;
select * from table(dbms_xplan.display_cursor('6pfqf6fg5crck', 0, 'ALLSTATS LAST PROJECTION'));
select id from test_hie connect by prior id = parent group by id, name;
select * from table(dbms_xplan.display_cursor('g56y8n3pubzud', 0, 'ALLSTATS LAST PROJECTION'));
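The SQL_IDs passed to dbms_xplan.display_cursor above are tied to the exact statement text used by the author. A minimal alternative sketch, assuming SQL*Plus or SQLcl and the same session as the test case, is to pass NULL for the SQL_ID and child number, which displays the plan of the last statement executed in the session. SERVEROUTPUT must be off; otherwise the "last" statement is the DBMS_OUTPUT call issued by the client.

```sql
-- Assumes SQL*Plus or SQLcl, same session as the test case above.
-- With SERVEROUTPUT ON, the last cursor would be the client's DBMS_OUTPUT call.
set serveroutput off

select id from test_hie connect by prior id = parent group by id;
-- NULL sql_id and child number: show the last statement run in this session
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST PROJECTION'));

select id from test_hie connect by prior id = parent group by id, name;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST PROJECTION'));
```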
Without GROUP BY or START WITH, row 1 would be included twice: once as the root of its own tree and once as a child of 0.
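The duplication can be made visible without any GROUP BY, so the optimizer has nothing to eliminate. This sketch uses the standard CONNECT_BY_ROOT operator and LEVEL pseudocolumn against the same test table. It should return three rows: id 0 at level 1 (root 0), id 1 at level 2 (root 0), and id 1 at level 1 (root 1). Row order is not guaranteed.

```sql
-- Without START WITH, every row of test_hie is used as a root.
-- CONNECT_BY_ROOT shows which root each output row was reached from.
select id,
       level,
       connect_by_root id as root_id
from   test_hie
connect by prior id = parent;
```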
To remove the duplicates, I applied GROUP BY. In the first query, where name is not in the GROUP BY list, row 1 occurs only once. If I include name in the GROUP BY, it occurs twice.
Output of query 1 (name not in the GROUP BY)
id
---
1
0
Output of query 2 (name in the GROUP BY)
id
---
1
0
1
This behavior is incorrect: if I group by id, I should not see any duplicate id values.
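Adding name to the select list changes nothing, and neither does using an aggregate function or adding more columns to the table. All it takes to observe the incorrect behavior is for name to be a NOT NULL column with a unique index.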
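One way to confirm that the unique index is the trigger is to remove it and rerun the second query. This is a sketch based on the observation above, not a documented guarantee. Without the index, the optimizer has no way to prove that the grouping columns are unique, so the GROUP BY would be expected to run and return ids 0 and 1 once each.

```sql
-- Check: drop the unique index so (id, name) can no longer be proven unique.
drop index test_hie_idx_name;

-- Expected (not verified here): ids 0 and 1, each exactly once.
select id from test_hie connect by prior id = parent group by id, name;

-- Recreate the index to reproduce the problem again.
create unique index test_hie_idx_name on test_hie (name);
```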
Execution plan of the first query; note that it contains HASH GROUP BY and CONNECT BY WITHOUT FILTERING (UNIQUE):
SQL_ID 6pfqf6fg5crck, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id
Plan hash value: 2961238377
----------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
----------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01 | 7 | | | |
| 1 | HASH GROUP BY | | 1 | 2 | 2 |00:00:00.01 | 7 | 1968K| 1968K| 647K (0)|
|* 2 | CONNECT BY WITHOUT FILTERING (UNIQUE)| | 1 | | 3 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 3 | TABLE ACCESS FULL | TEST_HIE | 1 | 2 | 2 |00:00:00.01 | 7 | | | |
----------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("PARENT"=PRIOR NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (rowset=256) "ID"[NUMBER,22]
2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
3 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
Plan of the second query; note that the GROUP BY operation is gone and CONNECT BY WITHOUT FILTERING is no longer UNIQUE:
SQL_ID g56y8n3pubzud, child number 0
-------------------------------------
select id from test_hie connect by prior id = parent group by id, name
Plan hash value: 4109999158
------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers | OMem | 1Mem | Used-Mem |
------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 3 |00:00:00.01 | 7 | | | |
|* 1 | CONNECT BY WITHOUT FILTERING| | 1 | | 3 |00:00:00.01 | 7 | 2048 | 2048 | 2048 (0)|
| 2 | TABLE ACCESS FULL | TEST_HIE | 1 | 2 | 2 |00:00:00.01 | 7 | | | |
------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - access("PARENT"=PRIOR NULL)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - "ID"[NUMBER,22], "PARENT"[NUMBER,22], PRIOR NULL[22], LEVEL[4]
2 - "ID"[NUMBER,22], "PARENT"[NUMBER,22]
Note
-----
- dynamic statistics used: dynamic sampling (level=2)
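Is this a bug in Oracle 19c? It looks as though the optimizer somehow "thinks" duplicates are impossible (because of the NOT NULL unique column) and that the grouping is therefore unnecessary. In doing so it forgets that CONNECT BY artificially injects duplicate rows into the row set.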
By the way, what does UNIQUE actually mean in CONNECT BY WITHOUT FILTERING (UNIQUE)?
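Yes, this looks like a bug to me (I'll log it). I suspect we are reading "too much" into the unique index. We see GROUP BY on a unique column and conclude "great, we don't need the GROUP BY", as the 10053 trace shows: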
QB before group-by removal:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
QB before group-by elimination:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT" GROUP BY "TEST_HIE"."ID","TEST_HIE"."NAME"
Registered qb: SEL$47952E7A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$1; SEL$1)
---------------------
QUERY BLOCK SIGNATURE
---------------------
signature (): qb_name=SEL$47952E7A nbfros=1 flg=0
fro(0): flg=0 objn=189661 hint_alias="TEST_HIE"@"SEL$1"
QB after group-by elimination:******* UNPARSED QUERY IS *******
SELECT "TEST_HIE"."ID" "ID" FROM "SCOTT"."TEST_HIE" "TEST_HIE" CONNECT BY PRIOR "TEST_HIE"."ID"="TEST_HIE"."PARENT"
Registered qb: SEL$9BB7A81A 0x78ddb940 (ELIMINATION OF GROUP BY SEL$47952E7A; SEL$47952E7A)
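To produce a trace like the excerpt above in your own environment, the usual approach is to enable event 10053 (the optimizer trace) for the session and force a hard parse of the statement. The comment in the query is a hypothetical marker whose only purpose is to change the statement text so that a new cursor is parsed. In the resulting file, search for "group-by elimination".

```sql
-- Enable the optimizer (10053) trace for this session.
alter session set events '10053 trace name context forever, level 1';

-- The statement must be hard parsed to appear in the trace; the comment
-- changes the SQL text, so an already cached cursor is not reused.
select /* trace_10053 */ id from test_hie connect by prior id = parent group by id, name;

alter session set events '10053 trace name context off';

-- Path of this session's trace file.
select value from v$diag_info where name = 'Default Trace File';
```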
As a temporary workaround, you can add a redundant START WITH:
SQL> create table test_hie (id int, parent int, name varchar2(64) not null);
Table created.
SQL> insert into test_hie (id, parent, name) values (0, null, 'ABC');
1 row created.
SQL> insert into test_hie (id, parent, name) values (1, 0, 'DEF');
1 row created.
SQL> create unique index test_hie_idx_name on test_hie (name);
Index created.
SQL>
SQL> select id from test_hie connect by prior id = parent group by id;
ID
----------
1
0
SQL> select id from test_hie connect by prior id = parent group by id, name;
ID
----------
1
0
1
SQL> select id from test_hie start with 1=1 connect by prior id = parent group by id, name;
ID
----------
1
0
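When applying the workaround, the START WITH predicate must still make every row a root. START WITH 1=1 does that, so it keeps the semantics of omitting START WITH. A seemingly tidier START WITH parent IS NULL is a different query in general, even though it happens to return the same ids for this two-row table.

```sql
-- Same roots as omitting START WITH: every row starts a tree.
select id from test_hie start with 1=1 connect by prior id = parent group by id, name;

-- Not equivalent in general: only rows whose parent is NULL become roots,
-- so rows unreachable from such a row (e.g. a row whose parent id does not
-- exist in the table) disappear from the result.
select id from test_hie start with parent is null connect by prior id = parent group by id, name;
```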