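I am identifying duplicates based on multiple columns, but I have found that some records do not have all the data my duplicate criteria rely on, such as dob, age and gender. So I would like to partition by dob, but if it is null or does not match, partition by age, and if that is null or does not match, partition by gender. Is this possible?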
SELECT ID, V1, V2, V3, V4, CreatedDate
FROM (
  SELECT T1.ID, V1, V2, V3, V4, CreatedDate,
         COUNT(*)
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct,
         COUNT( CASE CreatedDate WHEN DATE '2017-08-01' THEN 1 END )
           OVER ( PARTITION BY V1, V2, V3, V4 ) AS ct_date_match
  FROM T1
       INNER JOIN T2
         ON ( T1.ID = T2.ID )
       INNER JOIN T3
         ON ( T1.ID = T3.ID )
)
WHERE ct > 1
AND   ct_date_match > 0
If I modify my PARTITION BY clause as follows, will it work?
(PARTITION BY V1, V2, V3, V4,
              (case when dob is null then age end),
              (case when age is null then gender_id end))
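@mathguy is right: if you had just tried it, you could have saved yourself some time.
It will work. Use the COALESCE function and make sure all of its arguments have the same type. Here is an example using int, varchar, date and float: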
drop table deleteme_tbl;
create table deleteme_tbl ( a int not null, b varchar2(5) , c date, d float(6), e varchar2(20));
insert into deleteme_tbl(a,b,c,d,e) values( 1, 'B', date '2017-12-01', 1.55, 'First Record');
insert into deleteme_tbl(a,b,c,d,e) values( 2, null, date '2017-12-02', 2.55, 'Second Record');
insert into deleteme_tbl(a,b,c,d,e) values( 3, 'B', null, 1.55, 'Third Record');
insert into deleteme_tbl(a,b,c,d,e) values( 4, 'B',null, null, 'Fourth Record');
insert into deleteme_tbl(a,b,c,d,e) values( 5, 'B', date '2017-12-01', 1.55, 'Fifth Record');
commit;
SELECT a.*
     , COUNT (*)
         OVER (PARTITION BY COALESCE (
                                TO_CHAR (a)
                              , b
                              , TO_CHAR (c, 'YYYYMMDD')
                              , TO_CHAR (d)
                            ))
         cnt
  FROM deleteme_tbl a;
This results in:
A   B   C           D     E               CNT
1   B   12/1/2017   1.6   First Record    1
2       12/2/2017   2.6   Second Record   1
3   B               1.6   Third Record    1
4   B                     Fourth Record   1
5   B   12/1/2017   1.6   Fifth Record    1
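One thing worth noting about the result above: column a is declared NOT NULL, so COALESCE always returns TO_CHAR (a) and every row lands in its own partition. A minimal variation, assuming the same deleteme_tbl data, leaves a out of the list so that the fallback becomes visible. The part_key alias is made up here for illustration only.

```sql
-- Same table as above; "a" is dropped from the COALESCE list because it is never null.
-- part_key is a hypothetical alias that exposes the value each row is partitioned on.
SELECT t.*
     , COALESCE (b, TO_CHAR (c, 'YYYYMMDD'), TO_CHAR (d)) AS part_key
     , COUNT (*)
         OVER (PARTITION BY COALESCE (b, TO_CHAR (c, 'YYYYMMDD'), TO_CHAR (d)))
         AS cnt
  FROM deleteme_tbl t;
```

With the five sample rows, records 1, 3, 4 and 5 all get the key 'B' and a count of 4, while record 2 (where b is null) falls back to '20171202' and a count of 1.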
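Instead of writing separate CASE expressions, put COALESCE(dob, age, gender) in place of dob; it should work. Make sure to include that expression in the query output as well, so that you can look at the values and check whether it is exactly what you need.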
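As a rough sketch of that suggestion, assuming a hypothetical table persons with columns id, v1, dob (DATE), age (NUMBER) and gender_id (NUMBER), the key can be built once per row and shown next to the count. The table name, the column types and the dup_key alias are assumptions and are not from the original question. The TO_CHAR calls are there because, in Oracle, COALESCE expects arguments of compatible types.

```sql
-- Hypothetical table: persons (id, v1, dob DATE, age NUMBER, gender_id NUMBER).
-- dup_key exposes the fallback value so it can be checked by eye.
SELECT id, v1, dob, age, gender_id, dup_key, ct
  FROM (
        SELECT p.*
             , COALESCE (TO_CHAR (dob, 'YYYYMMDD'), TO_CHAR (age), TO_CHAR (gender_id))
                 AS dup_key
             , COUNT (*)
                 OVER (PARTITION BY v1
                                  , COALESCE (TO_CHAR (dob, 'YYYYMMDD'), TO_CHAR (age), TO_CHAR (gender_id)))
                 AS ct
          FROM persons p
       )
 WHERE ct > 1;
```

Two caveats apply to this sketch. COALESCE only falls back when a value is null, so it does not cover the "does not match" part of the question: a row keyed on dob will never be grouped with a row keyed on age. Values taken from different columns can also collide, for example an age of 1 and a gender_id of 1 both produce the key '1'.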