I have a table, for example Product (Id, Name):
Id | Name |
---|---|
1 | 'one' |
2 | 'two' |
3 | 'three' |
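I have a new version of the data that I need to update the table with, and it may contain duplicate values. For example:
Name |
---|
'one' |
'two' |
'two' |
'two' |
'four' |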
I need the resulting table to contain as many rows for each Name as the second table has, i.e.
Id | Name |
---|---|
1 | 'one' |
2 | 'two' |
3 | 'three' |
4 | 'two' |
5 | 'two' |
6 | 'four' |
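So if a value (e.g. 'one') already exists, no duplicate needs to be inserted, but if the second table has more rows with the same value (e.g. 'two' or 'four'), the missing rows must be inserted: 2 rows and 1 row respectively.
How can I do this with SQL?
I have already tried the answers that suggest inserting only the rows that are not yet in the table, but that is not what I need.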
First I set up some tables for testing:
create temp table Product(id integer primary key generated always as identity, name text);
insert into Product (name) values('one');
insert into Product (name) values('two');
insert into Product (name) values('three');
select * from Product;
create temp table newdata(name text);
insert into newdata values ('one');
insert into newdata values ('two');
insert into newdata values ('two');
insert into newdata values ('two');
insert into newdata values ('four');
select * from newdata;
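Then I use the analytic function "row_number()" to number the rows within each name. This is just a way of counting how many instances of each name there are in the list: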
select name,row_number() over (partition by name order by name) rn from newdata;
+------+----+
| name | rn |
+------+----+
| four | 1 |
| one | 1 |
| two | 1 |
| two | 2 |
| two | 3 |
+------+----+
And for Product:
select name,row_number() over (partition by name order by name) rn from product;
+-------+----+
| name | rn |
+-------+----+
| one | 1 |
| three | 1 |
| two | 1 |
+-------+----+
Taking the difference between the two tells me which names are missing:
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product;
+------+----+
| name | rn |
+------+----+
| two | 3 |
| four | 1 |
| two | 2 |
+------+----+
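As a cross-check, the same gap can also be read off as a difference of per-name counts. This is only a sketch of an alternative view, not part of the approach above; the aliases n, p, cnt and missing are made up for illustration, and it assumes the test tables exactly as created earlier.

```sql
-- Sketch: how many rows of each name are still missing from product.
-- n = counts in the new data, p = counts already in product (hypothetical aliases).
select name,
       n.cnt - coalesce(p.cnt, 0) as missing
from (select name, count(*) as cnt from newdata group by name) n
left join (select name, count(*) as cnt from product group by name) p using (name)
where n.cnt > coalesce(p.cnt, 0)   -- keep only names that need extra rows
order by name;
-- With the test data above this should list four (1 missing) and two (2 missing).
```

The numbers should agree with the except result: two rows for 'two' (rn 2 and 3) and one row for 'four'.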
Then I just need to insert the missing names:
insert into product (name)
select name from (
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product
) a;
select * from product;
+----+-------+
| id | name |
+----+-------+
| 1 | one |
| 2 | two |
| 3 | three |
| 4 | two |
| 5 | four |
| 6 | two |
+----+-------+
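To confirm the result, it can help to re-run the difference and to count rows per name. This is a minimal sketch against the same test tables; it adds nothing beyond the queries already shown, apart from a count(*) with group by.

```sql
-- After the insert, nothing from newdata should be missing any more,
-- so this difference is expected to return no rows.
select name,row_number() over (partition by name order by name) rn from newdata
except
select name,row_number() over (partition by name order by name) rn from product;

-- Per-name totals in product: expected four = 1, one = 1, three = 1, two = 3.
select name, count(*) from product group by name order by name;
```

Because the difference comes back empty, running the insert a second time should add no further rows. Note that the new ids are handed out in whatever order the except result is returned, which is not specified without an order by; that is why 'four' received id 5 here rather than 6.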