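I have some data with IDs that I need to identify as related. What I can't do is identify relatives of relatives (and relatives of relatives of relatives, and so on).
Part of the problem is that the data isn't actually hierarchical. Records are identified as related by matching on various fields (driver's license #, SSN, etc.). I laid out the data so that the parent is always the lower ID, so that we don't run into any infinite loops in the recursion.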
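The lower-ID-as-parent layout described above can be produced from raw match results with a normalization step along these lines. This is a minimal sketch: `#MatchedPairs`, `#NormalizedPairs`, `IdA` and `IdB` are hypothetical names standing in for whatever the field-matching step actually outputs, and the sample rows are made up to show a reversed pair, a duplicate and a self-match.

```sql
-- Hypothetical output of the field-matching step (driver's license #, SSN, ...):
-- unordered pairs, possibly reversed, duplicated, or matching a record to itself.
DROP TABLE IF EXISTS #MatchedPairs;
CREATE TABLE #MatchedPairs (IdA bigint, IdB bigint);
INSERT INTO #MatchedPairs (IdA, IdB) VALUES (20, 10), (10, 20), (30, 20), (40, 40);

-- Put the lower ID in ParentId and drop self-matches and duplicates.
DROP TABLE IF EXISTS #NormalizedPairs;
SELECT DISTINCT
    CASE WHEN IdA < IdB THEN IdA ELSE IdB END AS ParentId,
    CASE WHEN IdA < IdB THEN IdB ELSE IdA END AS ChildId
INTO #NormalizedPairs
FROM #MatchedPairs
WHERE IdA <> IdB;

SELECT ParentId, ChildId FROM #NormalizedPairs ORDER BY ParentId, ChildId;
-- 10 | 20
-- 20 | 30
```

Note that the ordering only keeps the parent-to-child recursion from looping; the relation underneath is still symmetric (A matches B exactly when B matches A).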
Sample data:
```sql
drop table if exists #AllSimilarMinParent
create table #AllSimilarMinParent (ParentId bigint, ChildId bigint)
insert into #AllSimilarMinParent (ParentId, ChildId) values (10, 20)
insert into #AllSimilarMinParent (ParentId, ChildId) values (20, 30)
insert into #AllSimilarMinParent (ParentId, ChildId) values (30, 40)
insert into #AllSimilarMinParent (ParentId, ChildId) values (40, 50)
insert into #AllSimilarMinParent (ParentId, ChildId) values (39, 40)
insert into #AllSimilarMinParent (ParentId, ChildId) values (39, 51)
insert into #AllSimilarMinParent (ParentId, ChildId) values (49, 51)
insert into #AllSimilarMinParent (ParentId, ChildId) values (49, 61)
insert into #AllSimilarMinParent (ParentId, ChildId) values (59, 61)
insert into #AllSimilarMinParent (ParentId, ChildId) values (59, 71)
```
I'm using a recursive CTE to get the hierarchy for each ID:
```sql
WITH RelationHierarchy as (
    SELECT ChildId, ParentId
    FROM #AllSimilarMinParent MP
)
, RCTE AS
(
    --recursive CTE to generate hierarchy
    SELECT ParentId, ChildId, 1 AS Lvl FROM RelationHierarchy
    UNION ALL
    SELECT rh.ParentId, rc.ChildId, Lvl+1 AS Lvl
    FROM RelationHierarchy rh
    INNER JOIN RCTE rc ON rh.ChildId = rc.ParentId
)
select F0.ParentId, F0.ChildId, F0.Lvl FROM RCTE F0 ORDER BY ChildId, ParentId
```
This returns the following:
ParentId | ChildId | Lvl |
---|---|---|
10 | 20 | 1 |
10 | 30 | 2 |
20 | 30 | 1 |
10 | 40 | 3 |
20 | 40 | 2 |
30 | 40 | 1 |
39 | 40 | 1 |
10 | 50 | 4 |
20 | 50 | 3 |
30 | 50 | 2 |
39 | 50 | 2 |
40 | 50 | 1 |
39 | 51 | 1 |
49 | 51 | 1 |
49 | 61 | 1 |
59 | 61 | 1 |
59 | 71 | 1 |
I also tried joining the CTE to itself to get all the relatives' relatives:
```sql
-- used in place of the final SELECT of the query above (same WITH clause)
select F0.ParentId, F0.ChildId, F0.Lvl FROM RCTE F0
UNION
select F1.ChildId, F2.ChildId, -1 FROM RCTE F1 JOIN RCTE F2 ON F1.ParentId = F2.ParentId AND F1.ChildId <> F2.ChildId
UNION
select F2.ParentId, F1.ParentId, -2 FROM RCTE F1 JOIN RCTE F2 ON F1.ChildId = F2.ChildId AND F1.ParentId <> F2.ParentId
```
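This helps to some extent, but I'm still not getting every relationship I'm looking for. For example, I can't relate 10 to 71. Because of the way this particular data is linked, it seems I have to go up (to the parent) and down (to the child) several times to get from 71 to 10:

- 71 up to 59
- 59 down to 61
- 61 up to 49
- 49 down to 51
- 51 up to 39
- 39 down to 40
- 40 up to 30
- 30 up to 20
- 20 up to 10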
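I've considered feeding the results of the first recursive CTE into another recursive CTE, but the fact that the path has to switch between up and down arbitrarily (up, then down, rather than consistently in one direction) has me stumped.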
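Since the relation is really symmetric, one way to remove the "up versus down" distinction is to list every row in both directions, so that moving to a parent and moving to a child are both just following an edge. The sketch below walks outward from a single start ID (10, chosen for illustration) and records the path in a comma-delimited string so it never revisits an ID on the same path. It enumerates every simple path from the start, which is harmless on this tree-shaped sample but can grow explosively on densely linked data, so treat it as an illustration of the idea rather than a scalable query. `@StartId` and `#ReachableFromStart` are hypothetical names.

```sql
-- Sample data from the question, so the block runs on its own.
DROP TABLE IF EXISTS #AllSimilarMinParent;
CREATE TABLE #AllSimilarMinParent (ParentId bigint, ChildId bigint);
INSERT INTO #AllSimilarMinParent (ParentId, ChildId)
VALUES (10, 20), (20, 30), (30, 40), (40, 50), (39, 40),
       (39, 51), (49, 51), (49, 61), (59, 61), (59, 71);

DECLARE @StartId bigint = 10;  -- illustrative start ID
DROP TABLE IF EXISTS #ReachableFromStart;

WITH Edges AS (
    -- Each row becomes an undirected edge: usable both "up" and "down".
    SELECT ParentId AS FromId, ChildId AS ToId FROM #AllSimilarMinParent
    UNION
    SELECT ChildId, ParentId FROM #AllSimilarMinParent
),
Walk AS (
    SELECT @StartId AS Id,
           CAST(CONCAT(',', @StartId, ',') AS varchar(max)) AS Visited
    UNION ALL
    SELECT e.ToId,
           CAST(CONCAT(w.Visited, e.ToId, ',') AS varchar(max))
    FROM Walk w
    INNER JOIN Edges e ON e.FromId = w.Id
    -- Skip IDs already on this path, so the walk cannot loop.
    WHERE w.Visited NOT LIKE CONCAT('%,', e.ToId, ',%')
)
SELECT DISTINCT Id
INTO #ReachableFromStart
FROM Walk;

SELECT Id FROM #ReachableFromStart ORDER BY Id;
```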
Any ideas on how to explicitly return every pair of related records? Every ID in this particular dataset should be related to every other ID.
EDIT: Below is the desired output, showing every ID related to every other ID:
ParentId | ChildId |
---|---|
10 | 20 |
10 | 30 |
10 | 39 |
10 | 40 |
10 | 49 |
10 | 50 |
10 | 51 |
10 | 59 |
10 | 61 |
10 | 71 |
20 | 30 |
20 | 39 |
20 | 40 |
20 | 49 |
20 | 50 |
20 | 51 |
20 | 59 |
20 | 61 |
20 | 71 |
30 | 39 |
30 | 40 |
30 | 49 |
30 | 50 |
30 | 51 |
30 | 59 |
30 | 61 |
30 | 71 |
39 | 40 |
39 | 49 |
39 | 50 |
39 | 51 |
39 | 59 |
39 | 61 |
39 | 71 |
40 | 49 |
40 | 50 |
40 | 51 |
40 | 59 |
40 | 61 |
40 | 71 |
49 | 50 |
49 | 51 |
49 | 59 |
49 | 61 |
49 | 71 |
50 | 51 |
50 | 59 |
50 | 61 |
50 | 71 |
51 | 59 |
51 | 61 |
51 | 71 |
59 | 61 |
59 | 71 |
61 | 71 |
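Once every ID carries a group label, the desired pair list is a self-join within each group, with `a.Id < b.Id` keeping each unordered pair once and the lower ID first. In this sketch the labels are hard-coded (all 11 sample IDs in a hypothetical group 1) purely to show the shape of the query; `#IdGroup` and `#RelatedPairs` are illustrative names.

```sql
DROP TABLE IF EXISTS #IdGroup;
CREATE TABLE #IdGroup (Id bigint PRIMARY KEY, GroupId bigint NOT NULL);
-- Hard-coded labels for illustration: every sample ID is in group 1.
INSERT INTO #IdGroup (Id, GroupId)
VALUES (10, 1), (20, 1), (30, 1), (39, 1), (40, 1), (49, 1),
       (50, 1), (51, 1), (59, 1), (61, 1), (71, 1);

DROP TABLE IF EXISTS #RelatedPairs;
-- Every unordered pair of IDs sharing a group; a.Id < b.Id keeps one row per pair.
SELECT a.Id AS ParentId, b.Id AS ChildId
INTO #RelatedPairs
FROM #IdGroup a
INNER JOIN #IdGroup b
    ON b.GroupId = a.GroupId
   AND a.Id < b.Id;

SELECT ParentId, ChildId FROM #RelatedPairs ORDER BY ParentId, ChildId;
```

A group of n IDs produces n(n-1)/2 pairs (55 for the 11 sample IDs), so a single group of 20,000 IDs would already yield about 200 million rows; keeping only the (Id, GroupId) labels may be the more practical output at that scale.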
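I was able to adapt the answer siggemannen posted above, and it does give the expected result. However, the performance is unworkable. With the 11 records in my sample data, it took 11 seconds. I added 11 more records that have no relationship to my original 11, and the query took more than a minute. When I then added 1 record linking one record in the first set to one record in the second set (so all 22 IDs are now related in some way), the query was still running after more than half an hour. In my real-world application I expect to process tens of thousands of records.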
(Working but slow) solution:
```sql
--create relations
drop table if exists #AllSimilarMinParent
create table #AllSimilarMinParent (ParentId bigint, ChildId bigint)
insert into #AllSimilarMinParent (ParentId, ChildId) values (10, 20)
insert into #AllSimilarMinParent (ParentId, ChildId) values (20, 30)
insert into #AllSimilarMinParent (ParentId, ChildId) values (30, 40)
insert into #AllSimilarMinParent (ParentId, ChildId) values (40, 50)
insert into #AllSimilarMinParent (ParentId, ChildId) values (39, 40)
insert into #AllSimilarMinParent (ParentId, ChildId) values (39, 51)
insert into #AllSimilarMinParent (ParentId, ChildId) values (49, 51)
insert into #AllSimilarMinParent (ParentId, ChildId) values (49, 61)
insert into #AllSimilarMinParent (ParentId, ChildId) values (59, 61)
insert into #AllSimilarMinParent (ParentId, ChildId) values (59, 71)
--build recursive CTE and put results into temp table
drop table if exists #RCTE
;
WITH RelationHierarchy as (
    SELECT ChildId, ParentId
    FROM #AllSimilarMinParent MP
)
, RCTE AS
(
    --recursive CTE to generate hierarchy
    SELECT ParentId, ChildId, 1 AS Lvl FROM RelationHierarchy
    UNION ALL
    SELECT rh.ParentId, rc.ChildId, Lvl+1 AS Lvl
    FROM RelationHierarchy rh
    INNER JOIN RCTE rc ON rh.ChildId = rc.ParentId
)
select F0.ParentId, F0.ChildId, F0.Lvl
INTO #RCTE
FROM RCTE F0
select * FROM #RCTE
--create all combinations (both directions)
drop table if exists #mytable2
create table #mytable2 (id int, groupid int)
insert into #mytable2 (id, groupid)
select ParentId, ChildId FROM #RCTE
UNION
select ChildId, ParentId FROM #RCTE
UNION
select ParentId, ParentId FROM #RCTE --also needed every ID to belong to itself as a group
UNION
select ChildId, ChildId FROM #RCTE --also needed every ID to belong to itself as a group
--suggested solution adapted from https://stackoverflow.com/questions/76272634/how-can-i-combine-group-identifiers-into-single-group?answertab=scoredesc#tab-top
;with
uniquenodes as (select distinct id from #mytable2)
, nodes as (
    select t.id, v.grp
    from uniquenodes t
    cross apply ( select groupid from #mytable2 t1 where t1.id = t.id ) v(grp)
)
,
edges as (
    select distinct n1.id as id1, n2.id as id2
    from nodes n1
    inner join nodes n2 on n1.grp = n2.grp
)
,
rec as (
    select id1, id2, cast(id1 as nvarchar(max)) as visited from edges
    union all
    select r.id1, e.id2, concat(r.visited, ',', e.id2)
    from rec r
    inner join edges e on e.id1 = r.id2
    where concat(',', r.visited, ',') not like concat('%,', e.id2, ',%')
)
,
fin as (
    select id1, min(value) min_id
    from rec r
    cross apply string_split(r.visited, ',')
    group by id1
)
select id1 as id, dense_rank() over(order by min_id) grp
from fin f
```
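This puts all 11 IDs in group 1, which does meet my needs. Initial testing suggests that changing the IDs gives the expected results (if I isolate an ID, or add new IDs that are related only to each other, a second group is created), but because of the performance problems I haven't tested it much.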
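The path-enumerating query above explores every route between IDs, which is a likely reason it slows down so sharply as groups grow. A different technique, swapped in here and not taken from the answer referenced above, is iterative minimum-label propagation for connected components: give every ID its own ID as a label, then repeatedly replace each label with the smallest label among its neighbours until a pass changes nothing. The number of passes should track the longest chain of links inside a group rather than the number of possible paths, but that expectation is untested against data of the size described. This is a sketch: the extra pair (100, 200) is hypothetical, added only to show unrelated IDs landing in a second group, and the temp table and index names are illustrative.

```sql
DROP TABLE IF EXISTS #AllSimilarMinParent;
CREATE TABLE #AllSimilarMinParent (ParentId bigint, ChildId bigint);
INSERT INTO #AllSimilarMinParent (ParentId, ChildId)
VALUES (10, 20), (20, 30), (30, 40), (40, 50), (39, 40),
       (39, 51), (49, 51), (49, 61), (59, 61), (59, 71),
       (100, 200);  -- hypothetical extra pair, unrelated to the rest

-- Undirected edge list: each relation can be followed in both directions.
DROP TABLE IF EXISTS #Edges;
SELECT ParentId AS FromId, ChildId AS ToId
INTO #Edges
FROM #AllSimilarMinParent
UNION
SELECT ChildId, ParentId FROM #AllSimilarMinParent;
CREATE CLUSTERED INDEX IX_Edges ON #Edges (FromId, ToId);

-- Every ID starts out labelled with its own ID.
DROP TABLE IF EXISTS #Label;
SELECT DISTINCT FromId AS Id, FromId AS GroupId
INTO #Label
FROM #Edges;
CREATE UNIQUE CLUSTERED INDEX IX_Label ON #Label (Id);

-- Replace each label with the smallest label among its neighbours until a
-- pass changes nothing; every group then carries its minimum ID as label.
DECLARE @Changed int = 1;
WHILE @Changed > 0
BEGIN
    UPDATE l
    SET GroupId = m.MinGroupId
    FROM #Label l
    INNER JOIN (
        SELECT e.FromId, MIN(n.GroupId) AS MinGroupId
        FROM #Edges e
        INNER JOIN #Label n ON n.Id = e.ToId
        GROUP BY e.FromId
    ) m ON m.FromId = l.Id
    WHERE m.MinGroupId < l.GroupId;

    SET @Changed = @@ROWCOUNT;
END;

-- id plus a 1-based group number.
SELECT Id AS id, DENSE_RANK() OVER (ORDER BY GroupId) AS grp
FROM #Label
ORDER BY grp, id;
```

The final SELECT returns the same id/grp shape as the adapted query: the 11 sample IDs in group 1 and the hypothetical 100 and 200 in group 2.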