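In 2012 I received help creating a Postgres pedigree process for plant breeding in "PostgreSQL Recursive via 2 parent/child tables". The pedigree parent/child hierarchy is defined by family/plant IDs. Each plant is "linked" to the family table through the "id_family" foreign key. Plants that are root-level parents have an id_family of 1, which maps to "NA", and the family is_root value is set to "Y". The problem is that you cannot tell from the path output which parent is the actual filial descendant.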
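Parent/child mapping use case: determine how F2+ parents (non-root level) relate to their children by following the ptst_plant id_family value back to the previous family, then prefix the child plant with an @ character to show that it is a progeny plant.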
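Parent/child mapping workflow via the ptst-pedigree.sql process:
The example below shows the desired @ character plant prefix:
F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)
F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)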
Below are the test tables/data, using PostgreSQL 16.6:
DROP TABLE if exists ptst_family CASCADE;
DROP TABLE if exists ptst_plant CASCADE;
DROP TABLE if exists ptst_pedigree CASCADE;
CREATE TABLE ptst_family (
id serial,
family_key VARCHAR(20) UNIQUE,
female_plant_id INTEGER NOT NULL DEFAULT 1,
male_plant_id INTEGER NOT NULL DEFAULT 1,
is_root VARCHAR NOT NULL DEFAULT '0', -- Root level families are always the first level pedigree (F1)
CONSTRAINT ptst_family_pk PRIMARY KEY (id)
);
CREATE TABLE ptst_plant (
id serial,
plant_key VARCHAR(20) UNIQUE,
id_family INTEGER NOT NULL,
CONSTRAINT ptst_plant_pk PRIMARY KEY (id),
CONSTRAINT ptst_plant_id_family_fk FOREIGN KEY(id_family) REFERENCES ptst_family(id)
);
CREATE TABLE ptst_pedigree (
id serial,
pedigree_key VARCHAR NOT NULL,
path VARCHAR NOT NULL UNIQUE
);
-- FAMILY Table DATA:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (1,'NA',1,1,'Y'); -- Default place holder record
-- F1 Root level Alba families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (2,'family1AA',2,3,'Y');
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (3,'family2AA',4,5,'Y');
-- F2 Hybrid Families
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (5,'family3AE',6,8,'N');
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (6,'family4AG',7,9,'N');
-- F3 Double Hybrid family:
insert into ptst_family (id, family_key, female_plant_id, male_plant_id, is_root) VALUES (9,'family5AEAG',10,11,'N');
-- PLANT Table DATA:
insert into ptst_plant (id, plant_key, id_family) VALUES (1,'NA',1); -- Default place holder record
insert into ptst_plant (id, plant_key, id_family) VALUES (2,'f1A',1);
insert into ptst_plant (id, plant_key, id_family) VALUES (3,'m2A',1);
insert into ptst_plant (id, plant_key, id_family) VALUES (4,'f3A',1);
insert into ptst_plant (id, plant_key, id_family) VALUES (5,'m4A',1);
-- Female Alba progeny:
insert into ptst_plant (id, plant_key, id_family) VALUES (6,'f7A',2);
insert into ptst_plant (id, plant_key, id_family) VALUES (7,'f8A',3);
-- Male/female Aspen Root level parents:
insert into ptst_plant (id, plant_key, id_family) VALUES (8,'m1E',1);
insert into ptst_plant (id, plant_key, id_family) VALUES (9,'m1G',1);
-- F1 Hybrid progeny:
insert into ptst_plant (id, plant_key, id_family) VALUES (10,'f1AE',5);
insert into ptst_plant (id, plant_key, id_family) VALUES (11,'m1AG',6);
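As a quick sanity check of the mapping rule described above, a query along these lines lists each non-root family together with the family each parent came from. This is only a sketch against the test tables above; the column aliases (female_parent, female_parent_family, and so on) are names I made up for illustration. A parent whose family is "NA" is a root-level plant, and any other parent is a progeny plant that should get the "@" prefix.

```sql
-- Sketch: show which parents of each F2+ family are themselves progeny.
-- Aliases are illustrative only; tables and columns come from the test schema.
SELECT
    f.family_key,
    pf.plant_key  AS female_parent,
    ff.family_key AS female_parent_family,  -- 'NA' means root-level parent
    pm.plant_key  AS male_parent,
    mf.family_key AS male_parent_family     -- 'NA' means root-level parent
FROM ptst_family f
JOIN ptst_plant  pf ON pf.id = f.female_plant_id
JOIN ptst_family ff ON ff.id = pf.id_family
JOIN ptst_plant  pm ON pm.id = f.male_plant_id
JOIN ptst_family mf ON mf.id = pm.id_family
WHERE f.is_root = 'N'
ORDER BY f.id;
```

With the test data, family5AEAG should show both parents as progeny (f1AE from family3AE and m1AG from family4AG), which is why its two paths need the "@" on different parents.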
Below is the pedigree.sql script developed in 2012:
WITH RECURSIVE expanded_family AS (
SELECT
f.id,
f.family_key,
pf.id_family pf_family,
pm.id_family pm_family,
f.is_root,
f.family_key || '=(' || pf.plant_key || ' x ' || pm.plant_key || ')' pretty_print
FROM ptst_family f
JOIN ptst_plant pf ON f.female_plant_id = pf.id
JOIN ptst_plant pm ON f.male_plant_id = pm.id
),
search_tree AS
(
SELECT
f.id,
f.family_key,
f.id family_root,
1 depth,
'>F1 ' || f.pretty_print path
FROM expanded_family f
WHERE
f.id != 1
AND f.is_root = 'Y'
UNION ALL
SELECT
f.id,
f.family_key,
st.family_root,
st.depth + 1,
st.path || ' >F' || st.depth+1 || ' ' || f.pretty_print
FROM search_tree st
JOIN expanded_family f
ON f.pf_family = st.id
OR f.pm_family = st.id
WHERE
f.id <> 1
)
SELECT
family_key,
path
FROM
(
SELECT
family_key,
rank() over (partition by family_root order by depth desc),
path
FROM search_tree
) AS ranked
-- WHERE rank = 1
WHERE path NOT LIKE '%(NA x NA)%' -- Remove rows with no filial output (the placeholder plant_key in this test data is 'NA')
ORDER BY family_key, path;
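One possible way to get the "@" prefix directly from the recursive query, rather than from a later pass, is to build the pretty-printed string inside the recursive step, where the previous family (st.id) is known. The sketch below is a variation on the script above, not a tested production version: it assumes the test tables as defined, and the aliases pf_key and pm_key are names introduced here for illustration. Each parent is prefixed only when its id_family equals the family of the previous step in that particular path.

```sql
-- Sketch: mark the progeny parent with '@' while the path is being built.
WITH RECURSIVE expanded_family AS (
    SELECT
        f.id,
        f.family_key,
        f.is_root,
        pf.plant_key AS pf_key,     -- female parent key (illustrative alias)
        pm.plant_key AS pm_key,     -- male parent key (illustrative alias)
        pf.id_family AS pf_family,  -- family the female parent came from
        pm.id_family AS pm_family   -- family the male parent came from
    FROM ptst_family f
    JOIN ptst_plant pf ON f.female_plant_id = pf.id
    JOIN ptst_plant pm ON f.male_plant_id = pm.id
),
search_tree AS (
    -- Anchor: root (F1) families, whose parents never get a prefix
    SELECT
        f.id,
        f.family_key,
        f.id AS family_root,
        1 AS depth,
        '>F1 ' || f.family_key || '=(' || f.pf_key || ' x ' || f.pm_key || ')' AS path
    FROM expanded_family f
    WHERE f.id <> 1
      AND f.is_root = 'Y'
    UNION ALL
    -- Recursive step: prefix whichever parent belongs to the previous family
    SELECT
        f.id,
        f.family_key,
        st.family_root,
        st.depth + 1,
        st.path || ' >F' || (st.depth + 1) || ' ' || f.family_key || '=('
            || CASE WHEN f.pf_family = st.id THEN '@' ELSE '' END || f.pf_key
            || ' x '
            || CASE WHEN f.pm_family = st.id THEN '@' ELSE '' END || f.pm_key
            || ')'
    FROM search_tree st
    JOIN expanded_family f
      ON f.pf_family = st.id
      OR f.pm_family = st.id
    WHERE f.id <> 1
)
SELECT family_key, path
FROM search_tree
ORDER BY family_key, path;
```

Because the CASE expressions compare against st.id, the same family can be printed differently in different paths: family5AEAG is reached once through family3AE and once through family4AG, so the "@" lands on a different parent in each row. Note that this query also returns the F1-only row for family1AA.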
Below is the desired output for my pedigree table, with the filial progeny prefixed by "@":
ID | Pedigree_key | Path
1 | family2AA | >F1 family2AA=(f3A x m4A)
2 | family3AE | >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E)
3 | family4AG | >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G)
4 | family5AEAG | >F1 family1AA=(f1A x m2A) >F2 family3AE=(@f7A x m1E) >F3 family5AEAG=(@f1AE x m1AG)
5 | family5AEAG | >F1 family2AA=(f3A x m4A) >F2 family4AG=(@f8A x m1G) >F3 family5AEAG=(f1AE x @m1AG)
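I found that I could solve this with a separate update script that alters the pedigree table path string: it searches the F2+ families using my plant view, finds/verifies the parent plant, and then adds the "@" character in front of the filial progeny. This was a valuable exercise, because it forced me to dig deeper, understand the problem from a different angle, and then prove it with a simple POC, as in the example below: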
Pedigree Path: >F1 41XAA91=(A10 x A73) >F2 99XAA10=(30AA5MF x AA4102)
psql r4p -c "select plant_key, family_key from avw_plant where family_key = '41XAA91' and plant_key = 'AA4102';"
Updated Path: >F1 41XAA91=(A10 x A73) >F2 99XAA10=(30AA5MF x @AA4102)
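For illustration, the single POC step above could be written as one guarded UPDATE. This is a rough sketch under assumptions, not the actual update script: it uses ptst_pedigree as a stand-in for the real pedigree table, assumes the avw_plant view exposes plant_key and family_key as in the psql command above, and only handles a progeny plant in the male (second) position, as in this one example.

```sql
-- Sketch: prefix the male parent with '@' only after verifying, via the plant
-- view, that it is a progeny of the F1 family named in the path.
-- ptst_pedigree is a stand-in table name here (assumption).
UPDATE ptst_pedigree
SET path = replace(path, ' x AA4102)', ' x @AA4102)')
WHERE path LIKE '%>F1 41XAA91=(%'   -- path starts from the F1 family
  AND path LIKE '% x AA4102)%'      -- plant is the male parent and not yet prefixed
  AND EXISTS (
      SELECT 1
      FROM avw_plant
      WHERE family_key = '41XAA91'
        AND plant_key  = 'AA4102'
  );
```

The second LIKE no longer matches once the "@" has been added, so re-running the statement should leave an already updated path alone. A full script would still need to repeat this for every F2+ family and for the female position.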