Multiple array_agg() calls in a single query

Question (votes: 0, answers: 2)

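I'm trying to accomplish something with my query, but it isn't really working. My application used to have a MongoDB database, so the application is used to getting arrays in a field. Now we have had to switch to Postgres, and I don't want to change my application code, so that v1 keeps working.

To get arrays in one field in Postgres, I used the array_agg() function. That has worked fine so far. However, I now need another array in another field, from a different table.

For example:

I have my employees. Employees have multiple addresses and multiple working days.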

SELECT name, age, array_agg(ad.street) FROM employees e 
JOIN address ad ON e.id = ad.employeeid
GROUP BY name, age

This works fine for me and results in, for example:

| name  | age | array_agg(ad.street)     |
| peter | 25  | {1st street, 2nd street} |

Now I want to join another table for the working days, so I did this:

SELECT name, age, array_agg(ad.street), array_agg(wd.day) FROM employees e 
JOIN address ad ON e.id = ad.employeeid 
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age

This results in:

| peter | 25 | {1st street, 1st street, 1st street, 1st street, 1st street, 2nd street, 2nd street, 2nd street, 2nd street, 2nd street}| {Monday,Tuesday,Wednesday,Thursday,Friday,Monday,Tuesday,Wednesday,Thursday,Friday}

But I need it to result in:

| peter | 25 | {1st street, 2nd street}| {Monday,Tuesday,Wednesday,Thursday,Friday}

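I understand this has to do with my joins: joining several tables that each have multiple rows multiplies the rows. But I don't know how to accomplish what I need. Can anyone give me the right hint?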

sql arrays postgresql aggregate-functions
2 Answers

36 votes

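DISTINCT is often applied to repair queries that are rotten from the inside, and that is often expensive and/or incorrect. Don't multiply rows to begin with, then you don't have to fold unwanted duplicates at the end.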
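To illustrate the "incorrect" part, here is a minimal sketch. It runs against SQLite through Python's sqlite3 module rather than Postgres, and it uses group_concat() as a stand-in for array_agg(), since SQLite has no array type. The city column and the sample rows are hypothetical: they assume an employee can have two different addresses that share a street name.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address     (employeeid INTEGER, city TEXT, street TEXT);
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter');
    -- Hypothetical data: two different addresses with the same street name.
    INSERT INTO address VALUES (1, 'Springfield', 'Main street'),
                               (1, 'Shelbyville', 'Main street');
    INSERT INTO workingdays VALUES (1, 'Monday'), (1, 'Tuesday');
""")

# DISTINCT applied after the multiplying joins (group_concat stands in for array_agg).
distinct_streets = con.execute("""
    SELECT group_concat(DISTINCT ad.street)
    FROM   employees e
    JOIN   address     ad ON ad.employeeid = e.id
    JOIN   workingdays wd ON wd.employeeid = e.id
    GROUP  BY e.id
""").fetchone()[0].split(",")

# Aggregating the address table on its own keeps both entries.
true_streets = con.execute("""
    SELECT group_concat(street) FROM address WHERE employeeid = 1
""").fetchone()[0].split(",")

print(len(distinct_streets), len(true_streets))  # 1 2
```

DISTINCT removes the duplicates introduced by the join, but it also removes the legitimate second "Main street". Whether that matters depends on the data.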

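Joining to multiple n-tables ("has many") at once multiplies rows in the result set. That is effectively a CROSS JOIN, a Cartesian product by proxy.

There are various ways to avoid this mistake.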
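The multiplication can be checked with a small sketch. It uses Python's sqlite3 module with an in-memory SQLite database instead of Postgres, which is an assumption made only to keep the example self-contained; the sample rows mirror the question (2 addresses, 5 working days).

```python
import sqlite3

# Hypothetical sample data mirroring the question: 2 addresses, 5 working days.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees   (id INTEGER PRIMARY KEY, name TEXT, age INTEGER);
    CREATE TABLE address     (employeeid INTEGER, street TEXT);
    CREATE TABLE workingdays (employeeid INTEGER, day TEXT);
    INSERT INTO employees VALUES (1, 'peter', 25);
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');
    INSERT INTO workingdays VALUES
        (1, 'Monday'), (1, 'Tuesday'), (1, 'Wednesday'), (1, 'Thursday'), (1, 'Friday');
""")

# Joining both "has many" tables at once pairs every address with every working day.
joined_rows = con.execute("""
    SELECT count(*)
    FROM   employees e
    JOIN   address     ad ON ad.employeeid = e.id
    JOIN   workingdays wd ON wd.employeeid = e.id
""").fetchone()[0]

# Counting each child table on its own gives the true numbers.
address_rows, day_rows = con.execute("""
    SELECT (SELECT count(*) FROM address     WHERE employeeid = e.id),
           (SELECT count(*) FROM workingdays WHERE employeeid = e.id)
    FROM   employees e
""").fetchone()

print(joined_rows, address_rows, day_rows)  # 10 2 5
```

The 10 joined rows are exactly 2 x 5, which matches the ten-element arrays shown in the question.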

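Aggregate first, join later

Technically, the query works as long as you join to only one table with multiple rows at a time before you aggregate: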

SELECT e.id, e.name, e.age, e.streets, array_agg(wd.day) AS days
FROM  (
   SELECT e.id, e.name, e.age, array_agg(ad.street) AS streets
   FROM   employees e 
   JOIN   address  ad ON ad.employeeid = e.id
   GROUP  BY e.id  -- PK covers whole row
   ) e
JOIN   workingdays wd ON wd.employeeid = e.id
GROUP  BY e.id, e.name, e.age;

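It is also best to include the primary key id in GROUP BY, because name and age are not necessarily unique. Otherwise you might merge employees by mistake.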

But it is better to aggregate in a subquery before the join. That is superior when there are no selective WHERE conditions on employees:

SELECT e.id, e.name, e.age, ad.streets, array_agg(wd.day) AS days
FROM   employees e
JOIN  (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN   workingdays wd ON e.id = wd.employeeid
GROUP  BY e.id, ad.streets;

Or aggregate both:

SELECT name, age, ad.streets, wd.days
FROM   employees e
JOIN  (
   SELECT employeeid, array_agg(street) AS streets
   FROM   address
   GROUP  BY 1
   ) ad ON ad.employeeid = e.id
JOIN  (
   SELECT employeeid, array_agg(day) AS days
   FROM   workingdays
   GROUP  BY 1
   ) wd ON wd.employeeid = e.id;

The last one is typically faster if you retrieve all or most of the rows in the base tables.

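Note that using JOIN instead of LEFT JOIN removes employees from the result that have no rows in address or no rows in workingdays. That may or may not be intended. Switch to LEFT JOIN to retain all employees in the result.

Correlated subqueries / JOIN LATERAL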

For selective filters on employees, consider correlated subqueries:

SELECT name, age
    , (SELECT array_agg(street) FROM address WHERE employeeid = e.id) AS streets
    , (SELECT array_agg(day) FROM workingdays WHERE employeeid = e.id) AS days
FROM   employees e
WHERE  e.name = 'peter';  -- very selective

Or LATERAL subqueries:

SELECT e.name, e.age, a.streets, w.days
FROM   employees e
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT street
      FROM   address
      WHERE  employeeid = e.id
      )
   ) a(streets)
CROSS  JOIN LATERAL (
   SELECT ARRAY(
      SELECT day
      FROM   workingdays
      WHERE  employeeid = e.id
      )
   ) w(days)
WHERE  e.name = 'peter';  -- very selective

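See:

  • What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
  • Why is array_agg() slower than the non-aggregate ARRAY() constructor?

The last two queries retain all qualifying employees in the result.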
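A minimal sketch of that difference, again run against SQLite through Python's sqlite3 module, with group_concat() standing in for array_agg() (both are assumptions for the sake of a self-contained example). The employee 'mary', who has no address rows, is hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE address   (employeeid INTEGER, street TEXT);
    -- Hypothetical data: mary has no address rows.
    INSERT INTO employees VALUES (1, 'peter'), (2, 'mary');
    INSERT INTO address VALUES (1, '1st street'), (1, '2nd street');
""")

# Plain JOIN to the pre-aggregated subquery: employees without addresses drop out.
inner_names = [row[0] for row in con.execute("""
    SELECT e.name
    FROM   employees e
    JOIN  (
       SELECT employeeid, group_concat(street) AS streets
       FROM   address
       GROUP  BY employeeid
       ) ad ON ad.employeeid = e.id
    ORDER  BY e.id
""")]

# Correlated subquery: every employee is kept, with NULL where nothing matches.
correlated = con.execute("""
    SELECT e.name
         , (SELECT group_concat(street) FROM address WHERE employeeid = e.id) AS streets
    FROM   employees e
    ORDER  BY e.id
""").fetchall()

print(inner_names)                        # ['peter']
print([name for name, _ in correlated])   # ['peter', 'mary']
print(correlated[1][1])                   # None
```

In Postgres, array_agg() in a correlated subquery likewise yields NULL when no rows match, while the ARRAY() constructor yields an empty array.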

2 votes

Whenever you need values without duplicates, use DISTINCT, like this:

SELECT name, age, array_agg(DISTINCT ad.street), array_agg(DISTINCT wd.day)
FROM employees e
JOIN address ad ON e.id = ad.employeeid
JOIN workingdays wd ON e.id = wd.employeeid
GROUP BY name, age

