Better approach than multiple SELECT statements?

Question (votes: 0, answers: 5)

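I'm creating a web application that displays a pie chart. To fetch all the data for the chart from a PostgreSQL 9.3 database in a single HTTP request, I combine multiple SELECT statements with UNION ALL. Here is a portion: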

SELECT 'spf' as type, COUNT(*)
    FROM (SELECT cai.id
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          JOIN quizzes_quiz q ON q.id = cai.activity_id
          WHERE cai.end_time::date = '2015-09-12'
          AND q.name != 'Exit Ticket Quiz'
          AND cai.activity_type = 'QZ'
          AND (cas.key = 'disable_student_nav' AND cas.value = 'True'
            OR cas.key = 'pacing' AND cas.value = 'student')
          GROUP BY cai.id
          HAVING COUNT(cai.id) = 2) sub
UNION ALL
SELECT 'spn' as type, COUNT(*)
    FROM common_activityinstance cai
    JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
    JOIN common_activitysetting cas ON cas.id = cais.id
    WHERE cai.end_time::date = '2015-09-12'
    AND cai.activity_type = 'QZ'
    AND cas.key = 'disable_student_nav'
    AND cas.value = 'False'
UNION ALL
SELECT 'tp' as type, COUNT(*)
    FROM (SELECT cai.id 
          FROM common_activityinstance cai
          JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
          JOIN common_activitysetting cas ON cas.id = cais.id
          WHERE cai.end_time::date = '2015-09-12'
          AND cai.activity_type = 'QZ'
          AND cas.key = 'pacing' AND cas.value = 'teacher') sub;

This produces a nice, small response to send back to the client:

 type |  count 
------+---------
 spf  |  100153
 spn  |   96402
 tp   |   84211

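I'm wondering whether my query can be made more efficient. Each SELECT statement uses mostly the same JOIN operations. Is there a way to avoid repeating the JOINs for each new SELECT?
I'd actually prefer a single row with 3 columns.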

Or, more generally, is there some entirely different approach that is better than what I'm doing?

sql postgresql select common-table-expression postgresql-performance
5 answers
2 votes

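You can bundle most of the cost in a single main query in a CTE and reuse the result multiple times.
This returns a single row with three columns, each named after its type (as requested in the comments):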

WITH cte AS (
   SELECT cai.id, cai.activity_id, cas.key, cas.value
   FROM   common_activityinstance cai
   JOIN   common_activityinstance_settings s ON s.activityinstance_id = cai.id
   JOIN   common_activitysetting cas ON cas.id = s.id
   WHERE  cai.end_time::date = '2015-09-12'   -- problem?
   AND    cai.activity_type = 'QZ'
   AND   (cas.key = 'disable_student_nav' AND cas.value IN ('True', 'False') OR
          cas.key = 'pacing' AND cas.value IN ('student', 'teacher'))
   )
SELECT *
FROM  (
   SELECT count(*) AS spf
   FROM  (
      SELECT c.id
      FROM   cte c
      JOIN   quizzes_quiz q ON q.id = c.activity_id
      WHERE  q.name <> 'Exit Ticket Quiz'
      AND   (c.key, c.value) IN (('disable_student_nav', 'True')
                               , ('pacing', 'student'))
      GROUP  BY 1
      HAVING count(*) = 2
      ) sub
   ) spf
,  (
   SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
        , count(key = 'pacing' AND value = 'teacher' OR NULL) AS tp
   FROM   cte
   ) spn_tp;
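A minimal, self-contained sketch of the query's shape, run on hypothetical inline rows instead of the real tables (the CTE here already carries a `quiz_name` column, standing in for the join to quizzes_quiz). It shows the two building blocks: the spf step keeps only instances where both settings match (GROUP BY ... HAVING count(*) = 2, which assumes each key/value pair appears at most once per instance), and `count(condition OR NULL)` counts only rows where the condition is true, because `FALSE OR NULL` is NULL and count() skips NULLs.

```sql
-- Hypothetical inline rows standing in for the joined CTE result;
-- quiz_name replaces the join to quizzes_quiz for brevity.
WITH cte(id, quiz_name, key, value) AS (
   VALUES (1, 'Quiz A', 'disable_student_nav', 'True')
        , (1, 'Quiz A', 'pacing', 'student')               -- id 1 has both: counts as spf
        , (2, 'Exit Ticket Quiz', 'disable_student_nav', 'True')
        , (2, 'Exit Ticket Quiz', 'pacing', 'student')     -- excluded from spf by name
        , (3, 'Quiz B', 'disable_student_nav', 'False')    -- spn
        , (3, 'Quiz B', 'pacing', 'teacher')               -- tp
        , (4, 'Quiz C', 'pacing', 'teacher')               -- tp
   )
SELECT *
FROM  (
   SELECT count(*) AS spf
   FROM  (
      SELECT id
      FROM   cte
      WHERE  quiz_name <> 'Exit Ticket Quiz'
      AND   (key, value) IN (('disable_student_nav', 'True')
                           , ('pacing', 'student'))
      GROUP  BY id
      HAVING count(*) = 2          -- both settings present for this instance
      ) sub
   ) spf
,  (
   -- "condition OR NULL" is TRUE or NULL; count() ignores the NULLs
   SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL) AS spn
        , count(key = 'pacing' AND value = 'teacher' OR NULL)            AS tp
   FROM   cte
   ) spn_tp;
-- returns one row: spf = 1, spn = 1, tp = 2
```

As in the answer, spn and tp are counted straight from the CTE, without the Exit Ticket filter, matching the original UNION ALL query.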

Should work with Postgres 9.3. In Postgres 9.4, use the new aggregate FILTER clause instead:

  count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn
, count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher') AS tp
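A quick check, on hypothetical inline rows, that the two syntax variants count the same rows. FILTER requires PostgreSQL 9.4 or later; the `OR NULL` form also runs on 9.3.

```sql
-- Hypothetical inline rows; FILTER requires PostgreSQL 9.4+
WITH cte(key, value) AS (
   VALUES ('disable_student_nav', 'False')
        , ('disable_student_nav', 'True')
        , ('pacing', 'teacher')
        , ('pacing', 'teacher')
        , ('pacing', 'student')
   )
SELECT count(key = 'disable_student_nav' AND value = 'False' OR NULL)          AS spn_or_null
     , count(*) FILTER (WHERE key = 'disable_student_nav' AND value = 'False') AS spn_filter
     , count(key = 'pacing' AND value = 'teacher' OR NULL)                     AS tp_or_null
     , count(*) FILTER (WHERE key = 'pacing' AND value = 'teacher')            AS tp_filter
FROM   cte;
-- returns: spn_or_null = 1, spn_filter = 1, tp_or_null = 2, tp_filter = 2
```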

Details for both syntax variants:

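The condition marked problem? can be a big performance issue, depending on the data type of cai.end_time. For one, it is not sargable. And if it is of type timestamptz, the expression is hard to index, because the result depends on the current time zone setting of the session, which also leads to different results when the query is executed in different time zones.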
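A small sketch of that time-zone dependence, using an arbitrary instant: the same timestamptz value casts to different dates depending on the session's TimeZone setting. For the same reason, an expression index on `end_time::date` is rejected for a timestamptz column, since the cast is not IMMUTABLE.

```sql
-- One fixed instant (23:30 UTC); its ::date depends on the session TimeZone
SET TIME ZONE 'UTC';
SELECT '2015-09-12 23:30+00'::timestamptz::date;   -- 2015-09-12

SET TIME ZONE 'Europe/Vienna';                      -- UTC+2 in September 2015 (CEST)
SELECT '2015-09-12 23:30+00'::timestamptz::date;   -- 2015-09-13
```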

Compare:

You just have to name the time zone that defines your dates. Taking my time zone in Vienna as an example:

WHERE  cai.end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna' 
AND    cai.end_time <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
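A sketch on a hypothetical temp table (`ai_demo`, standing in for common_activityinstance) showing that these bounds select exactly one calendar day in Vienna, regardless of the session time zone. Because the predicate compares `end_time` itself rather than an expression on it, a plain B-tree index on `end_time` is usable.

```sql
-- Hypothetical stand-in for common_activityinstance
CREATE TEMP TABLE ai_demo (id int, end_time timestamptz);
CREATE INDEX ON ai_demo (end_time);   -- plain B-tree index the range predicate can use

INSERT INTO ai_demo VALUES
   (1, '2015-09-11 21:59+00')   -- 23:59 on Sep 11 in Vienna: excluded
 , (2, '2015-09-11 22:00+00')   -- 00:00 on Sep 12 in Vienna: included
 , (3, '2015-09-12 21:59+00')   -- 23:59 on Sep 12 in Vienna: included
 , (4, '2015-09-12 22:00+00');  -- 00:00 on Sep 13 in Vienna: excluded

SELECT count(*)
FROM   ai_demo
WHERE  end_time >= '2015-09-12 0:0'::timestamp AT TIME ZONE 'Europe/Vienna'
AND    end_time <  '2015-09-13 0:0'::timestamp AT TIME ZONE 'Europe/Vienna';
-- returns 2, whatever the session's TimeZone setting
```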

You can also provide plain timestamptz values. You could even use:

WHERE  cai.end_time >= '2015-09-12'::date
AND    cai.end_time <  '2015-09-12'::date + 1

But the first variant does not depend on the current time zone setting.
Detailed explanation in the links above.
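Why the date-based variant does depend on the setting: when a date is compared with a timestamptz, the date is taken as midnight in the session's time zone. A quick illustration:

```sql
-- The date literal becomes midnight in whatever the session time zone is
SET TIME ZONE 'UTC';
SELECT '2015-09-12'::date::timestamptz;   -- 2015-09-12 00:00:00+00

SET TIME ZONE 'Europe/Vienna';
SELECT '2015-09-12'::date::timestamptz;   -- 2015-09-12 00:00:00+02 (= 2015-09-11 22:00 UTC)

SELECT '2015-09-12'::date + 1;            -- 2015-09-13 (date + integer yields a date)
```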

Now the query can use an index on end_time, and it should be much faster if there are many distinct dates in the table.


1 vote

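This is just a sketch of a completely different approach: construct a boolean "hypercube" of all the conditions you need in your "crosstab". The logic for selecting or aggregating subsets can be done later (for example, suppressing exit tickets, whose business logic is unclear to me).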


SELECT DISTINCT not_exit, disabled, pacing
    , COUNT(*) AS the_count
    FROM (SELECT DISTINCT cai.id
          , EXISTS (SELECT *
            FROM quizzes_quiz q 
            WHERE q.id = cai.activity_id AND q.name != 'Exit Ticket Quiz'
            ) AS not_exit
          , EXISTS ( SELECT *
            FROM common_activityinstance_settings cais  
            JOIN common_activitysetting cas ON cas.id = cais.id
            WHERE cai.id = cais.activityinstance_id
            AND cas.key = 'disable_student_nav' AND cas.value = 'True'
            ) AS disabled
          , EXISTS ( SELECT *
            FROM common_activityinstance_settings cais 
            JOIN common_activitysetting cas ON cas.id = cais.id
            WHERE cai.id = cais.activityinstance_id
            AND cas.key = 'pacing' AND cas.value = 'student'
            ) AS pacing
          FROM common_activityinstance cai
          WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ'
    ) my_cube
GROUP BY 1,2,3
ORDER BY 1,2,3
  ;

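Final note: this approach is based on my assumption that the underlying data model is really an EAV model, and that each attribute can occur at most once per student.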
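The cube pays off only if the chart slices can be read off it afterwards. A minimal sketch, with hypothetical per-instance flags in place of `my_cube`, of collapsing the cube counts into one slice (here the all-true combination, which corresponds to spf):

```sql
-- Hypothetical per-instance flags, in the shape my_cube produces
WITH my_cube(id, not_exit, disabled, pacing) AS (
   VALUES (1, true,  true,  true)
        , (2, true,  true,  false)
        , (3, false, true,  true)    -- an exit-ticket quiz
        , (4, true,  true,  true)
   )
, cube_counts AS (                   -- the "hypercube": one row per flag combination
   SELECT not_exit, disabled, pacing, count(*) AS the_count
   FROM   my_cube
   GROUP  BY 1, 2, 3
   )
-- read one chart slice off the cube afterwards
SELECT sum(the_count) AS spf
FROM   cube_counts
WHERE  not_exit AND disabled AND pacing;
-- returns spf = 2 (instances 1 and 4)
```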


0 votes

This is a partial answer. The latter two can be combined into one query:

SELECT (case when key = 'disable_student_nav' then 'spn' 
             when key = 'pacing' then 'tp'
        end) as type, COUNT(*)
FROM common_activityinstance cai JOIN
     common_activityinstance_settings cais
     ON cai.id = cais.activityinstance_id JOIN
     common_activitysetting cas
     ON cas.id = cais.id
WHERE cai.end_time::date = '2015-09-12' AND cai.activity_type = 'QZ' AND
      (key, value) in (('disable_student_nav', 'False'), ('pacing', 'teacher'))
GROUP BY type

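I wonder whether there is a way to fit the first group into similar logic. For example, if the QZ condition could be applied to all three groups, adding the first group would be easy.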
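A minimal run of the two-group query above on hypothetical inline rows, showing how the row-constructor IN list filters (key, value) pairs and how the CASE label becomes the grouping key:

```sql
-- Hypothetical inline rows standing in for the joined settings
WITH s(key, value) AS (
   VALUES ('disable_student_nav', 'False')
        , ('disable_student_nav', 'True')    -- filtered out by the IN list
        , ('pacing', 'teacher')
        , ('pacing', 'teacher')
        , ('pacing', 'student')              -- filtered out by the IN list
   )
SELECT CASE WHEN key = 'disable_student_nav' THEN 'spn'
            WHEN key = 'pacing'              THEN 'tp'
       END AS type
     , count(*)
FROM   s
WHERE  (key, value) IN (('disable_student_nav', 'False'), ('pacing', 'teacher'))
GROUP  BY type                               -- refers to the CASE output column
ORDER  BY type;
-- returns two rows: spn = 1, tp = 2
```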


0 votes

You can use case with the conditions from each type's where clause. However, the having condition of the first query is not enforced this way.

select type, count(*) as count
from (
   SELECT cai.id,
          case when (q.name != 'Exit Ticket Quiz' AND key = 'disable_student_nav' AND value = 'True')
                 OR (key = 'pacing' AND value = 'student') then 'spf'
               when key = 'disable_student_nav' AND value = 'False' then 'spn'
               when key = 'pacing' AND value = 'teacher' then 'tp'
          end as type
   FROM common_activityinstance cai
   JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
   JOIN common_activitysetting cas ON cas.id = cais.id
   JOIN quizzes_quiz q ON q.id = cai.activity_id
   WHERE cai.end_time::date = '2015-09-12'
   AND q.name != 'Exit Ticket Quiz'
   AND cai.activity_type = 'QZ'
) t
group by type
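To make that caveat concrete: classifying setting rows with CASE counts rows, not activity instances, so an instance carrying both spf settings is counted twice, and one with only one of them is still counted. A sketch on hypothetical inline rows comparing that with the question's GROUP BY ... HAVING count(*) = 2 check:

```sql
-- Hypothetical rows: instance 1 has both spf settings, instance 2 only one
WITH t(id, key, value) AS (
   VALUES (1, 'disable_student_nav', 'True')
        , (1, 'pacing', 'student')
        , (2, 'pacing', 'student')
   )
SELECT
   -- per-row CASE classification, as in the answer above
   (SELECT count(CASE WHEN key = 'disable_student_nav' AND value = 'True'
                        OR key = 'pacing' AND value = 'student' THEN 1 END)
    FROM   t) AS spf_per_row
   -- per-instance check, as in the question
 , (SELECT count(*)
    FROM  (SELECT id
           FROM   t
           WHERE  (key, value) IN (('disable_student_nav', 'True'), ('pacing', 'student'))
           GROUP  BY id
           HAVING count(*) = 2) sub) AS spf_per_instance;
-- returns spf_per_row = 3, spf_per_instance = 1
```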

-1 votes

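No, there is no way to make the query more efficient. You could set up a view or whatever, but it will always have to run three times. You can, however, work around that by doing some post-processing in PHP, PL/SQL or the like. Start with a simpler query like this: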

SELECT COUNT(*), cai.id, q.name, key, value
FROM common_activityinstance cai
JOIN common_activityinstance_settings cais ON cai.id = cais.activityinstance_id
JOIN common_activitysetting cas ON cas.id = cais.id
JOIN quizzes_quiz q ON q.id = cai.activity_id  -- needed because q.name is selected
WHERE cai.end_time::date = '2015-09-12'
GROUP BY cai.id, q.name, key, value

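...It's not clear to me from your explanation whether this produces a reasonable number of output rows. But assuming it does, write some code to massage them into the shape you want.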

© www.soinside.com 2019 - 2024. All rights reserved.