I have a sample table (> 20M rows) like this:
CREATE TABLE T (
D DATE,
V INT
);
INSERT INTO T VALUES ('2024-07-01', 1), ('2024-07-02', 2), ('2024-07-02', 3);
I have an existing query that takes about 5 seconds:
SELECT D, 'SUM', SUM(V)
FROM T
GROUP BY D
UNION
SELECT D, 'AVG', AVG(V)
FROM T
GROUP BY D
ORDER BY 1;
It needs to produce this output:
Column A | Column B | Column C |
---|---|---|
2024-07-01 | SUM | 1.0 |
2024-07-01 | AVG | 1.0 |
2024-07-02 | SUM | 5.0 |
2024-07-02 | AVG | 2.5 |
To avoid scanning the table more than once, I rewrote it like this (about 1 second):
SELECT D, SUM(V), AVG(V)
FROM T
GROUP BY D;
I need to keep the original output format and can only use a single query, so I tried a common table expression:
WITH CTE AS (
SELECT D, SUM(V) AS S, AVG(V) AS A
FROM T
GROUP BY D
)
SELECT D, 'SUM', S
FROM CTE
UNION
SELECT D, 'AVG', A
FROM CTE
ORDER BY 1;
But the table is still scanned twice, and the query still takes about 5 seconds:
select_type | table |
---|---|
PRIMARY | <derived2> |
DERIVED | T |
UNION | <derived4> |
DERIVED | T |
UNION RESULT |
Is there a way to do this in a single query with only one scan of the table?
Edit:
I found a solution using JSON, but I don't like it:
WITH CTE AS (
SELECT D, JSON_ARRAY(SUM(V), AVG(V)) AS data
FROM T
GROUP BY D
)
SELECT
c.D,
CASE WHEN JT.Id = 1 THEN 'SUM'
WHEN JT.Id = 2 THEN 'AVG'
END AS F,
JT.N
FROM CTE c,
JSON_TABLE(c.data, '$[*]'
COLUMNS(
Id for ordinality,
N FLOAT PATH '$[0]'
)
) AS JT;
I would use a subselect for this:
select D, rowtype, if(rowtype='SUM',sum_v,avg_v) value
from (
select D, sum(V) sum_v, avg(V) avg_v
from T
group by D
) sum_or_avg
join (select 'SUM' rowtype union all select 'AVG') rowtype
order by D, rowtype desc
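The join above has no ON clause, so MySQL treats it as a cross join: each per-date aggregate row is paired with the two label rows, and T is aggregated only once inside the derived table. As a minimal sketch of the same idea in more portable syntax, assuming the table T from the question (the aliases agg, labels, and val are made up for illustration and are not from the original post), an explicit CROSS JOIN with a CASE expression could look like this:

```sql
-- Sketch only: the same one-scan unpivot, written with CROSS JOIN and CASE.
-- The aliases agg, labels, and val are illustrative names.
SELECT agg.D,
       labels.rowtype,
       CASE labels.rowtype
            WHEN 'SUM' THEN agg.sum_v   -- SUM(V) for this date
            ELSE agg.avg_v              -- AVG(V) for this date
       END AS val
FROM (
    -- T is read and aggregated once here
    SELECT D, SUM(V) AS sum_v, AVG(V) AS avg_v
    FROM T
    GROUP BY D
) AS agg
CROSS JOIN (
    -- two constant rows, one per output label
    SELECT 'SUM' AS rowtype
    UNION ALL
    SELECT 'AVG'
) AS labels
ORDER BY agg.D, labels.rowtype DESC;  -- 'SUM' sorts before 'AVG' when descending
```

With the three sample rows from the question this should return four rows, a SUM row and an AVG row for each of the two dates. Prefixing the statement with EXPLAIN is one way to check that T appears only once in the plan on your server version.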