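Note: this concerns ColumnStore.
At work we have a large SQL statement that needs too much memory to run in production. I am currently trying to reduce the amount of memory the query consumes. I have tried several approaches, but for some reason nothing except
WITH ... AS (...)
has worked so far. However, I need to combine it with INSERT INTO ...
This is the code I am trying to run: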
TRUNCATE db1.myTable;
INSERT INTO db1.myTable(`all`, `needed`, `columns`)
(WITH everything AS (
SELECT all, needed, columns
FROM db1.mainTable T1
JOIN db1.secondTable T2
ON (T1.someCol = T2.someCol)
JOIN db2.thirdTable T3
ON (T1.anotherCol = T3.anotherCol)
LEFT JOIN db1.fourthTable T4
ON (T4.anotherCol = T1.anotherCol)
WHERE T2.yetAnotherCol >= (some_SELECT_subquery)
AND T1.valid = 1
) SELECT * FROM everything);
EXPLAIN (WITH everything AS ...
returns:
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
| 1 | PRIMARY | <derived2> | ALL | NULL | NULL | NULL | NULL | 16000000000000 | |
| 2 | PRIMARY | T1 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where with pushed condition |
| 2 | PRIMARY | T2 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
| 2 | PRIMARY | T3 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where; Using join buffer (flat, BNL join) |
| 2 | PRIMARY | T4 | ALL | NULL | NULL | NULL | NULL | 2000 | Using where |
| 3 | SUBQUERY | some_SELECT_subquery | ALL | NULL | NULL | NULL | NULL | 2000 | Using where with pushed condition |
+------+-------------+-----------------------+------+---------------+------+---------+------+------+-------------------------------------------------+
5 rows in set (0,21 sec)
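If I use only the WITH statement, that is, without INSERT INTO, it works without any problems, and the query is also faster that way. I also ran a quick test in which I tried to split the query into several WITH clauses, but I gave up because I believe I got the syntax wrong. I am not very good at SQL, and even worse at JOINs (junior developer).
When I combine the WITH statement with INSERT INTO ..., MariaDB responds with:
ERROR 1064 (42000) at line 3: You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near ') SELECT * FROM everything)' at line 1
I also tried adding a semicolon after ... valid = 1, merging the last two lines, putting the opening parenthesis after ... AS on a new line, and a few other things that I thought might be syntax-related. No luck.
My current thinking is that INSERT INTO ... SELECT ... cannot be combined with WITH ..., at least not with WITH at the start, in the position where SELECT is expected. That is what I could gather from the docs.
So, in short, my question is: can I combine INSERT INTO ... SELECT with a WITH statement? If not, is there another technique that achieves a similar effect?
Are there other ways to improve the query's memory usage? I would rather not touch the MariaDB or Docker configuration options, but if that is the only possibility, I will consider it.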
Have you tried this?
TRUNCATE db1.myTable;
WITH everything AS (
SELECT all, needed, columns
FROM db1.mainTable T1
JOIN db1.secondTable T2
ON (T1.someCol = T2.someCol)
JOIN db2.thirdTable T3
ON (T1.anotherCol = T3.anotherCol)
LEFT JOIN db1.fourthTable T4
ON (T4.anotherCol = T1.anotherCol)
WHERE T2.yetAnotherCol >= (some_SELECT_subquery)
AND T1.valid = 1
) INSERT INTO db1.myTable SELECT * FROM everything;
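Although I never found an answer to the original question, we decided to work around the problem by reducing the amount of data collected in the subquery. I did not mention this in the original question because it was not a solution I knew of when I posted it. We simply call the SQL from a Python script, which loops over the number of weeks we want to fetch.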
WHERE T2.ID >= (SELECT ID - {week_number} FROM db1.secondTable WHERE NOW() BETWEEN monday AND sunday) AND T1.valid = 1);
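The week-by-week loop described above could look roughly like the following. This is only a sketch under stated assumptions: `sqlite3` stands in for the MariaDB connection so that it runs anywhere, and the table names, the `week` column, and the `load_in_batches` helper are hypothetical, not taken from the real schema. It also binds the week number as a query parameter instead of formatting it into the SQL string.

```python
import sqlite3


def load_in_batches(conn, weeks):
    """Copy rows one week at a time so each INSERT ... SELECT stays small.

    Hypothetical schema: main_table(id, week, valid) and my_table(id, week).
    """
    conn.execute("DELETE FROM my_table")  # SQLite has no TRUNCATE
    for week_number in range(weeks):
        # The week number is bound as a parameter, not pasted into the SQL text.
        conn.execute(
            "INSERT INTO my_table (id, week) "
            "SELECT id, week FROM main_table "
            "WHERE week = ? AND valid = 1",
            (week_number,),
        )
        conn.commit()  # commit per batch so no single transaction grows large


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (id INTEGER, week INTEGER, valid INTEGER)")
    conn.execute("CREATE TABLE my_table (id INTEGER, week INTEGER)")
    conn.executemany(
        "INSERT INTO main_table VALUES (?, ?, ?)",
        [(1, 0, 1), (2, 0, 0), (3, 1, 1), (4, 2, 1)],
    )
    load_in_batches(conn, weeks=2)
    print(conn.execute("SELECT COUNT(*) FROM my_table").fetchone()[0])
```

Each iteration touches only one week's rows, which is the point of the workaround: the intermediate result the server has to hold is bounded by the batch size rather than by the whole table.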
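I know this is a very old question, and I assume the OP has already solved it, but I am posting an answer in case it helps others who come looking.
There are a few issues with the original query that I would change in order to optimize it.
To make WITH, INSERT, and SELECT work together, I would rewrite the query like this. I leave the decision on the optimization suggestions to the OP.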
TRUNCATE db1.myTable;
INSERT INTO db1.myTable(`all`, `needed`, `columns`)
SELECT all, needed, columns FROM
(
WITH everything AS (
SELECT all, needed, columns
FROM db1.mainTable T1
JOIN db1.secondTable T2
ON (T1.someCol = T2.someCol)
JOIN db2.thirdTable T3
ON (T1.anotherCol = T3.anotherCol)
LEFT JOIN db1.fourthTable T4
ON (T4.anotherCol = T1.anotherCol)
WHERE T2.yetAnotherCol >= (some_SELECT_subquery)
AND T1.valid = 1
)
SELECT * FROM everything
) s1
;
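Another shape that might be worth trying keeps the CTE as part of the SELECT: INSERT INTO target (columns) WITH ... SELECT ..., with no parentheses around the WITH. As far as I know, MariaDB accepts this form for regular tables, but I have not verified it on ColumnStore, so treat that as an assumption. The sketch below only checks the statement shape against SQLite through Python's `sqlite3` module, since the thread mentions calling the SQL from Python. The table names, columns, and the `refill` helper are hypothetical stand-ins for the real schema.

```python
import sqlite3

# WITH is placed after the INSERT target and column list, with no wrapping
# parentheses. Table and column names are hypothetical stand-ins.
INSERT_WITH_CTE = """
INSERT INTO my_table (a, b)
WITH everything AS (
    SELECT t1.a, t2.b
    FROM main_table t1
    JOIN second_table t2 ON t1.k = t2.k
    WHERE t1.valid = 1
)
SELECT a, b FROM everything
"""


def refill(conn):
    """Empty my_table, then repopulate it from the CTE in one statement."""
    conn.execute("DELETE FROM my_table")  # SQLite has no TRUNCATE
    conn.execute(INSERT_WITH_CTE)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE main_table (k INTEGER, a TEXT, valid INTEGER)")
    conn.execute("CREATE TABLE second_table (k INTEGER, b TEXT)")
    conn.execute("CREATE TABLE my_table (a TEXT, b TEXT)")
    conn.executemany("INSERT INTO main_table VALUES (?, ?, ?)",
                     [(1, "x", 1), (2, "y", 0)])
    conn.executemany("INSERT INTO second_table VALUES (?, ?)",
                     [(1, "p"), (2, "q")])
    refill(conn)
    print(conn.execute("SELECT a, b FROM my_table").fetchall())
```

If the server rejects this shape, the derived-table rewrite shown above remains the fallback.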