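I have a batch DuckDB script that sets a custom memory_limit. However, when it hits an OOM on one statement, it does not seem to exit. Instead it carries on and tries the remaining statements in the batch script (all of which fail or are pointless, because the earlier statement failed). This leaves very confusing messages in my log file, and it could cause real errors if, say, it failed on an UPDATE command but then went on to execute the rest of the file.
Is there a way to force DuckDB to exit immediately if a statement fails (such as the OOM above)?
Example script: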
SET memory_limit = '100GB';
SET threads TO 64;
CREATE TABLE summary AS
SELECT
    point_id,
    COUNT(pred_occ) AS ensemble_support,
    AVG(pred_range) AS pat_mean,
    QUANTILE_CONT(pred_occ, [0.1, 0.5, 0.9]) AS occurrence_quantiles,
    QUANTILE_CONT(pred_count, [0.1, 0.5, 0.9]) AS count_quantiles,
    QUANTILE_CONT(pred_occ * pred_count, [0.1, 0.5, 0.9]) AS abundance_quantiles
FROM predictions
GROUP BY point_id;
CREATE TABLE erd AS
SELECT
    CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS checklist_id,
    SPLIT_PART(point_id, '-', 1) AS type,
    ensemble_support, pat_mean,
    occurrence_quantiles[2] AS occurrence_median,
    occurrence_quantiles[1] AS occurrence_lower,
    occurrence_quantiles[3] AS occurrence_upper,
    count_quantiles[2] AS count_median,
    count_quantiles[1] AS count_lower,
    count_quantiles[3] AS count_upper,
    abundance_quantiles[2] AS abundance_median,
    abundance_quantiles[1] AS abundance_lower,
    abundance_quantiles[3] AS abundance_upper
FROM summary
WHERE point_id LIKE 'test-%';
COPY (
    SELECT
        s.*,
        e.latitude, e.longitude, e.year, e.day_of_year, e.observer_id
    FROM erd as s
    INNER JOIN '{input_erd_pq}' as e
        ON s.checklist_id = e.checklist_id
) TO '{predictions_erd_pq}' (FORMAT 'parquet');
COPY (
    SELECT
        CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS srd_id,
        CAST(NULLIF(SPLIT_PART(point_id, '-', 3), '') AS INTEGER) AS day_of_year,
        ensemble_support, pat_mean,
        occurrence_quantiles[2] AS occurrence_median,
        occurrence_quantiles[1] AS occurrence_lower,
        occurrence_quantiles[3] AS occurrence_upper,
        count_quantiles[2] AS count_median,
        count_quantiles[1] AS count_lower,
        count_quantiles[3] AS count_upper,
        abundance_quantiles[2] AS abundance_median,
        abundance_quantiles[1] AS abundance_lower,
        abundance_quantiles[3] AS abundance_upper
    FROM summary
    WHERE point_id LIKE 'srd-%') TO '{predictions_srd_pq}' (FORMAT 'parquet');
Error log:
Out of Memory Error: Failed to allocate block of 2048 bytes (bad allocation)
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15: FROM summary
^
Catalog Error: Table with name erd does not exist!
Did you mean "temp.information_schema.tables"?
LINE 5: FROM erd as s
^
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15: FROM summary
^
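It looks to me as though it tried to run all 4 commands, even though each one depended on a previous command that had failed!
Use the -bail flag of the duckdb CLI.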
$ duckdb -help 2>&1 | grep bail
-bail stop after hitting an error
Tested on DuckDB 1.0.0, using a.sql:
set memory_limit = '1mb';
select 1;
select * from range(10000) a, range(10000) b order by a.range + b.range;
select 2;
(-csv is used only to keep the output concise)
$ duckdb -csv < a.sql
1
1
Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
2
2
$ duckdb -csv -bail < a.sql
1
1
Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
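For a batch job, a small wrapper along the following lines might make the failure visible to whatever launches the script. This is only a sketch under assumptions: run_batch is a hypothetical helper name, the optional DUCKDB variable is an illustrative way to point at the binary, and it assumes the CLI exits with a non-zero status when -bail stops the script, which is worth confirming on your version.

```shell
#!/bin/sh
# Hypothetical helper: run a SQL batch file through the duckdb CLI with -bail
# and propagate the CLI's exit status to the caller.
# Assumption: duckdb returns a non-zero status when -bail stops on an error.
run_batch() {
    # $1 is the path to the SQL script; DUCKDB may override the binary name.
    if "${DUCKDB:-duckdb}" -bail < "$1"; then
        echo "batch ok"
    else
        status=$?
        echo "batch failed with exit status $status" >&2
        return "$status"
    fi
}
```

Calling run_batch a.sql from a job script and checking its return value lets the surrounding pipeline stop as well, rather than relying on reading the log afterwards.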