Exit a DuckDB batch script on OOM

Question (votes: 0, answers: 1)

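I have a batch DuckDB script that sets a custom memory_limit. However, when one statement runs out of memory (OOM), DuckDB doesn't seem to exit. Instead it carries on and attempts the remaining statements in the script, which then all fail or are meaningless because the earlier statement failed. This leaves very confusing messages in my log file, and it could be misleading if, say, it fails on an UPDATE command but then keeps executing the rest of the file.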

If a statement fails (e.g. with the OOM above), is there a way to force DuckDB to exit immediately?

Example script:

SET memory_limit = '100GB';
SET threads TO 64;

CREATE TABLE summary AS
  SELECT
    point_id,
    COUNT(pred_occ) AS ensemble_support,
    AVG(pred_range) AS pat_mean,
    QUANTILE_CONT(pred_occ, [0.1, 0.5, 0.9]) AS occurrence_quantiles,
    QUANTILE_CONT(pred_count, [0.1, 0.5, 0.9]) AS count_quantiles,
    QUANTILE_CONT(pred_occ * pred_count, [0.1, 0.5, 0.9]) AS abundance_quantiles
  FROM predictions
  GROUP BY point_id;

CREATE TABLE erd AS
  SELECT
    CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS checklist_id,
    SPLIT_PART(point_id, '-', 1) AS type,
    ensemble_support, pat_mean,
    occurrence_quantiles[2] AS occurrence_median,
    occurrence_quantiles[1] AS occurrence_lower,
    occurrence_quantiles[3] AS occurrence_upper,
    count_quantiles[2] AS count_median,
    count_quantiles[1] AS count_lower,
    count_quantiles[3] AS count_upper,
    abundance_quantiles[2] AS abundance_median,
    abundance_quantiles[1] AS abundance_lower,
    abundance_quantiles[3] AS abundance_upper
  FROM summary
  WHERE point_id LIKE 'test-%';

COPY (
  SELECT
    s.*,
    e.latitude, e.longitude, e.year, e.day_of_year, e.observer_id
  FROM erd as s
  INNER JOIN '{input_erd_pq}' as e
    ON s.checklist_id = e.checklist_id
) TO '{predictions_erd_pq}' (FORMAT 'parquet');

COPY (
  SELECT
    CAST(SPLIT_PART(point_id, '-', 2) AS BIGINT) AS srd_id,
    CAST(NULLIF(SPLIT_PART(point_id, '-', 3), '') AS INTEGER) AS day_of_year,
    ensemble_support, pat_mean,
    occurrence_quantiles[2] AS occurrence_median,
    occurrence_quantiles[1] AS occurrence_lower,
    occurrence_quantiles[3] AS occurrence_upper,
    count_quantiles[2] AS count_median,
    count_quantiles[1] AS count_lower,
    count_quantiles[3] AS count_upper,
    abundance_quantiles[2] AS abundance_median,
    abundance_quantiles[1] AS abundance_lower,
    abundance_quantiles[3] AS abundance_upper
  FROM summary
  WHERE point_id LIKE 'srd-%') TO '{predictions_srd_pq}' (FORMAT 'parquet');

Error log:

Out of Memory Error: Failed to allocate block of 2048 bytes (bad allocation)
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15:   FROM summary
                ^
Catalog Error: Table with name erd does not exist!
Did you mean "temp.information_schema.tables"?
LINE 5:   FROM erd as s
               ^
Catalog Error: Table with name summary does not exist!
Did you mean "temp.information_schema.schemata"?
LINE 15:   FROM summary
                ^

It looks to me like it tried to run all 4 commands, even though each preceding command had failed!

sql out-of-memory duckdb
1 Answer

Votes: 0

Use the -bail flag of the duckdb CLI.

$ duckdb -help 2>&1 | grep bail
   -bail                stop after hitting an error
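The same switch can also be turned on from inside the script: the DuckDB CLI's dot commands, inherited from the SQLite shell, include .bail on|off (worth confirming with .help on your version). That keeps the fail-fast setting attached to the SQL file rather than to whoever invokes it. Below is a minimal Python sketch that prepends the directive to a script's text. with_bail and BAIL_DIRECTIVE are hypothetical names, and the directive only has an effect when the text is run through the duckdb shell, not through a client library.

```python
BAIL_DIRECTIVE = ".bail on"


def with_bail(script: str) -> str:
    """Return `script` with the DuckDB CLI dot command `.bail on` as its first line."""
    stripped = script.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    if first_line == BAIL_DIRECTIVE:
        return script  # already enabled, don't add it twice
    # Dot commands are read by the CLI itself: each sits alone on its own line
    # and takes no trailing semicolon.
    return BAIL_DIRECTIVE + "\n" + script
```

Saving with_bail(open("a.sql").read()) to a new file and feeding that file to duckdb on stdin should then stop at the first error, like passing -bail does.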

Tested on DuckDB 1.0.0 with the following a.sql:

set memory_limit = '1mb';

select 1;

select * from range(10000) a, range(10000) b order by a.range + b.range;

select 2;

(-csv is used only to keep the output short.)

$ duckdb -csv < a.sql
1
1
Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
2
2
$ duckdb -csv -bail < a.sql
1
1
Out of Memory Error: could not allocate block of size 256.0 KiB (784.0 KiB/976.5 KiB used)
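If the batch is launched from a program rather than by hand, the wrapper can pass -bail and then refuse to continue when DuckDB reports an error. The '{input_erd_pq}'-style placeholders in the question suggest Python string templating, but that is an assumption. Below is a minimal sketch, assuming the duckdb binary is on PATH and that the CLI writes its error messages to stderr. build_command and run_batch are hypothetical names. Because the exact exit-code behavior may differ between versions, the sketch treats either a non-zero exit status or the word "Error" on stderr as a failure.

```python
import subprocess
from typing import List, Optional


def build_command(db_path: Optional[str] = None, duckdb_bin: str = "duckdb") -> List[str]:
    """Command line for the DuckDB CLI with -bail, so it stops at the first error."""
    cmd = [duckdb_bin, "-bail"]
    if db_path is not None:
        cmd.append(db_path)  # options go before the database file; omit for in-memory
    return cmd


def run_batch(script: str, db_path: Optional[str] = None, duckdb_bin: str = "duckdb") -> str:
    """Feed `script` to the DuckDB CLI on stdin and raise if any statement failed."""
    result = subprocess.run(
        build_command(db_path, duckdb_bin),
        input=script,
        capture_output=True,
        text=True,
    )
    # Check both signals instead of relying on a single exit-code convention.
    if result.returncode != 0 or "Error" in result.stderr:
        raise RuntimeError(
            f"DuckDB batch stopped (exit code {result.returncode}):\n{result.stderr}"
        )
    return result.stdout
```

For example, run_batch(open("a.sql").read(), "predictions.duckdb") (the database file name is a placeholder) should raise RuntimeError carrying DuckDB's message after the first failing statement, instead of letting later statements write confusing follow-on errors into the log.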