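What is the best way to split records into batches of a predefined size? I want to tag each record with a batch/bucket number for further processing.
For example, given 1110 records and a batch/bucket size of 200, I should end up with:
Background/context: each batch is then processed by an external service, and each service has its own maximum allowed batch size.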
I found a way to do this with the NTILE window function:
WITH records AS (
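    SELECT RANDOM() AS value
    FROM TABLE(GENERATOR(ROWCOUNT => 1110))
),
batches AS (
    SELECT
        value,
        -- Number of buckets: CEIL(1110 / 200) = CEIL(5.55) = 6.
        -- CEIL makes the rounding explicit instead of relying on an implicit cast.
        NTILE(
            CEIL((SELECT COUNT(*) FROM records) / 200)
        ) OVER (
            ORDER BY NULL
        ) AS batch
    FROM records
)
SELECT batch, COUNT(*)
FROM batches
GROUP BY ALL;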
Result:
My first approach used ROW_NUMBER(), but it is more expensive (although the result is the same):
WITH records AS (
    SELECT RANDOM() AS value
    FROM TABLE(GENERATOR(ROWCOUNT => 1110))
),
batches AS (
    SELECT
        value,
        ROW_NUMBER() OVER (
            ORDER BY NULL
        ) AS rn,
        -- Round-robin assignment: batch numbers run from 0 to CEIL(count / 200) - 1.
        (rn % CEIL((SELECT COUNT(*) FROM records) / 200)) AS batch
    FROM records
)
SELECT batch, COUNT(*)
FROM batches
GROUP BY ALL;
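For comparison, here is a minimal sketch of a third variant. Both queries above spread the rows evenly across the buckets; this one fills each batch up to the limit before starting the next. It reuses the same generated test data and the same hard-coded size of 200. The column alias records_in_batch is only illustrative. This is one possible formulation under those assumptions, not necessarily the cheapest one.

```sql
WITH records AS (
    SELECT RANDOM() AS value
    FROM TABLE(GENERATOR(ROWCOUNT => 1110))
),
batches AS (
    SELECT
        value,
        -- Rows 1..200 map to batch 1, rows 201..400 to batch 2, and so on.
        -- Subtracting 1 first keeps row 200 in batch 1 rather than batch 2.
        FLOOR((ROW_NUMBER() OVER (ORDER BY NULL) - 1) / 200) + 1 AS batch
    FROM records
)
SELECT batch, COUNT(*) AS records_in_batch
FROM batches
GROUP BY batch
ORDER BY batch;
```

With 1110 rows this should give five batches of 200 rows and a sixth batch with the remaining 110. No batch can exceed the limit, but the last batch is usually smaller than the others, and it avoids the extra COUNT(*) subquery.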