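We use Prisma to access a Postgres database. The query below runs very slowly. A simple change to the SQL (shown below) makes it several orders of magnitude faster. Is there a way to tell Prisma to use the modified version? I want to avoid the slowdown, preferably without resorting to raw SQL.
The Prisma query: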
this.prismaService.users.findMany({
  where: {
    events: {
      // they have not had activity since timeWindowEnd
      none: {
        timestamp: {
          gte: timeWindowEnd,
        },
      },
    },
  },
});
This generates the following SQL:
SELECT *
FROM "public"."users" AS "t1"
WHERE (
  ("t1"."id") NOT IN (
    SELECT "t5"."user_id"
    FROM "public"."events" AS "t5"
    WHERE (
      "t5"."timestamp" >= '2024-07-01T11:46:00'
      AND "t5"."user_id" IS NOT NULL
    )
    -- GROUP BY "t5"."user_id" -- This would fix the slowdown
  )
)
ORDER BY "t1"."id" ASC
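When executing the query, Postgres materializes the subquery and rescans the result for every candidate user row. Because a user can have thousands of events, this is very expensive.
If I modify the subquery by adding the GROUP BY that is commented out above, the intermediate result is much smaller and the query finishes almost instantly.
Is there a way to get the fast behavior without using raw SQL?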
Edit: adding the query plans for both versions.
This is the slow version. Note that it materializes around 150k intermediate rows:
Index Scan using users_pkey on public.users t1 (cost=3255.46..247110031.77 rows=7422 width=1751) (actual time=119.698..218304.278 rows=13094 loops=1)
Output: t1.id, [... 17 columns omitted]
Filter: (NOT (SubPlan 1))
Rows Removed by Filter: 1750
Buffers: shared hit=10733, temp read=3342203 written=242
SubPlan 1
-> Materialize (cost=3255.17..36189.31 rows=143833 width=4) (actual time=0.001..8.240 rows=131478 loops=14844)
Output: t5.user_id
Buffers: shared hit=5577, temp read=3342203 written=242
-> Bitmap Heap Scan on public.events t5 (cost=3255.17..34908.15 rows=143833 width=4) (actual time=5.537..21.790 rows=141324 loops=1)
Output: t5.user_id
Recheck Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
Filter: (t5.user_id IS NOT NULL)
Heap Blocks: exact=5054
Buffers: shared hit=5577
-> Bitmap Index Scan on events_timestamp_index (cost=0.00..3219.21 rows=143838 width=0) (actual time=5.013..5.013 rows=141324 loops=1)
Index Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
Buffers: shared hit=523
Planning Time: 0.183 ms
JIT:
Functions: 10
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 0.941 ms, Inlining 6.450 ms, Optimization 42.365 ms, Emission 22.165 ms, Total 71.920 ms
Execution Time: 218312.152 ms
This is the faster version, with GROUP BY:
Index Scan using users_pkey on public.users t1 (cost=35374.63..36142.55 rows=7422 width=1751) (actual time=57.621..62.388 rows=13094 loops=1)
Output: t1.id, [... 17 columns omitted]
Filter: (NOT (hashed SubPlan 1))
Rows Removed by Filter: 1750
Buffers: shared hit=10733
SubPlan 1
-> HashAggregate (cost=35267.73..35353.02 rows=8529 width=4) (actual time=56.818..57.077 rows=1750 loops=1)
Output: t5.user_id
Group Key: t5.user_id
Batches: 1 Memory Usage: 529kB
Buffers: shared hit=5577
-> Bitmap Heap Scan on public.events t5 (cost=3255.17..34908.15 rows=143833 width=4) (actual time=8.642..36.639 rows=141324 loops=1)
Output: t5.id, [... 16 columns omitted]
Recheck Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
Filter: (t5.user_id IS NOT NULL)
Heap Blocks: exact=5054
Buffers: shared hit=5577
-> Bitmap Index Scan on events_timestamp_index (cost=0.00..3219.21 rows=143838 width=0) (actual time=7.907..7.907 rows=141324 loops=1)
Index Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
Buffers: shared hit=523
Planning Time: 0.185 ms
Execution Time: 63.114 ms
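We ended up using the following query with a correlated subquery. It seems to perform even better than NOT IN with GROUP BY. The index on user ID + timestamp suggested by @FrankHeikens also helps the correlated subquery (but not the original version).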
SELECT u.*
FROM users u
WHERE 1 = 1
--they have not had activity since timeWindowEnd
AND NOT EXISTS (
SELECT 1
FROM events e3
WHERE e3.user_id = u.id
AND e3.timestamp >= '2024-07-01T11:46:00'
)
;
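For anyone who wants to run the NOT EXISTS version through Prisma, here is a minimal sketch using the $queryRaw tagged template. The function name findInactiveUsers, the RawQueryClient interface and the UserRow shape are made up for illustration; the interface only stands in for the part of the real client that is used. Depending on the column type, the bound timestamp may need an explicit cast.

```typescript
// Minimal structural type for the single client method this sketch needs.
// A real PrismaClient exposes $queryRaw as a tagged template function.
interface RawQueryClient {
  $queryRaw<T = unknown>(query: TemplateStringsArray, ...values: unknown[]): Promise<T>;
}

// Hypothetical row shape; the real users table has more columns.
interface UserRow {
  id: number;
}

// Returns users with no events at or after timeWindowEnd, using NOT EXISTS.
async function findInactiveUsers(
  client: RawQueryClient,
  timeWindowEnd: Date,
): Promise<UserRow[]> {
  // The interpolated value is passed to the tag as a separate argument,
  // so it can be sent as a bind parameter instead of being spliced into the SQL text.
  return client.$queryRaw<UserRow[]>`
    SELECT u.*
    FROM users u
    WHERE NOT EXISTS (
      SELECT 1
      FROM events e
      WHERE e.user_id = u.id
        AND e.timestamp >= ${timeWindowEnd}
    )
    ORDER BY u.id ASC
  `;
}
```

In our service this would be called as findInactiveUsers(this.prismaService, timeWindowEnd), assuming prismaService is (or extends) the Prisma client.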
There is an open Prisma issue about optimizing the some and none relation filters: https://github.com/prisma/prisma/issues/24524
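Until that issue is resolved, one way to stay inside the Prisma query API might be to split the work into two queries: a groupBy on events to get the distinct active user IDs (which should produce a GROUP BY in SQL), followed by a findMany on users with notIn. This is only a sketch, not something from our final solution. The model and field names (events, users, user_id, timestamp) are assumptions taken from the generated SQL, and the two small interfaces merely stand in for the real Prisma delegates.

```typescript
// Stand-ins for the two Prisma model delegates used below.
// Names are assumptions; the actual schema may map fields differently.
interface EventsDelegate {
  groupBy(args: {
    by: ["user_id"];
    where: { timestamp: { gte: Date }; user_id: { not: null } };
  }): Promise<Array<{ user_id: number | null }>>;
}

interface UsersDelegate {
  findMany(args: {
    where: { id: { notIn: number[] } };
    orderBy: { id: "asc" };
  }): Promise<Array<{ id: number }>>;
}

// Returns users that have no event at or after timeWindowEnd.
async function findInactiveUsersTwoStep(
  db: { events: EventsDelegate; users: UsersDelegate },
  timeWindowEnd: Date,
): Promise<Array<{ id: number }>> {
  // Step 1: one row per user that had activity at or after timeWindowEnd.
  const active = await db.events.groupBy({
    by: ["user_id"],
    where: { timestamp: { gte: timeWindowEnd }, user_id: { not: null } },
  });
  const activeIds = active
    .map((row) => row.user_id)
    .filter((id): id is number => id !== null);

  // Step 2: every user whose ID is not in the active set.
  return db.users.findMany({
    where: { id: { notIn: activeIds } },
    orderBy: { id: "asc" },
  });
}
```

The trade-off is two round trips and an ID list sent over the wire, so this only makes sense while the active set stays small (1,750 users in the plan above). The two queries are also not atomic unless they run inside a transaction.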