Avoid large intermediate results with a relation condition

Question

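We use Prisma to access a Postgres database. The query below runs very slowly. A simple change to the SQL (shown below) makes it several orders of magnitude faster. Is there a way to tell Prisma to use the modified version? I want to avoid the slowdown, preferably without resorting to raw SQL.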

Prisma query:

this.prismaService.users.findMany({
  where: {
    events: {
      // they have not had activity since timeWindowEnd
      none: {
        timestamp: {
          gte: timeWindowEnd,
        },
      },
    },
  },
});

This generates the following SQL:

SELECT *
FROM "public"."users" AS "t1"
WHERE (
        ("t1"."id") NOT IN (
            SELECT "t5"."user_id"
            FROM "public"."events" AS "t5"
            WHERE (
                    "t5"."timestamp" >= '2024-07-01T11:46:00'
                    AND "t5"."user_id" IS NOT NULL
                )
            -- GROUP BY "t5"."user_id"  -- This would fix the slowdown
        )
    )
ORDER BY "t1"."id" ASC

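When executing the query, Postgres materializes the subquery. Because a single user can have thousands of events, the materialized result is large and scanning it is very expensive.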

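If I modify the subquery by adding the commented-out GROUP BY shown above, the intermediate result is much smaller and the query completes almost instantly.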

Is there a way to get the fast behavior without using raw SQL?

Edit: adding the query plans for both versions.

This is the slow version. Note that it materializes roughly 150k intermediate rows:

 Index Scan using users_pkey on public.users t1  (cost=3255.46..247110031.77 rows=7422 width=1751) (actual time=119.698..218304.278 rows=13094 loops=1)
   Output: t1.id, [... 17 columns omitted]
   Filter: (NOT (SubPlan 1))
   Rows Removed by Filter: 1750
   Buffers: shared hit=10733, temp read=3342203 written=242
   SubPlan 1
     ->  Materialize  (cost=3255.17..36189.31 rows=143833 width=4) (actual time=0.001..8.240 rows=131478 loops=14844)
           Output: t5.user_id
           Buffers: shared hit=5577, temp read=3342203 written=242
           ->  Bitmap Heap Scan on public.events t5  (cost=3255.17..34908.15 rows=143833 width=4) (actual time=5.537..21.790 rows=141324 loops=1)
                 Output: t5.user_id
                 Recheck Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
                 Filter: (t5.user_id IS NOT NULL)
                 Heap Blocks: exact=5054
                 Buffers: shared hit=5577
                 ->  Bitmap Index Scan on events_timestamp_index  (cost=0.00..3219.21 rows=143838 width=0) (actual time=5.013..5.013 rows=141324 loops=1)
                       Index Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
                       Buffers: shared hit=523
 Planning Time: 0.183 ms
 JIT:
   Functions: 10
   Options: Inlining true, Optimization true, Expressions true, Deforming true
   Timing: Generation 0.941 ms, Inlining 6.450 ms, Optimization 42.365 ms, Emission 22.165 ms, Total 71.920 ms
 Execution Time: 218312.152 ms

This is the faster version, using GROUP BY:

 Index Scan using users_pkey on public.users t1  (cost=35374.63..36142.55 rows=7422 width=1751) (actual time=57.621..62.388 rows=13094 loops=1)
   Output: t1.id, [... 17 columns omitted]
   Filter: (NOT (hashed SubPlan 1))
   Rows Removed by Filter: 1750
   Buffers: shared hit=10733
   SubPlan 1
     ->  HashAggregate  (cost=35267.73..35353.02 rows=8529 width=4) (actual time=56.818..57.077 rows=1750 loops=1)
           Output: t5.user_id
           Group Key: t5.user_id
           Batches: 1  Memory Usage: 529kB
           Buffers: shared hit=5577
           ->  Bitmap Heap Scan on public.events t5  (cost=3255.17..34908.15 rows=143833 width=4) (actual time=8.642..36.639 rows=141324 loops=1)
                 Output: t5.id, [... 16 columns omitted]
                 Recheck Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
                 Filter: (t5.user_id IS NOT NULL)
                 Heap Blocks: exact=5054
                 Buffers: shared hit=5577
                 ->  Bitmap Index Scan on events_timestamp_index  (cost=0.00..3219.21 rows=143838 width=0) (actual time=7.907..7.907 rows=141324 loops=1)
                       Index Cond: (t5."timestamp" >= '2024-07-01 11:46:00'::timestamp without time zone)
                       Buffers: shared hit=523
 Planning Time: 0.185 ms
 Execution Time: 63.114 ms
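
The two plans differ mainly in how the subquery result is probed. In the slow plan the Materialize node is rescanned once per user (loops=14844) and returns about 131,478 rows per rescan on average, which is roughly 14,844 × 131,478 ≈ 1.95 billion rows read from the materialized result. In the fast plan the HashAggregate runs once, produces 1,750 distinct user IDs, and the filter becomes a hashed SubPlan lookup. The following TypeScript sketch imitates that difference on synthetic data. It is a simplified model, not a description of Postgres internals, and the names notInByScan, notInByHash, and the data sizes are made up for illustration.

```typescript
// Simplified model of the two plans. This is NOT how Postgres is implemented;
// it only illustrates why the probe strategy dominates the runtime.

type EventRow = { userId: number };

// Slow plan: for every user, scan the materialized list of event user_ids
// until a match is found (or the list is exhausted).
function notInByScan(userIds: number[], events: EventRow[]) {
  let comparisons = 0;
  const result = userIds.filter((id) => {
    for (const e of events) {
      comparisons++;
      if (e.userId === id) return false; // user has recent activity
    }
    return true; // no matching event: user passes the NOT IN filter
  });
  return { result, comparisons };
}

// Fast plan: deduplicate once (the GROUP BY / HashAggregate step),
// then do one hash lookup per user.
function notInByHash(userIds: number[], events: EventRow[]) {
  const active = new Set(events.map((e) => e.userId));
  const result = userIds.filter((id) => !active.has(id));
  return { result, probes: userIds.length, distinct: active.size };
}

// Synthetic data (hypothetical sizes): 1,000 users, of which the first 100
// each have 80 recent events.
const userIds = Array.from({ length: 1000 }, (_, i) => i + 1);
const events: EventRow[] = [];
for (let u = 1; u <= 100; u++) {
  for (let n = 0; n < 80; n++) events.push({ userId: u });
}

const slow = notInByScan(userIds, events);
const fast = notInByHash(userIds, events);
console.log("scan:", slow.result.length, "users,", slow.comparisons, "comparisons");
console.log("hash:", fast.result.length, "users,", fast.probes, "probes,", fast.distinct, "distinct ids");
```

Both functions return the same set of users; only the amount of work differs, which matches the gap between the two execution times above.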

Answer 1

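We ended up using the following query with a correlated subquery. It seems to perform even better than NOT IN plus GROUP BY. The index on user ID + timestamp suggested by @FrankHeikens also helps the correlated subquery (but does not help the original version).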

SELECT u.*
FROM users u
WHERE 1 = 1
  --they have not had activity since timeWindowEnd
  AND NOT EXISTS (
    SELECT 1
    FROM events e3
    WHERE e3.user_id = u.id
      AND e3.timestamp >= '2024-07-01T11:46:00'
  )
;

There is an open Prisma issue about optimizing the "some" and "none" relation filters: https://github.com/prisma/prisma/issues/24524
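
The answer does not say how the NOT EXISTS query is executed from application code. One option, sketched here under assumptions, is Prisma's $queryRaw tagged template, which sends interpolated values as bound parameters. The RawQueryClient interface, the UserRow type, and the findInactiveUsers function are hypothetical names introduced so the sketch compiles without @prisma/client; a real PrismaClient instance is assumed to satisfy the interface.

```typescript
// Minimal structural type so the sketch compiles on its own.
// Assumption: a real PrismaClient is passed here and provides $queryRaw
// as a tagged template function.
interface RawQueryClient {
  $queryRaw<T = unknown>(query: TemplateStringsArray, ...values: unknown[]): Promise<T>;
}

// Hypothetical row shape; the real users table has more columns.
type UserRow = { id: number };

// Returns users that have no events at or after timeWindowEnd,
// using the correlated NOT EXISTS form from the answer.
function findInactiveUsers(client: RawQueryClient, timeWindowEnd: Date): Promise<UserRow[]> {
  return client.$queryRaw<UserRow[]>`
    SELECT u.*
    FROM users u
    WHERE NOT EXISTS (
      SELECT 1
      FROM events e
      WHERE e.user_id = u.id
        AND e."timestamp" >= ${timeWindowEnd}
    )
    ORDER BY u.id ASC
  `;
}
```

This does use raw SQL, which the question hoped to avoid, so it is best seen as a stopgap while the relation filters still generate the NOT IN form.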
