How to get parent rows together with all of their child rows and aggregate the children into an array?


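There is a transactions table and a logs table. logs is linked to transactions via transaction_id. I need to query logs by address, join them with transactions, aggregate the logs into an array, limit the number of transactions (LIMIT 2 in the example) and fetch all logs belonging to each of those transactions (while filtering on only the one address field).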
In the simplified example below, logs.transaction_hash references transactions.hash, which is a varchar.

create table transactions
(hash varchar,
 t_value varchar
);
 
insert into transactions values 
('h1','v1'),
('h2','v2'),
('h3','v3'),
('h4','v4'),
('h5','v5')
;

create table logs
(transaction_hash varchar,
 address varchar,
 l_value varchar
);
 
insert into logs values 
('h1', 'a1', 'h1.a1.1'),
('h1', 'a1', 'h1.a1.2'),
('h1', 'a3', 'h1.a3.1'),
('h2', 'a1', 'h2.a1.1'),
('h2', 'a2', 'h2.a2.1'),
('h2', 'a2', 'h2.a2.2'),
('h2', 'a3', 'h2.a3.1'),
('h3', 'a2', 'h3.a2.1'),
('h4', 'a1', 'h4.a1.1'),
('h5', 'a2', 'h5.a2.1'),
('h5', 'a3', 'h5.a3.1')
;


create index on transactions(hash);
create index on logs(address);

The expected result for a query with WHERE logs.address = 'a2' LIMIT 2:

hash    t_value  logs_array
h2      v2       {"{"address" : "a1", "l_value" : "h2.a1.1"}","{"address" : "a2", "l_value" : "h2.a2.1"}","{"address" : "a2", "l_value" : "h2.a2.2"}","{"address" : "a3", "l_value" : "h2.a3.1"}"}
h3      v3       {"{"address" : "a2", "l_value" : "h3.a2.1"}"}
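To make the intended semantics explicit independently of SQL, the sketch below recomputes the expected result in plain Python from the sample rows. The function name transactions_with_all_logs is hypothetical, and it walks transactions in insertion order (the SQL has no ORDER BY, so PostgreSQL guarantees no particular order). A transaction qualifies if any of its logs has the address, every log of a qualifying transaction is returned, and the limit counts transactions, not log rows.

```python
from collections import defaultdict

# Sample data copied from the CREATE TABLE / INSERT statements above.
TRANSACTIONS = {"h1": "v1", "h2": "v2", "h3": "v3", "h4": "v4", "h5": "v5"}
LOGS = [
    ("h1", "a1", "h1.a1.1"), ("h1", "a1", "h1.a1.2"), ("h1", "a3", "h1.a3.1"),
    ("h2", "a1", "h2.a1.1"), ("h2", "a2", "h2.a2.1"), ("h2", "a2", "h2.a2.2"),
    ("h2", "a3", "h2.a3.1"), ("h3", "a2", "h3.a2.1"), ("h4", "a1", "h4.a1.1"),
    ("h5", "a2", "h5.a2.1"), ("h5", "a3", "h5.a3.1"),
]


def transactions_with_all_logs(transactions, logs, address, limit):
    """Hypothetical reference: return (hash, t_value, all_logs) for at most
    `limit` transactions that have at least one log with `address`."""
    by_tx = defaultdict(list)  # transaction hash -> every log of that transaction
    for tx_hash, addr, value in logs:
        by_tx[tx_hash].append({"address": addr, "l_value": value})

    result = []
    for tx_hash, t_value in transactions.items():
        tx_logs = by_tx.get(tx_hash, [])
        # The address only decides WHETHER a transaction is selected...
        if any(log["address"] == address for log in tx_logs):
            # ...but all of its logs are returned, not just the matching ones.
            result.append((tx_hash, t_value, tx_logs))
            if len(result) == limit:  # LIMIT counts transactions, not logs
                break
    return result


for tx_hash, t_value, tx_logs in transactions_with_all_logs(TRANSACTIONS, LOGS, "a2", 2):
    print(tx_hash, t_value, [log["l_value"] for log in tx_logs])
# h2 v2 ['h2.a1.1', 'h2.a2.1', 'h2.a2.2', 'h2.a3.1']
# h3 v3 ['h3.a2.1']
```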

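Problem: the SQL query below works correctly, but with a large number of logs (100k+ logs for one address) the search can take many minutes. The obvious fix would be to put the LIMIT inside the MATERIALIZED CTE, but then I may get transactions with a list of logs that is not entirely correct. How can I fix this? Either rewrite the query without MATERIALIZED, nesting several SELECTs inside each other (but I don't know how), or fix it while keeping MATERIALIZED.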
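The effect of moving LIMIT into the CTE can be reproduced on the sample data. This sketch assumes Python's built-in sqlite3 module as a stand-in for PostgreSQL (no MATERIALIZED, no array_agg), so it only checks which transactions are selected, not speed. Because h2 has two logs with address a2, both rows kept by LIMIT 2 point to h2, and the outer query returns one transaction instead of two. The ORDER BY l_value inside the CTE is an addition that only makes the demo deterministic.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; only the row selection is reproduced.
SETUP = """
CREATE TABLE transactions (hash varchar, t_value varchar);
CREATE TABLE logs (transaction_hash varchar, address varchar, l_value varchar);
INSERT INTO transactions VALUES ('h1','v1'), ('h2','v2'), ('h3','v3'), ('h4','v4'), ('h5','v5');
INSERT INTO logs VALUES
  ('h1','a1','h1.a1.1'), ('h1','a1','h1.a1.2'), ('h1','a3','h1.a3.1'),
  ('h2','a1','h2.a1.1'), ('h2','a2','h2.a2.1'), ('h2','a2','h2.a2.2'),
  ('h2','a3','h2.a3.1'), ('h3','a2','h3.a2.1'), ('h4','a1','h4.a1.1'),
  ('h5','a2','h5.a2.1'), ('h5','a3','h5.a3.1');
"""

# LIMIT inside the CTE caps the number of *log rows*, not transactions.
LIMIT_IN_CTE = """
WITH b AS (
    SELECT lg.transaction_hash
    FROM logs lg
    WHERE lg.address = 'a2'
    ORDER BY lg.l_value   -- added only to make this demo deterministic
    LIMIT 2
)
SELECT t.hash, t.t_value
FROM transactions t
WHERE t.hash IN (SELECT transaction_hash FROM b)
ORDER BY t.hash
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
# What the CTE alone returns: two log rows.
cte_rows = conn.execute(
    "SELECT transaction_hash FROM logs WHERE address = 'a2' ORDER BY l_value LIMIT 2"
).fetchall()
rows = conn.execute(LIMIT_IN_CTE).fetchall()
conn.close()

print(cte_rows)  # [('h2',), ('h2',)]  both kept log rows belong to h2
print(rows)      # [('h2', 'v2')]      so only one transaction instead of two
```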

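So the problem is that, with MATERIALIZED, Postgres does not understand that I only need a limited number of transactions: as far as I can tell, it first finds all matching logs and only then attaches them to the limited set of transactions. An index on logs(address) is in place.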

WITH 
    b AS MATERIALIZED (
        SELECT lg.transaction_hash
        FROM logs lg
        WHERE lg.address='a2'
      
        -- must stay commented out: with it the query is fast, but the results are incorrect
        -- LIMIT 2
    )
SELECT 
    hash,
    t_value,
    (SELECT array_agg(JSON_BUILD_OBJECT('address',address,'l_value',l_value)) FROM logs WHERE transaction_hash = t.hash) logs_array
FROM transactions t 
WHERE t.hash IN 
    (SELECT transaction_hash FROM b)
LIMIT 2
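One way to write the query without MATERIALIZED, as nested SELECTs, is to filter transactions with a correlated EXISTS (a semi-join) and keep the LIMIT on the outer query, so each transaction is counted once no matter how many of its logs match. The sketch below only checks that this formulation returns the expected rows; it assumes SQLite (Python's sqlite3) as a stand-in for PostgreSQL and group_concat in place of array_agg(json_build_object(...)). Whether PostgreSQL runs it faster than the CTE version depends on the plan it picks and on the available indexes (for example a hypothetical composite index on logs(address, transaction_hash) or logs(transaction_hash, address)), so it should be checked with EXPLAIN ANALYZE on the real data.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; group_concat replaces
# array_agg(json_build_object(...)) because it is built into every SQLite.
SETUP = """
CREATE TABLE transactions (hash varchar, t_value varchar);
CREATE TABLE logs (transaction_hash varchar, address varchar, l_value varchar);
INSERT INTO transactions VALUES ('h1','v1'), ('h2','v2'), ('h3','v3'), ('h4','v4'), ('h5','v5');
INSERT INTO logs VALUES
  ('h1','a1','h1.a1.1'), ('h1','a1','h1.a1.2'), ('h1','a3','h1.a3.1'),
  ('h2','a1','h2.a1.1'), ('h2','a2','h2.a2.1'), ('h2','a2','h2.a2.2'),
  ('h2','a3','h2.a3.1'), ('h3','a2','h3.a2.1'), ('h4','a1','h4.a1.1'),
  ('h5','a2','h5.a2.1'), ('h5','a3','h5.a3.1');
"""

# EXISTS tests each transaction once, so LIMIT counts transactions, not log rows.
EXISTS_QUERY = """
SELECT t.hash,
       t.t_value,
       (SELECT group_concat(l.address || '=' || l.l_value, ', ')
        FROM logs l
        WHERE l.transaction_hash = t.hash) AS logs_array
FROM transactions t
WHERE EXISTS (SELECT 1
              FROM logs lg
              WHERE lg.transaction_hash = t.hash
                AND lg.address = 'a2')
ORDER BY t.hash
LIMIT 2
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(EXISTS_QUERY).fetchall()
conn.close()

# The order of items inside group_concat is not guaranteed by SQLite.
for tx_hash, t_value, logs_array in rows:
    print(tx_hash, t_value, logs_array)
```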

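A real-world example; this query takes about 30 seconds (in the real database the key is an integer transaction_id, but it is not an auto-incrementing primary key):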

EXPLAIN WITH 
    b AS MATERIALIZED (
        SELECT lg.transaction_id
        FROM _logs lg
        WHERE lg.address in ('0xca530408c3e552b020a2300debc7bd18820fb42f', '0x68e78497a7b0db7718ccc833c164a18d8e626816')
    )
SELECT 
    (SELECT array_agg(JSON_BUILD_OBJECT('address',address)) FROM _logs WHERE transaction_id = t.id) logs_array
FROM _transactions t 
WHERE t.id IN 
    (SELECT transaction_id FROM b)
LIMIT 5000;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=87540.62..3180266.26 rows=5000 width=32)
   CTE b
     ->  Index Scan using _logs_address_idx on _logs lg  (cost=0.70..85820.98 rows=76403 width=8)
           Index Cond: ((address)::text = ANY ('{0xca530408c3e552b020a2300debc7bd18820fb42f,0x68e78497a7b0db7718ccc833c164a18d8e626816}'::text[]))
   ->  Nested Loop  (cost=1719.64..47260423.09 rows=76403 width=32)
         ->  HashAggregate  (cost=1719.07..1721.07 rows=200 width=8)
               Group Key: b.transaction_id
               ->  CTE Scan on b  (cost=0.00..1528.06 rows=76403 width=8)
         ->  Index Only Scan using _transactions_pkey on _transactions t  (cost=0.57..2.79 rows=1 width=8)
               Index Cond: (id = b.transaction_id)
         SubPlan 2
           ->  Aggregate  (cost=618.53..618.54 rows=1 width=32)
                 ->  Index Scan using _logs_transaction_id_idx on _logs  (cost=0.57..584.99 rows=6707 width=43)
                       Index Cond: (transaction_id = t.id)
 JIT:
   Functions: 17
   Options: Inlining true, Optimization true, Expressions true, Deforming true
(17 rows)
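The cost numbers in this plan can be checked by hand, which supports the guess that the whole CTE is computed first. The sketch below assumes PostgreSQL's usual Limit costing (the input's startup cost plus the remaining cost scaled by 5000/76403) and that the materialized CTE is charged as an init plan added to the startup cost; under those assumptions it reproduces the Limit line. It also shows that almost all of the nested loop's estimated cost comes from SubPlan 2, the per-transaction array_agg, which the planner expects to read about 6707 logs per transaction. These are planner estimates in arbitrary cost units, not measured times.

```python
# Numbers copied from the EXPLAIN output above (planner estimates, not timings).
cte_total = 85820.98      # CTE b: index scan over all matching log rows
nl_startup = 1719.64      # Nested Loop startup cost
nl_total = 47260423.09    # Nested Loop total cost
nl_rows = 76403           # Nested Loop estimated rows
limit_rows = 5000         # LIMIT 5000
subplan_cost = 618.54     # SubPlan 2: array_agg over one transaction's logs

# Assumption: the CTE is charged up front (init plan), and Limit scales the
# rest of the nested loop linearly by limit_rows / nl_rows.
fraction = limit_rows / nl_rows
limit_startup = cte_total + nl_startup
limit_total = limit_startup + (nl_total - nl_startup) * fraction

# Share of the nested loop's cost from running SubPlan 2 once per output row.
subplan_share = subplan_cost * nl_rows / nl_total

print(f"Limit cost = {limit_startup:.2f}..{limit_total:.2f}")
# Limit cost = 87540.62..3180266.26
print(f"fraction of the nested loop the planner expects to run: {fraction:.4f}")
print(f"SubPlan 2 share of nested-loop cost: {subplan_share:.5f}")
```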
Answer

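You should add the DISTINCT keyword to the CTE named b and put LIMIT 2 inside the CTE; then you no longer need LIMIT in the main query. DISTINCT matters because, without it, the two rows kept by LIMIT can belong to the same transaction (h2 has two logs with address a2), so you would get fewer transactions than requested: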

WITH 
    b AS MATERIALIZED
      ( Select  DISTINCT lg.transaction_hash
        From    logs lg
        Where   lg.address='a2'
        LIMIT 2
      )
Select    hash, t_value,
          ( Select   ARRAY_AGG( JSON_BUILD_OBJECT('address', address, 'l_value', l_value) ) 
            From     logs 
            Where    transaction_hash = t.hash
          ) logs_array
From     transactions t 
Where    t.hash IN( Select transaction_hash From b)
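The answer's query can be checked against the expected result from the question. This sketch assumes SQLite (Python's sqlite3) as a stand-in for PostgreSQL, uses group_concat instead of array_agg(json_build_object(...)), and adds ORDER BY lg.transaction_hash inside the CTE, because without an ORDER BY which two transactions end up in b is not specified. It verifies correctness only; on PostgreSQL, DISTINCT may still have to read every log row for the address before LIMIT can apply, so speed should be checked with EXPLAIN ANALYZE.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL.
SETUP = """
CREATE TABLE transactions (hash varchar, t_value varchar);
CREATE TABLE logs (transaction_hash varchar, address varchar, l_value varchar);
INSERT INTO transactions VALUES ('h1','v1'), ('h2','v2'), ('h3','v3'), ('h4','v4'), ('h5','v5');
INSERT INTO logs VALUES
  ('h1','a1','h1.a1.1'), ('h1','a1','h1.a1.2'), ('h1','a3','h1.a3.1'),
  ('h2','a1','h2.a1.1'), ('h2','a2','h2.a2.1'), ('h2','a2','h2.a2.2'),
  ('h2','a3','h2.a3.1'), ('h3','a2','h3.a2.1'), ('h4','a1','h4.a1.1'),
  ('h5','a2','h5.a2.1'), ('h5','a3','h5.a3.1');
"""

# The answer's query; ORDER BY added for determinism, group_concat for array_agg.
DISTINCT_IN_CTE = """
WITH b AS (
    SELECT DISTINCT lg.transaction_hash
    FROM logs lg
    WHERE lg.address = 'a2'
    ORDER BY lg.transaction_hash
    LIMIT 2
)
SELECT t.hash,
       t.t_value,
       (SELECT group_concat(address || '=' || l_value, ', ')
        FROM logs
        WHERE transaction_hash = t.hash) AS logs_array
FROM transactions t
WHERE t.hash IN (SELECT transaction_hash FROM b)
ORDER BY t.hash
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(DISTINCT_IN_CTE).fetchall()
conn.close()

# Expect h2 (all four logs, not only the a2 ones) and h3; item order inside
# group_concat is not guaranteed by SQLite.
for tx_hash, t_value, logs_array in rows:
    print(tx_hash, t_value, logs_array)
```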
© www.soinside.com 2019 - 2024. All rights reserved.