我有一张带有 DDL 的表:
CREATE TABLE public.test_table (
id serial4 NOT NULL,
"date" int8 NOT NULL,
user_id int4 NOT NULL,
device_id int4 NULL,
CONSTRAINT test_table_date_user_id_device_id_key UNIQUE (date, user_id, device_id),
CONSTRAINT test_table_pkey PRIMARY KEY (id)
);
ALTER TABLE public.test_table ADD CONSTRAINT test_table FOREIGN KEY (device_id) REFERENCES public.t_device(id);
ALTER TABLE public.test_table ADD CONSTRAINT test_table_user_id_fkey FOREIGN KEY (user_id) REFERENCES public.t_user(id) ON DELETE CASCADE;
样品:
| id | date | user_id |
| -------- | -------------- | ------- |
| 1886 | 1716625890 | 5 |
| 1887 | 1716626430 | 5 |
| 1888 | 1716627030 | 5 |
我需要找到按天分组的行数,还需要打印每天的第一个和最后一个日期。作为输入,我有两个日期:
from
和to
和user_id
。
我正在尝试使用以下sql:
with calendar as (
select
d
from
generate_series(to_timestamp(:FROM)::date,
to_timestamp(:TO)::date,
interval '1 day') d)
select
c.d::date as item_date,
count(dt.id) as item_count,
min(dt."date") as item_min,
max(dt."date") as item_max
from
test_table dt
left join
calendar c
on
to_timestamp(dt.date)::date >= c.d
and
to_timestamp(dt.date)::date < c.d + interval '1 day'
where dt.user_id = 5
group by
c.d
order by
c.d;
我目前的输出:
|item_date |item_count|item_min |item_max |
|----------|----------|-------------|-------------|
|2024-05-25|144 |1,716,584,490|1,716,670,230|
|2024-05-26|144 |1,716,670,816|1,716,756,630|
|2024-05-27|144 |1,716,757,221|1,716,843,030|
| |4,770 |1,716,286,230|1,719,514,890|
explain(analyze, verbose, buffers, settings)
结果:
Sort (cost=199067.86..199068.36 rows=200 width=36) (actual time=10.152..10.154 rows=3 loops=1)
Output: ((d.d)::date), (count(dt.id)), (min(dt.date)), (max(dt.date)), d.d
Sort Key: d.d
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=363
-> HashAggregate (cost=199057.72..199060.22 rows=200 width=36) (actual time=10.144..10.146 rows=3 loops=1)
Output: (d.d)::date, count(dt.id), min(dt.date), max(dt.date), d.d
Group Key: d.d
Buffers: shared hit=363
-> Nested Loop (cost=0.30..192966.61 rows=609111 width=20) (actual time=0.292..10.074 rows=432 loops=1)
Output: d.d, dt.id, dt.date
Join Filter: (((to_timestamp((dt.date)::double precision))::date >= d.d) AND ((to_timestamp((dt.date)::double precision))::date < (d.d + '1 day'::interval)))
Rows Removed by Join Filter: 15174
Buffers: shared hit=363
-> Function Scan on pg_catalog.generate_series d (cost=0.01..10.01 rows=1000 width=8) (actual time=0.006..0.008 rows=3 loops=1)
Output: d.d
Function Call: generate_series((('2024-05-25 11:31:30+03'::timestamp with time zone)::date)::timestamp with time zone, (('2024-05-27 14:50:30+03'::timestamp with time zone)::date)::timestamp with time zone, '1 day'::interval)
-> Materialize (cost=0.29..1100.30 rows=5482 width=12) (actual time=0.011..1.198 rows=5202 loops=3)
Output: dt.id, dt.date
Buffers: shared hit=363
-> Index Scan using test_table_date_user_id_device_id_key on public.test_table dt (cost=0.29..1072.89 rows=5482 width=12) (actual time=0.032..2.070 rows=5202 loops=1)
Output: dt.id, dt.date
Index Cond: (dt.user_id = 5)
Buffers: shared hit=363
Settings: effective_cache_size = '1377MB', effective_io_concurrency = '200', max_parallel_workers = '1', random_page_cost = '1.1', search_path = 'public, public, "$user"', work_mem = '524kB'
Planning Time: 0.147 ms
Execution Time: 10.264 ms
查询是否最优?
不是的。 CTE 是不必要的 - 您已经将 Unix 时间戳转换为日期了
to_timestamp(dt.date)::date
。我不认为有理由生成与 generate_series()
不匹配的日期。
是否可以删除结果中最后一个总计行?
它在那里是因为您在 CTE 中生成的
4,770
日期与 test_table
中的任何内容都不匹配。如果删除 CTE,您将摆脱最后一行,缩短、简化并加快查询速度:db<>fiddle 的演示
select
to_timestamp(dt.date)::date as item_date,
count(dt.id) as item_count,
min(dt."date") as item_min,
max(dt."date") as item_max
from
test_table dt
where dt.user_id = 5
--and to_timestamp(dt.date)::date >= to_timestamp(:FROM)::date
--and to_timestamp(dt.date)::date < to_timestamp(:TO)::date
group by
item_date
order by
item_date;
如果
generate_series()
是为了缩小目标期限(看起来像 13 年),您可以取消注释上面的两个附加条件。