Using the SQL function generate_series() in Redshift

Question

I want to use the generate_series function in Redshift, but I haven't been successful.

The Redshift documentation says it is not supported. Yet the following code does work:

select *
from generate_series(1,10,1)

Output:

1
2
3
...
10

I'd like to do the same thing with dates. I have tried a number of variations, including:

select *
from generate_series(date('2008-10-01'),date('2008-10-10 00:00:00'),1)

which throws:

 ERROR: function generate_series(date, date, integer) does not exist
 Hint: No function matches the given name and argument types.
 You may need to add explicit type casts. [SQL State=42883]

I also tried:

select *
from generate_series('2008-10-01 00:00:00'::timestamp,
'2008-10-10 00:00:00'::timestamp,'1 day')

and:

select *
from generate_series(cast('2008-10-01 00:00:00' as datetime),
cast('2008-10-10 00:00:00' as datetime),'1 day')

Both throw:

ERROR: function generate_series(timestamp without time zone, timestamp without time zone, "unknown") does not exist
Hint: No function matches the given name and argument types.
You may need to add explicit type casts. [SQL State=42883]

If this isn't possible, I'll use this code from another post:

SELECT to_char(DATE '2008-01-01'
+ (interval '1 month' * generate_series(0,57)), 'YYYY-MM-DD') AS ym

PostgreSQL generate_series() with SQL function as arguments

Answer 1

Amazon Redshift seems to be based on PostgreSQL 8.0.2. Timestamp arguments to generate_series() were added in 8.4.

Something like this, which sidesteps that problem, might work in Redshift.

SELECT current_date + (n || ' days')::interval
from generate_series (1, 30) n

It works in PostgreSQL 8.3, which is the earliest version I can test. It's documented in 8.0.26.
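The query above boils down to "base date plus n days" for each n the series produces. A minimal Python sketch of the same arithmetic, assuming a fixed base date (2008-10-01, borrowed from the question) in place of current_date so the output is reproducible; the function name day_series is illustrative only:

```python
from datetime import date, timedelta
from typing import List


def day_series(base: date, first: int, last: int) -> List[date]:
    """Return base + n days for n = first..last inclusive,
    mirroring current_date + (n || ' days')::interval over generate_series(first, last)."""
    return [base + timedelta(days=n) for n in range(first, last + 1)]


dates = day_series(date(2008, 10, 1), 1, 30)
print(dates[0], dates[-1])  # 2008-10-02 2008-10-31
```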

Later . . .

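generate_series() seems to be unsupported in Redshift. But given that you've verified that select * from generate_series(1,10,1) does work, the syntax above at least gives you a fighting chance. (Although the interval data type is also documented as being unsupported on Redshift.)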

Even later . . .

You could also create a table of integers.

create table integers (
  n integer primary key
);

Populate it however you like. You might be able to use generate_series() locally, dump the table, and load it into Redshift. (I don't know; I don't use Redshift.)

In any case, you can use that table for simple date arithmetic without referring directly to generate_series() or to the interval data type.

select (current_date + n)
from integers
where n < 31;

That works in 8.3, at least.
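The integers-table idea can be tried locally with Python's bundled sqlite3. This is a sketch under stated assumptions, not a Redshift test: SQLite has no date + integer operator, so its date(x, '+N days') modifier stands in for current_date + n, and a fixed start date (2008-10-01) replaces current_date so the output is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A permanent table of integers, as suggested above.
conn.execute("create table integers (n integer primary key)")
# Populate it however you like; here with Python's range() instead of generate_series().
conn.executemany("insert into integers (n) values (?)", ((n,) for n in range(1000)))

# SQLite's date() modifier plays the role of Redshift's (current_date + n).
rows = conn.execute(
    "select date('2008-10-01', '+' || n || ' days') from integers where n < 31 order by n"
).fetchall()
series = [r[0] for r in rows]
print(len(series), series[0], series[-1])  # 31 2008-10-01 2008-10-31
conn.close()
```

Because the integers are stored once, the same table can feed any later date series without calling generate_series() at query time.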

Answer 2

With Redshift today, you can generate a series of dates by using the datetime functions and feeding them a series of numbers.

select (getdate()::date - generate_series)::date from generate_series(1,30,1)

For me, this generates the 30 dates preceding today.

Answer 3

Redshift does not fully support the generate_series() function. See the Unsupported PostgreSQL functions section of the Developer Guide.

Update

generate_series now works with Redshift.

SELECT CURRENT_DATE::TIMESTAMP  - (i * interval '1 day') as date_datetime 
FROM generate_series(1,31) i 
ORDER BY 1

This generates a midnight timestamp for each of the previous 31 days (1 to 31 days before today).

Reference:

generate_series function in Amazon Redshift

Answer 4

At the time of writing, generate_series() on our Redshift instance (1.0.33426) cannot be used for operations such as creating tables:

# select generate_series(1,100,1);
1
2
...

# create table normal_series as select generate_series(1,100,1);
INFO: Function "generate_series(integer, integer, integer) not supported.
ERROR: Specified types or functions (one per INFO message) not supported on Redshift tables.

However, with recursive does work:

# create table recursive_series as with recursive t(n) as (select 1::integer union all select n+1 from t where n < 100) select n from t;
SELECT

-- modify as desired, here is a date series:
# select getdate()::date + n from recursive_series;
2021-12-18
2021-12-19
...
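The recursive-CTE pattern itself is standard SQL, so it can be exercised locally with Python's bundled sqlite3. This only demonstrates the pattern in SQLite, not Redshift's leader-node behavior; SQLite has no ::integer cast syntax, so CAST is used instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same shape as the Redshift statement above: materialize 1..100 via a recursive CTE.
conn.execute(
    """
    create table recursive_series as
    with recursive t(n) as (
        select cast(1 as integer)
        union all
        select n + 1 from t where n < 100
    )
    select n from t
    """
)
count, lo, hi = conn.execute(
    "select count(*), min(n), max(n) from recursive_series"
).fetchone()
print(count, lo, hi)  # 100 1 100
```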

Answer 5

I needed to do something similar, but with 5-minute intervals over 7 days. So here is a CTE-based hack (ugly, but not too verbose):

INSERT INTO five_min_periods
WITH 
periods  AS (select 0 as num UNION select 1 as num UNION select 2 UNION select 3 UNION select 4 UNION select 5 UNION select 6 UNION select 7 UNION select 8 UNION select 9 UNION select 10 UNION select 11),
hours    AS (select num from periods UNION ALL select num + 12 from periods),
days     AS (select num from periods where num <= 6),
rightnow AS (select CAST( TO_CHAR(GETDATE(), 'yyyy-mm-dd hh24') || ':' || trim(TO_CHAR((ROUND((DATEPART (MINUTE, GETDATE()) / 5), 1) * 5 ),'09')) AS TIMESTAMP) as start)
select  
  ROW_NUMBER() OVER(ORDER BY d.num DESC, h.num DESC, p.num DESC) as idx
  , DATEADD(minutes, -p.num * 5, DATEADD( hours, -h.num, DATEADD( days, -d.num, n.start ) ) ) AS period_date
from days d, hours h, periods p, rightnow n

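You should be able to extend this to other generation schemes. The trick is to use Cartesian-product joins (i.e. no JOIN/WHERE clause) to multiply the hand-made CTEs, producing the necessary increments, which are then applied to an anchor date.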
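The Cartesian-product trick maps directly onto itertools.product. A minimal Python sketch, assuming a hypothetical anchor timestamp already rounded down to 5 minutes (the SQL derives it from GETDATE()); 7 days × 24 hours × 12 five-minute steps gives 2016 rows:

```python
from datetime import datetime, timedelta
from itertools import product


def five_minute_periods(anchor: datetime) -> list:
    days = range(7)      # like the "days" CTE: 0..6
    hours = range(24)    # like the "hours" CTE: 0..11 plus 12..23
    periods = range(12)  # like the "periods" CTE: twelve 5-minute steps per hour
    # product() is the Cartesian product, i.e. the join with no JOIN/WHERE condition.
    # Sorting ascending puts the oldest period first, like the ROW_NUMBER() ordering.
    return sorted(
        anchor - timedelta(days=d, hours=h, minutes=5 * p)
        for d, h, p in product(days, hours, periods)
    )


anchor = datetime(2021, 12, 18, 10, 35)  # hypothetical anchor
stamps = five_minute_periods(anchor)
print(len(stamps))  # 2016
```

Extending it to another scheme means changing the ranges and the offset expression, just as the answer suggests for the CTEs.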

Answer 6

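Redshift's generate_series() function is a leader-node-only function, so you can't use it for downstream processing on the compute nodes. It can be replaced with a recursive CTE (or you can keep a "dates" table in your database). I have an example of this in a recent answer: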

Cross join Redshift with sequence of dates

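One caveat I'd attach to answers like this: be careful with inequality joins (or cross joins, or any under-qualified join) on very large tables, which are common in Redshift. If you are joining a moderately sized Redshift table (say 1M rows), everything will be fine. But if you do this on a table of 1B rows, the data explosion can cause serious performance problems as the query spills to disk.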
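To put the caveat in numbers: a cross join emits every pairing, so its output is exactly left rows × right rows, and an inequality join is bounded above by the same product. A trivial sketch, assuming a hypothetical 365-row date series on the right-hand side:

```python
def cross_join_rows(left_rows: int, right_rows: int) -> int:
    """Rows produced by an unqualified (cross) join: every left row paired with every right row."""
    return left_rows * right_rows


print(cross_join_rows(1_000_000, 365))      # 365000000
print(cross_join_rows(1_000_000_000, 365))  # 365000000000
```

The first result is large but manageable; the second is the kind of intermediate result that spills to disk.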

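I've written a couple of white papers on how to write such queries in a data-space-aware way. This problem of massive intermediate results isn't unique to Redshift; I first developed the approach while solving a client's HIVE query problem. "First rule of writing SQL for big data - don't make more"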

Answer 7

According to comments by @Ryan Tuck and @Slobodan Pejic, generate_series() does not work on Redshift when joined to another table.

The workaround I used is to write out every value of the series in the query:

SELECT
'2019-01-01'::date AS date_month
UNION ALL
SELECT
'2019-02-01'::date AS date_month

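using a Python function like this:

import arrow

def generate_date_series(start, end):
    """Return a UNION ALL query with one SELECT per month from start to end (inclusive)."""
    start = arrow.get(start)
    end = arrow.get(end)

    # One SELECT per month, each emitting a single date literal.
    months = [
        f"SELECT '{month.format('YYYY-MM-DD')}'::date AS date_month"
        for month in arrow.Arrow.range('month', start, end)
    ]

    return "\nUNION ALL\n".join(months)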
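If installing arrow isn't an option, a standard-library-only variant is possible. This is a sketch, not the author's code, and it assumes start and end are first-of-month dates as in the example above: it always emits the 1st of each month, whereas arrow's range keeps whatever day of the month start has.

```python
from datetime import date


def generate_date_series(start: date, end: date) -> str:
    """Stdlib-only sketch: one SELECT per month start from start to end, joined with UNION ALL."""
    selects = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        selects.append(f"SELECT '{date(year, month, 1).isoformat()}'::date AS date_month")
        # Advance to the first day of the next month, rolling over at December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return "\nUNION ALL\n".join(selects)


print(generate_date_series(date(2019, 1, 1), date(2019, 2, 1)))
```

For the dates above this prints exactly the two-SELECT query shown earlier in this answer.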

Answer 8

Perhaps not as elegant as the other solutions, but here's how I did it:

drop table if exists #dates;
create temporary table #dates as
with recursive cte(val_date) as
    (select 
        cast('2020-07-01' as date) as val_date
    union all
    select
        cast(dateadd(day, 1, val_date) as date) as val_date
    from 
        cte
    where 
        val_date < getdate()::date  -- stop after today's row; <= would add one extra day
    )
select 
    val_date as yyyymmdd
from
    cte
order by
    val_date
;
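The WHERE clause of a recursive CTE is evaluated against the row just produced, so the comparison operator decides where the series ends: with <= the last row is the day after the cutoff, with < it is the cutoff itself. A small sqlite3 sketch to check this, assuming SQLite's date(x, '+1 day') as a stand-in for Redshift's dateadd and a fixed "today" string in place of getdate():

```python
import sqlite3


def date_series(start: str, today: str, op: str) -> list:
    """Recursive date CTE like the one above; op is the stopping comparison, '<=' or '<'."""
    assert op in ("<=", "<")
    conn = sqlite3.connect(":memory:")
    sql = f"""
        with recursive cte(val_date) as (
            select date(?)
            union all
            select date(val_date, '+1 day') from cte where val_date {op} date(?)
        )
        select val_date from cte order by val_date
    """
    rows = [r[0] for r in conn.execute(sql, (start, today))]
    conn.close()
    return rows


print(date_series("2020-07-01", "2020-07-05", "<=")[-1])  # 2020-07-06
print(date_series("2020-07-01", "2020-07-05", "<")[-1])   # 2020-07-05
```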

Answer 9

For five-minute periods, I would do the following:

select date_trunc('minute', getdate()) - (i || ' minutes')::interval
from generate_series(0, 60*5-1, 5) as i

You can replace 5 with any interval (in minutes) and 60 with the number of rows you want.
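The parameterization is easy to check outside the database. A Python sketch of the same series, assuming a hypothetical fixed "now" instead of getdate(): truncating to the minute mirrors date_trunc('minute', ...), and range(0, rows * step, step) yields the same offsets as generate_series(0, rows*step - 1, step).

```python
from datetime import datetime, timedelta


def minute_series(now: datetime, step: int = 5, rows: int = 60) -> list:
    # date_trunc('minute', now): drop seconds and microseconds.
    base = now.replace(second=0, microsecond=0)
    # Offsets 0, step, ..., (rows - 1) * step, one per output row.
    return [base - timedelta(minutes=i) for i in range(0, rows * step, step)]


series = minute_series(datetime(2021, 12, 18, 10, 37, 42))
print(len(series), series[0], series[-1])  # 60 2021-12-18 10:37:00 2021-12-18 05:42:00
```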

Answer 10

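An anti-best-practice solution, but useful for ad hoc data generation tasks: take any existing table with at least as many rows as you need and use row_number():

select row_number() over () as i from some_table limit 10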

Answer 11

SELECT CURRENT_DATE::TIMESTAMP - (i * interval '1 day') as date_datetime 
FROM generate_series(1,(select datediff(day,'01-Jan-2021',now()::date))) i 
ORDER BY 1
