Union of Apache Iceberg tables using time travel


I am trying to union two Iceberg tables that are read through time travel in PySpark.

Here is the code I tried:

union_query = f"""
    SELECT * FROM {table_name} FOR SYSTEM_TIME AS OF TIMESTAMP '{initialdate}' LIMIT 1000
    UNION ALL
    SELECT * FROM {table_name} FOR SYSTEM_TIME AS OF TIMESTAMP '{lastdate}' LIMIT 1000
"""
uniondf = spark.sql(union_query)

But it throws the following error:

[PARSE_SYNTAX_ERROR] Syntax error at or near 'UNION'.(line 3, pos 12)

== SQL ==
    SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-10-08T09:06:51.932' LIMIT 1000
    UNION ALL
------------^^^
    SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-11-05T13:16:44' LIMIT 1000

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:257)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:98)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)

Note: Can the query above work at all? My requirement is that both Iceberg tables must be read through time travel and combined in a single query.

apache-spark pyspark apache-spark-sql apache-iceberg
1 Answer

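The UNION ALL syntax is incorrect:

A trailing LIMIT (like ORDER BY) applies to the whole query in Spark SQL, so it may only appear after the last branch of a UNION. The LIMIT 1000 at the end of the first SELECT therefore makes the parser reject the UNION that follows. To limit each snapshot separately, wrap each SELECT in parentheses so that its LIMIT belongs to that branch only. Corrected query:

(SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-10-08T09:06:51.932' LIMIT 1000)
UNION ALL
(SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-11-05T13:16:44' LIMIT 1000)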
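On the PySpark side, the parenthesized form can be generated from the question's variables. This is a minimal sketch, assuming the table name and timestamps come from trusted code (they are interpolated straight into the SQL text, as in the question); `build_time_travel_union` is a hypothetical helper name, not part of PySpark or Iceberg:

```python
from collections.abc import Sequence


def build_time_travel_union(table_name: str, timestamps: Sequence[str], limit: int = 1000) -> str:
    """Build one Spark SQL statement that UNION ALLs several time-travel reads.

    Hypothetical helper for illustration. Each branch is wrapped in
    parentheses so its LIMIT applies to that branch only; a bare
    "... LIMIT n UNION ALL ..." fails to parse in Spark SQL.
    """
    branches = [
        # Time-travel read of the table as it was at timestamp `ts`.
        f"(SELECT * FROM {table_name} "
        f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}' LIMIT {limit})"
        for ts in timestamps
    ]
    # Join the parenthesized branches into a single statement.
    return "\nUNION ALL\n".join(branches)
```

With the question's session, `uniondf = spark.sql(build_time_travel_union(table_name, [initialdate, lastdate]))` should then parse, because every LIMIT sits inside its own parenthesized branch and the whole union is still one statement.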