I am trying to apply a UNION to two Iceberg tables that are fetched via time travel in PySpark.
This is the code I tried:
union_query = f"""
SELECT * FROM {table_name} FOR SYSTEM_TIME AS OF TIMESTAMP '{initialdate}' LIMIT 1000
UNION ALL
SELECT * FROM {table_name} FOR SYSTEM_TIME AS OF TIMESTAMP '{lastdate}' LIMIT 1000
"""
uniondf = spark.sql(union_query)
But it throws the following error:
[PARSE_SYNTAX_ERROR] Syntax error at or near 'UNION'.(line 3, pos 12)
== SQL ==
SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-10-08T09:06:51.932' LIMIT 1000
UNION ALL
------------^^^
SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-11-05T13:16:44' LIMIT 1000
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(parsers.scala:257)
at org.apache.spark.sql.catalyst.parser.AbstractParser.parse(parsers.scala:98)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:54)
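Note: Is there any way to make the above query work? My requirement is that both Iceberg tables must be fetched via time travel and must be combined in a single query.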
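The problem is not UNION ALL itself but the LIMIT that precedes it:
In Spark SQL, an unparenthesized LIMIT applies to the whole query expression, so the parser treats the statement as complete after the first LIMIT 1000 and rejects the UNION that follows. Adding semicolons does not help: spark.sql() parses a single statement, and a semicolon after the first SELECT would leave UNION ALL dangling. Instead, wrap each SELECT in parentheses so that each LIMIT applies only to its own branch. Corrected query:
SQL
(SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-10-08T09:06:51.932' LIMIT 1000)
UNION ALL
(SELECT * FROM glue.def.hugedata FOR SYSTEM_TIME AS OF TIMESTAMP '2024-11-05T13:16:44' LIMIT 1000)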
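To tie this back to the PySpark code in the question, here is a minimal sketch of building the parenthesized query from the same variables. The helper name build_time_travel_union and its limit parameter are hypothetical, introduced only for illustration; the table name and timestamps are the ones from the error message. The sketch only builds the SQL string, so it runs without a Spark session.

```python
def build_time_travel_union(table_name: str, initialdate: str, lastdate: str,
                            limit: int = 1000) -> str:
    """Build a UNION ALL over two time-travel snapshots of the same table.

    Hypothetical helper for illustration; not part of Spark or Iceberg.
    """
    def snapshot(ts: str) -> str:
        # The surrounding parentheses keep LIMIT scoped to this SELECT only,
        # so the parser accepts the UNION ALL that follows.
        return (
            f"(SELECT * FROM {table_name} "
            f"FOR SYSTEM_TIME AS OF TIMESTAMP '{ts}' LIMIT {int(limit)})"
        )

    return f"{snapshot(initialdate)}\nUNION ALL\n{snapshot(lastdate)}"


union_query = build_time_travel_union(
    "glue.def.hugedata",
    "2024-10-08T09:06:51.932",
    "2024-11-05T13:16:44",
)
print(union_query)
```

With an active SparkSession configured for the Iceberg catalog, the string would then be passed on exactly as in the question: uniondf = spark.sql(union_query). Note that the values are interpolated directly into the SQL text, so this is only suitable for trusted inputs.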