Writing a WHERE clause on a Delta Lake history table


I'm trying to query the history of a Delta Lake table, as described in the following link:

https://learn.microsoft.com/en-us/azure/databricks/delta/history

When I describe the Delta table as follows:

describe history '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1'

I get the following table output:

version timestamp userId userName operation operationParameters job notebook clusterId readVersion isolationLevel isBlindAppend operationMetrics userMetadata engineInfo
2 18/03/2023 12:25:54.0000000 615257000000000 [email protected] MERGE {"predicate":"(s.primary_key_hash = t.primary_key_hash)","matchedPredicates":"[{"predicate":"(NOT (s.change_key_hash = t.change_key_hash ))","actionType":"update"}]","notMatchedPredicates":"[{"actionType":"insert"}]","notMatchedBySourcePredicates":"[{"actionType":"delete"}]"} null {"notebookId":"3807690121522291"} 0318-105603-oyrrx3xc 1 Serializable {"numTargetRowsCopied":"0","numTargetRowsDeleted":"1","numTargetFilesAdded":"1","numTargetBytesAdded":"9070","numTargetBytesRemoved":"9176","numTargetDeletionVectorsAdded":"0","numTargetRowsMatchedUpdated":"27","executionTimeMs":"13999","numTargetRowsInserted":"0","numTargetRowsMatchedDeleted":"0","scanTimeMs":"4276","numTargetRowsUpdated":"27","numOutputRows":"27","numTargetDeletionVectorsRemoved":"0","numTargetRowsNotMatchedBySourceUpdated":"0","numTargetChangeFilesAdded":"0","numSourceRows":"27","numTargetFilesRemoved":"1","numTargetRowsNotMatchedBySourceDeleted":"1","rewriteTimeMs":"9012"} null Databricks-Runtime/12.2.x-scala2.12
1 18/03/2023 12:14:43.0000000 615257000000000 [email protected] MERGE {"predicate":"(s.primary_key_hash = t.primary_key_hash)","matchedPredicates":"[{"predicate":"(NOT (s.change_key_hash = t.change_key_hash ))","actionType":"update"}]","notMatchedPredicates":"[{"actionType":"insert"}]","notMatchedBySourcePredicates":"[{"actionType":"delete"}]"} null {"notebookId":"3807690121522291"} 0318-105603-oyrrx3xc 0 Serializable {"numTargetRowsCopied":"0","numTargetRowsDeleted":"0","numTargetFilesAdded":"1","numTargetBytesAdded":"9176","numTargetBytesRemoved":"0","numTargetDeletionVectorsAdded":"0","numTargetRowsMatchedUpdated":"0","executionTimeMs":"6222","numTargetRowsInserted":"28","numTargetRowsMatchedDeleted":"0","scanTimeMs":"2280","numTargetRowsUpdated":"0","numOutputRows":"28","numTargetDeletionVectorsRemoved":"0","numTargetRowsNotMatchedBySourceUpdated":"0","numTargetChangeFilesAdded":"0","numSourceRows":"28","numTargetFilesRemoved":"0","numTargetRowsNotMatchedBySourceDeleted":"0","rewriteTimeMs":"3593"} null Databricks-Runtime/12.2.x-scala2.12
0 18/03/2023 12:14:23.0000000 615257000000000 [email protected] CREATE OR REPLACE TABLE {"isManaged":"false","description":null,"partitionBy":"[]","properties":"{}"} null {"notebookId":"3807690121522291"} 0318-105603-oyrrx3xc null Serializable true {} null Databricks-Runtime/12.2.x-scala2.12

I have assigned the path to the following variable:

saveloc = '/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1'

As you can see from the history output above, there are fields called version and operationParameters.
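For reference, the history itself can be filtered on that version field: in PySpark, spark.sql("DESCRIBE HISTORY ...") returns an ordinary DataFrame, so a WHERE-style predicate works on it. Note that this filters the list of commits, not the rows stored in the table. A minimal sketch, assuming a Databricks notebook where `spark` already exists; the helper names `describe_history_sql` and `history_after` are hypothetical.

```python
def describe_history_sql(path: str) -> str:
    """Build the DESCRIBE HISTORY statement from the question for a path-based table."""
    return f"DESCRIBE HISTORY '{path}'"


def history_after(spark, path: str, version: int):
    """Return the history rows (one per commit) whose version is greater than `version`.

    `spark` is assumed to be the SparkSession that a Databricks notebook provides.
    This filters the list of commits, not the rows stored in the table.
    """
    history_df = spark.sql(describe_history_sql(path))
    return history_df.where(f"version > {int(version)}")
```

Usage: history_after(spark, saveloc, 0).select("version", "operation").show(). With the history shown above, this should keep only the two MERGE commits (versions 1 and 2).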

It is easy to read a specific version of the table with the following code:

df4 = spark.read.format("delta").option("versionAsOf", 3).load(saveloc)

There are several other ways to read a specific version, for example:

df5 = spark.read.load("/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1@v3")

Or

df6 = spark.read.load(saveloc + "@v3")

Or in SQL it would be something similar to:

SELECT * FROM delta.`/mnt/lake/BASE/SQLClassification/cdcTest/dbo/cdcmergetest/1@v3`
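The forms above should all read the same single snapshot. A minimal sketch that wraps them, assuming a Databricks runtime (the `@v<N>` suffix on a path is Databricks time-travel syntax) and a notebook-provided `spark` session; the helper names are hypothetical.

```python
def versioned_path(path: str, version: int) -> str:
    """Append the Databricks `@v<N>` time-travel suffix to a Delta table path."""
    return f"{path}@v{int(version)}"


def versioned_select_sql(path: str, version: int) -> str:
    """SQL form: query the path-based Delta table at one fixed version."""
    return f"SELECT * FROM delta.`{versioned_path(path, version)}`"


def read_version(spark, path: str, version: int):
    """Read one snapshot with the versionAsOf reader option.

    spark.read.load(versioned_path(path, version)) and
    spark.sql(versioned_select_sql(path, version)) should return the same rows.
    """
    return spark.read.format("delta").option("versionAsOf", int(version)).load(path)
```

Every variant takes exactly one version number, which is why none of them has a range form.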

Can someone tell me whether it is possible to write a WHERE clause on the version field, for example:

Select * From saveloc
where version > 2
databricks azure-databricks delta-lake delta
2 Answers
select * from deltatable version as of 9
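For the path-based table in the question, the same VERSION AS OF clause can be applied to the identifier delta.`<path>`. A minimal sketch that builds that statement; the helper name and the optional `available_versions` check are hypothetical additions, and the version should be one listed by DESCRIBE HISTORY (0, 1 or 2 in the output in the question).

```python
def version_as_of_sql(path: str, version: int, available_versions=None) -> str:
    """Build a time-travel query that reads exactly one version of a path-based table.

    `available_versions` (optional) is the list of versions seen in DESCRIBE HISTORY;
    checking it up front gives a clearer error than a failed time-travel query.
    """
    if available_versions is not None and version not in available_versions:
        raise ValueError(f"version {version} not in history {sorted(available_versions)}")
    return f"SELECT * FROM delta.`{path}` VERSION AS OF {int(version)}"
```

Usage: spark.sql(version_as_of_sql(saveloc, 2, [0, 1, 2])).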


This is not possible. Suppose you have a table with 5 versions. If you use a query like
Select * From saveloc
where version > 2

which version would you expect to see: 3, 4 or 5? You need to specify one version.
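If what is actually wanted is the data of every version newer than N, each version still has to be named explicitly. One possible workaround (not something this answer proposes) is to list the versions from the history, read each snapshot with versionAsOf, tag its rows with the version number, and union the results. A minimal sketch, assuming a Databricks notebook `spark` session and that all selected versions have the same schema (unionByName fails otherwise); the `_version` column and helper names are hypothetical.

```python
from functools import reduce


def versions_after(history_versions, version):
    """Versions strictly greater than `version`, oldest first."""
    return sorted(v for v in history_versions if v > version)


def read_versions_after(spark, path, version):
    """Union every snapshot newer than `version`, tagging each row with its version."""
    history = spark.sql(f"DESCRIBE HISTORY '{path}'").select("version").collect()
    wanted = versions_after([row["version"] for row in history], version)
    if not wanted:
        raise ValueError(f"no versions newer than {version}")
    snapshots = [
        spark.read.format("delta")
        .option("versionAsOf", v)
        .load(path)
        .selectExpr("*", f"{v} AS _version")  # records which snapshot the row came from
        for v in wanted
    ]
    return reduce(lambda a, b: a.unionByName(b), snapshots)
```

Consecutive snapshots usually share most of their rows, so the union repeats unchanged rows once per version; filtering on `_version` afterwards picks out one snapshot again.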
