I have the following code. I previously had a Delta table at my_path with 180 columns. I select a single column and try to overwrite the table:
columns_to_select = ["one_column"]
df_one_column = df.select(*columns_to_select)
df_one_column.write.format("delta").mode("overwrite").option("mergeSchema", "true").save(my_path)
new_schema = spark.read.format("delta").load(my_path).schema
target_column = [field.name for field in new_schema.fields]
print(len(target_column))  # returns 180
I expected this to return 1, since I selected only one column from the DataFrame, but 180 columns were returned.
When writing, you need to use
option("overwriteSchema", "true")
Here is an example:
df.write.format("delta").mode("overwrite").save(my_path)
df_first = spark.read.format("delta").load(my_path)
print(df_first.columns, len(df_first.columns))
columns_to_select = ["firstname"]
df_one_column = df.select(*columns_to_select)
df_one_column.write.format("delta").mode("overwrite").option("overwriteSchema", "true").save(my_path)
df_second = spark.read.format("delta").load(my_path)
print(df_second.columns, len(df_second.columns))
See the following link for more information:
mergeSchema: https://delta.io/blog/2023-02-08-delta-lake-schema-evolution/