I am trying to parse a schema from a JSON file that contains the schema of a BigQuery table. I am using the following code.
import apache_beam as beam
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
import json
schema_data = json.dumps(json.load(open("/content/sample_data/schema.json")))
table_schema = parse_table_schema_from_json(schema_data)
print(table_schema)
However, it throws the following error.
module 'apache_beam.io.gcp.internal.clients.bigquery' has no attribute 'TableFieldSchema'
I have installed the GCP-specific Apache Beam library with pip install apache-beam[gcp].
Can anyone help, or does anyone know how to fix this?
My schema:
It contains nested and repeated columns.
{
"fields": [
{
"fields": [
{
"mode": "NULLABLE",
"name": "second",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "minute",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "hour",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "timeZoneId",
"type": "STRING"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "month",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "day",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "year",
"type": "INTEGER"
}
],
"mode": "NULLABLE",
"name": "date",
"type": "RECORD"
}
],
"mode": "NULLABLE",
"name": "lastModifiedDateTime",
"type": "RECORD"
},
{
"mode": "REPEATED",
"name": "companionCreativeIds",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "masterCreativeId",
"type": "INTEGER"
},
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "id",
"type": "INTEGER"
},
{
"fields": [
{
"mode": "NULLABLE",
"name": "name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "value",
"type": "STRING"
}
],
"mode": "REPEATED",
"name": "soft_error_fields",
"type": "RECORD"
},
{
"mode": "NULLABLE",
"name": "dw_ingest_time",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "dw_partition_date",
"type": "DATE"
},
{
"mode": "NULLABLE",
"name": "dw_publish_time",
"type": "TIMESTAMP"
},
{
"mode": "NULLABLE",
"name": "dw_source_object_name",
"type": "STRING"
},
{
"mode": "NULLABLE",
"name": "dw_batch_id",
"type": "STRING"
}
]
}
I tested with the following JSON schema, saved in a file named schema.json:
{
"fields": [
{
"name": "r",
"type": "RECORD",
"mode": "REQUIRED",
"description": "r description",
"fields": [
{
"name": "s",
"type": "STRING",
"mode": "NULLABLE",
"description": "s description"
},
{
"name": "n",
"type": "INTEGER",
"mode": "REQUIRED",
"description": "n description"
}
]
}
]
}
I checked the format against this class and ran it in a unit test, where it works as expected:
import json

from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json
from apache_beam.testing.test_pipeline import TestPipeline

def test_pipeline(self):
    with TestPipeline() as p:
        # ROOT_DIR is not defined in this snippet; it is assumed to point at the project root.
        schema_file = f'{ROOT_DIR}/tests/schema.json'
        with open(schema_file) as f:
            schema_data = json.dumps(json.load(f))
        table_schema = parse_table_schema_from_json(schema_data)
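If you want to sanity-check the nesting in a schema file before handing it to Beam, a plain-Python walk over the JSON is enough. This is only a rough sketch and does not touch Beam at all: flatten_schema and SAMPLE are names I made up for illustration, and the sample is the test schema above with the descriptions dropped. It assumes a missing "mode" means NULLABLE, which is the BigQuery default.

```python
import json


def flatten_schema(fields, prefix=""):
    """Yield (dotted_path, type, mode) for every field, descending into RECORDs."""
    for field in fields:
        path = prefix + field["name"]
        # BigQuery treats an omitted mode as NULLABLE.
        yield path, field["type"], field.get("mode", "NULLABLE")
        if field["type"] == "RECORD":
            # Nested columns live under the "fields" key of a RECORD.
            yield from flatten_schema(field.get("fields", []), prefix=path + ".")


# Hypothetical sample: the test schema from this answer, minus descriptions.
SAMPLE = json.loads("""
{
  "fields": [
    {
      "name": "r",
      "type": "RECORD",
      "mode": "REQUIRED",
      "fields": [
        {"name": "s", "type": "STRING", "mode": "NULLABLE"},
        {"name": "n", "type": "INTEGER", "mode": "REQUIRED"}
      ]
    }
  ]
}
""")

rows = list(flatten_schema(SAMPLE["fields"]))

if __name__ == "__main__":
    for path, field_type, mode in rows:
        print(path, field_type, mode)
```

Running the same function on the schema from the question should list the repeated columns (companionCreativeIds, soft_error_fields) with mode REPEATED and the nested ones under dotted paths such as lastModifiedDateTime.date.year.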
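Two things should help you.
First, I used a different import for the parse_table_schema_from_json method, because the other one is now deprecated.
Second, I tested this code with Beam version 2.37.0 and got the same problem as you. I then tested the same code with the latest Beam version, 2.47.0, and it works fine.
Please use the same import and upgrade your Beam version to 2.47.0.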
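To see which Beam version is actually installed in your environment (useful in Colab, where a preinstalled copy may differ from what you expect), something like the following could work. It is a minimal sketch using only the standard library. parse_version, installed_beam_version, needs_upgrade and MIN_WORKING are illustrative names of my own, and 2.47.0 is simply the version I tested successfully, not an official minimum.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: 2.47.0 is the version reported as working in this answer.
MIN_WORKING = (2, 47, 0)


def parse_version(raw):
    """Turn a version string such as '2.47.0' into the tuple (2, 47, 0)."""
    parts = []
    for piece in raw.split(".")[:3]:
        match = re.match(r"\d+", piece)  # ignore suffixes such as 'rc1'
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed_beam_version():
    """Return the installed apache-beam version string, or None if absent."""
    try:
        return version("apache-beam")
    except PackageNotFoundError:
        return None


def needs_upgrade(raw):
    """True when the given version is older than MIN_WORKING."""
    return parse_version(raw) < MIN_WORKING


if __name__ == "__main__":
    found = installed_beam_version()
    if found is None:
        print("apache-beam is not installed")
    elif needs_upgrade(found):
        print(f"apache-beam {found} found; consider upgrading")
    else:
        print(f"apache-beam {found} found")
```

If it reports an older version, reinstalling with pip install --upgrade "apache-beam[gcp]" and restarting the runtime should pick up the newer release.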
For me, it worked after installing the GCP-specific Apache Beam package ==> pip install apache-beam[gcp]