Module 'apache_beam.io.gcp.internal.clients.bigquery' has no attribute 'TableFieldSchema'?

Question (votes: 0, answers: 2)

I am trying to parse a schema from a JSON file that contains the schema of a BigQuery table. I am using the following code.

import apache_beam as beam
from apache_beam.io.gcp.bigquery import parse_table_schema_from_json
import json

schema_data = json.dumps(json.load(open("/content/sample_data/schema.json")))
table_schema = parse_table_schema_from_json(schema_data)
print(table_schema)

However, it throws the following error.

module 'apache_beam.io.gcp.internal.clients.bigquery' has no attribute 'TableFieldSchema'

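I have already installed the GCP-specific Apache Beam library using pip install apache-beam[gcp].

Can anyone help, or does anyone know how to solve this?

My schema:

It contains nested and repeated columns.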

{
  "fields": [
    {
      "fields": [
        {
          "mode": "NULLABLE",
          "name": "second",
          "type": "INTEGER"
        },
        {
          "mode": "NULLABLE",
          "name": "minute",
          "type": "INTEGER"
        },
        {
          "mode": "NULLABLE",
          "name": "hour",
          "type": "INTEGER"
        },
        {
          "mode": "NULLABLE",
          "name": "timeZoneId",
          "type": "STRING"
        },
        {
          "fields": [
            {
              "mode": "NULLABLE",
              "name": "month",
              "type": "INTEGER"
            },
            {
              "mode": "NULLABLE",
              "name": "day",
              "type": "INTEGER"
            },
            {
              "mode": "NULLABLE",
              "name": "year",
              "type": "INTEGER"
            }
          ],
          "mode": "NULLABLE",
          "name": "date",
          "type": "RECORD"
        }
      ],
      "mode": "NULLABLE",
      "name": "lastModifiedDateTime",
      "type": "RECORD"
    },
    {
      "mode": "REPEATED",
      "name": "companionCreativeIds",
      "type": "INTEGER"
    },
    {
      "mode": "NULLABLE",
      "name": "masterCreativeId",
      "type": "INTEGER"
    },
    {
      "mode": "NULLABLE",
      "name": "name",
      "type": "STRING"
    },
    {
      "mode": "NULLABLE",
      "name": "id",
      "type": "INTEGER"
    },
    {
      "fields": [
        {
          "mode": "NULLABLE",
          "name": "name",
          "type": "STRING"
        },
        {
          "mode": "NULLABLE",
          "name": "value",
          "type": "STRING"
        }
      ],
      "mode": "REPEATED",
      "name": "soft_error_fields",
      "type": "RECORD"
    },
    {
      "mode": "NULLABLE",
      "name": "dw_ingest_time",
      "type": "TIMESTAMP"
    },
    {
      "mode": "NULLABLE",
      "name": "dw_partition_date",
      "type": "DATE"
    },
    {
      "mode": "NULLABLE",
      "name": "dw_publish_time",
      "type": "TIMESTAMP"
    },
    {
      "mode": "NULLABLE",
      "name": "dw_source_object_name",
      "type": "STRING"
    },
    {
      "mode": "NULLABLE",
      "name": "dw_batch_id",
      "type": "STRING"
    }
  ]
}
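To see how the nested and repeated columns in a schema like this are laid out before handing the file to Beam, a small standard-library sketch can walk the JSON. The helper name flatten_schema and the trimmed-down sample are illustrative assumptions, not part of Beam or BigQuery, and the sketch only descends into fields whose type is RECORD.

```python
import json


def flatten_schema(fields, prefix=""):
    """Yield (dotted_path, type, mode) for every field, descending into RECORDs."""
    for field in fields:
        path = prefix + field["name"]
        # BigQuery treats a missing mode as NULLABLE.
        yield path, field["type"], field.get("mode", "NULLABLE")
        if field["type"] == "RECORD":
            yield from flatten_schema(field.get("fields", []), prefix=path + ".")


# A trimmed-down copy of the schema above, inlined so the sketch runs on its own.
SAMPLE = json.loads("""
{
  "fields": [
    {"name": "lastModifiedDateTime", "type": "RECORD", "mode": "NULLABLE",
     "fields": [
       {"name": "hour", "type": "INTEGER", "mode": "NULLABLE"},
       {"name": "date", "type": "RECORD", "mode": "NULLABLE",
        "fields": [{"name": "year", "type": "INTEGER", "mode": "NULLABLE"}]}
     ]},
    {"name": "companionCreativeIds", "type": "INTEGER", "mode": "REPEATED"}
  ]
}
""")

rows = list(flatten_schema(SAMPLE["fields"]))
for path, type_, mode in rows:
    print(f"{path}: {type_} ({mode})")
```

If this walk fails with a KeyError, the schema file itself is missing a name or type somewhere, which is worth ruling out separately from the Beam error.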


python google-cloud-platform google-bigquery google-cloud-dataflow apache-beam
2 Answers
0 votes

I tested with the following JSON schema, saved as schema.json:

{
  "fields": [
    {
      "name": "r",
      "type": "RECORD",
      "mode": "REQUIRED",
      "description": "r description",
      "fields": [
        {
          "name": "s",
          "type": "STRING",
          "mode": "NULLABLE",
          "description": "s description"
        },
        {
          "name": "n",
          "type": "INTEGER",
          "mode": "REQUIRED",
          "description": "n description"
        }
      ]
    }
  ]
}

I checked the format against this class.

I ran it in a unit test and it worked as expected:

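import json

from apache_beam.io.gcp.bigquery_tools import parse_table_schema_from_json
from apache_beam.testing.test_pipeline import TestPipeline

# Method of a test class. ROOT_DIR is not defined in this snippet;
# it is presumably the project root and must be set elsewhere.
def test_pipeline(self):
    with TestPipeline() as p:
        schema_file = f'{ROOT_DIR}/tests/schema.json'

        # Load the file and re-serialize it to the JSON string the parser expects.
        with open(schema_file) as f:
            schema_data = json.dumps(json.load(f))
        table_schema = parse_table_schema_from_json(schema_data)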

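A few things that may help you:

  • I used a different import for the parse_table_schema_from_json method, because the other one is now deprecated
  • I first tested this code with an older Beam version, 2.37.0, and ran into the same problem as you
  • I then tested the same code with the latest Beam version at the time of writing, 2.47.0, and it worked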

Please use the same import and upgrade your Beam version to 2.47.0.

0 votes

For me, it worked after installing the GCP-specific Apache Beam package: pip install apache-beam[gcp]
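One way to check whether the install took effect is to probe the module named in the error message directly. This is only a sketch: module_has_attr is a hypothetical helper, not a Beam API, and it reports False both when the module cannot be imported and when the attribute is missing.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True if the module imports cleanly and exposes the attribute."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers a missing package as well as a missing optional dependency.
        return False
    return hasattr(module, attr)


found = module_has_attr(
    "apache_beam.io.gcp.internal.clients.bigquery", "TableFieldSchema"
)
print("TableFieldSchema available:", found)
```

If this prints False right after the install, restart the Python process or notebook runtime and run it again before trying anything else.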
