AttributeError：“RuntimeValueProvider”对象没有属性“projectId”

Question

我正在尝试在 Dataflow runner 中运行 apache beam pipeline；该作业从 bigquery 表中读取数据并将数据写入数据库。

我正在数据流中使用经典模板选项运行作业 - 意味着首先我必须暂存管道，然后使用适当的参数运行它。

我的管道选项如下

options = PipelineOptions()
    options.view_as(SetupOptions).save_main_session = True
    importer_options = options.view_as(ImporterOptions)
    google_options = options.view_as(GoogleCloudOptions)
    with beam.Pipeline(options=options) as p:
        p | 'BigQuery Read' >> beam.io.ReadFromBigQuery(
            table=importer_options.input_table)

ImportOptions 当前接受 input_table 作为参数。

parser.add_value_provider_argument('--input-table',
                                           help='The bigquery input table in the format dataset.table_name')

但是运行管道会引发如下错误

文件 “/usr/local/lib/python3.8/site-packages/apache_beam/io/gcp/bigquery.py”，第 791 行，如果不是 self.table_reference.projectId，则在 split 中： AttributeError：“RuntimeValueProvider”对象没有属性 '项目ID'

任何人都知道我在这里缺少什么。

我正在使用以下命令构建模板。

python -m main
--runner DataflowRunner
--项目测试项目
--region=欧洲-west1
--staging_location gs://test/staging_python
--temp_location gs://测试/测试
--template_location gs://test/templates_python/test \

注意 - 我尝试通过针对 input_table 提供完全限定的表名称（意味着包括项目 ID）来运行管道，但这也没有帮助。

Answer 1

我们面临着同样的问题，这是自 2020 年底发布的 2.26.0 版本以来的一个错误。

创建错误报告：https://issues.apache.org/jira/browse/BEAM-12514

现在可以使用拉取请求：https://github.com/apache/beam/pull/15040

希望在下一个版本（2.31.0）中修复它。

Answer 2

您应该使用flex模板，它不需要价值提供者，并取消了其他限制。

AttributeError：“RuntimeValueProvider”对象没有属性“projectId”

问题描述投票：0回答：2

2个回答

最新问题

AttributeError：“RuntimeValueProvider”对象没有属性“projectId”

问题描述 投票：0回答：2

2个回答

最新问题

问题描述投票：0回答：2