GCP BigQuery: loading data from a bucket

Problem description

I'm really new to Google Cloud Platform. I'm trying to populate a BigQuery table with data extracted from a CSV file stored in a bucket. I wrote a Python test script to create and populate the table. The table creation succeeds, but I get stuck when loading the file.

My script:

from google.cloud import bigquery
from google.cloud.bigquery import LoadJobConfig
from google.cloud.bigquery import SchemaField

client = bigquery.Client()
dataset_ref = client.dataset('datasetname')


## Create the table

schema = [
    bigquery.SchemaField('start_date', 'DATETIME', mode='NULLABLE'),
    bigquery.SchemaField('start_station_code', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('end_date', 'DATETIME', mode='NULLABLE'),
    bigquery.SchemaField('end_station_code', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('duration_sec', 'INTEGER', mode='NULLABLE'),
    bigquery.SchemaField('is_member', 'INTEGER', mode='NULLABLE')
]
table_ref = dataset_ref.table('tablename')
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request

## Loading data


SCHEMA = [
    SchemaField('start_date', 'DATETIME', mode='NULLABLE'),
    SchemaField('start_station_code', 'INTEGER', mode='NULLABLE'),
    SchemaField('end_date', 'DATETIME', mode='NULLABLE'),
    SchemaField('end_station_code', 'INTEGER', mode='NULLABLE'),
    SchemaField('duration_sec', 'INTEGER', mode='NULLABLE'),
    SchemaField('is_member', 'INTEGER', mode='NULLABLE')
]
#table_ref = client.dataset('dataset_name').table('table_name')

load_config = LoadJobConfig()
load_config.skip_leading_rows = 1
load_config.schema = SCHEMA
uri = 'gs://gcp-development/object.csv'

load_job = client.load_table_from_uri(
    uri,
    table_ref,
    job_config=load_config)

load_job.result()

destination_table = client.get_table(table_ref)
print('Loaded {} rows.'.format(destination_table.num_rows))

According to the documentation, this seems correct. However, I get the following error, which I don't understand, and I don't know how to check the logs for more details.

The error:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: CSV table encountered too many errors, giving up. Rows: 1; errors: 1. Please look into the error stream for more details.

Where is the error stream? I tried:

GET https://www.googleapis.com/bigquery/v2/projects/projectId/queries/jobId 

following the troubleshooting documentation, but I didn't find anything.
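For what it's worth, the "error stream" the message refers to is also exposed by the Python client as the load job's `errors` attribute (a list of error dicts from the API). A minimal sketch of surfacing those entries; the `format_load_errors` helper is my own illustration, not part of the client library, and the `reason`/`message` keys are what the BigQuery API typically returns:

```python
def format_load_errors(errors):
    """Turn a load job's `errors` list (dicts from the API) into readable lines.

    Assumes entries carry `reason` and `message` keys; missing keys fall
    back to placeholders so malformed entries don't raise.
    """
    return [
        "{}: {}".format(err.get("reason", "unknown"), err.get("message", ""))
        for err in (errors or [])
    ]

# Usage against the script above (requires google-cloud-bigquery):
#
# from google.api_core.exceptions import BadRequest
# try:
#     load_job.result()
# except BadRequest:
#     for line in format_load_errors(load_job.errors):
#         print(line)
```

This prints each per-row error instead of only the summary that the `BadRequest` exception carries.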

Thank you for your help.

1 answer

I was able to load data with your script without any problem. You can check the full error message in the Logs Viewer by selecting BigQuery in the first drop-down menu on the left. The failure is likely a parsing error on the DATETIME fields. You can find more information on how to use the Logs Viewer in this document.

The sample data I used is the following:

start_date,start_station_code,end_date,end_station_code,duration_sec,is_member
0001-01-01 00:00:00.000000,1,9999-12-31 23:59:59.999999,2,3,4
0001-01-01 00:00:00.000000,2,9999-12-31 23:59:59.999999,3,4,5
0001-01-01 00:00:00.000000,3,9999-12-31 23:59:59.999999,4,5,6
0001-01-01 00:00:00.000000,4,9999-12-31 23:59:59.999999,5,6,7
0001-01-01 00:00:00.000000,5,9999-12-31 23:59:59.999999,6,7,8
0001-01-01 00:00:00.000000,6,9999-12-31 23:59:59.999999,7,8,9
0001-01-01 00:00:00.000000,7,9999-12-31 23:59:59.999999,8,9,10
0001-01-01 00:00:00.000000,8,9999-12-31 23:59:59.999999,9,10,11
0001-01-01 00:00:00.000000,9,9999-12-31 23:59:59.999999,10,11,12
0001-01-01 00:00:00.000000,10,9999-12-31 23:59:59.999999,11,12,13
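Since the load succeeds with this sample, a quick way to test the DATETIME-parse hypothesis is to pre-check the CSV values locally before loading. This is a sketch under one assumption: BigQuery accepts CSV `DATETIME` values shaped like `YYYY-MM-DD HH:MM:SS[.ffffff]`, as in the sample above. The `valid_bq_datetime` helper is my own illustration, not a client-library function:

```python
from datetime import datetime

# DATETIME shapes matching the sample data above (fractional seconds optional).
_BQ_DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")

def valid_bq_datetime(value):
    """Return True if `value` parses as a BigQuery-style DATETIME string."""
    for fmt in _BQ_DATETIME_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            continue
    return False
```

Running this over the `start_date` and `end_date` columns of the real CSV should flag any row that would trip the loader, without burning a load job per attempt.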