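I am running a loop that runs some processes, builds a DataFrame, and then appends the DataFrame to a BigQuery table. When I append data to the existing table, I get this error: "Please verify that the structure and data types in the DataFrame match the schema of the destination table."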
from pandas.io import gbq
import pandas as pd
import numpy as np
import datetime as dt
from datalab.context import Context
import time

for id_name in ID_:
    # length_data, length_action, daily_mail, weekly_mail, imp_hour and imp_day
    # are computed earlier in the loop (omitted here)
    df = ['id_recip', 'length_data', 'length_action', 'daily_mail_freq', 'weekly_mail_frequency', 'imp_hour', 'imp_day']
    columns = list(df)
    data = []
    values = [id_name, length_data, length_action, daily_mail, weekly_mail, imp_hour, imp_day]
    zipped = zip(columns, values)
    a_dictionary = dict(zipped)
    print(a_dictionary)
    final_output = pd.DataFrame(a_dictionary)
    final_output = final_output.astype(str)
    final_output.info()
    final_output.to_gbq('internal.frequency_output3',
                        Context.default().project_id,
                        if_exists='append')
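To avoid data type mismatches, I convert all the data in the DataFrame to strings. On the first iteration of the loop, the table is created if it does not exist.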
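As a minimal sketch of the string conversion described above (not the original code): the helper name `build_string_row` and the sample values are made up for illustration. Wrapping the dictionary in a list yields a one-row DataFrame even when every value is a scalar, and passing `columns=COLUMNS` pins the column order.

```python
import pandas as pd

# Column names taken from the loop above.
COLUMNS = ['id_recip', 'length_data', 'length_action', 'daily_mail_freq',
           'weekly_mail_frequency', 'imp_hour', 'imp_day']


def build_string_row(values):
    """Return a one-row DataFrame whose cells are all Python strings.

    `values` is a hypothetical sequence with one entry per name in COLUMNS.
    """
    if len(values) != len(COLUMNS):
        raise ValueError('expected %d values, got %d' % (len(COLUMNS), len(values)))
    row = dict(zip(COLUMNS, values))
    # A list of dicts gives one DataFrame row per dict, so scalar values work.
    frame = pd.DataFrame([row], columns=COLUMNS)
    return frame.astype(str)


# Sample values invented for illustration.
sample = build_string_row(['abc123', 120, 45, 2.5, 17.5, 9, 'Mon'])
```

Calling `sample.info()` afterwards, as in the loop above, is a quick way to confirm the column names and order before appending.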
Schema of the BigQuery table:
daily_mail_freq STRING NULLABLE
id_recip STRING NULLABLE
imp_day STRING NULLABLE
imp_hour STRING NULLABLE
length_action STRING NULLABLE
length_data STRING NULLABLE
weekly_mail_frequency STRING NULLABLE
There are no date columns in the table.
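One way to narrow down an error like this is to compare each row against the expected schema before appending it. The following is only a sketch in plain Python: `EXPECTED_SCHEMA` and `find_schema_mismatches` are hypothetical names, not part of pandas or the BigQuery client, and the check covers only column names and the STRING type listed above.

```python
# Expected (name, type) pairs, copied from the schema listed above.
EXPECTED_SCHEMA = [
    ('daily_mail_freq', 'STRING'),
    ('id_recip', 'STRING'),
    ('imp_day', 'STRING'),
    ('imp_hour', 'STRING'),
    ('length_action', 'STRING'),
    ('length_data', 'STRING'),
    ('weekly_mail_frequency', 'STRING'),
]


def find_schema_mismatches(row, schema=EXPECTED_SCHEMA):
    """Return a list of problems found in `row`, a dict of column -> value."""
    problems = []
    expected_names = [name for name, _ in schema]
    # Columns the table expects but the row lacks.
    for name in expected_names:
        if name not in row:
            problems.append('missing column: ' + name)
    # Columns the row carries but the table does not know.
    for name in row:
        if name not in expected_names:
            problems.append('unexpected column: ' + name)
    # STRING fields should hold Python str values.
    for name, field_type in schema:
        if field_type == 'STRING' and name in row and not isinstance(row[name], str):
            problems.append('%s should be str, got %s' % (name, type(row[name]).__name__))
    return problems
```

An empty list means the row matches on names and types; anything else points at the column to look at first.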
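One option is to use the google.cloud.bigquery client library. In that case the rows are sent straight to the table with insert_rows (a streaming insert) instead of going through a DataFrame.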
from google.cloud import bigquery

def export_items_to_bigquery(daily_mail_freq, id_recip, imp_day, imp_hour, length_action, length_data, weekly_mail_frequency):
    # Instantiate a client
    bigquery_client = bigquery.Client()
    # Prepare references to the dataset and the table
    dataset_ref = bigquery_client.dataset('dbn')
    table_ref = dataset_ref.table('fqo')
    table = bigquery_client.get_table(table_ref)  # API request, fetches the schema
    # Tuple order must match the order of the fields in the table schema
    rows_to_insert = [
        (daily_mail_freq, id_recip, imp_day, imp_hour, length_action, length_data, weekly_mail_frequency)]
    errors = bigquery_client.insert_rows(table, rows_to_insert)  # API request
    assert errors == []
Now, inside the loop, just pass the data to the function.
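A hedged sketch of what that loop might look like. The parameter order of `export_items_to_bigquery` (alphabetical) differs from the order of `values` in the question, so passing every field by keyword avoids mixing them up. `export_all`, `compute_metrics`, `fake_metrics` and `fake_export` are hypothetical names introduced here; the stand-ins let the sketch run without BigQuery credentials.

```python
def export_all(id_names, compute_metrics, export_row):
    """Call `export_row` once per id, passing every field by keyword.

    `compute_metrics` is a hypothetical callable returning a dict keyed by
    the metric column names; `export_row` stands in for
    export_items_to_bigquery.
    """
    exported = 0
    for id_name in id_names:
        metrics = compute_metrics(id_name)
        # Values are cast to str because every column in the table is STRING.
        export_row(
            daily_mail_freq=str(metrics['daily_mail_freq']),
            id_recip=str(id_name),
            imp_day=str(metrics['imp_day']),
            imp_hour=str(metrics['imp_hour']),
            length_action=str(metrics['length_action']),
            length_data=str(metrics['length_data']),
            weekly_mail_frequency=str(metrics['weekly_mail_frequency']),
        )
        exported += 1
    return exported


# Stand-ins invented for illustration.
collected = []


def fake_metrics(id_name):
    return {'daily_mail_freq': 2, 'imp_day': 'Mon', 'imp_hour': 9,
            'length_action': 4, 'length_data': 10, 'weekly_mail_frequency': 14}


def fake_export(**fields):
    collected.append(fields)


count = export_all(['a1', 'b2'], fake_metrics, fake_export)
```

In real use, `export_items_to_bigquery` would be passed in place of `fake_export`.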