I'm trying to write a JSON field to a BigQuery JSON column using JsonStreamWriter() from the Storage API, but it fails with:
com.google.cloud.bigquery.storage.v1.Exceptions$AppendSerializationError: INVALID_ARGUMENT: Append serialization failed :
**{0=Field root.Context failed to convert to JSON. Error: JSONObject does not have a string field at root.Context.}**
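root.Context is itself a JSON field, which I would like to store directly in a BQ JSON column. Because I'm using JsonStreamWriter(), the complete payload is JSON, so the "Context" JSON field is nested inside the JSON payload. Could that be the problem? Is it possible to use JsonStreamWriter() for this, or am I doing something wrong?
My payload in JSON format looks like this: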
[
{
"Context": {
"UserAgent": "Mozilla/5.0 (X11; U; Linux x86_64; el-GR; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
"Ip": "<REDACTED>"
},
"Channel": "browser",
"STREAMER_DEPLOYMENT_ID": "<REDACTED>",
"STREAMER_THREAD_NUMBER": 1,
"Timestamp": "2024-10-01T11:57:37.567002Z",
"SentAt": "2024-10-01T11:57:37.566997Z",
"STREAMER_RUN_ID": "<REDACTED>",
"Type": "identify",
"Version": "1.1",
"STREAMER_CLIENT_TIME_MS": 1727783857698,
"STREAMER_BATCH_NUMBER": 0,
"AnonymousId": "<REDACTED>",
"STREAMER_INSTANCE_NUMBER": 0,
"STREAMER_OFFSET": 0,
"ReceivedAt": "2024-10-01T11:57:37.566945Z",
"MessageId": "<REDACTED>"
}
]
Converted to protobuf (for debugging only, since I use JsonStreamWriter to do the conversion for me):
message SomeMessage {
message _context {
string _user_agent = 1;
string _ip = 2;
}
message Nested {
_context _context = 1;
string _channel = 2;
string _s_t_r_e_a_m_e_r__d_e_p_l_o_y_m_e_n_t__i_d = 3;
uint32 _s_t_r_e_a_m_e_r__t_h_r_e_a_d__n_u_m_b_e_r = 4;
google.protobuf.Timestamp _timestamp = 5;
google.protobuf.Timestamp _sent_at = 6;
string _s_t_r_e_a_m_e_r__r_u_n__i_d = 7;
string _type = 8;
string _version = 9;
uint64 _s_t_r_e_a_m_e_r__c_l_i_e_n_t__t_i_m_e__m_s = 10;
uint32 _s_t_r_e_a_m_e_r__b_a_t_c_h__n_u_m_b_e_r = 11;
string _anonymous_id = 12;
uint32 _s_t_r_e_a_m_e_r__i_n_s_t_a_n_c_e__n_u_m_b_e_r = 13;
uint32 _s_t_r_e_a_m_e_r__o_f_f_s_e_t = 14;
google.protobuf.Timestamp _received_at = 15;
string _message_id = 16;
}
repeated Nested items = 1;
}
And finally my table looks like:
CREATE TABLE TEST.EVENTS (
STREAMER_THREAD_NUMBER INT,
MESSAGEID STRING(255),
STREAMER_BATCH_NUMBER INT,
STREAMER_INSTANCE_NUMBER INT,
STREAMER_OFFSET BIGINT,
STREAMER_CLIENT_TIME_MS INT,
CONTEXT JSON,
ANONYMOUSID STRING(255),
CHANNEL STRING(255),
RECEIVEDAT STRING(255),
TIMESTAMP STRING(255),
SENTAT STRING(255),
USERID STRING(255),
VERSION STRING(255),
STREAMER_RUN_ID STRING(255),
STREAMER_DEPLOYMENT_ID STRING(255),
TRAITS JSON,
TYPE STRING(255),
INTEGRATIONS JSON,
PROPERTIES JSON,
EVENT STRING(255),
ORIGINALTIMESTAMP STRING(255)
);
Thanks for any help!
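This happens because JsonStreamWriter is designed to serialize a single JSON object into the stream, not nested objects. Your root.Context is a nested object in a place where, according to the error message, the writer expected a string field, so the append fails.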
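One way to read the error ("JSONObject does not have a string field at root.Context") is that the writer wants a string where the payload carries a nested object. Below is a minimal sketch of that idea, assuming the JSON column accepts a JSON-encoded string; that assumption is not confirmed in this thread. It is written in Python only to keep it short and self-contained. The helper name `encode_json_columns` is hypothetical, and the column set is copied from the CREATE TABLE in the question.

```python
import json

# Columns declared as JSON in the CREATE TABLE from the question.
JSON_COLUMNS = {"CONTEXT", "TRAITS", "INTEGRATIONS", "PROPERTIES"}


def encode_json_columns(row, json_columns=JSON_COLUMNS):
    """Return a copy of `row` where values of JSON-typed columns are JSON strings.

    Hypothetical helper: it only reshapes the payload and does not call BigQuery.
    Column names are matched case-insensitively (an assumption).
    """
    encoded = {}
    for key, value in row.items():
        if key.upper() in json_columns and not isinstance(value, str):
            # A nested dict or list becomes its JSON text, e.g. '{"Ip": "..."}'.
            encoded[key] = json.dumps(value)
        else:
            encoded[key] = value
    return encoded


row = {
    "Context": {"UserAgent": "Mozilla/5.0", "Ip": "<REDACTED>"},
    "Channel": "browser",
}
print(encode_json_columns(row))
```

In Java with org.json, the analogous step would presumably be replacing the nested JSONObject under "Context" with its toString() output before appending the row.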
You could try flattening the JSON array first, which ensures that each object is treated as a separate row in BigQuery.
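The flattening step could look roughly like the sketch below, which turns the top-level array into one dictionary per row. The function name `iter_rows` is hypothetical, and passing the rows to the writer is deliberately left out.

```python
import json


def iter_rows(payload_text):
    """Yield one dict per element of the top-level JSON array.

    A single top-level object is treated as a one-row payload.
    """
    data = json.loads(payload_text)
    if isinstance(data, dict):
        data = [data]
    for item in data:
        if not isinstance(item, dict):
            raise ValueError("each array element must be a JSON object")
        yield item


payload = '[{"Channel": "browser", "Type": "identify"}, {"Channel": "server", "Type": "track"}]'
for row in iter_rows(payload):
    print(row)
```

Each yielded dictionary corresponds to one row of TEST.EVENTS. Any nested values destined for JSON columns would still need whatever treatment the writer requires.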