JsonStreamWriter() fails with {0=Field root.Context failed to convert to JSON. Error: JSONObject does not have a string field at root.Context


I'm trying to use JsonStreamWriter() from the BigQuery Storage Write API to write a JSON value into a BigQuery JSON column, but it fails with:

com.google.cloud.bigquery.storage.v1.Exceptions$AppendSerializationError: INVALID_ARGUMENT: Append serialization failed :
{0=Field root.Context failed to convert to JSON. Error: JSONObject does not have a string field at root.Context.}

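root.Context is itself a JSON field that I'd like to store directly in a BQ JSON column. When I use JsonStreamWriter(), the whole payload is JSON, so the "Context" JSON field is nested inside the JSON payload. Could that be the problem? Is it possible to do this with JsonStreamWriter(), or am I doing something wrong?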

My payload, in JSON format, looks like this:

[
    {
        "Context": {
            "UserAgent": "Mozilla/5.0 (X11; U; Linux x86_64; el-GR; rv:1.9.2.10) Gecko/20100922 Ubuntu/10.10 (maverick) Firefox/3.6.10",
            "Ip": "<REDACTED>"
        },
        "Channel": "browser",
        "STREAMER_DEPLOYMENT_ID": "<REDACTED>",
        "STREAMER_THREAD_NUMBER": 1,
        "Timestamp": "2024-10-01T11:57:37.567002Z",
        "SentAt": "2024-10-01T11:57:37.566997Z",
        "STREAMER_RUN_ID": "<REDACTED>",
        "Type": "identify",
        "Version": "1.1",
        "STREAMER_CLIENT_TIME_MS": 1727783857698,
        "STREAMER_BATCH_NUMBER": 0,
        "AnonymousId": "<REDACTED>",
        "STREAMER_INSTANCE_NUMBER": 0,
        "STREAMER_OFFSET": 0,
        "ReceivedAt": "2024-10-01T11:57:37.566945Z",
        "MessageId": "<REDACTED>"
    }
] 

Converted to protobuf (for debugging only; this is what I get when I let JsonStreamWriter do the conversion for me):

message SomeMessage {

    message _context {
        string _user_agent = 1;
        string _ip = 2;
    }

    message Nested {
        _context _context = 1;
        string _channel = 2;
        string _s_t_r_e_a_m_e_r__d_e_p_l_o_y_m_e_n_t__i_d = 3;
        uint32 _s_t_r_e_a_m_e_r__t_h_r_e_a_d__n_u_m_b_e_r = 4;
        google.protobuf.Timestamp _timestamp = 5;
        google.protobuf.Timestamp _sent_at = 6;
        string _s_t_r_e_a_m_e_r__r_u_n__i_d = 7;
        string _type = 8;
        string _version = 9;
        uint64 _s_t_r_e_a_m_e_r__c_l_i_e_n_t__t_i_m_e__m_s = 10;
        uint32 _s_t_r_e_a_m_e_r__b_a_t_c_h__n_u_m_b_e_r = 11;
        string _anonymous_id = 12;
        uint32 _s_t_r_e_a_m_e_r__i_n_s_t_a_n_c_e__n_u_m_b_e_r = 13;
        uint32 _s_t_r_e_a_m_e_r__o_f_f_s_e_t = 14;
        google.protobuf.Timestamp _received_at = 15;
        string _message_id = 16;
    }

    repeated Nested items = 1;
}

Finally, my table looks like this:

CREATE TABLE TEST.EVENTS (
    STREAMER_THREAD_NUMBER INT,
    MESSAGEID STRING(255),
    STREAMER_BATCH_NUMBER INT,
    STREAMER_INSTANCE_NUMBER INT,
    STREAMER_OFFSET BIGINT,
    STREAMER_CLIENT_TIME_MS INT,
    CONTEXT JSON,
    ANONYMOUSID STRING(255),
    CHANNEL STRING(255),
    RECEIVEDAT STRING(255),
    TIMESTAMP STRING(255),
    SENTAT STRING(255),
    USERID STRING(255),
    VERSION STRING(255),
    STREAMER_RUN_ID STRING(255),
    STREAMER_DEPLOYMENT_ID STRING(255),
    TRAITS JSON,
    TYPE STRING(255),
    INTEGRATIONS JSON,
    PROPERTIES JSON,
    EVENT STRING(255),
    ORIGINALTIMESTAMP STRING(255)   
);

Thanks everyone for the help!

1 Answer

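This happens because JsonStreamWriter is designed to serialize individual JSON objects into a stream, not nested objects. Your root.Context is a nested object, which causes JsonStreamWriter to fail.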
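
The error text, "JSONObject does not have a string field at root.Context", suggests the writer expects the value for a JSON column to arrive as a string of JSON text rather than as a nested object. Assuming that reading is correct (the answer does not confirm it), one workaround is to serialize each JSON-column value before appending it; with org.json in Java that would be `row.put("Context", row.getJSONObject("Context").toString())`. The sketch below shows the same idea in Python for brevity. `JSON_COLUMNS` and `serialize_json_columns` are hypothetical names, and the column set is simply the JSON columns from the DDL in the question.

```python
import json

# Hypothetical: the columns declared as JSON in TEST.EVENTS, lowercased.
JSON_COLUMNS = {"context", "traits", "integrations", "properties"}


def serialize_json_columns(row: dict) -> dict:
    """Return a copy of row where values bound for JSON columns are JSON text."""
    out = {}
    for key, value in row.items():
        # BigQuery column names are case-insensitive, so compare lowercased keys.
        # None is left alone so it stays SQL NULL instead of becoming JSON "null";
        # values that are already strings are assumed to be JSON text already.
        if key.lower() in JSON_COLUMNS and value is not None and not isinstance(value, str):
            out[key] = json.dumps(value, separators=(",", ":"))
        else:
            out[key] = value
    return out


row = {"Context": {"UserAgent": "Mozilla/5.0", "Ip": "<REDACTED>"}, "Channel": "browser"}
print(serialize_json_columns(row))
# {'Context': '{"UserAgent":"Mozilla/5.0","Ip":"<REDACTED>"}', 'Channel': 'browser'}
```

The input row is not mutated, so the same parsed payload can still be logged or inspected after conversion.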

You could try flattening the JSON array first, which ensures that each object is treated as a separate row in BigQuery.
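
A minimal sketch of that flattening step, written in Python for brevity rather than Java. `flatten_rows` is a hypothetical helper that walks a parsed payload (a single object, an array, or nested arrays) and returns one object per row. With the Java client, each resulting object would become one element of the `JSONArray` passed to `JsonStreamWriter.append`.

```python
import json
from typing import Any


def flatten_rows(data: Any) -> list:
    """Flatten a parsed payload into a flat list of row objects (hypothetical helper).

    A single object becomes one row; arrays, including nested arrays, are
    walked recursively so that every object ends up as its own row.
    """
    if isinstance(data, dict):
        return [data]
    if isinstance(data, list):
        rows = []
        for item in data:
            rows.extend(flatten_rows(item))
        return rows
    raise ValueError(f"expected a JSON object or array, got {type(data).__name__}")


payload = '[{"Type": "identify"}, [{"Type": "track"}, {"Type": "page"}]]'
print(flatten_rows(json.loads(payload)))
# [{'Type': 'identify'}, {'Type': 'track'}, {'Type': 'page'}]
```

Raising on scalars, rather than silently skipping them, surfaces malformed payloads before anything is sent to BigQuery.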
