Writing JSON with dict properties to Google Cloud Datastore

Using Apache Beam (Python 2.7 SDK), I am trying to write a JSON file into Google Cloud Datastore as entities.

Sample JSON:

{
"CustId": "005056B81111",
"Name": "John Smith", 
"Phone": "827188111",
"Email": "[email protected]", 
"addresses": [
    {"type": "Billing", "streetAddress": "Street 7", "city": "Malmo", "postalCode": "CR0 4UZ"},
    {"type": "Shipping", "streetAddress": "Street 6", "city": "Stockholm", "postalCode": "YYT IKO"}
]
}

I wrote an Apache Beam pipeline with three main steps (a wiring sketch follows the list):

  1. beam.io.ReadFromText(input_file_path)
  2. beam.ParDo(CreateEntities())
  3. WriteToDatastore(PROJECT)
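A minimal sketch of how these steps could be wired together, assuming the v1 datastoreio module that shipped with the Python 2.7 SDK, and using placeholder values for the project id and input path (CreateEntities is defined in the next snippet):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.io.gcp.datastore.v1.datastoreio import WriteToDatastore

# Hypothetical values; substitute your own project id and GCS path.
PROJECT = 'my-gcp-project'
input_file_path = 'gs://my-bucket/customers.json'

p = beam.Pipeline(options=PipelineOptions())
(p
 | 'Read' >> beam.io.ReadFromText(input_file_path)   # step 1: one JSON object per line
 | 'ToEntity' >> beam.ParDo(CreateEntities())        # step 2: dict -> entity_pb2.Entity
 | 'Write' >> WriteToDatastore(PROJECT))             # step 3: write entities to Datastore
p.run().wait_until_finish()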

In step 2, I convert the JSON object (dict) into an entity:

import json

import apache_beam as beam
# Datastore v1 protobuf / helper imports used with the Python 2.7 Beam SDK.
from google.cloud.proto.datastore.v1 import entity_pb2
from googledatastore import helper as datastore_helper

class CreateEntities(beam.DoFn):
  def process(self, element):
    element = element.encode('ascii', 'ignore')
    element = json.loads(element)
    # Use CustId as the entity key; the remaining fields become properties.
    Id = element.pop('CustId')
    entity = entity_pb2.Entity()
    datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
    datastore_helper.add_properties(entity, element)
    return [entity]

This works fine for the basic properties. However, it fails because addresses is itself a nested structure (a list of dicts). I have read a similar post, but it did not give the exact code for converting a dict into an entity.

I tried the following to set the addresses element as an entity, but it does not work:

element['addresses'] = entity_pb2.Entity()

Other references:

python-2.7 google-cloud-datastore google-cloud-dataflow
1 Answer

Have you tried storing it as a repeated structured property?

ndb.StructuredProperty values appear in Datastore with their keys flattened, and for a repeated structured property each individual field of the structured object becomes an array. So I think you need to write it like this:

datastore_helper.add_properties(entity, {
    ...
    "addresses.type": ["Billing", "Shipping"],
    "addresses.streetAddress": ["Street 7", "Street 6"],
    "addresses.city": ["Malmo", "Stockholm"],
    "addresses.postalCode": ["CR0 4UZ", "YYT IKO"],
})
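If you would rather build that flattened dict from the parsed JSON than write it by hand, a small helper along these lines could do it (flatten_addresses is a hypothetical name; it assumes every address dict uses the same keys):

def flatten_addresses(element):
    # Turn the 'addresses' list of dicts into "addresses.<field>": [values] arrays,
    # matching how repeated structured properties are flattened above.
    flattened = dict(element)
    for address in flattened.pop('addresses', []):
        for field, value in address.items():
            flattened.setdefault('addresses.' + field, []).append(value)
    return flattened

# Usage: datastore_helper.add_properties(entity, flatten_addresses(element))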

Alternatively, if you are trying to save it as an ndb.JsonProperty, you can do the following:

datastore_helper.add_properties(entity, {
    ...
    "addresses": json.dumps(element['addresses']),
})
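Folded back into the question's DoFn, a sketch of this JSON-string variant could look like the following (same imports as in the question's snippet; only the addresses line changes):

class CreateEntities(beam.DoFn):
  def process(self, element):
    element = json.loads(element.encode('ascii', 'ignore'))
    Id = element.pop('CustId')
    # Serialise the nested address list into a single JSON string property.
    element['addresses'] = json.dumps(element['addresses'])
    entity = entity_pb2.Entity()
    datastore_helper.add_key_path(entity.key, 'CustomerDF', Id)
    datastore_helper.add_properties(entity, element)
    return [entity]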