Need to extract JSON data (nested arrays) from a single-column DataFrame: table reads as null even though the schema is read (Scala)


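I am trying to extract the following data from a DataFrame. The JSON data, which contains nested arrays, sits entirely in one column (_c1). I want to pull it out and create a separate DataFrame from it with proper column names. A sample record is shown below.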

|_c1|
+---+
|{"Id":"31279605299","Type":"12121212","client":"Checklist _API","eventTime":"2020-03-17T15:50:30.640Z","eventType":"Event","payload":{"sourceApp":"ios","questionnaire":{"version":"1.0","question":"How to resolve ? ","fb":"Na"}}}|

I am reading it with the following schema:

import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("Id", StringType, false),
  StructField("Type", StringType, false),
  StructField("client", StringType, false),
  StructField("eventTime", StringType, false),
  StructField("eventType", StringType, false),
  StructField("payload", ArrayType(StructType(Array(
    StructField("sourceApp", StringType, false),
    StructField("questionnaire", ArrayType(StructType(Array(
      StructField("version", StringType, false),
      StructField("question", StringType, false),
      StructField("fb", StringType, false)
    ))))
  ))))
))

// Parse the JSON string in column _c1 using the schema above
val json_paral = DF.select(from_json(col("_c1"), schema))
The resulting schema looks like this:

 |-- jsontostructs(_c1): struct (nullable = true)
 |    |-- Id: string (nullable = true)
 |    |-- Type: string (nullable = true)
 |    |-- client: string (nullable = true)
 |    |-- eventTime: string (nullable = true)
 |    |-- eventType: string (nullable = true)
 |    |-- payload: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- sourceApp: string (nullable = true)
 |    |    |    |-- questionnaire: array (nullable = true)
 |    |    |    |    |-- element: struct (containsNull = true)
 |    |    |    |    |    |-- version: string (nullable = true)
 |    |    |    |    |    |-- question: string (nullable = true)
 |    |    |    |    |    |-- fb: string (nullable = true)

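The structure looks fine, but when I inspect the DataFrame, all of the data shows up as NULL. Is it being read correctly? There are no parsing errors either.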
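One likely reason for the all-NULL result, judging from the sample record: payload and questionnaire are JSON objects ({...}), not arrays ([...]), yet the schema declares both as ArrayType. When a value does not match its declared type, from_json treats the record as malformed and, in its default permissive handling, gives back nulls instead of raising an error, which would also fit the "no parsing errors" observation. (The nullable = false flags are not kept either; the printed schema shows every field as nullable.) The sketch below assumes Spark 2.x or 3.x and the sample record loaded into a one-row DataFrame; it models both fields as StructType and then flattens the parsed struct into named columns. The names sample, df, parsed, and flat are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Local session for the sketch; getOrCreate() reuses an existing one (e.g. in spark-shell)
val spark = SparkSession.builder().master("local[*]").appName("json-column-sketch").getOrCreate()
import spark.implicits._

// Hypothetical one-row DataFrame holding the sample record from the question in column _c1
val sample = """{"Id":"31279605299","Type":"12121212","client":"Checklist _API","eventTime":"2020-03-17T15:50:30.640Z","eventType":"Event","payload":{"sourceApp":"ios","questionnaire":{"version":"1.0","question":"How to resolve ? ","fb":"Na"}}}"""
val df = Seq(sample).toDF("_c1")

// payload and questionnaire are objects in the record, so they are modelled as StructType, not ArrayType
val questionnaireType = StructType(Seq(
  StructField("version", StringType),
  StructField("question", StringType),
  StructField("fb", StringType)))
val payloadType = StructType(Seq(
  StructField("sourceApp", StringType),
  StructField("questionnaire", questionnaireType)))
val schema = StructType(Seq(
  StructField("Id", StringType),
  StructField("Type", StringType),
  StructField("client", StringType),
  StructField("eventTime", StringType),
  StructField("eventType", StringType),
  StructField("payload", payloadType)))

// Parse the string column into a struct column named j
val parsed = df.select(from_json(col("_c1"), schema).as("j"))

// Flatten the nested structs into top-level columns with readable names
val flat = parsed.select(
  col("j.Id"), col("j.Type"), col("j.client"), col("j.eventTime"), col("j.eventType"),
  col("j.payload.sourceApp").as("sourceApp"),
  col("j.payload.questionnaire.version").as("version"),
  col("j.payload.questionnaire.question").as("question"),
  col("j.payload.questionnaire.fb").as("fb"))

flat.show(truncate = false)
```

If some records really do carry arrays in these fields, the schema would need to match that shape instead; a single schema cannot describe both an object and an array for the same field.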

Tags: json, scala, multidimensional-array, apache-spark-sql
2 Answers

Please check whether this helps:

1. Load the data



Instead of reading it with a schema, try extracting the column as plain string values:


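import spark.implicits._ // provides the Encoder[String] that map needs

// Turn each row's first (and only) column into a plain JSON string
val Df = json_DF.map(r => r.getString(0))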
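This answer stops after the map step. A minimal sketch of one way to continue, assuming Spark 2.2 or later (where spark.read.json accepts a Dataset[String]): let Spark infer the schema from the JSON strings, which should turn payload and questionnaire into nested structs, then select the nested fields by dotted path. Here json_DF is a hypothetical one-row stand-in built from the sample record, and this is not necessarily how the answerer meant to finish.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Local session for the sketch; getOrCreate() reuses an existing one (e.g. in spark-shell)
val spark = SparkSession.builder().master("local[*]").appName("json-infer-sketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-in for json_DF: one string column holding the raw JSON text
val sample = """{"Id":"31279605299","Type":"12121212","client":"Checklist _API","eventTime":"2020-03-17T15:50:30.640Z","eventType":"Event","payload":{"sourceApp":"ios","questionnaire":{"version":"1.0","question":"How to resolve ? ","fb":"Na"}}}"""
val json_DF = Seq(sample).toDF("_c1")

// Same step as in the answer: one JSON document per element
val jsonStrings: Dataset[String] = json_DF.map(r => r.getString(0))

// Let Spark infer the schema; JSON objects become StructType columns
val inferred = spark.read.json(jsonStrings)
inferred.printSchema()

// Select nested fields by dotted path; each result column takes the last path segment as its name
val flat = inferred.select("Id", "Type", "client", "eventTime", "eventType",
  "payload.sourceApp", "payload.questionnaire.version",
  "payload.questionnaire.question", "payload.questionnaire.fb")
flat.show(truncate = false)
```

Schema inference costs an extra pass over the data; for large inputs, supplying an explicit StructType (for example via spark.read.schema(...).json(...)) avoids that pass.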