Spark Scala: accessing data inside a struct inside an array

Question

The schema looks like this:

root
 |-- orderitemlist: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- internal-material-code: string (nullable = true)
 |    |    |-- lot-number: string (nullable = true)
 |    |    |-- packaging-item-code: string (nullable = true)
 |    |    |-- packaging-item-code-type: string (nullable = true)

How can I access the values of internal-material-code and lot-number?

When building the DataFrame, I tried this:

df.withColumn("internalmaterialcode", col("orderitemlist")(0).getItem("internal-material-code"))

and this:

df.withColumn("internalmaterialcode", col("orderitemlist")(0)("internal-material-code"))

and also the following:

df.withColumn("orderitemlistarray", explode(col("orderitemlist"))) 
.withColumn("internalmaterialcode", col("orderitemlistarray").getItem("internal-material-code"))

and the following:

df.withColumn("orderitemlistarray", explode(col("orderitemlist"))) 
.withColumn("internalmaterialcode", col("orderitemlistarray.internal-material-code"))

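But each of these returns null.

I have seen Stack Overflow questions with similar schemas, but none of the answers worked for me. Can someone answer this or point me in the right direction?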

Answer 1

Try this syntax: explode the array, then select all fields of the resulting struct with "arr.*".

Example:

val va="""{
    "orderitemlist": [{
        "internal-material-code": "123",
        "lot-number": "vv",
        "packaging-item-code": "pp",
        "packaging-item-code-type": "ll"
    },{
        "internal-material-code": "234",
        "lot-number": "vv",
        "packaging-item-code": "pp",
        "packaging-item-code-type": "ll"
    }]
}"""

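// Outside spark-shell these imports are required; `spark` is the active SparkSession.
import spark.implicits._
import org.apache.spark.sql.functions.{col, explode}

val df=spark.read.json(Seq(va).toDS).toDF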

df.withColumn("arr",explode(col("orderitemlist"))).select("arr.*").show()

Result:

+----------------------+----------+-------------------+------------------------+
|internal-material-code|lot-number|packaging-item-code|packaging-item-code-type|
+----------------------+----------+-------------------+------------------------+
|                   123|        vv|                 pp|                      ll|
|                   234|        vv|                 pp|                      ll|
+----------------------+----------+-------------------+------------------------+

This gives you every column of the struct inside the array, one row per array element.
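If only internal-material-code and lot-number are needed, the same explode step can be followed by picking individual struct fields. The sketch below is a minimal, self-contained illustration and not the asker's actual job: the object name OrderItemFields, the helpers itemFields and codeArrays, the output column names, and the local[*] session are all assumptions made for the example. It also shows that extracting a field directly from an array of structs, without explode, should yield one array of values per row.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical helper object, introduced only for this illustration.
object OrderItemFields {

  // Trimmed-down sample record with the same nested shape as the question's schema.
  val sampleJson: String =
    """{"orderitemlist":[{"internal-material-code":"123","lot-number":"vv"},{"internal-material-code":"234","lot-number":"vv"}]}"""

  // One output row per array element, keeping only the two requested fields.
  def itemFields(df: DataFrame): DataFrame =
    df.withColumn("arr", explode(col("orderitemlist")))
      .select(
        col("arr").getField("internal-material-code").as("internalmaterialcode"),
        col("arr").getField("lot-number").as("lotnumber")
      )

  // No explode: pulling a field out of array<struct> gives an array of that field per row.
  def codeArrays(df: DataFrame): DataFrame =
    df.select(col("orderitemlist").getField("internal-material-code").as("codes"))

  def main(args: Array[String]): Unit = {
    // Local session for illustration; in spark-shell `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("order-item-fields")
      .getOrCreate()
    import spark.implicits._

    val df = spark.read.json(Seq(sampleJson).toDS)
    itemFields(df).show()
    codeArrays(df).show()

    spark.stop()
  }
}
```

Note that if these fields are referenced inside a SQL expression string (for example with selectExpr), the hyphenated names most likely need backtick quoting, such as arr.`internal-material-code`, because an unquoted hyphen is parsed as a minus sign.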
