Spark SQL Java - Unable to create nested Row objects

Question (votes: 1, answers: 1)

This is the final schema I am trying to produce with Spark SQL:

 |-- references: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- name: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- url: string (nullable = true)

I am trying to insert data into Parquet, but I cannot create a nested JSON Row object that matches the schema above.

Here is what I have tried so far:

Inserting the data as Object[] references = new Object[]{"1", "2", "3"}

Using Object[] references = new Object[0] (only this one works)

Using Object[] references = new Object[]{new Object[]{"1", "2", "3"}}

I then pass it to

RowFactory.create(references)

at the point where I am trying to return the Row object.

I need help creating this schema with Spark SQL in Java. I could not find any solution online.
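One thing that may explain the attempts listed above is Java's varargs handling: RowFactory.create is declared as create(Object... values), so an Object[] passed directly is treated as the list of fields rather than as a single array-valued field. The sketch below uses a hypothetical stand-in method, fieldCount, with the same parameter shape (it is not part of Spark) to show how many fields each attempt would yield, assuming the arrays were handed straight to the factory.

```java
class VarargsSketch {

    // Hypothetical stand-in with the same parameter shape as
    // RowFactory.create(Object... values): it only reports how many
    // fields the resulting row would contain.
    static int fieldCount(Object... values) {
        return values.length;
    }

    public static void main(String[] args) {
        Object[] flat = new Object[]{"1", "2", "3"};
        Object[] wrapped = new Object[]{new Object[]{"1", "2", "3"}};

        // The array itself becomes the varargs array: three separate fields.
        System.out.println(fieldCount(flat));           // 3
        // An empty array yields a row with no fields at all.
        System.out.println(fieldCount(new Object[0]));  // 0
        // One field, but its elements are plain strings, not struct-like rows.
        System.out.println(fieldCount(wrapped));        // 1
        // Casting to Object passes the whole array as a single field.
        System.out.println(fieldCount((Object) flat));  // 1
    }
}
```

Under that reading, none of the three attempts produces a single field whose elements are themselves rows, which is what an array of structs calls for.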

java apache-spark apache-spark-sql
1 Answer

0 votes

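It looks like a list of arrays can be used as the input, and the "array" and "struct" functions can then be used to build the required schema: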

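    import static org.apache.spark.sql.functions.array;
    import static org.apache.spark.sql.functions.col;
    import static org.apache.spark.sql.functions.struct;

    import java.util.Arrays;
    import java.util.List;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructField;
    import org.apache.spark.sql.types.StructType;

    // Flat rows: one value per struct field.
    List<Row> data = Arrays.asList(
            RowFactory.create("1", "2", "3"),
            RowFactory.create("4", "5", "6")
    );

    StructType schema = DataTypes.createStructType(
            new StructField[]{
                    DataTypes.createStructField("name", DataTypes.StringType, true),
                    DataTypes.createStructField("type", DataTypes.StringType, true),
                    DataTypes.createStructField("url", DataTypes.StringType, true),
            });
    // spark() is assumed to return the active SparkSession.
    Dataset<Row> plain = spark().createDataFrame(data, schema);

    // Pack the three columns into a struct, then wrap it in a one-element array.
    Dataset<Row> result = plain
            .withColumn("references", array(struct(col("name"), col("type"), col("url"))))
            .select("references");
    result.show(false);
    result.printSchema();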

The output is:

+----------+
|references|
+----------+
|[[1,2,3]] |
|[[4,5,6]] |
+----------+

root
 |-- references: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- name: string (nullable = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- url: string (nullable = true)
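If the nested rows need to be built directly rather than derived from flat columns, a minimal sketch along the following lines may work. It assumes Spark SQL is on the classpath and a local session is acceptable. The class name NestedRowSketch, the helper method names, and the commented-out output path are all hypothetical. Each struct element is itself a Row, and the array of rows is cast to Object so that varargs treats it as one field.

```java
import java.util.Arrays;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

class NestedRowSketch {

    // Struct type of a single element of the "references" array.
    static StructType referenceType() {
        return DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("type", DataTypes.StringType, true),
                DataTypes.createStructField("url", DataTypes.StringType, true)
        });
    }

    // Top-level schema: one nullable column of type array<struct<name,type,url>>.
    static StructType schema() {
        return DataTypes.createStructType(new StructField[]{
                DataTypes.createStructField("references",
                        DataTypes.createArrayType(referenceType(), true), true)
        });
    }

    // One outer Row whose only field is an array of inner Rows (one per struct).
    static Row buildRow() {
        Row[] references = new Row[]{
                RowFactory.create("1", "2", "3"),
                RowFactory.create("4", "5", "6")
        };
        // Without the cast, varargs would spread the array into two top-level fields.
        return RowFactory.create((Object) references);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("nested-row-sketch")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> df = spark.createDataFrame(Arrays.asList(buildRow()), schema());
        df.printSchema();
        df.show(false);
        // df.write().parquet("/tmp/references.parquet"); // hypothetical output path

        spark.stop();
    }
}
```

Unlike the array/struct approach, this keeps the nullable flags exactly as declared in the schema, since the schema is supplied explicitly instead of being inferred from the column functions.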