java.lang.ClassCastException:类 java.lang.String 无法转换为类 org.apache.spark.unsafe.types.VariantVal

问题描述 投票:0回答:1

我评估 Spark 4

try_variant_get
处理变体类型数据的方法。首先我做sql语句示例。

CREATE TABLE family (
  id INT,
  data VARIANT
);


INSERT INTO family (id, data)
VALUES
(1, PARSE_JSON('{"name":"Alice","age":30}')),
(2, PARSE_JSON('[1,2,3,4,5]')),
(3, PARSE_JSON('42'));

执行SQL时,不会带来任何错误。那么下面的代码是使用

try_variant_get
方法

的选择命令
SELECT
  id,
  try_variant_get(data, '$.name', 'STRING') AS name,
  try_variant_get(data, '$.age', 'INT') AS age
FROM
  family
WHERE 
  try_variant_get(data, '$.name', 'STRING') IS NOT NULL;

SQL输出成功返回。然后我将这些 SQL 语句转换为 java api 代码。

SparkSession spark = SparkSession.builder().master("local[*]").appName("VariantExample").getOrCreate();

StructType schema = new StructType()
       .add("id", DataTypes.IntegerType)
       .add("data", DataTypes.VariantType);

Dataset<Row> df = spark.createDataFrame(
       Arrays.asList(
            RowFactory.create(1, "{\"name\":\"Alice\",\"age\":30}"),
            RowFactory.create(2, "[1,2,3,4,5]"),
            RowFactory.create(3, "42")
       ),
       schema
);

 Dataset<Row> df_sel = df.select(
            col("id"),
            try_variant_get(col("data"), "$.name", "String").alias("name"),
            try_variant_get(col("data"), "$.age", "Integer").alias("age")
        ).where("name IS NOT NULL");

df_sel.printSchema();
df_sel.show();

但是这些java代码抛出以下异常。

root
 |-- id: integer (nullable = true)
 |-- name: string (nullable = true)
 |-- age: integer (nullable = true)

Exception in thread "main" java.lang.ClassCastException: class java.lang.String cannot be cast to class org.apache.spark.unsafe.types.VariantVal (java.lang.String is in module java.base of loader 'bootstrap'; org.apache.spark.unsafe.types.VariantVal is in unnamed module of loader 'app')
        at org.apache.spark.sql.catalyst.expressions.variant.VariantGet.nullSafeEval(variantExpressions.scala:282)
        at org.apache.spark.sql.catalyst.expressions.BinaryExpression.eval(Expression.scala:692)
        at org.apache.spark.sql.catalyst.expressions.Alias.eval(namedExpressions.scala:159)
        at org.apache.spark.sql.catalyst.expressions.InterpretedMutableProjection.apply(InterpretedMutableProjection.scala:89)
        at org.apache.spark.sql.catalyst.optimizer.ConvertToLocalRelation$$anonfun$apply$48.$anonfun$applyOrElse$83(Optimizer.scala:2231)
        at scala.collection.immutable.List.map(List.scala:247)
        at scala.collection.immutable.List.map(List.scala:79).....

try_variant_get
方法的“String”参数存在一些问题。但我不知道这些java代码有什么问题。 请告诉我如何修复这些错误。

java apache-spark spark-java
1个回答
0
投票

在 Java 代码中,您正在构造一个 DataFrame,其中数据列为 String 而不是预期的 Variant 类型,从而导致 ClassCastException。

Spark 中的 try_variant_get 函数设计用于处理 Variant 数据类型,该类型特定于 Spark SQL 复杂数据(如 JSON)的内部表示,而不是纯字符串。

解决方案:

import org.apache.spark.sql.*;
import org.apache.spark.sql.types.*;
import static org.apache.spark.sql.functions.*;

public class VariantExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .master("local[*]")
            .appName("VariantExample")
            .getOrCreate();

        // Define schema with String type for input data
        StructType schema = new StructType()
            .add("id", DataTypes.IntegerType)
            .add("data", DataTypes.StringType);

        // Create DataFrame with raw string data
        Dataset<Row> df = spark.createDataFrame(
            Arrays.asList(
                RowFactory.create(1, "{\"name\":\"Alice\",\"age\":30}"),
                RowFactory.create(2, "[1,2,3,4,5]"),
                RowFactory.create(3, "42")
            ),
            schema
        );

        // Convert 'data' column to Variant type
        Dataset<Row> dfWithVariant = df.withColumn("data", expr("to_variant(data)"));

        // Use try_variant_get to extract fields
        Dataset<Row> df_sel = dfWithVariant.select(
                col("id"),
                expr("try_variant_get(data, '$.name', 'STRING')").alias("name"),
                expr("try_variant_get(data, '$.age', 'INT')").alias("age")
            )
            .where("name IS NOT NULL");

        // Show results
        df_sel.printSchema();
        df_sel.show();
    }
}

祝你好运!

© www.soinside.com 2019 - 2024. All rights reserved.