When I try to filter a column to check for empty arrays, I get the following type-mismatch error:
cannot resolve '(`sellers` = '[]')' due to data type mismatch: differing types in '(`sellers` = '[]')' (array<string> and string).;;
I tried the code below, but it does not work and throws the error above:
var sellersDFSelectSellers = sellersDF.select("sellers")
sellersDFSelectSellers = sellersDFSelectSellers.filter(col("sellers") === "[]")
Try

sellersDFSelectSellers.filter(col("sellers") === typedLit(Seq.empty[String]))

(Use Seq.empty[String] rather than Seq() so the literal's element type can be resolved.)
I managed to reproduce it:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, typedLit}
val spark = SparkSession.builder
.master("local")
.appName("Spark app")
.getOrCreate()
import spark.implicits._
case class MyClass(sellers: Seq[String])
val sellersDF = Seq(MyClass(Seq("a", "b")), MyClass(Seq("c", "d", "e")), MyClass(Seq())).toDS()
sellersDF.show()
//+---------+
//| sellers|
//+---------+
//| [a, b]|
//|[c, d, e]|
//| []|
//+---------+
var sellersDFSelectSellers = sellersDF.select("sellers")
//org.apache.spark.sql.AnalysisException: cannot resolve '(sellers = '[]')' due to data type mismatch: differing types in '(sellers = '[]')' (array<string> and string)
//sellersDFSelectSellers = sellersDFSelectSellers.filter(col("sellers") === "[]")
sellersDFSelectSellers = sellersDFSelectSellers.filter(col("sellers") === typedLit(Seq.empty[String]))
sellersDFSelectSellers.show()
//+-------+
//|sellers|
//+-------+
//| []|
//+-------+
This code solved the problem:
import org.apache.spark.sql.functions.{col, size}
val sellersDFSelectSellers = sellersDF.select("sellers")
val emptySellers = sellersDFSelectSellers.filter(size(col("sellers")) === 0)
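The root cause is that show() merely renders an empty array<string> as [], while the column's actual values are sequences, so comparing against the string "[]" can never type-check. A minimal plain-Scala sketch (no Spark involved, purely illustrative) of the same type distinction behind both working filters:

```scala
object EmptySellersSketch extends App {
  // show() prints an empty array as "[]", but that is only a rendering:
  // the value is a Seq[String], never the two-character string "[]".
  val sellers: Seq[String] = Seq.empty[String]

  println(sellers == "[]")              // false: a Seq is never equal to a String
  println(sellers.isEmpty)              // true: analogue of size(col("sellers")) === 0
  println(sellers == Seq.empty[String]) // true: analogue of === typedLit(Seq.empty[String])
}
```

Both Spark filters above are the column-level analogues of these two true comparisons: size(...) === 0 mirrors isEmpty, and === typedLit(Seq.empty[String]) mirrors equality with an empty, correctly typed sequence.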