I'm having trouble converting the `ABC` class to `CustomClass`.
The logic is: `count` in `CustomClass` should be the total number of rows per group of a `groupBy` on `a` and `b`, where the date matches a 1-year filter, while `t30Ycount` and `t30Ncount` should be the counts within that same group but with a 30-day filter and the flag applied.
But with the logic below I get:

```
[scalatest] org.apache.spark.sql.AnalysisException: expression 'flag' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
```
```scala
case class ABC(a: String, b: Long, flag: String, date: Timestamp)

info
  .filter(col(s"${abc.date}") > oneYear)
  .groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
  .agg(
    // Should be the total number of rows in each group of the groupBy above
    count("a").as(s"${customClass.count}"),
    // Should be the number of rows in the group where flag is Y and the date matches the 30-day filter
    when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays,
      count("a")).otherwise(lit(0)).as(s"${customClass.t30Ycount}"),
    // Should be the number of rows in the group where flag is N and the date matches the 30-day filter
    when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays,
      count("a")).otherwise(lit(0)).as(s"${customClass.t30Ncount}")
  ).as[CustomClass]

case class CustomClass(a: String,
                       b: Long,
                       count: Long,
                       t30Ycount: Long,
                       t30Ncount: Long)
```
The following seems to work, unless someone can find a better solution. The fix is to put the `when` *inside* an aggregate function (`sum`), so that `flag` and `date` are only referenced per-row within the aggregation rather than as bare grouping expressions:
```scala
info
  .filter(col(s"${abc.date}") > oneYear)
  .groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
  .agg(
    // Total number of rows in each group of the groupBy above
    count("*").as(s"${customClass.count}"),
    // Number of rows in the group where flag is Y and the date matches the 30-day filter
    sum(when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays, 1)
      .otherwise(lit(0))).as(s"${customClass.t30Ycount}"),
    // Number of rows in the group where flag is N and the date matches the 30-day filter
    sum(when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays, 1)
      .otherwise(lit(0))).as(s"${customClass.t30Ncount}")
  ).as[CustomClass]
```
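An equivalent alternative is `count(when(...))`: `count` over a column ignores nulls, and a `when` with no `otherwise` produces null for non-matching rows, so those rows are simply skipped. A minimal sketch, assuming a live `SparkSession`, a `Dataset` named `info`, and literal column names in place of the `abc`/`customClass` constants above (all of which are placeholders here):

```scala
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes `spark` is your SparkSession

val result = info
  .filter(col("date") > oneYear)
  .groupBy(col("a"), col("b"))
  .agg(
    // Total rows per (a, b) group
    count("*").as("count"),
    // when(...) without otherwise(...) yields null for non-matching rows,
    // and count(...) skips nulls, so this counts only the matching rows
    count(when(lower(col("flag")) === "y" && col("date") > thirtyDays, true)).as("t30Ycount"),
    count(when(lower(col("flag")) === "n" && col("date") > thirtyDays, true)).as("t30Ncount")
  ).as[CustomClass]
```

This avoids the `sum` of 0/1 literals and reads closer to the intent ("count the rows where ..."), but both forms produce the same result.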