如何使用groupby中的聚合函数

问题描述 投票:0回答:1

我很难将 ABC 类转换为 CustomClass。

逻辑是,我希望 CustomClass 中的计数是 a 和 b 的 groupyBy 的总数,其中日期与 1 年过滤器匹配,而 t30Ycount 和 t30Ncount 是 groupBy 的计数,但应用了 30 天和标志的过滤器。

但是通过以下逻辑我得到了

`[scalatest] org.apache.spark.sql.AnalysisException:表达式“flag”既不存在于分组依据中,也不是聚合函数。如果您不关心获得哪个值,请添加到 group by 或包含在first()(或first_value)中。;


case class ABC(a: String , b: Long, flag: String, date: Timestamp) extends Product {}


 info
      .filter(col(s"${abc.date}") > oneYear)
      .groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
      .agg(
// Should be the count of total no of rows of groupBy above
        count("a").as(s"${customClass.count}"),
// Should be the count of no of rows of group by above where flag is Y and date matches filter
        when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays,
              count("a")).otherwise(lit(0)).as(s"${customClass.t30Ycount}"),
// Should be the count of no of rows of group by above where flag is N and date matches filter
        when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays,
              count("a")).otherwise(lit(0)).as(s"${customClass.t30Ncount}")
      ).as[CustomClass]


  case class CustomClass(a: String
                       , b: Long
                       , count: Long
                       , t30Ycount: Long
                       , t30NCount: Long
                      )
scala apache-spark apache-spark-sql
1个回答
0
投票

这似乎有效,除非有人能找到更好的解决方案

 info
      .filter(col(s"${abc.date}") > oneYear)
      .groupBy(col(s"${abc.a}"), col(s"${abc.b}"))
      .agg(
// Should be the count of total no of rows of groupBy above
        count("*").as(s"${customClass.count}"),
// Should be the count of no of rows of group by above where flag is Y and date matches filter
        sum(when(lower(col(s"${abc.flag}")) === "Y".toLowerCase && col(s"${abc.date}") > thirtyDays,
              , 1).otherwise(lit(0)).as(s"${customClass.t30Ycount}"),
// Should be the count of no of rows of group by above where flag is N and date matches filter
        sum(when(lower(col(s"${abc.flag}")) === "N".toLowerCase && col(s"${abc.date}") > thirtyDays,
              , 1).otherwise(lit(0)).as(s"${customClass.t30Ncount}")
      ).as[CustomClass]
© www.soinside.com 2019 - 2024. All rights reserved.