How to append column values in Spark SQL?


I have the following table:

+-------+---------+---------+
|movieId|movieName|    genre|
+-------+---------+---------+
|      1| example1|   action|
|      1| example1| thriller|
|      1| example1|  romance|
|      2| example2|fantastic|
|      2| example2|   action|
+-------+---------+---------+

What I am trying to achieve is to append the genre values together for rows where the ID and name are the same, like this:

+-------+---------+---------------------------+
|movieId|movieName|    genre                  |
+-------+---------+---------------------------+
|      1| example1|   action|thriller|romance |
|      2| example2|   action|fantastic        |
+-------+---------+---------------------------+
scala apache-spark dataframe apache-spark-sql append
1 Answer

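Use groupBy and collect_list to get the list of genres for each movieId/movieName pair. Then use concat_ws to combine them into a single string. collect_list does not guarantee any particular element order, so if the order matters, apply sort_array first. A small example with the given sample dataframe: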

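import org.apache.spark.sql.functions.{collect_list, concat_ws, sort_array}
import spark.implicits._ // enables $"col"; assumes a SparkSession named spark, as in spark-shell

val df2 = df.groupBy("movieId", "movieName")
  .agg(collect_list($"genre").as("genre"))                      // one array of genres per movie
  .withColumn("genre", concat_ws("|", sort_array($"genre")))    // sort, then join with "|"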
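Since the title asks about Spark SQL, the same aggregation can also be written as a SQL query against a temporary view, using the SQL versions of collect_list, sort_array and concat_ws. This is a minimal sketch, assuming a local SparkSession; the view name movies and the names AppendGenresSql and buildResult are hypothetical, chosen only for illustration:

```scala
import org.apache.spark.sql.SparkSession

object AppendGenresSql {
  // Runs the aggregation on a local SparkSession and returns the rows ordered by movieId.
  def buildResult(): Seq[(Int, String, String)] = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local mode, for illustration only
      .appName("append-genres-sql")
      .getOrCreate()
    import spark.implicits._ // needed for Seq(...).toDF

    try {
      // Sample data from the question.
      val df = Seq(
        (1, "example1", "action"),
        (1, "example1", "thriller"),
        (1, "example1", "romance"),
        (2, "example2", "fantastic"),
        (2, "example2", "action")
      ).toDF("movieId", "movieName", "genre")

      df.createOrReplaceTempView("movies") // hypothetical view name

      // Group per movie, collect the genres, sort them, and join them with "|".
      val result = spark.sql(
        """SELECT movieId, movieName,
          |       concat_ws('|', sort_array(collect_list(genre))) AS genre
          |FROM movies
          |GROUP BY movieId, movieName
          |ORDER BY movieId""".stripMargin)

      result.show(false)
      result.collect().map(r => (r.getInt(0), r.getString(1), r.getString(2))).toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    buildResult()
  }
}
```

The ORDER BY clause is only there so the collected rows come back in a predictable order; the grouping itself does not need it.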

This gives the result:

+-------+---------+-----------------------+
|movieId|movieName|genre                  |
+-------+---------+-----------------------+
|1      |example1 |action|romance|thriller|
|2      |example2 |action|fantastic       |
+-------+---------+-----------------------+
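To see what the group, collect, sort and join steps do without starting Spark, here is a rough analogue on ordinary Scala collections. It is not Spark code; Movie, appendGenres and AppendGenresLocal are hypothetical names. Sorting the collected genres orders them alphabetically, which is what sort_array does by default:

```scala
object AppendGenresLocal {
  // Hypothetical row type mirroring the question's columns.
  final case class Movie(movieId: Int, movieName: String, genre: String)

  // Plain-collections analogue of groupBy + collect_list + sort_array + concat_ws("|", ...).
  def appendGenres(rows: Seq[Movie]): Seq[(Int, String, String)] =
    rows
      .groupBy(m => (m.movieId, m.movieName)) // like groupBy("movieId", "movieName")
      .map { case ((id, name), ms) =>
        (id, name, ms.map(_.genre).sorted.mkString("|")) // collect, sort, join with "|"
      }
      .toSeq
      .sortBy(_._1) // Map iteration order is unspecified, so sort by movieId

  def main(args: Array[String]): Unit = {
    val rows = Seq(
      Movie(1, "example1", "action"),
      Movie(1, "example1", "thriller"),
      Movie(1, "example1", "romance"),
      Movie(2, "example2", "fantastic"),
      Movie(2, "example2", "action")
    )
    appendGenres(rows).foreach(println)
  }
}
```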