Converting rows to columns using Spark Scala

Question — votes: 0, answers: 2

I want to convert rows to columns using a Spark DataFrame.

My table looks like this:

Eno,Name
1,A
1,B
1,C
2,D
2,E

I want to convert it to:

Eno,n1,n2,n3
1,A,B,C
2,D,E,Null

I used the following code:

val r = spark.sqlContext.read.format("csv").option("header","true").option("inferschema","true").load("C:\\Users\\axy\\Desktop\\abc2.csv")

val n = Seq("n1","n2","n3")

 r
    .groupBy("Eno")
    .pivot("Name",n).agg(expr("coalesce(first(Name),3)").cast("double")).show() 

But the result I get is:

+---+----+----+----+
|Eno|  n1|  n2|  n3|
+---+----+----+----+
|  1|null|null|null|
|  2|null|null|null|
+---+----+----+----+

Can anyone help me get the desired result?
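For reference, the values passed to pivot must be actual values of the pivoted column (here A..E), not the output column labels n1..n3 — since Name never equals "n1", every pivoted cell is null. The intended transformation itself can be sketched in plain Scala, without Spark (a minimal illustration, not the Spark solution):

```scala
// Group rows by Eno, then pad each group's name list to 3 slots with null —
// the same shape as the desired Eno,n1,n2,n3 output.
val rows = Seq((1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"))

val wide: Map[Int, Seq[String]] =
  rows.groupBy(_._1).map { case (eno, rs) =>
    eno -> rs.map(_._2).padTo(3, null)
  }

println(wide(1)) // List(A, B, C)
println(wide(2)) // List(D, E, null)
```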

scala apache-spark hadoop hive bigdata
2 Answers
0 votes
import org.apache.spark.sql.functions._
import spark.implicits._

val df = Seq((1,"A"),(1,"B"),(1,"C"),(2,"D"),(2,"E")).toDF("Eno","Name")

// Return the i-th name of a group, or null when the group is shorter
val getName = udf { (names: Seq[String], i: Int) => if (names.size > i) names(i) else null }

// Collect all names per Eno, then spread them into columns n1..n3
val tdf = df.groupBy($"Eno").agg(collect_list($"Name").as("names"))
val ndf = (0 to 2).foldLeft(tdf) { (ndf, i) =>
  ndf.withColumn(s"n${i + 1}", getName($"names", lit(i)))
}.drop("names")
ndf.show()
+---+---+---+----+
|Eno| n1| n2|  n3|
+---+---+---+----+
|  1|  A|  B|   C|
|  2|  D|  E|null|
+---+---+---+----+
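The padding logic inside the UDF can be checked in plain Scala without a Spark session. One caveat worth noting: collect_list does not guarantee element order unless the input is explicitly ordered, so the slot a name ends up in is only deterministic for ordered input.

```scala
// Same logic as the UDF body: i-th element of the collected names,
// or null when the group has fewer than i+1 names.
val getName = (names: Seq[String], i: Int) =>
  if (names.size > i) names(i) else null

println(getName(Seq("A", "B", "C"), 1)) // B
println(getName(Seq("D", "E"), 2))      // null
```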

2 votes
import org.apache.spark.sql.functions._
import spark.implicits._

// Map each Name value to the output column it should land in
val m = map(lit("A"), lit("n1"), lit("B"), lit("n2"), lit("C"), lit("n3"),
            lit("D"), lit("n1"), lit("E"), lit("n2"))
val df = Seq((1,"A"),(1,"B"),(1,"C"),(2,"D"),(2,"E")).toDF("Eno","Name")
df.withColumn("new", m($"Name")).groupBy("Eno").pivot("new").agg(first("Name")).show()


+---+---+---+----+
|Eno| n1| n2|  n3|
+---+---+---+----+
|  1|  A|  B|   C|
|  2|  D|  E|null|
+---+---+---+----+
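The hand-written map above only covers these five names. The label for each name is really its rank within its Eno group, so the mapping could be generated instead of hard-coded — a plain-Scala sketch of that ranking (it assumes rows arrive in the desired order and that names are unique across groups, as in this sample):

```scala
// Derive "name -> slot label" from each name's position within its group.
val rows = Seq((1, "A"), (1, "B"), (1, "C"), (2, "D"), (2, "E"))

val slotOf: Map[String, String] =
  rows.groupBy(_._1).values.flatMap { rs =>
    rs.map(_._2).zipWithIndex.map { case (name, i) => name -> s"n${i + 1}" }
  }.toMap

println(slotOf("C")) // n3
println(slotOf("E")) // n2
```

In Spark, the same ranking could be computed with a row_number window per Eno and then pivoted, avoiding any hard-coded value list.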
© www.soinside.com 2019 - 2024. All rights reserved.