PySpark - How to restructure the output in the following format?


I have two tables, as shown below:

Table 1 - (screenshot not included)

Table 2 - (screenshot not included)

I want my output to look like the table below, looping over the following calculation until every week column from Table 2 has been multiplied by every day's ratio from Table 1:

table3(week1_1) = table2(week1) * table1(day1_ratio)

table3(week1_2) = table2(week1) * table1(day2_ratio)

(screenshot of the desired output table not included)
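For example, if week1 = 4, day1_ratio = 0.6 and day2_ratio = 0.4 (the sample values used in the answer below), then week1_1 = 4 * 0.6 = 2.4 and week1_2 = 4 * 0.4 = 1.6.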

How can this be done?

Any help is appreciated!

Thanks

for-loop pyspark output
1 Answer

Try this -

It is written in Scala, but it can be ported to PySpark with only minor changes.

Load the inputs

   val table1 = Seq(
      ("o1", "i1", 1, 0.6),
      ("o1", "i1", 2, 0.4)
    ).toDF("outlet", "item", "day", "ratio")
    table1.show(false)
    /**
      * +------+----+---+-----+
      * |outlet|item|day|ratio|
      * +------+----+---+-----+
      * |o1    |i1  |1  |0.6  |
      * |o1    |i1  |2  |0.4  |
      * +------+----+---+-----+
      */

    val table2 = Seq(
      ("o1", "i1", 4, 5, 6, 8)
    ).toDF("outlet", "item", "week1", "week2", "week3", "week4")
    table2.show(false)
    /**
      * +------+----+-----+-----+-----+-----+
      * |outlet|item|week1|week2|week3|week4|
      * +------+----+-----+-----+-----+-----+
      * |o1    |i1  |4    |5    |6    |8    |
      * +------+----+-----+-----+-----+-----+
      */

Use Spark functions

    table1.join(table2, Seq("outlet", "item"))
      .groupBy("outlet", "item")
      .pivot("day")
      .agg(
        first($"week1" * $"ratio").as("week1"),
        first($"week2" * $"ratio").as("week2"),
        first($"week3" * $"ratio").as("week3"),
        first($"week4" * $"ratio").as("week4")
      ).show(false)

    /**
      * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
      * |outlet|item|1_week1|1_week2|1_week3           |1_week4|2_week1|2_week2|2_week3           |2_week4|
      * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
      * |o1    |i1  |2.4    |3.0    |3.5999999999999996|4.8    |1.6    |2.0    |2.4000000000000004|3.2    |
      * +------+----+-------+-------+------------------+-------+-------+-------+------------------+-------+
      */

In Python
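Before the transformation, here is a minimal sketch of building the same sample inputs in PySpark (the SparkSession setup is an assumption; reuse your existing session if you already have one):

    from pyspark.sql import SparkSession

    # Assumption: no active session yet; otherwise reuse your existing `spark`.
    spark = SparkSession.builder.appName("week-ratio-example").getOrCreate()

    table1 = spark.createDataFrame(
        [("o1", "i1", 1, 0.6),
         ("o1", "i1", 2, 0.4)],
        ["outlet", "item", "day", "ratio"],
    )

    table2 = spark.createDataFrame(
        [("o1", "i1", 4, 5, 6, 8)],
        ["outlet", "item", "week1", "week2", "week3", "week4"],
    )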

    from pyspark.sql import functions as F

    result = (
        table1.join(table2, ["outlet", "item"])
        .groupBy("outlet", "item")
        .pivot("day")
        .agg(
            F.first(F.col("week1") * F.col("ratio")).alias("week1"),
            F.first(F.col("week2") * F.col("ratio")).alias("week2"),
            F.first(F.col("week3") * F.col("ratio")).alias("week3"),
            F.first(F.col("week4") * F.col("ratio")).alias("week4"),
        )
    )
    result.show(truncate=False)
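Note that pivot prefixes each aggregated column with the pivot value, so the result columns come out as 1_week1, 2_week1, and so on rather than week1_1, week1_2 as in the desired output. A minimal sketch to rename them into the day-suffixed form, assuming the pivoted DataFrame above is named result:

    # Rename pivoted columns such as "1_week1" to "week1_1";
    # key columns (outlet, item) contain no digit prefix and are left unchanged.
    renamed = result
    for c in result.columns:
        if "_" in c and c.split("_", 1)[0].isdigit():
            day, week = c.split("_", 1)
            renamed = renamed.withColumnRenamed(c, f"{week}_{day}")
    renamed.show(truncate=False)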