使用窗口函数按组创建下一个和上一个列

Question

我有一个 pyspark 数据框，其中包含 id、日期、组、类

id, date, group, class
A, 2023-10-12, 1, p
A, 2023-10-13, 1, p
A, 2023-10-14, 2, c
A, 2023-10-15, 3, s
A, 2023-10-16, 3, s

我想计算prev_group_class和next_group_class，上一组的类和下一组的类

我的输出应该是

A, 2023-10-12, 1, p, null, c
A, 2023-10-13, 1, p, null, c
A, 2023-10-14, 2, c, p, s,
A, 2023-10-15, 3, s, c, null
A, 2023-10-16, 3, s, c, null

Answer 1

在 PySpark 中，您可以使用窗口函数来计算上一组和下一组类。

定义窗口规范： 使用窗口函数按
```
id
```
进行分区并按
```
date
```
进行排序以查看上一行和下一行。
滞后和超前函数：使用这些函数获取上一组和下一组的类值。

假设 Spark 会话已存在（将其命名为

spark

）。

from pyspark.sql.functions import col, lag, lead
from pyspark.sql.window import Window

# Sample data
data = [
    ('A', '2023-10-12', 1, 'p'),
    ('A', '2023-10-13', 1, 'p'),
    ('A', '2023-10-14', 2, 'c'),
    ('A', '2023-10-15', 3, 's'),
    ('A', '2023-10-16', 3, 's'),
]

# Create DataFrame
df = spark.createDataFrame(data, ["id", "date", "group", "class"])

# Define window specification
window_spec = Window.partitionBy("id").orderBy("date")

# Compute previous and next group class
df_with_prev_next = df.withColumn(
    "prev_group_class", 
    lag("class").over(window_spec)
).withColumn(
    "next_group_class", 
    lead("class").over(window_spec)
)

# Show the result
df_with_prev_next.show(truncate=False)

窗口规范：
```
Window.partitionBy("id").orderBy("date")
```
创建一个按 id 分区并按日期排序的窗口。这允许滞后和领先函数查看每个分区内的前一行和下一行。
滞后函数：
```
lag("class").over(window_spec)
```
从同一窗口分区内的前一行检索类值。
Lead Function：
```
lead("class").over(window_spec)
```
从同一窗口分区内的下一行检索类值。

这应该输出以下结果：

+---+----------+-----+-----+----------------+-----------------+
| id|      date|group|class|prev_group_class|next_group_class |
+---+----------+-----+-----+----------------+-----------------+
|  A|2023-10-12|    1|    p|            null|                c|
|  A|2023-10-13|    1|    p|            null|                c|
|  A|2023-10-14|    2|    c|               p|                s|
|  A|2023-10-15|    3|    s|               c|             null|
|  A|2023-10-16|    3|    s|               c|             null|
+---+----------+-----+----+----------------+------------------+

使用窗口函数按组创建下一个和上一个列

问题描述投票：0回答：1

1个回答

最新问题

使用窗口函数按组创建下一个和上一个列

问题描述 投票：0回答：1

1个回答

最新问题

问题描述投票：0回答：1