Iterate over an array in a PySpark DataFrame and create a new column from the columns whose names match the array's values


I have a table in this format:

name   fruits                          apple  banana  orange
Alice  ["apple", "banana", "orange"]   5      8       3
Bob    ["apple"]                       2      9       1

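I want to create a new column containing a JSON object in the following format, where the keys are the elements of the array and the values are taken from the columns with the same names:

name   fruits                          apple  banana  orange  new_col
Alice  ["apple", "banana", "orange"]   5      8       3       {"apple": 5, "banana": 8, "orange": 3}
Bob    ["apple"]                       2      9       1       {"apple": 2}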

Any ideas on how to go about this? I assume a UDF is needed, but I can't get the syntax right.

Here is the code I have so far:

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf, col
from pyspark.sql.types import MapType, StringType

# Create a Spark session
spark = SparkSession.builder.appName("example").getOrCreate()

# Sample data
data = [("Alice", ["apple", "banana", "orange"], 5, 8, 3),
        ("Bob", ["apple"], 2, 9, 1)]

# Define the schema
schema = ["name", "fruits", "apple", "banana", "orange"]

# Create a DataFrame
df = spark.createDataFrame(data, schema=schema)

# Show the initial DataFrame
print("Initial DataFrame:")
display(df)

# Define a UDF to create a dictionary
@udf(MapType(StringType(), StringType()))
def json_map(fruits):
    result = {}
    for i in fruits:
        result[i] = col(i)
    return result

# Apply the UDF to the 'fruits' column
new_df = df.withColumn('test', json_map(col('fruits')))

# Display the updated DataFrame
display(new_df)

Tags: apache-spark, pyspark, apache-spark-sql, user-defined-functions

1 Answer

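First, use arrays_zip to combine the array values into an array of structs, then filter out the entries whose key is null. The code is as follows: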

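from pyspark.sql.functions import array, arrays_zip, col, expr

data = [("Alice", ["apple", "banana", "orange"], 5, 8, 3),
        ("Bob", ["apple"], 2, 9, 1)]
schema = ["name", "fruits", "apple", "banana", "orange"]
df = spark.createDataFrame(data, schema=schema)

# arrays_zip pairs elements by position, so this relies on 'fruits' listing
# its items in the same order as the apple/banana/orange columns.
# Shorter 'fruits' arrays are padded with nulls, which the filter then drops.
df.withColumn("new_col", arrays_zip(col("fruits"), array(col("apple"), col("banana"), col("orange"))))\
    .withColumn("new_col", expr("filter(new_col, x -> x.fruits IS NOT NULL)")).show(truncate=False)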

Result:

+-----+-----------------------+-----+------+------+--------------------------------------+
|name |fruits                 |apple|banana|orange|new_col                               |
+-----+-----------------------+-----+------+------+--------------------------------------+
|Alice|[apple, banana, orange]|5    |8     |3     |[{apple, 5}, {banana, 8}, {orange, 3}]|
|Bob  |[apple]                |2    |9     |1     |[{apple, 2}]                          |
+-----+-----------------------+-----+------+------+--------------------------------------+
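
The result above is an array of structs matched by position rather than a JSON object keyed by name. As a complement, here is a minimal sketch of the UDF route the question asked about. In the question's UDF, col(i) cannot work because a UDF receives plain Python values for each row, not Column expressions, so one option is to pass the value columns in as a struct and look each fruit up by name. The names build_fruit_json, add_fruit_json and VALUE_COLUMNS are hypothetical, and the sketch assumes the value columns are known up front.

```python
import json

# Assumption: the columns that may be referenced by the 'fruits' array.
VALUE_COLUMNS = ["apple", "banana", "orange"]


def build_fruit_json(fruits, values):
    """Pure-Python core: look up each array element by name in a dict of
    {column name: value} and return the result as a JSON string."""
    return json.dumps({f: values[f] for f in (fruits or []) if f in values})


def add_fruit_json(df, array_col="fruits", value_cols=VALUE_COLUMNS, out_col="new_col"):
    """Add a JSON string column built from the columns named in `array_col`."""
    # Imported here so build_fruit_json can be used without Spark installed.
    from pyspark.sql.functions import col, struct, udf
    from pyspark.sql.types import StringType

    @udf(StringType())
    def to_json_udf(fruits, row):
        # A struct argument arrives in a Python UDF as a Row;
        # asDict() turns it into {column name: value}.
        return build_fruit_json(fruits, row.asDict())

    value_struct = struct(*[col(c) for c in value_cols])
    return df.withColumn(out_col, to_json_udf(col(array_col), value_struct))
```

With the df from the answer, add_fruit_json(df).show(truncate=False) would be the intended usage. Because the lookup is by name, the order of items in fruits does not matter, and names with no matching column are skipped. A Python UDF is usually slower than built-in functions, so this trades some performance for simplicity.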