Parsing a JSON string that contains a list in a PySpark DataFrame

Question (votes: 0, answers: 1)

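I need to parse the JSON string below, which contains a list and is stored in a column of a PySpark DataFrame.

[image]

After parsing the JSON string column in the PySpark DataFrame, I would like to get a result like this:

[image]

Thanks in advance for your help.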

json dataframe pyspark
1 answer
0
votes

Please post the data as text rather than as images, so that the JSON structure can be checked.
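Since the original data was only posted as images, the record shape has to be inferred from the field names used in the code. The following is a minimal Spark-free sketch of that assumed shape and of the flattening it implies: one output row per entry of `Students`, with the nested `Details` and `More Details` objects lifted to the top level. All values in `sample` are invented for illustration, and `flatten` is a hypothetical helper, not part of any library.

```python
import json

# Hypothetical sample record: the field names come from the select() call
# in the answer's code; the values are made up for illustration.
sample = json.loads("""
{
  "licence": "LIC-001",
  "date": "2024-01-15",
  "Students": [
    {
      "city": "Pune",
      "code": "P01",
      "Details": {"refnumber": "R-10", "refcolumn": "A"},
      "More Details": {"rolenum": "7", "name": "Asha", "joiningdate": "2023-06-01"}
    },
    {
      "city": "Delhi",
      "code": "D02",
      "Details": {"refnumber": "R-11", "refcolumn": "B"},
      "More Details": {"rolenum": "8", "name": "Ravi", "joiningdate": "2023-07-01"}
    }
  ]
}
""")


def flatten(record):
    """Return one flat dict per entry of record["Students"]."""
    rows = []
    for student in record["Students"]:
        # Top-level fields are repeated on every row.
        row = {
            "licence": record["licence"],
            "date": record["date"],
            "city": student["city"],
            "code": student["code"],
        }
        # Lift the two nested objects to the top level.
        row.update(student["Details"])
        row.update(student["More Details"])
        rows.append(row)
    return rows


rows = flatten(sample)
for row in rows:
    print(row)
```

If the real data deviates from this shape (for example, `Details` being a list rather than an object), the field paths in the Spark code need to be adjusted accordingly.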

Here is the code; it should give you the expected output:

from pyspark.sql import SparkSession
import pyspark.sql.functions as F

# Create a SparkSession
spark = SparkSession.builder.appName("JSON_to_DataFrame").getOrCreate()

# Read the JSON data into a DataFrame.
# spark.read.json expects one JSON object per line by default;
# pass multiLine=True for a pretty-printed file.
df = spark.read.json("your_json_file.json")

# Explode the Students array into one row per student, then pull the
# nested fields up to the top level. Field names containing a space
# are quoted with backticks.
df = df.withColumn("Students", F.explode("Students")) \
    .select(
        "licence",
        "date",
        "Students.city",
        "Students.code",
        "Students.Details.refnumber",
        "Students.Details.refcolumn",
        "Students.`More Details`.rolenum",
        "Students.`More Details`.name",
        "Students.`More Details`.joiningdate"
    )

# No renaming is needed: selecting a nested field such as "Students.city"
# already yields a column named after the leaf field ("city").

# Show the DataFrame
df.show()