I have two files: orders_renamed.csv and customers.csv. I join them with a full outer join and then drop the duplicate column (customer_id). I want to replace the null values in the "order_id" column with -1.
I tried this:
from pyspark.sql.functions import regexp_extract, monotonically_increasing_id, unix_timestamp, from_unixtime, coalesce
from pyspark.sql.types import IntegerType, StructField, StructType, StringType
ordersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/orders_renamed.csv").load()
customersDf = spark.read.format("csv").option("header", True).option("inferSchema", True).option("path", "C:/Users/Lenovo/Desktop/week12/week 12 dataset/customers.csv").load()
joinCondition1 = ordersDf.customer_id == customersDf.customer_id
joinType1 = "outer"
joinenullreplace = ordersDf.join(customersDf, joinCondition1, joinType1) \
    .drop(ordersDf.customer_id) \
    .select("order_id", "customer_id", "customer_fname") \
    .sort("order_id") \
    .withColumn("order_id", coalesce("order_id", -1))
joinenullreplace.show(50)
On the last line I used coalesce, but it gives me an error. I have tried several approaches, such as wrapping the coalesce in an expression and applying expr, but it does not work. I have also used lit, but that did not help either. Please reply with a solution.
from pyspark.sql.functions import lit
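For reference, here is a minimal sketch of the variant I would expect to work, assuming the error comes from passing the bare integer -1: coalesce only accepts column names or Column objects, so the literal has to be wrapped in lit().

from pyspark.sql.functions import coalesce, col, lit

# coalesce(*cols) takes column names or Column objects; a raw Python int
# is neither, so -1 must be turned into a literal Column with lit(-1)
joinenullreplace = ordersDf.join(customersDf, joinCondition1, joinType1) \
    .drop(ordersDf.customer_id) \
    .select("order_id", "customer_id", "customer_fname") \
    .sort("order_id") \
    .withColumn("order_id", coalesce(col("order_id"), lit(-1)))

An alternative, if this is indeed the problem, would be to skip coalesce entirely and use DataFrame.fillna, e.g. .fillna(-1, subset=["order_id"]), which replaces nulls in the named column with the given value.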