DataFrame withColumn prints the column name instead of the value

Question  Votes: 0  Answers: 1

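Hi, I have the following DataFrame with a derived column (added with withColumn) based on the day of the month: if the day of the month is 1-9, a 0 should be prepended to the value.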

from pyspark.sql.functions import concat, to_date, expr, when
from pyspark.sql.types import DoubleType

df = (
    spark.read.option("header", "True")
    .option("delimiter", ",")
    .option("inferSchema", "True")
    .csv("dbfs:/databricks-datasets/airlines/part-00000")
)

# dff = df.withColumn("DayofMonthFormatted",
#     when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), "Yes").otherwise("No"))

dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), "0" + str(df.DayofMonth))
    .otherwise(df.DayofMonth),
)

display(dff['DayofMonth', 'DayofMonthFormatted'])

# dff = df.withColumn('DayDate',
#     to_date(concat('Year','Month','DayofMonth'),'yyyyMMdd'))

# display(dff)

I know I'm very close, but I'm struggling to work out what to do next.

Please see the output below.

[Image: screenshot of the output]

python pyspark
1 answer
0
votes

Solved using format_string:

from pyspark.sql.functions import format_string, when

dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth))
    .otherwise(df.DayofMonth),
)
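A likely explanation of why the question's attempt shows the column name: `str(df.DayofMonth)` is plain Python and runs once on the driver, so it returns the text representation of the Column object rather than each row's value, and `when` then receives that text as a literal. The sketch below mimics this with a hypothetical stand-in class, `FakeColumn` (not part of PySpark); its repr format is modeled on recent PySpark versions and may differ in others.

```python
class FakeColumn:
    """Hypothetical stand-in for pyspark.sql.Column; it only mimics the repr."""

    def __init__(self, name):
        self.name = name

    def __repr__(self):
        # Assumed format, modeled on how recent PySpark versions print a Column.
        return f"Column<'{self.name}'>"


day = FakeColumn("DayofMonth")

# str() is evaluated immediately by Python, not per row by Spark,
# so the result is one fixed string containing the column's name.
literal = "0" + str(day)
print(literal)  # prints: 0Column<'DayofMonth'>
```

`format_string`, by contrast, returns a Column expression that Spark evaluates for each row. Assuming `DayofMonth` is inferred as an integer column, `format_string("%02d", df.DayofMonth)` should zero-pad every value without needing `when`/`otherwise`.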
