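Hi, I have the following DataFrame and want to add a derived column (with `withColumn`) based on the day of the month: if the day of the month is 1-9, a 0 should be prepended to the value.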
```python
from pyspark.sql.functions import concat, to_date, expr, when
from pyspark.sql.types import DoubleType

df = (
    spark.read.option("header", "True")
    .option("delimiter", ",")
    .option("inferSchema", "True")
    .csv("dbfs:/databricks-datasets/airlines/part-00000")
)

# dff = df.withColumn("DayofMonthFormatted",
#     when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), "Yes").otherwise("No"))

# Attempt: prefix days 1-9 with "0", leave the other days unchanged
dff = df.withColumn("DayofMonthFormatted",
    when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), "0" +
    str(df.DayofMonth)).otherwise(df.DayofMonth))
display(dff['DayofMonth', 'DayofMonthFormatted'])

# dff = df.withColumn('DayDate',
#     to_date(concat('Year','Month','DayofMonth'),'yyyyMMdd'))
# display(dff)
```
I know I'm very close, but I'm having trouble figuring out what to change.
See the output below.
Solved using `format_string`:

```python
from pyspark.sql.functions import format_string, when

dff = df.withColumn(
    "DayofMonthFormatted",
    when(df.DayofMonth.isin([1,2,3,4,5,6,7,8,9]), format_string("0%d", df.DayofMonth))
    .otherwise(df.DayofMonth),
)
```