How to add double quotes to all columns in a DataFrame and save to CSV


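I need help with a DataFrame task.

I need to save a CSV file in which every value, in every column, is enclosed in double quotes.

The DataFrame is created after reading a set of Parquet files. For example: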

emp = [(1,"Smith",-1,"2018","10","M",3000),
    (2,"Rose",1,"2010","20","M",4000),
    (3,"Williams",1,"2010","10","M",1000),
    (4,"Jones",2,"2005","10","F",2000),
    (5,"Brown",2,"2010","40","",-1),
    (6,"lara",2,"2010","30","",-1),
    (7,"mario",2,"2010","10","",-1),
    (8,"bruno",2,"2010","40","",-1),
    (9,"luis",2,"2010","20","",-1)
  ]


empDF = spark.createDataFrame(data=emp)
empDF.show()


empDF.coalesce(1).write.format('csv').option('quote', '').option('header','true').option("delimiter","|").save(path_destination,mode='overwrite')

The result must look like this:

_1|_2|_3|_4|_5|_6|_7
"1"|"Smith"|"-1"|"2018"|"10"|"M"|"3000"
"2"|"Rose"|"1"|"2010"|"20"|"M"|"4000"
"3"|"Williams"|"1"|"2010"|"10"|"M"|"1000"
...
...
...

I am using option('quote', ''), but the CSV file is not saved the way I want.

Can anyone help me?

python-3.x dataframe apache-spark pyspark aws-glue