I need help with something DataFrame-related.
I need to save a CSV file in which every column value is wrapped in double quotes, at both the start and the end.
The DataFrame is created after reading a set of parquet files, for example:
emp = [(1,"Smith",-1,"2018","10","M",3000), \
(2,"Rose",1,"2010","20","M",4000), \
(3,"Williams",1,"2010","10","M",1000), \
(4,"Jones",2,"2005","10","F",2000), \
(5,"Brown",2,"2010","40","",-1), \
(6,"lara",2,"2010","30","",-1), \
(7,"mario",2,"2010","10","",-1), \
(8,"bruno",2,"2010","40","",-1), \
(9,"luis",2,"2010","20","",-1) \
]
empDF = spark.createDataFrame(data=emp)
empDF.show()
empDF.coalesce(1).write.format('csv') \
    .option('quote', '') \
    .option('header', 'true') \
    .option('delimiter', '|') \
    .save(path_destination, mode='overwrite')
The result must look like this:
_1|_2|_3|_4|_5|_6|_7
"1"|"Smith"|"-1"|"2018"|"10"|"M"|"3000"
"2"|"Rose"|"1"|"2010"|"20"|"M"|"4000"
"3"|"Williams"|"1"|"2010"|"10"|"M"|"1000"
...
...
...
I am using option('quote', ''), but the CSV file is not saved the way I want.
Can anyone help me?
Use option('quoteAll', 'true') instead of option('quote', ''). The quoteAll option (default false) tells the CSV writer to quote every value, not just those that need escaping.
Option reference:
https://spark.apache.org/docs/latest/sql-data-sources-csv.html#data-source-option
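If you want to check the target layout without spinning up a Spark session, Python's standard csv module reproduces the same all-fields-quoted format via csv.QUOTE_ALL. This is only an illustration of the desired output; the actual fix in Spark is the quoteAll option above.

```python
import csv
import io

# A few rows from the question's dataset.
emp = [(1, "Smith", -1, "2018", "10", "M", 3000),
       (2, "Rose", 1, "2010", "20", "M", 4000),
       (3, "Williams", 1, "2010", "10", "M", 1000)]

buf = io.StringIO()
# QUOTE_ALL wraps every field (including numbers) in the quotechar;
# delimiter '|' matches the target layout from the question.
writer = csv.writer(buf, delimiter="|", quotechar='"', quoting=csv.QUOTE_ALL)
writer.writerows(emp)

print(buf.getvalue())
# First row: "1"|"Smith"|"-1"|"2018"|"10"|"M"|"3000"
```

This mirrors what Spark emits with option('quoteAll', 'true') and option('delimiter', '|'): every value, numeric or string, gets surrounded by double quotes.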