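I'm trying to insert the values of a DataFrame into a SQL table on Databricks.
The insert fails with a duplicate-column error, but the DataFrame has no (obvious) duplicate columns. I checked. What could be causing this?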
|-- nr_cpf_cnpj: string (nullable = true)
|-- tp_pess: string (nullable = true)
|-- am_bacen: long (nullable = true)
|-- cd_moda: long (nullable = true)
|-- cd_sub_moda: long (nullable = true)
|-- vl_bacen: decimal(29,2) (nullable = true)
|-- clivenc: string (nullable = true)
|-- vl_envio: decimal(28,2) (nullable = true)
|-- nm_pess_empr: string (nullable = true)
|-- nr_cnae_prin: long (nullable = true)
spark.sql("INSERT INTO TABLE db.tb_jul_bcn SELECT * FROM tmpBcnView")
The DataFrame is registered as the temporary view tmpBcnView.
Error:
AnalysisException: Found duplicate column(s) in the data to save: nr_cnae_prin
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
<command-2987275027841731> in <cell line: 1>()
----> 1 spark.sql("INSERT INTO TABLE db.tb_jul_bcn SELECT * FROM tmpBcnView")
/databricks/spark/python/pyspark/instrumentation_utils.py in wrapper(*args, **kwargs)
46 start = time.perf_counter()
47 try:
---> 48 res = func(*args, **kwargs)
49 logger.log_success(
50 module_name, class_name, function_name, time.perf_counter() - start, signature
/databricks/spark/python/pyspark/sql/session.py in sql(self, sqlQuery, **kwargs)
1117 sqlQuery = formatter.format(sqlQuery, **kwargs)
1118 try:
-> 1119 return DataFrame(self._jsparkSession.sql(sqlQuery), self)
1120 finally:
1121 if len(kwargs) > 0:
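I solved it!
What happened was not a duplicated column but an extra one. The error message claimed there were duplicates, so duplicates were what I kept searching for. That was it!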
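
For anyone who hits the same message, a quick way to spot this kind of mismatch is to compare the view's column names with the target table's. The following is only a minimal sketch: `compare_columns` is a helper name made up for illustration, and `table_cols` is a hypothetical stand-in, because the real columns of db.tb_jul_bcn (and which column was the extra one) are not shown in the post.

```python
def compare_columns(source_cols, target_cols):
    """Return (extra, missing): names only in the source, names only in the target."""
    # Spark resolves column names case-insensitively by default,
    # so compare lowercased names.
    source = [c.lower() for c in source_cols]
    target = [c.lower() for c in target_cols]
    extra = [c for c in source if c not in target]
    missing = [c for c in target if c not in source]
    return extra, missing


# Column names taken from the schema printed in the question.
view_cols = [
    "nr_cpf_cnpj", "tp_pess", "am_bacen", "cd_moda", "cd_sub_moda",
    "vl_bacen", "clivenc", "vl_envio", "nm_pess_empr", "nr_cnae_prin",
]

# ASSUMPTION: a hypothetical target table with one column fewer.
# "clivenc" is picked arbitrarily here; the post does not say which
# column was actually the extra one.
table_cols = [c for c in view_cols if c != "clivenc"]

extra, missing = compare_columns(view_cols, table_cols)
print("only in view:", extra)
print("only in table:", missing)

# In a notebook, the real lists would come from Spark instead:
# view_cols = spark.table("tmpBcnView").columns
# table_cols = spark.table("db.tb_jul_bcn").columns
```

If the first list is non-empty, selecting only the table's columns by name instead of using SELECT * is one way to make the insert line up.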