We call an Azure Databricks notebook from a Data Factory pipeline; the notebook performs the ingestion into Azure Synapse. But whenever a notebook run fails, it only shows the following error message:
com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
But when we open the run logs and scroll down to this exception message, right below it there appears:
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: The column [4] is not nullable and also USE_DEFAULT_VALUE is false, thus empty input is not allowed. [ErrorCode = 107090] [SQLState = S0001]
Sometimes it is:
Underlying SQLException(s):
- com.microsoft.sqlserver.jdbc.SQLServerException: HdfsBridge::recordReaderFillBuffer - Unexpected error encountered filling record reader buffer: HadoopExecutionException: String or Binary would be truncated
The code we use to load the data is:
try:
    data.write.format('com.databricks.spark.sqldw').option("url", connection_string).option("dbTable", table) \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("tempDir", Connection.storageaccount_path + 'store/dataload') \
        .save(mode="append")
except Exception as e:
    raise Exception("error took place for table: " + table + " : " + str(e))
So this
Underlying SQLException(s):
is the actual error message that tells us what went wrong. But it never shows up in the runError output we see on the ADF pipeline, so we have no way to use Azure Log Analytics to identify the errors in bulk. We always have to scroll down to the error log manually, failure after failure.
Our production environment has thousands of runs per day, and many pipelines fail regularly. Because of this limitation on seeing the exact error message, we cannot monitor the failures effectively.
Is there a way to make Databricks surface the
Underlying SQLException(s):
instead of the generic message: com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed to execute the JDBC query produced by the connector.
To get the actual error message instead of the generic one, you need to split it on an appropriate delimiter and pick out the actual error message by index:
try:
    df1.write.format('com.databricks.spark.sqldw').option("url", "URL").option("dbTable", "demo4") \
        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") \
        .option("tempDir", "tempdir") \
        .option("forwardSparkAzureStorageCredentials", "true") \
        .save(mode="append")
except Exception as e:
    error_message = str(e)
    error_parts = error_message.split('\n')
    print("Error occurred:", error_parts[2], error_parts[3], error_parts[4])
Here I split the error message into a list on
\n
(newline), and to get the expected result I pick the elements at indexes 2, 3, and 4 of the split list.
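Fixed indexes are fragile: if the connector or driver changes the message layout, indexes 2-4 may point at the wrong lines. A slightly more robust variant (a sketch, not tied to any specific connector version; the sample message below is a hypothetical imitation of the real log) searches for the `Underlying SQLException(s):` marker itself:

```python
def extract_underlying_sql_exceptions(error_message: str) -> list:
    """Return the exception lines that follow the 'Underlying SQLException(s):' marker.

    Falls back to the full message if the marker is absent, so nothing is lost.
    """
    marker = "Underlying SQLException(s):"
    idx = error_message.find(marker)
    if idx == -1:
        return [error_message]
    tail = error_message[idx + len(marker):]
    # Each underlying exception is listed on its own line starting with "- "
    return [line.strip().lstrip("- ")
            for line in tail.splitlines()
            if line.strip().startswith("- ")]

# Hypothetical example message mimicking the connector's output:
msg = ("com.databricks.spark.sqldw.SqlDWSideException: Azure Synapse Analytics failed ...\n"
       "Underlying SQLException(s):\n"
       "  - com.microsoft.sqlserver.jdbc.SQLServerException: String or Binary would be truncated\n")
print(extract_underlying_sql_exceptions(msg))
```

This keeps working even when the number of header lines above the marker changes.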
My approach:
If you are on DBR >= 12.2, you can now use the improved error handling types, per the error handling documentation in Databricks:
from pyspark.errors import PySparkException

try:
    spark.sql("SELECT * FROM does_not_exist").show()
except PySparkException as ex:
    print("Error Class : " + ex.getErrorClass())
    print("Message parameters: " + str(ex.getMessageParameters()))
    print("SQLSTATE : " + ex.getSqlState())
    print(ex)
which outputs:
Error Class : TABLE_OR_VIEW_NOT_FOUND
Message parameters: {'relationName': '`does_not_exist`'}
SQLSTATE : 42P01
[TABLE_OR_VIEW_NOT_FOUND] The table or view `does_not_exist` cannot be found. Verify the spelling and correctness of the schema and catalog.
If you did not qualify the name with a schema, verify the current_schema() output, or qualify the name with the correct schema and catalog.
To tolerate the error on drop use DROP VIEW IF EXISTS or DROP TABLE IF EXISTS. SQLSTATE: 42P01; line 1 pos 14;
'Project [*]
+- 'UnresolvedRelation [does_not_exist], [], false
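Tying this back to the original problem: whatever extraction you use, one way to make the failures searchable in Log Analytics is to re-raise a trimmed exception from the notebook, so that the short, informative message becomes what ADF captures as runError. A minimal sketch, where the `Underlying SQLException(s):` marker handling is an assumption about the sqldw connector's message layout and the table name is a placeholder:

```python
def fail_with_concise_error(table: str, original: Exception) -> None:
    """Re-raise keeping only the informative part so ADF's runError stays short.

    The marker parsing is an assumption about the sqldw connector's message
    layout; adjust it to match what your own run logs actually contain.
    """
    text = str(original)
    marker = "Underlying SQLException(s):"
    idx = text.find(marker)
    concise = text[idx:] if idx != -1 else text
    # 'from None' suppresses the long chained traceback in the surfaced message
    raise RuntimeError("Load failed for table " + table + ": " + concise) from None

# Usage (hypothetical error text standing in for the real connector exception):
try:
    raise ValueError("generic header\nUnderlying SQLException(s):\n  - column [4] is not nullable")
except Exception as e:
    try:
        fail_with_concise_error("demo4", e)
    except RuntimeError as r:
        print(r)
```

Since the notebook's raised exception is what the Databricks activity reports back to the pipeline, the concise message then appears in runError and can be queried in bulk.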