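To give some background: I am trying to run the TPC-DS benchmark on Spark with and without Spark's Catalyst optimizer. For complex queries on small datasets, we may spend more time optimizing the plan than actually executing it. I therefore want to measure the optimizer's impact on the overall execution time of a query.
Is there a way to disable some or all of the Spark Catalyst optimization rules?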
This capability was added in Spark 2.4.0 as part of SPARK-24802:
val OPTIMIZER_EXCLUDED_RULES = buildConf("spark.sql.optimizer.excludedRules")
  .doc("Configures a list of rules to be disabled in the optimizer, in which the rules are " +
    "specified by their rule names and separated by comma. It is not guaranteed that all the " +
    "rules in this configuration will eventually be excluded, as some rules are necessary " +
    "for correctness. The optimizer will log the rules that have indeed been excluded.")
  .stringConf
  .createOptional
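You can find the list of optimizer rules here. Ideally, though, we should not disable rules wholesale, since most of them improve performance. Instead, identify the rules that consume the most optimization time, check whether they actually benefit the query, and disable only those that do not.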
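As a rough sketch of how this setting could be used to compare optimization time with and without certain rules: the object name, the helper `optimize`, the temp view `t`, and the sample query below are all made up for illustration, and the two excluded rules (`ConstantFolding`, `ColumnPruning`) are just examples of fully qualified rule names, not a recommendation. It assumes Spark 2.4.0 or later on the classpath and a local master.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan

object ExcludedRulesSketch {
  // Hypothetical helper: returns the optimized logical plan and the
  // wall-clock time (in ms) spent forcing it.
  def optimize(spark: SparkSession, query: String): (LogicalPlan, Double) = {
    val df = spark.sql(query) // parsing and analysis normally happen eagerly here
    val start = System.nanoTime()
    val plan = df.queryExecution.optimizedPlan // lazy val: forces the optimizer to run
    (plan, (System.nanoTime() - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // assumption: local run, adjust for a real cluster
      .appName("excluded-rules-sketch")
      .getOrCreate()

    // Illustrative data and query only.
    spark.range(0, 1000).createOrReplaceTempView("t")
    val query = "SELECT id + (1 + 2) AS x FROM t WHERE id > 10"

    val (basePlan, baseMs) = optimize(spark, query)

    // Rules are given by fully qualified name, separated by commas.
    val excluded = Seq(
      "org.apache.spark.sql.catalyst.optimizer.ConstantFolding",
      "org.apache.spark.sql.catalyst.optimizer.ColumnPruning"
    )
    spark.conf.set("spark.sql.optimizer.excludedRules", excluded.mkString(","))

    val (exclPlan, exclMs) = optimize(spark, query)

    println(f"default rules:  $baseMs%.2f ms%n$basePlan")
    println(f"with exclusions: $exclMs%.2f ms%n$exclPlan")

    // Restore the default so later queries in this session are unaffected.
    spark.conf.unset("spark.sql.optimizer.excludedRules")
    spark.stop()
  }
}
```

Comparing the two printed plans is a quick way to check whether the exclusions actually took effect, since Spark may ignore rules it needs for correctness. A single timing like this is noisy (the first run also pays JVM warm-up costs), so for a benchmark you would probably want to repeat each measurement several times.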
You have to turn off the cost-based optimizer (CBO) setting:
sparkSession.conf.set("spark.sql.cbo.enabled",false)
or pass
--conf spark.sql.cbo.enabled=false
when launching the job with spark-submit.