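I am trying to run SELECT * FROM db.abc in Hive. This Hive table was loaded using Spark. The query does not work and shows this error:
Error: java.io.IOException: java.lang.IllegalArgumentException: bucketId out of range: -1 (state=,code=0)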
When I set the following properties, I can query the table in Hive:
set hive.mapred.mode=nonstrict;
set hive.optimize.ppd=true;
set hive.optimize.index.filter=true;
set hive.tez.bucket.pruning=true;
set hive.explain.user=false;
set hive.fetch.task.conversion=none;
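If you need to run the same query non-interactively, one way to apply these session properties is to pass each one with beeline's --hiveconf option. Below is a minimal Python sketch that only builds the command line and does not run it. The JDBC URL is a hypothetical placeholder, build_beeline_command is a made-up helper name, and it assumes your HiveServer2 allows these properties to be changed per session (some clusters restrict which properties clients may set).

```python
import shlex

# The session properties from the question, keyed by Hive property name.
HIVE_SETTINGS = {
    "hive.mapred.mode": "nonstrict",
    "hive.optimize.ppd": "true",
    "hive.optimize.index.filter": "true",
    "hive.tez.bucket.pruning": "true",
    "hive.explain.user": "false",
    "hive.fetch.task.conversion": "none",
}


def build_beeline_command(jdbc_url: str, query: str, settings: dict) -> list:
    """Hypothetical helper: return a beeline argv that sets each property via --hiveconf."""
    argv = ["beeline", "-u", jdbc_url]
    for key, value in settings.items():
        # One --hiveconf key=value pair per property.
        argv += ["--hiveconf", f"{key}={value}"]
    # Run the query and exit.
    argv += ["-e", query]
    return argv


if __name__ == "__main__":
    # Hypothetical HiveServer2 endpoint; replace with your own.
    cmd = build_beeline_command(
        "jdbc:hive2://hiveserver2.example.com:10000/default",
        "SELECT * FROM db.abc",
        HIVE_SETTINGS,
    )
    # Print a shell-quoted version that can be pasted into a terminal.
    print(shlex.join(cmd))
```

Returning an argv list rather than a single string means it can be passed straight to subprocess.run without extra quoting.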
Now, when I try to read the same Hive table db.abc with Spark, I get the following error:
Clients can access this table only if they have the following capabilities: CONNECTORREAD,HIVEFULLACIDREAD,HIVEFULLACIDWRITE,HIVEMANAGESTATS,HIVECACHEINVALIDATE,CONNECTORWRITE. This table may be a Hive-managed ACID table, or require some other capability that Spark currently does not implement;
  at org.apache.spark.sql.catalyst.catalog.CatalogUtils$.throwIfNoAccess(ExternalCatalogUtils.scala:280)
  at org.apache.spark.sql.hive.HiveTranslationLayerCheck$$anonfun$apply$1.applyOrElse(HiveTranslationLayerStrategies.scala:105)
  at org.apache.spark.sql.hive.HiveTranslationLayerCheck$$anonfun$apply$1.applyOrElse(HiveTranslationLayerStrategies.scala:85)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformUp$1.apply(TreeNode.scala:289)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:288)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$3.apply(TreeNode.scala:286)
  at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:306)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
  at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:304)
  at org.apache.spark.sql.catalyst.trees.TreeNode.transformUp(TreeNode.scala:286)
  at org.apache.spark.sql.hive.HiveTranslationLayerCheck.apply(HiveTranslationLayerStrategies.scala:85)
  at org.apache.spark.sql.hive.HiveTranslationLayerCheck.apply(HiveTranslationLayerStrategies.scala:83)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:87)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1$$anonfun$apply$1.apply(RuleExecutor.scala:84)
  at scala.collection.LinearSeqOptimized$class.foldLeft(LinearSeqOptimized.scala:124)
  at scala.collection.immutable.List.foldLeft(List.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:84)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor$$anonfun$execute$1.apply(RuleExecutor.scala:76)
  at scala.collection.immutable.List.foreach(List.scala:392)
  at org.apache.spark.sql.catalyst.rules.RuleExecutor.execute(RuleExecutor.scala:76)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.org$apache$spark$sql$catalyst$analysis$Analyzer$$executeSameContext(Analyzer.scala:124)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.execute(Analyzer.scala:118)
  at org.apache.spark.sql.catalyst.analysis.Analyzer.executeAndCheck(Analyzer.scala:103)
  at org.apache.spark.sql.execution.QueryExecution.analyzed$lzycompute(QueryExecution.scala:57)
  at org.apache.spark.sql.execution.QueryExecution.analyzed(QueryExecution.scala:55)
  at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:47)
  at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:74)
  at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
  ... 49 elided
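The message describes an all-or-nothing capability check: the table lists the capabilities it requires, and a client missing any one of them is refused. The toy Python model below illustrates that rule using the capability names from the message. It is not Spark's actual implementation, the function names are hypothetical, and the client capability set in the example is made up.

```python
# Capabilities listed in the error message as required to access db.abc.
REQUIRED_CAPABILITIES = frozenset({
    "CONNECTORREAD", "HIVEFULLACIDREAD", "HIVEFULLACIDWRITE",
    "HIVEMANAGESTATS", "HIVECACHEINVALIDATE", "CONNECTORWRITE",
})


def missing_capabilities(required, client_has):
    """Return the required capabilities the client lacks, sorted for stable output."""
    return sorted(set(required) - set(client_has))


def can_access(required, client_has):
    """Access is granted only if the client has every required capability."""
    return not missing_capabilities(required, client_has)


if __name__ == "__main__":
    # Hypothetical client that only supports plain connector reads.
    client = {"CONNECTORREAD"}
    print(can_access(REQUIRED_CAPABILITIES, client))
    print(missing_capabilities(REQUIRED_CAPABILITIES, client))
```

Under this model, adding a Spark property alone cannot help unless it makes the client actually provide the missing capabilities, such as HIVEFULLACIDREAD.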
Do I need to add any properties to spark-submit or the shell? Or is there another way to read this Hive table with Spark?
Sample format of the Hive table (abridged):
CREATE TABLE `hive`(
  `c_id` decimal(11,0), etc.
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
WITH SERDEPROPERTIES (
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
  path= 'hdfs://gjuyada/bbts/scl/raw'
TBLPROPERTIES (
  'bucketing_version'='2',
  'spark.sql.create.version'='2.3.2.3.1.0.0-78',
  'spark.sql.sources.provider'='orc',
  'spark.sql.sources.schema.numParts'='1',
  'spark.sql.sources.schema.part.0'='{"type":"struct","fields":
[{"name":"Czz_ID","type":"decimal(11,0)","nullable":true,"metadata":{}},
{"name":"DzzzC_CD","type":"string","nullable":true,"metadata":{}},
{"name":"C0000_S_N","type":"decimal(11,0)","nullable":true,"metadata":{}},
{"name":"P_ _NB","type":"decimal(11,0)","nullable":true,"metadata":{}},
{"name":"C_YYYY","type":"string","nullable":true,"metadata":{}},"type":"string","nullable":true,"metadata":{}},{"name":"Cv_ID","type":"string","nullable":true,"metadata":{}},
  'transactional'='true',
  'transient_lastDdlTime'='1574817059')
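Because the table was created by Spark, its schema is also stored as JSON in the spark.sql.sources.schema.* table properties: numParts gives the number of part.N properties, and the parts are concatenated in index order to get the full JSON. A minimal Python sketch for rebuilding it from a dict of table properties; the helper names are hypothetical, and the example uses only two of the fields from the DDL above.

```python
import json


def read_spark_schema(tblproperties: dict) -> dict:
    """Rebuild the Spark schema JSON stored across spark.sql.sources.schema.part.N."""
    num_parts = int(tblproperties["spark.sql.sources.schema.numParts"])
    # The parts must be joined in index order before the JSON is parsed.
    raw = "".join(
        tblproperties[f"spark.sql.sources.schema.part.{i}"] for i in range(num_parts)
    )
    return json.loads(raw)


def field_types(schema: dict) -> dict:
    """Map each field name to its Spark SQL type string."""
    return {field["name"]: field["type"] for field in schema["fields"]}


if __name__ == "__main__":
    # Abridged example built from two fields of the question's schema.
    props = {
        "spark.sql.sources.provider": "orc",
        "spark.sql.sources.schema.numParts": "1",
        "spark.sql.sources.schema.part.0": json.dumps({
            "type": "struct",
            "fields": [
                {"name": "Czz_ID", "type": "decimal(11,0)", "nullable": True, "metadata": {}},
                {"name": "DzzzC_CD", "type": "string", "nullable": True, "metadata": {}},
            ],
        }),
        "transactional": "true",
    }
    print(field_types(read_spark_schema(props)))
    # prints {'Czz_ID': 'decimal(11,0)', 'DzzzC_CD': 'string'}
```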
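The problem is that you are trying to read a transactional table ('transactional'='true' in the TBLPROPERTIES above) into Spark. Such a table is a Hive-managed ACID table, which is exactly what the Spark error points to: reading it requires capabilities such as HIVEFULLACIDREAD that Spark currently does not implement.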
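A minimal sketch, assuming you have the text of SHOW CREATE TABLE db.abc as pasted in the question, for checking whether a table is a Hive full-ACID table before pointing Spark at it. The helper names are hypothetical, and the regex only handles the simple 'key'='value' form shown in the DDL above. It also treats 'transactional_properties'='insert_only' (Hive's insert-only transactional tables) as not full ACID.

```python
import re

# Matches 'key'='value' pairs as printed inside TBLPROPERTIES.
_PROP_RE = re.compile(r"'([^']+)'\s*=\s*'([^']*)'")


def parse_tblproperties(ddl: str) -> dict:
    """Extract TBLPROPERTIES key/value pairs from SHOW CREATE TABLE output."""
    start = ddl.upper().find("TBLPROPERTIES")
    if start == -1:
        return {}
    # Only scan from TBLPROPERTIES onward so SERDEPROPERTIES pairs are ignored.
    return dict(_PROP_RE.findall(ddl[start:]))


def is_full_acid(props: dict) -> bool:
    """True if the table is transactional and not an insert-only table."""
    if props.get("transactional", "false").lower() != "true":
        return False
    return props.get("transactional_properties", "default").lower() != "insert_only"


if __name__ == "__main__":
    # Abridged from the question's DDL.
    ddl = (
        "CREATE TABLE `hive`(`c_id` decimal(11,0)) "
        "TBLPROPERTIES ('bucketing_version'='2', "
        "'spark.sql.sources.provider'='orc', "
        "'transactional'='true', "
        "'transient_lastDdlTime'='1574817059')"
    )
    props = parse_tblproperties(ddl)
    print(is_full_acid(props))
```

For the table in the question this reports a full-ACID table, which matches the capability list in the Spark error.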