First off, I'm fairly new to Hive and Druid. I have set up a Hive external table that connects to a Druid datasource, and simple SELECT queries against it work fine. Example:
SELECT id FROM druidtable;
Result:
+------------+
| id |
+------------+
| 10001 |
| 10000 |
+------------+
Now I want to add an ORDER BY id clause, but that results in some kind of connection error.
Stack trace:
INFO : Compiling command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d): SELECT id FROM druidtable order by id
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:string, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d); Time taken: 0.364 seconds
INFO : Executing command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d): SELECT id FROM druidtable order by id
INFO : Query ID = hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d
INFO : Total jobs = 1
INFO : Launching Job 1 out of 1
INFO : Starting task [Stage-1:MAPRED] in serial mode
INFO : Subscribed to counters: [] for queryId: hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d
INFO : Session is already open
INFO : Dag name: SELECT id FROM druidtable...id (Stage-1)
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1564380041255_0060_17_00, diagnostics=[Vertex vertex_1564380041255_0060_17_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: druidtable initializer failed, vertex=vertex_1564380041255_0060_17_00 [Map 1], java.io.IOException: java.io.IOException: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
at org.apache.hadoop.hive.druid.DruidStorageHandlerUtils.submitRequest(DruidStorageHandlerUtils.java:326)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.fetchLocatedSegmentDescriptors(DruidQueryBasedInputFormat.java:262)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeScanQuery(DruidQueryBasedInputFormat.java:225)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:166)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
at org.apache.hive.druid.com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:143)
at org.apache.hive.druid.com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14)
at org.apache.hadoop.hive.druid.DruidStorageHandlerUtils.submitRequest(DruidStorageHandlerUtils.java:324)
... 20 more
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:8082
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.apache.hive.druid.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.apache.hive.druid.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
... 3 more
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.fetchLocatedSegmentDescriptors(DruidQueryBasedInputFormat.java:264)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeScanQuery(DruidQueryBasedInputFormat.java:225)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:166)
at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:100)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
]
It printed the connection error 3 times.
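OK, I've figured it out. Somehow the session was misconfigured: hive.druid.broker.address.default was set to localhost, whereas it should be the actual IP address of the Druid broker. That matches the "Connection refused: localhost/127.0.0.1:8082" cause in the trace.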
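A minimal sketch of checking and correcting the setting from the Hive session, in case it helps. The hostname druid-broker.example.com is a placeholder of mine, not something from the original setup, so substitute your broker's real host or IP; port 8082 is taken from the stack trace above.

```sql
-- Print the value the current session is using.
-- In the failing session this would show localhost:8082.
SET hive.druid.broker.address.default;

-- Point the session at the real broker.
-- druid-broker.example.com is a placeholder; use your broker's host or IP.
SET hive.druid.broker.address.default=druid-broker.example.com:8082;

-- Re-run the query that failed.
SELECT id FROM druidtable ORDER BY id;
```

A SET like this only affects the current session. To make the change stick, the same property would presumably need to be set in the cluster's Hive configuration (hive-site.xml or the equivalent in your management tool).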
I ran into the same problem when querying a Druid table from Hive Beeline. A sample of the error:
url[http://localhost:8082/druid/v2/] because of [org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
The full trace is as follows:
0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> select count(*) from druid_table_1;
INFO : Compiling command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93): select count(*) from druid_table_1
INFO : Semantic Analysis Completed (retrial = false)
INFO : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:$f0, type:bigint, comment:null)], properties:null)
INFO : Completed compiling command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93); Time taken: 0.422 seconds
INFO : Operation QUERY obtained 0 locks
INFO : Executing command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93): select count(*) from druid_table_1
INFO : Completed executing command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93); Time taken: 0.02 seconds
Error: Unable to get the next row set with exception: org.apache.hive.druid.org.apache.druid.java.util.common.RE: Failure getting results for query[TimeseriesQuery{dataSource='wikiticker', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], limit=2147483647, context={queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93, skipEmptyBuckets=false}}] from locations[[Ljava.lang.String;@434f866f] because of [Failure getting results for query[TimeseriesQuery{dataSource='wikiticker', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], limit=2147483647, context={queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93, skipEmptyBuckets=false}}] url[http://localhost:8082/druid/v2/] because of [org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool]] (state=,code=0)
To get around this, I set the broker address from Beeline.
Solution:
0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> SET hive.druid.broker.address.default=10.90.6.51:8082;
Then I re-ran the query, and this time it succeeded:
0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> select count(*) from druid_table_1;
+--------+
| $f0 |
+--------+
| 24433 |
+--------+