How do I sort query results from a Hive external Druid table?


First off, I'm fairly new to Hive and Druid. I have set up a Hive external table connected to a Druid datasource, and simple SELECT queries work fine. Example:

SELECT id FROM druidtable;
Result:
+------------+
| id         |
+------------+
| 10001      |
| 10000      |
+------------+

Now I want to add an ORDER BY id clause, but that results in some kind of connection error.

Stack trace:

INFO  : Compiling command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d): SELECT id FROM druidtable order by id
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:id, type:string, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d); Time taken: 0.364 seconds
INFO  : Executing command(queryId=hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d): SELECT id FROM druidtable order by id
INFO  : Query ID = hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d
INFO  : Total jobs = 1
INFO  : Launching Job 1 out of 1
INFO  : Starting task [Stage-1:MAPRED] in serial mode
INFO  : Subscribed to counters: [] for queryId: hive_20190730090350_28947166-ba7e-418a-bcaa-e548c3bd333d
INFO  : Session is already open
INFO  : Dag name: SELECT id FROM druidtable...id (Stage-1)
ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1564380041255_0060_17_00, diagnostics=[Vertex vertex_1564380041255_0060_17_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: druidtable initializer failed, vertex=vertex_1564380041255_0060_17_00 [Map 1], java.io.IOException: java.io.IOException: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
        at org.apache.hadoop.hive.druid.DruidStorageHandlerUtils.submitRequest(DruidStorageHandlerUtils.java:326)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.fetchLocatedSegmentDescriptors(DruidQueryBasedInputFormat.java:262)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeScanQuery(DruidQueryBasedInputFormat.java:225)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:166)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:100)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool
        at org.apache.hive.druid.com.metamx.http.client.NettyHttpClient.go(NettyHttpClient.java:143)
        at org.apache.hive.druid.com.metamx.http.client.AbstractHttpClient.go(AbstractHttpClient.java:14)
        at org.apache.hadoop.hive.druid.DruidStorageHandlerUtils.submitRequest(DruidStorageHandlerUtils.java:324)
        ... 20 more
Caused by: java.net.ConnectException: Connection refused: localhost/127.0.0.1:8082
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
        at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
        at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
        at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
        at org.apache.hive.druid.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
        at org.apache.hive.druid.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
        at org.apache.hive.druid.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
        ... 3 more

        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.fetchLocatedSegmentDescriptors(DruidQueryBasedInputFormat.java:264)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.distributeScanQuery(DruidQueryBasedInputFormat.java:225)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getInputSplits(DruidQueryBasedInputFormat.java:166)
        at org.apache.hadoop.hive.druid.io.DruidQueryBasedInputFormat.getSplits(DruidQueryBasedInputFormat.java:100)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:524)
        at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:779)
        at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:243)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
        at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
        at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108)
        at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41)
        at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
]

The connection error is printed three times.

Tags: hive, hiveql, druid
2 Answers

OK, I figured it out. Somehow the session was misconfigured: hive.druid.broker.address.default was set to localhost, when it should be the actual IP address of the broker.
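A minimal sketch of the fix, run inside the Hive session; the host and port below are placeholders for your broker's actual address:

```sql
-- Point the Hive Druid storage handler at the real broker
-- instead of the default localhost ("druid-broker-host:8082"
-- is a placeholder -- substitute your broker's address).
SET hive.druid.broker.address.default=druid-broker-host:8082;
```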



I hit the same problem when querying a Druid table from Hive beeline. A sample of the error is below:

url[http://localhost:8082/druid/v2/] because of [org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool

The full trace is as follows:


0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> select count(*) from druid_table_1;
INFO  : Compiling command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93): select count(*) from druid_table_1
INFO  : Semantic Analysis Completed (retrial = false)
INFO  : Created Hive schema: Schema(fieldSchemas:[FieldSchema(name:$f0, type:bigint, comment:null)], properties:null)
INFO  : Completed compiling command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93); Time taken: 0.422 seconds
INFO  : Operation QUERY obtained 0 locks
INFO  : Executing command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93): select count(*) from druid_table_1
INFO  : Completed executing command(queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93); Time taken: 0.02 seconds
Error: Unable to get the next row set with exception: org.apache.hive.druid.org.apache.druid.java.util.common.RE: Failure getting results for query[TimeseriesQuery{dataSource='wikiticker', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], limit=2147483647, context={queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93, skipEmptyBuckets=false}}] from locations[[Ljava.lang.String;@434f866f] because of [Failure getting results for query[TimeseriesQuery{dataSource='wikiticker', querySegmentSpec=LegacySegmentSpec{intervals=[1900-01-01T00:00:00.000Z/3000-01-01T00:00:00.000Z]}, descending=false, virtualColumns=[], dimFilter=null, granularity='AllGranularity', aggregatorSpecs=[CountAggregatorFactory{name='$f0'}], postAggregatorSpecs=[], limit=2147483647, context={queryId=hive_20231030215514_7536b3d8-9a8a-4a50-9c19-7fb8896c3c93, skipEmptyBuckets=false}}] url[http://localhost:8082/druid/v2/] because of [org.apache.hive.druid.org.jboss.netty.channel.ChannelException: Faulty channel in resource pool]] (state=,code=0)

To avoid this, I set the broker address from beeline.

Solution:

0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> SET hive.druid.broker.address.default=10.90.6.51:8082;

Then I re-ran the query, and this time it succeeded:

0: jdbc:hive2://odp01.ubuntu.ad.ce:2181,odp02> select count(*) from druid_table_1;
+--------+
|  $f0   |
+--------+
| 24433  |
+--------+
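Note that a SET issued from beeline only applies to the current session. To make the change permanent, the same property can go into hive-site.xml; a sketch, assuming the broker address used above:

```xml
<property>
  <name>hive.druid.broker.address.default</name>
  <!-- Replace with your Druid broker's host:port -->
  <value>10.90.6.51:8082</value>
</property>
```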