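The query plan shows a collection scan over every document in the Mongo collection. So I created an index on the WHERE-clause columns, expecting Drill to choose an index-based access plan, but Drill still uses a full table scan. Is there anything else I need to do to make Drill use the index?
The actual query, the generated query plan, and the Mongo index are given below.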
SQL:
Select j.user as User, TO_DATE(j.created_at) as submitted_on
from mongo.example.jobs j
where j.user = '[email protected]' and j.created_at BETWEEN timestamp '2020-03-25 13:12:55' AND timestamp '2020-04-24 13:12:55'
Physical plan (from the Drill UI)
00-00 Screen : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6720.59875 rows, 23532.19875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10468
00-01 Project(User=[$0], submitted_on=[TO_DATE($1)]) : rowType = RecordType(ANY User, ANY submitted_on): rowcount = 121.2375, cumulative cost = {6708.475 rows, 23520.075 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10467
00-02 SelectionVectorRemover : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6587.2375 rows, 22913.8875 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10466
00-03 Filter(condition=[AND(=($0, '[email protected]'), >=($1, 2020-03-25 13:12:55), <=($1, 2020-04-24 13:12:55))]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 121.2375, cumulative cost = {6466.0 rows, 22792.65 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10465
00-04 Scan(table=[[mongo, example, jobs]], groupscan=[MongoGroupScan [MongoScanSpec=MongoScanSpec [dbName=example, collectionName=jobs, filters=null], columns=[`user`, `created_at`]]]) : rowType = RecordType(ANY user, ANY created_at): rowcount = 3233.0, cumulative cost = {3233.0 rows, 6466.0 cpu, 895541.0 io, 0.0 network, 0.0 memory}, id = 10464
Index created in MongoDB
{
"v" : 2,
"key" : { "user" : 1, "created_at" : 1, "method_map_id" : 1 },
"name" : "user_1_created_at_1_method_map_id_1",
"ns" : "example.jobs"
}
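Also, the Drill documentation says that Drill supports indexes only for MapR DB. Does that mean indexes on other data sources, such as MongoDB, will not be used?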
https://drill.apache.org/docs/querying-indexes-introduction/
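The problem is in how the mongo storage plugin handles timestamp filter predicates. Note filters=null in the Scan operator of the physical plan: none of the WHERE predicates were pushed down to MongoDB, so MongoDB returned every document and Drill applied the Filter operator itself. A filter predicate passes through the following modules, in this order: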
MongoPushDownFilterForScan -> MongoFilterBuilder -> MongoCompareFunctionProcessor.process() -> MongoCompareFunctionProcessor.visitSchemaPath()
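To illustrate why a missing handler blocks pushdown, here is a toy model of the last step in that chain. The classes below (PushDownSketch, QuotedString, TimeStampExpr) are hypothetical stand-ins, not Drill's real ValueExpressions API. The sketch only assumes the behaviour described in this answer: visitSchemaPath converts the literal types it knows about and otherwise reports the predicate as not convertible, leaving it for Drill's own Filter operator.

```java
import java.util.Date;

// Hypothetical toy model of visitSchemaPath; these are NOT Drill's real classes.
public class PushDownSketch {

    // Stand-in for a literal expression appearing in a filter predicate.
    public interface ValueExpr {}

    public static final class QuotedString implements ValueExpr {
        final String text;
        public QuotedString(String text) { this.text = text; }
    }

    public static final class TimeStampExpr implements ValueExpr {
        final long epochMillis;
        public TimeStampExpr(long epochMillis) { this.epochMillis = epochMillis; }
    }

    private final boolean handlesTimestamps; // false = plugin before the patch, true = after
    private Object value;
    private String path;

    public PushDownSketch(boolean handlesTimestamps) {
        this.handlesTimestamps = handlesTimestamps;
    }

    // Returns true when the literal was converted, i.e. the predicate can be pushed down.
    public boolean visitSchemaPath(String path, ValueExpr valueArg) {
        if (valueArg instanceof QuotedString) {
            this.value = ((QuotedString) valueArg).text;
            this.path = path;
            return true;
        }
        if (handlesTimestamps && valueArg instanceof TimeStampExpr) {
            this.value = new Date(((TimeStampExpr) valueArg).epochMillis);
            this.path = path;
            return true;
        }
        // No handler for this literal type: the predicate stays in Drill's Filter operator.
        return false;
    }

    public Object getValue() { return value; }
    public String getPath() { return path; }

    public static void main(String[] args) {
        ValueExpr ts = new TimeStampExpr(0L);
        System.out.println(new PushDownSketch(false).visitSchemaPath("created_at", ts)); // false
        System.out.println(new PushDownSketch(true).visitSchemaPath("created_at", ts));  // true
    }
}
```

In this model the string comparison on user converts either way, while the created_at comparison only converts once a timestamp branch exists, which mirrors the fix below.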
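visitSchemaPath acts like a getter over the value-expression classes: for each supported literal type it extracts the value and records the field path. There was no handler for TimeStampExpression, so I added the following code, rebuilt the plugin, and tested it.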
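// Added to MongoCompareFunctionProcessor.visitSchemaPath()
// (TimeStampExpression is Drill's ValueExpressions.TimeStampExpression; Date is java.util.Date)
if (valueArg instanceof TimeStampExpression) {
  // The value is passed straight to Date(long), which expects milliseconds since the epoch
  long epochMillis = ((TimeStampExpression) valueArg).getTimeStamp();
  this.value = new Date(epochMillis);
  this.path = path;
  return true;
}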
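This gets the timestamp filter pushed into the filter part of the Mongo query, so MongoDB receives the full predicate and can choose its own index (here, user_1_created_at_1_method_map_id_1) instead of scanning the whole collection.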
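For reference, the pushed-down predicate amounts to an equality on user plus a range on created_at, which is a shape a compound index beginning with { user: 1, created_at: 1 } can serve. The sketch below builds that shape with plain java.util maps as a stand-in for the BSON document the plugin actually sends; it is not the plugin's code. It assumes the TIMESTAMP literals are interpreted as UTC, and the user value is a hypothetical placeholder because the address in the question is obfuscated. It also shows why the value handed to java.util.Date must be in milliseconds, not seconds.

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only: plain maps stand in for the BSON filter document.
public class MongoFilterSketch {

    // Converts a "yyyy-MM-dd HH:mm:ss" literal to a Date, ASSUMING the literal is UTC.
    public static Date toDate(String literal) {
        LocalDateTime ldt = LocalDateTime.parse(literal.replace(' ', 'T'));
        long epochMillis = ldt.toInstant(ZoneOffset.UTC).toEpochMilli();
        return new Date(epochMillis); // Date(long) expects milliseconds, not seconds
    }

    // Builds { user: <user>, created_at: { $gte: <from>, $lte: <to> } }.
    public static Map<String, Object> buildFilter(String user, String from, String to) {
        Map<String, Object> range = new LinkedHashMap<>();
        range.put("$gte", toDate(from));
        range.put("$lte", toDate(to));
        Map<String, Object> filter = new LinkedHashMap<>();
        filter.put("user", user);
        filter.put("created_at", range);
        return filter;
    }

    public static void main(String[] args) {
        // "someone@example.com" is a hypothetical placeholder for the obfuscated address.
        Map<String, Object> filter = buildFilter("someone@example.com",
                "2020-03-25 13:12:55", "2020-04-24 13:12:55");
        System.out.println(filter);
    }
}
```

Passing epoch seconds to Date(long) instead would produce a date in January 1970, so the range would match nothing; that is why the variable in the patch is best read as epoch milliseconds.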