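I have the following query, which I am running in Couchbase Enterprise (6.0.2 build 2413) against roughly 1 billion documents. What is the best-performing index to create for this query? (The report needs to finish within a specific time window, so making the index as fast as possible is the main goal.)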
select LogJobID, LoggingType as LoggingTypeID, count(*) as AffectedLineCount
from (
select Max([CreateDate, SequenceID, a])[2].LoggingType, Max([CreateDate, SequenceID, a])[2].LogJobID
from `LogBucket` a
where LoggingType in [3001, 4004, 6002]
group by LogFileID, RowKey) as a
group by a.LoggingType, a.LogJobID
I tried creating the following index:
CREATE INDEX `data_job_productivity_index1`
ON `LogBucket`(`LogFileID`,`RowKey`,`LoggingType`,`LogJobID`,`CreateDate`,`SequenceID`)
PARTITION BY hash((meta().`id`)) WHERE (`LoggingType` in [3001, 4004, 6002])
But when I run EXPLAIN, it uses a different index (one dedicated to a different report query):
{ "plan": { "#operator": "Sequence", "~children": [ { "#operator": "Sequence", "~children": [ { "#operator": "IndexScan3", "as": "a", "index": "analyst_log_LogJob_activity", "index_id": "f85999b9b7cc0d3f", "index_projection": { "primary_key": true }, "keyspace": "LogBucket", "namespace": "default", "spans": [ { "exact": true, "range": [ { "high": "3001", "inclusion": 3, "low": "3001" } ] }, { "exact": true, "range": [ { "high": "4004", "inclusion": 3, "low": "4004" } ] }, { "exact": true, "range": [ { "high": "6002", "inclusion": 3, "low": "6002" } ] } ], "using": "gsi" }, { "#operator": "Fetch", "as": "a", "keyspace": "LogBucket", "namespace": "default" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "Filter", "condition": "((`a`.`LoggingType`) in [3001, 4004, 6002])" }, { "#operator": "InitialGroup", "aggregates": [ "max([(`a`.`CreateDate`), (`a`.`SequenceID`), `a`])" ], "group_keys": [ "(`a`.`LogFileID`)", "(`a`.`RowKey`)" ] } ] } }, { "#operator": "IntermediateGroup", "aggregates": [ "max([(`a`.`CreateDate`), (`a`.`SequenceID`), `a`])" ], "group_keys": [ "(`a`.`LogFileID`)", "(`a`.`RowKey`)" ] }, { "#operator": "FinalGroup", "aggregates": [ "max([(`a`.`CreateDate`), (`a`.`SequenceID`), `a`])" ], "group_keys": [ "(`a`.`LogFileID`)", "(`a`.`RowKey`)" ] }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "InitialProject", "result_terms": [ { "expr": "((max([(`a`.`CreateDate`), (`a`.`SequenceID`), `a`])[2]).`LoggingType`)" }, { "expr": "((max([(`a`.`CreateDate`), (`a`.`SequenceID`), `a`])[2]).`LogJobID`)" } ] }, { "#operator": "FinalProject" } ] } } ] }, { "#operator": "Alias", "as": "a" }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "InitialGroup", "aggregates": [ "count(*)" ], "group_keys": [ "(`a`.`LoggingType`)", "(`a`.`LogJobID`)" ] } ] } }, { "#operator": "IntermediateGroup", "aggregates": [ "count(*)" ], 
"group_keys": [ "(`a`.`LoggingType`)", "(`a`.`LogJobID`)" ] }, { "#operator": "FinalGroup", "aggregates": [ "count(*)" ], "group_keys": [ "(`a`.`LoggingType`)", "(`a`.`LogJobID`)" ] }, { "#operator": "Parallel", "~child": { "#operator": "Sequence", "~children": [ { "#operator": "InitialProject", "result_terms": [ { "expr": "(`a`.`LogJobID`)" }, { "as": "LoggingTypeID", "expr": "(`a`.`LoggingType`)" }, { "as": "AffectedLineCount", "expr": "count(*)" } ] }, { "#operator": "FinalProject" } ] } } ] }, "text": "select LogJobID, LoggingType as LoggingTypeID, count(*) as AffectedLineCount\nfrom (\n select Max([CreateDate, SequenceID, a])[2].LoggingType, Max([CreateDate, SequenceID, a])[2].LogJobID\n from `LogBucket` a\n where LoggingType in [3001, 4004, 6002]\n group by LogFileID, RowKey) as a\ngroup by a.LoggingType, a.LogJobID" }
The index it chose was created like this:
CREATE INDEX `analyst_log_LogJob_activity` ON `LogBucket`(`LoggingType`,`LogJobID`) PARTITION BY hash((meta().`id`))
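The problem with that second index is that it contains all 1 billion documents, whereas the index I am trying to create and dedicate to this new report would hold far fewer documents because of its LoggingType WHERE clause.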
You can create a covering index as follows. Use an index WHERE clause only if all queries use the same LoggingType values.
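The index definition from the answer is not reproduced here, so what follows is only a sketch of one possible covering setup, not the answer's own statement. It rests on two assumptions. First, the planner in this version considers an index only when the query filters on its leading key, which would explain why the index starting with `LogFileID` was skipped; the sketch therefore leads with `LoggingType`. Second, putting the whole document `a` inside `MAX([...])` forces a Fetch, so the sketch carries only the two needed fields in the array. The index name `data_job_productivity_cover` and the key order are hypothetical.

```sql
/* Hypothetical covering index: LoggingType leads so the WHERE predicate can select it. */
CREATE INDEX `data_job_productivity_cover`
ON `LogBucket`(`LoggingType`, `LogFileID`, `RowKey`, `CreateDate`, `SequenceID`, `LogJobID`)
PARTITION BY HASH(META().id)
WHERE `LoggingType` IN [3001, 4004, 6002];

/* Same report, but the MAX array holds only indexed fields instead of the whole document. */
SELECT s.LogJobID, s.LoggingType AS LoggingTypeID, COUNT(*) AS AffectedLineCount
FROM (
    SELECT MAX([a.CreateDate, a.SequenceID, a.LoggingType, a.LogJobID])[2] AS LoggingType,
           MAX([a.CreateDate, a.SequenceID, a.LoggingType, a.LogJobID])[3] AS LogJobID
    FROM `LogBucket` a
    WHERE a.LoggingType IN [3001, 4004, 6002]
    GROUP BY a.LogFileID, a.RowKey) AS s
GROUP BY s.LoggingType, s.LogJobID;
```

Running EXPLAIN on the rewritten query should show whether the scan reports a `covers` list and whether the Fetch operator is gone. One behavioral difference to keep in mind: when two documents share the same `CreateDate` and `SequenceID`, the tie is now broken by `LoggingType` and `LogJobID` rather than by comparing whole documents.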