I have a mongo aggregation query like this:
db.someCollection.aggregate([
  {
    $match: { taskId: "qy7u17-xunwqu" }
  },
  // Group by "tracklet_id" and calculate count for each group
  {
    $group: {
      _id: '$tracklet_id',
      count: { $sum: 1 },
      representativeImage: { $first: '$img' }, // when I remove this, the query is done in a split second
      timestamp: { $max: '$timestamp' },
    },
  },
  {
    $project: {
      _id: 0,
      trackletId: '$_id',
      image: '$representativeImage', // but at the end, I want one representative image for a tracklet, doesn't matter which one.
      timestamp: 1,
      count: 1,
    },
  },
  {
    $sort: {
      timestamp: -1
    }
  },
  {
    $limit: 20
  },
], {allowDiskUse: true})
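The img field contains long base64 strings that take up a lot of memory, which causes the group and sort stages of the query to spill to disk. Is there a way to add a pipeline step before the $project stage that re-includes the image field for each tracklet?
One alternative I can think of is to run a separate query afterwards to fetch the images and combine the results, but I'm hoping there is a more elegant way to do this within the same aggregation query.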
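For comparison, the separate-query alternative boils down to a merge step on the client. The following is a minimal plain-JavaScript sketch, not part of any MongoDB API; `attachImages` and the document shapes are assumptions. It takes the rows returned by the image-free aggregation and the image documents returned by a second query (for example `db.someCollection.find({ tracklet_id: { $in: ids } }, { _id: 0, tracklet_id: 1, img: 1 })`, where `ids` holds the returned trackletIds).

```javascript
// Client-side merge for the "separate query" alternative (hypothetical helper).
// rows:      output of the image-free aggregation, e.g. { trackletId, timestamp, count }
// imageDocs: documents { tracklet_id, img } returned by a second query
function attachImages(rows, imageDocs) {
  // Remember the first image seen per tracklet; any one is acceptable.
  const imageByTracklet = new Map();
  for (const doc of imageDocs) {
    if (!imageByTracklet.has(doc.tracklet_id)) {
      imageByTracklet.set(doc.tracklet_id, doc.img);
    }
  }
  // Keep the aggregation's order (timestamp descending) and add `image`;
  // it is undefined when no image document was fetched for a tracklet.
  return rows.map((row) => ({ ...row, image: imageByTracklet.get(row.trackletId) }));
}

// Usage with made-up data:
const exampleRows = [
  { trackletId: "t2", timestamp: 9, count: 3 },
  { trackletId: "t1", timestamp: 5, count: 2 },
];
const exampleImages = [
  { tracklet_id: "t1", img: "aaa" },
  { tracklet_id: "t2", img: "bbb" },
  { tracklet_id: "t1", img: "ccc" },
];
// t2 gets "bbb"; t1 gets "aaa" (the first of its two images)
console.log(attachImages(exampleRows, exampleImages));
```

One drawback of this route: a `$in` query like the one above returns every image of each tracklet rather than just one, so it transfers much more base64 data than the result needs.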
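Answer (verified by the OP):
Instead of fetching the image in the $group stage, fetch it with a self-lookup (a $lookup on the same collection) that is limited to 1 document matching the same tracklet_id. Because the $lookup runs after $sort and $limit, images are read only for the tracklets that are actually returned: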
db.someCollection.aggregate([
  { $match: { taskId: "qy7u17-xunwqu" } },
  // Group without carrying the large img field
  {
    $group: {
      _id: "$tracklet_id",
      count: { $sum: 1 },
      timestamp: { $max: "$timestamp" }
    }
  },
  { $project: { _id: 0, trackletId: "$_id", timestamp: 1, count: 1 } },
  { $sort: { timestamp: -1 } },
  { $limit: 20 },
  // At most 20 tracklets reach this stage: fetch one image for each of them
  {
    $lookup: {
      from: "someCollection",
      let: { trackletId: "$trackletId" },
      pipeline: [
        { $match: { $expr: { $eq: [ "$tracklet_id", "$$trackletId" ] } } },
        { $limit: 1 },
        { $project: { _id: 0, img: 1 } }
      ],
      as: "image"
    }
  },
  // Turn the one-element array [{ img: ... }] into the image string itself
  { $addFields: { image: { $arrayElemAt: [ "$image.img", 0 ] } } }
])
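To see what this pipeline computes, here is an in-memory plain-JavaScript model of it. It is an illustration only, not how MongoDB executes the pipeline; `simulatePipeline`, the sample documents, and numeric timestamps are assumptions. It shows that grouping and sorting never carry `img`, and that an image is looked up only for each tracklet that survives `$limit`.

```javascript
// In-memory model of the answer's pipeline (illustration only, not MongoDB).
// Assumption: timestamps are plain numbers, so Math.max stands in for $max.
function simulatePipeline(docs, taskId, limit = 20) {
  // $match: keep only this task's documents.
  const matched = docs.filter((d) => d.taskId === taskId);

  // $group + $project: count and latest timestamp per tracklet, no img.
  const groups = new Map();
  for (const d of matched) {
    const g = groups.get(d.tracklet_id);
    if (g) {
      g.count += 1;
      g.timestamp = Math.max(g.timestamp, d.timestamp);
    } else {
      groups.set(d.tracklet_id, {
        trackletId: d.tracklet_id,
        timestamp: d.timestamp,
        count: 1,
      });
    }
  }

  // $sort by timestamp descending, then $limit.
  const top = [...groups.values()]
    .sort((a, b) => b.timestamp - a.timestamp)
    .slice(0, limit);

  // $lookup (limited to 1) + $addFields: one image per surviving tracklet.
  // Like the pipeline, this searches all documents, not only `matched`.
  return top.map((row) => {
    const hit = docs.find((d) => d.tracklet_id === row.trackletId);
    return { ...row, image: hit ? hit.img : undefined };
  });
}

// Usage with made-up documents:
const sampleDocs = [
  { taskId: "a", tracklet_id: "t1", timestamp: 1, img: "i1" },
  { taskId: "a", tracklet_id: "t1", timestamp: 4, img: "i2" },
  { taskId: "a", tracklet_id: "t2", timestamp: 3, img: "i3" },
  { taskId: "a", tracklet_id: "t3", timestamp: 2, img: "i4" },
  { taskId: "b", tracklet_id: "t4", timestamp: 9, img: "i5" },
];
// t1 (latest 4, count 2, image "i1") then t2 (latest 3, count 1, image "i3")
console.log(simulatePipeline(sampleDocs, "a", 2));
```

As the model makes visible, the self-lookup matches on tracklet_id alone. If the same tracklet_id can occur under several taskIds, the image could come from another task; adding a taskId condition to the lookup's $match would rule that out.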