I have a MongoDB collection that stores city/country data in multiple languages. For example, the following query:
db.cities_database.find({ "name.pl.country": "Węgry" }).pretty().limit(10);
returns data in this format:
[
  {
    _id: ObjectId('67331d2a9566994a18c505aa'),
    geoname_id_city: 714073,
    latitude: 46.91667,
    longitude: 21.26667,
    geohash: 'u2r4guvvmm4m',
    country_code: 'HU',
    population: 7494,
    estimated_radius: 400,
    feature_code: 'PPL',
    name: {
      pl: { city: 'Veszto', admin1: null, country: 'Węgry' },
      ascii: { city: 'veszto', admin1: null, country: null },
      lt: { city: 'Veszto', admin1: null, country: 'Vengrija' },
      ru: { city: 'Veszto', admin1: null, country: 'Венгрия' },
      hu: { city: 'Veszto', admin1: null, country: 'Magyarország' },
      en: { city: 'Veszto', admin1: null, country: 'Hungary' },
      fr: { city: 'Veszto', admin1: null, country: 'Hongrie' }
    }
  },
  ...
]
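I would like the same query to work when only English letters are typed. For this example, I want to query with
"name.pl.country": "Wegry"
(typing e instead of ę) and have MongoDB treat ę as e, so the query still returns this document.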
Is this possible?
So far I have tried using a collation like this:
db.cities_database.find({ "name.pl.country": "Wegry" }).collation({ locale: "pl", strength: 1 }).pretty().limit(10);
but this query returns nothing.
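A likely reason the collation query returns nothing: in the Polish collation rules (CLDR), ę is a separate letter of the alphabet rather than e with an accent, so even strength 1 (primary) still tells them apart. The sketch below checks this with JavaScript's `Intl.Collator`, which is also ICU-based; treating `sensitivity: "base"` as equivalent to MongoDB's `strength: 1` is an assumption, and the snippet does not talk to MongoDB at all.

```javascript
// Intl.Collator is ICU-based, like MongoDB collation.
// Assumption: sensitivity "base" ~ collation strength 1 (primary differences only).
const plBase = new Intl.Collator("pl", { sensitivity: "base" });
const enBase = new Intl.Collator("en", { sensitivity: "base" });

// Polish tailoring makes "ę" its own letter (primary difference), so the strings differ.
console.log(plBase.compare("Węgry", "Wegry")); // non-zero

// Under English/root rules, "ę" is "e" plus an ogonek (secondary difference),
// which a primary-strength comparison ignores.
console.log(enBase.compare("Węgry", "Wegry")); // 0
```

This requires a Node.js build with full ICU data (the default for official builds); with reduced ICU data the "pl" collator may fall back to root rules.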
Use Atlas Search with a custom analyzer that applies the icuFolding token filter to perform a diacritic-insensitive search.
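To get a feel for what the folding step does to the stored values and to the query, here is a rough approximation in plain JavaScript. It is not the actual icuFolding implementation (which is based on ICU/Unicode folding rules and covers far more cases); `foldDiacritics` and the small `NO_DECOMPOSITION` table are hypothetical helpers for illustration only.

```javascript
// Hypothetical approximation of diacritic folding, NOT the real icuFolding filter.
// Letters like "ł" have no Unicode decomposition, so they need an explicit mapping;
// this tiny table only covers the Polish ł/Ł.
const NO_DECOMPOSITION = { "ł": "l", "Ł": "L" };

function foldDiacritics(text) {
  return text
    .normalize("NFD")                                // "ę" -> "e" + U+0328 COMBINING OGONEK
    .replace(/\p{M}/gu, "")                          // drop all combining marks
    .replace(/[łŁ]/g, (ch) => NO_DECOMPOSITION[ch])  // map letters NFD cannot split
    .toLowerCase();                                  // case folding
}

console.log(foldDiacritics("Węgry"));        // "wegry"
console.log(foldDiacritics("Magyarország")); // "magyarorszag"
console.log(foldDiacritics("Łódź"));         // "lodz"
```

Because both "Węgry" and "Wegry" fold to the same string, an index that stores the folded form can match either spelling.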
Index definition:
{
  "analyzer": "diacriticFolder",
  "mappings": {
    "fields": {
      "name": {
        "type": "document",
        "fields": {
          "pl": {
            "type": "document",
            "fields": {
              "country": {
                "analyzer": "diacriticFolder",
                "type": "string"
              }
            }
          }
        }
      }
    }
  },
  "analyzers": [
    {
      "name": "diacriticFolder",
      "charFilters": [],
      "tokenizer": {
        "type": "keyword"
      },
      "tokenFilters": [
        {
          "type": "icuFolding"
        }
      ]
    }
  ]
}
$search query:
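db.cities_database.aggregate([
  {
    $search: {
      // No "index" option: $search uses the Atlas Search index named "default"
      "text": {
        "query": "Wegry",
        "path": "name.pl.country"
      }
    }
  },
  { $limit: 10 }
]);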
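Because the analyzer uses the keyword tokenizer, each `name.pl.country` value is indexed as a single folded token, and the query text is folded the same way (no `searchAnalyzer` is set, so in this index the query is assumed to go through `diacriticFolder` too). The in-memory simulation below illustrates the consequence: whole-value, case- and diacritic-insensitive matching, but no partial matches. `fold`, `analyze`, and `matches` are hypothetical helpers, and the fold step is only an approximation of icuFolding.

```javascript
// Approximation of icuFolding: strip combining marks after NFD, then lowercase.
const fold = (s) => s.normalize("NFD").replace(/\p{M}/gu, "").toLowerCase();

// keyword tokenizer: the whole value becomes exactly one token, which is then folded.
const analyze = (value) => [fold(value)];

// Hypothetical stand-in for the text operator on "name.pl.country":
// a document matches if any analyzed query token equals an indexed token.
function matches(doc, query) {
  const value = doc.name?.pl?.country;
  if (typeof value !== "string") return false; // e.g. null country values are not indexed as strings
  const queryTokens = analyze(query);
  return analyze(value).some((token) => queryTokens.includes(token));
}

// The "pl" part of the sample document from the question.
const sample = { name: { pl: { city: "Veszto", admin1: null, country: "Węgry" } } };

console.log(matches(sample, "Wegry")); // true
console.log(matches(sample, "WĘGRY")); // true
console.log(matches(sample, "Weg"));   // false: the keyword tokenizer indexes the whole value
```

If prefix or partial matching were needed, the keyword tokenizer would be the part to revisit; for exact country names it keeps matching strict.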