Stemming for non-English languages behaves incorrectly?

Votes: 0, Answers: 1

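I'm working on a project with Spanish text. In short, neither of the stemmers I see in the documentation for Spanish (there are only two, Snowball and the regular one) gives me good results. For example: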

{
  "tokenizer": "standard",
  "filter": [ 
    {
      "type": "snowball",
      "language": "spanish"
    }
  ],
  "text": "alimento, alimentacion"
}

The query above returns the following:

{
  "tokens" : [
    {
      "token" : "aliment",
      "start_offset" : 0,
      "end_offset" : 8,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "alimentacion",
      "start_offset" : 10,
      "end_offset" : 22,
      "type" : "<ALPHANUM>",
      "position" : 1
    }
  ]
}

Clearly "alimento" and "alimentacion" should share the same root. Is there a way to find other stemmers?

elasticsearch stemming spanish

1 Answer

0 votes

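The Spanish word for "nutrition" is alimentación. The "alimentacion" you supplied is misspelled (it is missing the accent), so it will not be stemmed correctly.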
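
To check this, the same request can be repeated with the accented spelling. The body below is a minimal sketch that reuses the tokenizer and filter from the question and changes only the text; it assumes it is sent to the same `_analyze` endpoint as the original query. Under the Snowball Spanish rules the suffix "ación" is only recognized with its accent, so both tokens would be expected to reduce to "aliment".

```json
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "snowball",
      "language": "spanish"
    }
  ],
  "text": "alimento, alimentación"
}
```

If the indexed text itself often lacks accents, this does not help directly, since the stemmer only sees the tokens it is given.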

