I am using hibernate-search and hibernate-search-elasticsearch version 5.10.3.Final. I want to apply an ICU transform to some fields. This is the filter as described in the Elasticsearch documentation:
https://www.elastic.co/guide/en/elasticsearch/plugins/5.6/analysis-icu-transform.html
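However, I cannot find a matching TokenFilterFactory in the Lucene version that hibernate-search depends on, and the factory attribute is required in TokenFilterDef. Does anyone know how to achieve transliteration with hibernate-search?
You can use annotations and rely on org.hibernate.search.elasticsearch.analyzer.ElasticsearchTokenFilterFactory to create a JSON token filter definition: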
@AnalyzerDef(
    name = "myAnalyzer",
    tokenizer = ...,
    filter = @TokenFilterDef(
        name = "myLatinTransform",
        factory = ElasticsearchTokenFilterFactory.class,
        params = {
            @Parameter(name = "type", value = "'icu_transform'"),
            @Parameter(name = "id", value = "'Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC'")
        }
    )
)
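Note: parameter values are interpreted as JSON, so string values must be quoted. For convenience, however, single quotes are allowed.
Alternatively, you can define the analyzer programmatically and benefit from a more natural API: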
# In hibernate.properties
hibernate.search.elasticsearch.analysis_definition_provider = com.acme.CustomAnalyzerProvider
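import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionProvider;
import org.hibernate.search.elasticsearch.analyzer.definition.ElasticsearchAnalysisDefinitionRegistryBuilder;

public class CustomAnalyzerProvider implements ElasticsearchAnalysisDefinitionProvider {
    @Override
    public void register(ElasticsearchAnalysisDefinitionRegistryBuilder builder) {
        // Analyzer that references the token filter by name.
        builder.analyzer( "myAnalyzer" )
                .withTokenizer( "whitespace" )
                .withTokenFilters( "myLatinTransform" );
        // Token filter definition passed on to Elasticsearch (requires the analysis-icu plugin).
        builder.tokenFilter( "myLatinTransform" )
                .type( "icu_transform" )
                .param( "id", "Any-Latin; NFD; [:Nonspacing Mark:] Remove; NFC" );
    }
}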
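To get a feel for what the "NFD; [:Nonspacing Mark:] Remove; NFC" part of that transform ID does, here is a minimal sketch that approximates it with the JDK's java.text.Normalizer. It is only an illustration: the class and method names are made up for this example, it does not call Elasticsearch or ICU, and it does not perform the "Any-Latin" transliteration step (that would need ICU4J), so non-Latin scripts pass through unchanged.

```java
import java.text.Normalizer;

// Hypothetical helper, not part of Hibernate Search or Elasticsearch.
public class AccentFoldingSketch {

    // Approximates "NFD; [:Nonspacing Mark:] Remove; NFC" using JDK classes only.
    static String stripNonspacingMarks(String input) {
        // NFD: decompose e.g. U+00E9 into "e" followed by U+0301 (combining acute accent).
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // Remove every code point in Unicode general category Mn (Nonspacing Mark).
        String stripped = decomposed.replaceAll("\\p{Mn}", "");
        // NFC: recompose whatever is left into canonical composed form.
        return Normalizer.normalize(stripped, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(stripNonspacingMarks("Cr\u00e8me br\u00fbl\u00e9e")); // Creme brulee
        System.out.println(stripNonspacingMarks("Dvo\u0159\u00e1k"));            // Dvorak
    }
}
```

With the real icu_transform filter, the same folding happens inside Elasticsearch at index and query time, after "Any-Latin" has first converted other scripts to Latin characters.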