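I want to apply a synonyms file only at query time, not at index time.
Can anyone tell me how to do that?
I am using Solr 8.5.0.
I was able to get synonyms working, but apparently only when they are in place at index time, like this: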
% cat docs/csv/3.csv
id,text
1,happy to be here
2,here is where you want to be
3,aaafoo
4,aaabar
5,bbbfoo
% echo "randomWord => here" > server/solr/configsets/_default/conf/synonyms.txt
% bin/./solr create -c tmpCollection -s 2 -rf 2; bin/./post -c tmpCollection -type text/csv -out yes docs/csv/3.csv
% curl "http://localhost:8983/solr/tmpCollection/select?q=text:randomword"
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":42,
"params":{
"q":"text:randomword"}},
"response":{"numFound":2,"start":0,"maxScore":0.28847915,"docs":[
{
"id":"2",
"text":["here is where you want to be"],
"_version_":1666549776061038592},
{
"id":"1",
"text":["happy to be here"],
"_version_":1666549777231249408}]
}}
But if I update synonyms.txt after the index has been created, it has no effect on queries at all, even after a restart:
% echo "anotherWord => here" >> server/solr/configsets/_default/conf/synonyms.txt
% curl "http://localhost:8983/solr/tmpCollection/select?q=text:anotherword"
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":10,
"params":{
"q":"text:anotherword"}},
"response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
}}
% bin/./solr stop -all; bin/./solr start -e cloud -noprompt
% curl "http://localhost:8983/solr/tmpCollection/select?q=text:anotherword"
{
"responseHeader":{
"zkConnected":true,
"status":0,
"QTime":73,
"params":{
"q":"text:anotherword"}},
"response":{"numFound":0,"start":0,"maxScore":0.0,"docs":[]
}}
Just to confirm that I am editing the right file:
% find . -name "synonyms.txt" |xargs grep -i randomword
./server/solr/configsets/_default/conf/synonyms.txt:randomWord => here
%
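I considered adding the synonyms before indexing, but once there are a few hundred of them, that slows indexing down.
Any help would be greatly appreciated. Thanks!
POST-EDIT: Another way to ask this: I have an existing index that already contains all the documents. I have a new synonyms.txt file that I want to apply at query time.
Is that possible? If so, how?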
Here is an example of a field type that applies the synonym filter only at query time:
<fieldType class="solr.TextField" name="text_general" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
</analyzer>
<analyzer type="query">
<tokenizer class="solr.StandardTokenizerFactory"/>
<!-- Query time synonym filter -->
<filter class="solr.SynonymGraphFilterFactory" expand="true" ignoreCase="true" synonyms="synonyms.txt"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
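
One thing that may explain why editing the file had no effect: the responses show "zkConnected":true, so Solr is running in SolrCloud mode, where a collection reads its configuration from ZooKeeper rather than from the files under server/solr/configsets. Below is a minimal sketch of pushing the edited file and reloading the collection. It assumes the embedded ZooKeeper listens on localhost:9983 and that the collection's configset is stored under /configs/tmpCollection; neither is confirmed by the post, so check both first. reload_url and push_synonyms are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: copy an edited synonyms.txt into ZooKeeper and reload the collection.
# Assumptions (not confirmed by the post): ZooKeeper at localhost:9983 and a
# configset named after the collection. Run from the Solr install directory.
SOLR_URL="http://localhost:8983/solr"
ZK_HOST="localhost:9983"
COLLECTION="tmpCollection"
SYNONYMS_FILE="server/solr/configsets/_default/conf/synonyms.txt"

# Build the Collections API URL that reloads the given collection.
reload_url() {
  printf '%s/admin/collections?action=RELOAD&name=%s\n' "$SOLR_URL" "$1"
}

# Overwrite only synonyms.txt in ZooKeeper, then reload so that the
# query analyzer re-reads it. No reindexing is involved.
push_synonyms() {
  bin/solr zk cp "file:$SYNONYMS_FILE" \
    "zk:/configs/$COLLECTION/synonyms.txt" -z "$ZK_HOST" &&
  curl -s "$(reload_url "$COLLECTION")"
}
```

After calling push_synonyms, rerunning the text:anotherword query from the question should show whether the new rule is picked up. Copying the single file, rather than re-uploading the whole _default directory, avoids overwriting the managed-schema already stored in ZooKeeper.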