Determinizing automaton with 1007861 states and 2653488 transitions would require more than 10000 effort

I get the following error when trying to search:

 org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton with 1007861 states and 2653488 transitions would require more than 10000 effort. at
 org.apache.lucene.util.automaton.Operations.determinize(Operations.java:735) at
 org.apache.lucene.search.suggest.analyzing.FuzzySuggester.toLevenshteinAutomata(FuzzySuggester.java:254) at
 org.apache.lucene.search.suggest.analyzing.FuzzySuggester.getFullPrefixPaths(FuzzySuggester.java:195) at
 org.apache.lucene.search.suggest.analyzing.AnalyzingSuggester.lookup(AnalyzingSuggester.java:798) at
 org.apache.lucene.search.suggest.Lookup.lookup(Lookup.java:240) at
 org.apache.solr.spelling.suggest.SolrSuggester.getSuggestions(SolrSuggester.java:248) at
 org.apache.solr.handler.component.SuggestComponent.process(SuggestComponent.java:266) at
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:368) at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:216) at
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2637) at
 org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:794) at
 org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:567) at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427) at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:357) at
 org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:201) at
 org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601) at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548) at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) at
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:602) at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at
 org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235) at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624) at
 org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1435) at
 org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501) at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594) at
 org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1350) at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at
 org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191) at
 org.eclipse.jetty.server.handler.InetAccessHandler.handle(InetAccessHandler.java:177) at
 org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146) at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at
 org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322) at
 org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763) at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at
 org.eclipse.jetty.server.Server.handle(Server.java:516) at
 org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:388) at
 org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:633) at
 org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:380) at
 org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at
 org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at
 org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) at
 org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) at
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336) at
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313) at
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) at
 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129) at
 org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:383) at
 org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882) at
 org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036) at
 java.base/java.lang.Thread.run(Thread.java:833) 

I think this has something to do with the data in my synonyms file, which looks like this:

accs,access
ally,alley
alwy,alleyway
ambl,amble
app,approach
arc,arcade
artl,arterial
arty,artery
awlk,airwalk
ba,banan
bnk,bank
by,bay
bch,beach
bnd,bend
bwlk,boardwalk
bvde,boulevarde
bwl,bowl
br,brace
bran,brnch,branch
brk,break
bret,brett
bdge,brg,bridge
brdwlk,broadwalk
bdwy,bwy,broadway
brw,brow
bll,bull
bswy,busway
bypa,bypass
bywy,byway
cswy,causeway
ctr,cntr,centre
cnwy,centreway
ch,chase
cir,circle
clt,circlet
crcs,circus
clr,clust,er
clde,colonnade
cmmn,cmn,common
cmmns,cmns,commons
cncd,concord
con,concourse
cntn,connection
cps,copse
cnr,corner
cso,corso
crse,course
ctyd,courtyard
cv,cve,cove
crst,crest
crf,crief
crk,crook,creek
ck,creek
crss,cross
crsg,crossing
cuwy,cruiseway
csac,cul-de-sac
cutt,cutting
dle,dale
de,deviation
dstr,distributor
div,divide
dck,dock
dom,domain
dwn,down
dwns,downs
dvwy,driveway
esmt,easement
edg,edge
elb,elbow
ent,entrance
esp,esplanade
est,estate
exp,expressway
extn,ex,extension
fawy,fairway
fbrk,firebreak
flne,fireline
ftrk,firetrack
fitr,firetrail
flts,flats
folw,follow
ftwy,footway
fshr,foreshore
form,formation
fwy,freeway
frnt,front
frtg,frontage
gdn,garden
gdns,gardens
gte,gate
gwy,gateway
glde,glade
gra,grange
grn,green
gly,gully
hrbr,harbour
hvn,haven
hth,heath
hts,heights
hird,highroad
hllw,hollow
inlt,inlet
intg,interchange
id,island
jnc,junction
knol,knoll
ladr,ladder
ldg,landing
lnwy,laneway
ledr,leader
lkt,lookout
lynn,lynne
manr,manor
mz,maze
mndr,meander
mtwy,motorway
n,nth,north
otlt,outlet
otlk,outlook
plms,palms
prds,paradise
pwy,pkwy,parkway
psge,passage
pway,pathway
psla,peninsula
piaz,piazza
plza,plaza
pkt,pocket
pnt,point
prec,precinct
prom,promenade
prst,pursuit
qdrt,quadrant
qy,quay
qys,quays
rmbl,ramble
rnge,range
rch,reach
res,reserve
rtt,retreat
rtn,return
rdge,ridge
rofw,right of way
rsng,rising
rvr,river
rds,roads
rdwy,roadway
rty,rotary
rnd,round
rte,route
svwy,serviceway
shun,shunt
skln,skyline
slpe,slope
s,sth,south
sq,square
stps,steps
strt,straight
stai,strait
stra,strnd,strand
st,str,street,saint,rd,road,av,ave,avenue,dr,drv,drive,ct,crt,court,cr,crs,cresc,crescent,pl,plc,place,wy,way,cl,close,pde,parade,hwy,highway,cct,circuit,ln,lane,tce,terrace,bvd,blv,blvd,boulevard,gr,grv,grove
strp,strip
sbwy,subway
thfr,thoroughfare
thru,throughway
tlwy,tollway
trk,track
trl,trail
tmwy,tramway
tvse,traverse
tkwy,trunkway
tunl,tunnel
upas,underpass
vlly,valley
viad,viaduct
vw,view
vws,views
vlla,villa
vlge,village
vlls,villas
vsta,vista
wkwy,walkway
wtrs,waters
wtwy,waterway
whrf,wharf
wd,wood
wds,woods
mt,mount
lt,little
e,east
w,west
cn,central
lr,lower
ml,mall
ne,north east
nw,north west
sw,south west
up,upper
in,inner
op,overpass
ot,outer
unti,unit,shp,sh,shop,suite,su,apt,aptmt,aprtmt,aprt,apartment,flt,flat,sit,site
cars,carspace,vlla,vil,villa,offc,off,office,tnhs,tnh,thse,townhouse,sute
se,south east
ant,antenna
blck,blk,block
bldg,bld,building
bngw,bung,bungalow
btsd,boatsd,boatshed
cg,cge,cage
carp,cpk,carpark
clb,club
cool,coolroom
ctge,ctg,cottage
dupl,dup,dpl,duplex
fcty,fact,factory
grge,gar,garg,garage
hl,hall
hse,hous,hs,hou,house
ksk,kio,kiosk
lbby,lby,lob,lobby
lft,lof,loft
lse,ls,lease
mbth,marbth,marine berth
msnt,maisonette
pths,pent,penthouse
r,rr,rear
resv,res,reserve
rm,room
sec,sc,section
shd,shed
shr,shrm,showroom
sgn,sn,sign
stll,stall
stor,store
stu,stud,std,studio
subs,substation
tncy,tnc,tency,tenancy
twr,tr,tw,tower
u,un,unt,unit
vlt,vault
wrd,ward
whs,whse,warehouse
wksh,wks,wksp,workshop
one,1
two,2
three,3
four,4
five,5
six,6
seven,7
eight,8
nine,9
ten,10

The schema definition for the search field is as follows:

<field name="street_name_code" type="text_exact_fuzzy" indexed="true" stored="true"/>

<fieldType name="text_exact_fuzzy" class="solr.TextField" omitNorms="false">
 <analyzer type="index">
   <filter class="solr.PatternReplaceFilterFactory" pattern="([|]+)" replacement=" " replace="all"/>      
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.FlattenGraphFilterFactory"/>        
 </analyzer>
 <analyzer type="query">
   <filter class="solr.PatternReplaceFilterFactory" pattern="([,]+)" replacement=" " replace="all"/>      
   <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
   <tokenizer class="solr.StandardTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.SynonymGraphFilterFactory" synonyms="mysynonyms.txt"/>
   <filter class="solr.FlattenGraphFilterFactory"/>        
 </analyzer>
</fieldType>

The suggest component is defined as follows:

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">FuzzyLookupFactory</str>
      <str name="exactMatchFirst">true</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">street_name</str>
      <str name="suggestAnalyzerFieldType">text_exact_fuzzy</str>
      <str name="buildOnCommit">true</str>
      <str name="maxEdits">1</str>
      <str name="storeDir">fuzzy_suggestions</str>
    </lst>    
  </searchComponent> 

The data I am trying to match is:

NELSON BAY|RD||NELSON BAY

The query that throws the error is:

http://localhost:8986/solr/street_names/suggest?suggest.dictionary=mySuggester&wt=xml&suggest.q=NELSON+BAY+DRIVE+DRIVE+ERN+BAY

The query that works is (the only difference is that it contains a single DRIVE):

http://localhost:8986/solr/street_names/suggest?suggest.dictionary=mySuggester&wt=xml&suggest.q=NELSON+BAY+DRIVE+ERN+BAY

Can anyone tell me how to arrange the synonyms file so that this error goes away?

solr
1 answer

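To resolve this error, make sure your synonyms file does not produce an overly complex automaton.

TooComplexToDeterminizeException indicates that synonym expansion is generating a very large number of states and transitions. Here is a step-by-step approach to the problem: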

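  1. Review synonym groups: simplify the synonyms file to reduce complexity. For example, break entries that contain many synonyms into separate lines, or reduce the number of synonyms.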

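  2. Check the maxEdits parameter: maxEdits is set to 1 in your suggester configuration. It controls the degree of fuzziness, and a larger value makes the fuzzy automaton more complex, so raising it will not help with this error; keep it as low as your use case allows.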

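  3. Test incrementally: add synonyms in small batches and check for the error after each batch. This helps identify the specific entries that cause the problem.

  4. Tune the analyzer: make sure the analyzer applied in the schema handles synonyms efficiently. Fine-tuning the filter and tokenizer settings may help.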
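
To see why one extra DRIVE in the query makes such a difference, it can help to count how many token paths synonym expansion produces. The following is a rough Python sketch, not Lucene's actual effort calculation. It assumes every comma-separated line is an equivalence group that expands a matching token to all terms in the group (which I believe is the default expand behavior), and it ignores multi-word entries and the fuzzy maxEdits layer. The function names are made up for illustration; the two synonym lines are copied from the question.

```python
from math import prod

# Two lines copied from the synonyms file in the question.
SYNONYM_LINES = [
    "by,bay",
    "st,str,street,saint,rd,road,av,ave,avenue,dr,drv,drive,ct,crt,court,"
    "cr,crs,cresc,crescent,pl,plc,place,wy,way,cl,close,pde,parade,hwy,"
    "highway,cct,circuit,ln,lane,tce,terrace,bvd,blv,blvd,boulevard,gr,grv,grove",
]


def build_groups(lines):
    """Map each term to the set of terms it would expand to.

    Only equivalence lines (a,b,c) are handled; comments, blank lines and
    explicit mappings (a => b) are skipped in this sketch.
    """
    groups = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or "=>" in line:
            continue
        terms = [t.strip().lower() for t in line.split(",") if t.strip()]
        for term in terms:
            # A term listed on several lines accumulates all of their terms.
            groups.setdefault(term, set()).update(terms)
    return groups


def alternatives_per_token(query, groups):
    """Number of alternatives each whitespace-separated query token expands to."""
    return [len(groups.get(tok, {tok})) for tok in query.lower().split()]


def path_count(query, groups):
    """Product of the per-token alternatives: a rough proxy for graph size."""
    return prod(alternatives_per_token(query, groups))


if __name__ == "__main__":
    groups = build_groups(SYNONYM_LINES)
    for q in ("NELSON BAY DRIVE ERN BAY", "NELSON BAY DRIVE DRIVE ERN BAY"):
        print(q, alternatives_per_token(q, groups), path_count(q, groups))
```

Each additional token that belongs to the large street-type group multiplies the path count by the size of that group, which is consistent with the second DRIVE pushing the query over the limit.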

Here is an example of what a simplified synonyms file could look like:

accs,access
ally,alley
alwy,alleyway
ambl,amble
app,approach
arc,arcade
artl,arterial
arty,artery
awlk,airwalk
ba,banan
bnk,bank
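
The very large street-type line in the question is the kind of entry step 1 suggests breaking up. As a minimal sketch, assuming the file keeps its "abbreviations first, full word last" layout, a line can be split each time a known full word is reached. The function name, the shortened input line and the set of full words below are all hypothetical and would need to be adapted by hand.

```python
def split_by_canonical(line, canonical):
    """Split one comma-separated synonym line into smaller groups.

    Terms are read left to right and a group is closed every time a term
    from `canonical` (the full words) is seen.
    """
    groups, current = [], []
    for term in (t.strip() for t in line.split(",")):
        if not term:
            continue
        current.append(term)
        if term in canonical:
            groups.append(",".join(current))
            current = []
    if current:
        # Trailing terms without a full word stay together for manual review.
        groups.append(",".join(current))
    return groups


if __name__ == "__main__":
    # Shortened excerpt of the long line in the question (assumption: the
    # remaining terms follow the same pattern).
    mixed = "st,str,street,rd,road,av,ave,avenue,dr,drv,drive"
    full_words = {"street", "road", "avenue", "drive"}
    for group in split_by_canonical(mixed, full_words):
        print(group)
```

Note that splitting changes what matches: once the groups are separate, DRIVE no longer expands to RD, so this only fits if matching across street types is not something you rely on.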

In addition, the detailed documentation on Solr synonym handling may provide further insight.

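This approach should help you simplify synonym handling and mitigate the error. If the problem persists, you may need to analyze specific synonym entries and the complexity they introduce.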
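
One way to narrow down the offending entries, in the spirit of the incremental testing in step 3, is a binary search over prefixes of the synonyms file. The sketch below assumes that once a prefix of the file triggers the error, every longer prefix does too, and that an empty file does not fail. The `fails` callback is hypothetical: in practice it would write the given lines to the synonyms file, reload the core and rerun the failing suggest request. Here it is replaced by a stand-in rule so the example runs on its own.

```python
def first_failing_prefix(lines, fails):
    """Return the smallest n such that fails(lines[:n]) is True, or None.

    Assumes failure is monotonic in the prefix length and that the empty
    prefix passes. The entry that tips the balance is lines[n - 1].
    """
    if not fails(lines):
        return None
    lo, hi = 0, len(lines)  # lines[:lo] passes, lines[:hi] fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if fails(lines[:mid]):
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    sample = ["accs,access", "by,bay", "a,b,c,d,e,f,g,h,i,j,k,l", "wd,wood"]

    # Stand-in for the real check (assumption): treat any group with more
    # than 10 terms as the trigger.
    def fails(subset):
        return any(len(line.split(",")) > 10 for line in subset)

    n = first_failing_prefix(sample, fails)
    if n is not None:
        print("first failing entry:", sample[n - 1])
```

With a real check each call is a core reload plus a request, so the logarithmic number of probes matters for a file of this size.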
