Solr - WordDelimiterFilterFactory not including all permutations


I have a Solr index that has to deal with part numbers, which WordDelimiterFilterFactory seems ideally suited for. An example part number is "ch2300-100". I'm expecting the following queries to match the field (and they do):

  • ch
  • ch2300-100
  • ch2300100

But the following query doesn't match:

  • ch2300

Looking at the debugging output, that combination of word parts isn't generated. I expected the catenateWords and/or catenateNumbers attributes to handle this case, but that doesn't seem to work. What am I missing in the configuration to allow all permutations of the tokenized fragments to be matched?

<schema version="1.5" name="test">
  <types>
    <fieldType name="text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" preserveOriginal="1" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field stored="true" name="id" type="text" />
    <field stored="true" indexed="true" name="catnum" type="text" />
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>

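I suspect 'ch2300' is not indexed as a token because of splitOnNumerics="1", which is the default when the attribute is not set. At the split phase, the filter separates 'ch' and '2300' and then applies all of the generators to them individually (as well as producing the catenated tokens). Since catenateWords only joins adjacent word parts and catenateNumbers only joins adjacent number parts, nothing joins 'ch' and '2300' on their own.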

One option is to add splitOnNumerics="0" to the filter factory. However, that may keep 'ch' from matching. Another option is to add a filter factory at query time that splits on numerics.
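As a rough sketch of the second option, the query analyzer could get its own word delimiter filter, so that a query for ch2300 is split into ch and 2300, both of which are already in the index. The attribute values below are assumptions chosen for illustration and have not been tested against this schema:

```xml
<!-- Sketch only: query-time analyzer that splits on numerics.
     Catenation and preserveOriginal are switched off here as an assumption,
     so that only the split parts (ch, 2300) are searched for. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="0"
          catenateNumbers="0"
          catenateAll="0"
          splitOnCaseChange="0"
          splitOnNumerics="1"
          preserveOriginal="0" />
</analyzer>
```

How the split terms are combined (as a phrase or as separate terms) depends on the query parser settings, so matching may be looser than intended. It is worth checking the result in the Solr analysis screen before relying on it.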

Edit

A third, and possibly better, option is to use a shingle filter factory and leave splitOnNumerics="1", so that all of ch, 2300, and ch2300 are indexed. Adding this line after the word delimiter filter factory should solve the problem:

<filter class="solr.ShingleFilterFactory" tokenSeparator=""/>
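For context, here is a sketch of how the index-time analyzer from the question might look with the shingle filter in place. It assumes the rest of the field type stays unchanged and has not been tested:

```xml
<!-- Sketch only: index analyzer from the question with the shingle filter
     appended after the word delimiter filter. splitOnNumerics is written out
     explicitly here, although "1" is already the default. -->
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1"
          generateNumberParts="1"
          catenateWords="1"
          catenateNumbers="1"
          catenateAll="1"
          splitOnCaseChange="0"
          splitOnNumerics="1"
          preserveOriginal="1" />
  <!-- An empty tokenSeparator joins adjacent tokens without a space,
       e.g. ch + 2300 becomes ch2300. -->
  <filter class="solr.ShingleFilterFactory" tokenSeparator="" />
</analyzer>
```

With the default shingle size of 2, other adjacent pairs are joined as well, so the index will likely contain more tokens than just the extra ch2300. After reindexing, the analysis screen should show whether the output is what you expect.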

