Solr - WordDelimiterFilterFactory not including all permutations
I have a Solr index that has to deal with part numbers, which WordDelimiterFilterFactory seems ideally suited for. An example part number is "ch2300-100". I'm expecting the following queries to match the field (and they do):
- ch
- ch2300-100
- ch2300100
But the following query doesn't match:
- ch2300
Looking at the debugging output, that combination of word parts isn't generated. I expected the catenateWords and/or catenateNumbers attribute to handle this case, but it doesn't seem to work. What am I missing in the configuration to allow all permutations of the tokenized fragments to be matched?
<schema version="1.5" name="test">
  <types>
    <fieldType name="text" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1"
                generateNumberParts="1"
                catenateWords="1"
                catenateNumbers="1"
                catenateAll="1"
                splitOnCaseChange="0"
                preserveOriginal="1" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field stored="true" name="id" type="text" />
    <field stored="true" indexed="true" name="catnum" type="text" />
  </fields>
  <uniqueKey>id</uniqueKey>
</schema>
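I suspect 'ch2300' is not indexed as a token because of splitOnNumerics="1", which is the default when the attribute is not set. At the split phase, the filter separates ch and 2300, then applies each of the generators to those parts individually (as well as emitting the catenated tokens). catenateWords only joins adjacent alphabetic parts and catenateNumbers only joins adjacent numeric parts, so neither of them produces the mixed token ch2300; catenateAll joins every part, which gives ch2300100.
One option is to add splitOnNumerics="0" to the filter factory. However, that may keep 'ch' from matching. Another option is to add a filter factory at query time that splits on numerics.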
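The two options above could look roughly like the following. This is only a sketch: the attribute values are assumptions chosen for illustration rather than a tested configuration, and how the split query parts are combined (as a phrase or as separate terms) depends on the query parser settings.

```xml
<!-- Option 1 (index analyzer): do not split at letter/digit boundaries.
     "ch2300-100" should then split only at the hyphen, so "ch2300" is
     indexed, but "ch" on its own is no longer generated. -->
<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="1" catenateNumbers="1" catenateAll="1"
        splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1" />

<!-- Option 2 (query analyzer): keep the index analyzer as it is and split
     the query text on numerics instead, so a query for "ch2300" is searched
     as the parts "ch" and "2300", both of which are already indexed. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="0" splitOnNumerics="1" preserveOriginal="0" />
</analyzer>
```

Only one of the two would normally be applied. The Analysis screen in the Solr admin UI shows the tokens each variant actually produces for "ch2300-100" and "ch2300".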
Edit
A third, and possibly better, option is to use a shingle filter factory and leave splitOnNumerics="1", so that all of ch, 2300, and ch2300 are indexed. Adding this line after the word delimiter filter factory should solve the problem:
<filter class="solr.ShingleFilterFactory" tokenSeparator=""/>
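For placement, the index analyzer from the question with the shingle filter added might look like the sketch below. It assumes the shingle filter's defaults (shingles of two tokens, with the original single tokens still emitted) are acceptable; the exact token stream depends on how the shingle filter treats the overlapping tokens the word delimiter filter emits, so it is worth checking in the Analysis screen.

```xml
<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory" />
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1" catenateAll="1"
          splitOnCaseChange="0" preserveOriginal="1" />
  <!-- Joins adjacent tokens with an empty separator,
       e.g. "ch" followed by "2300" yields the extra token "ch2300". -->
  <filter class="solr.ShingleFilterFactory" tokenSeparator="" />
</analyzer>
```

The documents need to be reindexed after the change, since the new tokens are only produced at index time.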