Query equality in Solr / Lucene


The problem:

I am trying to recognize when 2 different queries are the same.

For example:

field1:[1 TO 3] OR field1:5

is the same query as:

field1:5 OR field1:1 OR field1:3 OR field1:2

The idea:

Is there a way to normalize a query into some kind of canonical form so that, after being normalized, a simple string comparison does the trick?

For example, with the queries above, both would become:

field1:1 OR field1:2 OR field1:3 OR field1:5

and then I could compare them to determine whether they are equal.
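The canonical-form idea can be sketched for the narrow case in the example. This is a minimal Java sketch, not a Solr or Lucene facility: it assumes field1 only ever holds integers, that [a TO b] is an inclusive integer range, and that the query has already been split into its OR'd clauses. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper: canonicalizes an OR of integer terms and inclusive
// integer ranges on a single field. Assumes every value in the field is an integer.
final class CanonicalDisjunction {

    static String canonicalize(String field, List<String> clauses) {
        String prefix = field + ":";
        TreeSet<Integer> values = new TreeSet<>(); // sorted, duplicates dropped
        for (String clause : clauses) {
            if (!clause.startsWith(prefix)) {
                throw new IllegalArgumentException("Unexpected field in clause: " + clause);
            }
            String body = clause.substring(prefix.length()).trim();
            if (body.startsWith("[") && body.endsWith("]")) {
                // Inclusive range such as "[1 TO 3]": enumerate every integer in it.
                String[] bounds = body.substring(1, body.length() - 1).split(" TO ");
                int lo = Integer.parseInt(bounds[0].trim());
                int hi = Integer.parseInt(bounds[1].trim());
                for (int v = lo; v <= hi; v++) {
                    values.add(v);
                }
            } else {
                // Single term such as "5".
                values.add(Integer.parseInt(body));
            }
        }
        // Emit the values in ascending order so equal value sets give equal strings.
        List<String> terms = new ArrayList<>();
        for (int v : values) {
            terms.add(prefix + v);
        }
        return String.join(" OR ", terms);
    }
}
```

Under those assumptions both example queries canonicalize to field1:1 OR field1:2 OR field1:3 OR field1:5, so String.equals is enough. The integer-only assumption is what a real index does not guarantee.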

Or maybe there exists some kind of service able to determine if 2 queries are equal. I could not find any.

Thanks for helping.

The main problem is that they aren't identical.

field1:[1 TO 3] is a range query, and as such it may represent a lexicographic range on the field, in which case it would match field1:2abcde, or it may represent a numeric range on a floating point field, in which case it would match field1:1.234. The other query, field1:1 field1:2 field1:3, can only match the 3 specified values, so neither of those 2 examples would be matched.
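The difference between the two readings can be illustrated with a small Java model. The helper names are hypothetical; the lexicographic check uses String.compareTo, which agrees with Lucene's byte-order term comparison only for plain ASCII terms like these, and the numeric check stands in for a floating point field.

```java
// Hypothetical model of the two range interpretations versus an explicit term list.
// Not Lucene's implementation; it only illustrates which values each form matches.
final class RangeSemantics {

    // Inclusive lexicographic range check on string terms.
    static boolean inLexRange(String term, String lower, String upper) {
        return term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0;
    }

    // Inclusive numeric range check on floating point values.
    static boolean inNumericRange(double value, double lower, double upper) {
        return value >= lower && value <= upper;
    }

    // The explicit term list field1:1 field1:2 field1:3 matches only these exact terms.
    static boolean inTermList(String term) {
        return term.equals("1") || term.equals("2") || term.equals("3");
    }
}
```

Here "2abcde" falls inside the lexicographic range ["1", "3"] and 1.234 inside the numeric range [1, 3], while inTermList rejects both.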

Also, since fields may be multi-valued, more than 1 of field1:1, field1:2, field1:3 may have a match in the same document, which would make the scoring of each query different.

To consider a simpler case, though, of how 2 queries could reasonably be identical, something like:

  • field2:this field1:that
  • field1:that field2:this

Those are identical, at least with the StandardQueryParser!

Once you have run the queries through the query parser, you'll have a Query. Transforming the final Query back into a string doesn't tend to work well, since the query parser syntax isn't capable of expressing every type of Query object (Query.toString() is best used for debugging, really).

So you'll need to compare the Query objects.

The output of Query.rewrite() would be the most readily comparable, I believe. It would provide a set of primitive queries to dig into. It would also provide the needed TermQuery objects for the range query, which gets past the issues related to the initial query not knowing the field contents.
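A rough model of why rewriting helps: once a range is expanded against the terms that actually exist in the field, the result is a concrete set of terms that can be compared directly. Here the "index" is just a sorted set of term strings, a hypothetical stand-in rather than Lucene's Query.rewrite(), which works against the real index and whose output depends on the rewrite method in use.

```java
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model: the terms indexed for field1 are kept in a sorted set,
// and "rewriting" a range means picking the indexed terms that fall inside it.
final class RangeRewriteModel {

    // Returns the indexed terms inside the inclusive range [lower, upper],
    // using lexicographic (String) order as the term order.
    static NavigableSet<String> expand(TreeSet<String> indexedTerms, String lower, String upper) {
        return indexedTerms.subSet(lower, true, upper, true);
    }
}
```

Against an index containing 1, 2, 2abcde, 3 and 5, the range [1 TO 3] expands to 1, 2, 2abcde and 3, so it differs from the explicit term list; against an index holding only 1, 2, 3 and 5 it expands to exactly 1, 2 and 3.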

Neither Query nor IndexReader implements any form of direct comparison between queries, as far as I know, so you would need to provide your own comparator. That would involve comparing an arbitrarily complex, nested set of primitive queries (primitive queries include: BooleanQuery, ConstantScoreQuery, CustomScoreQuery, DisjunctionMaxQuery, FilteredQuery, MatchAllDocsQuery, MultiPhraseQuery, MultiTermQuery, PhraseQuery, SpanQuery, TermQuery, ValueSourceQuery).
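A custom comparator of that kind might look like the following sketch. It uses a hypothetical miniature query model (records named TermQ, BoolQ and Clause, plus an Occur enum that only mirrors the names of Lucene's BooleanClause.Occur) instead of real Lucene classes, and it covers only term and boolean queries. Boolean clauses are compared as a multiset, so clause order is ignored while the occur flag and duplicate clauses still count. It needs Java 16 or later for records and pattern matching.

```java
import java.util.List;

// Hypothetical miniature query model, not Lucene's classes.
enum Occur { MUST, SHOULD, MUST_NOT }

interface MiniQuery {}

record TermQ(String field, String text) implements MiniQuery {}

record Clause(Occur occur, MiniQuery query) {}

record BoolQ(List<Clause> clauses) implements MiniQuery {}

final class QueryComparator {

    // Structural equivalence: terms compare by field and text; boolean
    // queries compare their clauses regardless of order.
    static boolean equivalent(MiniQuery a, MiniQuery b) {
        if (a instanceof TermQ ta && b instanceof TermQ tb) {
            return ta.equals(tb);
        }
        if (a instanceof BoolQ ba && b instanceof BoolQ bb) {
            return sameClauses(ba.clauses(), bb.clauses());
        }
        return false; // different query types are never equivalent here
    }

    // Pairs each clause of 'a' with a distinct, unused clause of 'b' that has the
    // same occur flag and an equivalent sub-query. Greedy pairing is enough
    // because equivalent() is an equivalence relation.
    private static boolean sameClauses(List<Clause> a, List<Clause> b) {
        if (a.size() != b.size()) {
            return false;
        }
        boolean[] used = new boolean[b.size()];
        for (Clause ca : a) {
            boolean matched = false;
            for (int i = 0; i < b.size() && !matched; i++) {
                Clause cb = b.get(i);
                if (!used[i] && ca.occur() == cb.occur() && equivalent(ca.query(), cb.query())) {
                    used[i] = true;
                    matched = true;
                }
            }
            if (!matched) {
                return false;
            }
        }
        return true;
    }
}
```

With it, field2:this field1:that and field1:that field2:this (both as SHOULD clauses) compare as equivalent. Each further primitive query type would need its own branch in equivalent().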


Really, the question is not whether the queries are inherently identical; we've established that they aren't. The more meaningful question, I think, is whether they are identical with regard to the data in the index. With that in mind, a simpler implementation would be to search with each query and compare the doc numbers (and possibly the scores?) in each result set (TopDocs).
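The result-set comparison can be sketched as below. The int arrays are hypothetical stand-ins for the doc numbers of each search; in Lucene they would come from the doc field of each ScoreDoc in TopDocs.scoreDocs, and each search would need to request enough hits to cover every match, otherwise only the top n are compared. Scores are ignored, and the answer only holds for the index as it currently stands.

```java
import java.util.Arrays;

// Hypothetical helper: two queries are treated as equivalent for the current
// index when they return the same set of doc numbers, in any order.
final class ResultSetComparator {

    static boolean sameDocs(int[] docsA, int[] docsB) {
        // Normalize each result list: drop duplicates and sort ascending.
        int[] a = Arrays.stream(docsA).distinct().sorted().toArray();
        int[] b = Arrays.stream(docsB).distinct().sorted().toArray();
        return Arrays.equals(a, b);
    }
}
```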

