lucene - Languages in Apache Solr


I am looking for a solution for expanding our current Apache Solr (4.x) setup so that it can be used to support a large number of languages. I would like to take a multicore approach, and I have set up Solr so that it has an English core and a Japanese core (for starters). To make things more challenging, I am given n .xml files that contain the data Solr will use to index. To be clear:

I have n languages, and I have n .xml files (one .xml per language). Each .xml file is identical in terms of markup, but the raw text is different.

My issue is that I can't seem to figure out how to post the english.xml file strictly to the English core and the japanese.xml file strictly to the Japanese core, so that when I visit the page at:

www.example.com/us/index.html, I am looking at the english.xml indexed results, and

www.example.com/jp/index.html gives me the japanese.xml indexed results.

There only needs to be one schema, because the different language .xml files are structured identically tag-wise, but it is duplicated for all of them because each schema file is optimized for its respective language.

if (tldr) {

How do I independently post english.xml -> core-english and japanese.xml -> core-japanese? Or is there a better approach that gives me faceting and search in independent groups so that I can localize the pages?

}

Obviously, I don't want to have n different instances of Solr running.

Benjamin, your approach is perfect. Multicore is a great way to do it.

Suppose your server is at IP 10.10.10.10 and Solr is running on port 8983; the multicore setup should look like:

10.10.10.10:8983/solr/us
10.10.10.10:8983/solr/jp
10.10.10.10:8983/solr/fr

...and so on.

A couple of things to keep in mind:

  • Each core will have its own conf folder in it
  • Inside each conf folder, you have solrconfig.xml, schema.xml, synonyms.txt, and other config files specific to that country
  • The field definitions differ for every country, as specified in each core's schema.xml
  • E.g. the title field is of fieldType text_general for the US, while it is text_fr for France
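The per-core layout described above can be sketched as a small shell helper. This is only an illustration: it assumes Solr 4.4 or later with core discovery (where a core.properties file marks a directory as a core), and the function name create_core_dirs and the Solr home path are made up for the example. On older 4.x setups that use the legacy solr.xml format, the cores are listed in solr.xml instead.

```shell
#!/bin/sh
# Sketch: create one directory per language core under a Solr home.
# create_core_dirs is a hypothetical helper, not part of Solr.
create_core_dirs() {
    solr_home="$1"
    shift
    for core in "$@"; do
        # Each core gets its own conf folder for schema.xml, solrconfig.xml, etc.
        mkdir -p "$solr_home/$core/conf"
        # Assumption: Solr 4.4+ core discovery, which looks for core.properties.
        printf 'name=%s\n' "$core" > "$solr_home/$core/core.properties"
    done
}
```

You would call it as create_core_dirs /path/to/solr/home us jp fr and then copy a language-specific schema.xml and solrconfig.xml into each conf folder before starting Solr.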

Posting XML

This is how you post the content of the various XML files for the different countries:

us:

curl 'http://10.10.10.10:8983/solr/us/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">first item</field></doc><doc><field name="id">2</field><field name="title">second item</field></doc></add>'

fr:

curl 'http://10.10.10.10:8983/solr/fr/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">premier article</field></doc><doc><field name="id">2</field><field name="title">deuxième article</field></doc></add>'

jp:

curl 'http://10.10.10.10:8983/solr/jp/update?commit=true' -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">最初の項目</field></doc><doc><field name="id">2</field><field name="title">2番目の項目</field></doc></add>'
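Since the question is about posting whole files rather than inline documents, the same request can be sketched with curl reading the body from a file. This assumes each file is UTF-8 and already wrapped in Solr's <add><doc>...</doc></add> update format. The helper names solr_update_url and post_xml_file, the SOLR_BASE variable, and the file names are illustrative only.

```shell
#!/bin/sh
# Assumed base URL, taken from the example server above; adjust as needed.
SOLR_BASE="${SOLR_BASE:-http://10.10.10.10:8983/solr}"

# Build the update URL for one core (us, jp, fr, ...).
solr_update_url() {
    printf '%s/%s/update?commit=true\n' "$SOLR_BASE" "$1"
}

# Post one XML file to exactly one core.
# "@file" tells curl to read the request body from that file.
post_xml_file() {
    core="$1"
    file="$2"
    curl "$(solr_update_url "$core")" \
        -H "Content-Type: text/xml; charset=utf-8" \
        --data-binary "@$file"
}
```

With that in place, post_xml_file us english.xml and post_xml_file jp japanese.xml would each send one file to one core, so the documents never mix.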

Searching

You can search each country independently by querying its core:

Search query for US:

http://10.10.10.10:8983/solr/us/select?q=john

Search query for JP:

http://10.10.10.10:8983/solr/jp/select?q=ジョン
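To tie this back to the localized pages in the question, here is a rough sketch of picking the core from the first path segment of the page URL and querying only that core. The helper names core_for_path and search_core, the list of known cores, and the fallback to us are all assumptions for illustration; the -G and --data-urlencode options make curl send the query as a URL-encoded GET, which matters for non-ASCII terms.

```shell
#!/bin/sh
# Assumed base URL, taken from the example server above; adjust as needed.
SOLR_BASE="${SOLR_BASE:-http://10.10.10.10:8983/solr}"

# Map a site path such as /jp/index.html to a core name.
core_for_path() {
    # The first path segment is field 2 when splitting on "/".
    segment=$(printf '%s\n' "$1" | cut -d/ -f2)
    case "$segment" in
        us|jp|fr) printf '%s\n' "$segment" ;;
        *)        printf 'us\n' ;;  # assumed fallback for unknown locales
    esac
}

# Query a single core; results come only from that core's index.
search_core() {
    curl -G "$SOLR_BASE/$1/select" \
        --data-urlencode "q=$2" \
        --data-urlencode "wt=json"
}
```

For example, search_core "$(core_for_path /jp/index.html)" "ジョン" would query only the jp core. Facet parameters can be added to the same request in the same way, and they are likewise scoped to that core.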
