lucene - Languages in Apache Solr
I am looking for a solution for expanding our current Apache Solr (4.x) setup so that it can be used to support a large number of languages. I want to take a multicore approach, and I have set Solr up so that it has an English core and a Japanese core (for starters). To make things a challenge, I am given n .xml files that contain the data Solr will use to index. To be clear:
I have n languages and I have n .xml files (one .xml per language). Each .xml file is identical in terms of markup, but the raw text is different.
My issue is that I can't seem to figure out how to post the english.xml file strictly to the English core and the japanese.xml file strictly to the Japanese core, so that when I visit the page at:
www.example.com/us/index.html, I am looking at the english.xml indexed results, and
www.example.com/jp/index.html gives me the japanese.xml indexed results.
In principle there only needs to be one schema, because the different language .xml files are structured identically tagwise, but it is duplicated for all of them because each schema file is optimized for its respective language.
if (tldr) {
How do I independently post: english.xml -> core-english and japanese.xml -> core-japanese, or is there a better approach that gives me faceting and search over independent groups so that I can localize my pages?
}
Obviously I don't want to have n different instances of Solr running.
Benjamin, your approach is perfect. Multicore is a great way to do it.
Suppose your server is at IP 10.10.10.10 and Solr is running on port 8983. Your multicore URLs should then look like:
10.10.10.10:8983/solr/us
10.10.10.10:8983/solr/jp
10.10.10.10:8983/solr/fr
...and so on.
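To tie the localized pages from the question to these cores, something like the following Python sketch might work. It assumes that the first path segment of the page URL (us, jp, fr) doubles as the core name. The names `SOLR_BASE`, `KNOWN_CORES`, and `core_url_for_page` are made up for illustration and are not part of Solr.

```python
from urllib.parse import urlsplit

# Assumed Solr base URL, taken from the example address in this answer.
SOLR_BASE = "http://10.10.10.10:8983/solr"

# Assumption: the first path segment of a page URL is also the core name.
KNOWN_CORES = {"us", "jp", "fr"}


def core_url_for_page(page_url):
    """Return the Solr core URL for a localized page.

    page_url must include a scheme (http:// or https://), otherwise
    urlsplit treats the host name as part of the path.
    """
    path = urlsplit(page_url).path
    segments = [s for s in path.split("/") if s]
    if not segments or segments[0] not in KNOWN_CORES:
        raise ValueError(f"no Solr core configured for {page_url!r}")
    return f"{SOLR_BASE}/{segments[0]}"
```

With this, `core_url_for_page("http://www.example.com/jp/index.html")` would point at the jp core, and an unknown prefix raises an error rather than silently falling back to another language.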
A couple of things to keep in mind:
- Each core will have its own conf folder in it.
- Inside each conf folder, you will have solrconfig.xml, schema.xml, synonyms.txt, and the other config files specific to that country.
- The field definitions can be different for every country, as specified in that core's schema.xml.
- E.g. the title field could be of fieldType text_general for English, while it would be text_fr for France.
Posting XML
This is how you would post the content of the various XML files for the different countries:
US:
curl "http://10.10.10.10:8983/solr/us/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">first item</field></doc><doc><field name="id">2</field><field name="title">second item</field></doc></add>'
FR:
curl "http://10.10.10.10:8983/solr/fr/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">premier article</field></doc><doc><field name="id">2</field><field name="title">deuxième article</field></doc></add>'
JP:
curl "http://10.10.10.10:8983/solr/jp/update?commit=true" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">1</field><field name="title">最初の項目</field></doc><doc><field name="id">2</field><field name="title">2番目の項目</field></doc></add>'
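Since the question is about posting whole files rather than inline documents, here is a minimal Python sketch of the same idea using only the standard library. It assumes english.xml and japanese.xml are already in Solr's `<add><doc>...</doc></add>` update format. The file-to-core mapping and the function names are hypothetical.

```python
import urllib.request

# Assumed Solr base URL, taken from the example address in this answer.
SOLR_BASE = "http://10.10.10.10:8983/solr"

# Hypothetical mapping of per-language files to core names.
FILE_TO_CORE = {
    "english.xml": "us",
    "japanese.xml": "jp",
}


def build_update_request(core, xml_bytes):
    """Build (but do not send) a POST to one core's /update handler."""
    url = f"{SOLR_BASE}/{core}/update?commit=true"
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_file(filename):
    """Read one language file and send it only to its own core."""
    core = FILE_TO_CORE[filename]
    with open(filename, "rb") as fh:
        request = build_update_request(core, fh.read())
    with urllib.request.urlopen(request) as response:
        return response.status
```

Calling `post_file("japanese.xml")` would then touch only the jp core, because the core name is part of the URL. The us index is never involved.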
Searching
You can search each country independently by querying its core:
Search query for US:
http://10.10.10.10:8983/solr/us/select?q=john
Search query for JP:
http://10.10.10.10:8983/solr/jp/select?q=ジョン
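If the search URLs are built in code, the query text should be percent-encoded, which matters for the Japanese example in particular. The Python sketch below uses Solr's standard `q` parameter and, optionally, `facet=true` with `facet.field`, since the question also asks about faceting per language. The helper name `build_select_url` and the `category` facet field in the usage note are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed Solr base URL, taken from the example address in this answer.
SOLR_BASE = "http://10.10.10.10:8983/solr"


def build_select_url(core, text, facet_field=None):
    """Build a /select URL for one core, optionally faceting on a field."""
    params = [("q", text), ("wt", "json")]
    if facet_field is not None:
        params += [("facet", "true"), ("facet.field", facet_field)]
    # urlencode percent-encodes non-ASCII text as UTF-8.
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"
```

For example, `build_select_url("jp", "ジョン", facet_field="category")` would query and facet only within the jp core, provided the jp schema has a `category` field.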