Mahout Recommendation performance issues


I have been working with Mahout to create a recommendation engine based on the following data:

  • 100k users
  • 10k items
  • 4m ratings

I'm running on Tomcat with the following JVM arguments:

-Xms1024m -Xmx1024m -da -dsa -XX:NewRatio=9 -server

Recommendations take 6 s, which seems slow! How can I improve Mahout's performance?

I'm using the following code.

This part runs once at startup:

JDBCDataModel jdbcDataModel = new MySQLJDBCDataModel(dataSource);
model = new ReloadFromJDBCDataModel(jdbcDataModel);
ItemSimilarity similarity = new CachingItemSimilarity(new EuclideanDistanceSimilarity(model), model);
SamplingCandidateItemsStrategy strategy = new SamplingCandidateItemsStrategy(10, 5);
recommender = new CachingRecommender(new GenericItemBasedRecommender(model, similarity, strategy, strategy));

And this runs on every user request:

recommender.recommend(userId, howMany);

I suggest a different approach. Use a nightly job to pre-calculate recommendations for all users, and load the results into a MySQL table each night. That makes showing recommendations nothing more than a simple DB call.

Since you have 10k items, to calculate recommendations for a single user Mahout internally has to multiply a (10k x 10k) matrix by a (10k x 1) matrix. So 6 seconds seems quite fast considering the size. (reference)
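To make the multiplication above concrete, here is a minimal sketch of scoring items as a dense similarity matrix times one user's rating vector. It is an illustration only, not Mahout's actual implementation; the class name `ItemScoreSketch`, the method `score`, and the 3-item data are all made up for the example.

```java
import java.util.Arrays;

// Illustrative only: a dense item-item similarity matrix multiplied by
// one user's rating vector. Names and data are hypothetical.
final class ItemScoreSketch {

    // Returns scores[i] = sum over j of similarity[i][j] * ratings[j].
    static double[] score(double[][] similarity, double[] ratings) {
        int n = ratings.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += similarity[i][j] * ratings[j];
            }
            scores[i] = sum;
        }
        return scores;
    }

    public static void main(String[] args) {
        // 3 items instead of 10k; a rating of 0.0 means "not rated".
        double[][] similarity = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.25},
            {0.0, 0.25, 1.0}
        };
        double[] ratings = {4.0, 0.0, 2.0};
        System.out.println(Arrays.toString(score(similarity, ratings)));
        // prints [4.0, 2.5, 2.0]
    }
}
```

With 10k items the two nested loops perform 10,000 x 10,000 = 100 million multiply-adds per user in this dense form, before any similarities have even been computed or fetched.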

Now, if you use RecommenderJob on Hadoop with AWS EMR, it should take under about 10 minutes to process data at this scale. Or you can do the same job in a non-distributed way, using a loop and pre-calculating for all users sequentially. The downside is that recommendations will be behind by 1 day, or 6 hours, or whatever frequency you choose for the job.
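The non-distributed variant could look roughly like the sketch below. To keep it self-contained it uses a hypothetical `TopN` interface in place of Mahout's `Recommender` and collects results in a map; `PrecomputeSketch`, `TopN`, and `precomputeAll` are names invented for the example. In a real job you would presumably iterate the user IDs from the data model, call `recommender.recommend(userId, howMany)`, and write each result row to the MySQL table instead.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of a sequential batch job: compute recommendations for every
// user once, so that serving becomes a simple lookup.
final class PrecomputeSketch {

    // Hypothetical stand-in for a recommender: user ID -> ordered item IDs.
    interface TopN {
        List<Long> recommend(long userId, int howMany);
    }

    static Map<Long, List<Long>> precomputeAll(Iterable<Long> userIds,
                                               TopN recommender,
                                               int howMany) {
        Map<Long, List<Long>> results = new LinkedHashMap<>();
        for (long userId : userIds) {
            // In a real job this row would be inserted into the database.
            results.put(userId, recommender.recommend(userId, howMany));
        }
        return results;
    }

    public static void main(String[] args) {
        // Fake recommender, for demonstration only.
        TopN fake = (userId, howMany) -> Arrays.asList(userId * 10, userId * 10 + 1);
        System.out.println(precomputeAll(Arrays.asList(1L, 2L), fake, 2));
        // prints {1=[10, 11], 2=[20, 21]}
    }
}
```

At request time the web application then only reads the stored list for the user, which is the "simple DB call" mentioned earlier.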

