Mahout recommendation performance issues
I have been working with Mahout to create a recommendation engine based on the following data:
- 100k users
- 10k items
- 4M ratings
I'm running on Tomcat with the following JVM arguments:
-Xms1024m -Xmx1024m -da -dsa -XX:NewRatio=9 -server
Recommendations take about 6 s each, which seems slow. How can I improve Mahout's performance?
I'm using the following code.
This part runs once at startup:
    // Load the ratings from MySQL and keep an in-memory copy that can be reloaded.
    JDBCDataModel jdbcDataModel = new MySQLJDBCDataModel(dataSource);
    model = new ReloadFromJDBCDataModel(jdbcDataModel);
    // Cache item-item similarities so each pair is computed only once.
    ItemSimilarity similarity = new CachingItemSimilarity(new EuclideanDistanceSimilarity(model), model);
    SamplingCandidateItemsStrategy strategy = new SamplingCandidateItemsStrategy(10, 5);
    recommender = new CachingRecommender(new GenericItemBasedRecommender(model, similarity, strategy, strategy));
And this runs on every user request:
    recommender.recommend(userId, howMany);
I suggest a different approach. Use a nightly job to pre-calculate recommendations for all users, and load the results into a MySQL table every night. That makes showing recommendations nothing more than a simple DB call.
Since you have 10k items, to calculate recommendations for a single user Mahout internally has to multiply a (10k x 10k) matrix by a (10k x 1) matrix. So 6 seconds seems quite fast considering the size.
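To make the size argument concrete, here is a minimal sketch of the dense product described above, on a 3-item toy example. The class and method names are made up for illustration, and this is not how Mahout is implemented internally; it only shows why scoring one user against n items costs on the order of n x n multiply-adds, which for 10k items is 100 million.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not Mahout code.
final class ScoringCostSketch {

    /** Scores every item for one user: scores = similarity (n x n) * ratings (n x 1). */
    static double[] scoreAllItems(double[][] similarity, double[] userRatings) {
        int n = userRatings.length;
        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            double sum = 0.0;
            for (int j = 0; j < n; j++) {
                sum += similarity[i][j] * userRatings[j];
            }
            scores[i] = sum;
        }
        return scores;
    }

    /** Number of multiply-add steps in the dense product above. */
    static long multiplyAdds(long itemCount) {
        return itemCount * itemCount;
    }

    public static void main(String[] args) {
        // Toy item-item similarity matrix and one user's ratings (0.0 = not rated).
        double[][] similarity = {
            {1.0, 0.5, 0.0},
            {0.5, 1.0, 0.2},
            {0.0, 0.2, 1.0}
        };
        double[] ratings = {4.0, 0.0, 2.0};

        System.out.println(Arrays.toString(scoreAllItems(similarity, ratings)));
        System.out.println("multiply-adds for 10k items: " + multiplyAdds(10_000));
    }
}
```

In practice a candidate-sampling strategy such as the one in the question may evaluate far fewer item pairs than the full product, so treat the figure as an upper bound on the per-user work rather than a measurement.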
Now, if you use RecommenderJob on Hadoop and AWS EMR, it should take roughly under 10 minutes to process data at this scale. Or you can do the same job in a non-distributed way, using a loop and pre-calculating for all users sequentially. The downside is that the recommendations will lag behind by 1 day, or 6 hours, or whatever frequency you choose for the job.
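The non-distributed variant could look roughly like the sketch below. It is only an outline under assumptions: `TopItems` is a hypothetical stand-in for the real recommender (for example a thin wrapper around `recommender.recommend(userId, howMany)`), and the in-memory `Map` stands in for the MySQL table that the batch job would actually write to.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: hypothetical names, with a Map standing in for the MySQL table.
final class PrecomputeSketch {

    /** Hypothetical stand-in for the slow, live recommender. */
    interface TopItems {
        List<Long> recommend(long userId, int howMany);
    }

    /** Batch job: runs the recommender once per user and stores the results. */
    static Map<Long, List<Long>> precompute(List<Long> userIds, TopItems recommender, int howMany) {
        Map<Long, List<Long>> table = new HashMap<>();
        for (long userId : userIds) {
            table.put(userId, recommender.recommend(userId, howMany));
        }
        return table;
    }

    /** Request-time path: a plain lookup, with an empty list for unknown users. */
    static List<Long> lookup(Map<Long, List<Long>> table, long userId) {
        return table.getOrDefault(userId, List.of());
    }

    public static void main(String[] args) {
        // Fake recommender that returns item IDs derived from the user ID.
        TopItems fake = (userId, howMany) -> {
            List<Long> items = new ArrayList<>();
            for (int k = 1; k <= howMany; k++) {
                items.add(userId * 10 + k);
            }
            return items;
        };

        Map<Long, List<Long>> table = precompute(List.of(1L, 2L, 3L), fake, 2);
        System.out.println(lookup(table, 2L));   // precomputed user
        System.out.println(lookup(table, 99L));  // user the batch job has not seen
    }
}
```

The request path never touches the recommender, so its latency no longer depends on the number of items. The trade-off is the one noted above: users who rate something after the last run, or who signed up since, see stale or empty results until the next run.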