Recommendation model ensemble
A PySpark implementation of recommender ensembles (currently only Bagging).
More details on best practices for building recommendation systems can be found in the Recommenders GitHub repository.
Bagging (bootstrap aggregating) is a machine learning (ML) ensemble method designed to improve the stability and accuracy of ML algorithms used in statistical classification and regression. One of the most successful applications of Bagging is Random Forest.
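The "bootstrap" in Bagging means that each base model is trained on a sample of the training set that has the same size as the original and is drawn with replacement. The following minimal sketch is plain Python, not this module's PySpark code. The function name `bootstrap_samples` and the `(user, item, rating)` row layout are assumptions made for illustration. In PySpark, a comparable sample can be drawn with `DataFrame.sample(withReplacement=True, fraction=1.0, seed=...)`, although the resulting size only approximately matches the original.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_samples(data: Sequence[T], m: int, seed: int = 42) -> List[List[T]]:
    """Return m bootstrap samples of `data` (hypothetical helper).

    Each sample has len(data) rows drawn uniformly with replacement,
    so some rows repeat and others are left out.
    """
    rng = random.Random(seed)  # fixed seed keeps the M training sets reproducible
    n = len(data)
    return [[data[rng.randrange(n)] for _ in range(n)] for _ in range(m)]


# Example: (user, item, rating) rows; each sample would train one base model.
ratings = [("u1", "i1", 4.0), ("u1", "i2", 3.0), ("u2", "i1", 5.0),
           ("u2", "i3", 2.0), ("u3", "i2", 1.0)]
training_sets = bootstrap_samples(ratings, m=3)
```

On average, a bootstrap sample contains about 63.2% (1 - 1/e) of the distinct original rows. That difference in training data is what makes the base models disagree, and averaging over the disagreement reduces variance.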
The method implemented here follows the same approach as conventional Bagging:
1. Train M recommender models (base models), each on a bootstrap sample of the training set.
2. To predict item ratings, generate M predictions with the base models and average the predicted ratings for each item.
3. To recommend the top k items, generate M recommendation lists of k items with the base models and combine the lists.
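The rating-prediction step (averaging M predictions per item) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the module's implementation. The name `average_ratings` and the dictionary input format are hypothetical. The sketch also assumes that a pair missing from some base models is averaged over only the models that scored it; the module may handle missing predictions differently. In PySpark, the same aggregation would typically be a `union` of the per-model prediction DataFrames followed by `groupBy(user, item).agg(pyspark.sql.functions.avg("prediction"))`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (user_id, item_id)


def average_ratings(predictions: List[Dict[Pair, float]]) -> Dict[Pair, float]:
    """Average the ratings predicted by M base models (hypothetical helper).

    `predictions` holds one dict per base model, mapping (user, item) to rating.
    """
    totals: Dict[Pair, float] = defaultdict(float)
    counts: Dict[Pair, int] = defaultdict(int)
    for model_preds in predictions:  # one pass per base model
        for pair, rating in model_preds.items():
            totals[pair] += rating
            counts[pair] += 1
    # Assumed: each pair is averaged over the models that actually predicted it.
    return {pair: totals[pair] / counts[pair] for pair in totals}


preds = [
    {("u1", "i1"): 4.0, ("u1", "i2"): 3.0},
    {("u1", "i1"): 5.0, ("u1", "i2"): 2.0},
    {("u1", "i1"): 3.0},  # this base model has no prediction for ("u1", "i2")
]
print(average_ratings(preds))  # {('u1', 'i1'): 4.0, ('u1', 'i2'): 2.5}
```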
Currently, the repo implements three methods for combining the lists: average, sum, and count.
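The text above names the three combining methods but does not define them. The sketch below shows one plausible reading, and its semantics are assumptions rather than the module's documented behavior:

* "sum" ranks items by their scores summed across the M lists.
* "average" ranks items by their mean score over the lists they appear in.
* "count" ranks items by how many lists contain them.

The function name `combine_top_k` and the `(item, score)` list format are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Scored = Tuple[str, float]  # (item_id, score) from one base model's top-k list


def combine_top_k(lists: List[List[Scored]], k: int, method: str = "average") -> List[str]:
    """Merge M top-k lists into one top-k list (hypothetical helper, assumed semantics)."""
    total: Dict[str, float] = defaultdict(float)
    hits: Dict[str, int] = defaultdict(int)
    for top_k in lists:
        for item, score in top_k:
            total[item] += score
            hits[item] += 1
    if method == "sum":
        # Summed score across all M lists.
        combined = {item: total[item] for item in total}
    elif method == "average":
        # Mean score over only the lists that contain the item.
        combined = {item: total[item] / hits[item] for item in total}
    elif method == "count":
        # Number of base-model lists that contain the item.
        combined = {item: float(hits[item]) for item in total}
    else:
        raise ValueError(f"unknown method: {method!r}")
    # Highest combined value first; ties broken by item id for determinism.
    ranked = sorted(combined, key=lambda item: (-combined[item], item))
    return ranked[:k]


lists = [
    [("a", 0.9), ("b", 0.8), ("c", 0.7)],
    [("b", 0.95), ("d", 0.6), ("a", 0.5)],
    [("e", 0.99), ("b", 0.7), ("c", 0.4)],
]
print(combine_top_k(lists, k=3, method="sum"))      # ['b', 'a', 'c']
print(combine_top_k(lists, k=3, method="average"))  # ['e', 'b', 'a']
print(combine_top_k(lists, k=3, method="count"))    # ['b', 'a', 'c']
```

Under these assumptions, "average" can promote an item that a single base model scores very highly (here `e`), while "sum" and "count" favor items on which the base models agree.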
For details on how to use the module, see the example notebook, which uses multiple ALS models for movie recommendation.
Top-k (k = 10) recommendation performance metrics on the MovieLens 100k dataset