Project author: Gaurav-Pande

Project description:
Auto encoders based recommendation system
Language: Python
Repository: git://github.com/Gaurav-Pande/Recommendation_systems.git
Created: 2020-08-08T16:34:40Z
Project community: https://github.com/Gaurav-Pande/Recommendation_systems

License: MIT



Recommendation Systems

AutoEncoders Based Recommendation System
This project builds a recommendation system using deep-learning methods. The repo contains an autoencoder-based learning framework that recommends the top 10 items for a given user.

Layers Underneath:

image

In the figure above, the user only sees the relevant content, but several hidden layers work behind the scenes. We can visualize these layers as the following steps:

  • The first step is to solve the cold-start problem: when a user logs in for the first time, what recommendations should be shown? In that case it makes sense to recommend content based on the user's profile, so we maintain a profile for each user and, based on that profile, look up similar labs from the database.
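
The profile-based lookup described above can be sketched as a similarity search. This is only an illustration: the profile vectors, the feature encoding, and the `cosine_sim` helper are all hypothetical, not taken from the repo.

```python
import numpy as np

# Hypothetical user profile and item feature vectors -- in the repo these
# would come from a database lookup; here they are toy 4-dimensional vectors.
user_profile = np.array([1.0, 0.0, 1.0, 0.5])
item_features = np.array([
    [1.0, 0.0, 0.9, 0.4],   # item 0: close to the profile
    [0.0, 1.0, 0.1, 0.0],   # item 1: unrelated
    [0.8, 0.1, 1.0, 0.6],   # item 2: also close
])

def cosine_sim(a, b):
    # Cosine similarity between two feature vectors.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Rank items by similarity to the profile; the highest-scoring items
# are what a cold-start user would be shown.
scores = np.array([cosine_sim(user_profile, f) for f in item_features])
top = np.argsort(scores)[::-1]
print(top[:2])  # the two most profile-similar items
```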

  • The second step is collaborative filtering, which finds correlations between different users. To understand collaborative filtering better, consider the following figure:

image

The main problem in collaborative filtering is that we have a sparse matrix of size m × n (users_len × items_len), because any user usually interacts with only a few items in the catalog. The goal of collaborative filtering is to fill this sparse matrix with ratings: for each user, we want to predict the rating that user would give to every item in the catalog, and this is where the model needs to learn embeddings for users and items. Using these embeddings/weights, one can easily predict a user's ratings for items, as shown in the figure above. You can think of these embeddings as learned weights for hidden features, such as the genres of a movie.

  • The third step is the implementation of these algorithms. For cold start it can be a simple database lookup; for collaborative filtering, autoencoders can be used to learn the latent features, and the learned weights then let us predict the ratings and complete the sparse matrix.

  • The final layer in figure 1 is the data layer, which is the most crucial. The data can be categorized into two parts: implicit feedback, which can come in the form of clickstream data, and explicit ratings, which a user gives once he or she has finished evaluating the item/product/movie.

Data:

The data used for this project is generated synthetically: all the user IDs, ratings, and item IDs are synthetic. To get a good mixture of interactions, the feedback is sampled from a normal distribution, and the typical graph for it looks like this:

image

The reason for doing this is to get a good mixture of all the ratings. These ratings can be interpreted either as implicit feedback (like clickstream data) or as explicit ratings provided directly by users.

There are 5000+ users and 252 product IDs in total. Since the dataset is small, classic ML algorithms such as SVD, k-NN, etc. would also work well.
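A hedged sketch of how such normally distributed synthetic feedback could be generated — the repo's preprocess_data.py holds the actual logic, and the 1-5 scale, mean, and spread below are assumptions made for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, n_items = 5000, 252   # sizes stated in the text above

user_ids = np.arange(n_users)
item_ids = np.arange(n_items)

# Sample ratings from a normal distribution centred on the middle of an
# assumed 1-5 scale, then round and clip, so middling ratings dominate
# (giving the bell-shaped curve shown in the figure above).
raw = rng.normal(loc=3.0, scale=1.0, size=n_users)
ratings = np.clip(np.round(raw), 1, 5).astype(int)

print(ratings.min(), ratings.max())  # bounds after clipping
```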

Dependencies:

To install all the project dependencies, please run:

  pip install -r requirements.txt

Directory structure:

This repo contains the following directory structure:
deep-learning/data ==> all csv files should be here
deep-learning/model ==> model definition is written here
model/autoencoders.py ==> model definition
model/train_model.py ==> evaluation metric definitions
deep-learning/preprocess_data.py ==> contains the data preprocessing: synthetically generating data points, normalizing the data, and splitting it into train, validation, and test sets
deep-learning/train.py ==> contains the main code to run model training

The Model:

image
It is a simple autoencoder-based model, as shown in the figure above, with:

  • 2 encoder and 2 decoder layers
  • A sigmoid function at the output layer to get probabilistic output.
  • Mean squared error (MSE) as the loss function.
  • Adam as the optimizer. You can try others as well, like RMSprop, etc.
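
The forward pass this architecture describes can be sketched in plain numpy, using the layer sizes from the hyperparameters below (252 → 128 → 64 → 128 → 252). The real model lives in model/autoencoders.py; the ReLU hidden activations and the weight initialisation here are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: input, two encoder layers, then the mirrored decoder.
sizes = [252, 128, 64, 128, 252]
weights = [rng.normal(scale=0.1, size=(m, n)) for m, n in zip(sizes, sizes[1:])]

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x):
    # Two encoder and two decoder layers; ReLU on the hidden layers is an
    # assumption.  Sigmoid on the output squashes every reconstructed
    # rating into (0, 1), so it can be read as a recommendation probability.
    for w in weights[:-1]:
        x = np.maximum(x @ w, 0.0)
    return sigmoid(x @ weights[-1])

user_ratings = rng.random(252)      # one user's (normalised) rating row
reconstructed = forward(user_ratings)
print(reconstructed.shape)          # one score per item in the catalog
```

The autoencoder reconstructs a user's full rating row from its 64-dimensional bottleneck, which is exactly how the sparse matrix gets completed.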

Hyperparameter:

  args = {
      'epochs': 50,
      'learning_rate': 2e-3,
      'batch_size': 64,
      'l2_reg': False,
      'lambda_value': 0.01,
      'input_size': 252,
      'hidden_dim1': 128,
      'hidden_dim2': 64,
      'train_data_size': 200
  }

You can adjust these hyperparameters to tune the model further.

Train the model:

To train the model simply run:

  cd deep-learning
  python train.py

The output should look like:

  ============
  Epoch : 1
  ============
  Train error: 1.5576547754777443 || Validation error: 1.4711113996338772
  ============
  Epoch : 2
  ============
  Train error: 1.5225750103190139 || Validation error: 1.4570075812492562
  ============
  Epoch : 3
  ============
  Train error: 1.5150998307240975 || Validation error: 1.4545075721020162
  ============
  Epoch : 4
  ============
  Train error: 1.5124895757920034 || Validation error: 1.4522907999160264
  ============
  Epoch : 5
  ============
  Train error: 1.50946706533432 || Validation error: 1.4519942571722035
  ============
  Epoch : 6
  ============
  Train error: 1.5079724981978133 || Validation error: 1.4479610284334004
  ============
  Epoch : 7
  ============
  Train error: 1.504904461068076 || Validation error: 1.4444621093951029
  ============
  Epoch : 8
  ============
  Train error: 1.5038418584578745 || Validation error: 1.4449347356752453
  ============
  Epoch : 9
  ============
  Train error: 1.5031312373844352 || Validation error: 1.444345810070406
  ============
  Epoch : 10
  ============
  Train error: 1.5014506908687386 || Validation error: 1.4420791728544855
  --------------------------------------------------------------------------
  Test error 1.9924600295060275
  Training done, Please enter the User name to show recommendations : nle
  Top 10 recommendations for the user: nle are :
          ITEM_ID              PROB_RECOM(Probability)
  835926  HOL-2002-02-CMP-HOL  1.000000
  835934  HOL-2004-01-SDC-HOL  1.000000
  835912  HOL-2001-01-CMP-HOL  0.934366
  835923  HOL-2001-91-ISM-HOL  0.815368
  836094  HOL-2081-01-HBD-HOL  0.367434
  835931  HOL-2003-01-NET-HOL  0.325930
  835983  HOL-2013-01-ISM-HOL  0.000783
  836033  HOL-2037-01-NET-HOL  0.000479
  836093  HOL-2080-01-ISM-HOL  0.000115
  836080  HOL-2052-01-ISM-HOL  0.000060

[ NOTE ]: Currently the code is not integrated with any APIs, so it runs in a demo style.

[Future Work]:

  • CUDA/GPU support for the code
  • Data integration from supercollider
  • Model checkpoint saving
  • Hyperparameter tuning
  • Implement variational autoencoders

License:

MIT