Adversarial learning framework to enhance long-tail recommendation in Neural Collaborative Filtering
This repository contains the training and testing codes for the Generative Adversarial learning framework for Neural Collaborative Filtering (NCF) models, which aims to enhance long-tail item recommendations.
If this code helps you in your research, please cite the following publication:
Krishnan, Adit, et al. “An Adversarial Approach to Improve Long-Tail Performance in Neural Collaborative Filtering.” Proceedings of the 27th ACM International Conference on Information and Knowledge Management. ACM, 2018.
Getting Started
These instructions will help you set up the proposed model on your local machine.
Our framework runs in a Python 2.7+ environment with the required modules installed.
These requirements can also be satisfied with an up-to-date Anaconda environment - https://www.anaconda.com/
You will need the following files to run our model (a sketch for generating the id mappings and the niche-item list appears after this list):
item_counts.csv: CSV file containing userId, itemId, and rating (given by the user to the item), separated by commas (,)
item_list.txt: List of item ids.
unique_item_id.txt: Items to use for training and testing (say, only items rated by at least 5 users)
item2id.txt: Mapping which makes item ids in unique_item_id sequential (0 to num_item), tab-separated
profile2id.txt: Mapping which makes user ids sequential (0 to num_user), tab-separated
niche_items.txt: Items which are niche (original ids)
train_GAN.csv: CSV file containing pairs of userId (mapped), itemId (mapped) with rating greater than an application-specific threshold
train_GAN_popular.csv: userId (mapped), itemId (mapped) pairs of popular items (unique_items minus niche items)
train_GAN_niche.csv: userId (mapped), itemId (mapped) pairs of niche items
validation_tr.csv: Training data for validation (userId (mapped), itemId (mapped) pairs)
validation_te.csv: Test data for validation (userId (mapped), itemId (mapped) pairs)
test_tr.csv: Training data for testing (userId (mapped), itemId (mapped) pairs)
test_te.csv: Test data for testing (userId (mapped), itemId (mapped) pairs)
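The preprocessing scripts for building these files from a new dataset are not reproduced here, so the snippet below is only a minimal sketch (using pandas; the column names, the 5-user threshold, and the 80% popularity cutoff for niche items are illustrative assumptions, not values prescribed by the paper):

    import pandas as pd

    # item_counts.csv: userId, itemId, rating triples, comma-separated (no header assumed).
    raw = pd.read_csv('item_counts.csv', names=['userId', 'itemId', 'rating'])

    # Keep only items rated by at least 5 users (application-specific threshold).
    user_counts = raw.groupby('itemId')['userId'].nunique()
    unique_items = sorted(user_counts[user_counts >= 5].index)
    raw = raw[raw['itemId'].isin(unique_items)]

    # Sequential, tab-separated id mappings (0 to num_item - 1 and 0 to num_user - 1).
    with open('item2id.txt', 'w') as f:
        for idx, item in enumerate(unique_items):
            f.write('%s\t%d\n' % (item, idx))
    with open('profile2id.txt', 'w') as f:
        for idx, user in enumerate(sorted(raw['userId'].unique())):
            f.write('%s\t%d\n' % (user, idx))

    # Hypothetical long-tail cutoff: call the least-rated 80% of the retained items niche.
    cutoff = user_counts.loc[unique_items].quantile(0.8)
    with open('niche_items.txt', 'w') as f:
        for item in unique_items:
            if user_counts[item] < cutoff:
                f.write('%s\n' % item)

The train_GAN*.csv files can then be produced by keeping the (mapped) pairs whose rating exceeds your application-specific threshold and splitting them by membership in niche_items.txt.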
A set of input files for a sampled version of the AskUbuntu dataset is provided in the Dataset folder. Note that we use the set of tags assigned to a user's posts as items; these posts are the questions asked by the user, the answers given by the user, the posts liked by the user, and the posts on which the user commented.
Refer to the following IPython notebook for details on creating these files for the MovieLens dataset: ml-parse-vaecf. The movies rated by the users serve as the items.
The model can be configured using the file config.ini present inside the Codes folder. The parameters h0_size, h1_size, h2_size, and h3_size are the sizes of the hidden layers as defined in the architecture of our discriminator in the GAN framework (see figure).
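For intuition, a feed-forward discriminator with these four hidden-layer sizes could look like the sketch below (TensorFlow 1.x style; the tanh activations, input representation, and single-logit output are assumptions rather than the exact architecture in the Codes folder):

    import tensorflow as tf

    def discriminator(pair_features, h0_size, h1_size, h2_size, h3_size, reuse=False):
        # Hypothetical fully connected stack using the four hidden-layer sizes
        # from config.ini; activations and output layer are assumptions.
        with tf.variable_scope('discriminator', reuse=reuse):
            h = tf.layers.dense(pair_features, h0_size, activation=tf.nn.tanh)
            h = tf.layers.dense(h, h1_size, activation=tf.nn.tanh)
            h = tf.layers.dense(h, h2_size, activation=tf.nn.tanh)
            h = tf.layers.dense(h, h3_size, activation=tf.nn.tanh)
            return tf.layers.dense(h, 1)  # logit: true niche pair vs. synthetic pair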
The other parameters to be configured are listed below; an illustrative config.ini snippet follows the list:
GANLAMBDA: Weight provided to the Adversary's Loss Term (Default = 1.0)
NUM_EPOCH: Number of Epochs for training (Default = 80)
BATCH_SIZE: Size of each batch (Default = 100)
LEARNING_RATE: Learning Rate of the Model (Default = 0.0001)
model_name: Name by which model is saved (Default = "LT_GAN")
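For reference, a config.ini filled with the defaults above might look like the following; the section header and the hidden-layer sizes are illustrative assumptions, so match them against the config.ini shipped in the Codes folder:

    [DEFAULT]
    ; discriminator hidden-layer sizes (illustrative values only)
    h0_size = 512
    h1_size = 256
    h2_size = 128
    h3_size = 64
    ; defaults listed above
    GANLAMBDA = 1.0
    NUM_EPOCH = 80
    BATCH_SIZE = 100
    LEARNING_RATE = 0.0001
    model_name = LT_GAN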
The repo uses VAE-CF as the base recommender (the generator in our architecture) by default. You can also replace it with your own recommender model (or another recommender) to be trained with the GAN loss and the long-tail strategy we propose; a sketch of how the adversary's loss can be folded into a custom recommender's objective appears after the training instructions. Follow the instructions below:
For training the model, run the following command:
$ python2.7 train.py <path/to/input/folder>
Model parameters are set to the values provided in the config file. By default, the trained model is checkpointed and saved to path/to/input/folder/chkpt/ after every epoch.
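If you swap in your own recommender, the essential change is to add the adversary's loss term, weighted by GANLAMBDA, to the recommender's own objective. The sketch below (TensorFlow 1.x, with hypothetical tensor names; see the paper for the exact pairing strategy between true niche pairs and synthetic pairs) illustrates one standard way to write such a combined objective; it is not the code in train.py:

    import tensorflow as tf

    # Hypothetical tensors (names are not the ones used in train.py):
    #   recon_loss       - the base recommender's own loss (e.g. the VAE-CF objective)
    #   true_scores      - discriminator logits on true niche user-item pairs
    #   synthetic_scores - discriminator logits on pairs produced by the recommender
    recon_loss = tf.placeholder(tf.float32, [])
    true_scores = tf.placeholder(tf.float32, [None])
    synthetic_scores = tf.placeholder(tf.float32, [None])
    GANLAMBDA = 1.0  # weight on the adversary's loss term (config default)

    # Adversary: separate true niche pairs from recommender-generated ones.
    d_loss = (
        tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=true_scores, labels=tf.ones_like(true_scores))) +
        tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
            logits=synthetic_scores, labels=tf.zeros_like(synthetic_scores))))

    # Recommender: keep its own objective and add a GANLAMBDA-weighted term
    # that rewards fooling the adversary on the synthetic pairs.
    g_adv_loss = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        logits=synthetic_scores, labels=tf.ones_like(synthetic_scores)))
    g_loss = recon_loss + GANLAMBDA * g_adv_loss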
For testing the model, run the following command:
$ python2.7 test.py <path/to/input/folder> <path/to/saved/model>
where <path/to/saved/model> is the path to the saved model file inside the chkpt folder (the checkpoint is named using model_name from config.ini).