Build a personalized movie recommendation system based on PaddlePaddle and Milvus
This repo is no longer maintained. Please visit https://github.com/milvus-io/bootcamp instead.
The following table lists recommended configurations, which have been tested:
Component | Recommended Configuration |
---|---|
CPU | Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz |
GPU | GeForce GTX 1050 Ti 4GB |
Memory | 32GB |
OS | Ubuntu 18.04 |
Software | Milvus 0.10.0, pymilvus 0.2.13, PaddlePaddle 1.6.1 |
The data source is the MovieLens million-scale dataset (ml-1m), created by GroupLens Research. Refer to ml-1m-README for more information.
Follow the steps below to build a recommender system:
Train the model.
# run train.py
$ python3 train.py
This command generates a model file, `recommender_system.inference.model`, in the same folder.
Generate test data.
# Download movie data movies_origin.txt to the same folder
$ wget https://raw.githubusercontent.com/milvus-io/bootcamp/0.5.3/demo/recommender_system/movies_origin.txt
# Generate test data. The -f parameter is followed by the movie data filename.
$ python3 get_movies_data.py -f movies_origin.txt
The above commands generate `movies_data.txt` in the same folder.
Use Milvus for personalized recommendation by running the following command:
# Milvus performs personalized recommendation based on user status
$ python3 infer_milvus.py -a <age> -g <gender> -j <job> [-i]
# Example 1
$ python3 infer_milvus.py -a 0 -g 1 -j 10 -i
# Example 2
$ python3 infer_milvus.py -a 6 -g 0 -j 16
The following table describes the arguments of `infer_milvus.py`.
| Parameter | Description |
|---|---|
| `-a`/`--age` | Age group<br>0: “Under 18”<br>1: “18-24”<br>2: “25-34”<br>3: “35-44”<br>4: “45-49”<br>5: “50-55”<br>6: “56+” |
| `-g`/`--gender` | Gender<br>0: male<br>1: female |
| `-j`/`--job` | Job<br>0: “other” or not specified<br>1: “academic/educator”<br>2: “artist”<br>3: “clerical/admin”<br>4: “college/grad student”<br>5: “customer service”<br>6: “doctor/health care”<br>7: “executive/managerial”<br>8: “farmer”<br>9: “homemaker”<br>10: “K-12 student”<br>11: “lawyer”<br>12: “programmer”<br>13: “retired”<br>14: “sales/marketing”<br>15: “scientist”<br>16: “self-employed”<br>17: “technician/engineer”<br>18: “tradesman/craftsman”<br>19: “unemployed”<br>20: “writer” |
| `-i`/`--infer` | (Optional) Converts the test data to vectors and imports them into Milvus. |
Note: `-i`/`--infer` is required the first time you use Milvus for personalized recommendation, and again whenever you retrain and regenerate the model.
The result displays the top 5 movies that the specified user might be interested in:
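The argument table above can be sketched as an `argparse` parser. This is a minimal illustration only: `build_parser` and the `*_CODES` constants are hypothetical names, and the actual `infer_milvus.py` may define and validate its arguments differently. The flags and value ranges are the ones listed in the table.

```python
import argparse

# Valid code ranges, taken from the argument table in this README.
AGE_CODES = range(0, 7)     # 0 ("Under 18") through 6 ("56+")
GENDER_CODES = range(0, 2)  # 0 (male), 1 (female)
JOB_CODES = range(0, 21)    # 0 ("other") through 20 ("writer")


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the script's real code)."""
    parser = argparse.ArgumentParser(
        description="Personalized movie recommendation (illustrative sketch)"
    )
    parser.add_argument("-a", "--age", type=int, choices=AGE_CODES,
                        required=True, help="age group code, 0 to 6")
    parser.add_argument("-g", "--gender", type=int, choices=GENDER_CODES,
                        required=True, help="gender code, 0 or 1")
    parser.add_argument("-j", "--job", type=int, choices=JOB_CODES,
                        required=True, help="job code, 0 to 20")
    # Optional switch: convert test data to vectors and import them into Milvus.
    parser.add_argument("-i", "--infer", action="store_true",
                        help="regenerate vectors and import them into Milvus")
    return parser
```

With this sketch, `build_parser().parse_args(["-a", "0", "-g", "1", "-j", "10", "-i"])` corresponds to Example 1 above, and an out-of-range code such as `-a 7` is rejected before any inference runs.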
get infer vectors finished!
Server connected.
Status(code=0, message='Create table successfully!')
rows in table recommender_demo: 3883
Top Ids Title Score
0 3030 Yojimbo 2.9444923996925354
1 3871 Shane 2.8583481907844543
2 3467 Hud 2.849525213241577
3 1809 Hana-bi 2.826111316680908
4 3184 Montana 2.8119677305221558
Run `python3 infer_paddle.py`. You can see that Paddle and Milvus generate the same result.
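To make the comparison concrete, here is a minimal sketch of a brute-force top-k search and a check that two ranked result lists agree. It assumes, without confirmation from this README, that movies are ranked by the inner product between a user vector and each movie vector. The function names and the toy vectors are hypothetical and are not taken from `infer_milvus.py` or `infer_paddle.py`.

```python
import heapq


def inner_product(u, v):
    """Inner product of two equal-length vectors (assumed similarity metric)."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query, vectors, k=5):
    """Return the k (id, score) pairs with the highest scores, best first."""
    scored = ((vid, inner_product(query, vec)) for vid, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


def same_ranking(a, b, tol=1e-5):
    """True if two ranked lists hold the same ids in the same order,
    with scores equal up to a small floating-point tolerance."""
    return len(a) == len(b) and all(
        ia == ib and abs(sa - sb) <= tol
        for (ia, sa), (ib, sb) in zip(a, b)
    )


# Toy data only: three 2-dimensional "movie" vectors and one "user" vector.
movies = {1: [1.0, 0.0], 2: [0.5, 0.5], 3: [0.0, 1.0]}
user = [1.0, 0.2]
ranked = top_k(user, movies, k=2)
```

A tolerance is used in `same_ranking` because two engines may differ in the last few floating-point digits even when they return the same ranking.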