Project author: zilliz-bootcamp

Project description:
Build a personalized movie recommendation system based on Paddle and Milvus
Primary language: Python
Project URL: git://github.com/zilliz-bootcamp/personalized_recommender_system.git


Note: This repo is no longer maintained. Please visit https://github.com/milvus-io/bootcamp instead.

Personalized Recommender System Based on Milvus

Prerequisites

Environment requirements

The following table lists recommended configurations, which have been tested:

| Component | Recommended Configuration |
| --- | --- |
| CPU | Intel(R) Core(TM) i7-7700K CPU @ 4.20GHz |
| GPU | GeForce GTX 1050 Ti 4GB |
| Memory | 32GB |
| OS | Ubuntu 18.04 |
| Software | Milvus 0.10.0, pymilvus 0.2.13, PaddlePaddle 1.6.1 |

Data source

The data source is the MovieLens million-scale dataset (ml-1m), created by GroupLens Research. Refer to ml-1m-README for more information.
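For orientation, the ml-1m README documents each `users.dat` record as `UserID::Gender::Age::Occupation::Zip-code`, with age stored as a bucket code (1, 18, 25, 35, 45, 50, 56). The sketch below parses one such record and maps it to the 0-based age, gender, and job indices that the inference step in this README expects. The function name `parse_user` and the output dictionary layout are illustrative assumptions, not code taken from this repo.

```python
# Age bucket codes as listed in the ml-1m README, in ascending order.
AGE_CODES = [1, 18, 25, 35, 45, 50, 56]


def parse_user(line):
    """Parse one 'UserID::Gender::Age::Occupation::Zip-code' record.

    Hypothetical helper: returns 0-based indices matching the
    -a/-g/-j values described in this README.
    """
    user_id, gender, age, occupation, _zip_code = line.strip().split("::")
    return {
        "user_id": int(user_id),
        "gender": 0 if gender == "M" else 1,   # 0: male, 1: female
        "age": AGE_CODES.index(int(age)),      # bucket code -> index 0..6
        "job": int(occupation),                # already 0..20
    }


# A sample record in the documented format.
print(parse_user("1::F::1::10::48067"))
```

A record like the one above corresponds to the arguments `-a 0 -g 1 -j 10`.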

Build a personalized recommender system based on Milvus

Follow the steps below to build a recommender system:

  1. Train the model.

    # Run train.py
    $ python3 train.py

    This command generates a model file recommender_system.inference.model in the same folder.

  2. Generate test data.

    # Download movie data movies_origin.txt to the same folder
    $ wget https://raw.githubusercontent.com/milvus-io/bootcamp/0.5.3/demo/recommender_system/movies_origin.txt
    # Generate test data. The -f parameter is followed by the movie data filename.
    $ python3 get_movies_data.py -f movies_origin.txt

    The above commands generate movies_data.txt in the same folder.

  3. Use Milvus for personalized recommendation by running the following command:

    # Milvus performs personalized recommendation based on user status
    $ python3 infer_milvus.py -a <age> -g <gender> -j <job> [-i]
    # Example 1
    $ python3 infer_milvus.py -a 0 -g 1 -j 10 -i
    # Example 2
    $ python3 infer_milvus.py -a 6 -g 0 -j 16

    The following table describes arguments of infer_milvus.py.

    | Parameter | Description |
    | --- | --- |
    | -a/--age | Age group. 0: "Under 18"; 1: "18-24"; 2: "25-34"; 3: "35-44"; 4: "45-49"; 5: "50-55"; 6: "56+" |
    | -g/--gender | Gender. 0: male; 1: female |
    | -j/--job | Job. 0: "other" or not specified; 1: "academic/educator"; 2: "artist"; 3: "clerical/admin"; 4: "college/grad student"; 5: "customer service"; 6: "doctor/health care"; 7: "executive/managerial"; 8: "farmer"; 9: "homemaker"; 10: "K-12 student"; 11: "lawyer"; 12: "programmer"; 13: "retired"; 14: "sales/marketing"; 15: "scientist"; 16: "self-employed"; 17: "technician/engineer"; 18: "tradesman/craftsman"; 19: "unemployed"; 20: "writer" |
    | -i/--infer | (Optional) Converts test data to vectors and imports them into Milvus. |

    Note: -i/--infer is required the first time you use Milvus for personalized recommendation, and again whenever you retrain and regenerate the model.
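The arguments described above could be declared with `argparse` roughly as follows. This is a minimal sketch built only from the table in this README, not the actual contents of infer_milvus.py; the value ranges come from the table, while treating `-a`, `-g`, and `-j` as required is an assumption based on the usage line.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented infer_milvus.py arguments"
    )
    # Age group index, 0 ("Under 18") through 6 ("56+").
    parser.add_argument("-a", "--age", type=int, choices=range(7), required=True)
    # Gender: 0 for male, 1 for female.
    parser.add_argument("-g", "--gender", type=int, choices=range(2), required=True)
    # Job index, 0 ("other") through 20 ("writer").
    parser.add_argument("-j", "--job", type=int, choices=range(21), required=True)
    # Optional flag: convert test data to vectors and import them into Milvus.
    parser.add_argument("-i", "--infer", action="store_true")
    return parser


# Equivalent to Example 1: python3 infer_milvus.py -a 0 -g 1 -j 10 -i
args = build_parser().parse_args(["-a", "0", "-g", "1", "-j", "10", "-i"])
print(args.age, args.gender, args.job, args.infer)
```

With `choices` set, an out-of-range value such as `-a 7` is rejected by the parser before any inference code runs.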

    The result displays the top 5 movies that the specified user might be interested in:

    get infer vectors finished!
    Server connected.
    Status(code=0, message='Create table successfully!')
    rows in table recommender_demo: 3883
    Top  Ids   Title    Score
    0    3030  Yojimbo  2.9444923996925354
    1    3871  Shane    2.8583481907844543
    2    3467  Hud      2.849525213241577
    3    1809  Hana-bi  2.826111316680908
    4    3184  Montana  2.8119677305221558

    To compare, run python3 infer_paddle.py. Paddle and Milvus generate the same result.
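Conceptually, both paths rank movies by how similar each movie vector is to the user's vector and keep the five best. The sketch below shows that ranking as a brute-force search in plain Python. It assumes inner product as the similarity metric, which this README does not state, and it uses made-up toy ids and vectors; it does not reproduce how Milvus indexes or searches vectors internally.

```python
import heapq


def top_k_inner_product(query, vectors, k=5):
    """Return the k (id, score) pairs with the largest inner product.

    Hypothetical helper for illustration; `vectors` maps an id to a
    list of floats with the same length as `query`.
    """
    scored = (
        (vec_id, sum(q * v for q, v in zip(query, vec)))
        for vec_id, vec in vectors.items()
    )
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


# Toy data, purely illustrative: these ids and vectors are made up.
movie_vectors = {
    101: [1.0, 0.0],
    102: [0.0, 1.0],
    103: [1.0, 1.0],
}
user_vector = [2.0, 1.0]

for rank, (movie_id, score) in enumerate(
    top_k_inner_product(user_vector, movie_vectors, k=2)
):
    print(rank, movie_id, score)
```

A brute-force scan like this is linear in the number of stored vectors, which is workable for a few thousand movies; a vector database becomes useful as the collection grows.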