Project author: xinyandai

Project description:
🙃 Implementation of vector quantization algorithms; code for Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.
Language: Python
Repository: git://github.com/xinyandai/product-quantization.git
Created: 2018-10-17T11:43:37Z
Project page: https://github.com/xinyandai/product-quantization

product-quantization

A general framework for vector quantization, implemented in Python.

NEQ, AAAI 2020, Oral

Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search.

  • Abstract

    Vector quantization (VQ) techniques are widely used in similarity search for
    data compression, fast metric computation, etc. Originally designed for
    Euclidean distance, existing VQ techniques (e.g., PQ, AQ) explicitly or
    implicitly minimize the quantization error. In this paper, we present a new
    angle to analyze the quantization error, which decomposes the quantization
    error into norm error and direction error. We show that quantization errors in
    norm have a much higher influence on inner products than quantization errors in
    direction, and that a small quantization error does not necessarily lead to good
    performance in maximum inner product search (MIPS). Based on this observation,
    we propose norm-explicit quantization (NEQ), a general paradigm that
    improves existing VQ techniques for MIPS. NEQ quantizes the norms of items in a
    dataset explicitly to reduce errors in norm, which is crucial for MIPS. For the
    direction vectors, NEQ can simply reuse an existing VQ technique to quantize
    them without modification. We conducted extensive experiments on a variety of
    datasets and parameter configurations. The experimental results show that NEQ
    improves the performance of various VQ techniques for MIPS, including PQ, OPQ,
    RQ and AQ.
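
To make the norm/direction decomposition concrete, here is a minimal NumPy sketch (illustrative only, not code from this repository) showing that an inner product factorizes into a norm part and a direction part, so any relative error in the quantized norm propagates directly into the estimated inner product:

    import numpy as np

    rng = np.random.default_rng(0)
    q = rng.normal(size=8)          # query vector
    x = rng.normal(size=8)          # database vector

    norm = np.linalg.norm(x)        # scalar norm of x
    direction = x / norm            # unit-length direction vector

    # The inner product factorizes into a norm part and a direction part.
    exact = q @ x
    factored = norm * (q @ direction)
    assert np.isclose(exact, factored)

    # A relative error in the norm becomes the same relative error in the
    # estimated inner product, which is why NEQ quantizes norms explicitly.
    noisy_norm = norm * 1.05        # 5% norm error
    print(abs(noisy_norm * (q @ direction) - exact) / abs(exact))  # ~0.05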

Datasets

The netflix dataset is included in this repository. You can download more datasets from
here,
and then compute the ground truth with the following script:

  1. python run_ground_truth.py --dataset netflix --topk 50 --metric product
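
For reference, the ground truth for inner-product top-k can be computed by brute force. The sketch below is a stand-alone illustration with placeholder array names (it does not use the repository's data loaders):

    import numpy as np

    def ground_truth_mips(base, queries, topk=50):
        """Exact top-k maximum inner product search by brute force.

        base    : (n, d) array of database vectors
        queries : (m, d) array of query vectors
        returns : (m, topk) array of indices into `base`, best first
        """
        scores = queries @ base.T                            # (m, n) inner products
        # argpartition selects the top-k per row, argsort orders those k.
        part = np.argpartition(-scores, topk - 1, axis=1)[:, :topk]
        order = np.argsort(-np.take_along_axis(scores, part, axis=1), axis=1)
        return np.take_along_axis(part, order, axis=1)

    # Random data standing in for the netflix vectors.
    rng = np.random.default_rng(0)
    base, queries = rng.normal(size=(1000, 32)), rng.normal(size=(10, 32))
    print(ground_truth_mips(base, queries, topk=5).shape)    # (10, 5)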

Run examples

  1. python run_pq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  2. python run_opq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  3. python run_rq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  4. python run_aq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256 # very slow
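
For intuition about what these scripts train, here is a toy product-quantization sketch (illustrative only, not the repository's implementation; it assumes scikit-learn is available and uses a small Ks to keep it fast). Each vector is split into num_codebook sub-vectors, every sub-space gets its own k-means codebook, and inner products are approximated against the reconstructed codewords:

    import numpy as np
    from sklearn.cluster import KMeans

    def train_pq(X, M=4, Ks=16, seed=0):
        """Train one KMeans codebook per sub-space (toy PQ trainer)."""
        sub = X.shape[1] // M
        return [KMeans(n_clusters=Ks, n_init=4, random_state=seed)
                .fit(X[:, m * sub:(m + 1) * sub]) for m in range(M)]

    def encode_pq(codebooks, X):
        """Encode each vector as M codeword indices."""
        sub = X.shape[1] // len(codebooks)
        return np.stack([cb.predict(X[:, m * sub:(m + 1) * sub])
                         for m, cb in enumerate(codebooks)], axis=1)

    def decode_pq(codebooks, codes):
        """Reconstruct approximate vectors from their codes."""
        return np.hstack([cb.cluster_centers_[codes[:, m]]
                          for m, cb in enumerate(codebooks)])

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 16)).astype(np.float32)
    q = rng.normal(size=16).astype(np.float32)

    codebooks = train_pq(X, M=4, Ks=16)
    codes = encode_pq(codebooks, X)              # (2000, 4) codeword indices
    approx = decode_pq(codebooks, codes) @ q     # approximate inner products
    print(np.argmax(approx), np.argmax(X @ q))   # often, but not always, equal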

Reproduce results of NEQ

  1. python run_norm_pq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  2. python run_norm_opq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  3. python run_norm_rq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256
  4. python run_norm_aq.py --dataset netflix --topk 20 --metric product --num_codebook 4 --Ks 256 # very slow
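
These scripts apply the NEQ paradigm described in the abstract: quantize the norms explicitly and reuse an unmodified VQ technique for the direction vectors. The following stand-alone sketch (illustrative only, not the repository's code) uses a plain k-means codebook in place of PQ/OPQ/RQ/AQ for the directions:

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 16)).astype(np.float32)
    q = rng.normal(size=16).astype(np.float32)

    # Split every vector into an explicit norm and a unit-length direction.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    directions = X / norms

    # 1) Quantize the norms explicitly with a small scalar codebook.
    norm_km = KMeans(n_clusters=8, n_init=4, random_state=0).fit(norms)
    q_norms = norm_km.cluster_centers_[norm_km.predict(norms)]       # (n, 1)

    # 2) Reuse any existing VQ technique, unchanged, for the directions;
    #    a single k-means codebook stands in for PQ/OPQ/RQ/AQ here.
    dir_km = KMeans(n_clusters=64, n_init=4, random_state=0).fit(directions)
    q_dirs = dir_km.cluster_centers_[dir_km.predict(directions)]     # (n, d)

    # Approximate inner product = quantized norm * <query, quantized direction>.
    approx = (q_norms * q_dirs) @ q
    print(np.argmax(approx), np.argmax(X @ q))   # ideally the same index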

Reference

If you use this code, please cite the following paper:

  @article{xinyandai,
    title   = {Norm-Explicit Quantization: Improving Vector Quantization for Maximum Inner Product Search},
    author  = {Dai, Xinyan and Yan, Xiao and Ng, Kelvin K. W. and Liu, Jie and Cheng, James},
    journal = {arXiv preprint arXiv:1911.04654},
    year    = {2019}
  }