nnieqat-pytorch

nnieqat is a quantization-aware training package for the Neural Network Inference Engine (NNIE) on PyTorch. It uses the HiSilicon quantization library to quantize a module's weights and activations in fake fp32 format.


Installation

Supported Platforms: Linux
Accelerators and GPUs: NVIDIA GPUs via CUDA driver 10.1 or 10.2.
Dependencies:
- python >= 3.5, < 4
- llvmlite >= 0.31.0
- pytorch >= 1.5
- numba >= 0.42.0
- numpy >= 1.18.1
Install nnieqat via PyPI:
```
$ pip install nnieqat
```
Install nnieqat in Docker (an easy way to avoid environment problems):
```
$ cd docker
$ docker build -t nnieqat-image .
```

Install nnieqat from the repository:

```
$ git clone https://github.com/aovoc/nnieqat-pytorch
$ cd nnieqat-pytorch
$ make install
```

Usage

Add a quantization hook.

Weights and activations are quantized and dequantized with the HiSVP GFPQ library during forward().


```
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    register_quantization_hook(model)
...
```
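For intuition only: nnieqat delegates the actual quantization to the HiSVP GFPQ library, whose exact scheme is not reproduced here. The sketch below instead assumes a generic symmetric 8-bit linear quantizer (not GFPQ's algorithm) to show what "fake fp32" means: values are rounded to integer codes and immediately mapped back to floats, so later layers still receive fp32 values that carry the quantization error. The function name `fake_quant` is hypothetical.

```python
def fake_quant(values, num_bits=8):
    """Hypothetical symmetric per-tensor fake quantization (NOT the GFPQ scheme).

    Quantizes floats to signed integer codes, then dequantizes them back to
    floats, so the result is still "fp32" but only takes quantized values.
    """
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return [0.0 for _ in values]         # all-zero tensor: nothing to scale
    scale = max_abs / qmax                   # float value of one integer step
    out = []
    for v in values:
        q = max(-qmax, min(qmax, round(v / scale)))  # integer code in [-qmax, qmax]
        out.append(q * scale)                        # dequantize back to float
    return out
```

For example, `fake_quant([1.0, 0.25, -1.0])` keeps ±1.0 (up to float rounding) and maps 0.25 to 32/127 ≈ 0.252, the nearest representable level.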

Merge BN weights into conv layers and freeze BN

When finetuning from a well-trained model, call merge_freeze_bn at the beginning; otherwise, call it after a few epochs of training.

```
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.train()
    model = merge_freeze_bn(model)  # puts BN layers in eval() mode even during training
...
```
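Folding BatchNorm into the preceding convolution is standard algebra: with a per-channel scale s = γ / sqrt(σ² + ε), the folded weight is W·s and the folded bias is (b - μ)·s + β. The pure-Python sketch below checks that the folded layer matches conv-then-BN. It is not the source of `merge_freeze_bn`; the helper names are hypothetical, and each channel's weight is a single scalar for brevity. For a real Conv2d, every element of output channel c's kernel is multiplied by the same s[c].

```python
import math


def fold_bn(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    """Hypothetical helper: fold per-channel BN statistics into weight/bias."""
    folded_w, folded_b = [], []
    for w, b, g, bt, m, v in zip(weight, bias, gamma, beta, running_mean, running_var):
        scale = g / math.sqrt(v + eps)        # BN's per-channel multiplier
        folded_w.append(w * scale)            # W' = W * s
        folded_b.append((b - m) * scale + bt)  # b' = (b - mean) * s + beta
    return folded_w, folded_b


def conv_then_bn(x, w, b, gamma, beta, mean, var, eps=1e-5):
    """Reference: a scalar 'conv' (w * x + b) followed by BN in eval mode."""
    y = w * x + b
    return gamma * (y - mean) / math.sqrt(var + eps) + beta
```

Because the folded layer uses the running statistics, it only matches BN in eval mode, which is consistent with `merge_freeze_bn` switching BN to eval() during training.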

Unquantize weights before updating them

```
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(unquant_weight)  # restore the original (unquantized) weights for the update
    optimizer.step()
...
```
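Why update the original weights? With a coarse quantization grid, a single optimizer step is often smaller than half a grid step, so re-rounding after every update would discard it. The toy below compares storing only quantized weights against keeping a float copy that receives the updates. All names are hypothetical, a fixed `update` stands in for a gradient step, and a 0.1 grid stands in for GFPQ.

```python
def quantize(w, step=0.1):
    """Toy uniform quantizer: snap a float weight to the nearest multiple of `step`."""
    return round(w / step) * step


def simulate_updates(num_steps, update=-0.01, keep_float_master=True):
    """Apply `num_steps` identical updates; return the weight the forward pass sees.

    keep_float_master=True : updates land on the original float weight.
    keep_float_master=False: the weight is stored already quantized, so every
                             update is re-rounded immediately.
    """
    w = 0.5
    for _ in range(num_steps):
        if keep_float_master:
            w = w + update            # small steps accumulate in full precision
        else:
            w = quantize(w + update)  # |update| < step / 2, so rounding undoes it
    return quantize(w)                # value used by the fake-quantized forward pass
```

With the defaults, `simulate_updates(10, keep_float_master=False)` never leaves 0.5, while `simulate_updates(10)` reaches the next grid point, 0.4.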

Dump the weight-optimized (quantize-dequantized) model

```
from nnieqat import quant_dequant_weight, unquant_weight, merge_freeze_bn, register_quantization_hook
...
...
    model.apply(quant_dequant_weight)
    save_checkpoint(...)
    model.apply(unquant_weight)
...
```
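If `save_checkpoint` raises, the final `model.apply(unquant_weight)` is skipped and training would continue on quantized weights. One option is to wrap the three steps in a context manager. `weights_quantized` below is a hypothetical helper, not part of nnieqat; it only assumes the model exposes an `apply(fn)` method, as `torch.nn.Module` does.

```python
from contextlib import contextmanager


@contextmanager
def weights_quantized(model, quant_fn, unquant_fn):
    """Hypothetical helper: quantize weights on entry, always restore them on exit."""
    model.apply(quant_fn)          # e.g. quant_dequant_weight
    try:
        yield model
    finally:
        model.apply(unquant_fn)    # e.g. unquant_weight, runs even if saving fails


# Intended usage with nnieqat (save_checkpoint and its arguments are your own):
# with weights_quantized(model, quant_dequant_weight, unquant_weight):
#     save_checkpoint(...)
```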

Use EMA with caution (not recommended).

Code Examples

CIFAR-10 quantization-aware training example (nnieqat added to pytorch_cifar10_tutorial):

```
python test/test_cifar10.py
```

ImageNet quantization finetuning example (nnieqat added to pytorh_imagenet_main.py):

```
python test/test_imagenet.py --pretrained path_to_imagenet_dataset
```

Results

ImageNet

```
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.001 --pretrained --epoch 10   # nnie_lr_e-3_ft
python pytorh_imagenet_main.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10  # lr_e-4_ft
python test/test_imagenet.py /data/imgnet/ --arch squeezenet1_1 --lr 0.0001 --pretrained --epoch 10   # nnie_lr_e-4_ft
```

Finetuning results for squeezenet1_1 on ImageNet:

| Model | TensorRT FP32 | TensorRT INT8 | NNIE |
| --- | --- | --- | --- |
| torchvision | 0.56992 | 0.56424 | 0.56026 |
| nnie_lr_e-3_ft | 0.56600 | 0.56328 | 0.56612 |
| lr_e-4_ft | 0.57884 | 0.57502 | 0.57542 |
| nnie_lr_e-4_ft | 0.57834 | 0.57524 | 0.57730 |

COCO

Network: simplified YOLOv5s

Trained for 300 epochs, Hi3559 test result:

Finetuned for 20 epochs, Hi3559 test result:

Todo

Generate quantized model directly.

Reference

HiSVP Quantization Library User Guide (HiSVP 量化库使用指南)

Quantizing deep convolutional networks for efficient inference: A whitepaper

8-bit Inference with TensorRT

Distilling the Knowledge in a Neural Network