SIN

Structure Inference Net: Object Detection Using Scene-level Context and Instance-level Relationships. In CVPR 2018. (http://vipl.ict.ac.cn/uploadfile/upload/2018041318013480.pdf)

Requirements: software

Requirements for TensorFlow 1.3.0 (see: TensorFlow)
Python packages you might not have: cython, python-opencv, easydict
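
Before building, it can help to check which of these packages are already installed. The following is a minimal sketch, not part of the SIN repository: the helper name `missing_modules` is hypothetical, and the import names are assumptions (OpenCV's Python bindings are normally imported as `cv2`).

```python
import importlib.util

# Import names are assumptions; OpenCV's Python bindings are imported as "cv2".
REQUIRED_MODULES = ["tensorflow", "cython", "cv2", "easydict"]


def missing_modules(names):
    """Return the top-level module names that cannot be located (hypothetical helper)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

`find_spec` only locates each module without importing it, so the check stays fast even when TensorFlow is installed.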

Installation (sufficient for the demo)

Clone the SIN repository

 # Make sure to clone with --recursive
 git clone --recursive https://github.com/choasUp/SIN.git

Build the Cython modules
```
cd $SIN_ROOT/lib
make
```

Demo

After successfully completing basic installation, you’ll be ready to run the demo.

(Demo instructions are coming soon.)

Training Model

Download the training, validation, test data and VOCdevkit

 wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtrainval_06-Nov-2007.tar
 wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCtest_06-Nov-2007.tar
 wget http://host.robots.ox.ac.uk/pascal/VOC/voc2007/VOCdevkit_08-Jun-2007.tar

Extract all of these tars into one directory named VOCdevkit

 tar xvf VOCtrainval_06-Nov-2007.tar
 tar xvf VOCtest_06-Nov-2007.tar
 tar xvf VOCdevkit_08-Jun-2007.tar

It should have this basic structure

   $VOCdevkit/                           # development kit
   $VOCdevkit/VOCcode/                   # VOC utility code
   $VOCdevkit/VOC2007                    # image sets, annotations, etc.
   # ... and several other directories ...
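
The layout above can be checked programmatically before creating the symlink. This is a sketch under stated assumptions: the helper `missing_voc_dirs` is hypothetical, it checks only the two sub-directories named in the listing, and it assumes `VOCdevkit` is exported as an environment variable pointing at the extracted directory.

```python
import os

# Sub-directories taken from the listing above, relative to the VOCdevkit root.
EXPECTED_SUBDIRS = ["VOCcode", "VOC2007"]


def missing_voc_dirs(devkit_root):
    """Return the expected sub-directories that are absent under devkit_root."""
    return [
        name
        for name in EXPECTED_SUBDIRS
        if not os.path.isdir(os.path.join(devkit_root, name))
    ]


if __name__ == "__main__":
    # Assumption: VOCdevkit is set in the environment; otherwise use ./VOCdevkit.
    root = os.environ.get("VOCdevkit", "VOCdevkit")
    missing = missing_voc_dirs(root)
    print("Missing:", missing if missing else "nothing")
```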

Create symlinks for the PASCAL VOC dataset

 cd $SIN_ROOT/data
 ln -s $VOCdevkit VOCdevkit

Download the pre-trained ImageNet models [Google Drive] [Dropbox]

 mv VGG_imagenet.npy $SIN_ROOT/data/pretrain_model/VGG_imagenet.npy

[Optional] Set the learning rate and the maximum number of iterations

 vim experiments/scripts/faster_rcnn_end2end.sh   # ITERS
 vim lib/fast/config.py                           # LR
 cd lib                                           # if you edit the code, it is best to rebuild
 make

Set your GPU ID, then run the script to train and test the model

 cd $SIN_ROOT
 export CUDA_VISIBLE_DEVICES=0
 ./train.sh

Test your dataset
```
 ./test_all.sh
```

The result of testing on PASCAL VOC 2007 (VGG net)

AP for aeroplane = 0.7853
AP for bicycle = 0.8045
AP for bird = 0.7456
AP for boat = 0.6657
AP for bottle = 0.6144
AP for bus = 0.8424
AP for car = 0.8663
AP for cat = 0.8894
AP for chair = 0.5803
AP for cow = 0.8466
AP for diningtable = 0.7171
AP for dog = 0.8578
AP for horse = 0.8626
AP for motorbike = 0.7802
AP for person = 0.7857
AP for pottedplant = 0.4869
AP for sheep = 0.7599
AP for sofa = 0.7351
AP for train = 0.8199
AP for tvmonitor = 0.7683
Mean AP = 0.7607
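
The reported mean AP is the unweighted average of the 20 per-class AP values. The short sketch below recomputes it from the numbers listed above; the function name `mean_ap` is illustrative and is not taken from the SIN code.

```python
# Per-class AP values copied from the results listed above.
AP_PER_CLASS = {
    "aeroplane": 0.7853, "bicycle": 0.8045, "bird": 0.7456, "boat": 0.6657,
    "bottle": 0.6144, "bus": 0.8424, "car": 0.8663, "cat": 0.8894,
    "chair": 0.5803, "cow": 0.8466, "diningtable": 0.7171, "dog": 0.8578,
    "horse": 0.8626, "motorbike": 0.7802, "person": 0.7857,
    "pottedplant": 0.4869, "sheep": 0.7599, "sofa": 0.7351,
    "train": 0.8199, "tvmonitor": 0.7683,
}


def mean_ap(ap_per_class):
    """Unweighted mean of the per-class average precision values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


if __name__ == "__main__":
    print("Mean AP = %.4f" % mean_ap(AP_PER_CLASS))
```

The per-class values sum to 15.2140, which divided by 20 gives 0.7607, matching the reported figure.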

References

Faster R-CNN (Caffe version)

Faster R-CNN (TensorFlow version)

Citation

Yong Liu, Ruiping Wang, Shiguang Shan, and Xilin Chen. Structure Inference Net: Object Detection Using Scene-level Context and Instance-level Relationships. In CVPR 2018.