Robust Attribution Regularization
This repository contains the code for the paper Robust Attribution Regularization. Some of the code is adapted from the MNIST Challenge, CIFAR10 Challenge, Deep traffic sign classification, tflearn oxflower17, and Interpretation of Neural Network is Fragile.
This project addresses an emerging problem in trustworthy machine learning: training models that produce robust interpretations for their predictions. See the example below:
The code has been tested under Ubuntu Linux 16.04.1 with Python 3.6 and requires some additional packages to be installed.
Please run

mkdir models

before training models. Configuration options are specified in config.json.
Model configuration:
model_dir: contains the path to the directory of the currently trained/evaluated model.
Data configuration:
data_path: contains the path to the directory of the dataset.
Training configuration:
tf_random_seed: the seed for the RNG used to initialize the network weights.
numpy_random_seed: the seed for the RNG used to pass over the dataset in random order.
max_num_training_steps: the number of training steps.
num_output_steps: the number of training steps between printing progress to standard output.
num_summary_steps: the number of training steps between storing TensorBoard summaries.
num_checkpoint_steps: the number of training steps between storing model checkpoints.
training_batch_size: the size of the training batch.
step_size_schedule: the learning rate schedule array (see the sketch after this list).
weight_decay: the weight decay rate.
momentum: the momentum rate.
m: m in the gradient step.
continue_train: whether to continue previous training; should be True or False.
lambda: lambda in IG-NORM, or beta in IG-SUM-NORM.
approx_factor: the m used in the attack step is m / approx_factor.
training_objective: 'ar' for IG-NORM and 'adv_ar' for IG-SUM-NORM.
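The step_size_schedule array defines a piecewise-constant learning rate. Below is a minimal sketch of how such a schedule can be interpreted, assuming the [[boundary_step, learning_rate], ...] format used by the MNIST/CIFAR10 challenge code this project builds on; the helper name and example values are illustrative only.

# Minimal sketch: interpret step_size_schedule as a piecewise-constant learning rate.
# Assumes entries of the form [boundary_step, learning_rate], e.g.
# [[0, 0.1], [40000, 0.01], [60000, 0.001]] (illustrative values).
def learning_rate_at(step, step_size_schedule):
    lr = step_size_schedule[0][1]
    for boundary, value in step_size_schedule:
        if step >= boundary:
            lr = value
    return lr

With the example schedule above, steps 0-39999 would use 0.1, steps 40000-59999 would use 0.01, and later steps would use 0.001.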
Evaluation configuration:
num_eval_examples: the number of examples to evaluate the model on.
eval_batch_size: the size of the evaluation batches.
Adversarial examples configuration:
epsilon: the maximum allowed perturbation per pixel.
num_steps or k: the number of PGD iterations used by the adversary (a sketch of the PGD loop follows this list).
step_size or a: the size of the PGD adversary steps.
random_start: specifies whether the adversary starts iterating from the natural example or from a random perturbation of it.
loss_func: the loss function to run PGD on. xent corresponds to the standard cross-entropy loss, cw corresponds to the loss function of Carlini and Wagner, ar_approx corresponds to the regularization term of our IG-NORM objective, and adv_ar_approx corresponds to our IG-SUM-NORM objective.
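To make epsilon, k, a, and random_start concrete, here is a minimal, framework-agnostic sketch of an L-infinity PGD loop; loss_and_grad is a hypothetical placeholder for whichever loss loss_func selects, and the [0, 1] pixel range is an assumption.

import numpy as np

# Illustrative L_inf PGD sketch (not the repository's implementation).
# loss_and_grad(x) is a placeholder returning (loss, gradient w.r.t. x) for the
# loss selected by loss_func; epsilon, k, a, and random_start mirror config.json.
def pgd_attack(x_nat, loss_and_grad, epsilon, k, a, random_start=True):
    if random_start:
        x = x_nat + np.random.uniform(-epsilon, epsilon, x_nat.shape)
    else:
        x = x_nat.copy()
    for _ in range(k):
        _, grad = loss_and_grad(x)
        x = x + a * np.sign(grad)                          # ascent step on the chosen loss
        x = np.clip(x, x_nat - epsilon, x_nat + epsilon)   # project back into the L_inf ball
        x = np.clip(x, 0.0, 1.0)                           # keep pixels in [0, 1] (assumption)
    return x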
Integrated gradient configuration:
num_IG_steps: the number of segments for the summation approximation of IG.
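For reference, below is a minimal sketch of the summation approximation that num_IG_steps controls: integrated gradients are approximated by summing gradients at points along the straight line from a baseline to the input. grad_fn is a hypothetical placeholder for the gradient of the model output (or loss) with respect to its input.

import numpy as np

# Illustrative sketch of the summation approximation of integrated gradients.
# grad_fn(x) is a placeholder returning the gradient of the model output (or loss)
# w.r.t. the input x; num_IG_steps is the number of segments along the path.
def integrated_gradients(x, baseline, grad_fn, num_IG_steps):
    total = np.zeros_like(x)
    for i in range(1, num_IG_steps + 1):
        point = baseline + (float(i) / num_IG_steps) * (x - baseline)
        total += grad_fn(point)
    return (x - baseline) * total / num_IG_steps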
Attribution robustness configuration:
attribution_attack_method: can be random, topK, mass_center, or target.
attribution_attack_measure: can be kendall, intersection, spearman, or mass_center (a sketch of the kendall and intersection measures follows this list).
saliency_type: can be ig or simple_gradient.
k_top: the k used for the topK attack.
eval_k_top: the k used for the top-k intersection evaluation metric.
attribution_attack_step_size: the step size of the attribution attack.
attribution_attack_steps: the number of iterations used by the attack.
attribution_attack_times: the number of times the attack is repeated.
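As intuition for the kendall and intersection measures, here is a minimal sketch of how the rank correlation and top-k intersection between the attribution maps of the natural and perturbed inputs could be computed; the function names are illustrative and SciPy is assumed to be available.

import numpy as np
from scipy.stats import kendalltau

# Illustrative sketch of two attribution-robustness measures comparing the
# attribution map of the natural input with that of the perturbed input.
def topk_intersection(attr_nat, attr_adv, k):
    # Fraction of the top-k features (by attribution magnitude) that are shared.
    top_nat = set(np.argsort(np.abs(attr_nat).ravel())[-k:])
    top_adv = set(np.argsort(np.abs(attr_adv).ravel())[-k:])
    return len(top_nat & top_adv) / float(k)

def kendall_correlation(attr_nat, attr_adv):
    # Kendall rank correlation between the two attribution maps.
    tau, _ = kendalltau(attr_nat.ravel(), attr_adv.ravel())
    return tau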
Please cite our work if you use the codebase:

@inproceedings{chen2019robust,
title={Robust attribution regularization},
author={Chen, Jiefeng and Wu, Xi and Rastogi, Vaibhav and Liang, Yingyu and Jha, Somesh},
booktitle={Advances in Neural Information Processing Systems},
pages={14300--14310},
year={2019}
}
Please refer to the LICENSE.