Network Dissection: Quantifying Interpretability of Deep Visual Representations

David Bau∗, Bolei Zhou∗, Aditya Khosla, Aude Oliva, and Antonio Torralba
CSAIL, MIT
{davidbau, bzhou, khosla, oliva, torralba}@csail.mit.edu

Abstract

We propose a general framework called Network Dissection for quantifying the interpretability of latent representations of CNNs by evaluating the alignment between individual hidden units and a set of semantic concepts. Given any CNN model, the proposed method draws on a broad data set of visual concepts to score the semantics of hidden units at each intermediate convolutional layer. The units with semantics are given labels across a range of objects, parts, scenes, textures, materials, and colors. We use the proposed method to test the hypothesis that interpretability of units is equivalent to random linear combinations of units; we then apply the method to compare the latent representations of various networks when trained to solve different supervised and self-supervised training tasks.
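To make the alignment scoring concrete, here is a minimal sketch of the kind of unit-concept agreement measure the abstract alludes to: a dataset-wide intersection-over-union between a unit's thresholded activation maps and a concept's segmentation masks. It assumes activation maps have already been extracted and upsampled to mask resolution; the function name `dissect_unit` and its arguments are illustrative, not the authors' released code.

```python
import numpy as np

def dissect_unit(act_maps, concept_masks, quantile=0.995):
    """Score how well one hidden unit aligns with one visual concept.

    act_maps:      list of 2-D activation maps for the unit, one per image,
                   upsampled to the resolution of the concept masks.
    concept_masks: list of binary segmentation masks for the concept on
                   the same images.
    quantile:      activations above this dataset-wide quantile count as
                   "firing" (here the top 0.5%, an assumed setting).

    Returns the IoU between the unit's firing region and the concept's
    segmentation, accumulated over the whole dataset.
    """
    # Pick one threshold per unit from its activations across all images.
    t_k = np.quantile(np.concatenate([a.ravel() for a in act_maps]), quantile)

    inter = union = 0.0
    for act, mask in zip(act_maps, concept_masks):
        fired = act > t_k
        inter += np.logical_and(fired, mask).sum()
        union += np.logical_or(fired, mask).sum()
    return inter / union if union > 0 else 0.0
```

A unit would then be labeled with the concept (object, part, scene, texture, material, or color) that maximizes this score, provided the score clears some minimum threshold; the specific broad concept dataset and cutoff are described in the body of the paper, not reproduced here.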