Compiled by: Li Ke, Zhang, Liu Junhuan
Interpretability remains one of the biggest challenges in applying modern deep learning. Recent advances in computation and deep learning research allow us to build extremely complex models with thousands of hidden layers and tens of millions of neurons. Remarkably, these frontier deep neural networks are relatively simple to construct, yet understanding how they create and use knowledge is still a challenge.
Recently, researchers from the Google Brain team published a paper proposing a new method called Concept Activation Vectors (CAVs), which offers a new perspective on the interpretability of deep learning models.
To understand CAVs, we first need to understand the nature of the interpretability problem in deep learning models. In the current generation of deep learning technology there is a persistent tension between a model's accuracy and its interpretability: the ability to perform complex knowledge tasks versus the ability to understand how those tasks are performed. Knowledge versus control, performance versus verifiability, efficiency versus simplicity: every choice is, in effect, a trade-off between accuracy and interpretability.
Do you care about getting the best result, or about understanding how that result was produced? This is a question data scientists must answer in every deep learning scenario. Many deep learning techniques are complex by nature; although they are accurate in many scenarios, they can be very difficult to explain. If we plot some of the best-known deep learning models on an accuracy-interpretability chart, we get something like the following:
Interpretability in deep learning models is not a single concept; it can be understood at several levels:
Achieving interpretability at each of the levels defined in the figure above requires several basic building blocks. In a recent paper, Google researchers outlined what they consider the fundamental components of interpretability.
Google summarized the principles of interpretability as follows:
- Understanding the function of hidden layers: most of the knowledge in a deep learning model is formed in the hidden layers. Understanding the macro-level function of the different hidden layers is essential for explaining a deep learning model.
- Understanding how nodes are activated: the key to interpretability is not understanding the function of every individual neuron, but understanding groups of interconnected neurons that fire together at the same spatial location. Segmenting a network into groups of interconnected neurons lets us understand its function at a simpler level of abstraction.
- Understanding how concepts are formed: understanding how a deep neural network forms the individual concepts that make up the final output is another key building block of interpretability.
These principles are the theoretical basis behind Google's new CAV technology.
Following the ideas discussed above, interpretability is usually framed as describing a deep learning model's predictions in terms of its input features. Logistic regression is a classic example: its coefficient weights are commonly interpreted as the importance of each feature. However, most deep learning models operate on features such as pixel values, which do not correspond to high-level concepts that humans can easily understand. Moreover, the model's internal values (for example, neuron activations) are also hard to interpret. Techniques such as saliency maps can effectively measure the importance of specific pixel regions, but they cannot relate those regions to higher-level concepts.
The core idea behind CAVs is to measure how relevant a concept is to the model's output. The CAV for a concept is a vector pointing in the direction of the values (for example, the activations) of that concept's set of examples. In the paper, the Google research team describes a linear interpretability method called Testing with CAVs (TCAV), which uses directional derivatives to quantify how sensitive a prediction is to a high-level concept represented by a CAV. The team set four goals for TCAV:
- Accessibility: requires little or no machine learning expertise from the user.
- Customization: adapts to any concept (such as gender) and is not limited to concepts considered during training.
- Plug-in readiness: works without retraining or modifying the machine learning model.
- Global quantification: can interpret entire classes or sets of examples with a single quantitative measure, not just individual data inputs.
To achieve these goals, the TCAV method consists of three basic steps:
1) Define the relevant concepts for the model.
2) Measure how sensitive the predictions are to these concepts.
3) Infer a global, quantitative explanation of the relative importance of each concept to each prediction class of the model.
The first step of TCAV is to define the concepts of interest. To do this, TCAV selects a set of examples that represent the concept, or uses an independent dataset labeled with the concept. The CAV is then learned by training a linear classifier to distinguish, at a given layer, the activations produced by the concept's examples from the activations produced by other examples; the CAV is the vector orthogonal to the classifier's decision boundary.
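As a concrete illustration, here is a minimal sketch of this step in Python, assuming the layer activations for concept examples and random counterexamples have already been extracted as NumPy arrays (the `learn_cav` helper and its inputs are hypothetical, not Google's implementation):

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

def learn_cav(concept_acts, random_acts):
    """Learn a CAV from activations at one layer.

    concept_acts: (n_concept, d) activations of the concept's examples
    random_acts:  (n_random, d)  activations of random counterexamples
    Returns a unit vector pointing from the random examples toward the
    concept examples.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])

    # A linear classifier separates concept activations from random ones;
    # the vector orthogonal to its decision boundary is taken as the CAV.
    clf = SGDClassifier(loss="hinge", alpha=0.01, max_iter=1000, tol=1e-3)
    clf.fit(X, y)

    cav = clf.coef_.flatten()
    return cav / np.linalg.norm(cav)
```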
The second step is to compute a TCAV score, which quantifies how sensitive a prediction is to a particular concept. TCAV uses directional derivatives to measure how sensitive the model's prediction is to movement of the activations, at a given layer, in the direction of the concept.
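In the paper, the conceptual sensitivity of an input is the directional derivative of the class logit along the CAV, and the TCAV score for a class is the fraction of its inputs with positive sensitivity. A minimal sketch, assuming the gradients of the class logit with respect to the layer's activations have already been computed with an autodiff framework:

```python
import numpy as np

def tcav_score(grads, cav):
    """grads: (n_examples, d) gradients of the class logit with respect to the
    layer's activations, one row per input of the class being explained.
    cav:   (d,) unit concept activation vector learned at the same layer.
    """
    # Directional derivative along the CAV: how much moving the activations
    # toward the concept would increase the class logit.
    sensitivities = grads @ cav
    # TCAV score: fraction of the class's inputs with positive sensitivity.
    return float(np.mean(sensitivities > 0))
```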
The last step is to assess the global relevance of the learned CAVs and to avoid relying on meaningless ones. A weakness of the TCAV technique is that it can learn a meaningless CAV: a CAV can be obtained even from a set of randomly chosen images, and testing against such a random "concept" is unlikely to be meaningful. To address this, TCAV introduces a statistical significance test, evaluating the CAVs over many training runs (typically 500). The underlying idea is that a meaningful concept should yield consistent TCAV scores across training runs.
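A minimal sketch of such a test, assuming TCAV scores have already been collected both from CAVs trained on the real concept (against different random counterexample sets) and from CAVs trained on purely random "concepts"; the two-sided t-test here is one reasonable way to check for a consistent difference:

```python
from scipy.stats import ttest_ind

def concept_is_significant(concept_scores, random_scores, alpha=0.05):
    """concept_scores: TCAV scores from CAVs of the real concept,
    each trained against a different random counterexample set.
    random_scores: TCAV scores from CAVs of random 'concepts'.

    A meaningful concept should produce scores that differ consistently
    from the random baseline.
    """
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha
```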
The team ran several experiments to evaluate the usefulness of TCAV compared with other interpretability methods. In one of the most compelling tests, they examined whether saliency maps could tell people whether the caption concept or the image concept mattered more to a model predicting the taxi class. The saliency map output looked like this:
Using these images as the test dataset, the Google Brain team ran an experiment with 50 participants on Amazon Mechanical Turk. Each participant performed a series of six tasks (3 object classes x 2 types of saliency map), all for a single model, presented in random order.
In each task, the participant first saw four images together with their corresponding saliency masks. They then rated how important the image was to the model (on a 10-point scale), how important the caption was to the model (10-point scale), and how confident they were in their answers (5-point scale). In total, the participants rated 60 unique images (120 unique saliency maps).
The ground truth of the experiment was that the image concept is more relevant than the caption concept. When looking at the saliency maps, however, people judged the caption concept to be more important (for the model with 0% noise) or could not tell the difference (for the model with 100% noise). In contrast, the TCAV results correctly showed that the image concept was more important.
TCAV is one of the most innovative approaches to neural network interpretation proposed in recent years. The original code is available on GitHub, and the mainstream deep learning frameworks may well adopt these ideas in the near future.
Related reports:
/this-new-Google-technique-help-us-how-understand-neural-networks-thinking-229f783300