The Residual Network (ResNet) was proposed in 2015, after the three classic CNN architectures AlexNet, GoogLeNet, and VGG, and it took first place in the ImageNet classification task. Thanks to its simplicity and practicality, ResNet is now widely used in detection, segmentation, and recognition.
ResNet is arguably the most groundbreaking work in computer vision and deep learning of the past few years. It effectively solves the degradation problem, in which training-set accuracy drops as the network gets deeper, as shown in the diagram below:
Anyone who has worked with deep learning knows that one reason training deteriorates as the number of layers increases is the vanishing/exploding gradients problem, which hampers the convergence of the early layers' parameters from the start. However, this problem has largely been addressed by better parameter initialization and normalization techniques; interested readers can consult references [2][3][4][5][6].
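As a concrete illustration (a minimal sketch, not part of the paper or the project mentioned below), two of these fixes are readily available in tf.keras: He initialization [5] via the HeNormal initializer, and batch normalization [6] as a layer.

```python
# Illustrative only: He initialization [5] and batch normalization [6] in tf.keras.
import tensorflow as tf
from tensorflow.keras import layers

conv = layers.Conv2D(
    64, 3, padding='same', use_bias=False,
    kernel_initializer=tf.keras.initializers.HeNormal())  # weight variance scaled by 2/fan_in
bn = layers.BatchNormalization()  # normalizes activations, further easing gradient flow
```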
Even so, at greater depths (such as the 56-layer network in the figure) results still deteriorate. From AlexNet, GoogLeNet, and VGG we have already seen that depth plays a crucial role in image recognition: the deeper the network, the more levels of features it can learn automatically. So what exactly causes the deterioration?
Fig. 3: VGG-19 (left), a 34-layer plain convolutional network (middle), and a 34-layer ResNet (right).
The 19-layer VGG model on the left requires 19.6 billion FLOPs; the 34-layer plain convolutional network in the middle requires 3.6 billion FLOPs.
The 34-layer ResNet on the right also requires 3.6 billion FLOPs. In the figure, the solid arrows are identity shortcuts with no dimension change, while the dashed arrows are shortcuts that change dimensions. The comparison shows that VGG is computationally expensive despite having fewer layers, and the experimental results later show that the 34-layer ResNet outperforms the 19-layer VGG.
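To make the two arrow types concrete, here is a minimal sketch of a basic two-layer residual block in tf.keras. The function name basic_block and its arguments are illustrative, not taken from the paper's code or from the project mentioned later.

```python
import tensorflow as tf
from tensorflow.keras import layers

def basic_block(x, filters, downsample=False):
    """Two 3x3 convolutions plus a shortcut.

    Solid-line shortcut: identity mapping, no dimension change.
    Dashed-line shortcut: a strided 1x1 projection that matches the new
    spatial size and channel count.
    """
    strides = 2 if downsample else 1
    shortcut = x

    y = layers.Conv2D(filters, 3, strides=strides, padding='same', use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, strides=1, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)

    if downsample or shortcut.shape[-1] != filters:
        # dashed-line mapping: project the input to the new dimensions
        shortcut = layers.Conv2D(filters, 1, strides=strides, use_bias=False)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)

    # the residual connection: add the shortcut to the block's output
    return layers.ReLU()(layers.Add()([y, shortcut]))
```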
The figure shows that the 34-layer residual network outperforms both VGG and GoogLeNet. Among the three shortcut schemes A (zero-padding shortcuts, parameter-free), B (projection shortcuts only where dimensions change), and C (all shortcuts are projections), scheme C gives the best results, but B and C require more computation and parameters than A for only a marginal gain, so the paper's authors consider scheme A the more practical choice.
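For comparison, below is a rough sketch of a scheme A shortcut (parameter-free: plain subsampling plus zero-padded channels) next to the scheme B/C projection. The helper names and the exact subsampling detail are assumptions for illustration, not the paper's code.

```python
import tensorflow as tf
from tensorflow.keras import layers

def shortcut_option_a(x, out_filters, strides=2):
    """Scheme A: no extra parameters. Subsample spatially, then pad the
    missing channels with zeros."""
    if strides > 1:
        x = layers.MaxPooling2D(pool_size=1, strides=strides)(x)  # keep every `strides`-th pixel
    extra = out_filters - x.shape[-1]
    return tf.pad(x, [[0, 0], [0, 0], [0, 0], [0, extra]])  # zero-pad the channel axis

def shortcut_option_b(x, out_filters, strides=2):
    """Scheme B/C: a learned 1x1 projection that matches the new dimensions."""
    return layers.Conv2D(out_filters, 1, strides=strides, use_bias=False)(x)
```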
Next, we introduce the architecture used in residual networks with 50 or more layers: the deeper bottleneck architecture, which the authors designed to reduce training time. A comparison of the two block designs is shown in the figure below:
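As a minimal sketch of such a bottleneck block (again in tf.keras, with illustrative names): a 1x1 convolution reduces the channel count, a 3x3 convolution operates on the reduced representation, and a final 1x1 convolution expands the channels back by 4x, which keeps the computation low even in very deep networks.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, strides=1):
    """1x1 reduce -> 3x3 -> 1x1 expand, with a residual shortcut."""
    shortcut = x

    y = layers.Conv2D(filters, 1, strides=strides, use_bias=False)(x)        # reduce channels
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, strides=1, padding='same', use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * filters, 1, strides=1, use_bias=False)(y)          # expand channels
    y = layers.BatchNormalization()(y)

    if strides != 1 or shortcut.shape[-1] != 4 * filters:
        # projection shortcut when spatial size or channel count changes
        shortcut = layers.Conv2D(4 * filters, 1, strides=strides, use_bias=False)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)

    return layers.ReLU()(layers.Add()([y, shortcut]))
```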
ResNet solves the degradation problem of deep networks through residual learning, allowing us to train much deeper networks, which can be regarded as a historic breakthrough for deep learning. Perhaps even better ways to train deeper networks will appear soon, so let's look forward to that!
For now, you can find a sample TensorFlow implementation of a 34-layer ResNet on the CIFAR-10 dataset at Mo, the AI modeling platform; it reaches 90% accuracy on the test set and 98% on the validation set. The main program is in ResNet_Operator.py, the block structure of the network is in ResNet_Block.py, and the trained model is saved in the results folder.
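Purely as a hedged illustration of how such blocks could be stacked into a 34-layer network for 32x32 CIFAR-10 inputs, the sketch below reuses the basic_block function from earlier; it is not the code from ResNet_Operator.py or ResNet_Block.py.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_resnet34(num_classes=10):
    """Stack basic blocks in four stages of (3, 4, 6, 3) blocks, roughly the
    ResNet-34 layout, adapted to small CIFAR-10 images. Assumes the
    `basic_block` sketch defined earlier is in scope."""
    inputs = layers.Input(shape=(32, 32, 3))
    x = layers.Conv2D(64, 3, padding='same', use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for stage, (num_blocks, filters) in enumerate(zip((3, 4, 6, 3), (64, 128, 256, 512))):
        for i in range(num_blocks):
            # downsample at the first block of every stage except the first
            x = basic_block(x, filters, downsample=(i == 0 and stage > 0))
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes, activation='softmax')(x)
    return tf.keras.Model(inputs, outputs)

model = build_resnet34()
model.summary()
```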
Project source code address: /explore/5d1b0a031afd944132a0797d?type=app
References:
[1] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385, 2015.
[2] Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In Neural Networks: Tricks of the Trade, pages 9-50. Springer, 1998.
[3] X. Glorot and Y. Bengio. Understanding the difficulty of training deep feedforward neural networks. In AISTATS, 2010.
[4] A. M. Saxe, J. L. McClelland, and S. Ganguli. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv:1312.6120, 2013.
[5] K. He, X. Zhang, S. Ren, and J. Sun. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In ICCV, 2015.
[6] S. Ioffe and C. Szegedy. Batch normalization: accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
Mo (URL: momodel.cn) is an online artificial intelligence modeling platform that supports Python and helps you quickly develop, train, and deploy models.
The Mo Artificial Intelligence Club was initiated by the site's R&D and product design teams to lower the barriers to developing and using AI. The team has experience in big data processing and analysis, visualization, and data modeling, has undertaken intelligent-application projects in multiple domains, and can design and develop everything from the underlying architecture to the front end. Its main research directions are big data management and analysis and artificial intelligence technology, as a way to promote data-driven scientific research.