Stochastic gradient descent (SGD), mini-batch gradient descent (mini-batch GD), gradient descent with momentum (Momentum), root mean square propagation (RMSprop), and adaptive moment estimation (Adam).
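To make the differences between these optimizers concrete, here is a minimal sketch of their update rules in plain Python. Everything in it is an assumption made for illustration: the toy objective f(w) = (w - 3)^2, the function names, and the hyperparameter values. The gradient here is exact. In real SGD or mini-batch GD, the gradient would instead be estimated from one sample or a small batch, while the update rule itself stays the same.

```python
import math

def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, which is minimized at w = 3.
    return 2.0 * (w - 3.0)

def sgd_step(w, g, state, lr=0.1):
    # Plain gradient step; with a noisy per-sample or per-batch g this is SGD / mini-batch GD.
    return w - lr * g

def momentum_step(w, g, state, lr=0.1, beta=0.9):
    # Accumulate a running sum of past gradients (the "velocity") and step along it.
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def rmsprop_step(w, g, state, lr=0.01, rho=0.9, eps=1e-8):
    # Scale the step by a running root mean square of recent gradients.
    state["s"] = rho * state.get("s", 0.0) + (1.0 - rho) * g * g
    return w - lr * g / (math.sqrt(state["s"]) + eps)

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # Combine a momentum-style first moment with an RMSprop-style second moment,
    # then correct both for their bias toward zero in early steps.
    state["t"] = state.get("t", 0) + 1
    state["m"] = b1 * state.get("m", 0.0) + (1.0 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1.0 - b2) * g * g
    m_hat = state["m"] / (1.0 - b1 ** state["t"])
    v_hat = state["v"] / (1.0 - b2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

def run(step, n_steps=2000, w0=0.0):
    # Apply one optimizer repeatedly, starting from w0, and return the final w.
    w, state = w0, {}
    for _ in range(n_steps):
        w = step(w, grad(w), state)
    return w

if __name__ == "__main__":
    for step in (sgd_step, momentum_step, rmsprop_step, adam_step):
        print(step.__name__, run(step))
```

With a constant learning rate, RMSprop normalizes the gradient, so it tends to hover within a small band around the minimum rather than settling exactly on it. That is why the sketch gives it a smaller default learning rate.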
Deep learning differs from traditional shallow learning in that it:
(1) emphasizes the depth of the model structure, which usually has 5, 6, or even 10 hidden layers.
(2) highlights the importance of feature learning. That is, layer-by-layer feature transformation maps the representation of a sample from the original space into a new feature space, which makes classification or prediction easier. Compared with constructing features from hand-crafted rules, learning features from big data better captures the rich internal information of the data.
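The claim in point (2), that a feature transformation can make classification easier, can be illustrated with the XOR problem. The four XOR points cannot be separated by a single linear function in the original space, but they can be after one hidden layer. The sketch below is an illustration only: the weights are picked by hand (a real network would learn them from data), and the function names are hypothetical.

```python
def relu(z):
    # Rectified linear unit: passes positive values, clips negatives to zero.
    return max(0.0, z)

def hidden_features(x1, x2):
    # Hand-picked hidden layer (an assumption for illustration, not learned):
    # maps the original 2-D input into a new 2-D feature space.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    return h1, h2

def classify(x1, x2):
    # A purely linear readout on the new features reproduces XOR.
    h1, h2 = hidden_features(x1, x2)
    return h1 - 2.0 * h2

if __name__ == "__main__":
    for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print((x1, x2), "->", hidden_features(x1, x2), "->", classify(x1, x2))
```

In the new feature space the inputs (0, 1) and (1, 0) collapse onto the same point, which is what lets a linear readout separate the two classes.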
By designing an appropriate number of neuron computing nodes and multiple computing layers, and by choosing a suitable input layer and output layer, the network learns the functional relationship between input and output through training and optimization.
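As a rough sketch of this process, the following plain-Python example builds a network with one input, one hidden layer of tanh units, and one output, then fits it to samples of y = x^2 by full-batch gradient descent. The target function, layer size, learning rate, and all names are assumptions chosen for illustration. A practical model would use a deep learning framework and more layers.

```python
import math
import random

def init_params(n_hidden, rng):
    # Small random weights for the hidden layer and the output layer.
    return {
        "w1": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b1": [0.0] * n_hidden,
        "w2": [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)],
        "b2": 0.0,
    }

def forward(p, x):
    # Input layer -> hidden layer (tanh) -> linear output layer.
    h = [math.tanh(w * x + b) for w, b in zip(p["w1"], p["b1"])]
    y = sum(w * hj for w, hj in zip(p["w2"], h)) + p["b2"]
    return h, y

def mean_squared_error(p, data):
    return sum((forward(p, x)[1] - t) ** 2 for x, t in data) / len(data)

def train(data, n_hidden=8, lr=0.05, epochs=1000, seed=0):
    rng = random.Random(seed)
    p = init_params(n_hidden, rng)
    losses = []
    for _ in range(epochs):
        g_w1 = [0.0] * n_hidden
        g_b1 = [0.0] * n_hidden
        g_w2 = [0.0] * n_hidden
        g_b2 = 0.0
        # Backpropagation: accumulate gradients of the mean squared error.
        for x, t in data:
            h, y = forward(p, x)
            d_y = 2.0 * (y - t) / len(data)
            for j in range(n_hidden):
                d_pre = d_y * p["w2"][j] * (1.0 - h[j] ** 2)
                g_w2[j] += d_y * h[j]
                g_w1[j] += d_pre * x
                g_b1[j] += d_pre
            g_b2 += d_y
        # Gradient descent update on every parameter.
        for j in range(n_hidden):
            p["w1"][j] -= lr * g_w1[j]
            p["b1"][j] -= lr * g_b1[j]
            p["w2"][j] -= lr * g_w2[j]
        p["b2"] -= lr * g_b2
        losses.append(mean_squared_error(p, data))
    return p, losses

# Eleven evenly spaced samples of the target relationship y = x^2 on [-1, 1].
DATA = [(i / 5.0 - 1.0, (i / 5.0 - 1.0) ** 2) for i in range(11)]

if __name__ == "__main__":
    params, losses = train(DATA)
    print("first loss:", losses[0], "last loss:", losses[-1])
```

The loss recorded after the last epoch should be lower than the one after the first, which is the sense in which the network "learns" the input-output relationship from the samples.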