TensorFlow optimization makes no progress
Reason analysis:

Either NumPy or TensorFlow had to be the source of the problem, so the TensorFlow code and the NumPy code were debugged and run separately. This showed that converting arrays with np.array() was extremely slow.

Solution:

Change the installed NumPy version
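
As a rough illustration of the debugging step above, a minimal sketch that times np.array() in isolation and prints the installed NumPy version (the toy data and its size are assumptions, not from the original text):

```python
import time
import numpy as np

# Print the installed NumPy version (the version was the suspected culprit here).
print("numpy version:", np.__version__)

# Time the np.array() conversion in isolation to confirm it is the bottleneck.
data = [[float(i + j) for j in range(100)] for i in range(10000)]
start = time.time()
arr = np.array(data)
print("np.array() took %.3f s" % (time.time() - start))
```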

Methods for accelerating TensorFlow optimization

1. Stochastic gradient descent (SGD)

Feed the data into the neural network in small batches for computation

W += - Learning rate * dx

Disadvantages: it is difficult to choose a suitable learning rate

Convergence is slow

It easily converges to a local optimum and can get trapped in saddle points in some cases
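
A minimal NumPy sketch of the mini-batch SGD update above, applied to a toy linear-regression problem (the data, batch size, and learning rate are assumptions, not part of the original text):

```python
import numpy as np

np.random.seed(0)
X = np.random.randn(1000, 3)                  # toy features
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * np.random.randn(1000)  # toy targets

W = np.zeros(3)
learning_rate = 0.01
batch_size = 32

for epoch in range(10):
    order = np.random.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = order[start:start + batch_size]
        pred = X[batch] @ W
        # dx: gradient of the mean squared error w.r.t. W on this mini-batch
        dx = 2 * X[batch].T @ (pred - y[batch]) / len(batch)
        W += -learning_rate * dx              # the SGD update rule from above
```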

2. Momentum

Mimics the concept of momentum in physics: it accumulates the previous updates (the momentum) and uses them in place of the raw gradient, exploiting the inertia of the slope.

m = b1 * m - Learning rate * dx

W += m

Features: accelerates SGD in the direction of interest, suppresses oscillations, and thus speeds up convergence
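
A minimal NumPy sketch of the momentum update on a toy quadratic loss 0.5 * ||W - target||^2, whose gradient is simply W - target (the target, b1, and learning rate values are assumptions):

```python
import numpy as np

target = np.array([3.0, -1.0])
W = np.zeros(2)
m = np.zeros(2)
b1 = 0.9                             # momentum coefficient
learning_rate = 0.1

for step in range(100):
    dx = W - target                  # gradient of the toy loss
    m = b1 * m - learning_rate * dx  # accumulate "velocity" from past gradients
    W += m                           # move by the accumulated momentum

print(W)                             # close to target after enough steps
```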

3. Adagrad

Each parameter update has its own learning rate (like making the parameters walk in uncomfortable shoes, so overly large steps are penalized)

v += dx^2

W += -Learning rate * dx / √v

Characteristics: amplifies the gradient in the early stage and constrains it in the later stage; well suited to sparse gradients

Disadvantages: relies on a manually set global learning rate, and the accumulated squared gradients in the denominator grow ever larger in the mid-to-late stages, which shrinks the updates and effectively ends training early
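
A minimal NumPy sketch of the Adagrad update on the same kind of toy quadratic loss (the learning rate and the eps guard against division by zero are assumptions); note how the growing v steadily shrinks the step size:

```python
import numpy as np

target = np.array([3.0, -1.0])
W = np.zeros(2)
v = np.zeros(2)
learning_rate = 0.5
eps = 1e-8

for step in range(500):
    dx = W - target
    v += dx ** 2                                   # accumulate squared gradients
    W += -learning_rate * dx / (np.sqrt(v) + eps)  # per-parameter step size

print(W)
```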

4. RMSProp

Combines the advantages of Momentum and Adagrad

v = b1 * v + (1 - b1) * dx^2

W += -Learning rate * dx / √v

Characteristics: Depends on the global learning rate

Good for non-stationary objectives; works well for RNNs
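
A minimal NumPy sketch of the RMSProp update; the decay rate b1, learning rate, and eps are assumptions. Unlike Adagrad, v is a moving average, so the step size does not shrink toward zero just because training runs longer:

```python
import numpy as np

target = np.array([3.0, -1.0])
W = np.zeros(2)
v = np.zeros(2)
b1 = 0.9
learning_rate = 0.1
eps = 1e-8

for step in range(200):
    dx = W - target
    v = b1 * v + (1 - b1) * dx ** 2                # moving average of squared gradients
    W += -learning_rate * dx / (np.sqrt(v) + eps)  # normalised step

print(W)
```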

5. Adam (fast and good)

m = b1 * m + (1 - b1) * dx

v = b2 * v + (1 - b2) * dx^2

W += -Learning rate * m / √v

Characteristics: combines Adagrad's strength at handling sparse gradients with RMSProp's strength at handling non-stationary objectives

Low memory requirements

Computes individual adaptive learning rates for different parameters

Also suitable for most non-convex optimization problems, as well as for large datasets and high-dimensional parameter spaces
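
A minimal NumPy sketch of the Adam update exactly as written above, i.e. without the bias-correction terms of the full algorithm (b1, b2, learning rate, and eps are assumptions):

```python
import numpy as np

target = np.array([3.0, -1.0])
W = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
b1, b2 = 0.9, 0.999
learning_rate = 0.1
eps = 1e-8

for step in range(300):
    dx = W - target
    m = b1 * m + (1 - b1) * dx                    # moving average of gradients
    v = b2 * v + (1 - b2) * dx ** 2               # moving average of squared gradients
    W += -learning_rate * m / (np.sqrt(v) + eps)  # combined update

print(W)
```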

6. Optimizers

Optimizers are used to adjust the learning rate and apply the update rules above when training the network weights
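
As a sketch of how these update rules map onto optimizer classes, assuming the TensorFlow 1.x tf.train API (in TensorFlow 2 the equivalents live under tf.keras.optimizers); the learning-rate values here are assumptions:

```python
import tensorflow as tf

# One optimizer class per update rule discussed above (TF 1.x names).
sgd      = tf.train.GradientDescentOptimizer(learning_rate=0.01)
momentum = tf.train.MomentumOptimizer(learning_rate=0.01, momentum=0.9)
adagrad  = tf.train.AdagradOptimizer(learning_rate=0.01)
rmsprop  = tf.train.RMSPropOptimizer(learning_rate=0.001)
adam     = tf.train.AdamOptimizer(learning_rate=0.001)

# Typical use: build a training op that minimises a loss tensor.
# train_op = adam.minimize(loss)
```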