Either the numpy part or the TensorFlow part had to be the source of the problem, so the TensorFlow code and the numpy code were debugged and run separately. It turned out that converting arrays with np.array() was extremely slow.
Solution:
Change numpy version
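A rough way to confirm that the slowdown really comes from the array conversion is to time np.array() in isolation; the array size below is only an illustrative assumption, not the original data.

```python
import time
import numpy as np

# Hypothetical reproduction: time np.array() on a large nested Python list.
data = [[float(i + j) for j in range(1000)] for i in range(1000)]

start = time.time()
arr = np.array(data)  # the conversion suspected to be the bottleneck
print("np.array() took %.3f s" % (time.time() - start))
print(arr.shape)
```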
TensorFlow accelerated optimization methods
1. Stochastic gradient descent (SGD)
Feed the data into the neural network in small batches, one batch per update
W += - Learning rate * dx
Disadvantages: difficult to choose the right learning rate
Slow
Easily converges to a local optimum, and can be trapped in saddle points in some cases
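A minimal numpy sketch of the SGD update rule above; the weights, gradient, and learning rate are placeholder values, not taken from any particular model.

```python
import numpy as np

def sgd_step(W, dx, learning_rate=0.01):
    """Plain SGD: move against the gradient of the current mini-batch."""
    W += -learning_rate * dx
    return W

# Toy usage with a random stand-in for a mini-batch gradient.
W = np.zeros(3)
dx = np.array([0.5, -0.2, 0.1])
W = sgd_step(W, dx)
print(W)
```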
2. momentum
Mimics the notion of momentum in physics, accumulating the previous momentum to replace the true gradient. (Exploits slope inertia)
m = b1 * m - Learning rate * dx
W += m
Features: accelerates SGD in the direction of interest, suppresses oscillations, and thus speeds up convergence
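A corresponding numpy sketch of the momentum update, using the same b1 and learning-rate names as the formulas above; the values are illustrative only.

```python
import numpy as np

def momentum_step(W, m, dx, learning_rate=0.01, b1=0.9):
    """Momentum: accumulate a velocity m and update the weights with it."""
    m = b1 * m - learning_rate * dx
    W += m
    return W, m

W, m = np.zeros(3), np.zeros(3)
dx = np.array([0.5, -0.2, 0.1])
W, m = momentum_step(W, m, dx)
```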
3. Adagrad
Each parameter gets its own learning rate (like walking in shoes that punish overly large steps)
v += dx^2
W += -Learning rate * dx / √v
Characteristics: amplifies the gradient early on, constrains it later, and is well suited to sparse gradients
Disadvantages: still relies on a manually set global learning rate, and in the mid-to-late stages the accumulated squared gradients in the denominator grow ever larger, shrinking the updates and ending training early
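A numpy sketch of the Adagrad update; the small eps term is a common stabilizer against division by zero that the formulas above omit.

```python
import numpy as np

def adagrad_step(W, v, dx, learning_rate=0.01, eps=1e-8):
    """Adagrad: per-parameter step size scaled by accumulated squared gradients."""
    v += dx ** 2
    W += -learning_rate * dx / (np.sqrt(v) + eps)  # eps avoids division by zero
    return W, v

W, v = np.zeros(3), np.zeros(3)
dx = np.array([0.5, -0.2, 0.1])
W, v = adagrad_step(W, v, dx)
```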
4. RMSProp
Combines the exponential-averaging idea behind momentum with Adagrad's per-parameter gradient scaling
v = b1 * v + (1 - b1) * dx^2
W += -Learning rate * dx / √v
Characteristics: still depends on a global learning rate
Good for non-stationary objectives, so it works well for RNNs
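A numpy sketch of the RMSProp update, again with an assumed eps stabilizer added to the denominator.

```python
import numpy as np

def rmsprop_step(W, v, dx, learning_rate=0.001, b1=0.9, eps=1e-8):
    """RMSProp: exponential moving average of squared gradients instead of a full sum."""
    v = b1 * v + (1 - b1) * dx ** 2
    W += -learning_rate * dx / (np.sqrt(v) + eps)
    return W, v

W, v = np.zeros(3), np.zeros(3)
dx = np.array([0.5, -0.2, 0.1])
W, v = rmsprop_step(W, v, dx)
```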
5. Adam (fast and good)
m = b1 * m + (1 - b1) * dx
v = b2 * v + (1 - b2) * dx^2
W += -Learning rate * m / √v
Characteristics: combines Adagrad's ability to handle sparse gradients with RMSProp's ability to handle non-stationary objectives
Low memory requirements
Compute different adaptive learning rates for different parameters
Also suitable for most non-convex optimization problems, as well as for large datasets and high-dimensional parameter spaces
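A numpy sketch of the Adam update exactly as written above; note that the full Adam algorithm also applies bias correction to m and v, which this simplified version (and the formulas above) leaves out.

```python
import numpy as np

def adam_step(W, m, v, dx, learning_rate=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: momentum-style first moment m plus RMSProp-style second moment v."""
    m = b1 * m + (1 - b1) * dx
    v = b2 * v + (1 - b2) * dx ** 2
    W += -learning_rate * m / (np.sqrt(v) + eps)  # eps is a stabilizer not in the note
    return W, m, v

W, m, v = np.zeros(3), np.zeros(3), np.zeros(3)
dx = np.array([0.5, -0.2, 0.1])
W, m, v = adam_step(W, m, v, dx)
```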
6. Optimizers
TensorFlow provides these update rules as optimizer classes that control how the learning rate is applied during training
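For example, assuming the TensorFlow 2.x Keras API (in TensorFlow 1.x the equivalent classes live under tf.train), an optimizer is chosen and passed to the model like this; the model itself is just a placeholder.

```python
import tensorflow as tf

# Placeholder model; the point is only how an optimizer is selected and attached.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1),
])

# Any of the methods above can be swapped in: SGD(momentum=0.9), Adagrad(), RMSprop(), Adam().
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
model.compile(optimizer=optimizer, loss="mse")
```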