What is backpropagation really doing? | Chapter 3, Deep learning
Jun 03, 2021

Here we dig into the mechanisms that make this happen: what backpropagation actually does. By the way, in practice it takes computers a long time to add up the influence of every single training example on every gradient descent step, so here is what is normally done instead. You shuffle your training data and divide it into a bunch of mini-batches, let's say each one containing 100 training examples. Then you compute a step according to each mini-batch. It is not the actual gradient of the cost function, which depends on all of the training data rather than this tiny subset, so it is not the most efficient step downhill, but each mini-batch gives you a pretty good approximation and, more importantly, a significant computational speedup. If you were to plot the trajectory of your network on the relevant cost surface, it would look a little more like a drunk man stumbling aimlessly down a hill but taking quick steps, rather than a carefully calculating man who determines the exact downhill direction before taking a very slow and deliberate step in that direction. This technique is known as stochastic gradient descent. There is a lot going on here, so let's just sum it up for ourselves, shall we? Backpropagation is the algorithm for determining how a single training example would like to nudge the weights and biases, not just in terms of whether they should go up or down, but in terms of what relative proportions of those changes cause the most rapid decrease in the cost.
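The shuffle-and-split procedure just described can be sketched in a few lines. This is a minimal illustration, not code from the video: the names `sgd_step` and `grad_fn` are made up for the example, and `grad_fn` stands in for whatever routine (such as backpropagation) returns the gradient of the cost averaged over a batch.

```python
import numpy as np

def sgd_step(params, grad_fn, X, y, batch_size=100, lr=0.1):
    """One pass (epoch) of mini-batch stochastic gradient descent.

    grad_fn(params, X_batch, y_batch) is assumed to return the gradient
    of the cost averaged over that mini-batch, e.g. computed by backprop.
    """
    n = len(X)
    idx = np.random.permutation(n)           # shuffle the training data
    for start in range(0, n, batch_size):    # split it into mini-batches
        batch = idx[start:start + batch_size]
        grad = grad_fn(params, X[batch], y[batch])
        params = params - lr * grad          # step using this batch's gradient
    return params
```

Each step uses only the 100 or so examples in one mini-batch, so it is a noisy but cheap estimate of the true downhill direction.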
A true gradient descent step would involve doing this for all of your tens of thousands of training examples and averaging the desired changes that you get, but that is computationally slow. So instead you randomly subdivide the data into mini-batches and compute each step with respect to one mini-batch. By repeatedly going through all of the mini-batches and making these adjustments, you will converge towards a local minimum of the cost function, which is to say your network will end up doing a really good job on the training examples. So with all of that said, every line of code that would go into implementing backprop actually corresponds with something you have now seen, at least in informal terms. But sometimes knowing what the math is doing is only half the battle, and just representing the damn thing is where it gets all muddled and confusing. So for those of you who want to dig deeper, the next video goes through the same ideas that were just presented here, but in terms of the underlying calculus, which should hopefully make it a little more familiar as you see the topic in other resources. Before that, one thing worth emphasizing is that for this algorithm to work, and this goes for all sorts of machine learning
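The claim that mini-batch steps approximate true gradient descent can be checked numerically. In the toy example below (all names and numbers are illustrative, not from the video), the cost is a mean over examples, so a single mini-batch gradient is only close to the full gradient, while the average of the gradients over a full partition into equal mini-batches recovers it exactly.

```python
import numpy as np

np.random.seed(1)
X = np.random.randn(1000)   # toy "training data"
w = 0.7                     # current parameter value

# Toy cost C(w) = mean_i (w - x_i)^2, so the true gradient is 2*(w - mean(X)).
full_grad = 2 * (w - X.mean())

# Gradient estimated from one random mini-batch of 100 examples: close, not exact.
batch = np.random.choice(len(X), size=100, replace=False)
mini_grad = 2 * (w - X[batch].mean())

# Averaging the gradients over a full partition into 10 equal mini-batches
# gives back the true gradient exactly (up to floating-point error).
batches = np.random.permutation(len(X)).reshape(10, 100)
avg_grad = np.mean([2 * (w - X[b].mean()) for b in batches])
```

This is why one pass through all the mini-batches behaves, on average, like the true gradient descent step, just taken in many quick noisy pieces.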
beyond just neural networks, you need a lot of training data. In our case, the one thing that makes handwritten digits such a nice example is the existence of the MNIST database, with so many examples that have been labeled by humans. So a common challenge that those of you working in machine learning will be familiar with is just getting the labeled training data you actually need, whether that is getting people to label tens of thousands of images, or whatever other data type you might be dealing with.