
MIT Introduction to Deep Learning | 6.S191

Apr 22, 2024
we want the model to work well even when it is given completely new data. To address this problem, let's briefly talk about what is called regularization. Regularization is a technique you can introduce into your training process to discourage learning overly complex models. As we've seen before, this is really critical because neural networks are extremely large models and are extremely prone to overfitting, so having techniques for regularization has enormous implications for the success of neural networks and their ability to generalize beyond the training data into the test domain. The most popular regularization technique in deep learning is called Dropout, and the idea of Dropout is actually very simple.
Let's check it out by going back to this picture of a deep neural network that we saw earlier in today's lecture. With Dropout, during training we essentially select at random some subset of the neurons in the network and eliminate them with some probability. For example, we can randomly select this subset of neurons with a probability of 50 percent, and with that probability they are randomly turned off or on in different iterations of our training. This essentially forces the neural network to, in effect, consider a set of different models: in each iteration it is exposed to a different internal model than it had in the last iteration, so it has to learn to build redundant internal pathways to process the same information and cannot rely only on the pathways it learned in previous iterations. This forces it to capture a deeper meaning within the network, and it can be extremely powerful. First, it significantly reduces the capacity of the neural network (by about 50 percent in this example), and second, it makes the network easier to train, because the number of weights that receive gradients in each iteration is also reduced, so each training iteration is faster as well.
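The lecture doesn't show code at this point; as a minimal sketch of the idea, assuming a TensorFlow/Keras model of the kind used in the course labs, Dropout layers with rate 0.5 randomly zero out half of the preceding layer's activations at each training step and are disabled automatically at inference time:

    import tensorflow as tf

    # Minimal sketch: a small fully connected network with Dropout.
    # Each Dropout layer zeroes out each unit of the previous layer
    # with probability 0.5 on every training iteration.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),  # drop each hidden unit with probability 0.5
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

At evaluation or prediction time Keras turns dropout off, so the full network is used when making predictions on new data.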
Now, as I mentioned, in each iteration we randomly remove a different set of neurons, and that helps the model generalize better. The second regularization technique, which is actually very broad and goes far beyond neural networks, is simply called early stopping. We know that the definition of overfitting is simply when our model starts to represent the training data more than the test data; that's really what overfitting boils down to. If we hold out some of the training data and don't train on it, we can use it as a kind of synthetic test set and monitor how our network is doing on this unseen chunk of data. For example, over the course of training we can plot the performance of our network on both the training set and the held-out set. As the network trains, we will see that at first both losses decrease, but there will be a point where the held-out loss plateaus and then starts to increase. This is exactly the point where you start to overfit: the test loss starts to increase because you are now starting to overfit to your training data, and this pattern continues for the rest of training. This middle point is where we should stop training, because after this point, assuming the held-out set is a valid representation of the true test set, the model's accuracy on new data will only get worse, so this is where we would want to stop advancing our model and keep its parameters. We can also see that stopping at any point before this is not good either: we would produce an underfit model when we could have had a better model on the test data. So it's a trade-off: we can't stop too late, but we also shouldn't stop too early.
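There is no code for this in the lecture either; as a minimal sketch, assuming the same Keras setup as above and hypothetical arrays x_train and y_train, early stopping can be done by holding out part of the training data, monitoring the loss on it, and stopping once that loss stops improving:

    # Minimal sketch of early stopping with a held-out validation split.
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",        # watch the loss on the held-out data
        patience=5,                # allow a few epochs without improvement before stopping
        restore_best_weights=True  # roll back to the epoch with the lowest val_loss
    )
    model.fit(x_train, y_train,          # x_train, y_train are hypothetical training arrays
              validation_split=0.2,      # hold out 20% of the training data
              epochs=100,
              callbacks=[early_stop])

The patience parameter encodes the trade-off described above: stopping at the very first uptick risks stopping too early, while a very large patience risks stopping too late.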
So I'll conclude this lecture by summarizing the three key points we've covered so far today. First, we covered the fundamental building block of all neural networks, the single neuron, the perceptron; we built that up into larger neural layers and, from there, into neural networks and deep neural networks. We learned how to train them, apply them to data sets, and backpropagate through them, and we saw some tips and tricks for optimizing these systems end to end. In the next lecture we'll hear from Ava about deep sequence modeling using RNNs, and specifically a very exciting new type of model, the transformer architecture with attention mechanisms. We'll resume class in about five minutes, after we have a chance to swap speakers. Thank you very much for all your attention.
