What is AdaBoost (BOOSTING TECHNIQUES)

Apr 08, 2024
Hello everyone, my name is Krishna and welcome to my YouTube channel. Today we are going to see what the boosting technique is, and we are going to look at the first algorithm, which is called the AdaBoost algorithm. In the previous video I already showed you how bagging techniques really work, and I also took the examples of a regression and a random forest classifier. So let's go ahead and try to understand what boosting is, and let's understand how boosting techniques work.

Let's consider this to be our data set. What we do is create sequential base learners. First of all, initially, some of these particular records will be passed to the first base learner, and this base learner can be any model at this point.
Once it is trained — well, after the training — what we will do is pass all these records through it and see how this particular model has performed. Now suppose this record, this record, and this record were classified incorrectly. If a record is classified incorrectly, then only those records will be passed to the next model, which is created sequentially and which is my next base learner. Okay, so when those particular records are passed to that base learner, it basically means the model will mostly be trained on those records. Then, similarly, suppose this base learner also produces some incorrectly classified records; those errors will be passed on to the next base learner — this is my base learner 3 — and this will continue, unless and until we have created the number of base learners that we specified. This is the process of how a boosting technique works.
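As a rough illustration of this sequential idea, here is a minimal sketch in Python, assuming scikit-learn and a toy data set (both are my additions, not from the video). Taking the narration literally, each new base learner is trained only on the records the previous one got wrong:

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# toy data standing in for "this particular data set"
X, y = make_classification(n_samples=200, random_state=0)

models = []
X_curr, y_curr = X, y
n_learners = 3  # the number of base learners we decided to create

for _ in range(n_learners):
    # train the next sequential base learner
    model = DecisionTreeClassifier(max_depth=1).fit(X_curr, y_curr)
    models.append(model)
    # find the records this learner classified incorrectly ...
    wrong = model.predict(X_curr) != y_curr
    if not wrong.any():
        break  # nothing left to correct
    # ... and pass only those records to the next base learner
    X_curr, y_curr = X_curr[wrong], y_curr[wrong]
```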

But when we talk about AdaBoost, AdaBoost is a little bit different: there is something called weights that will be assigned here, and that is what we are going to discuss right now. So let's start the discussion of AdaBoost.
Let's take an example. Suppose in our data set I have features F1, F2, F3 and one output column, and suppose I have 7 records. Okay, I have 7 records. In the first step, every record will get a sample weight, so I'm going to create another column called sample weight. Now, why do we require this sample weight? I'll explain in a moment. Initially, to assign the sample weight, we apply the formula w = 1/n, where n is nothing but the number of records. So I basically have 1/7 for each record: initially, all records are assigned the same weight.
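Step one as code — a quick sketch of my own, not code from the video — assigning w = 1/n to each of the 7 records:

```python
import numpy as np

n = 7                                # number of records
sample_weights = np.full(n, 1 / n)   # every record starts at 1/7
print(sample_weights)                # seven values of ~0.1429
print(sample_weights.sum())          # the initial weights sum to 1
```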
what is adaboost boosting techniques
Now, that was step one. In step two, what we do is create our first base learner, which we need to create sequentially. We will create our first base learner with the help of decision trees; in AdaBoost, all the base learners are decision trees. Now, when we talk about these decision trees, it is not like the decision trees we create in random forests. Here, the decision tree is created with just a single depth: one split with two leaf nodes. These depth-one trees are basically called stumps. So what we do is consider F1 and create a stump, then consider F2 and create another stump — for each and every feature we create a stump — and finally for F3 we also create one. But from these stumps, I first have to select my first decision tree base learner.
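In code, a stump is simply a decision tree capped at depth one. A minimal sketch with scikit-learn (an assumption on my side; the video draws the stumps by hand) that builds one candidate stump per feature:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# tiny stand-in data set: 7 records, features F1, F2, F3, binary output
rng = np.random.default_rng(0)
X = rng.random((7, 3))
y = np.array([1, 0, 1, 1, 0, 1, 0])

# one candidate stump per feature: each sees a single column only
stumps = [
    DecisionTreeClassifier(max_depth=1).fit(X[:, [j]], y)  # one split, two leaves
    for j in range(X.shape[1])
]
```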
what is adaboost boosting techniques
How do I select it? We have two measures, called entropy and the Gini coefficient; we can use either one. If the entropy is lowest for a particular stump, then we select that decision tree as our first sequential base learner. So I'll compare the entropy of the F1 stump with the F2 stump and the F3 stump, and whichever has the smallest entropy, I will select that decision tree as my base learner.
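To make the "pick the stump with the least entropy" idea concrete, here is a small helper of my own for the entropy of a binary node:

```python
import numpy as np

def entropy(p):
    """Entropy of a node whose positive-class fraction is p."""
    p = np.clip(p, 1e-12, 1 - 1e-12)   # guard against log(0)
    return -(p * np.log2(p) + (1 - p) * np.log2(1 - p))

# a nearly pure node beats a 50/50 node, so the stump whose split
# produces the purest nodes (lowest entropy) is chosen first
print(entropy(0.95))  # ~0.29
print(entropy(0.5))   # 1.0
```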
Now, in the next step, let's assume this is my selected stump, and let's assume it has correctly classified four records and incorrectly classified only one record. How does it classify? My output is simply yes or no; I'm just considering a binary classification here. Okay, so we have four correct classifications and one wrong classification. For this wrong classification, what we have to do is find out the total error. How do we find the total error? Let's say this particular record was classified incorrectly. We calculate the total error by adding up the sample weights of all the misclassified records. In this case I have only one error, so my total error is simply 1/7.

That was the second step, where we calculated the total error. Now, in step three, we find the performance of the stump, which tells us how well the stump has classified. For that we have a formula: performance = 1/2 × ln((1 − total error) / total error). So we take the total error and plug it in: 1/2 × ln((1 − 1/7) / (1/7)), which is nothing but 1/2 × ln(6), and this value comes to around 0.896. That is how the performance of the stump is calculated.

Now you must be wondering why I calculated the total error and this stump performance. It is because we need to update the weights. As I told you, in the boosting technique, the records this decision tree stump classified incorrectly are the ones passed on to the next stump, so I have to increase the weights of the incorrectly classified records, while decreasing the weights of the correctly classified records.
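Steps two and three as arithmetic — using Python just as a calculator to check the numbers above:

```python
import numpy as np

# total error = sum of the sample weights of the misclassified records;
# here: one wrong record out of seven, each record weighted 1/7
total_error = 1 / 7

# performance of the stump: 1/2 * ln((1 - TE) / TE)
performance = 0.5 * np.log((1 - total_error) / total_error)
print(round(performance, 3))  # 0.896, the value used in the video
```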
Now, in the fourth step, we need to update the weights. We know the performance of this particular stump is 0.896, so based on this performance we are going to update the weights. There are two simple formulas. First of all, we update the incorrectly classified points. To update the weight of an incorrectly classified record, we use a very simple formula: new sample weight = old weight × e^performance, where the performance is the value we just computed. We know our old sample weight, so we have 1/7 × e^0.896, and the output is 0.349 — you can verify this with a calculator. Understand, guys, this formula is for updating the weight of the incorrectly classified point: initially my weight was 1/7, and now it has updated to 0.349, which has increased.

To update the correctly classified points, we just make a simple change in the formula: it becomes old weight × e^(−performance). When we apply this with −0.896, the result is approximately 0.05. So now my updated weights look like 0.05, 0.349, 0.05, 0.05, 0.05, 0.05, 0.05 — seven records, one two three four five six seven, perfect.

Now we have updated the weights for these records, but we must observe one more thing, guys: when I take the sum of all these updated weights, the total is not one, whereas the original sample weights summed to one. So what we do is divide each value by the sum of all these values, which comes to approximately 0.68. When we divide all these values by 0.68, we get our normalized weights. Dividing 0.05 by 0.68 gives about 0.07; similarly, dividing 0.349 by 0.68 gives about 0.51; and all the remaining values come out to about 0.07 as well. Now, when I take the sum of all these normalized values, it will come to one.
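Steps four and five in code, reproducing the arithmetic above (my own sketch; note the video rounds intermediate values, so exact arithmetic differs slightly in the second decimal):

```python
import numpy as np

performance = 0.896
old = np.full(7, 1 / 7)   # the original sample weights
wrong = np.array([False, True, False, False, False, False, False])

# increase the weight of the wrong record, decrease the others
updated = np.where(wrong,
                   old * np.exp(performance),    # ~0.349 for the wrong record
                   old * np.exp(-performance))   # ~0.058 (~0.05 in the video)

# divide by the sum so the normalized weights total exactly 1
normalized = updated / updated.sum()
print(updated.sum())        # ~0.70 here; the video quotes ~0.68 after rounding
print(normalized.round(2))  # roughly [0.08 0.5  0.08 0.08 0.08 0.08 0.08]
print(normalized.sum())     # 1.0
```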
What will be the next step? Let's discuss it. I have removed the sample weight and updated weight columns and kept only the normalized weights. Considering these normalized weights, in our next step we will create a new data set, and based on these updated values, that data set will most probably select the wrong records, so that the second decision tree we create focuses on them. So let's understand how we create that new data set. I'm going to take the same structure for the new data set: F1, F2, F3, and the output.

Now what happens is that, based on these normalized weights, we divide the range into buckets. The first bucket is 0 to 0.07. The next bucket starts at 0.07: if I add 0.51 to 0.07, it comes to somewhere around 0.58, so that bucket is 0.07 to 0.58. Similarly, this continues: 0.58 to 0.65, then 0.65 to 0.72, and so on. After that, our algorithm will run seven iterations to select records from this original data set. Suppose in the first iteration it selects a random value of 0.43. It will check which bucket this falls into — and suppose it falls into this bucket, which belongs to the wrong record. Understand, guys, this was my wrong record, right? It was incorrectly classified by decision tree one, so I select this record and add it to the new data set. Again, for the second iteration, suppose the value selected is 0.31. I check which bucket this point falls into, and again it falls into the same bucket, the one where the record was classified incorrectly, so I take that record again. This continues similarly. Now, by the time all seven records have been selected, the probability is that the wrong record will have been selected most of the time.
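The bucket selection described above is exactly weighted sampling on the cumulative weights. A small sketch of my own (the bucket boundaries use the rounded numbers from the example, so I renormalize them first):

```python
import numpy as np

rng = np.random.default_rng(0)

# normalized weights from the example (rounded, so they don't sum exactly to 1)
w = np.array([0.07, 0.51, 0.07, 0.07, 0.07, 0.07, 0.07])
w = w / w.sum()

# cumulative sums give the bucket boundaries: [0, 0.07), [0.07, 0.58), ...
buckets = np.cumsum(w)

# one random draw per record; e.g. a draw of 0.43 lands in the wrong record's bucket
draws = rng.random(7)
indices = np.searchsorted(buckets, draws).clip(max=len(w) - 1)  # clip guards float round-off
print(indices)  # index 1 (the misclassified record) should appear most often on average
```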
Now this is our new data set. Based on this new data set, I'll create my next decision tree. When creating the new decision tree stump, again we use F1 to create a separate stump, F2 to create a separate stump, and F3 to create a separate stump, and then, based on the entropy — whichever stump has the lowest entropy — we select that one, and again the same process continues. Suppose in that decision tree we discover that two records were classified incorrectly; then what happens?
These normalized weights will be updated. Suppose this record was classified incorrectly; then all the steps restart. First we find the total error, then we work out how the model has performed — the performance of this second decision tree stump — and the whole process continues over and over. After calculating the total error and the performance, the weights are updated — my normalized weights get updated — and after the update they are normalized again. The process continues like this, unless and until we have gone through all the sequential decision trees, and finally the error will be much less compared to what we had in the initial stages.

Now suppose that, with respect to our data set, we have built decision tree one, decision tree two, and decision tree three, which are my stumps, sequentially. How will the classification be done for test data? Suppose I have a test record that gets passed to each of them, and suppose this is a binary classification: suppose this one gives us 1, this one gives us 0, and this one gives us 1. You know how, in a random forest, the majority vote basically decides the output; similarly, in AdaBoost, the majority vote among the stumps decides. In short, here you can see that we are combining weak learners, and when multiple weak learners are combined, they become a strong learner.
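To put the whole pipeline together, here is a minimal sketch using scikit-learn's off-the-shelf AdaBoostClassifier (an illustration on my side; note that, unlike the simple majority vote described above, the library weights each stump's vote by its performance):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# the default base learner is a depth-1 decision tree, i.e. exactly a stump;
# n_estimators is the number of sequential stumps to build
clf = AdaBoostClassifier(n_estimators=50, random_state=42)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy of the combined strong learner
```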
In short, that is all for this particular video. I hope you understood what the AdaBoost technique is; understand all the steps — the weight update is the major step. In my next video, I will explain the gradient boosting technique. I hope you liked this particular video. Subscribe to the channel if you haven't already subscribed. I'll see you in the next video. Have a great day, thank you very much.
