
The covariance matrix

Jun 10, 2021
Hello, my name is Luis Serrano, and this video is about the covariance matrix. For a set of data, the covariance matrix tells us a lot of information, so let's see it. But first, let's start with some other measurements. The first is the center of mass. The center of mass is the point where this data set would balance if each point had the same weight; we also call it the mean or average. Then we have the x variance, which tells us how much the data set is spread in the horizontal direction, and the y variance, which tells us how much it is spread in the vertical direction. As you can see, this data is a little more spread in the horizontal direction than in the vertical, so we expect its x variance to be greater than its y variance. And yet that doesn't tell us everything, because the data set could be an oval or a circle or different things.
Something that will tell us a little more about the shape of the data set is the covariance. In this case it would tell us that the data is elongated and points in this diagonal direction, which tells us a lot about the shape of our data set, and then we can do things like fit a Gaussian to it, etc. So we have something called a covariance matrix that encompasses a lot of this information: it is simply the matrix made up of the variance in the x direction and the variance in the y direction, which go on the diagonal, and the covariance, which goes in the places off the diagonal. Now, this data set is two-dimensional, so the covariance matrix is two by two, but if we had, for example, a 100-dimensional data set, then we would have a covariance matrix of 100 by 100.
Now, in some cases we will have weighted points, so we don't have the full point in the data set, but half a point, or a third of a point, or a tenth of a point. We can still find the center of mass; this time it's going to be a little higher and to the right, because the heavy points are up and to the right. We still have an x variance, we still have a y variance, and we still have a covariance, and in this video I'm going to show you how to calculate them.
This is useful in algorithms like Gaussian mixture models, which use fractions of points, etc., and it also lets us find the shape of the data set. So let's start with the average, and let's start with a very small data set: these four points. What is the center of mass? Well, the center of mass, or average, or mean, is right here. What are its coordinates? First we look at the first coordinate, the x coordinate, and take the average of those four values, so it's one plus three plus three plus five divided by four, and then we do the same with the second coordinate, the y coordinate. Summarizing, we get the point (3, 2), so (3, 2) is the center of mass of the data set, and that's what we would expect given the points.
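As a quick sketch in Python (editorial addition, not from the video): the x coordinates 1, 3, 3, 5 are stated in the video, and the y coordinates 1, 1, 3, 3 are an assumption consistent with the stated mean of (3, 2).

```python
# Center of mass (mean) of the four-point example.
xs = [1, 3, 3, 5]  # x coordinates from the video
ys = [1, 1, 3, 3]  # assumed y coordinates, consistent with mean (3, 2)

mu_x = sum(xs) / len(xs)  # (1 + 3 + 3 + 5) / 4 = 3.0
mu_y = sum(ys) / len(ys)  # (1 + 1 + 3 + 3) / 4 = 2.0
print(mu_x, mu_y)  # 3.0 2.0
```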
What we normally do, to make calculating the covariance matrix much easier, is look at the point (0, 0): we take the data set and move its center to (0, 0) by recalculating each point. So we subtract 3 from each x coordinate and 2 from each y coordinate, and we end up with a new data set, which we can see here. Now I'm going to show you how to find the x variance, which measures how spread out this data set is in the horizontal direction. For this we just need to look at the first coordinate, the x coordinate, and take the average of the squares of these values. Why the squares? Because we want to measure how far each point is from the origin, so we don't want a positive 2 to cancel a negative 2.
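The centering step can be sketched like this (editorial addition; the pairing of x and y coordinates is an assumption consistent with the values used later in the video):

```python
# Center the data set: subtract the mean from every point so the
# data sits at the origin (0, 0).
points = [(1, 1), (3, 1), (3, 3), (5, 3)]  # assumed pairing
mu_x = sum(x for x, _ in points) / len(points)
mu_y = sum(y for _, y in points) / len(points)
centered = [(x - mu_x, y - mu_y) for x, y in points]
print(centered)  # [(-2.0, -1.0), (0.0, -1.0), (0.0, 1.0), (2.0, 1.0)]
```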
We want them both to add up, because the farther the points are in the x direction, the more variance there is, and that is why we take the average of the squares. In this case it is one quarter of two squared plus zero squared plus zero squared plus minus two squared, which is two. Now, to find the y variance it's exactly the same thing: we take the second coordinate, the y coordinate, and we take the average of the squares, so one quarter times negative one squared plus negative one squared plus one squared plus one squared, and that's equal to 1.
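The two variance calculations just described can be sketched as (editorial addition, using the centered coordinates stated in the video):

```python
# Variance as the average of squared (centered) coordinates.
xs = [2, 0, 0, -2]   # centered x coordinates from the video
ys = [-1, -1, 1, 1]  # centered y coordinates from the video

var_x = sum(x * x for x in xs) / len(xs)  # (4 + 0 + 0 + 4) / 4 = 2.0
var_y = sum(y * y for y in ys) / len(ys)  # (1 + 1 + 1 + 1) / 4 = 1.0
print(var_x, var_y)  # 2.0 1.0
```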
So let's see some example data sets. This one here is pretty concentrated, so we expect it to have a small x variance and a small y variance. This one here is pretty spread out, but only in the horizontal direction, so it's expected to have a large x variance and a small y variance. This one here is quite concentrated in the horizontal direction but quite spread out in the vertical direction, so it is expected to have a small x variance but a large y variance. And this one is spread out in both directions, so it is expected to have a large x variance and a large y variance. However, the x variance and the y variance do not tell us the whole story. For example, let's look at these two data sets.
Notice that if we calculate the x and y variances, they have exactly the same values, two and one; however, the data sets are very different: one lies along one diagonal, the other along the other diagonal. So how do we tell them apart? The way we're going to differentiate them is with something called covariance. So how would you distinguish these two sets? Is there an equation, other than the sum of squares that we know doesn't work, that you would use to differentiate them? The points on the left appear to form a backward diagonal, and those on the right the opposite diagonal, so what equation would work?
Feel free to pause the video and think about it, and I'll tell you the answer. It is the product of the two coordinates. Look at this one: negative 2 times 1 is negative 2, and 2 times negative 1 is negative 2; that's on the left, while on the right, 2 times 1 equals 2 and negative 2 times negative 1 equals 2. So on the left, the points on this backward diagonal have the property that the product of the coordinates is negative, while on the right the product of the coordinates is positive. So all we have to do is calculate the product of the coordinates for all the points, and the covariance will be the average of all the coordinate products. On the left it will be the average of negative two, zero, zero, and negative two, which is negative one, and on the right it will be the average of two, zero, zero, and two, which is one. So that's how we're going to differentiate between them, and that's covariance. Just out of curiosity, let's calculate the covariance of this data set; what do you think it will be?
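The products-of-coordinates idea for the two diagonal data sets can be sketched as follows (editorial addition; the centered points are an assumption consistent with the products and variances stated in the video):

```python
# Covariance as the average of the coordinate products x * y,
# for two already-centered data sets.
left = [(-2, 1), (0, -1), (0, 1), (2, -1)]    # backward diagonal
right = [(-2, -1), (0, -1), (0, 1), (2, 1)]   # forward diagonal

def covariance(points):
    # average of products of the (already centered) coordinates
    return sum(x * y for x, y in points) / len(points)

print(covariance(left), covariance(right))  # -1.0 1.0
```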
Well, this data set is centered and not skewed along either diagonal, so we would expect something close to zero. Let's see: the products of the coordinates are these, and they are always zero, so the average of four zeros is zero, and this set has zero covariance. So in general, if you have a data set that goes along the backward diagonal, you would say it has negative covariance; if it's something like this that is not skewed along either diagonal, it probably has zero covariance, or at least a very small number; and those like this one that go along a forward diagonal have positive covariance. If you're thinking about correlation: covariance and correlation are very similar; they're not exactly the same formula, but very similar. So if there is a variable on the x-axis and a variable on the y-axis, then in the left graph they are negatively correlated, in the middle graph they are uncorrelated, or perhaps independent, and in the graph on the right they are positively correlated, because as x increases, y also increases. So let's look at the actual formulas for the quantities we learned. If our data set is this, we will call the points (x_i, y_i) for i = 1 to n, because we have n points in our data set. The mean, or the center of mass, consists of two numbers, mu_x and mu_y, and each one is just the average of the x_i and the average of the y_i. Now, for the x variance, what we have to do is subtract mu_x from each x_i, square everything, and then take the average; the y variance is the same with the y_i and mu_y. The covariance is simply the average of the products of (x_i minus mu_x) and (y_i minus mu_y). Finally, the covariance matrix is the one that has the variances on the main diagonal and the covariances off the main diagonal. Now, that's almost the whole story, but let's remember that at the beginning we talked about data sets where the points appear as a fraction, so the whole point does not appear, but maybe a tenth of it, or half, or 80%, etc.
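The full recipe (mean, centered variances and covariance, then the 2 by 2 matrix) can be sketched in NumPy; this is an editorial addition using the forward-diagonal example with an assumed pairing of coordinates. Note that `bias=True` makes `np.cov` divide by n rather than n minus 1, matching the average-based definition used in the video.

```python
import numpy as np

# Assemble the covariance matrix from the averages described above.
pts = np.array([(-2.0, -1.0), (0.0, -1.0), (0.0, 1.0), (2.0, 1.0)])

mu = pts.mean(axis=0)              # center of mass
c = pts - mu                       # centered coordinates
var_x = np.mean(c[:, 0] ** 2)      # 2.0
var_y = np.mean(c[:, 1] ** 2)      # 1.0
cov_xy = np.mean(c[:, 0] * c[:, 1])  # 1.0
sigma = np.array([[var_x, cov_xy],
                  [cov_xy, var_y]])
print(sigma)

# Cross-check against NumPy's built-in (bias=True divides by n):
print(np.cov(pts.T, bias=True))
```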
For those we can still calculate the average and the covariance matrix, and we do it exactly the same way, except that now we weight the points. So let's look at the example that we saw at the beginning. For these four points, the mean, or average, or center of mass, is calculated this way, and notice that I wrote the average in a particular way: I didn't divide by four, but by one plus one plus one plus one, so it's one unit for each of the points, and you'll see why. Now let's say we have a set of weighted data, so instead of the bottom-left point we have one third of that point. What happens to the center of mass now?
Well, it won't be where it used to be; now it will be a little more to the right and up. How do we calculate it? We simply put a 1/3 as the corresponding weight in the numerator and a 1/3 as the corresponding unit in the denominator, and we get (3.4, 2.2), so that's our new average. The next thing we do is center this data set: we subtract 3.4 from each x coordinate and 2.2 from each y coordinate so that it is centered at the point (0, 0), and that will make it much easier to calculate the x variance, the y variance, and the covariance. So let's start by calculating the x variance: the sum of the squares of the x coordinates divided by four, which is the number of points, so the average. However, the point at the bottom left is only about a third there; it's not completely there, so we have to multiply its corresponding term by (1/3) squared. The reason we square is that in the variance formula you are adding a bunch of squares. We calculate this and get 0.88 for the new x variance. The same for the y variance: we multiply its corresponding term in the average of squared coordinates by (1/3) squared and we obtain 0.72. Note that both the x variance and the y variance are smaller than before. For the covariance it's the same: we take the average of the products of the coordinates, but we multiply the corresponding term of that point by (1/3) squared to get 0.64 as our covariance. And if you like formulas, here are our points (x_1, y_1), (x_2, y_2), up to (x_n, y_n), and each point comes with a corresponding weight, a number between 0 and 1 that tells us how much of the point is in the data set; so if it is, for example, 30%, then alpha_1 is 0.3. Now, the mean we calculate just like before, except that instead of dividing by n we divide by the sum of the alpha_i, and each x_i and y_i is weighted by alpha_i, so it's just a weighted average.
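The weighted mean and weighted variances can be sketched as follows (editorial addition; the pairing of coordinates is an assumption, with the bottom-left point (1, 1) carrying weight 1/3, and each squared term weighted by alpha squared and divided by the number of points, which reproduces the 0.88 and 0.72 stated in the video):

```python
# Weighted center of mass and weighted variances.
points = [(1, 1), (3, 1), (3, 3), (5, 3)]  # assumed pairing
alphas = [1 / 3, 1, 1, 1]                  # bottom-left point has weight 1/3

total = sum(alphas)
mu_x = sum(a * x for a, (x, _) in zip(alphas, points)) / total  # about 3.4
mu_y = sum(a * y for a, (_, y) in zip(alphas, points)) / total  # about 2.2

n = len(points)
var_x = sum((a * (x - mu_x)) ** 2 for a, (x, _) in zip(alphas, points)) / n
var_y = sum((a * (y - mu_y)) ** 2 for a, (_, y) in zip(alphas, points)) / n
print(round(mu_x, 2), round(mu_y, 2))    # 3.4 2.2
print(round(var_x, 2), round(var_y, 2))  # 0.88 0.72
```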
For the x variance, it's the same thing, but now everything is weighted by the square: we divide by the sum of the alpha_i squared, and each term is multiplied by alpha_i squared. The same goes for the y variance, and the same for the covariance, and now we have our covariance matrix. And that is pretty much it: this tells us a lot about the data set, even when the data set is weighted by some percentages. Thank you very much for your attention. I would like to remind you that I have a book called Grokking Machine Learning, and in this book I explain most supervised learning algorithms and various techniques to apply machine learning to many real-life problems, with code in Python.
You can see it on this website, which is also linked in the comments, and you can use a discount code called serrano yt to get 40% off the price. So thank you very much for your attention. If you liked this video, please subscribe for more content, or like it, or share it with your friends. And I love it when you comment, so please leave me a comment: tell me what you like and what you don't, and also, if you have any ideas for future videos, feel free to put them in the comments; many videos have come from an idea someone suggested in the comments. You can also tweet at me.
My username is luis likesmath, and if you want to see a repository of all this information (all these videos, the book, blog posts, etc.), please take a look at this page: it's serrano.academy. So that's all for today. Thank you very much, and see you in the next video.
