What are Gaussian Mixture Models? | Soft clustering | Unsupervised Machine Learning | Data Science

Apr 24, 2024
Hello and welcome. In this video we are going to talk about Gaussian mixture models. Why do we need them? The journey we've had so far went something like this: we have seen clusters that are very well separated, and each point is assigned a unique cluster label, which is known as hard clustering. That means each point is so well separated from the other points that it can only be given one membership, and that is actually feasible for the type of data seen on the screen. But what if our data was something like this, all mixed up? If you look at it carefully, there's a chance to see a cluster like this, which is mostly a cluster for these orange points, and another cluster like this, which is mostly a cluster for these blue points. But the reason a conventional technique like k-means would not work here is that the centers of these two clusters overlap, and we know that k-means is a centroid-based technique: it decides the cluster membership of a point based on its proximity to the centers.
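To see the problem concretely, here is a minimal sketch (using NumPy and scikit-learn, on made-up overlapping data) of how k-means forces a hard, all-or-nothing label on every point, even in the ambiguous overlap region:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two overlapping 1-D Gaussian clusters (made-up data for illustration)
x = np.concatenate([rng.normal(0.0, 1.5, 200),
                    rng.normal(1.0, 1.5, 200)]).reshape(-1, 1)

# k-means gives every point exactly one label (hard clustering),
# even in the overlap region where membership is genuinely ambiguous
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(x)
print(np.bincount(labels))  # hard, all-or-nothing assignments
```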
What happens if the centers overlap like this? In such cases, Gaussian mixture models come to our rescue. Let's say we have some data points in a one-dimensional space. These are different data points that we have, and they are all real numbers, which means they are numerical values, not classes like zero and one. Whenever real numbers appear before us, the first thing that comes to mind is a normal distribution, or Gaussian distribution. So the Gaussian mixture model assumes that these data points have been generated by some set of Gaussian distributions, and to begin with we also need to decide how many Gaussian distributions we are talking about. In a way this is similar to deciding the number of clusters K in the case of k-means; here we are deciding the number of Gaussian distributions. Let's say we start by assuming there are two Gaussian distributions, so as part of step one we randomly initialize these two Gaussians.
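As a rough illustration of this first step, here is one simple way the random initialization could look in NumPy; the data and starting values below are made up just for the sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
# Made-up 1-D data drawn from two underlying Gaussians
x = np.concatenate([rng.normal(0.0, 1.0, 150),
                    rng.normal(3.0, 1.5, 150)])

K = 2  # number of Gaussians, fixed up front just like K in k-means

# Step 1: random initialization of the K Gaussians
means = rng.choice(x, size=K, replace=False)  # random data points as initial means
stds = np.full(K, x.std())                    # overall spread as initial sigma
weights = np.full(K, 1.0 / K)                 # equal mixing proportions to start
print(means, stds, weights)
```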
Two is just a number we chose in this example; we don't always have to have exactly two Gaussians, we could also have more. And we are working on a particularly simple example with just one-dimensional data, while real data could be multidimensional. Once we have randomly initialized these two Gaussian distributions, in the next step we have to calculate something known as responsibility. In very simple terms, it is the probability of membership for each point: for each data point we must work out how likely it is to belong to this blue distribution or to this orange distribution. For example, a data point here, although it is in a less dense blue region, is still far from the orange distribution, so it is more likely to be blue compared to orange.
It only looks completely blue; it will always have a small orange component that is just not visible here. Similarly, a data point here that is in a relatively less dense orange region is even further away from the blue distribution, so it is more likely to be orange. And what about the data points in between? They are more of a mix, because in this region there is a bit of uncertainty. The same is the case with these points: all these points in the middle would be claimed by both distributions, and the proportions could vary; for example, here there may be an equal claim from both distributions.
Wherever a point is very close to the center, the high-density region of one distribution, it is more likely to get that color. This color coding is what you may think of as the responsibility: we say this point is more likely to belong to the blue distribution, so the probability that this point comes from the blue distribution is higher than the probability that it belongs to the orange distribution. Note that the responsibilities, which are just membership probabilities as we are calling them, always add up to one: if you had three Gaussians, maybe all three would claim each point, but the probabilities of belonging to those distributions would still add up to one, because those are the only possibilities. In the case we are analyzing there are two possibilities, and that is why we say these two responsibilities add up to one.
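A minimal sketch of this responsibility calculation, assuming made-up data and parameters and using scipy.stats.norm for the Gaussian density, might look like this:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(42)
x = np.concatenate([rng.normal(0.0, 1.0, 150),
                    rng.normal(3.0, 1.5, 150)])

# Current parameters of the two Gaussians (e.g. fresh from initialization)
means = np.array([0.5, 2.5])
stds = np.array([1.0, 1.0])
weights = np.array([0.5, 0.5])

# E-step: responsibility of each Gaussian for each point, i.e. the
# membership probability, normalized so that each row sums to one
dens = weights * norm.pdf(x[:, None], loc=means, scale=stds)  # shape (n, K)
resp = dens / dens.sum(axis=1, keepdims=True)

print(resp.sum(axis=1))  # every row is exactly 1: the claims always add up
```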
Once we have finished assigning responsibility to each data point, the next step is to focus on the points for each particular Gaussian, which means we first consider the mostly blue points and then the mostly orange points, treating the points for each Gaussian distribution separately. We have already made the responsibility assignment; now, based on these responsibilities, let's revise the distributions, which means we are going to estimate each normal distribution once again.
A normal distribution, as you know, is characterized by its mean and standard deviation: the mean is where it is centered, and the standard deviation determines the shape of the curve. So let's say we arrive at this curve as the revised blue distribution, and similarly, for the orange data points, we once again compute the normal distribution. Remember that originally we just did a random initialization of these Gaussians; now we are estimating them based on responsibility. So let's say we find these revised distributions: these are our distributions so far, and we are supposed to repeat certain steps, so with respect to these new distributions we look at the data points again and once again determine the responsibilities.
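That re-estimation step is worth pinning down. A minimal NumPy sketch of the responsibility-weighted updates, with tiny made-up values so the block runs on its own, could be:

```python
import numpy as np

# x: the 1-D data points; resp: (n, K) responsibilities from the E-step
# (tiny made-up values here just so the block runs on its own)
x = np.array([0.1, 0.4, 2.8, 3.2])
resp = np.array([[0.9, 0.1],
                 [0.8, 0.2],
                 [0.2, 0.8],
                 [0.1, 0.9]])

# M-step: re-estimate each Gaussian from all points, weighted by responsibility
nk = resp.sum(axis=0)                                  # effective cluster sizes
means = (resp * x[:, None]).sum(axis=0) / nk           # responsibility-weighted means
stds = np.sqrt((resp * (x[:, None] - means) ** 2).sum(axis=0) / nk)
weights = nk / len(x)                                  # updated mixing proportions
print(means, stds, weights)
```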
Note that the responsibilities are influenced by the Gaussians' parameters, and the parameters in turn are determined by the responsibilities, so it is a kind of cyclical process: we start with random Gaussian distributions, calculate the responsibilities, then based on those responsibilities we revise the distribution parameters, like the mean and variance, and then once again we calculate the responsibilities. Based on this blue Gaussian distribution, maybe a lot of these points are now closer to the center, so we'll say these points are mostly blue, even when you see them drawn as completely blue. Note that the normal distribution is asymptotic, meaning the blue distribution is not limited to this point; both distributions extend to infinity, so these points will always have some influence from the orange distribution as well, though it is perhaps minuscule. Similarly, these points here, while closer to the maximum-density region of the orange distribution, will also have some influence from the blue distribution, but maybe that is minuscule at the moment, while the points in between will always tend to be more of a mix. So the first difference here compared to classical clustering algorithms is that these points are clustered in a soft way: we are not saying that they have to strictly belong to one cluster or another; we accept that there can be a mixture, but there can be a majority influence from a particular distribution, and we will be guided by that.
These steps of calculating responsibilities and revising the model parameters are repeated, and you may, for example, end up with this as the final output. At what stage do we stop? When do we say that we have converged, that we have reached the final answer? When we don't see much change in the parameters of these Gaussian distributions as we keep repeating the steps: beyond the point where the distributions no longer change significantly, we say that we have converged, and that is how we determine the clusters.
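Putting the two alternating steps together with that stopping rule, a bare-bones version of the whole loop (again on made-up 1-D data, with an illustrative convergence tolerance) might look like this:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(3.0, 1.5, 200)])

K = 2
means = rng.choice(x, size=K, replace=False)  # step 1: random initialization
stds = np.full(K, x.std())
weights = np.full(K, 1.0 / K)

for it in range(200):
    # E-step: responsibilities of each Gaussian for each point
    dens = weights * norm.pdf(x[:, None], loc=means, scale=stds)
    resp = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from responsibility-weighted points
    nk = resp.sum(axis=0)
    new_means = (resp * x[:, None]).sum(axis=0) / nk
    stds = np.sqrt((resp * (x[:, None] - new_means) ** 2).sum(axis=0) / nk)
    weights = nk / len(x)
    # stop once the parameters barely change between iterations
    if np.allclose(new_means, means, atol=1e-6):
        means = new_means
        break
    means = new_means

print(f"converged after {it + 1} iterations:", means, stds)
```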

So once again, to summarize: what is the similarity between Gaussian mixture models and k-means clustering? Just like k-means clustering, this is an iterative approach. In k-means we iteratively determined the centroids until we finally converged; here we iteratively determine the mean and variance. And that formulation applies only while we are dealing with one-dimensional data: the moment we move to a multidimensional space, we no longer have just a mean value but a mean vector, and in the same way we no longer have just a variance but a variance-covariance matrix, which means much more complicated calculations there, but the basic idea is still the same. That is why we tried to understand it with the help of a one-dimensional example.
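In practice, a library handles the multidimensional bookkeeping for us. For instance, scikit-learn's GaussianMixture exposes the fitted mean vectors and covariance matrices directly; the 2-D data below is made up for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Made-up 2-D data: each component now needs a mean vector and a
# variance-covariance matrix instead of two scalars
X = np.vstack([rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], 200),
               rng.multivariate_normal([2, 2], [[1.0, -0.2], [-0.2, 1.0]], 200)])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_)        # one mean vector per component, shape (2, 2)
print(gmm.covariances_)  # one covariance matrix per component, shape (2, 2, 2)
```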
What is different about Gaussian mixture models, as opposed to k-means? Gaussian mixture models follow a soft clustering technique, which means we do not assume that points strictly belong to one cluster; we say that a given point could belong to multiple clusters, and we can observe the proportions. Even without going too deep into the technicalities, the steps we have followed here confirm that this is a probabilistic model, because we are assuming that the data comes from a distribution, whereas in the case of k-means we never assume that the data follows any specific distribution.
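With scikit-learn, for instance, those soft membership proportions are available via predict_proba. This small sketch on made-up overlapping data shows how a point in the middle is genuinely shared between components:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
# Made-up 1-D data from two overlapping Gaussians
x = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(2.0, 1.0, 200)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(x)

# Soft clustering: a membership proportion per component, not a hard
# label; each row sums to one
probs = gmm.predict_proba(np.array([[0.0], [1.0], [2.0]]))
print(probs.round(3))  # the point at 1.0 shows a genuine mix of both
```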
The steps two and three that you have seen here, where first the responsibilities are calculated based on the random initialization, and then in the next step the distribution parameters are re-estimated based on those responsibilities, repeated in alternation, are part of the expectation-maximization (EM) algorithm: step two is known as the E-step, or expectation step, and step three is known as the M-step, or maximization step. Please note that the equations for Gaussian mixture models can be quite daunting because they involve a lot of notation, but for our purpose we have tried to explain this in the simplest way possible without too many equations. I hope you now have clarity on how Gaussian mixture models are useful and in which cases they make more sense compared to k-means. Thank you.
