
What are p-values?? Seriously.

Jun 06, 2021
Hello team, welcome to this video on p-values. Before we begin, I want to show you that you already have the necessary intuition for p-values, and I'll demonstrate it using this two-dollar Australian coin (which, yes, is smaller than the one-dollar coin; I can't explain that). What happens if I flip this coin 100 times? How many heads do you think I'm going to get? Could you flip 50 heads? How about 52 heads? Could you get 56 heads out of 100? And if I told you I got 90 heads out of 100 coin flips, what does your gut tell you about what's happening with this coin to make that happen?
I think most people watching this video would tell me: Justin, there's something wrong with that coin. There's no way you could realistically get 90 heads out of 100 coin flips. But what you've actually done, without knowing it, is ask yourself this question: if the coin were fair, how likely would it be to get 90 heads out of a hundred? In other words, how extreme is our sample, given that the coin is fair? And that, my friends, is exactly the right line of thinking for p-values, so let's explore the concept a little more. My name is Justin Zeltzer and this is zedstatistics.

As you can see, this video is part of a series on health statistics called Health Statistics IQ. You can check out all the other videos at zstatistics.com, but for now let's dive directly into p-values. First I'm going to finish the explanation around the coin example we looked at in the introduction; after that we'll look at the history of p-values and how they actually came about; then we'll look at the difference between one-tailed and two-tailed p-values; and finally I found a really interesting article that shows how p-values tend to be used and how you'll realistically analyse them. So, back to our little coin example. Before we can evaluate a p-value, we need something to test, and in statistics this is called the null hypothesis. It's our default position, the one we take in the hope of seeing whether there is enough evidence to reject it. Our default position here is that the coin is fair, and in the example I gave we had 100 coin tosses. Clearly, if the coin is fair, then under the null hypothesis we would expect 50 heads. Of course, because of sampling randomness, we won't necessarily get exactly 50 heads from a fair coin; we might get more than 50 or less than 50, but the distribution will be centred on 50.
This is our number of heads. Why this forms a bell curve is a little beyond the scope of this video; if you want to know why it takes on this very normal-distribution shape, you can watch my video on the binomial distribution. For now, just keep in mind that you are more likely to get numbers close to 50 than numbers far from 50, so you are less likely to get, say, 60 heads, and 70 or 80 heads become much less likely still in 100 coin tosses. Now let's take one of those scenarios again where we had 56 heads. Say I had a sample of 100 coin flips and got 56 heads out of 100. That sits here in the distribution, to the right of our midpoint of 50, somewhere in this upper tail. So what is the p-value? Here is the official definition: the p-value is the probability, under the null hypothesis, of obtaining a sample as or more extreme than ours. Here our sample is 56, so the p-value would be represented by all possible samples above 56 heads, in other words more extreme than 56, if we consider the coin to be fair. Under the null hypothesis, where the coin is fair, a more extreme sample is one that is further from where we would expect the sample to be, which is 50.
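The shape claim here, that counts near 50 are far more likely than counts far from 50, can be checked directly from the binomial probability mass function. This is a quick sketch of mine (not from the video), using only the Python standard library:

```python
from math import comb

def binom_pmf(k, n=100, p=0.5):
    """Probability of exactly k heads in n flips of a coin with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# The distribution peaks at 50 heads and falls away on both sides:
print(binom_pmf(50))  # the most likely single outcome, about 0.08
print(binom_pmf(60))  # noticeably smaller
print(binom_pmf(70))  # much smaller again

# It is also symmetric: 44 heads is exactly as likely as 56 heads,
# which is why the two tails mirror each other.
print(binom_pmf(44) == binom_pmf(56))
```

Note that even the peak is only about 8 percent: no single count is likely, which is why p-values are built from tail regions rather than single outcomes.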
Now here's the interesting thing: it's not just this region that is as or more extreme than our sample. It's actually the reflected region on the other side as well, from 44 heads down. Don't forget that 44 heads actually means 56 tails, and it's just as extreme to say you got 56 tails as it is to say you got 56 heads. So all of these samples down here, 44 heads, 43 heads, 42 heads, and so on down to zero, are as extreme or more extreme than our current sample of 56. These two regions together become what is called our p-value, and again, while the calculation is beyond the scope of this video, you can work out that this region is 0.193. Thinking graphically, that is 19.3 percent of the area under this curve. In other words, we can say that if the coin were fair, the probability of getting 56 heads or a more extreme sample is 19.3 percent; I'm just converting that to a percentage to give us an idea of how extreme our sample of 56 heads was.
It turns out it's actually not that extreme; it's quite possible we would get a sample of 56 heads if the null hypothesis were still true, i.e. the coin is fair. So getting 56 heads as our sample shouldn't alarm us into thinking the coin is rigged. But let's see what happens if we get 60 heads. There's 60, a little further from 50, so our p-value is now much smaller, because the area in the most extreme sections of this curve is much smaller. Here it turns out the p-value is 0.032, so again, if the coin were fair, the probability of getting 60 heads or a more extreme sample is 3.2 percent. That's now quite small, and as we'll see, what normally happens when the p-value falls below around 0.05, which is about here, is that we start casting doubt on the null hypothesis. We start thinking, you know, that null hypothesis might not be true after all; it seems pretty unlikely that we would get 60 heads from a fair coin. Now, the example I gave in the introduction was in fact not 60 heads, it was 90 heads.
I can't even put 90 on the graph because it's so far to the right of everything else, but it would be way out on the right side, and you can see that the area represented by the most extreme parts of the curve will be practically zero. That's why I wrote p = 0.000 here; realistically, it won't be exactly zero, but to three decimal places it certainly is. We can say that if the coin were fair, the probability of getting 90 heads in a hundred tosses is practically zero, so if I told you I got 90 heads in a hundred tosses, you would rightly say there is something wrong with that coin.
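For the record, these tail areas can be computed exactly from the binomial distribution. The sketch below (mine, not the video's) uses the common convention of doubling the upper tail including the observed count; the video's quoted 0.193 and 0.032 appear to count only samples strictly beyond the observed one, so the numbers differ slightly, but the conclusions are identical:

```python
from math import comb

def two_tailed_p(k, n=100):
    """Exact two-sided p-value for k heads in n flips of a fair coin:
    twice the probability of the observed count or anything further
    from n/2, folded onto the upper tail (symmetric when p = 0.5)."""
    upper = max(k, n - k)
    tail = sum(comb(n, i) for i in range(upper, n + 1)) * 0.5 ** n
    return min(1.0, 2 * tail)

print(two_tailed_p(56))  # ~0.27: entirely unremarkable for a fair coin
print(two_tailed_p(60))  # ~0.057: starting to look suspicious
print(two_tailed_p(90))  # practically zero: the coin is not fair
```

Either convention gives the same story: 56 heads is unremarkable, 60 is borderline, and 90 is effectively impossible under a fair coin.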
My friend, now you're probably thinking: look, this is a health statistics video, why are we talking about flipping a coin? Interestingly, the history of p-values in health dates back to 1710, and it's actually very similar to the coin toss example. John Arbuthnot, an 18th-century Scottish physician, reviewed 82 years of birth data in his article 'An Argument for Divine Providence'. What he found was that in every single year the data were collected, more boys were born than girls, and while he didn't actually use the phrase p-value, he came to this conclusion: if male and female babies were equally likely, the probability of males being more numerous 82 years in a row is this very, very small number, 2.01 multiplied by 10 to the power of negative 25, which written out is a decimal point followed by 24 zeros and then 201. I hope you can see how much this looks like the coin toss example.
Basically, he had a null hypothesis that both sexes were equally likely at birth, so any given year is like flipping a coin to determine which sex is more numerous. With 82 such years, we would expect more baby boys than girls in about 41 of them, and more girls than boys in the other 41; that would be the likely outcome if both sexes really were equally likely. So it's a very similar situation: our expected value is 41, and ending up with 82 years out of 82 with more baby boys is like flipping heads 82 times in a row. What is the probability that your coin is fair if you just flipped 82 heads in a row? As we found out, it's 2.01 times 10 to the power of minus 25.
I can't even draw the point for 82, but the p-value is the small region that is as or more extreme than 82, which in this case is practically zero. So this article was not only instrumental in telling us that in humans male babies are slightly more likely than female babies (it's certainly slight, I think it's about 51 percent), it also introduced the concept, or the logic, that would soon become the p-value. Then Siméon Poisson, in his 1837 work 'Recherches sur la probabilité des jugements', investigated criminal trials, particularly criminal trial juries, and he did something interesting: he made two particular comparisons in that work. One of them had a probability of occurring by chance of 0.0897, and the other had a probability of occurring by chance of 0.00468. When it was 0.0897, he said, you know what, that seems reasonable, maybe this happened by chance; but when it went down to 0.00468, he said, that is now too small, I don't think this happened by chance, I think there is something else going on, something structural with these two sets of jurors. That was really the first introduction to this vague idea of a critical value beyond which we consider p-values significant. And it was finally in 1925 that Ronald Fisher, in his book Statistical Methods for Research Workers,
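Arbuthnot's number is just a fair coin flipped 82 times. A one-line check (mine, not his arithmetic) confirms the order of magnitude he quoted:

```python
# If each year were a fair coin flip (male-majority vs female-majority),
# the chance of a male majority 82 years in a row is (1/2)^82.
p_82_years = 0.5 ** 82
print(p_82_years)  # on the order of 2 x 10^-25, matching the figure in the video
```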
finally defined the p-value, which is what we've been dealing with in this video, and also proposed that 0.05 would be the useful cutoff point, which happens to sit roughly halfway between Poisson's two probabilities. And so we have the p-value. Okay, now let's see what the difference is between a one-tailed and a two-tailed p-value. Let's look at a two-tailed p-value first, because it's essentially what we were already looking at. Here are three hypotheses we could test that will give us a two-tailed p-value. We've already seen one, where 'the coin is fair' is our null hypothesis. But also in health
science, you'll get a lot of studies that look at sex differences between men and women. If our null hypothesis, our default position, is that there are no differences between the sexes, then this again gives us a two-tailed p-value when we take a sample of men and women and assess the differences in the score of whatever we're testing. Finally, another example that comes to mind is where we might be looking at an anxiety medication and its side effects. In particular, we could have a null hypothesis that says the medication has no effect on someone's weight.
Now, what makes each of these scenarios two-tailed is that we can reject the null hypothesis in either direction. In other words, for the coin we can reject that statement if we get many more heads than tails, but we can also reject it if we get many more tails than heads. That's where the second statement, which we call the alternative hypothesis, comes in: because the alternative hypothesis is simply that the coin is biased in either direction, this ends up being a two-tailed p-value. The same goes for the sex difference, if our alternative hypothesis is simply that there is a difference.
That can happen when men score significantly higher than women or when women score considerably higher than men; those are two different ways. The same goes for the anxiety medication: it could increase or decrease the person's weight, and either of those situations would be relevant. So let's take a look at what this distribution might look like. As we saw before, it's the distribution of samples if the null hypothesis is true. For our coin example, the expected percentage of heads is 50, and if we have a sample here, we construct a p-value from both regions, because the entire shaded region represents samples more extreme than the one we got. In other words, for the purposes of these hypotheses, the values down here on the left side are just as extreme as those on the positive side. The same is true when looking at the sample difference between the sexes: the expected difference is zero, and let's say our sample result turned out to be up here, showing that perhaps women scored higher than men in whatever we're testing. Again, this is a two-tailed p-value, so we add both regions to calculate it, since all the sample results in the shaded regions are as or more extreme than the one we obtained. Similarly with the weight change for the anxiety medication: if our sample were here, showing an increase in weight, you would again have two regions to add together to get your p-value. So what does a one-tailed p-value look like? Well, it's not so much about how it looks as about what kinds of hypotheses lend themselves to a one-tailed p-value, and here's an example that I thought might make sense. Say we're looking at a particular drug that aims to reduce swelling after an injury. We could start with the null hypothesis that it has no effect on swelling, but the alternative hypothesis here is that the drug specifically reduces swelling; we are only interested in one direction for rejecting this null hypothesis. So the distribution of the change in swelling looks a little like this, and under the null hypothesis we put 0 in the middle, because that's the change in swelling we would expect if the drug has no effect. If the result of our sample is down here, on the negative side, showing that swelling has reduced, the p-value is again the region representing what is as extreme or more extreme than the sample we got, so it's this yellow region here, but only this yellow region. Think about it: if the medication actually increased swelling and your sample were up here, then the poor person who took this medication trying to reduce their swelling actually found an increase in swelling, which is not going to tell us the drug is effective. Because we are only interested in rejecting this null hypothesis in one direction, the p-value reflects that, and the values as or more extreme than our sample are all on that left side. And here's another example I thought of: testing whether coronavirus mortality is equal between the sexes.
If we're trying to show that coronavirus mortality is higher for men, specifically higher for men, then we could look at the difference in mortality, men minus women, and if our sample result is on the higher side, the p-value is just the region above our sample. If our sample showed that the women in the sample had higher mortality than the men, that wouldn't help us with this test; we want to show specifically that men have higher mortality than women. Now, that distinction can be quite subtle, because in the earlier example a difference between the sexes gave a two-tailed p-value, and yes, the difference between those two situations is quite technical. But I hope you can see that whether a p-value is one-tailed or two-tailed is determined by what we are actually trying to test: whether it is directional, or whether we don't care which group is higher and just want to show a difference. Now, realistically,
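The one-tailed versus two-tailed distinction can be made concrete with the same coin machinery. In this sketch of mine (a symmetric null, so the two-tailed p-value is exactly double the one-tailed one; for asymmetric distributions the relationship is not that clean):

```python
from math import comb

def one_tailed_p(k, n=100):
    """Upper-tail p-value: probability of k or more heads from a fair coin."""
    return sum(comb(n, i) for i in range(k, n + 1)) * 0.5 ** n

def two_tailed_p(k, n=100):
    """Two-tailed: both tails count as 'as or more extreme'."""
    upper = max(k, n - k)
    return min(1.0, 2 * one_tailed_p(upper, n))

one = one_tailed_p(60)  # only 'too many heads' would reject the null
two = two_tailed_p(60)  # 'too many tails' would reject it as well
print(one, two)         # the two-tailed value is double the one-tailed value
```

This is why a directional hypothesis reaches significance more easily: for the same sample, the one-tailed p-value is half the two-tailed one, which is also why the choice of tails must be made before looking at the data.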
if you're just interpreting the p-values in research you read, you don't have to worry about whether each one is one-tailed or two-tailed; the statisticians behind the scenes have done all of that for you. But if you're doing the research yourself, you may need to know exactly what you're testing before you can start citing p-values. With that said, let's take a look at some research that uses a lot of p-values and try to work out how they're used. This comes from the Medical Journal of Australia, 2020, and it's called 'Optimising epilepsy management with a smartphone application', an RCT, and here are the authors, Yang C et al.
What they were trying to do in this study was see whether people with epilepsy could better control their episodes using a smartphone app than without it. Okay, here we go: this is the Medical Journal of Australia, here is the article we're looking at, and I'm going to go directly to the tables from the study. P-values tend to be used for comparisons between groups, so in this particular table we have the group that was assigned the app and the control group, and you can see there were 990 people enrolled in each of the two arms of the study. This is common to any study trying to evaluate differences in outcomes between groups: you first need to make sure the baseline characteristics of the people in each group are quite similar. You can see we have 54 men in the app group and 54 men in the control group, the one not using the app; presumably all of these people have epilepsy. The p-value for that is actually 1.0, which tells us this is exactly what we'd expect if there were no sex difference between the two groups.
You can see that age is only very slightly higher for the app group, and the p-value says 0.94, which tells us that if there were no age difference between the groups, the probability of getting this sample difference, which is only 0.1, or one more extreme than that, is 0.94. So it is very likely we could get a difference like this even if the ages really were the same between the groups. What happens is that as the differences between these factors get bigger, the p-values get lower and lower. Here's an example: looking at the unemployment rate, again very similar, 23 versus 22, so our p-value is pretty high, which is good; we want to know the groups are pretty similar. But when we look at residence in urban areas, you can see the app group is slightly larger: 53 percent, versus 46 percent for the control group. So people in the app group were slightly more likely to live in urban areas than people in the control group, and you can see the p-value reflects that: it's 0.18, which is not worrying, but noticeably smaller than all the other p-values.
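A baseline row like the urban-residence comparison is typically tested with something like a two-proportion z-test (the paper may have used a chi-squared test, which is equivalent for a 2x2 table). The counts below are hypothetical, chosen only to illustrate how the size of the imbalance drives the p-value, and are not taken from the paper:

```python
from math import erf, sqrt

def two_prop_p(x1, n1, x2, n2):
    """Two-sided z-test p-value for the difference between two proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)                       # pooled proportion under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2)) # standard error of the difference
    z = abs(p1 - p2) / se
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))        # 2 * P(Z > |z|)

# Hypothetical counts: a modest imbalance gives a large p-value,
# a big imbalance gives a tiny one.
print(two_prop_p(53, 100, 46, 100))  # modest difference, p well above 0.05
print(two_prop_p(80, 100, 46, 100))  # large difference, p far below 0.05
```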
That 0.18 tells us that if there were really no difference between the groups, there'd be an 18 percent chance of seeing a difference at least this large purely by chance, and as I said, 18 percent isn't worrying, but it's interesting to note that it's much lower than the others. So what an article tends to want is for all of these baseline p-values to be reasonably high, certainly above 0.05; you don't want any of them below 0.05. If we scroll down, you can see that yes, they are all above 0.05, which is good, because it tells us there really isn't a significant difference between the two groups in terms of their baseline characteristics. But then we move on to the next table, which shows the outcomes for the two groups.
Now, I'm not an epilepsy expert, but these are all the outcome variables. You can see the app group scored 144 as a total score versus 125 for the control group, so the difference seems quite large, and indeed the p-value tells us it is a significant difference: in other words, if there were really no difference between the groups, the probability of seeing a difference this large purely by random chance is less than 0.001. It seems much more likely that there really is a difference between these two groups, and you can see that many of the other outcome measures here also have p-values less than 0.01, except for these two, ITT seizure management and PP seizure management, whose p-values are a bit higher. That tells us we don't have a significant difference between the two groups on those outcomes, but for all the others there is a significant difference. So it's interesting that we have two different scenarios here: in the first table we hoped to see high p-values, showing no differences between the groups on all those baseline characteristics, and then we wanted to see low p-values when we analysed the outcomes between the groups, and in fact that's exactly what we found. So that's it, team.
Thanks for sticking with me. If you got something out of this video, tell your friends; it helps me grow the audience a little, and that lets me make more of these videos. Subscribe and like the video, do all those beautiful things if you can, and if you want to check out some of the other videos out there, keep an eye on zstatistics.com. I'll see you later.
