Hypothesis Testing Example

The Problem:
Suppose we have a quarter, and we want to decide if that quarter is fair or not. We toss the quarter 200 times and observe 110 heads. Test if the quarter is fair or not. Use a 5% level of significance.

The Solution:
The key word "Test" is what tells us that we are doing a hypothesis test.

Hypotheses: We must first decide which parameter the test is about: p or µ. That is, is the problem asking about a proportion/percentage, or is it asking about an average/mean? In this case, a coin would be considered fair if the true proportion of heads is 0.5, so our parameter must be p.

Now we must set up our hypotheses regarding p. We are trying to test if the coin is fair or not. A fair coin would be one where p = 0.5, while an unfair coin would be any other one, i.e. where p ≠ 0.5. These are our two hypotheses: p = 0.5 and p ≠ 0.5. The hypothesis with the equal sign is always H0, so now we have our hypotheses:

H0: p = 0.5
H1: p ≠ 0.5

Alpha: The alpha value is also called the significance level of the test. Sometimes it is given in a problem, but if not, you can make your own choice; α = 0.05 is a common default. This time we were told to use a 5% significance level, so:

α = 0.05

Decision Rule: This is a statement that tells us how we will make our choice between the hypotheses after we look at the data. In particular, it will tell us what will be necessary for us to reject H0.

We must make the following decisions:
a) What statistic is best to talk about our parameter of interest, and what standard statistic does that statistic turn into (so we can use a table)? If our parameter was µ, then we would use the statistic X-bar, which would turn into a standard statistic of either z or t (depending on whether we know σ or not). But this time our parameter is p, so our best statistic is p-hat, which turns into a standard statistic of z.
b) What form will our decision rule take? There are three options, and they all depend on the form of H1.
- If H1 has a > sign, then our test is a one-tail upper tail test, and our decision rule will be of the form "If std stat > critical value, then reject H0."
- If H1 has a < sign, then our test is a one-tail lower tail test, and our decision rule will be of the form "If std stat < -critical value, then reject H0."
- If H1 has a ≠ sign, then our test is a two-tail test, and our decision rule will be of the form "If std stat > critical value, or std stat < -critical value, then reject H0."
In our example, H1 has a ≠ sign, so our decision rule will take the form "If std stat > critical value, or std stat < -critical value, then reject H0."
c) What critical values chop off the right amount of area? Our alpha value is the total area we need in the tail or tails, depending on whether this is a one- or two-tail test. If it is a one-tail upper tail problem, then we find the z- or t-score that chops off the alpha area in the upper tail. If it is a one-tail lower tail problem, then we find the z- or t-score that chops off the alpha area in the lower tail. And if it is a two-tail problem, then we split the alpha area in half, and find the z- or t-score that chops off that half-area in the upper tail (remembering that the negative of that score will chop off the matching area in the lower tail). In our example, we split the alpha = 0.05 area in half to get a half-area of 0.025, and we look for the z-score that chops off an area of 0.025 in the upper tail. That z-value is 1.96.
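
If we would rather compute the critical value than look it up in a table, the following is a minimal sketch using Python's standard library. The function name z_critical_value and the tail labels "two", "upper", and "lower" are made up for this illustration; they are not part of the original problem. The sketch only covers z critical values, not t.

```python
from statistics import NormalDist


def z_critical_value(alpha, tail):
    """Return the positive z critical value for significance level alpha.

    tail is "two", "upper", or "lower" (labels assumed for this sketch).
    For a lower tail test, the rejection cutoff is the negative of the
    value returned here.
    """
    std_normal = NormalDist()  # standard normal: mean 0, std dev 1
    if tail == "two":
        # Split alpha in half and chop off alpha/2 in the upper tail.
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail in ("upper", "lower"):
        # All of alpha sits in a single tail.
        return std_normal.inv_cdf(1 - alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


# Our example: two-tail test with alpha = 0.05.
print(round(z_critical_value(0.05, "two"), 2))  # 1.96
```

The printed value matches the 1.96 we would read from a standard normal table for an upper-tail area of 0.025.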

Putting all of that together, we get the following decision rule:

DR: If z > 1.96 or z < -1.96, then reject H0.

Statistic: Now we must calculate the statistic (z-score or t-score) we identified in our decision rule. In our example, our statistic is p-hat = 110/200 = 0.55, and we must turn it into a z-score. For a sample this large, p-hat has an approximately normal distribution with mean p and standard deviation sqrt(pq/n), where q = 1 - p. Since H0 tells us that p = 0.5, we know the mean is p = 0.5 and the std dev is sqrt(pq/n) = sqrt(0.5*0.5/200) = 0.0354. So the z-score is:

z = (0.55 - 0.50)/0.0354 = 1.41.
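
The same arithmetic can be checked with a few lines of Python. This is only a sketch; the function name one_proportion_z and its parameter names are made up for this illustration.

```python
from math import sqrt


def one_proportion_z(successes, n, p0):
    """z-score for a sample proportion, assuming H0: p = p0 is true."""
    p_hat = successes / n               # sample proportion (p-hat)
    std_dev = sqrt(p0 * (1 - p0) / n)   # sqrt(pq/n) with p taken from H0
    return (p_hat - p0) / std_dev


# Our example: 110 heads in 200 tosses, testing p = 0.5.
z = one_proportion_z(110, 200, 0.5)
print(round(z, 2))  # 1.41
```

Note that the standard deviation uses the p from H0 (0.5), not p-hat, because the z-score is computed as if H0 were true.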

Conclusion 1: We look at the statistic we calculated, and we plug it into our decision rule to make our decision. In our example, since z = 1.41 is NOT more than 1.96, nor is it less than -1.96, we fail to reject H0. That is conclusion 1:

Fail to reject H0.
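
The decision rule itself is simple enough to write as a small function. The sketch below is one way to do it; the name reject_h0 and the tail labels are assumptions made for this illustration, and the numbers 1.41 and 1.96 are the ones from our example.

```python
def reject_h0(std_stat, critical_value, tail="two"):
    """Apply the decision rule; return True if we reject H0.

    critical_value is the positive critical value. tail is "two",
    "upper", or "lower" (labels assumed for this sketch).
    """
    if tail == "two":
        return std_stat > critical_value or std_stat < -critical_value
    if tail == "upper":
        return std_stat > critical_value
    if tail == "lower":
        return std_stat < -critical_value
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


# Our example: z = 1.41 against critical values of +/- 1.96.
print(reject_h0(1.41, 1.96))  # False, so we fail to reject H0
```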

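Conclusion 2: We translate our conclusion into English, as though we were summarizing what we found for someone who maybe didn't know statistical lingo, or perhaps had asked us a particular question. In our example, we failed to reject H0, which means the data did not give us enough evidence to say the coin is unfair. This is not proof that the coin is fair; it only means that 110 heads in 200 tosses is consistent with a fair coin. That is one way to state this 2nd conclusion:

At the 5% significance level, there is not enough evidence to conclude that the coin is unfair.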