class: center, middle, inverse layout: yes name: inverse ## STAT 305: Chapter 6 - Part II ## Hypothesis Testing ### Amin Shirazi .footnote[Course page: [ashirazist.github.io/stat305_s2020.github.io](https://ashirazist.github.io/stat305_s2020.github.io/)] --- class: center, middle, inverse layout: yes name: inverse # Hypothesis Testing ## Deciding What's True (Even If We're Just Guessing) --- class: center, middle, inverse layout: yes name: inverse # Let's Play A Game ## A "Friendly" Introduction to Hypothesis Tests --- layout:false .left-column[ ### My Game ### The Rules ] .right-column[ ## Let's Play A Game The semester is getting a little intense! Let's break the tension with a friendly game. Here are the rules: - I have a new deck of cards: 52 cards, 26 with red suits and 26 with black suits - You draw a red-suited card, you give me a dollar - You draw a black-suited card, I give you two dollars **Quick Questions** What is the expected number of dollars you will win playing this game? Would you play this game? ] --- name: inverse layout: true class: center, middle, inverse --- # Are We Forgetting Something? --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ] .right-column[ ### Be Careful About Your Assumptions Pause for a minute and think about what you are assuming is true when you play this game. For instance, - You assume I'm going to shuffle the cards fairly - You assume there are 52 cards in the deck - You assume the deck has 26 red-suited cards in it - You assume the deck has **a** red-suited card in it How can we make sure these assumptions are safe? - Shuffling assumption: watch me shuffle, make sure I'm not doing magic tricks, etc. - 52-cards assumption: count the cards - Red-suit assumption: count the number of red cards Whew! We can actually make sure all of our assumptions are good! 
] --- name: inverse layout: true class: center, middle, inverse --- # One Problem ## I Refuse to Show You The Cards --- <img src="aladdin-trust.jpg" alt="no!" width="620"> ## Do You Trust Me? --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ] .right-column[ ### Our Assumptions I'm not going to show you all the cards. In other words, I refuse to show you the _population of possible outcomes_. This is justified: we are in a statistics course, after all. So, let's start with our unverifiable assumption: is it safe to assume that this is a fair game? Why would we make this assumption? - You trust that I'm (basically) an honest person (*assumption of decency*) - You trust that I'm getting paid enough that I wouldn't risk cheating students out of money (*assumption of practicality*) - You saw the deck was new (*manufacturer trust assumption*) - You want it to be a fair game because you would win lots of money if it was (*assumption in self-interest*) ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ] .right-column[ ### Our Assumptions In statistical terminology, we wrap all these assumptions up into one assumption: our "**null hypothesis**" is that the game is not rigged - that the probability of you winning is 0.5 >**Null Hypothesis** </br> >The assumptions we operate under in normal circumstances (i.e., what we believe is true). We wrap these assumptions up into a statistical/mathematical statement, and we will accept them unless we have reason to doubt them. We use the notation `\(H_0\)` to refer to the null hypothesis. In this case, we could say that the probability of winning is `\(p\)` and that would make our null hypothesis `$$H_0: p = 0.5$$` ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ] .right-column[ ### Our Assumptions Of course, our assumptions could be wrong. 
We call the competing assumption our "alternative hypothesis": >**Alternative Hypothesis** </br> >The conditions that we require proof to accept; we would have to change our beliefs based on evidence. We use the notation `\(H_A\)` (or sometimes, `\(H_1\)`) to refer to the alternative hypothesis. In this case, our alternative to believing the game is "fair" is to believe the game is not fair, or that the probability of winning is not `\(0.5\)`. We write: `$$H_A: p \ne 0.5$$` ] --- name: inverse layout: true class: center, middle, inverse --- # A Compromise ## I Won't Show You All The Cards ## But I Will Let You Test The Game --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ] .right-column[ ### Testing the Game The test of whether or not the game is worth playing can be defined in terms of whether or not our assumptions are true. In other words, we are going to test whether our null hypothesis is correct: >**Hypothesis Tests** </br> A **hypothesis test** is a way of checking if the outcomes of a random experiment are _statistically unusual_ based on our assumptions. If we see really unusual results, then we have **statistically significant** evidence that allows us to **reject our null hypothesis**. If our assumptions lead to results that are not unusual, then we **fail to reject our null hypothesis**. ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ] .right-column[ ### Testing the Game So how can we test the game? What if we tried a single round of the game? - What are the probabilities of the outcomes of a single game? - If we draw a single card, do we have enough evidence that the game is fair? - Do we have enough evidence that the game is rigged? Based on a single round of the game, both of the possible outcomes are pretty normal - that's not good enough. 
If we draw a losing card, then we might be inclined to call the game unfair - even though a losing card is pretty common for a single round of the game. If we draw a winning card, then we might be inclined to call the game fair - even though a winning card may be common even when the game is not fair! **We can make lots of mistakes!!** ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ] .right-column[ ### The Mistakes We Might Make We could of course be wrong: for instance, we could, just by random chance, see outcomes that are unusual for the assumptions we make and reject the assumptions even if, in reality, they are true. This is called a "Type I Error". >**Type I Error** </br> >When the results of a hypothesis test lead us to reject the assumptions, while the assumptions are actually true, we have committed a Type I Error. ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ] .right-column[ ### The Mistakes We Might Make A common example of this is found in criminal court: - We assume that an individual accused of a crime is innocent (our assumption) - After examining the evidence, we conclude that there is no reasonable doubt that the person is not innocent (in other words, we reject the assumption because it is very unlikely to be true based on our evidence). - If the person truly is innocent, then we have committed a Type I error (rejecting assumptions that were true). ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ] .right-column[ ### The Mistakes We Might Make We could also make a different error: we could choose not to reject the assumptions when in reality the assumptions are wrong. >**Type II Error** </br> >When the results of a hypothesis test lead us to fail to reject the assumptions, while the assumptions are actually false, we have committed a Type II Error. 
] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ] .right-column[ ### The Mistakes We Might Make Again, if we consider the example of criminal court: - We assume that an individual accused of a crime is innocent (our assumption) - After examining the evidence, we conclude that there is **not** evidence beyond a reasonable doubt that the person is not innocent (in other words, the evidence is not enough to reject our assumption because it is still reasonable to doubt the accused's guilt). - If the person truly is not innocent, then we have committed a Type II error (failing to reject assumptions that were false). In general, we want to make sure that a Type I error is unlikely. To take the example of court again, - We commit a Type II error: a guilty person goes free - We commit a Type I error: an innocent person goes to jail; the guilty person is still free ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ] .right-column[ ### The Mistakes We Might Make Let's go back to my game: We assume I am an honest person (i.e., we assume that the probability of winning a single game is `\(p = 0.5\)`)<br> **Type I Error: Rejecting True Assumptions** - We gather evidence - Looking at our evidence, we decide that the game was not fair even though it was. - Fallout: you slander me, you disparage me, we have a fight, BOOOM.<br> **Type II Error: Failing to Reject False Assumptions** - We gather evidence - Looking at our evidence, we decide that the game was fair even though it was not. - Fallout: you play the game and lose some money. Ideally, we won't make either error. However, we can only base our decision on the evidence we can gather - the truth is out of our grasp! 
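To make the two error types concrete, here is a small simulation of the card game. The 10 test rounds, the rigged win chance of 0.3, and the decision rule "call the game rigged if the win count is 2 or fewer, or 8 or more" are all illustrative assumptions for this sketch, not something specified on the slides.

```python
import random

def reject_fairness(wins, lo=2, hi=8):
    """Illustrative decision rule (not from the slides): declare the game
    rigged when the number of wins in 10 rounds is very extreme."""
    return wins <= lo or wins >= hi

def simulate_wins(rng, p_win, n_rounds=10):
    """Number of wins in n_rounds of the card game with win chance p_win."""
    return sum(rng.random() < p_win for _ in range(n_rounds))

rng = random.Random(305)
trials = 20_000

# Type I error rate: the game IS fair (p = 0.5), but extreme luck
# still makes us reject the (true) assumption sometimes.
type1 = sum(reject_fairness(simulate_wins(rng, 0.5))
            for _ in range(trials)) / trials

# Type II error rate: the game is rigged (here p = 0.3), but an
# unremarkable win count makes us fail to reject the (false) assumption.
type2 = sum(not reject_fairness(simulate_wins(rng, 0.3))
            for _ in range(trials)) / trials
```

With this cutoff, roughly 11% of fair games get wrongly rejected, while a game rigged to p = 0.3 escapes detection well over half the time - tightening the cutoff trades one error rate against the other.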
] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ] .right-column[ ### Gathering Statistical Evidence Okay, so we don't want to make either error - that means we need good evidence. Like we talked about before, even if the game is fair, one test round of the game would not be enough to make a good decision, since drawing a red-suited card and drawing a black-suited card are both pretty normal for a single round of the game. But what if we played the game 10 times in a row? After 10 rounds, do you think we would have enough evidence to make a decision about our assumption? ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ### p-value ] .right-column[ ### p-value If we assume the null hypothesis, then we can work out what results are likely and what results are unlikely. We describe the likelihood of the results that we actually get using a **p-value** >**p-value** </br> >After gathering evidence (aka, data) we can determine the probability that we would have gotten the evidence we did if our assumptions were true. That probability is called the p-value. If the p-value is really, really small, that means that the assumptions we started with are pretty unlikely and we reject our assumptions. If the p-value is not small, then the evidence collected (aka, the data) is pretty normal for our assumptions and we fail to reject our assumptions. ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ### p-value ] .right-column[ ### p-value In other words, we collect evidence and determine a way to measure whether or not our data was unusual *if our assumptions are true*. 
If we have a very, very low chance of - seeing our results and - having true assumptions, then we reject the assumptions. Going along with the terminology we have introduced, if we have a small p-value then we reject our null hypothesis. ] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ### p-value ] .right-column[ ### Gathering Statistical Evidence In this game, if we assume that the game is fair, we have - two outcomes: success (winning) and failure (losing) - a constant chance of a successful outcome (`\(p = 0.5\)`, assuming the game is fair) - independent rounds of the game (assuming a fair shuffle, which we can check) In other words, if we test the game 10 times we can model the number of successful outcomes as binomial: for `\(X\)` = the total number of wins, $$ P(X = x) = \dfrac{10!}{x!(10-x)!} (0.5)^x (1 - 0.5)^{10 - x} $$ This gives us a way of getting our p-value ] --- name: inverse layout: true class: center, middle, inverse --- # Let's Test the Game --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ### p-value ### The Conclusion ] .right-column[ ### Gathering Statistical Evidence We played the game. Let's figure out whether our results were unusual or not. Again, we assume the game is fair and have decided that the number of times we win will follow a binomial distribution with probability function $$ P(X = x) = \dfrac{10!}{x!(10-x)!} (0.5)^x (1 - 0.5)^{10 - x} $$ Now we need to make a conclusion: do we accept or reject our assumptions? What do we consider unusual? Is it fair to decide after we play? 
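The binomial probability function above can be coded directly, which gives one way to answer "what counts as unusual" before we play. In this sketch, "at least as extreme" is measured by distance of the win count from the expected 5 wins; that two-sided convention is an assumption here, since the slides leave the rule open.

```python
from math import comb

def pmf(x, n=10, p=0.5):
    """Binomial probability of exactly x wins in n rounds with win chance p:
    P(X = x) = n!/(x!(n-x)!) * p^x * (1-p)^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def p_value(x_obs, n=10, p=0.5):
    """Two-sided p-value: total probability of win counts at least as far
    from the expected n*p wins as the observed count x_obs."""
    center = n * p
    return sum(pmf(x, n, p) for x in range(n + 1)
               if abs(x - center) >= abs(x_obs - center))
```

For example, winning only 2 of 10 rounds gives a p-value of about 0.11 (not that unusual for a fair game), while winning 0 of 10 gives about 0.002, which would be strong evidence against fairness.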
] --- layout:false .left-column[ ### My Game ### The Rules ### The Assumptions ### The Test ### The Errors ### The Evidence ### p-value ### The Conclusion ] .right-column[ ### Summary - Sometimes we can know if something is true or not by examining the truth directly, but not always - When we can't examine the truth, we need to test what we believe to be true - A statistical test is a tool for testing our assumptions about what we believe - We state our assumed belief (generally our current beliefs, or the ethical beliefs, or the beliefs we hope are true, ...) - We come up with a way of collecting data that could validate or invalidate our assumption - We measure how likely it was that we would have gathered the data we did if our assumptions were correct - We reject the assumptions if our data is very unlikely under our current beliefs ] --- layout: true class: center, middle, inverse --- ##Now let's make everything ##a little more formal --- layout: true class: center, middle, inverse --- #Section 6.3 #Hypothesis Testing --- layout:false .left-column[ ###Hypothesis Testing ] .right-column[ ## Hypothesis testing The last section illustrated how probability can enable confidence interval estimation. We can also use probability as a means to use data to quantitatively assess the plausibility of a trial value of a parameter. **Statistical inference** is using data from the sample to draw conclusions about the population. >1. **Interval estimation (confidence intervals):** <br> <span style="color:red">Estimates</span> population parameters and specifies the degree of precision of the estimate. >2. **Hypothesis testing:** <br> Testing the <span style="color:red">validity</span> of statements about the population that are formed in terms of parameters. 
] --- layout:false .left-column[ ###Hypothesis Testing ###Null ] .right-column[ ###Definition:<br> Statistical **significance testing** is the use of data in the quantitative assessment of the plausibility of some trial value for a parameter (or function of one or more parameters).<br> Significance (or hypothesis) testing begins with the specification of a trial value (or **hypothesis**). A **null hypothesis** is a statement of the form `$$\text{Parameter}=\#$$` or `$$\text{Function of parameters}=\#$$` for some `\(\#\)` that forms the basis of investigation in a significance test. A null hypothesis is usually formed to embody a status quo/"pre-data" view of the parameter. It is denoted `\(\text{H}_0\)`. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ] .right-column[ ###Definition:<br> An **alternative hypothesis** is a statement that stands in opposition to the null hypothesis. It specifies what forms of departure from the null hypothesis are of concern. An alternative hypothesis is denoted as `\(\text{H}_a\)`. It is of the form `$$\text{Parameter}\not=\# \quad$$` or `$$\quad \text{Parameter}>\# \quad \text{ or } \quad \text{Parameter}<\# \quad$$` Examples (testing the true mean value): `\begin{eqnarray*} \text{H}_0: \mu = \# \quad & \text{H}_0: \mu = \# \quad&\ \text{H}_0: \mu = \# \\ \text{H}_a: \mu \not= \# \quad & \text{H}_a: \mu > \# \quad & \text{H}_a: \mu < \# \end{eqnarray*}` Often, the alternative hypothesis is based on an investigator's suspicions and/or hopes about the true state of affairs. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ] .right-column[ The **goal** is to use the data to debunk the null hypothesis in favor of the alternative. 1. Assume `\(\text{H}_0\)`.<br> 2. Try to show that, under `\(\text{H}_0\)`, the data are preposterous (using probability).<br> 3. 
If the data are preposterous, reject `\(\text{H}_0\)` and conclude `\(\text{H}_a\)`.<br> The outcome of a hypothesis test is one of two decisions: reject `\(\text{H}_0\)` in favor of `\(\text{H}_a\)`, or fail to reject `\(\text{H}_0\)`. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ] .right-column[ ###Probability of type I error It is not possible to reduce both type I and type II errors at the same time. The approach is then to fix one of them: we fix the **probability of type I error** and try to minimize the probability of type II error. >We define the <span style="color:red">probability of type I error</span> to be `\(\alpha\)` (the significance level) ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ] .right-column[ **Example:** [Fair coin] Suppose we toss a coin `\(n=25\)` times, and the results are denoted by `\(X_1,X_2,\dots,X_{25}\)`. We use `\(1\)` to denote the result of a head and `\(0\)` to denote the result of a tail. Then `\(X_1 \sim \text{Binomial}(1,\rho)\)` where `\(\rho\)` denotes the chance of getting heads, so `\(\text{E}(X_1) = \rho, \text{Var}(X_1) = \rho(1-\rho)\)`. Given that the results are all heads, do you think the coin is fair? `\begin{align} \text{Null hypothesis}&: H_0: \text{the coin is fair, or } H_0: \rho= 0.5\\ \text{Alternative hypothesis} &: H_a: \rho \ne 0.5 \end{align}` If `\(H_0\)` were correct, then `\(P(\text{results are all heads})= (1/2)^{25}< 0.000001\)` >I don't think this coin is fair (reject `\(H_0\)` in favor of `\(H_a\)`) ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ] .right-column[ In real life, we may have data from many different kinds of distributions! Thus we need a universal framework to deal with these kinds of problems. 
We have `\(n= 25 \ge 25\)` iid trials `\(\Rightarrow\)` by the CLT we know that if `\(H_0: \rho = 0.5 (= \text{E}(X))\)` then $$\dfrac{\overline{X}- \rho}{\sqrt{\rho(1-\rho)/n}}\sim N(0,1) $$ We observed `\(\overline{X} = 1\)`, so $$\dfrac{\overline{X}- 0.5}{\sqrt{0.5(1-0.5)/25}}= \dfrac{1- 0.5}{\sqrt{0.5(1-0.5)/25}} = 5 $$ Then the probability of seeing data as <span style="color: red"> *weird or weirder* </span> is `\begin{align} P(\text{observing something as weird or weirder}) &= P(Z > 5 \text{ or } Z < -5)\\ &< 0.000001 \end{align}` ] --- layout:true class: center, middle, inverse #Significance Tests for a Mean `\(\mu\)` --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Significance tests for a mean **Definition:**<br> A **test statistic** is the particular form of numerical data summarization used in a significance test.<br> **Definition:**<br> A **reference (or null) distribution** for a test statistic is the probability distribution describing the test statistic, provided the null hypothesis is in fact true.<br> **Definition:**<br> The **observed level of significance or `\(p\)`-value** in a significance test is the probability that the reference distribution assigns to the set of possible values of the test statistic that are <span style="color: red">*at least as extreme as*</span> the one actually observed. 
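The fair-coin calculation above can be reproduced numerically: compute the standardized statistic and then the two-sided normal tail probability. A minimal sketch in Python using only the standard library:

```python
from math import sqrt
from statistics import NormalDist

# CLT-based test of H0: rho = 0.5 after observing all heads (xbar = 1)
n, rho0, xbar = 25, 0.5, 1.0
z = (xbar - rho0) / sqrt(rho0 * (1 - rho0) / n)  # standardized statistic

# Two-sided p-value: probability of something as weird or weirder,
# i.e. P(Z > |z| or Z < -|z|) under the standard normal reference distribution
p_value = 2 * (1 - NormalDist().cdf(abs(z)))
```

Here `z` comes out to 5, and the resulting p-value is below 0.000001, matching the "weird or weirder" bound on the slide.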
] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Significance tests for a mean In the previous example, the test statistic was `\(\dfrac{\overline{X}- \rho}{\sqrt{\rho(1-\rho)/n}}\sim N(0,1)\)` In the previous example, the null distribution was `\(N(0,1)\)` In the previous example, the p-value was < 0.000001 ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Significance tests for a mean In other words: let `\(K\)` be the test statistic's value computed from the data. Say `\begin{align} H_0&: \mu = \mu_0\\ H_a&: \mu \ne \mu_0 \end{align}` Then `$$P(\text{observing data as extreme as or more extreme than } K)\\= P(Z< -\vert K \vert \ \text{ or } \ Z> \vert K \vert )$$` is defined as the <span style= "color:green">p-value</span> ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Significance tests for a mean Based on our results from Section 6.2 of the notes, we can develop hypothesis tests for the true mean value of a distribution in various situations, given an iid sample `\(X_1, \dots, X_n\)` where `\(\text{H}_0: \mu = \mu_0\)`. Let `\(K\)` be the value of the test statistic, `\(Z\sim N(0,1)\)`, and `\(T\sim t_{n - 1}\)`. Here is a table of `\(p\)`-values that you should use for each set of conditions and choice of `\(\text{H}_a\)`. <center> <img src= "table.png"> </center> ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Steps to perform a hypothesis test >1. State `\(H_0\)` and `\(H_1\)` >2. State `\(\alpha\)`, the significance level, usually a small number (0.1, 0.05 or 0.01) >3. State the form of the test statistic, its distribution under the null hypothesis, and all assumptions >4. Calculate the test statistic and p-value >5. Make a decision based on the p-value (if p-value < `\(\alpha\)`, reject `\(H_0\)`; otherwise we fail to reject `\(H_0\)`) >6. 
Interpret the conclusion in the context of the problem ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ **Example:**[Cylinders] The strengths of `\(40\)` steel cylinders were measured in MPa. The sample mean strength is `\(1.2\)` MPa with a sample standard deviation of `\(0.5\)` MPa. At significance level `\(\alpha = 0.01\)`, conduct a hypothesis test to determine if the cylinders meet the strength requirement of 0.8 MPa. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ **Example:** [Concrete beams] 10 concrete beams were each measured for flexural strength (MPa). The data is as follows. [1] 8.2 8.7 7.8 9.7 7.4 7.8 7.7 11.6 11.3 11.8 The sample mean was `\(9.2\)` MPa and the sample variance was `\(3.0933\)` `\((MPa)^2\)`. Conduct a hypothesis test to find out if the flexural strength is different from `\(9.0\)` MPa. ] --- layout:true class: center, middle, inverse --- #Hypothesis Testing Using Confidence Interval --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ] .right-column[ ### Hypothesis testing using the CI We can also use the `\(1-\alpha\)` confidence interval to perform hypothesis tests (instead of `\(p\)`-values). The confidence interval will contain `\(\mu_0\)` when there is little to no evidence against `\(\text{H}_0\)` and will not contain `\(\mu_0\)` when there is strong evidence against `\(\text{H}_0\)`. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ ### Hypothesis testing using the CI Steps to perform a hypothesis test using a confidence interval: >1. State `\(H_0\)` and `\(H_1\)` >2. State `\(\alpha\)`, the significance level >3. State the form of the 100 `\((1-\alpha)\)` % CI along with all assumptions necessary (use a one-sided CI for one-sided tests and a two-sided CI for two-sided tests) >4. Calculate the CI >5. 
Based on the 100 `\((1-\alpha)\)` % CI, either reject `\(H_0\)` (if `\(\mu_0\)` is not in the interval) or fail to reject (if `\(\mu_0\)` is in the interval) >6. Interpret the conclusion in the context of the problem ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:**[Breaking strength of wire, cont'd] Suppose you are a manufacturer of construction equipment. You make `\(0.0125\)` inch wire rope and need to determine how much weight it can hold before breaking so that you can label it clearly. You have breaking strengths, in kg, for `\(41\)` sample wires with sample mean breaking strength `\(91.85\)` kg and sample standard deviation `\(17.6\)` kg. Using the appropriate `\(95\%\)` confidence interval, conduct a hypothesis test to find out if the true mean breaking strength is above `\(85\)` kg. Steps: >1- `\(H_0: \ \ \mu= 85 \ vs.\quad H_1: \ \ \mu> 85\)` >2- `\(\alpha= 0.05\)` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:**[Breaking strength of wire, cont'd] >3- One-sided test, and we care about the lower bound. So, we use `\((\overline{X}- z_{1-\alpha}\dfrac{s}{\sqrt{n}}, +\infty )\)`. >4- From the example in the previous set of slides, the CI is `\((87.3422, +\infty)\)`. >5- Since `\(\mu_0 = 85\)` is not in the CI, we **reject** `\(H_0\)`. >6- There is <span style= "color:red"> significant evidence</span> to conclude that the true mean breaking strength of the wire is greater than 85 kg. Hence the requirement is met. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:** [Concrete beams, cont'd] 10 concrete beams were each measured for flexural strength (MPa). The data is as follows. [1] 8.2 8.7 7.8 9.7 7.4 7.8 7.7 11.6 11.3 11.8 The sample mean was `\(9.2\)` MPa and the sample variance was `\(3.0933\)` `\((MPa)^2\)`. 
At `\(\alpha= 0.01\)`, test the hypothesis that the true mean flexural strength is `\(10\)` MPa using a confidence interval. Steps: >1- `\(H_0: \ \ \mu= 10 \ vs.\quad H_1: \ \ \mu\ne 10\)` >2- `\(\alpha= 0.01\)` > 3- This is a two-sided test with `\(n=10\)`, and the 100 `\((1-\alpha)\)` % CI is `$$(\overline{X}- t_{(n-1, 1-\alpha/2)}\dfrac{s}{\sqrt{n}}, \overline{X}+ t_{(n-1, 1-\alpha/2)}\dfrac{s}{\sqrt{n}})$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:**[Concrete beams, cont'd] > 4- Check that the CI is `\((7.393, 11.007)\)`. > 5- Since `\(\mu_0 = 10\)` is within the CI, we **fail** to reject `\(H_0\)`. > 6- There is <span style= "color:red"> not enough evidence</span> to conclude that the true mean flexural strength is different from 10 MPa. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:**[Paint thickness, cont'd] Consider the following sample of observations on coating thickness for low-viscosity paint. [1] 0.83 0.88 0.88 1.04 1.09 1.12 1.29 1.31 1.48 1.49 1.59 1.62 1.65 1.71 1.76 [16] 1.83 Using `\(\alpha= 0.1\)`, test the hypothesis that the true mean paint thickness is `\(1.00\)` mm. Note, the `\(90\%\)` confidence interval for the true mean paint thickness was calculated before as `\((1.201,1.499)\)`. >1- `\(H_0: \ \ \mu= 1 \ vs.\quad H_1: \ \ \mu\ne 1\)` >2- `\(\alpha= 0.1\)` > 3- This is a two-sided test with `\(n=16\)` and `\(\sigma\)` unknown, so the 100 `\((1-\alpha)\)` % CI is `$$(\overline{X}- t_{(n-1, 1-\alpha/2)}\dfrac{s}{\sqrt{n}}, \overline{X}+ t_{(n-1, 1-\alpha/2)}\dfrac{s}{\sqrt{n}})$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ] .right-column[ **Example:**[Paint thickness, cont'd] > 4- The CI is `\((1.201,1.499)\)`. > 5- Since `\(\mu_0 = 1\)` is not in the CI, we **reject** `\(H_0\)`. 
> 6- There is <span style= "color:red"> enough evidence</span> to conclude that the true mean paint thickness is not 1 mm. ] --- layout: true class: center, middle, inverse --- #Section 6.4 #Inference for matched pairs and two-sample data --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### Inference for matched pairs and two-sample data An important type of application of confidence interval estimation and significance testing is when we have either *paired data* or *two-sample* data. ###Recall: Matched pairs Paired data are bivariate responses that consist of several determinations of basically the same characteristic >**Example:** <br> >- Practice SAT scores *before* and *after* a preparation course >- Severity of a disease *before* and *after* a treatment >- Fuel economy of cars *before* and *after* testing new formulations of gasoline ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ## Inference for matched pairs and two-sample data One simple method of investigating the possibility of a consistent difference between paired data is to >1. Reduce the measurements on each object to a single difference between them >2. Apply the methods of confidence interval estimation and significance testing to the differences (using Normal or t distributions when appropriate) ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy] Twelve cars were equipped with radial tires and driven over a test course. Then **the same twelve cars** (with the same drivers) were equipped with regular belted tires and driven over the same course. After each run, the cars' gas economy (in km/l) was measured. 
Using significance level `\(\alpha= 0.05\)` and the method of critical values, test for a difference in fuel economy between the radial tires and belted tires. Construct a 95% confidence interval for <span style= "color:red">true mean difference due to tire type.</span> (i.e `\(\mu_d\)`) <table> <tbody> <tr> <td style="text-align:left;"> car </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 7.0 </td> <td style="text-align:right;"> 8.0 </td> <td style="text-align:right;"> 9.0 </td> <td style="text-align:right;"> 10.0 </td> <td style="text-align:right;"> 11.0 </td> <td style="text-align:right;"> 12.0 </td> </tr> <tr> <td style="text-align:left;"> radial </td> <td style="text-align:right;"> 4.2 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 6.6 </td> <td style="text-align:right;"> 7.0 </td> <td style="text-align:right;"> 6.7 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 5.7 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 7.4 </td> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 6.1 </td> <td style="text-align:right;"> 5.2 </td> </tr> <tr> <td style="text-align:left;"> belted </td> <td style="text-align:right;"> 4.1 </td> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 6.2 </td> <td style="text-align:right;"> 6.9 </td> <td style="text-align:right;"> 6.8 </td> <td style="text-align:right;"> 4.4 </td> <td style="text-align:right;"> 5.7 </td> <td style="text-align:right;"> 5.8 </td> <td style="text-align:right;"> 6.9 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 4.9 </td> </tr> </tbody> </table> ] --- layout:false .left-column[ 
###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy] <table> <tbody> <tr> <td style="text-align:left;"> car </td> <td style="text-align:right;"> 1.0 </td> <td style="text-align:right;"> 2.0 </td> <td style="text-align:right;"> 3.0 </td> <td style="text-align:right;"> 4.0 </td> <td style="text-align:right;"> 5.0 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 7.0 </td> <td style="text-align:right;"> 8.0 </td> <td style="text-align:right;"> 9.0 </td> <td style="text-align:right;"> 10.0 </td> <td style="text-align:right;"> 11.0 </td> <td style="text-align:right;"> 12.0 </td> </tr> <tr> <td style="text-align:left;"> radial </td> <td style="text-align:right;"> 4.2 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 6.6 </td> <td style="text-align:right;"> 7.0 </td> <td style="text-align:right;"> 6.7 </td> <td style="text-align:right;"> 4.5 </td> <td style="text-align:right;"> 5.7 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 7.4 </td> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 6.1 </td> <td style="text-align:right;"> 5.2 </td> </tr> <tr> <td style="text-align:left;"> belted </td> <td style="text-align:right;"> 4.1 </td> <td style="text-align:right;"> 4.9 </td> <td style="text-align:right;"> 6.2 </td> <td style="text-align:right;"> 6.9 </td> <td style="text-align:right;"> 6.8 </td> <td style="text-align:right;"> 4.4 </td> <td style="text-align:right;"> 5.7 </td> <td style="text-align:right;"> 5.8 </td> <td style="text-align:right;"> 6.9 </td> <td style="text-align:right;"> 4.7 </td> <td style="text-align:right;"> 6.0 </td> <td style="text-align:right;"> 4.9 </td> </tr> <tr> <td style="text-align:left;"> d </td> <td style="text-align:right;"> 0.1 </td> <td style="text-align:right;"> -0.2 </td> <td style="text-align:right;"> 0.4 </td> <td style="text-align:right;"> 0.1 
</td> <td style="text-align:right;"> -0.1 </td> <td style="text-align:right;"> 0.1 </td> <td style="text-align:right;"> 0.0 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:right;"> 0.5 </td> <td style="text-align:right;"> 0.2 </td> <td style="text-align:right;"> 0.1 </td> <td style="text-align:right;"> 0.3 </td> </tr> </tbody> </table> Since we have paired data, the first step is to find the differences of the paired data ( `\(d= d_1 - d_2\)`, where `\(d_1\)` is associated with radial and `\(d_2\)` is associated with belted tires). Then writing down the information available: `$$n= 12,\quad \overline{d}= 0.142,\quad s_d= 0.198$$` `$$\color{red}{\overline{d}= \frac{1}{n}\sum_{i=1}^{n}d_i}, \quad \color{blue}{s^{2}_{d}= \frac{1}{n-1}\sum_{i=1}^{n}(d_{i} - \overline{d})^{2}}$$` Then we just need to apply the steps of hypothesis testing. Note that the null hypothesis here is that<span style= "color:red"> **there is no difference** between the fuel economies recorded for the two tire types.</span> (i.e., `\(\mu_{d}=0\)`) ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy, cont'd] >1- `\(H_0: \ \ \mu_d= 0 \ vs.\quad H_1: \ \ \mu_d\ne 0\)` >2- `\(\alpha= 0.05\)` > 3- I will use the test statistic `\(K=\frac{\overline{d}-0}{s_d/\sqrt{n}}\)` which has a `\(t_{n-1}\)` distribution assuming that<br> >- `\(H_{0}\)` is true and >- `\(d_{1}, d_{2}, \cdots, d_{12}\)` are iid `\(N(\mu_{d}, \sigma_{d}^{2})\)` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy, cont'd] > 4- `\(K= \frac{0.142}{0.198/\sqrt{12}}= 2.48\)`, which we compare to the `\(t_{11}\)` distribution.
>\begin{align} \text{p-value} &= P(\vert{T}\vert > K)= P(\vert{T}\vert > 2.48)\\\\ &=P(T > 2.48) + P(T < -2.48)\\\\ &= 2\left(1- P(T < 2.48)\right)\\\\ (\color{red}{\text{by software}})&= 2(1- 0.9847)= 0.03 \end{align} > 5- Since p-value < 0.05, we **reject** `\(H_0\)`. > 6- There is <span style= "color:red"> enough evidence</span> to conclude that fuel economy differs between radial and belted tires. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy, cont'd] A two-sided 95% confidence interval for the true mean fuel economy **difference** is `\begin{align} (\overline{d}- t_{(n-1, 1-\color{red}{\alpha/2})}\frac{s_d}{\sqrt{n}}\ &,\ \overline{d}+ t_{(n-1, 1-\color{red}{\alpha/2})}\frac{s_d}{\sqrt{n}})\\\\ = (0.142- t_{(11, 0.975)}\frac{0.198}{\sqrt{12}}\ &,\ 0.142+ t_{(11, 0.975)}\frac{0.198}{\sqrt{12}})\\\\ = (0.142- 2.2\frac{0.198}{\sqrt{12}}\ &,\ 0.142+ 2.2\frac{0.198}{\sqrt{12}})\\\\ =(0.016\ &,\ 0.268) \end{align}` Note that `\(d= d_1 - d_2\)`, so the interpretation will be: We are 95% confident that the radial tires get between 0.016 km/l and 0.268 km/l more in fuel economy than belted tires on average. ] --- layout: true class: center, middle, inverse --- # Hang on for a Second ## Let's review slide 58 again --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Fuel economy, cont'd] >\begin{align} \text{p-value} &= P(\vert{T}\vert > K)= P(\vert{T}\vert > 2.48)\\\\ &=P(T > 2.48) + P(T < -2.48)\\\\ &= 2\left(1- P(T < 2.48)\right)\\\\ (\color{red}{\text{by software}})&= 2(1- 0.9847)= 0.03 \end{align} ] --- layout: true class: center, middle, inverse --- ## We have seen the Student's t table ## How do we get that p-value using <span style= "color:red"> software</span> !!! --- ## What is happening?
--- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ >Unlike the *standard Normal distribution* table, which gives us probabilities under the standard Normal curve, t tables are quantile tables. >i.e., we use the `\(t\)` table (Table B.4 in Vardeman and Jobe) to calculate <span style="color: red">quantiles</span>. >To get exact probabilities, we need software. ![Student's `\(t\)` distribution quantiles.](ch6-andee/images/head_t_quantiles.png) ] --- layout: true class: center, middle, inverse --- ## The approach to calculating the p-value when ## the t distribution is involved --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ###Two important points: >The p-value and `\(\alpha\)` are both probabilities (so they lie in `\([0, 1]\)`). >They are tail areas under the curve of the *null* distribution. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### For a random variable with a `\(t_{11}\)` distribution: By the t table, the 0.975 quantile `\(t_{(11, 0.975)}\)` is 2.2.
<img src="stat305_ch6_part2_files/figure-html/unnamed-chunk-7-1.png" style="display: block; margin: auto;" /> ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### For the test statistic we calculated under the null hypothesis: The value of the test statistic calculated is `\(K= 2.34\)` <img src="stat305_ch6_part2_files/figure-html/unnamed-chunk-8-1.png" style="display: block; margin: auto;" /> ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### Both together <img src="stat305_ch6_part2_files/figure-html/unnamed-chunk-9-1.png" style="display: block; margin: auto;" /> We reject the null if p-value < `\(\alpha\)`. >Remember, the p-value and `\(\alpha\)` are areas under the curve ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[End-cut router] Consider the operation of an end-cut router in the manufacture of a company's wood product. Both a leading-edge and a trailing-edge measurement were made on each wooden piece to come off the router. Is the leading-edge measurement different from the trailing-edge measurement for a typical wood piece? Do a hypothesis test at `\(\alpha= 0.05\)` to find out. Make a two-sided 95% confidence interval for the true mean of the difference between the measurements.
<table> <tbody> <tr> <td style="text-align:left;"> piece </td> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 5 </td> </tr> <tr> <td style="text-align:left;"> leading_edge </td> <td style="text-align:right;"> 0.168 </td> <td style="text-align:right;"> 0.170 </td> <td style="text-align:right;"> 0.165 </td> <td style="text-align:right;"> 0.165 </td> <td style="text-align:right;"> 0.170 </td> </tr> <tr> <td style="text-align:left;"> trailing_edge </td> <td style="text-align:right;"> 0.169 </td> <td style="text-align:right;"> 0.168 </td> <td style="text-align:right;"> 0.168 </td> <td style="text-align:right;"> 0.168 </td> <td style="text-align:right;"> 0.169 </td> </tr> </tbody> </table> ] --- layout:true class: center, middle, inverse --- #Two-Sample Data --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### Two-sample data Paired differences provide inference methods for a special kind of comparison. Next, we discuss methods for comparing two means based on two *unrelated* samples. >SAT score of high school A vs. high school B >Severity of a disease in men vs. women >Height of Liverpool soccer players vs. Man United soccer players >Fuel economy of gas formula type A vs.
formula type B ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ ### Two-sample data ####Notations: ] --- layout: true class: center, middle, inverse --- ##Large Samples --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` The difference in sample means `\(\overline{x}_1 - \overline{x}_2\)` is a natural statistic to use in comparing `\(\mu_1\)` and `\(\mu_2\)`, since $$\text{E}(\overline{X}_1)= \mu_1\ \ \text{E}(\overline{X}_2)= \mu_2\ \ \text{Var}(\overline{X}_1)= \frac{\sigma_1^2}{n_1}\ \ \text{Var}(\overline{X}_2)=\frac{\sigma_2^2}{n_2} $$ If `\(\sigma_1\)` and `\(\sigma_2\)` are **known**, then we have `\(\text{E}(\overline{X}_1 - \overline{X}_2) = \text{E}(\overline{X}_1) - \text{E}(\overline{X}_2)= \mu_1- \mu_2\)` `\(\text{Var}(\overline{X}_1 - \overline{X}_2) = \text{Var}(\overline{X}_1) + \text{Var}(\overline{X}_2) = \frac{\sigma_1^2}{n_1}+ \frac{\sigma_2^2}{n_2}\)` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` If, in addition, `\(n_1\)` and `\(n_2\)` are large, then `\(\overline{X}_1\sim N(\mu_1, \frac{\sigma_1^2}{n_1})\)` approximately (by the CLT), independently of `\(\overline{X}_2\sim N(\mu_2, \frac{\sigma_2^2}{n_2})\)`.
So that `\(\overline{X}_1 - \overline{X}_2\)` is <span style="color: red"> approximately Normal</span> (trust me) `$$Z= \frac{\overline{X}_1 - \overline{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} \sim N(0 , 1)$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` So, if we want to test `\(\text{H}_0: \mu_1 - \mu_2 = \#\)` against some alternative hypothesis, `\(\sigma_1\)` and `\(\sigma_2\)` are known, and `\(n_1 \ge 25, n_2 \ge 25\)`, then we use the statistic `\(K =\frac{\overline{X}_1 - \overline{X}_2 - (\#)}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}\)` which has a `\(N(0,1)\)` distribution if >1. `\(\text{H}_0\)` is true >2. The sample 1 points are iid with mean `\(\mu_1\)` and variance `\(\sigma^2_1\)`, and the sample 2 points are iid with mean `\(\mu_2\)` and variance `\(\sigma^2_2\)`. >3. Sample 1 is independent of sample 2 ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` The confidence intervals (2-sided, 1-sided upper, and 1-sided lower, respectively) for `\(\mu_1-\mu_2\)` are: - *Two-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` `$$\color{red}{(\overline{x}_1-\overline{x}_2)\pm z_{1-\alpha/2} * \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$` - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with an upper confidence bound `$$\color{red}{(-\infty\ ,\ (\overline{x}_1-\overline{x}_2)+ z_{1-\color{blue}{\alpha}} * \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}})}$$` - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with a lower confidence bound `$$\color{red}{((\overline{x}_1-\overline{x}_2)- z_{1-\color{blue}{\alpha}} * \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\ ,\ +\infty)}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` If `\(\sigma_1\)` and `\(\sigma_2\)` are **unknown**, and `\(n_1 \ge 25, n_2 \ge 25\)`, then we replace each `\(\sigma_i\)` with the sample standard deviation `\(s_i\)` and use the statistic `\(K =\frac{\overline{X}_1 - \overline{X}_2 - (\#)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}\)` and confidence intervals (2-sided, 1-sided upper, and 1-sided lower, respectively) for `\(\mu_1-\mu_2\)`: - *Two-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` `$$\color{red}{(\overline{x}_1-\overline{x}_2)\pm z_{1-\alpha/2} * \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Large samples `\((n_1 \ge 25, n_2 \ge 25)\)` - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with an upper confidence bound `$$\color{red}{(-\infty\ ,\ (\overline{x}_1-\overline{x}_2)+ z_{1-\color{blue}{\alpha}} * \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}})}$$` - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with a lower confidence bound `$$\color{red}{((\overline{x}_1-\overline{x}_2)- z_{1-\color{blue}{\alpha}} * \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\ ,\ +\infty)}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Anchor bolts] An experiment carried out to study various characteristics of anchor bolts resulted in 78 observations on shear strength (kip) of 3/8-in. diameter bolts and 88 observations on strength of 1/2-in. diameter bolts. Let Sample 1 be the 1/2 in. diameter bolts and Sample 2 be the 3/8 in. diameter bolts.
Using a significance level of `\(\alpha= 0.01\)`, find out if the 1/2 in. bolts are more than 2 kip stronger (in shear strength) than the 3/8 in. bolts. Calculate and interpret the appropriate 99% confidence interval to support the analysis. - `\(n_1 =88, n_2 =78\)` - `\(\overline{x}_1 = 7.14, \overline{x}_2 =4.25\)` - `\(s_1 =1.68,s_2 =1.3\)` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ **Example:**[Anchor bolts] - `\(n_1 =88, n_2 =78\)` - `\(\overline{x}_1 = 7.14, \overline{x}_2 =4.25\)` - `\(s_1 =1.68,s_2 =1.3\)` ] --- layout: true class: center, middle, inverse --- ##Small Samples --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples If `\(n_1 < 25\)` or `\(n_2 < 25\)`, then we need some **other assumptions** to hold in order to complete inference on two-sample data. >We need the two independent samples to be iid Normally distributed and `\(\sigma_1^2 \approx \sigma_2^2\)` A test statistic to test `\(\text{H}_0:\mu_1-\mu_2= \#\)` against some alternative is `$$K =\frac{\overline{X}_1 - \overline{X}_2 - (\#)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$` where `\(S_p^2\)` is called the **pooled sample variance** and is defined as $$S_p^2= \frac{(n_1- 1)S_1^2 + (n_2- 1)S_2^2}{n_1+ n_2- 2} $$ ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples Also assuming - `\(\text{H}_0\)` is true, - The sample 1 points are iid `\(N(\mu_1,\sigma^2_1)\)`, the sample 2 points are iid `\(N(\mu_2,\sigma^2_2)\)`, - and the sample 1 points are independent of the sample 2 points and `\(\sigma_1^2 \approx \sigma_2^2\)`.
Then `$$K =\frac{\overline{X}_1 - \overline{X}_2 - (\#)}{S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \sim t_{(n_1 + n_2 -2)}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples `\(1-\alpha\)` confidence intervals (2-sided, 1-sided upper, and 1-sided lower, respectively) for `\(\mu_1-\mu_2\)` under these assumptions are of the form: (let `\(\nu= n_1 + n_2 - 2\)`) - *Two-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` `$$\color{red}{(\overline{x}_1-\overline{x}_2)\pm t_{(\nu, 1-\alpha/2)} * S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$` - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with an upper confidence bound `$$\color{red}{(-\infty\ ,\ (\overline{x}_1-\overline{x}_2)+ t_{(\nu, 1-\color{blue}{\alpha})} * S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}})}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples - *One-sided* `\(100(1-\alpha)\%\)` confidence interval for `\(\mu_1 - \mu_2\)` with a lower confidence bound `$$\color{red}{((\overline{x}_1-\overline{x}_2) - t_{(\nu, 1-\color{blue}{\alpha})} * S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}\ ,\ +\infty)}$$` ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples **Example:**[Springs] The data of W. Armstrong on spring lifetimes (appearing in the book by Cox and Oakes) not only concern spring longevity at a 950 N/ `\(\text{mm}^2\)` stress level but also longevity at a 900 N/ `\(\text{mm}^2\)` stress level. Let sample 1 be the 900 N/ `\(\text{mm}^2\)` stress group and sample 2 be the 950 N/ `\(\text{mm}^2\)` stress group.
<table> <thead> <tr> <th style="text-align:left;"> 900 N/mm2 Stress </th> <th style="text-align:left;"> 950 N/mm2 Stress </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 216, 162, 153, 216, 225, 216, 306, 225, 243, 189 </td> <td style="text-align:left;"> 225, 171, 198, 189, 189, 135, 162, 135, 117, 162 </td> </tr> </tbody> </table> ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples **Example:**[Springs] <img src="stat305_ch6_part2_files/figure-html/unnamed-chunk-12-1.png" style="display: block; margin: auto;" /> Let's do a hypothesis test to see if the sample 1 springs lasted significantly longer than the sample 2 springs. ] --- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples **Example:**[Stopping distance] Suppose `\(\mu_1\)` and `\(\mu_2\)` are true mean stopping distances (in meters) at 50 mph for cars of a certain type equipped with two different types of braking systems. Suppose `\(n_1=n_2= 6\)`, `\(\overline{x}_1= 115.7\)`, `\(\overline{x}_2= 129.3\)`, `\(s_1=5.08\)`, and `\(s_2= 5.38\)`. Use significance level `\(\alpha = 0.01\)` to test `\(\text{H}_0: \mu_1-\mu_2 =-10\)` vs. `\(\text{H}_A:\mu_1-\mu_2 < -10\)`. Construct a 2-sided 99% confidence interval for the true difference in stopping distances. ]
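--- layout:false .left-column[ ###Hypothesis Testing ###Null ###Alternative ###P-value ###CI method ###Matched Pairs ###Two-sample ] .right-column[ #### Small samples **Example:**[Stopping distance, cont'd]

A numeric sketch of the pooled small-sample calculation (Python standing in for a calculator; the inputs are just the summary statistics above):

```python
from math import sqrt

# Summary statistics from the stopping-distance example
n1, n2 = 6, 6
xbar1, xbar2 = 115.7, 129.3
s1, s2 = 5.08, 5.38

# Pooled sample variance and the small-sample test statistic
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
sp = sqrt(sp2)                                               # about 5.23
K = (xbar1 - xbar2 - (-10)) / (sp * sqrt(1 / n1 + 1 / n2))   # about -1.19
```

Since `\(K \approx -1.19\)` is not below `\(-t_{(10, 0.99)} = -2.764\)`, we fail to reject `\(\text{H}_0\)` at `\(\alpha = 0.01\)`. ]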