Monday, October 19, 2009

Hypothesis Testing

The objective of statistics is to make inferences about unknown population parameters based on information contained in sample data. These inferences are phrased in two ways: as estimates of the respective parameters or as tests of hypotheses about their values.

In many ways the formal procedure for hypothesis testing is similar to the scientific method. The scientist observes nature, formulates a theory, and then tests this theory against observations. In statistical terms, the scientist poses a theory concerning one or more population parameters (that they equal specified values), then samples the population and compares observation with theory. If the observations disagree with the theory, the scientist rejects the hypothesis. If not, the scientist concludes either that the theory is true or that the sample did not detect the difference between the real and hypothesized values of the population parameters.

Hypothesis tests are conducted in all fields in which theory can be tested against observation. Hypotheses can be subjected to statistical verification by comparing the hypotheses with observed sample data.

The objective of a statistical test is to test a hypothesis concerning the values of one or more population parameters, called the research hypothesis. For example, suppose that a political candidate, Jones, claims that he will gain more than 50% of the votes in a city election and thereby emerge as the winner. If we don’t believe Jones’s claim, we might seek to support the research hypothesis that Jones is not favored by more than 50% of the electorate. Support for this research hypothesis, also called the alternative hypothesis, is obtained by showing (using sample data as evidence) that the converse of the alternative hypothesis, the null hypothesis, is false. Thus support for one theory is obtained by showing lack of support for its converse, in a sense a proof by contradiction. Since we seek support for the alternative hypothesis that Jones’s claim is false, our alternative hypothesis is that p, the probability of selecting a voter favoring Jones, is less than .5. If we can show that the data support the rejection of the null hypothesis, p = .5 (the minimum value needed for a plurality), in favor of the alternative hypothesis, p < .5, we have achieved our research objective. Although it is common to speak of testing a null hypothesis, keep in mind that the research objective is usually to show support for the alternative hypothesis.

The elements of a statistical test:
1. Null hypothesis, H0
2. Alternative hypothesis, H1
3. Test statistic
4. Rejection region

The functioning parts of a statistical test are the test statistic and the associated rejection region. The test statistic is a function of the sample measurements upon which the statistical decision will be based. The rejection region specifies the values of the test statistic for which the null hypothesis is rejected. If for a particular sample the computed value of the test statistic falls in the rejection region, we reject the null hypothesis H0 and accept the alternative hypothesis H1. If the value of the test statistic does not fall in the rejection region, we do not reject H0 (loosely, we accept H0).
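As a rough illustration of how a test statistic and a rejection region work together, the sketch below applies a large-sample z test to the Jones example. The poll numbers, the significance level of 0.05, the choice of a normal approximation, and the function name are all assumptions made for illustration; none of them come from the text above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes, n, p0, alpha):
    """Lower-tailed large-sample z test of H0: p = p0 against H1: p < p0."""
    p_hat = successes / n
    # Test statistic: standardized distance of the sample proportion from p0.
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # Rejection region: all z values below the lower alpha quantile.
    z_crit = NormalDist().inv_cdf(alpha)
    return z, z_crit, z < z_crit

# Hypothetical poll: 90 of 200 sampled voters favor Jones.
z, z_crit, reject = proportion_z_test(successes=90, n=200, p0=0.5, alpha=0.05)
print(f"z = {z:.3f}, rejection region: z < {z_crit:.3f}, reject H0: {reject}")
```

With these made-up numbers the statistic does not fall in the rejection region, so the poll would not be enough to support the claim that p < .5.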

Decisions must often be made based on sample data. The statistical procedures that guide the decision-making process are known as tests of hypotheses. Sample observations of the characteristic under consideration are made, and descriptive statistics are calculated. These sample statistics are then analyzed, and the question is answered based on the results of the analysis. Because the data used to answer the question are sample data, there is always a chance that the answer will be wrong. If the sample is not truly representative of the population from which it was taken, a type I error (rejecting a null hypothesis that is true) or a type II error (failing to reject a null hypothesis that is false) can occur. Thus, when a test of hypothesis is performed, it is essential to state the confidence level, 1 - alpha, where alpha is the accepted probability of a type I error.
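To make the two kinds of error concrete, here is a small sketch that computes both probabilities exactly for a binomial test of H0: p = 0.5 against H1: p < 0.5. A type I error is rejecting H0 when it is true; a type II error is failing to reject H0 when it is false. The sample size of 15, the rejection rule "reject if at most 2 voters favor the candidate", and the alternative value p = 0.3 are hypothetical choices for illustration only.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(Y <= k) for Y ~ Binomial(n, p)."""
    return sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(k + 1))

n = 15          # hypothetical sample size
cutoff = 2      # hypothetical rejection region: reject H0 if Y <= 2

# Type I error: rejecting H0 (p = 0.5) when it is true.
alpha = binom_cdf(cutoff, n, 0.5)
# Type II error: failing to reject H0 when p is really 0.3.
beta = 1 - binom_cdf(cutoff, n, 0.3)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```

A rejection region this small keeps alpha very low, but the price is a large beta: with so few observations the test will usually miss a true value of p = 0.3.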

1. Stating the Hypothesis
When tests of hypotheses are to be used to answer questions, the first step is to state what is to be tested.

The statement that is to be tested is known as the null hypothesis, or H0. This statement is what the data analysis will attempt to prove or disprove. If the analysis shows that the statement is true, fine. But if the analysis indicates that the statement is not true, a fallback position is needed.

That fallback is a second hypothesis, inconsistent with the null hypothesis, called the alternative hypothesis, or H1.

It is strongly recommended that the null hypothesis always be stated as an equality. Although this isn’t necessary for statistical purposes, it does make later analysis much easier. The alternative hypothesis is then expressed either as a directional (less than or greater than) inequality or as a non-directional inequality. The wording of the initial question determines the nature of the inequality used in the statement of the alternative hypothesis. A question involving “better than”, “faster than”, “stronger than”, or similar terminology would require a directional inequality. The phrase “same as” or “not any different than” would imply a non-directional inequality. The statement of the alternative hypothesis must be consistent with the observed sample data.

When the alternative hypothesis is stated as a directional inequality, the procedure is called a one-tailed test of hypothesis.

A non-directional inequality in the alternative hypothesis signifies a two-tailed test of hypothesis.
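The practical difference between the two kinds of test shows up in how the p-value is computed from the same test statistic. The sketch below assumes a z statistic with a standard normal distribution under H0; the function name, the tail labels, and the observed value 1.80 are illustrative assumptions.

```python
from statistics import NormalDist

def p_value(z, tail):
    """p-value of a z statistic for tail in {'less', 'greater', 'two-sided'}."""
    cdf = NormalDist().cdf
    if tail == "less":          # H1: parameter < hypothesized value
        return cdf(z)
    if tail == "greater":       # H1: parameter > hypothesized value
        return 1 - cdf(z)
    if tail == "two-sided":     # H1: parameter != hypothesized value
        return 2 * (1 - cdf(abs(z)))
    raise ValueError(f"unknown tail: {tail!r}")

z = 1.80  # hypothetical observed test statistic
for tail in ("less", "greater", "two-sided"):
    print(tail, round(p_value(z, tail), 4))
```

For this z, the upper-tailed p-value is about 0.036 and the two-tailed p-value is about 0.072, so at alpha = 0.05 the one-tailed test would reject H0 while the two-tailed test would not.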

2. Specifying the Confidence Level
After both the null and the alternative hypotheses have been stated, the second step is to specify the confidence level. Usually the selection is arbitrary. However, there may be organizational guidelines that specify the confidence level. Common confidence levels are 90 percent, 95 percent, and 99 percent. A brief statement or an equation defining the confidence level in terms of alpha is usually sufficient; for example, the notation alpha = 0.05 might appear after the hypothesis. This would designate 95 percent confidence, since the confidence level is 1 - alpha.

3. Collecting Sample Data
The third step in testing a hypothesis is the collection of sample data. After the null hypothesis has been identified (the equality of means, proportions, standard deviations, or whatever), the nature of the required data can be specified. The data must then be collected, and the appropriate sample descriptive statistics must be calculated.

4. Calculating Test Statistics
After the sample descriptive statistics have been calculated, the appropriate test statistic must be calculated. There are many test statistics that may be calculated. The specific test statistic used will depend on the nature of the null and alternative hypotheses.
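As one example of this step, the sketch below turns sample descriptive statistics (mean and standard deviation) into a one-sample t statistic for a null hypothesis about a mean. The choice of the t statistic, the measurements, and the hypothesized mean of 10.0 are assumptions for illustration; other hypotheses would call for other statistics.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """t statistic for H0: population mean = mu0."""
    n = len(sample)
    # stdev() uses the n - 1 denominator, as the t statistic requires.
    return (mean(sample) - mu0) / (stdev(sample) / sqrt(n))

# Hypothetical measurements and hypothesized mean.
sample = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4, 10.3, 10.0]
t = one_sample_t(sample, mu0=10.0)
print(f"t = {t:.3f} with {len(sample) - 1} degrees of freedom")
```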

5. Identifying Table Statistics or Using P-value
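After the test statistic is calculated, the table statistic is determined. The nature of the alternative hypothesis, the sample size, and the specific statistic being tested will determine which of the standard distribution tables, such as the normal curve, Student's t, or chi-square, should be used. Alternatively, the p-value can be computed and compared with alpha: it is the probability, when the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed.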
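For the normal curve, the table lookup can be reproduced in a few lines. The sketch below is limited to z values and uses only the Python standard library; the function name is hypothetical. Student's t and chi-square values would still need a printed table or a third-party library such as SciPy.

```python
from statistics import NormalDist

def z_table_value(alpha, two_tailed):
    """Critical z value that a normal table would give for significance level alpha."""
    # A two-tailed test splits alpha evenly between both tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

for alpha in (0.10, 0.05, 0.01):
    one = z_table_value(alpha, two_tailed=False)
    two = z_table_value(alpha, two_tailed=True)
    print(f"alpha = {alpha:.2f}: one-tailed {one:.3f}, two-tailed {two:.3f}")
```

At alpha = 0.05 this gives the familiar table values of about 1.645 for a one-tailed test and 1.960 for a two-tailed test.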

6. Decision Making
The following rule will govern all of the decisions, provided common sense is applied.
-. If the absolute value of the test statistic is less than or equal to the table statistic, or if the p-value is greater than alpha, then there is not sufficient evidence to reject the null hypothesis. Loosely, the null hypothesis is accepted as being true.
-. If the absolute value of the test statistic is greater than the table statistic, or if the p-value is less than alpha, then there is sufficient evidence to reject the null hypothesis. The data are then taken as support for the alternative hypothesis.
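The rule above can be written as two small functions, one for each form of the comparison. This is only a sketch: the function names and the example numbers are hypothetical, and the case where the p-value exactly equals alpha, which the rule does not cover, is treated here as "do not reject".

```python
def decide_by_table(test_statistic, table_statistic):
    """Compare the absolute test statistic with the table statistic."""
    if abs(test_statistic) > table_statistic:
        return "reject H0"
    return "do not reject H0"

def decide_by_p_value(p_value, alpha):
    """Compare the p-value with the significance level alpha."""
    # Assumption: a p-value exactly equal to alpha does not lead to rejection.
    if p_value < alpha:
        return "reject H0"
    return "do not reject H0"

# Hypothetical two-tailed z test at alpha = 0.05 (table statistic 1.96).
print(decide_by_table(test_statistic=-2.31, table_statistic=1.96))
print(decide_by_p_value(p_value=0.0209, alpha=0.05))
```

Both calls describe the same hypothetical test, so they lead to the same decision.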

-. Mathematical Statistics with Applications, William Mendenhall, Richard L. Scheaffer, Dennis D. Wackerly
-. Fundamentals of Industrial Quality Control, 3rd edition, Lawrence S. Aft, St. Lucie Press, London, 1998
