Monday, October 19, 2009

Statistical Significance versus Statistical Power

This post is based on a subchapter of the book Multivariate Data Analysis (Joseph F. Hair, Jr.; William C. Black; Barry J. Babin; Rolph E. Anderson; Ronald L. Tatham, Pearson Education International, 2006, Singapore).

A census of the entire population makes statistical inference unnecessary, because any difference or relationship, however small, is “true” and does exist. Rarely, if ever, is a census conducted, however. Therefore, the researcher is forced to draw inferences from a sample.

Types of Statistical Error and Statistical Power

Power: the probability of correctly rejecting the null hypothesis when it is false, that is, of correctly finding a hypothesized relationship when it exists.

Determined as a function of:
1. The statistical significance level set by the researcher for type I error (alpha)
2. The sample size used in the analysis
3. The effect size being examined
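
As a rough illustration of how these three inputs combine, here is a minimal sketch of a power calculation. It assumes a two-sided test for the difference between two group means with equal group sizes and uses a normal (z) approximation; the function name two_sample_power and the example values are made up for this post and do not come from the book.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    effect_size is the mean difference in standard deviation units.
    Assumes equal group sizes and a normal approximation.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)            # critical value set by alpha
    shift = effect_size * sqrt(n_per_group / 2)  # expected test statistic under the effect
    # Probability that the test statistic falls in either rejection region.
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


# Illustrative inputs: half a standard deviation, 64 cases per group.
print(round(two_sample_power(effect_size=0.5, n_per_group=64), 3))
```

When the effect size is zero, the function returns alpha itself, which matches the definition of the type I error.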

Interpreting statistical inferences requires that the researcher specify the acceptable levels of statistical error that result from using a sample (known as sampling error). The most common approach is to specify the level of type I error, also known as alpha. The type I error is the probability of rejecting the null hypothesis when it is actually true, or in simple terms, the chance of the test showing statistical significance when it actually is not present: the case of a “false positive”. By specifying an alpha level, the researcher sets the allowable limits for error and indicates the probability of concluding that significance exists when it really does not.

When specifying the level of type I error, the researcher also determines an associated error, termed the type II error or beta. The type II error is the probability of failing to reject the null hypothesis when it is actually false. An even more interesting probability is 1 - beta, termed the power of the statistical inference test. Power is the probability of correctly rejecting the null hypothesis when it should be rejected. Thus power is the probability that statistical significance will be indicated if it is present. In the hypothetical setting of testing for a difference between two means, the error probabilities relate as follows: if no difference exists in reality, the test either correctly finds none (probability 1 - alpha) or wrongly finds one (type I error, probability alpha); if a difference does exist, the test either misses it (type II error, probability beta) or correctly finds it (power, probability 1 - beta).
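
The error probabilities for the two-means case can also be estimated by simulation. The sketch below is only an illustration under stated assumptions: both groups are normally distributed with a known standard deviation of 1, the test is a two-sided z-test, and the group size, number of trials, and seed are arbitrary choices of mine (the function name rejection_rate is hypothetical).

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(true_diff, n_per_group=50, alpha=0.05, trials=2000, seed=1):
    """Share of simulated studies in which a two-sided z-test rejects H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Assumption: both groups have a known standard deviation of 1.
        group_a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        group_b = [rng.gauss(true_diff, 1.0) for _ in range(n_per_group)]
        z_stat = (fmean(group_b) - fmean(group_a)) / sqrt(2 / n_per_group)
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / trials


type_i_rate = rejection_rate(true_diff=0.0)  # H0 is true: estimates alpha
power = rejection_rate(true_diff=0.5)        # H0 is false: estimates 1 - beta
print(f"type I rate ~ {type_i_rate:.3f}, power ~ {power:.3f}, beta ~ {1 - power:.3f}")
```

With no true difference, the rejection rate should land near the chosen alpha; with a true difference of half a standard deviation, it estimates the power, and its complement estimates beta.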

Although specifying alpha establishes the level of acceptable statistical significance, it is the level of power that dictates the probability of success in finding the differences if they actually exist. Then why not set both alpha and beta at acceptable levels? Because type I and type II errors are inversely related: for a given sample size and effect size, as the type I error becomes more restrictive (moves closer to zero), the probability of a type II error increases. Reducing the type I error therefore reduces the power of the statistical test. Thus, the researcher must strike a balance between the level of alpha and the resulting power.
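
The trade-off between alpha and power can be seen in a few numbers. This is a minimal sketch, again assuming a two-sided, two-sample z-test with equal group sizes; the effect size of 0.5 and the 50 cases per group are illustrative values of my own, not figures from the book.

```python
from math import sqrt
from statistics import NormalDist

z = NormalDist()
effect_size, n_per_group = 0.5, 50  # illustrative values, held fixed
shift = effect_size * sqrt(n_per_group / 2)

powers = {}
for alpha in (0.10, 0.05, 0.01, 0.001):
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Power of the two-sided test at this alpha (normal approximation).
    powers[alpha] = (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)
    print(f"alpha={alpha:<6} power={powers[alpha]:.3f} beta={1 - powers[alpha]:.3f}")
```

With sample size and effect size held fixed, each stricter alpha yields lower power and therefore a higher beta.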

Impact on Statistical Power
But why can’t high levels of power always be achieved? Power is not solely a function of alpha. It is actually determined by three factors:

1. Effect size: The probability of achieving statistical significance is based not only on statistical considerations but also on the actual magnitude of the effect of interest (e.g., a difference of means between two groups or the correlation between variables) in the population, termed the effect size. As one would expect, a larger effect is more likely to be found than a smaller effect, so the effect size directly affects the power of the statistical test. To assess the power of any statistical test, the researcher must first understand the effect being examined. Effect sizes are defined in standardized terms for ease of comparison. Mean differences are stated in terms of standard deviations, so that an effect size of .5 indicates that the mean difference is one-half of a standard deviation. For correlations, the effect size is based on the actual correlation between the variables.

2. Alpha: As noted earlier, as alpha becomes more restrictive, power decreases. Therefore, as the researcher reduces the chance of incorrectly saying an effect is significant when it is not, the probability of correctly finding an effect also decreases. Conventional guidelines suggest alpha levels of .05 or .01. The researcher must consider the impact of this decision on the power before selecting the alpha, however.

3. Sample size: At any given alpha level, increased sample sizes always produce greater power for the statistical test. A potential problem then becomes too much power. By “too much” we mean that as sample size increases, smaller and smaller effects will be found to be statistically significant, until at very large sample sizes almost any effect is significant. The researcher must always be aware that sample size can affect the statistical test by making it either insensitive (at small sample sizes) or overly sensitive (at very large sample sizes).
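
To make the "too much power" point concrete, the sketch below estimates the smallest standardized mean difference a study could reliably detect at several sample sizes. It assumes a two-sided, two-sample z-test with equal group sizes and the usual normal-approximation formula; the target power of .80 is a commonly used convention that I am assuming here, and the function name min_detectable_effect is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized mean difference detectable with the given power.

    Normal approximation for a two-sided, two-sample test; the negligible
    chance of rejecting in the wrong direction is ignored.
    """
    z = NormalDist()
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * sqrt(2 / n_per_group)


for n in (20, 100, 1_000, 100_000):
    d = min_detectable_effect(n)
    print(f"n per group = {n:>7}: detectable effect size ~ {d:.3f}")
```

At 20 cases per group only a large difference (close to a full standard deviation) is detectable, while at 100,000 per group a difference of little more than one-hundredth of a standard deviation already reaches significance.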

The relationships among alpha, sample size, effect size, and power are quite complicated, and a number of sources of guidance are available. To achieve a desired level of power, all three factors (alpha, sample size, and effect size) must be considered simultaneously.
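
One common way to consider the three factors together is to fix alpha, the expected effect size, and a target power, and then solve for the sample size. The following is a minimal sketch under assumptions of my own: a two-sided, two-sample comparison of means with equal group sizes, a normal approximation (an exact t-test calculation gives slightly larger numbers), and a target power of .80. The function name required_n_per_group and the three effect sizes are illustrative only.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate cases needed per group to reach the target power.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2
    """
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_sum / effect_size) ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"effect size {d}: about {required_n_per_group(d)} cases per group")
```

The required sample size grows quickly as the expected effect shrinks, which is why the effect size has to be understood before the study is designed.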