Monday, June 15, 2009

Test-Retest Reliability

What is Test-Retest Reliability?

1. Test-retest is a statistical method used to examine how reliable a test is. A test is performed twice, e.g., the same test is given to a group of subjects at two different times. Each subject should score differently from the other subjects, but if the test is reliable, each subject should score the same on both tests. (Valentin Rousson, Theo Gasser, and Burkhardt Seifert (2002), "Assessing intrarater, interrater and test–retest reliability of continuous measurements," Statistics in Medicine 21:3431-3446.)

2. A measure of the ability of a psychological testing instrument to yield the same result for a single subject at two different test periods, which are closely spaced so that any variation detected reflects the reliability of the instrument rather than changes in the subject's status.

3. The test-retest reliability of a survey instrument, like a psychological test, is estimated by performing the same survey with the same respondents at different moments in time. The closer the results, the greater the test-retest reliability of the survey instrument. The correlation coefficient between the two sets of responses is often used as a quantitative measure of the test-retest reliability.

4. Because a scale is considered reliable if it consistently produces the same measurement for a given amount or type of response, one obvious way to assess reliability is to take two or more measures at different points in time using the same respondents. This is known as test-retest reliability. These measures must be taken using exactly the same measuring instrument and under conditions that are as similar as possible. Reliability is usually measured in terms of the correlation coefficient between the first and second measures, or among all measures if more than two are taken. The higher the correlation, the more similar the measurements are and therefore the greater the test-retest reliability.
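As a rough illustration of the definitions above, the correlation between two administrations of a test can be computed as follows. This is only a sketch: it assumes the Pearson correlation coefficient is the one intended, the function name pearson_r is made up for this example, and the scores are hypothetical, not real data.

```python
import math

def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_1 = sum(first) / n
    mean_2 = sum(second) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    cov = sum((a - mean_1) * (b - mean_2) for a, b in zip(first, second))
    var_1 = sum((a - mean_1) ** 2 for a in first)
    var_2 = sum((b - mean_2) ** 2 for b in second)
    if var_1 == 0 or var_2 == 0:
        raise ValueError("scores must vary across respondents")
    return cov / math.sqrt(var_1 * var_2)

# Hypothetical scores for five respondents, each tested twice.
test_scores = [98, 105, 110, 121, 87]
retest_scores = [101, 103, 112, 118, 90]

r = pearson_r(test_scores, retest_scores)
print(f"test-retest correlation: {r:.3f}")
```

Each position in the two lists must belong to the same respondent; the coefficient is meaningless if the pairs are mismatched.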


1. A group of respondents is tested for IQ scores: each respondent is tested twice, with the two tests, say, a month apart. Then the correlation coefficient between the two sets of IQ scores is a reasonable measure of the test-retest reliability of this test. In the ideal case, both scores coincide for each respondent and, hence, the correlation coefficient is 1.0. In reality, a correlation coefficient of 1.0 is almost never observed, because the scores produced by a respondent would vary if the test were carried out several times. Normally, correlation values of 0.7 to 0.8 are considered satisfactory or good.

2. Various questions for a personality test are tried out with a class of students over several years. This helps the researcher determine those questions and combinations that have better reliability.

3. In the development of national school tests, a class of children is given several tests that are intended to assess the same abilities. A week and a month later, they are given the same tests. With allowances for learning, the variation between the test and retest results is used to assess which tests have better test-retest reliability.

The test-retest reliability is the most popular indicator of survey reliability. A shortcoming of test-retest reliability is the "practice effect": respondents "learn" to answer the same questions in the first test, and this affects their responses in the next test. For example, the IQ scores may tend to be higher in the next test.
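One consequence of the practice effect is worth spelling out: if every respondent gains the same number of points on the retest, the correlation coefficient does not change at all, so the gain only shows up when the mean scores are compared. The sketch below illustrates this with hypothetical scores. The uniform 5-point gain is an assumption made for illustration; real practice effects need not be the same for everyone.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson correlation, using population covariance and standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (pstdev(xs) * pstdev(ys))

# Hypothetical first-test scores for six respondents.
first_test = [95, 102, 110, 88, 120, 105]

# Assumed practice effect: everyone gains the same 5 points on the retest.
gain = 5
second_test = [score + gain for score in first_test]

r = pearson_r(first_test, second_test)
mean_gain = mean(second_test) - mean(first_test)

print(f"correlation: {r:.3f}")         # unaffected by the uniform shift
print(f"mean gain:   {mean_gain:.1f}")  # the shift is visible only here
```

For this reason it can be useful to look at the mean difference between the two administrations alongside the correlation.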

Reliability can vary with the many factors that affect how a person responds to the test, including their mood, interruptions, time of day, etc. A good test will largely cope with such factors and give relatively little variation. An unreliable test is highly sensitive to such factors and will give widely varying results, even if the person re-takes the same test half an hour later.

This method is particularly used in experiments that use a no-treatment control group that is measured pre-test and post-test.

We estimate test-retest reliability when we administer the same test to the same sample on two different occasions. This approach assumes that there is no substantial change in the construct being measured between the two occasions. The amount of time allowed between measures is critical. We know that if we measure the same thing twice, the correlation between the two observations will depend in part on how much time elapses between the two measurement occasions. The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation. This is because the two observations are related over time: the closer in time they are, the more similar the factors that contribute to error. Since this correlation is the test-retest estimate of reliability, you can obtain considerably different estimates depending on the interval.
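The effect of the interval can be illustrated with a small simulation. This is a toy model, not an established method: it assumes the true score stays fixed, and that a share of the first occasion's measurement error carries over to the second occasion, shrinking as the gap grows. The function name simulate_retest_r and all parameter values (persistence, the standard deviations, the seed) are assumptions chosen for illustration.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def simulate_retest_r(gap, n=5000, persistence=0.8, seed=0):
    """Simulated test-retest correlation for a given time gap (arbitrary units)."""
    rng = random.Random(seed)
    # Share of the first occasion's error still present at the retest;
    # it decays geometrically with the gap (an assumption of this sketch).
    carry = persistence ** gap
    first, second = [], []
    for _ in range(n):
        true_score = rng.gauss(100, 15)  # construct is assumed not to change
        error_1 = rng.gauss(0, 10)
        fresh = rng.gauss(0, 10)
        # Retest error mixes carried-over and fresh error, same total variance.
        error_2 = carry * error_1 + math.sqrt(1 - carry ** 2) * fresh
        first.append(true_score + error_1)
        second.append(true_score + error_2)
    return pearson_r(first, second)

r_short = simulate_retest_r(gap=1)
r_long = simulate_retest_r(gap=12)
print(f"short gap: {r_short:.2f}, long gap: {r_long:.2f}")
```

Under these assumptions the short-gap estimate comes out higher than the long-gap one even though the test itself is identical, which is why the interval should be reported with any test-retest coefficient.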

Managerial Application of Multivariate Analysis in Marketing, James H. Myers and Gary M. Mullet, 2003, American Marketing Association, Chicago.
