Tuesday, May 12, 2009

Validity of Measurement

The term “valid” means that a question or a scale measures what it is intended to measure. Validity is confidence in measures and design. Physical measurements such as height and weight can be measured reliably (and they are also valid measures of how tall or heavy someone is), but they may not relate in any way to various artistic or athletic achievements. Therefore, they are not valid measurements for these purposes or objectives. We need to ensure that respondent ratings of various kinds are valid as well as reliable. We normally do not know the true score of an object with respect to a given characteristic.

In psychological and educational testing, “Validity refers to the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests”. Although classical models divided the concept into various "validities," such as content validity, criterion validity, and construct validity, the modern view is that validity is a single unitary construct.

Validity also extends to:
Internal Validity: Precision in the design of the study – ability to isolate causal agents while controlling other factors
External Validity: Ability to generalize from the unique and idiosyncratic settings, procedures and participants to other populations and conditions

Types of Validity
Unfortunately, the concept of validity is not a simple one, because there are several possible meanings of this term. The most common ones include the following:

1. Concurrent validity: accurate measurement of the current condition or state. Most physical measuring instruments have excellent concurrent validity (e.g., thermometers, weighing scales, oil dipsticks). Most tests of mental ability also have this type of validity, but to a lesser extent.
In concurrent validity, we assess the operationalization's ability to distinguish between groups that it should theoretically be able to distinguish between. For example, if we come up with a way of assessing manic-depression, our measure should be able to distinguish between people who are diagnosed with manic-depression and those diagnosed with paranoid schizophrenia. If we want to assess the concurrent validity of a new measure of empowerment, we might give the measure to both migrant farm workers and to the farm owners, theorizing that our measure should show that the farm owners are higher in empowerment. As in any discriminating test, the results are more powerful if you are able to show that you can discriminate between two groups that are very similar.
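The known-groups comparison described above can be sketched in a few lines. This is only an illustration, not a prescribed procedure: the empowerment scores are invented, and the choice of Cohen's d (a standardized mean difference) as the summary statistic is an assumption made for this example. The function name `cohens_d` is likewise hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical empowerment scores, invented purely for illustration.
farm_owners = [34, 38, 41, 36, 40]
farm_workers = [22, 27, 25, 30, 24]

effect = cohens_d(farm_owners, farm_workers)
print(f"Standardized difference (owners - workers): {effect:.2f}")
```

A clearly positive value is consistent with the theory that owners score higher in empowerment. A value near zero would suggest the measure does not separate the two groups.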

2. Predictive validity: accurate prediction of some future event, such as success in academic or athletic achievement or future customer purchase or loyalty.
In predictive validity, we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced engineers and see if there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for predictive validity -- it would show that our measure can correctly predict something that we theoretically think it should be able to predict.
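As a rough sketch of the correlation check in the engineering example, the snippet below computes a Pearson correlation between test scores and salaries. The data are invented for illustration, the helper name `pearson_r` is hypothetical, and using Pearson's r as the validity coefficient is an assumption; the text above only asks for "a high correlation".

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical data: math-ability scores and salaries (in thousands)
# for six engineers, invented purely for illustration.
math_scores = [52, 61, 70, 74, 83, 90]
salaries = [48, 55, 60, 71, 75, 88]

validity_coefficient = pearson_r(math_scores, salaries)
print(f"Correlation between score and salary: {validity_coefficient:.2f}")
```

A coefficient close to 1 would count as evidence for predictive validity in the sense described above. With a sample this small, it should be read as an illustration rather than a result.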

3. Face validity: the measuring scales appear to measure what they are intended to measure.
In face validity, you look at the operationalization and see whether "on its face" it seems like a good translation of the construct. This is probably the weakest way to try to demonstrate construct validity. For instance, you might look at a measure of math ability, read through the questions, and decide that yep, it seems like this is a good measure of math ability (i.e., the label "math ability" seems appropriate for this measure). Or, you might observe a teenage pregnancy prevention program and conclude that, "Yep, this is indeed a teenage pregnancy prevention program." Of course, if this is all you do to assess face validity, it would clearly be weak evidence because it is essentially a subjective judgment call. (Note that just because it is weak evidence doesn't mean that it is wrong. We need to rely on our subjective judgment throughout the research process. It's just that this form of judgment won't be very convincing to others.) We can improve the quality of face validity assessment considerably by making it more systematic. For instance, if you are trying to assess the face validity of a math ability measure, it would be more convincing if you sent the test to a carefully selected sample of experts on math ability testing and they all reported back with the judgment that your measure appears to be a good measure of math ability.

4. Construct validity: accurate measurement of some basic underlying idea or concept.
For example, customer satisfaction measurement involves all four types of validity. Ratings must accurately reflect how customers now feel about various aspects of the company (concurrent validity). They should also enable us to predict future loyalty and customer retention (predictive validity). They must appear to measure factors or aspects that are meaningful and important to company management and employees (face validity). And they should accurately reflect the basic idea of satisfaction (construct validity). This is a tall order, and it means that careful analysis should be done in the early stages of constructing an instrument for measuring customer satisfaction on an ongoing basis.

In contrast to the types above, Gilbert A. Churchill, Jr., in Marketing Research, Methodological Foundations (5th edition, The Dryden Press International Edition), divides validity into three types:
1. Pragmatic Validity
2. Content Validity
3. Construct Validity

It is worth repeating the earlier caveat that a measuring instrument cannot be valid unless it is reliable. More specifically, reliability puts an upper limit on validity. Loosely speaking, if a measuring scale is only about 50% reliable, it can be only about 50% valid. This is why it is so important for companies to test their own questionnaires periodically and to make every effort to improve both reliability and validity of ratings gathered for any particular purpose.
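The "upper limit" idea can be made a little more precise. In classical test theory, the observed correlation between a measure and a criterion cannot exceed the square root of the product of their reliabilities. The sketch below applies that bound. The function name `max_validity` and the default of a perfectly reliable criterion are assumptions made for illustration.

```python
from math import sqrt


def max_validity(reliability_measure, reliability_criterion=1.0):
    """Ceiling on the measure-criterion correlation under classical test theory.

    Both arguments are reliability coefficients between 0 and 1. The
    criterion defaults to perfect reliability (an assumption for this sketch).
    """
    for r in (reliability_measure, reliability_criterion):
        if not 0.0 <= r <= 1.0:
            raise ValueError("reliability coefficients must lie between 0 and 1")
    return sqrt(reliability_measure * reliability_criterion)


# A scale with reliability 0.50, checked against a perfectly reliable criterion.
ceiling = max_validity(0.50)
print(f"Highest attainable validity coefficient: {ceiling:.3f}")
print(f"Highest attainable shared variance: {ceiling ** 2:.2f}")
```

Squaring the ceiling gives back the reliability itself when the criterion is perfectly reliable, which is one way to read the loose "50% reliable, 50% valid" rule of thumb: the share of variance a scale can explain in a criterion is capped by the scale's reliability.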

In spite of the importance of both reliability and validity, it is likely that even today most market research studies proceed without giving a thought to either of these topics. This may be due to ignorance, time pressures, budgetary constraints, or other factors. Yet management usually just assumes that the data presented are both accurate and relevant in terms of the project objective. It is the responsibility of research managers and outside suppliers to ensure that, whenever possible, evidence is gathered to support the integrity of research findings.

Example of validity
Many recreational activities of high school students involve driving cars. A researcher, wanting to measure whether recreational activities have a negative effect on grade point average in high school students, might conduct a survey asking how many students drive to school and then attempt to find a correlation between these two factors. Because many students might use their cars for purposes other than or in addition to recreation (e.g., driving to work after school, driving to school rather than walking or taking a bus), this research study might prove invalid. Even if a strong correlation was found between driving and grade point average, driving to school in and of itself would seem to be an invalid measure of recreational activity.


Source:
-. Managerial Applications of Multivariate Analysis in Marketing, James H. Myers and Gary M. Mullet, 2003, American Marketing Association, Chicago
-. Marketing Research, Methodological Foundations, 5th edition, The Dryden Press International Edition, author Gilbert A. Churchill, Jr.
-. http://www.colostate.edu/
-. http://www.wikipedia.edu/
-. http://www.socialresearchmethods.net
