Task 3. Who is right?
What is validity? How can you measure validity? What types of validity are there?
Kaplan, R. M., & Saccuzzo, D. P. (2018). Psychological testing: Principles, applications, and issues (9th ed.). Cengage Learning. Chapter 5: Validity.
Validity is the evidence that a test has a specific meaning. The meaning of a test is defined by
specific evidence acquired by specific methods. Not only must there be evidence that a test
has meaning in general, but also there must be evidence that it has validity for the particular
situation in which it is applied. This evidence gives the test its validity. Evidence for validity
comes from showing the association between the test and other variables.
Defining validity
Validity can be defined as the agreement between a test score or measure and the quality it
is believed to measure. Validity is sometimes defined as the answer to the question, "Does the test measure what it is supposed to measure?"
The booklet The Standards for Educational and Psychological Testing consists of three
sections: foundations (basic psychometric concepts such as validity and reliability),
operations (how tests are designed, built, administered, scored, and reported; this section also reviews standards for test manuals and documentation), and applications (a range of issues, such as training in administration and interpretation).
Validity is the evidence for inferences made about a test score. There are three types of
evidence: (1) construct related, (2) criterion related, and (3) content related. Validity is a
unitary concept that represents all of the evidence supporting the intended interpretation of a measure.
Aspects of validity
Face validity
Face validity is technically not a form of validity, but is commonly used in testing literature.
Face validity is the mere appearance that a measure has validity; the term applies when items seem reasonably related to the test's perceived purpose. Face validity is not validity, because
it does not offer evidence to support conclusions drawn from test scores. However, in many
settings it is crucial to have a test that “looks like” it is valid. These appearances can help
motivate test takers because they can see that the test is relevant.
In other words, the test taker should see that the test measures what it is supposed to measure; the test does not actually have to be valid, it only has to appear valid.
Content-related evidence for validity
Content-related evidence for validity considers the adequacy of representation of the
conceptual domain the test is designed to cover. Content-related evidence of validity is
provided by the correspondence between the items on the test and the content domain the test is supposed to measure. Traditionally, content validity evidence has been of greatest concern in
educational testing and more recently in tests developed for medical settings. Because the
boundaries between content and other types of evidence for validity are not clearly defined,
we no longer think of content validity evidence as something separate from other types of
validity evidence. However, content validity evidence has unique features (e.g., it is more logical than statistical). In looking for content validity evidence, we attempt to
determine whether a test has been constructed adequately. Establishing content validity
evidence for a test requires good logic, intuitive skills, and determination. The content of the
items must be carefully evaluated (e.g. wording, reading level). Determination of content
validity evidence is often made by multiple expert judgments or statistical methods such as
factor analysis.
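Since the notes name factor analysis as one statistical route to content evidence, here is a minimal sketch, assuming a simulated item-response matrix (all data are hypothetical, not from the chapter), of checking whether ten items load on a single factor:

    import numpy as np
    from sklearn.decomposition import FactorAnalysis

    rng = np.random.default_rng(1)
    ability = rng.normal(size=(300, 1))                 # latent trait for 300 test takers
    loadings = rng.uniform(0.5, 1.0, size=(1, 10))      # how strongly each item taps the trait
    items = ability @ loadings + rng.normal(0, 0.5, size=(300, 10))  # simulated item scores

    fa = FactorAnalysis(n_components=1).fit(items)
    print(fa.components_.round(2))                      # estimated loadings of the 10 items

Items with near-zero loadings would be candidates for construct-irrelevant variance or poor coverage of the domain.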
Two concepts are relevant to content validity evidence: (1) construct underrepresentation, the failure to capture important components of a construct, and (2) construct-irrelevant variance, which occurs when scores are influenced by factors irrelevant to the construct. A test often reflects many factors besides the one it is supposed to measure; you should take all of these factors into account to make accurate generalizations about the meaning of the test score.
Example: make sure that all the aspects that are important for the course are measured. It is not enough to ask about one single aspect; you need to measure every aspect covered in that specific course. Also, if a biology test uses wording that is too difficult, it also measures reading skills, and that is a different content domain.
Criterion-related evidence for validity
Criterion-related evidence for validity tells us how well a test corresponds with a particular
criterion. Such evidence is provided by high correlations between a test and a well-defined
criterion measure. A criterion is the standard against which the test is compared. The reason
for gathering criterion validity evidence is that the test or measure is to serve as a "stand-in"
for the measure we are really interested in.
- Predictive validity evidence is the forecasting function of tests, which is a form of
criterion validity. The test itself is the predictor variable, and the outcome is the
criterion. The purpose of the test is to predict the likelihood of succeeding on the
criterion. Many tests do not have very good prediction records.
Restriction of range occurs when only an extreme part of the population is selected, so the full range of outcomes is not observed; this reduces the observed correlation. Correcting for restricted range can boost the correlation substantially (a standard correction is sketched after the example below).
Predictive validity evidence applies to medical and psychological measures. To evaluate
information from tests, you need to consider the relationship between the test and the
criterion. The low correlation between cholesterol tests and heart disease suggests that
we cannot say precisely which specific individuals will benefit. However, the small but
significant statistical relationship tells us that there is some important predictive value in
cholesterol tests.
Example: an admission test forecasts how high-school students will do in school later on, i.e., their future performance.
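Because usually only admitted (high-scoring) students have college outcomes, the observed correlation understates the true one. Here is a minimal sketch of one standard correction formula (often attributed to Thorndike's Case II); the numbers are illustrative assumptions, not data from the text:

    import math

    def correct_restricted_range(r_restricted, sd_unrestricted, sd_restricted):
        """Estimate the full-population correlation from a range-restricted sample."""
        u = sd_unrestricted / sd_restricted  # ratio of SDs: how much range was lost
        return (r_restricted * u) / math.sqrt(1 - r_restricted**2 + (r_restricted * u) ** 2)

    # Observed r = .30 among admitted students, whose test-score SD (60) is much
    # smaller than the full applicant pool's SD (100):
    print(round(correct_restricted_range(0.30, 100, 60), 2))  # ~0.46

The more the selected group's spread shrinks relative to the full pool, the larger the corrected estimate becomes.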
Concurrent criterion validity is another type of criterion validity. Concurrent-related
evidence for validity comes from assessments of the simultaneous relationship between
the test and the criterion. Here the measures and criterion measures are taken at the
same time, because the test is designed to explain why something is happening at that
time. This can give diagnostic information that can guide the development of
individualized help. Concurrent evidence for validity applies when the test and the
criterion can be measured at the same time. Job samples are an example of concurrent
validity evidence: potential employees are tested on a sample of behaviors that represent the tasks that will be required of them. Because these samples were shown to correlate well with
performance on the job, the samples alone could be used for the selection and screening
of applicants. Employers must demonstrate that tasks used to test potential new
employees relate to actual job performance. Good scientific evidence is required that a test used to screen employees is valid in terms of how job candidates will perform if employed. The same logic applies to mental health measures. The first step in
establishing the validity of a psychological measure is to demonstrate that it is related to
other measures designed to assess the same construct (high correlation). In addition,
reports of the experience of specific symptoms on the questionnaire were systematically
related to clinicians' judgments of individual symptoms for the same patients. Relatives of patients can also be interviewed with the Diagnostic Interview Schedule. Discriminant evidence for validity can show the advantage of a measure over other measures (see the sketch below).
Another use of concurrent validity evidence arises when someone does not know how he or she will respond to the criterion measure, for example, when a person does not know what job he or she wants. An interest test then uses as its criterion the patterns of interest among people who are satisfied with their careers.
In short: concurrent evidence tests the relationship between test and criterion at the same time. To find out whether applicants are good enough to do a job, they perform certain tasks that represent what they would have to do on the job. The question is: can they do the job now?
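As a concrete illustration of the convergent and discriminant evidence mentioned above, here is a minimal sketch with simulated (hypothetical) scores: a new scale should correlate highly with an established measure of the same construct and only weakly with a measure of a different construct:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    latent = rng.normal(size=150)                        # the construct (e.g., depression)
    new_scale = latent + rng.normal(0, 0.4, size=150)    # new questionnaire
    established = latent + rng.normal(0, 0.4, size=150)  # established same-construct measure
    unrelated = rng.normal(size=150)                     # measure of a different construct

    print("convergent r:", round(stats.pearsonr(new_scale, established)[0], 2))  # high
    print("discriminant r:", round(stats.pearsonr(new_scale, unrelated)[0], 2))  # near zero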
- Validity coefficient: the correlation that shows the relationship between a test and a criterion. It shows the extent to which the test is valid for making statements
about the criterion. There are no rules about how large the coefficient must be to be
meaningful. In practice, it is rarely larger than .60. Coefficients between .30 and .40 are
considered adequate. A coefficient is statistically significant if the chances of obtaining its
value by chance alone are quite small: usually less than 5 in 100. The validity coefficient
squared is the percentage of variation in the criterion that we can expect to know in
advance because of our knowledge of the test scores. Thus, with a validity coefficient of .40, we will know .40 squared, or 16%, of the variation in the outcome because of the information we have from the test. This leaves 84% of the variation to be explained by many other
factors. In many circumstances, using a test is not worth the effort because it contributes
only a bit to the understanding of variation in a criterion. However, low validity
coefficients (.30 - .40) can sometimes be especially useful even though they may explain
only 10% of the variation in the criterion. In other circumstances, a validity coefficient
of .30-.40 means almost nothing; it depends on how you interpret the data. Because not all validity coefficients of .40 have the same meaning, you should watch for several things when evaluating testing information.
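To make the variance-explained arithmetic concrete, here is a minimal sketch with simulated (hypothetical) test and criterion scores, computing the validity coefficient, its statistical significance, and the squared coefficient:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    test = rng.normal(100, 15, size=200)                  # predictor: test scores
    criterion = 0.4 * test + rng.normal(0, 15, size=200)  # criterion with a modest true relation

    r, p = stats.pearsonr(test, criterion)                # validity coefficient and p-value
    print(f"validity coefficient r = {r:.2f}, p = {p:.4g}")  # significant if p < .05
    print(f"variance explained r^2 = {r**2:.1%}")         # e.g., r = .40 -> 16% explained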
Construct-related evidence for validity
Studies of criterion validity evidence would require that a specific criterion of intelligence be
established against which tests could be compared. However, there is no criterion for intelligence, because it is a hypothetical construct. A construct is defined as something built
by mental synthesis. As a construct, intelligence does not exist as a separate physical thing,
so it cannot be used as an objective criterion. Constructs of interest in psychology (e.g., love, curiosity) are often not clearly defined, and there is no established criterion against
which you can compare the accuracy of the tests. Construct-related validity evidence is
established through a series of activities in which a researcher simultaneously defines some
construct and develops the instrumentation to measure it. This process is required when "no