Test theory - PSBE2-06
University of Groningen
Book: Thomas P. Hogan (2013). Psychological Testing: A Practical Introduction, 3rd edition.
Chapter 1 - The world of psychological testing
Chapter 3 - Test norms
Statistics review
Chapter 4 - Reliability
Statistics review
Chapter 5 - Validity
Chapter 6 - Test development, Item analysis, and Fairness
Chapter 7 - Intelligence: Theories and Issues
Chapter 8 - Individual tests of Intelligence
Chapter 12 - Objective Personality Test
Chapter 13 - Clinical Instruments and Methods
Chapter 14 - Projective Techniques
Chapter 15 - Interests and Attitudes
Factor analysis document
Chapter 1 - The world of psychological testing
Central idea of psychological testing → we would like to measure constructs like
intelligence and personality because we assume that there are meaningful individual
differences in “performance”, “skills” or “traits” (which cannot be directly observed).
Mental ability tests = tests designed for measuring cognitive functions (memory, spatial
visualization, creative thinking, etc.).
→ e.g. Wechsler Adult Intelligence Scale (WAIS); Stanford-Binet Intelligence Scale
Achievement tests = tests that attempt to assess a person’s level of knowledge or skill in a
particular domain.
1. achievement batteries used in elementary and secondary schools
2. single-subject tests that cover only one area (psychology, French)
3. variety of tests used for purposes of certification and licensing in such fields as
nursing, teaching, therapy, etc.
4. government agencies that support achievement testing programs
Personality tests = tests that yield information about a person's personality
1. Objective personality test; true/false kind of format
2. Projective techniques; to reveal someone's personality (e.g. Rorschach Inkblot Test)
Vocational interest measures = tests to help individuals explore jobs relevant to their
interests.
Neuropsychological tests = tests designed to yield information about the functioning
of the central nervous system, especially the brain.
Other ways to classify tests:
● Paper-and-pencil vs Performance tests
● Speed vs Power tests
● Individual vs Group tests
● Maximum vs Typical Performance
● Norm-referenced vs Criterion-referenced
Norm-referenced testing = we compare an individual test score with test scores in a
population so that we can interpret the score in relation to other scores.
Criterion-referenced testing = test performance is judged adequate or not against a
fixed criterion (e.g. our exam).
Criterion-referenced interpretation → you interpret a test score without reference to any
norm group
Major contexts for Use of Tests:
● Clinical (clinical psychology, counseling, neuropsychology, etc.)
● Educational (group-administered tests of ability and achievement by teachers,
educational administrators, parents, etc.)
● Personnel (business and military)
● Research (in every area of research)
Basic assumptions
1. Humans have traits or characteristics; differences in traits are important
2. We can quantify these traits
3. The traits are reasonably stable
4. Our measures relate to actual behavior
Fundamental questions:
1. We ask about the reliability of the test (stability of test scores)
2. We ask about the validity of the test (what the test is actually measuring)
3. How do we interpret the scores from a test? (depends on norms)
4. We ask about the test development (how it was created, etc.)
5. We ask about the practical issues of a test (costs, duration, language)
Norms = based on the test scores of large groups of individuals who have taken the test in
the past.
Differential perspective = assumes that the answer may differ for different people.
History of testing
Remote background: up to 1840
● remote roots of psychology as well as most fields are in philosophy
Setting the stage: 1840-1880
● scientific interest and public consciousness of mental illness increased enormously
● adoption of formal written examinations by the Boston school committee
● the age of Darwin dawned
● experimental psychology emerged
The roots: 1880-1915
● four key figures:
○ Francis Galton (founder of psychological testing)
○ James McKeen Cattell (contributor to the development of testing; mental test)
○ Alfred Binet (measured mental activities; mental age)
○ Charles Spearman (factor analysis; g factor)
The Flowering: 1915-1940
● Stanford-Binet became the benchmark definition of human intelligence
● development of the first widely used group-administered intelligence test
● intelligence test designed for adults
Consolidation: 1940-1965
● new and revised editions of many of the tests appeared
● testing would play a prominent role in a variety of venues
● military testing
Just yesterday: 1965-2000
● test theory has changed dramatically
● emergence of item response theory (modern test theory) = a new set of methods for
examining a whole range of issues related to the reliability, scaling and construction
of tests
● legislative and judicial activism regarding tests, some tests were being required by
law, others prohibited.
● testing became the subject of widespread public criticism during this period.
● computers have pervasively influenced contemporary testing
And now: 2000-present
● explosive increase in the number and diversity of tests
● pervasive influence of managed care
● an outgrowth of evidence-based practice (EBP; the notion that whatever the
psychologist does in practice should be based on sound evidence)
6 major forces that have shaped the field of testing:
1. Scientific impulse
2. Concern for the individual
3. Practical applications
4. Statistical methodology
5. The rise of Clinical psychology
6. Computers
→ three major aspects
1. statistical processing
2. score reporting
3. test administration
Scanner = electrical or electronic device that counts marks on a test answer sheet
Interpretive reports = reports of test performance that are no longer confined to numbers
but are described in simple words or even continuous narrative.
Computer-adaptive testing = computer not only presents the items but also selects the
next item based on the examinee’s previous responses.
Automated scoring = a computer program simulates human judgment in the scoring of
such products as essays, medical diagnoses, etc.
6 key elements of a test:
1. procedure or device
2. it yields information
3. information about behavior and cognitive processes
4. information about a sample of behavior
5. a systematic standardized procedure
6. reference to quantification or measurement
Test = a standardized process or device that yields information about a sample of behavior
or cognitive processes in a quantified manner.
Standardized = uniform procedure for administering and scoring the test
Chapter 3 - Test norms
Raw score = immediate result of an individual’s responses to a test.
Normed score (derived/scaled score) = an individual's raw score compared with the scores
of individuals in the norm group.
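As an illustration of turning a raw score into a normed score, the sketch below computes a percentile rank. The notes do not give a formula at this point, so the function name, the convention of counting tied scores as half, and the norm-group data are all assumptions made for the example.

```python
def percentile_rank(raw_score, norm_scores):
    """Percentage of norm-group scores below raw_score (ties count as half).

    The half-tie convention is an assumption; other conventions exist.
    """
    below = sum(1 for s in norm_scores if s < raw_score)
    equal = sum(1 for s in norm_scores if s == raw_score)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)


# Hypothetical norm group of 10 raw scores (made-up data).
norm_group = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]

# Five scores lie below 21 and one equals it: (5 + 0.5) / 10 = 55%.
pr = percentile_rank(21, norm_group)
print(pr)
```

The same raw score would receive a different percentile rank in a different norm group, which is why the choice of norm group matters for interpretation.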
Variables can be described at 3 levels of generality:
1. A variable is a construct; verbal descriptions and definitions of the variable
2. A variable is an operational definition; describes how it can be measured
3. We get raw data; numbers that result from application of the measures
Statistics review
Descriptive statistics → help summarize or describe the raw data to aid our understanding
of the data.
Inferential statistics → help us to draw conclusions (inferences) about what is probably
true in the population based on what we discovered about the sample.
4 classifications of scales:
1. Nominal scale; classifies, assigns numerals only as labels
2. Ordinal scale; indicates an ordering (e.g. rank)
3. Interval scale; equal distances between numbers, lacks a true zero point
4. Ratio scale; places objects in order with equal intervals, has a true zero point
Frequency distribution = organizes raw data into groups of adjacent scores
→ often converted to graphic form (frequency histogram or frequency polygon)
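A minimal sketch of building a frequency distribution and printing it as a rough text histogram. The scores and the interval width of 10 are made up for illustration.

```python
from collections import Counter

# Made-up raw scores for illustration.
scores = [3, 7, 12, 14, 15, 18, 21, 22, 22, 29]
width = 10  # assumed width of each group of adjacent scores

# Map each score to the lower bound of its interval, then count per interval.
freq = Counter((s // width) * width for s in scores)

# One row per interval; the number of '#' marks is the frequency.
for lower in sorted(freq):
    print(f"{lower:2d}-{lower + width - 1:2d}: {'#' * freq[lower]}")
```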
Central tendency = the center around which the raw data tend to cluster
→ 3 commonly used measures of central tendency: mean, median & mode
Mean = arithmetic average
Median = middle score
Mode = most frequent score
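The three measures can be checked on a small data set with Python's standard `statistics` module. The scores below are made up for illustration.

```python
import statistics

# Made-up raw scores, already sorted for readability.
scores = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(scores)      # arithmetic average: 30 / 6
median = statistics.median(scores)  # middle score: average of 3 and 5
mode = statistics.mode(scores)      # most frequent score: 3 occurs twice

print(mean, median, mode)
```

Here the mean (5) is pulled above the median (4) by the single high score of 10, which shows why the three measures can disagree.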
Variability indexes: range, standard deviation, variance & interquartile range
Range = distance from lowest to highest score
Standard deviation = how close the data is clustered around the mean
Variance = standard deviation squared (SD²)
Interquartile range = distance between 1st and 3rd quartiles
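A short sketch of the four variability indexes using the standard `statistics` module. The data are made up, and two choices are assumptions: the population formulas (`pvariance`, `pstdev`) are used rather than the sample formulas, and `statistics.quantiles` is called with its default method, which is only one of several quartile conventions.

```python
import statistics

# Made-up raw scores with mean 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest minus lowest score
variance = statistics.pvariance(scores)  # population variance (assumed choice)
sd = statistics.pstdev(scores)           # population SD = square root of variance

# Quartile cut points; other quartile conventions can give slightly
# different values for the same data.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(score_range, variance, sd, iqr)
```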
Z-score = standardized score, z = (x - mean) / SD
→ e.g. z = 1.5 means the score lies 1.5 SD above the mean of the population
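A worked example of the z-score formula. The mean of 100 and SD of 10 are hypothetical values chosen so that the result matches the example above.

```python
def z_score(x, mean, sd):
    """Standardized score: distance from the mean in SD units."""
    return (x - mean) / sd


# Hypothetical test with mean 100 and SD 10.
z = z_score(115, 100, 10)  # (115 - 100) / 10
print(z)

# A score below the mean gives a negative z.
z_low = z_score(90, 100, 10)
print(z_low)
```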
Distributions can be different from the normal curve:
→ kurtosis; the peakedness of the distribution
→ skewness; degree of asymmetry between the right and left sides of the curve
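Both properties can be quantified with moment-based formulas. The sketch below is one common definition (population moments, with kurtosis reported as "excess" relative to the normal curve). The notes do not specify a formula, so the function names and the data are assumptions.

```python
def central_moment(data, k):
    """Average of the k-th powers of deviations from the mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** k for x in data) / len(data)


def skewness(data):
    """Third standardized moment: 0 for a symmetric distribution."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def excess_kurtosis(data):
    """Fourth standardized moment minus 3 (the value for a normal curve)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2 - 3


# Made-up data: one symmetric set, one with a long right tail.
symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 2, 2, 3, 10]

print(skewness(symmetric))         # deviations cancel out
print(skewness(right_skewed))      # positive: tail on the right
print(excess_kurtosis(symmetric))  # negative: flatter than the normal curve
```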