Table of contents: summary of Psychometrics: An Introduction (3rd edition)
Chapters 1 through 3, 5 and 6
For the course Testtheorie (500216-B-5) – Academic year 2022-2023
Chapter 1 – Psychometrics and the Importance of Psychological Measurement
Chapter 2 – Scaling
Chapter 3 – Individual Differences and Correlations
Chapter 5 – Reliability: Conceptual Basis
Chapter 6 – Empirical Estimates of Reliability
NOTE: the remaining chapters (8, 9, 11 and 14) are not summarized
Psychometrics: An Introduction, 3rd edition – R. Michael Furr
Chapter 1 – Psychometrics and the Importance of Psychological Measurement
Behavioral measurement
1) For the behavior itself
2) Underlying psychological processes, assessing unobservable psychological attributes (e.g.
intelligence, depression, ability, extroversion) -> making inferences: observable behavior is
systematically related to an unobservable psychological attribute
- Important: validity (of scores): construct validity (psychological construct; theoretical concepts =
hypothetical constructs/latent variables & operational definitions) and measurement validity
(theory linking psychological attribute to observable behavior)
All sciences rely on unobservable constructs to some degree, and they all
measure these constructs by measuring some observable events or behaviors
Cronbach: psychological test = "a systematic procedure for comparing the behavior of two or more
people"
- Three components:
1. Tests involve behavioral samples of some kind
2. The behavioral samples must be collected in some systematic way
3. The purpose of the test is to compare the behaviors of two or more people
Generality:
1) tests come in many forms - e.g. questionnaire, lab setting
2) different types of information - e.g. numbers, categorical data
3) purpose - comparing behavior of different people (=interindividual differences) or
the behavior of the same individuals at different points in time/circumstances
(=intraindividual differences) -> identify, and if possible, quantify inter- +
intraindividual differences (these differences on test performance contribute to test
score validity)
Thousands of different tests, varying on:
- Content (what they attempt to measure)
- Type of response required (e.g. open-ended vs. closed-ended tests)
- Methods used to administer them (individually vs. in groups)
- Intended purpose of test scores:
1) criterion/domain referenced = often used for a decision about a person's skill level; a cut-off
test score serves as the criterion, sorting people into two groups (who do/don't exceed the criterion)
2) norm referenced = often used to understand how a person compares with other people;
the person's test score is compared with scores from a reference/normative sample: a sample
of people who completed the test + are thought to be representative of some well-defined
population -> does the individual score higher/lower than the "average person" in the relevant
population?
- Speeded tests = time-limited, counting the number of questions answered within a
certain time period, each question should be of comparable difficulty vs. Power tests =
not time-limited + respondents answer all questions, counting the number of correct answers, test
items must range in difficulty
Reflective/effect indicators: the hypothetical construct (e.g. intelligence) determines, in part, a
person's response to the items on the test; these answers are seen as "indicators" of the construct ->
this type of score is the one relevant for the procedures in this book
Formative/causal indicators: e.g. socioeconomic status (SES), indicated by income, education level and occupational status; the indicators aren't
viewed as "caused" by a person's SES; rather, the indicators of SES are, in part, exactly what defines SES
Test, aka: measure, instrument, scale, inventory, schedule, assessment, battery (bundled tests,
administered together but not necessarily designed to measure a single psychological attribute)
Psychometrics: science concerned with evaluating attributes of psychological tests
- Three of these attributes in particular:
1) Type of information (mostly scores) generated by the use of psychological tests
2) Reliability of data from psychological tests
3) Issues concerning validity of data from psychological tests
-> the book describes the procedures psychometricians use to evaluate these attributes of tests
Brief history - two key foundations
1. Practice of psychological testing and measurement -> goes back 2,000-4,000 years, increased
in the 19th century with the rise of psychological science -> even more in the 20th century with early
intelligence tests -> exploded over the past 100+ years; desire for high-quality tests + need to evaluate and improve
tests -> psychometrics
2. Development of statistical concepts + procedures -> starting in the 19th century, ways to understand
+ work with the types of quantitative information from psychological testing (Spearman,
Pearson, Galton; Galton is sometimes considered the founding father of modern psychometrics) -> SD,
correlation coefficient, factor analysis, normal distribution (human characteristics), sampling
(measurement error) -> 1930s/40s: journal Psychometrika, Psychometric Society, APA
"Division of Evaluation and Measurement" -> Classical Test Theory (CTT) and Item Response
Theory (IRT) emerged
Measurement in all sciences is affected by various challenges that can reduce
measurement accuracy; the behavioral sciences face some challenges that the physical sciences often do not:
- Complexity of psychological phenomena (they consist of many different aspects)
- Participant reactivity (act of measurement itself can have an influence, e.g. response
biases) -> usually not a problem with measuring features of nonsentient physical
objects (e.g. weighing grapes)
*Demand characteristics: participants figure out the researcher's purpose of the study and change
their behavior to accommodate the researcher
*Social desirability: participants try to impress the person who measures
*Malingering: participants change their behavior to convey a poor impression
- The people collecting the data (e.g. observing behavior, scoring a test, interpreting a
verbal response) -> observer/scorer bias: their biases + expectations can influence the data (can be difficult to
detect, subtle/unintended biases)
- Composite scores: combine the items' scores (all questions) to create a
total/composite score, which represents the final measure of the construct ->
benefits + issues
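As a small illustration of a composite score, the sketch below sums each person's item scores into one total. The unweighted sum is an assumption (the notes above do not say how items are combined), and the respondents and item scores are invented for the example.

```python
def composite_score(item_scores):
    """Combine item scores into one total score.

    Assumption: a simple unweighted sum; other schemes (e.g. the item mean)
    are also possible.
    """
    return sum(item_scores)


# Hypothetical responses of two people to a four-item questionnaire
responses = {
    "person_a": [4, 5, 3, 4],
    "person_b": [2, 1, 2, 3],
}

# One composite score per person, used as the measure of the construct
totals = {person: composite_score(items) for person, items in responses.items()}

print(totals)
```

The totals, rather than the single item responses, would then be used to compare the two people on the construct.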