Challenges to Psychological Measurement # Learning Goal 1 (Part of Chapter 1)
-we can never be certain that a measurement is perfect -> several challenges apply especially to behavioural science:
1) participant reactivity = people’s knowledge that they are being observed/assessed can cause them to react in
ways that obscure their true levels on the assessed psychological construct -> reduces validity of test interpretation
-> different forms of participant reactivity:
*demand characteristics -> being influenced by the testing situation, which gives cues about the purpose of the experiment
*social desirability -> wanting to be liked by the person who assesses you => may have important consequences
*malingering -> presenting yourself as worse on a psychological construct than you actually are => due to potential
benefits when doing so
2) objectivity is required = often fails because testers bring expectations/biases to the task, which distort their
observation -> related: subject-expectancy bias & experimenter-expectancy bias
3) reliance on composite scores = multiple items make up 1 composite OR total score
4) score sensitivity = ability of a measure to discriminate adequately between meaningful amounts or units of the
dimension that is being measured -> comes down to how the scale is created => questionnaire should provide
enough answer options to reflect meaningful differences on the assessed construct
example: when assessing state of mind on “good versus bad” dimension, we need to provide options in between
& not only “good” & “bad” as the 2 options
5) lack of awareness of psychometric information = applied psychological measurement often seems to be
conducted with little or no regard for the psychometric quality of the tests
example: relying on a test assessing extraversion without considering reliability & interpreting the test scores
without considering validity
TYPES OF VARIANCES (EXTRA INFORMATION – IMPORTANT FOR LATER ON DURING THE COURSE)
Construct variance = differences between individuals based on differences between their true levels on a trait
-> intended, systematic variance => ideally: we want all variance to be construct/trait variance
Method variance = differences between individuals based on the measure, which is used (e.g. questionnaire elicits
differences in how children comprehend the questions) -> unintended, systematic variance
Location/Situation variance = differences between individuals based on the locational/situational differences when
testing them -> unintended, systematic variance
Subject variance = differences between individuals based on differences on other traits/characteristics between the
individuals (e.g. age, weight, IQ…) -> unintended, systematic variance
Non-systematic error variance = differences between individuals based on random things, which don’t affect the
test scores with a particular pattern (e.g. random responding/guessing may work better for some students than for
others)
-> unintended, non-systematic variance
NOTE
1) non-systematic error (random responding, guessing) = error variance (decreases reliability -> underestimation of
reliability)
2) systematic error (method variance, location/situation variance, subject variance, malingering, social desirability
bias, acquiescence bias…) = error, which is falsely attributed to true variance (increases reliability -> overestimation
of reliability)
Individual Differences & Correlations – Chapter 3
VARIABILITY
-variability = differences within a set of test scores or among the values of a psychological attribute
-> interindividual differences = differences between different people on the same attribute
-> intraindividual differences = differences within 1 person (emerging over time/under different circumstances)
-individual differences were emphasized by Galton -> central assumption in psychology: people differ on certain
characteristics => all kinds of psychology rely on the ability to quantify individual differences between people
-when many people take the same test -> distribution of scores exists => researchers want to quantify variability
-3 key features of a distribution: central tendency (mean, median, mode), variability & shape
-variance = mean of the squared deviations from the mean -> its numerator is often called: sum of squares (SS)
-SD (square root of the variance) expresses variability in raw-score units, whereas variance expresses it in squared units -> SD more intuitive
=> value of both determined by 2 factors: 1) actual deviation & 2) metric of the distribution scores
SD of 2 for GPA isn’t necessarily smaller than 15 for IQ (different metrics/scales)
-4 facts on variability:
1) neither SD nor variance can be negative
2) value for either of them can’t simply be categorized as small/large (metric important too)
3) SD & variance are most meaningful when they are set into context -> e.g. comparing it with another distribution
4) importance of both values lies mainly in their influence on other values, which are more directly interpretable
NOTE: this book doesn’t require N-1 in the denominator -> simply N is enough when it comes to SD & variance
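A minimal Python sketch of variance & SD with N in the denominator, as the note above describes. The function names and the IQ/GPA scores are made up for illustration; the two score lists share the same pattern of deviations but use different metrics, which is why their variances and SDs cannot be compared directly.

```python
from math import sqrt


def variance(scores):
    """Variance with N in the denominator (not N - 1)."""
    n = len(scores)
    mean = sum(scores) / n
    # Numerator: sum of squared deviations from the mean (sum of squares, SS).
    sum_of_squares = sum((x - mean) ** 2 for x in scores)
    return sum_of_squares / n


def standard_deviation(scores):
    """SD is the square root of the variance, so it is in raw-score units."""
    return sqrt(variance(scores))


# Hypothetical scores: same pattern of deviations, two different metrics.
iq_scores = [85, 100, 115, 100]
gpa_scores = [2.0, 3.0, 4.0, 3.0]

iq_variance = variance(iq_scores)      # 112.5
gpa_variance = variance(gpa_scores)    # 0.5
iq_sd = standard_deviation(iq_scores)
gpa_sd = standard_deviation(gpa_scores)
```

The IQ values come out much larger than the GPA values only because the IQ scale uses bigger numbers, not because the people differ more.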
-co-variability = degree to which 2 distributions of scores vary in a corresponding manner -> also called: association
-> example: how much are IQ-scores associated with high-school GPA
=> we want to know direction & strength of association (correlation)
COVARIANCE
-represents degree of association between variability of 1 distribution of scores with another distribution of scores
-> calculating covariance in 3 steps:
1) calculating deviation from the mean for both x and y (both variables)
2) calculating cross-products -> (x-deviation) x (y-deviation)
3) computing the mean of all cross-products => covariance
-covariance provides clear information about direction of association -> positive covariance = positive association
=> provides ambiguous strength-info though (due to metric-factor again)
covariance has limited direct interpretation potential -> mostly, basis for other statistics (just like variance)
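The 3 steps for the covariance can be sketched in Python as follows. The function name and the scores are hypothetical. The second call rescales GPA by a factor of 10 to illustrate why the size of a covariance is ambiguous: the association is the same, but the value changes with the metric.

```python
def covariance(x, y):
    """Covariance in 3 steps, with N in the denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Step 1: deviations from the mean for both variables.
    dev_x = [xi - mean_x for xi in x]
    dev_y = [yi - mean_y for yi in y]
    # Step 2: cross-products of the paired deviations.
    cross_products = [dx * dy for dx, dy in zip(dev_x, dev_y)]
    # Step 3: mean of the cross-products.
    return sum(cross_products) / n


# Hypothetical scores for 4 people.
iq = [85, 100, 115, 100]
gpa = [2.0, 3.0, 4.0, 3.0]
gpa_times_ten = [g * 10 for g in gpa]  # same information, different metric

cov_original = covariance(iq, gpa)            # 7.5
cov_rescaled = covariance(iq, gpa_times_ten)  # 75.0
```

Both values are positive (same direction), but only the sign can be interpreted directly.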
-variance-covariance matrix -> smallest version includes: 2 variances & 1 covariance => basic features:
1) each variable has a column & a row -> 4 x 4 matrix for 4 variables
2) variances are presented on diagonal line from top left (variable 1) to bottom right (variable 4 in a 4 x 4 matrix)
3) all other cells represent covariances -> each covariance appears twice (above & below the diagonal) -> number of unique covariances for k variables: k(k-1)/2
4) covariances are symmetric -> due to organization of the matrix
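A small sketch of how such a matrix can be built, assuming 3 made-up variables (so a 3 x 3 matrix). The helper names are illustrative only. The covariance of a variable with itself is its variance, which is why the variances end up on the diagonal.

```python
def covariance(x, y):
    """Mean of the cross-products of deviations (N in the denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def variance_covariance_matrix(variables):
    """One row & one column per variable; cell (a, b) holds cov(a, b)."""
    return [[covariance(x, y) for y in variables] for x in variables]


# Hypothetical scores of 4 people on 3 variables.
scores = [
    [1, 2, 3, 4],
    [2, 4, 6, 8],
    [4, 3, 2, 1],
]

matrix = variance_covariance_matrix(scores)
k = len(scores)
unique_covariances = k * (k - 1) // 2  # off-diagonal cells, counted once

for row in matrix:
    print(row)
```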
CORRELATION
-correlation coefficient -> gives evidence on strength & direction of an association
-> is based on covariance, BUT divided by product of SD of both variables: $r_{xy} = \frac{c_{xy}}{s_x s_y}$
=> Cohen's strength categorization (absolute value of r): 0.10-0.29 = small; 0.30-0.49 = moderate; 0.50-1.00 = large
NOTE: both CORRELATION & COVARIANCE should be seen as an index of consistency of individual differences
between people/groups of people
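A minimal sketch of the correlation as covariance divided by the product of both SDs. Function names and scores are hypothetical. Rescaling one variable changes the covariance but leaves the correlation untouched, which is what makes r interpretable in terms of strength.

```python
from math import sqrt


def covariance(x, y):
    """Mean of the cross-products of deviations (N in the denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def standard_deviation(x):
    # The covariance of a variable with itself is its variance.
    return sqrt(covariance(x, x))


def correlation(x, y):
    """Covariance divided by the product of the two SDs."""
    return covariance(x, y) / (standard_deviation(x) * standard_deviation(y))


# Hypothetical scores for 4 people.
iq = [85, 100, 115, 100]
gpa = [2.0, 3.5, 3.5, 3.0]
gpa_times_ten = [g * 10 for g in gpa]

r_original = correlation(iq, gpa)
r_rescaled = correlation(iq, gpa_times_ten)
print(round(r_original, 3), round(r_rescaled, 3))
```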
-----------------------------------------------------------------------------------------------------------------------------------------------------------
VARIANCE & COVARIANCE FOR COMPOSITE SCORES
-composite score = many sub-variables (sub-scores) are measured in order to reflect one overall variable
=> example: 15 different questions (items) are asked to investigate the level of happiness (main variable) of a person
-> usually, all these different items are either summed or averaged to obtain the main variable
-variance of composite scores depends on variability of all items & the correlations between the items
-> for a composite score of ONLY 2 ITEMS (i & j): $s^2_{composite} = s^2_i + s^2_j + 2 r_{ij} s_i s_j$
=> if r = 0, variance depends only on variability of items separately (first equation part: $s^2_i + s^2_j$)
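-covariance of composite scores: simplest case would involve 2 composite scores, composed of 2 items each (i & j; k & l)
=> simply the sum of the covariances between each item of one composite & each item of the other: $c_{C_1 C_2} = c_{ik} + c_{il} + c_{jk} + c_{jl}$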
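A numerical check of both composite-score rules, as a sketch with made-up item scores (item names i, j, k, l are illustrative). It compares the variance and covariance computed directly from the summed composites with the values obtained from the item-level variances, correlation and covariances.

```python
from math import isclose, sqrt


def covariance(x, y):
    """Mean of the cross-products of deviations (N in the denominator)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n


def variance(x):
    return covariance(x, x)


# Hypothetical item scores for 5 people.
item_i = [1, 2, 3, 4, 5]
item_j = [2, 1, 4, 3, 5]
item_k = [5, 3, 4, 2, 1]
item_l = [1, 3, 2, 5, 4]

# Composites are formed by summing their items.
composite_1 = [a + b for a, b in zip(item_i, item_j)]
composite_2 = [a + b for a, b in zip(item_k, item_l)]

# Variance of a 2-item composite from item variances & their correlation.
s_i, s_j = sqrt(variance(item_i)), sqrt(variance(item_j))
r_ij = covariance(item_i, item_j) / (s_i * s_j)
variance_from_items = variance(item_i) + variance(item_j) + 2 * r_ij * s_i * s_j

# Covariance of two composites: sum of all cross-composite item covariances.
covariance_from_items = (
    covariance(item_i, item_k) + covariance(item_i, item_l)
    + covariance(item_j, item_k) + covariance(item_j, item_l)
)

print(isclose(variance(composite_1), variance_from_items))
print(isclose(covariance(composite_1, composite_2), covariance_from_items))
```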
BINARY ITEMS
-binary items: have dichotomous answer (e.g. yes or no) => 0 assigned to “no” & 1 assigned to “yes”
-mean of binary items: $\bar{X} = p$ = proportion of people saying “yes”; variance: $s^2 = p(1-p) = pq$ (q = proportion of people saying “no”)
-> variance of a binary item can become 0.25 at max (when p = 0.5)
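A short sketch for binary items, assuming hypothetical yes/no responses coded 1/0. It checks that p(1 - p) matches the ordinary variance (N in the denominator) and that the variance peaks at 0.25 when half of the people say "yes".

```python
from math import isclose

# Hypothetical responses of 8 people: 1 = "yes", 0 = "no".
responses = [1, 1, 1, 0, 0, 1, 0, 1]

n = len(responses)
p = sum(responses) / n           # mean of a binary item = proportion of "yes"
binary_variance = p * (1 - p)    # shortcut formula for binary items

# Ordinary variance for comparison: mean squared deviation from the mean.
ordinary_variance = sum((x - p) ** 2 for x in responses) / n

# Variance for p = 0.00, 0.01, ..., 1.00 to locate the maximum.
candidates = [(i / 100, (i / 100) * (1 - i / 100)) for i in range(101)]
p_at_max, max_variance = max(candidates, key=lambda pair: pair[1])

print(p, binary_variance, isclose(binary_variance, ordinary_variance))
print(p_at_max, max_variance)
```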
----------------------------------------------------------------------------------------------------------------------------------------------------------
INTERPRETATION OF TEST SCORES
Z-SCORES & STANDARDIZED SCORES
-goal: we need to locate an individual score within a distribution of scores -> 2 types of info are required:
1) Is the score above or below the mean?
2) How large is the difference from the mean (i.e. relative size of the difference)?
=> combined in z-scores, $z = \frac{X - \bar{X}}{s}$, they indicate extremity of a score
-if you take a distribution of scores & convert every score into a z-score, the distribution of the z-scores
will have a mean of 0 and SD of 1 => Z (0, 1)
-benefits of z-scores:
1) frees us from being worried about metric of scores -> due to standardization
2) based on that: we can compare scores from different tests
3) by standardizing scores on 2 different tests, you can also calculate the correlation coefficient: $r_{xy} = \frac{\sum z_x z_y}{N}$
-limitation: z-score tells us the location of a score compared to a certain sample
=> but not compared to the overall population -> no absolute BUT relative value
-ambiguity: sometimes a z-score is less intuitive => What does an IQ z-score of 1.24 mean?
-> solution: retransformation to converted standard scores (standardized scores)
=> z-scores are converted into another (more comprehensible) scale with different mean & SD
procedure:
1) choosing new mean & SD & 2) inserting values in following equation: $T = z \cdot s_{new} + \bar{X}_{new}$
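A minimal sketch of z-scores, converted standard scores and the z-score route to the correlation. All names and scores are hypothetical, and the new scale (mean 50, SD 10) is only an assumed choice for illustration; any other mean & SD would work the same way.

```python
from math import sqrt


def mean(scores):
    return sum(scores) / len(scores)


def standard_deviation(scores):
    m = mean(scores)
    return sqrt(sum((x - m) ** 2 for x in scores) / len(scores))


def z_scores(scores):
    """Express every score as its distance from the mean in SD units."""
    m, s = mean(scores), standard_deviation(scores)
    return [(x - m) / s for x in scores]


def converted_standard_score(z, new_mean, new_sd):
    """Move a z-score onto a scale with a chosen mean & SD."""
    return z * new_sd + new_mean


def correlation_from_z(x, y):
    """Mean of the products of paired z-scores."""
    return sum(zx * zy for zx, zy in zip(z_scores(x), z_scores(y))) / len(x)


# Hypothetical raw scores of 5 people on two tests.
test_a = [10, 20, 30, 40, 50]
test_b = [3, 1, 4, 2, 5]

z_a = z_scores(test_a)
converted_a = [converted_standard_score(z, 50, 10) for z in z_a]
r_ab = correlation_from_z(test_a, test_b)

print([round(z, 2) for z in z_a])
print([round(t, 1) for t in converted_a])
```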
PERCENTILE RANKS
-another relative way of expressing test scores -> score in 85th percentile: 85% have scored lower than that person
-2 ways of determining percentile ranks:
1) counting the number of people, who scored lower & dividing it by the sample size (x 100) => result = percentile rank
2) calculating z-score & looking for corresponding percentile rank in Normal distribution table
=> works only if scores within the sample are based on a Normal distribution
-when there is reason to assume that the distribution isn’t Normal: use Normalized scores
-> assumption: the population is normally distributed; BUT this sample is an imperfect representation
=> normalization transformation process consists of 3 steps:
1) compute percentile ranks for all scores (empirical, 1st method)
2) convert all percentile ranks into z-scores by looking at the Normal distribution table
3) compute converted standard score (standardized score) by choosing the assumed M & SD & using the conversion formula for converted standard scores
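A sketch of both percentile-rank methods and of the 3 normalization steps, using Python's `statistics.NormalDist` in place of a printed Normal distribution table. Function names and scores are hypothetical. One assumption differs from the empirical method above: for the normalization, the rank counts half of the tied scores as "lower" (a mid-rank convention), because a percentile rank of exactly 0% has no finite z-score.

```python
from statistics import NormalDist


def percentile_rank(score, scores):
    """Method 1: percentage of people who scored lower."""
    below = sum(1 for s in scores if s < score)
    return 100 * below / len(scores)


def percentile_rank_from_z(z):
    """Method 2: look up the z-score in the standard Normal distribution."""
    return 100 * NormalDist().cdf(z)


def midpoint_percentile_rank(score, scores):
    """Assumed variant: ties count half, so the rank is never 0 or 100."""
    below = sum(1 for s in scores if s < score)
    equal = sum(1 for s in scores if s == score)
    return 100 * (below + 0.5 * equal) / len(scores)


def normalized_scores(scores, new_mean, new_sd):
    standard_normal = NormalDist()
    result = []
    for s in scores:
        # Step 1: empirical percentile rank (as a proportion).
        proportion = midpoint_percentile_rank(s, scores) / 100
        # Step 2: z-score that has this percentile rank under Normality.
        z = standard_normal.inv_cdf(proportion)
        # Step 3: converted standard score on the chosen scale.
        result.append(z * new_sd + new_mean)
    return result


# Hypothetical, clearly non-Normal scores (one extreme value).
scores = [1, 2, 2, 3, 10]
normalized = normalized_scores(scores, 100, 15)

print(percentile_rank(3, scores))            # 60.0
print(round(percentile_rank_from_z(1.0), 2))
print([round(v, 1) for v in normalized])
```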
TEST NORMS
-when a new psychological test is developed, test developers choose a large sample, which is thought to represent
the entire population as closely as possible => reference sample -> results of this sample are taken as test norms
-probability sampling = sample generated by using a sampling method (e.g. random sampling) that is designed to ensure representativeness
=> better than non-probability sampling, where representativeness can’t be guaranteed due to bias
-----------------------------------------------------------------------------------------------------------------------------------------------------------
ADDITIONAL INFO
-skewed to the right = positive skew -> mean larger than median
-> skewed to the left = negative skew -> median larger than mean
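A tiny illustration of the mean/median relation with made-up scores. This is a rule of thumb for typical skewed distributions, shown here for two hand-picked examples rather than proven in general.

```python
from statistics import mean, median

# Hypothetical scores with one extreme high value (long tail to the right).
right_skewed = [1, 2, 2, 3, 12]
# Hypothetical scores with one extreme low value (long tail to the left).
left_skewed = [1, 10, 11, 11, 12]

right_mean, right_median = mean(right_skewed), median(right_skewed)  # 4, 2
left_mean, left_median = mean(left_skewed), median(left_skewed)      # 9, 11

print(right_mean > right_median)  # positive skew: mean pulled up by the tail
print(left_median > left_mean)    # negative skew: mean pulled down by the tail
```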
Test Dimensionality & Factor Analysis – Chapter 4
DIMENSIONALITY OF A TEST
-most psychological tests aim to investigate only a single attribute -> even when you obtain composite scores from
over 20 questions, they should be closely related to a single attribute (e.g. courage)
-3 main questions regarding a test’s dimensionality:
1) How many dimensions are reflected in the test items? -> while some tests cover only 1 dimension, others
investigate 2 or more dimensions => every dimension requires its own statistical analysis
2) If a test has more than one dimension, then are those dimensions correlated with each other?
3) If a test has more than one dimension, then what psychological attributes are reflected by the test dimensions?
=> a test’s dimensionality has important implications for the scoring, evaluation, and use of the test
NOTE: a dimensional attribute is thought to influence the test taker’s response on corresponding items
UNIDIMENSIONAL TESTS
-unidimensional tests = all items of a test measure same psychological attribute & responses are driven ONLY by this
attribute (& partly random measurement error)
-test items have the property of conceptual homogeneity = responses to each item are only affected by the same
psychological attribute
-scoring: you only receive 1 composite score, which is then evaluated as an overall score
=> example: an exam is a unidimensional test, if it only tests geometry knowledge rather than grammar, algebra…
MULTIDIMENSIONAL TESTS WITH CORRELATED DIMENSIONS
-example: Stanford-Binet Scale => contains 5 different sub-scales, which all reflect a different facet of intelligence
=> scoring high on 1 dimension increases the likelihood of scoring high on the other dimensions
-scoring: each subscale has its own individual score, BUT the subscales are often still combined to a total test score
=> the most general attribute measured is often called: higher-order factor
-evaluation: each subscale’s attribute is evaluated separately
-> POSSIBLE: some subscales have good psychometric quality, while others have poor psychometric quality
=> reliability & validity are evaluated for each scale + IN MOST CASES: for the total scale too
-test use: you have multiple options -> 1) you could use all/some sub-scores; 2) you could use the total score (if accepted
through psychometric evaluation)
MULTIDIMENSIONAL TESTS WITH UNCORRELATED DIMENSIONS
-here: subscales are not associated at all or only weakly associated
=> measured attributes do not reflect any higher-order factors
=> each sub-scale is treated as unidimensional
=> whole test could be considered as a set of unrelated unidimensional tests that are presented together (with mixed items)
-scoring, evaluation & use: similar to correlated dimensions (multiple), BUT no total score is computed
PSYCHOLOGICAL MEANING OF TEST DIMENSIONS (3rd question)
-factor analysis is used to filter the actual psychological attribute, which is assessed by a particular dimension
-----------------------------------------------------------------------------------------------------------------------------------------------------------