MT&A I – FINAL EXAM TOPICS 2019-2020
Topic 1B – Ethical and Social Implications of Testing
Responsibilities of test publishers
Publication and marketing issues: guarding against premature release
Competence of test purchasers
APA proposes that tests fall into one of three levels of complexity requiring different
degrees of expertise from the examiner
Level A: straightforward paper-and-pencil measures that can be administered,
scored, and interpreted with minimal training
Level B: require knowledge of test construction and training in statistics and
psychology; available to persons who have completed an advanced-level course in
testing or equivalent training
Level C: substantial understanding of testing and supporting topics; supervised
experience essential for proper administration
The administration of take-home tests is discouraged
Responsibilities of test users
Best interests of the client: assessment should serve a constructive purpose
for the individual examinee
Confidentiality and the duty to warn: ethical release of information +
clinicians must communicate any serious threat to the potential victim, law
enforcement, or both + client’s welfare should be considered in deciding
whether to release information (especially with minors)
Expertise of test user: test user must accept responsibility for the proper
application of tests and must be trained in assessment and measurement
theory, including proper standardization, reliability, validity, interpretive
accuracy, and other psychometric characteristics
Informed consent: disclosure, competency, and voluntariness
Obsolete tests and the standard of care: one that is usual, customary, or
reasonable + careful not to rely on test results that are outdated for the
current purpose; evaluate need for retesting
Responsible report writing: simple and direct writing that steers clear of
jargon and technical terms + stay within bounds of expertise of the examiner
Communication of test results: providing effective and constructive feedback
to clients about test results
Consideration of individual differences: practitioners are expected to know
when a test or interpretation may not be applicable because of factors such
as sex, age, gender, race, disability, socioeconomic status, etc.
Testing of cultural and linguistic minorities
The cultural background of examinees impacts the process of assessment
Sattler (1988): adopt a frame of reference that will enable you to understand how
particular behaviors make sense within each culture
Culturally based differences in response style may conceal the underlying
competence of examinees, particularly examinees from culturally and
linguistically diverse backgrounds
Stereotype threat: threat of confirming, as self-characteristic, a negative stereotype
about one’s group
Unintended effects of high-stakes testing
Lake Wobegon Effect: overly optimistic picture of student achievement
Excessive emphasis on nationally normed achievement tests for selection and
evaluation promotes inappropriate behavior, including cheating and fraud
Topic 3B – Concepts of Reliability:
Classical test theory: idea that test scores result from the influence of
1. Factors that contribute to consistency – stable attributes of the individual
2. Factors that contribute to inconsistency – characteristics of the individual, test,
situation that have nothing to do with the attribute being measured but nonetheless
affect scores
Sources of measurement errors:
Item selection: decisions about which questions and items should be included in
the test and how they should be worded – footnotes versus rest of textbook
example
Test administration: uncomfortable temperatures, excessive noise +
momentary fluctuations in anxiety, attention, fatigue level
Test scoring: guidelines are necessary to minimize impact of subjective
judgement in scoring
Systematic measurement error: arises when a test consistently measures
something other than the trait for which it was intended
Systematic errors:
Either positive or negative
Average measurement error ≠ 0
Arise from errors in test construction / inconsistency with the assessed construct
A matter of validity (rather than reliability)
Crucial assumptions classical test theory – unsystematic errors:
1. Measurement errors are random within population
2. Mean error of measurement = 0
3. True scores and measurement errors are uncorrelated: rTe = 0
4. Measurement errors on different tests are uncorrelated: r12 = 0
5. Measurement errors are normally distributed
6. Unsystematic error is what reliability measures
σ_X² = σ_T² + σ_e²
Reliability coefficient (r_XX): ratio of true score variance to the total variance of test
scores: r_XX = σ_T² / σ_X²
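As a small illustration of the variance decomposition and the reliability coefficient (the variance values below are hypothetical, chosen only for demonstration):

```python
# Classical test theory decomposition: sigma_X^2 = sigma_T^2 + sigma_e^2.
# The variance values are hypothetical illustration numbers.
true_var = 80.0    # sigma_T^2: variance of true scores
error_var = 20.0   # sigma_e^2: variance of measurement errors

total_var = true_var + error_var   # sigma_X^2: total observed variance
r_xx = true_var / total_var        # reliability coefficient r_XX
print(r_xx)  # 0.8
```

Note that reliability rises as the error variance shrinks relative to the true-score variance.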
Correlation coefficient (r): expresses the degree of linear relationship between two
sets of scores obtained from the same persons
Reliability as temporal stability
Test-retest reliability: relation between scores of a group on a test with
repeated assessment – Pearson’s r – thus, same sample at T0 and T1
Alternate-forms reliability: developers produce two forms of a test that
are constructed to meet the same specifications – derived by administering
both forms to the same group and correlating the two sets of scores
Estimates random fluctuations within the individual
Random fluctuations due to the environment
Random fluctuations due to sample of items
Useful when practice effects are expected
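Both temporal-stability estimates above reduce to correlating two sets of scores from the same examinees. A minimal sketch of Pearson's r using only the standard library (the scores are hypothetical):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical scores for the same five examinees at T0 and T1
t0 = [12, 15, 9, 20, 17]
t1 = [13, 14, 10, 19, 18]
print(round(pearson_r(t0, t1), 3))  # high correlation = high stability
```

For test-retest reliability, x and y are the two administrations of the same test; for alternate-forms reliability, they are the two parallel forms.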
Reliability as internal consistency
Split-half reliability: relation between scores of a group on two test halves
Justification: logistical problems or excessive cost may render it
impractical to obtain a second set of test scores from the same
examinees
Protects against practice effects
Test-retest approaches will yield misleadingly low estimates of
reliability if the trait being measured is known to fluctuate rapidly
(mood)
Splitting test in two halves reduces reliability
Spearman-Brown formula: derives the reliability of the whole test from
the half-test correlation coefficient: r_SB = 2·r_hh / (1 + r_hh) > corrects the split-half
estimate, but disadvantage: based on only one possible split
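The correction can be sketched as a one-line function (the half-test correlation of 0.6 is a hypothetical value):

```python
def spearman_brown(r_hh):
    """Project whole-test reliability from the half-test correlation r_hh."""
    return 2 * r_hh / (1 + r_hh)

# A half-test correlation of 0.6 projects to a higher whole-test reliability
print(round(spearman_brown(0.6), 2))  # 0.75
```

The projected value is always at least as large as r_hh, reflecting that a full-length test is more reliable than either half alone.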
Coefficient alpha: mean of all possible split-half coefficients, corrected by the
Spearman-Brown formula: r_α = (N / (N − 1)) · (1 − Σσ_j² / σ²)
N = number of items
Σσ_j² = sum of the variances of all items
σ² = variance of the total test scores
r_α varies between 0.00 and 1.00
Index of internal consistency of the items, i.e., their tendency to
correlate positively with one another
Alpha increases with more items, a decrease in error variance, and
greater consistency between items
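Under the definitions above, coefficient alpha can be sketched with the standard library (the item scores are hypothetical; population variances are used):

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Coefficient alpha from a list of per-item score lists,
    each covering the same examinees in the same order."""
    n = len(items)                                    # N: number of items
    sum_item_vars = sum(pvariance(i) for i in items)  # sum of sigma_j^2
    totals = [sum(t) for t in zip(*items)]            # total test scores
    return (n / (n - 1)) * (1 - sum_item_vars / pvariance(totals))

# Three hypothetical items answered by four examinees
items = [[1, 2, 3, 4], [2, 2, 3, 5], [1, 3, 3, 4]]
print(round(cronbach_alpha(items), 3))  # 0.947
```

The items above rank the examinees very similarly, so their total-score variance is large relative to the summed item variances and alpha comes out high.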
Kuder-Richardson estimate of reliability (KR-20): KR-20 = (N / (N − 1)) · (1 − Σpq / σ²)
N = the number of items on the test
σ² = variance of scores on the total test
p = proportion of examinees getting each item correct
q = proportion of examinees getting each item wrong
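For right/wrong items, pq is each item's variance, so KR-20 follows the same structure as coefficient alpha. A sketch with hypothetical 0/1 response data:

```python
from statistics import pvariance

def kr20(responses):
    """KR-20 from a list of examinee response vectors (1 = correct, 0 = wrong)."""
    n_people = len(responses)
    n_items = len(responses[0])
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(person[j] for person in responses) / n_people  # proportion correct
        sum_pq += p * (1 - p)                                  # pq = item variance
    totals = [sum(person) for person in responses]             # total test scores
    return (n_items / (n_items - 1)) * (1 - sum_pq / pvariance(totals))

# Four hypothetical examinees answering three dichotomous items
responses = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(responses))  # 0.75
```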
Interscorer reliability: compares scores of different examiners with each other
Item response theory or latent trait theory: alternative model of test theory
1. Item response function (IRF): mathematical equation that describes the relation
between the amount of a latent trait that an individual possesses and the
probability that they will give a correct response to a test item designed to measure
that construct
2. Information functions: test item typically provides a different level of information
at each level of the trait in question
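As one common instance of an item response function, here is a sketch of the two-parameter logistic model (the discrimination and difficulty values are hypothetical):

```python
import math

def irf_2pl(theta, a, b):
    """P(correct | theta) under a two-parameter logistic IRF:
    theta = latent trait level, a = item discrimination, b = item difficulty."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An examinee whose trait level equals the item's difficulty has a 50% chance
print(irf_2pl(0.0, a=1.0, b=0.0))  # 0.5
# Higher trait levels yield higher probabilities of a correct response
print(round(irf_2pl(2.0, a=1.0, b=0.0), 3))
```

The discrimination parameter a controls how steeply probability rises around the difficulty b, which is where the item gives the most information about the trait.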