Measurement Theory and Assessment 1 // Meten en Diagnostiek 1 (Vrije Universiteit) Course Notes - Year 1, Period 4
Course
Measurement Theory & Assessment I (P_BMETDIA_1)
Institution
Vrije Universiteit Amsterdam (VU)
These notes include all of the relevant information necessary for your Measurement Theory and Assessment 1 exam. Since the professor (S. Noordermeer) deemed the book unnecessary (at least in 2020), I did not include the book notes. Hope thi...
Week 1.1: Introduction and Ethics
Diagnostics involves a thorough examination of a situation in order to make a decision
- Diagnosis involves determining the nature and source of a person’s abnormal
behavior, and classifying the behavior pattern within an accepted diagnostic system
Likewise, psychodiagnostics is concerned with assessing an individual’s psychological
functioning
- Provides a reliable and valid description of psychosocial reality
- Reliable ⇒ ideally repeatable
- Valid ⇒ ideally approaches reality
- Allows finding possible explanations for problems
- Can be used to test explanations
Interrater reliability of psychiatric diagnosis:
- 0-50% (chance level) with non-standardized interviews and tests
- 60-70% (above chance level) with standardized interviews and tests
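The agreement percentages above can be illustrated with a small sketch. It assumes that interrater reliability is expressed as simple percent agreement between two raters; the function name and the diagnoses below are hypothetical and not taken from the course.

```python
def percent_agreement(rater_a, rater_b):
    """Share of cases (in %) on which two raters give the same diagnosis."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical diagnoses for ten clients (illustrative data only).
rater_a = ["ADHD", "none", "anxiety", "ADHD", "none",
           "depression", "none", "anxiety", "ADHD", "none"]
rater_b = ["ADHD", "none", "depression", "ADHD", "anxiety",
           "depression", "none", "anxiety", "none", "none"]

agreement = percent_agreement(rater_a, rater_b)
print(f"Interrater agreement: {agreement:.0f}%")
```

In this made-up example the raters agree on 7 of 10 clients, which would fall in the range reported for standardized interviews and tests.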
Even when the diagnostic assessment is standardized, interrater reliability is still relatively
low because:
- The constructs in question are complex and often lack precise definitions (e.g., no
exact definition of IQ)
- There is a limited amount of time available to assess the construct
- Confirmation bias: tendency to interpret new evidence as confirmation of one’s
existing beliefs or theories
- Availability heuristic: tendency to check for symptoms that are related to disorders
with a high prevalence
A test is a standardized procedure for sampling behavior and describing it with categories or
scores
- E.g., academic achievement → procedure → score
- Provides scientifically sound, reliable, and objective information for decision making
- Useful for: problem analysis, classification and diagnostics, treatment planning,
program/treatment evaluation, self-knowledge, scientific research
- Classification can be further broken down into placement (the sorting of
people into appropriate programs), screening (quick identification of people
with special characteristics or needs), certification (e.g., for a driver’s license),
and selection (e.g., for college)
An assessment refers to the entire process of compiling information about a person and
using it to make inferences about characteristics and predict behavior
- Therefore, a test is only a component of the assessment process
Main types of psychological tests:
- Intelligence; aptitude; achievement; creativity; personality; interest inventory;
behavioral procedures; neuropsychological/cognitive
Test developers consider the following questions when constructing a diagnostic test:
- Does a test measure what it aims to measure?
- I.e., validity
- How and under what circumstances could/should you test?
- Is a short version of a test (just as) reliable?
- How is the reference group determined?
- I.e., what is the norm group against which the individual’s score is tested?
Every test score contains a measurement error:
observed score X = true score T + error component e
- Many psychological/pedagogical constructs are not perfectly defined; a test relies
on an external sample of behavior to estimate an unobservable and inferred
characteristic
- Misapprehension of questions (e.g., misunderstanding the questions, confusing
phrasing)
- Socially desirable answering (intentional)/context (unintentional)
- Negligent use of manual
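The relation X = T + e can be made concrete with a short simulation. This is a minimal sketch that assumes the error is purely random with a mean of zero; systematic sources such as socially desirable answering would not average out this way. The values of `TRUE_SCORE` and `ERROR_SD` and the function name are hypothetical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

TRUE_SCORE = 100.0  # T: assumed true score of one person (hypothetical)
ERROR_SD = 5.0      # spread of the random error e (hypothetical)


def administer_test(true_score, error_sd):
    """Return one observed score X = T + e with random, zero-mean error."""
    error = random.gauss(0.0, error_sd)
    return true_score + error


# Imagine testing the same person 1000 times without any learning effect.
observed = [administer_test(TRUE_SCORE, ERROR_SD) for _ in range(1000)]
mean_observed = statistics.mean(observed)

print(f"single observed score: {observed[0]:.1f}")
print(f"mean of 1000 observed scores: {mean_observed:.1f}")
```

Any single observed score can be off by several points, while the average over many administrations lands close to T. In practice a person is tested only once or twice, which is why the size of the error component matters.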
To achieve standardization, a test has to have the following components:
- Repeatability
- You should always assess the same score in the same individual (unless you
expect there to be a difference due to intervention/training)
- I.e., reliability
- Sample of behavior (i.e., integrality)
- Neither the subject nor the examiner has sufficient time for truly comprehensive
testing, even when the test is targeted to a well-defined and finite behavior domain
- Therefore, only a few concise questions are used per symptom to assess the
behavior (see above)
- Scores or categories (to indicate performance)
- Norms or standards to which an examinee’s test score can be compared
- Norms: a summary of test results for a large and representative group of
subjects (to establish average performance)
- Norm group ⇒ standardization sample
- Takes prevalence into account, unlike the statistical approach
- The score can also be compared to a statistically set cut-off value (e.g., 1 or 2
standard deviations)
- E.g., a score more than 2 standard deviations above the mean (roughly the top 2.5%)
is indicative of a disorder
- However, if the disorder has an actual prevalence of 0.5%, you
are over-classifying and over-diagnosing individuals
- Prediction of non-test (specific) behavior
- Validation of a test after it has been released
- Raven test → IQ score → does it predict educational achievement?
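The cut-off example in the list above (2 standard deviations versus a prevalence of 0.5%) can be checked numerically. The sketch assumes normally distributed scores; the helper name `proportion_above` is hypothetical. For an exact cut-off of 2 standard deviations the one-tailed proportion is about 2.3%, which the notes round to 2.5%.

```python
import math


def proportion_above(z):
    """Proportion of a normal distribution lying more than z SDs above the mean."""
    return 0.5 * math.erfc(z / math.sqrt(2))


cutoff_sd = 2.0     # statistical cut-off used in the notes
prevalence = 0.005  # actual prevalence of 0.5% used in the notes

flagged = proportion_above(cutoff_sd)
overdiagnosis_factor = flagged / prevalence

print(f"flagged by the cut-off: {flagged:.1%}")
print(f"people flagged per actual case: {overdiagnosis_factor:.1f}")
```

Under these assumptions the cut-off flags between four and five times as many people as actually have the disorder, which is the over-classification the notes warn about.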
A test is standardized if the procedures for administering it are uniform from one examiner
and setting to another
- The directions for administration are found in the instructional manual that
accompanies a test
In a norm-referenced test, the performance of each examinee is interpreted in reference to a
relevant standardization sample. However, in a criterion-referenced test, the objective is to
determine where the examinee stands with respect to very tightly defined educational
objectives
- There is no comparison to the normative performance of others; no reference group
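The difference between the two interpretations can be sketched with the same raw score read in both ways. All numbers are hypothetical: the norm sample, the examinee's score and the mastery cut-off are invented for illustration.

```python
import statistics

# Hypothetical raw scores of a standardization sample (norm group).
norm_sample = [12, 15, 18, 20, 21, 22, 24, 25, 27, 30]
examinee_score = 25

# Norm-referenced: where does the examinee stand relative to the norm group?
mean = statistics.mean(norm_sample)
sd = statistics.stdev(norm_sample)
z_score = (examinee_score - mean) / sd
percentile = 100 * sum(s <= examinee_score for s in norm_sample) / len(norm_sample)

# Criterion-referenced: has the examinee reached a fixed objective?
MASTERY_CUTOFF = 24  # hypothetical criterion, set without any reference group
mastered = examinee_score >= MASTERY_CUTOFF

print(f"norm-referenced: z = {z_score:.2f}, percentile = {percentile:.0f}")
print(f"criterion-referenced: mastered = {mastered}")
```

The norm-referenced result changes if the norm group changes, whereas the criterion-referenced result depends only on the fixed cut-off.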
The Dutch Association of Psychologists (www.psynip.nl) and the Dutch Association of
Pedagogues & Educationalists (www.nvo.nl) both provide guidelines on professional ethics
- Quality assurance of instruments
- Registration
- Training courses
- COTAN (Committee on Tests and Testing in the Netherlands) is a Dutch
institute that assesses test quality
- Looks at norms, materials, theoretical/hypothetical background of the
test ⇒ advice on the test’s sufficiency
Two main requirements an instrument has to meet:
1. Psychometric criteria; the test has to be sound, reliable, and objective
a. COTAN (1) openly informs users about the quality of instruments and (2)
provides feedback to developers on the quality of their instruments
2. The test should be used ethically
COTAN examines:
- Principles of test construction
- Goal (why was it developed?), (target) group, function
- Standardization (necessary to reduce measurement error)
- Quality of test material and manual
- Norms
- Representativeness of the reference group (necessary for inference)
- Reliability
- Consistency/repeatability of score
- Validity
- Does the test assess what it aims to assess or is it measuring a different
construct?
There are approximately 800 different tests readily available
- 50% haven’t been assessed on quality
The goal of ethics is responsibility, integrity, respect and expertise
- The test should be relevant
- Assessment should only be done by qualified individuals
- Role of integrity ⇒ no personal relationship with the client
- Confidentiality
- Informed consent
- Independent and objective
- Reporting without jargon
Week 2.1: Reliability I
Reliability refers to the attribute of consistency in measurement
- However, very few measures of physical or psychological characteristics are
completely consistent
- Therefore, the concept of reliability is best viewed as a continuum ranging
from minimal consistency of measurement (e.g., simple reaction time) to
near-perfect repeatability of results (e.g., weight)
- Mainly referred to in terms of the classical test theory
- Charles Edward Spearman
- ‘Theory of true and error scores’
Reliability has a score between 0 and 1
- The score indicates the (cor)relation between two scores after repeated
assessments or between items within the test
- The score reflects the consistency/reproducibility of scores
- Reliability is the ratio of true-score variance (actual behaviour T) to observed-score
variance (test score X)
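Both readings of the reliability score can be compared in a simulation. In classical test theory the ratio is usually taken between the variance of the true scores and the variance of the observed scores. The sketch below assumes random, zero-mean error that is independent of the true score; the standard deviations, the sample size and the `pearson` helper are hypothetical choices for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

N_PEOPLE = 5000
TRUE_SD = 10.0   # spread of true scores T across people (hypothetical)
ERROR_SD = 5.0   # spread of measurement error e (hypothetical)


def pearson(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


true_scores = [random.gauss(100.0, TRUE_SD) for _ in range(N_PEOPLE)]
# Two administrations of the same test: same T, fresh random error each time.
test_1 = [t + random.gauss(0.0, ERROR_SD) for t in true_scores]
test_2 = [t + random.gauss(0.0, ERROR_SD) for t in true_scores]

variance_ratio = statistics.variance(true_scores) / statistics.variance(test_1)
retest_correlation = pearson(test_1, test_2)

print(f"var(T) / var(X): {variance_ratio:.2f}")
print(f"correlation between administrations: {retest_correlation:.2f}")
```

With these settings the theoretical value is 10² / (10² + 5²) = 0.8, and both estimates should land near it. A larger `ERROR_SD` pushes the score toward 0, a smaller one toward 1.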
The basic starting point of the classical theory of measurement (i.e., theory of true and error
scores) is the idea that test scores result from the influence of two factors: