
Lecture notes

Extensive lecture notes Test Construction


Extensive lecture notes of lecture 1 - 7 for the course Test Construction. I added visuals from the slides to make it easier to follow as this course can be a bit tricky. The notes therefore form a summary in itself. I got a 9 for this course. Good luck!

Preview: 4 of the 37 pages

  • 14 February 2020
  • 37 pages
  • 2019/2020
  • Lecture notes
  • Unknown
  • All lectures
leoniemanon
Test construction

Lecture 1

Which test to use for particular aims and particular groups of interest.
How to evaluate the quality of a test and interpret test scores.

Know what the parameters in each formula mean and which formula to use for which computation.

- Test construction: what a test looks like, the instructions, and the administration procedure
- Test theory: statistical theory about the behaviour of item scores and test scores, the quality of
items, etc.
Both are needed for sensible use of tests; it is not only about construction.

Tests are broadly used
- Human resource management: personnel selection and development
- Education: individual development and performance of students.
  - Identifying deviating patterns of development with a pupil assessment system. Mandatory by law.
  - Predicting the most suitable type of high school at the end of primary school (CITO toets). Also
    mandatory in NL.
- Psychodiagnostics: neuropsychology, clinical psychology, developmental psychology

In these settings tests are mainly used to judge individuals.

Tests are also used in research, to test a hypothesis or build a theory. In research the judgements
are mainly about populations: group level, group comparisons, etc.

What is a test?
A psychological or educational test is an instrument for the measurement of a person’s maximum or
typical performance under standardized conditions, where the performance is assumed to reflect one or
more latent attributes.

Typical performance test: typifies a person; there are no correct answers. Used to describe a person,
just the way someone is. Personality, attitude, mental health.

Maximum performance test: measures a person's achievement. The person needs to do their best. There are
correct and incorrect answers. Intelligence, ability.

Standardization
Test conditions are fixed. Conditions should be the same when the test is applied to person A and to
person B: test material, instructions, administration procedure, score computing. Not only the
instructions are tricky, the material is as well: you cannot just rewrite items along the way. Score
computing is more and more often computer based and therefore less sensitive to errors, but
observational instruments, for example, need very clear scoring rules.
Aim: to ensure comparability of test performances between persons and test occasions.

Perfect standardization is difficult to achieve. Train test leaders.

Which specific aspects to standardize depends on, for example, the test or the target population.

Latent attribute: a test aims to measure one or more of these. Verbal ability, arithmetic skills,
severity of depression.
An attribute that cannot be observed directly. You need indicators for it, so you need a test. The test
score (X) should reflect the latent attribute of interest (T, true score). Causal relationship between
X and T.

This indicates: if two people differ in the attribute, their test scores should differ as well, and the
other way around. But there is always measurement error.
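
The idea that X reflects T only up to measurement error can be tried out with a small simulation. This is a minimal sketch, assuming the additive form X = T + E with normally distributed T and E. The mean of 50, the standard deviation of 10, the two error sizes and all function names are assumptions chosen for illustration; they do not come from the lecture.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_scores(n_persons, error_sd, seed=0):
    """Draw latent scores T and observed scores X = T + E (assumed model)."""
    rng = random.Random(seed)
    # Hypothetical latent attribute: mean 50, standard deviation 10.
    true_scores = [rng.gauss(50, 10) for _ in range(n_persons)]
    # Observed score = latent score plus random measurement error.
    observed = [t + rng.gauss(0, error_sd) for t in true_scores]
    return true_scores, observed


t_small, x_small = simulate_scores(2000, error_sd=2)
t_large, x_large = simulate_scores(2000, error_sd=20)
r_small = pearson(t_small, x_small)  # little measurement error
r_large = pearson(t_large, x_large)  # a lot of measurement error
print(round(r_small, 2), round(r_large, 2))
```

With little error the observed scores order people almost exactly like the latent attribute; with a lot of error the link between X and T becomes much weaker.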

Item
Smallest test unit, on which a person is scored. The score can be the same as the person's response.
Items can be clustered together in a subtest or subscale, an independent part of a test that is
indicative of an attribute.

Subtest (subscale or scale)
- Independent part of a test
- Indicative of an attribute
- Consists of various items

Example Bayley-III
Aims to assess the developmental level of young children aged 1 to 42 months. Individual, standardized
assessment with normed scores. The developmental level is assessed through play, so it is more of an
observational instrument. Used when there are concerns about a child's development or to diagnose
developmental delays in order to plan or evaluate interventions.

7 subscales. 2 of them belong to the overarching language scale and 2 to the motor scale.

Example of item construction: gross motor
The item describes when points are awarded. Because the instructions are clear, rating can be done
consistently.

Example of item scoring: cognition scoring form

Example of item scores for number of children on 9 items in SPSS.

Test construction process

1 Define construct of interest → 2 develop test → 3 pilot studies for feedback (iterative process: you
adjust the test and repeat the pilot until you are satisfied) → 4 data collection and analysis → 5
validation and norming. Check figure for interaction

1) Construct
Abstract and theoretical. Literature research. There is no gold standard for intelligence, for
example. Construction often begins with people in practice finding that they do not have a test that is
suitable for their aim, so they want to construct one. Constructing a new test takes a lot of time
and money.
Important to consider: homogeneity (one construct, the indicators fit together) and dimensionality (do I
want to measure 1 construct, or a collection of different constructs that together tell me about, for
example, personality? In that case you can have a multidimensional test).

In personality we do not say that there is 1 personality score. The Big 5 is often used: 5
unidimensional constructs, each with its own subtests. You cannot combine the 5 and say you have a high
score on personality; this single score doesn’t mean anything.

2) Developing a test
Essential aspects

1. Measurement mode of the test. Does the test taker perform a task themselves (self-performance
mode), fill out a test about themselves (self-evaluation mode) or does someone else evaluate them
(other-evaluation mode, e.g. a psychologist)?
2. Objectives of the test
3. Population and subpopulations of testees
4. Conceptual framework of the test
5. Item response mode
6. Administration mode
7. Item writing

Measurement mode of the test
▪ self-performance mode
▪ self-evaluation mode
▪ other-evaluation mode

Example with different modes of administration: SDQ
Strengths and Difficulties Questionnaire: a brief behavioural screening questionnaire about the
strengths and difficulties that children can encounter. Very broad and widely used instrument. For 3 to
16 year olds. Exists in several versions to meet the needs of researchers, clinicians and
educationalists. Available in many languages. In primary school it is sent to parents.

25 items on psychological attributes, divided over 5 scales. Some scales are negative and focus on
difficulties; the prosocial behaviour subscale is a positive strengths subscale. You can work with
these 2 test scores or just with the 5 subscale scores.
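
How 25 item scores turn into 5 subscale scores and 2 summary scores can be sketched as follows. This is only an illustration under stated assumptions: the item order (five blocks of five items), the placeholder subscale names, the 0 to 2 item scoring and the responses are all made up, and the real SDQ scoring key is not reproduced here.

```python
# Hypothetical layout: 25 item scores ordered as five blocks of five items.
# The first four blocks stand for the difficulties subscales, the last block
# for the prosocial (strengths) subscale. The real SDQ item order differs.
SUBSCALES = ["difficulties_1", "difficulties_2", "difficulties_3",
             "difficulties_4", "prosocial"]
ITEMS_PER_SUBSCALE = 5


def score_questionnaire(item_scores):
    """Return subscale scores, the difficulties total and the strengths score."""
    if len(item_scores) != len(SUBSCALES) * ITEMS_PER_SUBSCALE:
        raise ValueError("expected 25 item scores")
    subscale_scores = {}
    for index, name in enumerate(SUBSCALES):
        start = index * ITEMS_PER_SUBSCALE
        subscale_scores[name] = sum(item_scores[start:start + ITEMS_PER_SUBSCALE])
    # The difficulties total adds up every subscale except the prosocial one.
    total_difficulties = sum(
        score for name, score in subscale_scores.items() if name != "prosocial"
    )
    return subscale_scores, total_difficulties, subscale_scores["prosocial"]


# Made-up responses for one child, each item scored 0, 1 or 2 (an assumption).
responses = [0, 1, 2, 0, 1,
             1, 1, 0, 0, 0,
             2, 2, 1, 1, 0,
             0, 0, 0, 1, 0,
             2, 2, 1, 2, 2]
subscales, difficulties, strengths = score_questionnaire(responses)
print(difficulties, strengths)  # 13 9
```

The two summary scores are kept apart on purpose: adding the strengths score to the difficulties total would mix two opposite directions into one number.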

Possible as self-report version or parent version. One of the main disadvantages of self-report is that
people can inflate their scores to appear more appealing.

Objectives of the test (what is its aim)
- Research or practice
- Individual or group level
- Description vs diagnosis vs decision making (when to start what treatment based on scores)
Different choices (so your aim) have consequences for norming and validation.

Population and subpopulations of testees
Be as specific as possible. Use inclusion and exclusion criteria, such as age range, nationality, etc.
A population that is too broad causes problems for the norm groups and their representativeness,
because you need norms for the entire population. How are you going to collect data for a population
whose definition is vague or too broad?

Administration mode
- Oral
- Paper and pencil
- Computerized
- Computerized adaptive test administration

Conceptual framework of the test
More specific than just definition: it helps to write items.

Typical performance: three broad classes of strategies
- intuitive: rational, prototypical
- deductive:
  - construct method: use of a theoretical framework (e.g. Koster et al)
  - facet design method: conceptual analysis of the construct. You make it smaller and smaller
    so you can narrow down the kind of construct in the items.
- inductive: constructs to be measured cannot be defined beforehand, but are identified using
  association measures (e.g., correlations).
  - internal: associations among items
  - external: associations between items and an external criterion (predictive validity)

Biggest critique: there is no theoretical foundation for personality tests.

Example of an internal strategy
16PF. Self-report that measures 16 primary traits, based on factor analysis of variables describing a
broad range of actual behaviours. Different clusters are created through factor analysis, after which
labels are sought for them.

Factor analysis: to identify subgroups of variables.
- With high correlations within subgroups
- With low correlations between subgroups
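
The correlation pattern that factor analysis looks for can be made visible with simulated data. The sketch below is not a factor analysis itself; it only generates six hypothetical items driven by two independent latent factors and compares a within-subgroup correlation with a between-subgroup correlation. Item names, sample size and noise level are assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
n_persons = 1000
# Hypothetical items: a1-a3 depend on factor A, b1-b3 depend on factor B.
items = {name: [] for name in ["a1", "a2", "a3", "b1", "b2", "b3"]}
for _ in range(n_persons):
    factor_a = rng.gauss(0, 1)
    factor_b = rng.gauss(0, 1)  # drawn independently of factor A
    for name in items:
        factor = factor_a if name.startswith("a") else factor_b
        # Item score = its factor plus item-specific noise.
        items[name].append(factor + rng.gauss(0, 0.5))

within = pearson(items["a1"], items["a2"])   # same subgroup
between = pearson(items["a1"], items["b1"])  # different subgroups
print(round(within, 2), round(between, 2))
```

Items that share a factor correlate highly with each other, while items from different subgroups are close to uncorrelated. A factor analysis works in the opposite direction: it starts from such a correlation matrix and tries to recover the subgroups.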

Inductive strategy (?) is a useful approach to describe differences between individuals in personality
characteristics. But it does not reveal sources of or causes of differences in personality.

Response mode
- Many response modes exist
- Frequently used scales:
  - dichotomous = binary
  - ordinal polytomous: never/sometimes/often

Item writing
The book describes different concrete guidelines, both for typical and maximum performance test items.

In general:
- Each item represents one idea
- Be specific
- Use both positively and negatively formulated items
- Avoid expressions and jargon
- Consider the reading level of the user
- Avoid the use of “not”; it is confusing.
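
When a scale mixes positively and negatively formulated items, the negatively formulated ones are usually recoded before the item scores are summed, so that a high score means the same thing on every item. The lecture does not spell this step out; the sketch below assumes a 1 to 5 response scale, and the item names and responses are hypothetical.

```python
def reverse_score(score, minimum, maximum):
    """Mirror a score on its response scale, e.g. 1 <-> 5 on a 1-5 scale."""
    if not minimum <= score <= maximum:
        raise ValueError("score outside the response scale")
    return minimum + maximum - score


def scale_score(responses, negatively_worded, minimum=1, maximum=5):
    """Sum the item scores after recoding the negatively formulated items."""
    total = 0
    for item, score in responses.items():
        if item in negatively_worded:
            score = reverse_score(score, minimum, maximum)
        total += score
    return total


# Hypothetical answers; "item2" is the negatively formulated item.
responses = {"item1": 5, "item2": 1, "item3": 4}
total = scale_score(responses, negatively_worded={"item2"})
print(total)  # 14
```

Without the recoding, the answer 1 on the negatively formulated item would pull the sum down even though it expresses a high level of the attribute.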

Example:
“Do you like football?” is not a good question. Is it about watching? Or about playing it?

So make sure people know what you mean, through a pilot study.

Pilot study
Check whether instructions and items are clear
Three types of studies
- Experts pilot: concept items are reviewed by experts on the construct that you measure
- Test takers pilot: concept items are administered to a small group of test takers from your target
population. Using the target population is critical. Useful to use a read-aloud protocol (read items
out loud) or a think-aloud protocol (let them say what they think out loud)
- Raters pilot: yields important info to remove items, remove raters (e.g. those who are bad at
following instructions because they feel sorry for the participant) and/or improve the training of
raters. Focuses on interrater agreement and intrarater consistency.

Measures of agreement

- Interrater agreement: 2 different raters rate the same objects (individuals or items)
- Intrarater consistency: the same rater rates the same objects consistently across multiple occasions

Measures of agreement per scale type
- Nominal and ordinal with two categories: dichotomous, trichotomous, false-correct.
Measure of agreement: kappa. Know kappa!!
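
A minimal sketch of how kappa can be computed for two raters, assuming that Cohen's kappa is meant: the observed proportion of agreement is corrected for the agreement expected by chance, kappa = (p_o - p_e) / (1 - p_e). The function name and the ratings are made up for illustration.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters who rated the same objects."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must rate the same non-empty set of objects")
    n = len(ratings_a)
    # Observed agreement: proportion of objects with identical ratings.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Made-up ratings of 10 items by two raters.
rater_1 = ["correct", "correct", "false", "false", "correct",
           "false", "correct", "correct", "false", "correct"]
rater_2 = ["correct", "false", "false", "false", "correct",
           "false", "correct", "correct", "correct", "correct"]
kappa = cohens_kappa(rater_1, rater_2)
print(round(kappa, 2))  # 0.58
```

In this example the raters agree on 8 of the 10 items (p_o = 0.80), but 0.52 of that agreement is expected by chance alone, so kappa is clearly lower than the raw agreement. The same function can be used for intrarater consistency by passing one rater's ratings from two occasions.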
