Summary of the book A CONCEPTUAL INTRODUCTION TO PSYCHOMETRICS by Mellenbergh, used in the course Test Construction at Rijksuniversiteit Groningen. Chapters 1, 2, 3, 4, 5, 7, 8, 11, 12, 14.2‐14.3 and 15
PART I - TEST DEVELOPMENT
Chapter 1 Introduction
1.1 Origins of psychometrics
Binet & Simon published the first modern test in 1905, constructed to differentiate between
normal and mentally retarded children. According to DuBois (1970), modern testing has
three roots:
1. Civil service examinations (tests for government positions). The first documentation of
these examinations dates from 1115 BC in China, where they remained in use until 1905.
Voltaire introduced this system to France at the end of the 18th century. The system
was also introduced in England (1833) and the United States (1870).
2. Assessment of academic achievement in universities and schools. The earliest
examinations of this sort were probably held in 1219 at the University of Bologna, Italy. In
Belgium, examinees of the University of Leuven were graded in four categories (1 =
honor; 2 = satisfactory; 3 = charity pass and 4 = failure). In 1599 the Jesuit order
published a document that contained rules for conducting examinations in schools.
3. Study of individual differences in behaviour in the 19th century. Examples of these
measures are the tests of Kraepelin, Galton and Cattell.
With the rise of psychometric testing came the development of psychometric theories on,
e.g., reliability and measurement error. Spearman laid the foundations of the classical theory
of test scores (CTT) in 1904. This theory was followed by the modern item response theories
(IRT) (e.g. Lord, 1980).
1.2 Test definitions
Test: an instrument for the measurement of a person’s maximum or typical performance under
standardized conditions, where the performance is assumed to reflect one or more latent
attributes. A test is in the first place a measurement tool; other uses (e.g. performance
prediction) are applications of the test. Performance is what a test measures, and Cronbach
distinguished two types of tests:
1. Maximum performance tests (e.g. intelligence and achievement tests), subdivided
according to the type of performance and the latent attribute that is measured.
2. Typical performance tests, in which responses are not correct or incorrect but typify
the examinee (e.g. personality and attitude tests). These are subdivided according to the
attribute that is measured.
Standardization must be guaranteed because test scores are only comparable when the scores
are obtained under standardized conditions. A test measures latent attributes. This means that
although latent attributes are not observable, the test performance that reflects them is
observable.
The difference between a test and a survey: survey items do not have to reflect latent
attributes (“what happened to you this morning?”). Items can however be used to form a
measurement index (e.g. personal index of stress level by combining a list of negative life
events that recently occurred: the events cause stress, not the other way around).
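The stress example can be sketched in a few lines of Python. This is only an illustration of how a measurement index might be formed by counting reported events; the event names and the `stress_index` function are hypothetical and do not come from the book.

```python
# Hypothetical checklist of negative life events (names are illustrative only).
LIFE_EVENTS = ["job_loss", "divorce", "serious_illness", "moving_house"]


def stress_index(responses: dict[str, bool]) -> int:
    """Count how many of the listed events the respondent reports.

    The events are taken to cause stress, so the index is a simple sum of
    reported events rather than a reflection of one latent attribute.
    Events missing from the responses are treated as not having occurred.
    """
    return sum(1 for event in LIFE_EVENTS if responses.get(event, False))


# Example respondent who reports two of the four events.
respondent = {"job_loss": True, "divorce": False, "moving_house": True}
print(stress_index(respondent))  # 2
```

An unweighted count is the simplest choice; a real index could weight events differently.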
Subtest: an independent part of a test. For example, the SAT consists of the subtests
SAT-M and SAT-V.
Item: the smallest possible unit of a (sub)test. A test consists of n items.
Dimensionality: the number of latent attributes that affect test performance defines the
dimensionality of the test. A test that measures one latent attribute is called a
unidimensional test; if more than one attribute is measured, the test is multidimensional
(two-dimensional, three-dimensional, etc.).
1.3 Test types
Psychological and educational tests are divided into two categories:
1. Mental tests: consist of cognitive tasks
2. Physical tests: consist of instruments to take somatic or physiological measurements
(e.g. heart rate, brain activity)
In a maximum performance test (MPT), performance is maximized in terms of either the
accuracy or the speed of the performance. A pure power test is about accuracy: there is no
time limit, or ample time to finish the items. Time-limited power tests are accuracy
tests constructed so that most test takers have enough time to solve the items, and only a
few would need extra time to finish. The opposite of a power test is a speed test, which
measures the time a person needs to solve an item. A second distinction between MPTs is the
attribute they measure:
1. Ability test (sometimes called aptitude test): measures performance in an area that
was not specifically taught or trained (e.g. intelligence or dexterity).
According to Cronbach, an aptitude test is an ability test that predicts future
competences.
2. Achievement test: measures explicitly taught or trained performance (e.g. the exam of
this course).
A typical performance test (TPT) measures typical behaviour. TPTs are often called
questionnaires or inventories. There are three main types of TPTs:
1. Personality tests
2. Interest inventories
3. Attitude questionnaires
Another distinction is whether the person completing the test is also the person being
measured (self-report test) or not (observation test, e.g. a mother filling out items about
her child).
PART II – TEST DEVELOPMENT
Chapter 2 Developing maximum performance tests
Test construction starts with a plan that consists of the following essential elements
(which can be specified in this order, simultaneously, or in a different order):
1. Construct of interest
2. Measurement mode of the test
3. Objectives of the test
4. Population and subpopulations the test should be applied to
5. Conceptual framework of the test
6. Response mode of the items
7. Administration mode of the test
2.1 Construct of interest
This specifies the latent variable(s) that the test is intended to measure. Defining the
constructs of interest is a good way to start the development of a test. "Latent variable" is
the general term, whereas a construct is the substantive interpretation of a latent variable.
The latent variable/construct affects the test takers’ item responses and test scores.
Constructs vary in different ways:
1. In content (mental abilities, psychomotor skills, or physical abilities)
2. In scope (breadth) (e.g. from general intelligence to specific multiplication skills)
3. From educational (achievement tests for algebra) to psychological variables
At the start of the test development a good definition of the construct(s) should be given.
2.2 Measurement modes for MPTs
Self-performance mode: the most common measurement mode for MPTs is to ask someone to
take a mental or physical test.