Summary Measurement Theory & Assessment 2
Furr CH1: Psychometrics and the importance of psychological measurement
Psychological measurement is of great importance. It can even have life or death
consequences. For example, in several US states, a person with significantly
subaverage intelligence cannot be sentenced to death. Significantly subaverage intellectual
functioning is indicated by an IQ score of 70 or below.
In psychology, observation measures may be used to measure a specific type of
behaviour. Much more commonly, however, behavioural scientists observe human behaviour
as a way of assessing unobservable psychological attributes such as intelligence,
depression, knowledge, aptitude, extroversion, or ability. In such cases, they identify some
type of observable behaviour that they think represents the particular unobservable
psychological attribute, state, or process. In most but not all cases, psychologists develop
psychological tests as a way to sample the behaviour that they think reflects the underlying
psychological attribute. This type of measurement requires that we make an inference that an
observable behaviour is systematically related to an unobservable mental attribute. When
measuring an unobservable attribute, we must be able to assume that it is more than a figment
of our imagination. The theoretical concepts such as working memory or intelligence that we
believe actually exist in people are called hypothetical constructs or latent variables. They
are theoretical psychological characteristics, attributes, processes, or states that cannot be
directly observed. The operations or procedures that we use to measure these hypothetical
constructs are called operational definitions. For example, working memory is a
hypothetical construct that is measured by the number of digits that one can recall from a
string of digits. The number of digits that one can recall is the operational definition.
According to Cronbach, a psychological test is a systematic procedure for comparing the
behaviour of two or more people. Psychological tests include three important components:
• Tests involve behavioural samples of some kind.
• The behavioural samples must be collected in some systematic way.
• The purpose of the tests is to compare the behaviours of two or more people.
Comparisons can be interindividual (between different people) or
intraindividual (within the same person at different points in time or under different
circumstances).
There are many different types of tests. Tests can differ in content: achievement
tests, aptitude tests, intelligence tests, personality tests, and attitude surveys.
They can differ in the type of response required: open-ended or closed-ended.
They can also differ in how they are administered: individually or to groups
of people.
Yet another common distinction is between speeded tests and power tests. Speeded tests
are time-limited tests in which the score is based on the number of questions answered within
the time limit. Power tests are not time-limited and are used to measure a person’s
maximum capacity rather than their speed.
Psychological tests are also often categorised as either criterion referenced or norm
referenced. Criterion-referenced tests are often seen in settings in which a person’s skill
level is being measured. A cut-off score is established as a criterion and is used to sort people
into two groups; people who pass the test and people who do not. Norm-referenced tests are
usually used to understand how a person compares with other people. This is done by
comparing the person’s test score with scores from a reference sample or normative sample
that is representative of some well-defined population. In practice, however, the distinction
between norm-referenced and criterion-referenced tests is often blurred. Criterion-
referenced tests are always normed in some sense; the cut-off score is usually based on a
normative sample. The distinction between criterion- and norm-referenced tests is further
blurred when scores from norm-referenced tests are used as cut-off scores. For example,
institutions for higher education in the US may require a minimum score on the SAT
(a norm-referenced test) for admission.
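The two ways of interpreting a score can be sketched in a few lines of Python. This is only an illustration, not a procedure from Furr: the function names, the cut-off of 14, and the normative sample are made-up assumptions.

```python
def criterion_referenced(score, cutoff):
    """Sort a person into pass/fail using a fixed cut-off score."""
    return "pass" if score >= cutoff else "fail"


def percentile_rank(score, norm_sample):
    """Percentage of the normative sample scoring at or below `score`."""
    at_or_below = sum(1 for s in norm_sample if s <= score)
    return 100.0 * at_or_below / len(norm_sample)


# Hypothetical normative sample and test score (assumptions for illustration).
norm_sample = [8, 10, 12, 12, 14, 15, 15, 16, 18, 20]
score = 15

result = criterion_referenced(score, cutoff=14)  # compares with the criterion
rank = percentile_rank(score, norm_sample)       # compares with other people
print(result, rank)
```

With these made-up numbers the same raw score of 15 counts as a pass under the criterion and sits at the 70th percentile of the normative sample.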
Just as psychological tests are designed to measure psychological attributes of people,
psychometrics is the science concerned with evaluating the attributes of psychological tests.
The most important analogy is that just as psychological tests are about theoretical attributes
(hypothetical constructs) of people, psychometrics is about theoretical attributes (hypothetical
constructs) of tests. Three of these attributes are of particular interest:
• The type of information generated by the use of psychological tests (scores).
• The reliability of data from psychological tests.
• Issues concerning the validity of data obtained from psychological tests.
Despite the many similarities among the sciences, measurement in the behavioural sciences
has special challenges that do not exist in the physical sciences. One of these challenges is
related to the complexity of psychological phenomena; notions such as intelligence, self-
esteem, anxiety, and depression have many different aspects to them, making them hard to
measure.
Participant reactivity is another challenge in the behavioural sciences. Because
psychologists are measuring psychological characteristics of people who are conscious and
generally know that they are being measured, the act of measurement can itself influence the
psychological state or process being measured. Participant reactivity can take many forms. In
research situations, some participants may try to figure out the researcher’s purpose for a
study, changing their behaviour to accommodate the researcher (demand characteristics).
Other forms are social desirability and malingering. In each case, the validity of the
measure is compromised because the person’s true psychological characteristic is hidden by a
temporary motivation or state that is a reaction to the very act of being measured.
Another challenge is that the people collecting the behavioural data may be biased.
Measurement quality is compromised when observers allow expectations and biases to
distort their observations. People who collect behavioural data are usually not consciously
cheating but even subtle and unintended biases can have effects.
Psychologists also tend to rely on composite scores when measuring psychological
attributes (for example, a personality test may include 10 or more questions measuring
extraversion, and the composite of the responses determines the extraversion score).
Although composite scores do have their benefits, several issues complicate their use and
evaluation. The physical sciences are less likely to rely on composite scores.
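A composite score can be as simple as the sum or mean of the item responses. The sketch below assumes ten extraversion items answered on a 1 to 5 scale; the responses and the function name are hypothetical.

```python
def composite_score(item_responses):
    """Return the sum and the mean of a person's item responses."""
    total = sum(item_responses)
    return total, total / len(item_responses)


# Hypothetical responses to 10 extraversion items on a 1-5 scale.
extraversion_items = [4, 3, 5, 4, 2, 4, 5, 3, 4, 4]

composite_sum, composite_mean = composite_score(extraversion_items)
print(composite_sum, composite_mean)  # 38 3.8
```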
Score sensitivity is another challenge and refers to the ability of a measure to discriminate
adequately between meaningful amounts or units of the dimension that is being measured.
A final challenge is the apparent lack of awareness of important psychometric
information. Practitioners seem to regularly conduct tests with little or no regard for the
psychometric quality of the tests. The result is a regular use of poorly constructed tests.
Furr CH2: Scaling
Psychological measurement can be seen as a process through which numbers are assigned to
represent the quantities of psychological attributes. The measurement process succeeds if the
numbers assigned to an attribute reflect the actual amounts of that attribute. The standard
definition of measurement is as follows: measurement is the assignment of numerals to
objects or events according to rules.
Three important numerical properties exist: identity, order, and quantity. The most
fundamental form of measurement is the ability to reflect sameness versus differentness.
The simplest measurements are those that differentiate between categories of people. People
within a certain category are similar to each other and different from the people in
another category. Certain rules must be followed when sorting people into categories. The
first and most straightforward rule is that, to establish a category, the people within a
category must satisfy the property of identity. All people within a particular category must
be identical with respect to the feature reflected by the category. Second, the categories must
be mutually exclusive; if a person is classified in one category, they cannot simultaneously
be classified in another category. Third, the categories must be exhaustive; everyone in a
certain population must fit in one of the categories, no one can be left out. At this level,
numerals serve simply as labels of categories and do not have a true mathematical value.
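The rules of mutual exclusivity and exhaustiveness can be checked mechanically. The sketch below is a hypothetical illustration: the function, the names of the people, and the category numerals are assumptions, and the numerals 1 and 2 are used purely as labels.

```python
def check_categories(population, categories):
    """Return (mutually_exclusive, exhaustive) for a categorisation.

    population: iterable of person identifiers
    categories: dict mapping a numeral label to the set of people in it
    """
    assigned = [person for group in categories.values() for person in group]
    # Mutually exclusive: nobody appears in more than one category.
    mutually_exclusive = len(assigned) == len(set(assigned))
    # Exhaustive: everybody in the population appears in some category.
    exhaustive = set(population) <= set(assigned)
    return mutually_exclusive, exhaustive


population = ["Ann", "Ben", "Cas", "Dee"]
good = {1: {"Ann", "Ben"}, 2: {"Cas", "Dee"}}
overlapping = {1: {"Ann", "Ben"}, 2: {"Ben", "Cas", "Dee"}}  # Ben is in two
incomplete = {1: {"Ann"}, 2: {"Cas", "Dee"}}                 # Ben is left out

good_result = check_categories(population, good)                # (True, True)
overlapping_result = check_categories(population, overlapping)  # (False, True)
incomplete_result = check_categories(population, incomplete)    # (True, False)
```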
The property of order conveys greater information. When numerals have only the property
of identity, they convey information about whether two individuals are similar or different
but nothing more. In contrast, when numerals have the property of order, they convey
information about the relative amount of an attribute that people possess. However, the
numerals still serve as labels and have no real mathematical value.
The property of quantity conveys the most information. Numerals that have the property
of order convey information about which of two individuals has a higher level of a
psychological attribute, but they convey no information about the exact amounts of that
attribute. In contrast, when numerals have the property of quantity, they provide information
about the magnitude of differences between people. At this level, numerals reflect real
numbers. Units of measurement are standardised quantities; the size of a unit will be
determined by some convention. For example, 1 degree Celsius is defined in terms of 1/100th
of the difference between the temperature at which ice melts and the temperature at which
water boils. When psychologists use psychological tests to measure psychological attributes,
they often assume that the test scores have the property of quantity. However, this often
might not be a reasonable assumption.
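The difference between order and quantity can be made concrete with a small sketch. It uses a physical attribute (height in centimetres) because there the property of quantity is not in doubt; the names and values are hypothetical.

```python
# Hypothetical heights in centimetres (numerals with the property of quantity).
heights_cm = {"Ann": 190, "Ben": 189, "Cas": 150}

# Ranks keep only the property of order: 1 = tallest.
ordered = sorted(heights_cm, key=heights_cm.get, reverse=True)
ranks = {name: position for position, name in enumerate(ordered, start=1)}

# Gaps between neighbouring people, in ranks and in centimetres.
rank_gaps = (ranks["Ben"] - ranks["Ann"], ranks["Cas"] - ranks["Ben"])
height_gaps = (heights_cm["Ann"] - heights_cm["Ben"],
               heights_cm["Ben"] - heights_cm["Cas"])

print(ranks)        # {'Ann': 1, 'Ben': 2, 'Cas': 3}
print(rank_gaps)    # (1, 1)
print(height_gaps)  # (1, 39)
```

The ranks say who is taller, but both rank gaps are 1, so they hide the fact that the second difference is 39 times larger than the first.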
The number zero has at least two potential meanings. In one possible meaning, zero reflects a
state in which an attribute of an object or event simply has no existence. Zero in this context
is referred to as absolute zero. The second possible meaning of zero is to view it as an
arbitrary quantity of an attribute. A zero of this type is called a relative or arbitrary zero.
For example, a temperature of 0 on the Celsius scale represents the melting point of ice, but it
does not represent the absence of temperature. The mean of a distribution of z scores will
always be 0. Zero in this case also represents an arbitrary or relative zero.
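That the mean of z scores is 0 follows from how they are computed: each raw score has the mean subtracted and is then divided by the standard deviation. A minimal sketch with made-up raw scores, using the population standard deviation (an assumption; the sample version also gives a mean of 0):

```python
from statistics import mean, pstdev

# Hypothetical raw test scores.
raw_scores = [85, 90, 100, 110, 115]

m = mean(raw_scores)     # mean of the raw scores
sd = pstdev(raw_scores)  # population standard deviation

# z score: distance from the mean expressed in standard deviation units.
z_scores = [(x - m) / sd for x in raw_scores]
z_mean = mean(z_scores)

print(z_mean)  # 0 up to floating-point rounding
```

A z score of 0 therefore marks the mean of the distribution, not the absence of the attribute.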
Interpretation of psychological test scores will be influenced by the type of zero
associated with a test. If we can assume that a test has an absolute zero, then we can feel
comfortable performing the arithmetic operations of multiplication and division on the test
scores. On the other hand, if a test has a relative zero point, we would probably want to
restrict arithmetical operations on the scores to addition and subtraction. As a matter of
evaluation, it is important to know what zero means; it could either mean that a person who
scored 0 on a test has none of the attribute that was being measured or that the person might
not have had a measurable amount of the attribute.
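The temperature example shows why ratios are risky with a relative zero. The sketch below compares the same two temperatures on the Celsius scale (relative zero) and the Kelvin scale (absolute zero); the two temperatures are arbitrary illustrative values.

```python
def celsius_to_kelvin(celsius):
    """Convert Celsius (relative zero) to Kelvin (absolute zero)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0
warm_k, cool_k = celsius_to_kelvin(warm_c), celsius_to_kelvin(cool_c)

# Subtraction gives the same answer on both scales.
diff_c = warm_c - cool_c   # 10 degrees
diff_k = warm_k - cool_k   # also 10 degrees (up to rounding)

# Division does not: only the Kelvin ratio is meaningful.
ratio_c = warm_c / cool_c  # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = warm_k / cool_k  # roughly 1.035

print(diff_c, round(diff_k, 2), ratio_c, round(ratio_k, 3))
```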
The property of quantity requires that units of measurement be clearly defined.
Quantitative measurement depends on our ability to count these units. If people want to
measure the length of a piece of wood, then they will probably use some type of tape marked
off in units of inches or centimetres. The length of the piece of wood is determined by
counting the number of these units from one end to the other. In this case, the inches or
centimetres are the units of measurement. Arbitrariness is an important concept in
understanding units of measurement, and it distinguishes between different kinds of
measurement units. There are three ways in which a measurement unit might be arbitrary.
First, the specific size of a unit might be arbitrary (why is one specific amount of weight
called a pound?). Second, some units of measurement are not tied to any one type of object
(a ruler can be used to measure the length of a piece of wood, a person, or any other object
with a spatial extent). Third, when they take a physical form, some units of measurement can
be used to measure different features of objects (something can be 7 pieces of wood in
length and 7 pieces of wood in weight). In contrast to many physical measures, most
psychological units of measurement are arbitrary only in the first sense: the
specific size of a unit is arbitrary.
To add up counts of responses in psychological tests, the units must all be of the same size.
For example, if you try to measure a piece of wood with objects varying in length, you will
get different lengths for the same piece of wood. The same goes for psychological testing:
if the units are not constant in magnitude, the entire measurement system is flawed. It
is also important for the units of measurement not to change in magnitude as the conditions
of measurement change (e.g., time of day). Consider the example of this phenomenon in
psychological testing on the next page.
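The piece-of-wood analogy can also be sketched in code. The board length, the object lengths, and the function name are hypothetical; the point is only that counting units of unequal size gives different answers for the same board.

```python
def count_units(board_length, unit_lengths):
    """Lay objects end to end and count how many are needed to span the board."""
    covered = 0
    count = 0
    for unit in unit_lengths:
        if covered >= board_length:
            break
        covered += unit
        count += 1
    return count


board_length = 100  # hypothetical board, in centimetres

equal_units = [10] * 10                      # every unit is the same size
small_first = [5, 5, 5, 5, 20, 20, 20, 20, 20]
large_first = sorted(small_first, reverse=True)  # same objects, other order

equal_count = count_units(board_length, equal_units)  # 10
small_count = count_units(board_length, small_first)  # 8
large_count = count_units(board_length, large_first)  # 5
print(equal_count, small_count, large_count)
```

With equal units the count is always 10. With unequal units the same board measures 8 or 5 "units" depending on which objects happen to be used first.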