Journal of Mathematical Behavior
21 (2002) 203–224
Exploring test performance in mathematics: the questions
children’s answers raise
Elham Kazemi
University of Washington, 122 Miller, P.O. Box 353600, Seattle, WA 98195-3600, USA
E-mail address: ekazemi@u.washington.edu (E. Kazemi)
Abstract
This article investigates children’s mathematical performance on test items, specifically multiple-choice questions.
Using interviews with 90 fourth-graders, it reveals why particular kinds of items are more or less difficult for students.
By using multiple-choice questions and juxtaposing them with similar open-ended problems, the findings underscore
the costs of not attending to children’s thinking in designing and interpreting problems. The data from this study
suggest that when answering multiple-choice questions, students’ attention is drawn to the choices themselves.
They do not necessarily think through the problem first and thus make their choices based on (often incorrect)
generalizations they have made about problem-solving. Whether students answered a multiple-choice question or a
similar open-ended problem first impacted both their performance and their reasoning. Moreover, children draw on
their life experiences when the context of the problem is salient, thus ignoring important parameters of the stated
problem. Implications for investigating children’s thinking, instruction, and test design are discussed.
© 2002 Elsevier Science Inc. All rights reserved.
Keywords: Children’s thinking; Mathematical performance; Interpreting problems; Testing
1. Introduction
Much research in mathematics education focuses on understanding children’s thinking. The central
concerns of this body of work have been to understand what mathematical knowledge children need,
how children come to build sophisticated understandings, and how their reasoning is shaped by the
mathematical experiences they have in and out of school (e.g., Ball & Bass, 2000; Carpenter, Fennema,
Franke, Levi, & Empson, 1999; Cobb, Boufi, McClain, & Whitenack, 1997; Lampert, 1990; Lave, 1988;
Saxe, 1990). One area that needs further attention is how children make sense of the assessment tasks
they typically encounter at the end of the school year. This article examines how children’s understanding
interacts with the way test items are structured. Specifically, the study examines the reasons
children articulate for their responses to multiple-choice questions that appear on end-of-the-year
assessments. As schools increasingly rely on test scores to make policy, promotion, and instructional decisions
(Linn, 2000; National Research Council, 1999), we need to understand more about how students interpret
items and why they choose the answers they do. By using multiple-choice questions and juxtaposing them
with similar open-ended problems, this study provides further evidence for the costs of not attending to
children’s thinking in designing and interpreting even seemingly straightforward tasks.
This study builds on two different literatures. The first body of work has focused on understanding
student reasoning in assessment situations, and the second includes research on understanding the
culturally-specific knowledge that children might use in making sense of particular problem-solving
contexts.
The first body of work has arisen in response to widely cited examples of students’ test performance that have
been used to show areas in which students lack understanding. Results from the National Assessment of
Educational Progress, for example, have repeatedly shown that students have difficulty with non-routine
problems that require them to analyze a situation rather than simply carry out a computation. For example, students’ solutions
to the “bus problem” from the third NAEP raised alarms about students’ understanding of division. The
problem read, “An army bus holds 36 soldiers. If 1128 soldiers are being bused to their training site, how
many buses are needed?” Only 24% of the national sample of students taking the test solved this problem
correctly (NAEP, 1983). Others did not interpret the remainder to indicate that another, partially filled bus
would be needed, while some suggested that a minivan or smaller bus could be used for the remaining
soldiers who would not fill a bus.
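For reference, the arithmetic behind the intended answer works out as

\[ 1128 = 31 \times 36 + 12, \qquad \text{so} \qquad \lceil 1128/36 \rceil = 32, \]

that is, 31 full buses leave 12 soldiers without seats, and a 32nd, partially filled bus is needed.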
Because of the nature of such wide-scale testing data, especially in multiple-choice formats, researchers
interested in student learning do not have access to students’ own explanations of their answers in these
situations. This fact has led to a set of studies linked to the QUASAR¹ project, which examined
middle schoolers’ mathematical achievement (Lane & Silver, 1995). In creating the QUASAR Cognitive
Assessment Instrument (QCAI) used to measure middle school students’ capacity for higher-level
reasoning processes, researchers have carefully studied items to see whether they elicit students’ best
reasoning (Lane, 1993; Magone, Cai, Silver, & Wang, 1994). This work included the creation of open-ended
versions of multiple-choice problems in order to more fully understand how students made sense of
the problem situations (Cai & Silver, 1995; Lane, 1993; Santel-Parke & Cai, 1997; Silver, Shapiro, &
Deutsch, 1993). For example, a study using division-with-remainder problems (similar to the bus problem
described above) in open-ended format found that a higher percentage of students than indicated by NAEP
results (45% of a sample of about 200 middle school students) could provide an appropriate
interpretation of their computational answer if given a chance to explain their reasoning (Silver et al., 1993). The
QUASAR studies have also compared the influence of different kinds of prompts on students’ responses.
The findings show that prompts that do not explicitly direct students to pay attention to mathematical
aspects of the task can lead to underestimates of students’ understanding. Similarly, students’ interpretations of
familiar contexts may inadvertently interfere with their ability to use the reasoning that particular items
intend (see Santel-Parke & Cai, 1997, for examples).
¹ QUASAR (Quantitative Understanding: Amplifying Student Achievement and Reasoning) was a reform project (1989–1995)
whose goal was to develop and implement a middle school curriculum in economically disadvantaged communities. The curriculum
centered on developing students’ reasoning, problem-solving, and communication skills in order to deepen their mathematical
understandings. The project was directed by Edward A. Silver and headquartered at the Learning Research and Development
Center at the University of Pittsburgh (see Silver, Smith, & Nelson, 1995, for an overview of the project).
Like the QUASAR researchers, those who contributed to the development of a middle school curriculum,
Mathematics in Context (Romberg, 1998), have built assessments that take into account detailed research
on students’ reasoning. In a series of articles about test development, Van den Heuvel-Panhuizen and her
colleagues underscore the importance of students’ active participation in assessment design. For example,
they write about the value of understanding what makes certain problems more or less difficult for
students, whether students can articulate why certain problems are more or less difficult, the importance of
allowing students to generate problems to use in assessments, and the role that contexts play in children’s
problem-solving efforts (Van den Heuvel-Panhuizen, 1994; Van den Heuvel-Panhuizen & Gravemeijer,
1993; Van den Heuvel-Panhuizen, Middleton, & Streefland, 1995). Taken together, studies stemming
from QUASAR and the development of Mathematics in Context provide evidence for the importance of
examining how children interpret assessment items.
The second body of work that informs this study helps explain diversity in children’s sense-making
strategies on tests by showing how particular questions require culturally-specific knowledge. These
studies of children’s test performance have been concerned with documenting cultural bias in test language
(McNeil, 2000; Smith & Fey, 2000; Solano-Flores & Nelson-Barber, 2001; Stiff & Harvey, 1988). For
example, Tate (1994) demonstrated how African American students’ varied solutions to a problem
reflected the life experiences they brought to bear in solving the problem. The problem read, “It costs $1.50
each way to ride the bus between home and work. The weekly pass is $16.00. Which is a better deal,
paying the daily fare or buying the weekly pass?” Students who picked the weekly pass were marked
wrong, but these students reasoned that the bus rider could use the pass to travel to multiple jobs and on
weekends or could share it with family members (see also Ladson-Billings, 1995).
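For reference, if one assumes, as the scoring presumably did, a single round trip on each of five work days, the daily fare totals

\[ 2 \times \$1.50 \times 5 = \$15.00 < \$16.00, \]

so the keyed answer was the daily fare; a rider making six or more round trips a week (6 round trips at $3.00 each come to $18.00) would indeed do better with the pass, consistent with the students’ reasoning.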
I use this example and the body of work it represents as evidence that students actively make sense of the problems they
encounter and construct a range of valid mathematical interpretations based on their everyday experiences
(see also Cooper & Dunne, 2000 and Solano-Flores & Nelson-Barber, 2001 for additional examples from
mathematics and science assessments). The goal of this article, however, is not to make ethnic-specific or
class-related claims about students’ problem-solving strategies. Instead, I seek to demonstrate the range
of knowledge and interpretations that a diverse group of students brought to bear in their problem-solving efforts.
This study was motivated in part by my work with teachers in which I have studied how teachers
understand and make use of children’s mathematical thinking in making pedagogical and curricular
decisions (Franke & Kazemi, 2001; Kazemi & Franke, 2000). My work has taken place in schools where
students have not historically performed well on state or national assessments. Many teachers with whom
I work feel compelled, near the end of the school year, to have students practice an array of discrete
mathematical procedures they anticipate will be covered on the end-of-the-year tests. What is particularly striking is
that some teachers feel they must put aside their efforts to elicit children’s thinking and instead perform
triage on the skills they have not yet “covered.” Their anxieties about coverage are heightened because
of the sheer volume of distinct skills students are expected to master at each grade. Observing teachers
turn to hours of computational practice, I wondered instead what we might learn if we asked students to
tell us how they approach the seemingly straightforward problems they encounter on tests.
Using interview data from 90 fourth-graders, this study explores why particular kinds of items are more
or less difficult for students. In selecting the problems for this study (see Table 1), I drew from my
experience observing children solve mathematical problems and from my knowledge of research on
children’s thinking about number (e.g., Carpenter et al., 1999). Theoretically, this study is informed by
a situated view of learning. From this perspective, testing is one kind of practice with its own norms and
rules (Miller-Jones, 1989; Rogoff, 1997; Wertsch, 1991). Children’s participation in a testing situation