Production and evaluation of (multimodal) answers to medical questions
Charlotte van Hooijdonk (a), Emiel Krahmer (b), Alfons Maes (b), Mariët Theune (c), Wauter Bosma (a)
(a) VU University Amsterdam
(b) Tilburg University
(c) University of Twente
CMJ.VAN.HOOIJDONK@LET.VU.NL
This paper describes two experiments carried out to investigate the production and evaluation of
multimodal answer presentations in the context of a medical question answering system. In a production
experiment, participants had to produce answers to different types of questions. The results show that about
one in four of the produced answers used multiple media. In an evaluation experiment, users had to evaluate
different types of multimodal answer presentations. Answers with an informative visual were evaluated as
more informative and more attractive than answers with a merely illustrative visual.
Keywords: Multimodal information presentation, cognitive engineering, document design, visual
media
Introduction
This paper investigates the production and evaluation of multimodal answer presentations in a
medical question answering (QA) system. Early QA research concentrated on textual answers to
factoid questions (e.g., What is the capital of France? Paris). Currently, there is a growing interest in
the generation of multimodal answers to more complex questions. This raises questions about what
combinations of modalities are most appropriate given particular types of questions.
Multimodal information presentation has been studied in various research fields with various
outcomes. For example, research in cognitive and educational psychology focused on how
multimodal presentations affect the users’ understanding, recall and processing efficiency of the
presented material (e.g., Carney & Levin, 2002; Mayer, 2005; Tversky, Morrison, & Bétrancourt, 2002).
Guidelines resulting from this research often relate to specific types of information used in specific
domains, for example cause and effect chains which explain how systems work (e.g., Mayer &
Moreno, 2002) or procedural information (e.g., Michas & Berry, 2000). Research in language
generation has tried to classify and characterize modalities, information types, and the
matches between them. For example, Bernsen (1994) proposed a taxonomy of generic unimodalities
consisting of various features. Other scholars studied the so-called media allocation problem (i.e.,
how to determine which information to allocate to which medium) and tried to identify which factors
play a role in media allocation (Arens, Hovy, & Vossers, 1993).
In short, attempts have been made to generate optimal multimodal information presentations
resulting in several modality guidelines, frameworks, and taxonomies. Still needed is information
about people’s modality preferences in producing and evaluating presentations. Therefore, we carried
out two experiments following the approach of Heiser, Phan, Agrawala, Tversky and Hanrahan
(2004), where people are asked to produce information presentations (e.g., assembly instructions),
which are then rated by others.
Experiment I: Production
Participants and stimuli
A total of 111 students of Tilburg University participated for course credits (65 female, 19-33 years old).
Participants were given one of four sets of eight medical questions for which the answers could be
found on the Internet. Four were randomly chosen from one hundred medical questions formulated to
test the IMIX QA system. Of the remaining four questions, two were definition questions (e.g., “What
does ADHD stand for?”) and two were procedural questions (e.g., “How to apply a sling to the left
arm?”). Participants had to give two answers per question, a brief and an extended answer, using
whatever combinations of modalities they wanted. They were specifically asked to present the
answers as they would prefer to find them in a present-day digital information environment. Questions
and answers had to be presented in a fixed format in PowerPoint™ with areas for the question
(‘vraag’) and the answer (‘antwoord’). The participants were acquainted with inserting different types of objects
in PowerPoint.
Coding system and procedure
Each answer was coded for the presence of visual media (photos, graphics, and animations) and for the
function of these visual media in relation to the text (decorative, representational, or informative),
loosely based on Carney & Levin (2002). In total, 1775 answers were collected (111 participants × 8
questions × 2 answers, minus one missing answer). Six analysts independently coded the same set of
111 answers. Subsequently, every analyst independently coded a part of the total corpus
(approximately 300 answers). Calculations of Cohen’s κ showed that the analysts agreed almost perfectly
in judging the occurrence of photos (κ = .81), graphics (κ = .83), and animations (κ = .92). Almost
perfect agreement was also reached in assigning the function of the visual media (κ = .83).
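As an illustration of the agreement measure, the following is a minimal sketch of Cohen's κ for two coders who labelled the same items. The labels below are hypothetical and are not data from the study; the source also does not state how agreement among the six analysts was aggregated, so a single pairwise comparison is assumed here.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two coders who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(labels_a)
    # Observed agreement: share of items that received the same label twice.
    p_observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the coders' marginal proportions per label.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_chance = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical function codes for eight answers (illustration only).
coder_1 = ["dec", "rep", "inf", "inf", "rep", "dec", "inf", "rep"]
coder_2 = ["dec", "rep", "inf", "rep", "rep", "dec", "inf", "inf"]

kappa = cohen_kappa(coder_1, coder_2)
print(round(kappa, 2))  # 0.62
```

In this toy example the coders agree on six of eight items (0.75), while chance agreement is 22/64, which gives κ = 13/21. Values above .80, as reported above, are conventionally read as almost perfect agreement.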
Results
Analysis of the complete corpus of coded answer presentations showed that almost one in four
answers contained one or more visual media, of which graphics were most frequent (14.9%) and
animations were least frequent (3.8%). The presence of photos was between these two (8.6%).
Table 1: Percentages of visual media functions related to answer length (n = 442)
                                     Brief answers    Extended answers
                                     (n = 101)        (n = 341)
Decorative visuals (n = 70)          26.7             12.6
Representational visuals (n = 201)   20.8             52.8
Informative visuals (n = 171)        52.5             34.6
Table 1 shows that visual media occurred significantly more often with extended answers (χ²(1) =
173.89, p < .001). Moreover, the distribution of the functions of visual media differed significantly
over answer length (χ²(2) = 33.79, p < .001). Informative visuals occurred more often in brief
answers, whereas representational visuals occurred more often in extended answers.
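The two tests reported for Table 1 can be approximately retraced with a Pearson chi-square test of independence. The sketch below rests on assumptions that are not stated in the source: cell counts are recovered by rounding the published percentages, no continuity correction is applied, and the single missing answer is assumed to be an extended one (888 brief and 887 extended answers).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Percentages from Table 1: (brief, extended) per visual function.
percentages = {
    "decorative": (26.7, 12.6),
    "representational": (20.8, 52.8),
    "informative": (52.5, 34.6),
}
column_n = (101, 341)  # answers with visuals: brief, extended

# Assumption: counts are reconstructed by rounding percentage * n / 100.
function_counts = [
    [round(p * n / 100) for p, n in zip(row, column_n)]
    for row in percentages.values()
]
function_stat, function_dof = chi_square_independence(function_counts)

# Assumption: 888 brief and 887 extended answers (1775 in total).
# Rows: brief, extended; columns: with visuals, without visuals.
presence_counts = [[101, 888 - 101], [341, 887 - 341]]
presence_stat, presence_dof = chi_square_independence(presence_counts)

print(function_counts)
print(function_stat, function_dof)
print(presence_stat, presence_dof)
```

Under these assumptions the statistics come out close to the reported values of 33.79 (2 degrees of freedom) and 173.89 (1 degree of freedom); small deviations are expected because the counts are reconstructed from rounded percentages.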
Table 2: Percentages of the functional types of visual media related to definition and procedural
questions (n = 271)
                             Definition questions    Procedural questions
                             (n = 91)                (n = 180)
Decorative (n = 27)          19.8                    5.0
Representational (n = 129)   53.8                    44.4
Informative (n = 115)        26.4                    50.6
Table 2 shows that the use of visual media differed over question types as well. The analysis of the two
definition and two procedural questions (n = 887, 271 of which contained visual media) showed that
visual media were more frequent with procedural questions than with definition questions (χ²(1) = 29.23,
p < .001). Moreover, the distribution of the functions of visual media differed (χ²(2) = 22.70, p < .001).
Decorative visuals were overrepresented in answers to definition questions and underrepresented in
answers to procedural questions; informative visuals were underrepresented in answers to definition
questions.
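The source does not state which chi-square test underlies the frequency comparison between question types. One reading, offered here purely as an assumption, is a goodness-of-fit test that compares the number of answers with visuals per question type (91 and 180, from Table 2) against equal expected frequencies. A minimal sketch:

```python
def chi_square_goodness_of_fit(observed, expected=None):
    """Pearson chi-square statistic and degrees of freedom for observed counts."""
    total = sum(observed)
    if expected is None:
        # Default null hypothesis: all categories are equally frequent.
        expected = [total / len(observed)] * len(observed)
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, len(observed) - 1


# Answers containing visual media: definition questions, procedural questions.
visuals_per_question_type = [91, 180]
gof_stat, gof_dof = chi_square_goodness_of_fit(visuals_per_question_type)
print(gof_stat, gof_dof)
```

With an expected count of 135.5 per question type, this yields a statistic of about 29.23 with 1 degree of freedom, which agrees with the value reported above. That agreement suggests, but does not confirm, that this is the test that was used.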