Slaughter Semantic Representation of Health Texts
Page 1 of 42
Slaughter, L. A., Soergel, D., & Rindflesch, T. C. (January 01, 2006). Semantic representation of consumer questions and physician
answers. International Journal of Medical Informatics, 75, 7, 513-29. Available at
http://www.sciencedirect.com/science/article/pii/S1386505605001401
Semantic Representation of Consumer Questions and Physician Provided Answers
Authors: Laura A. Slaughter, PhD (a); Dagobert Soergel, PhD (b); Thomas C. Rindflesch, PhD (c)
Affiliations:
(a) Department of Biomedical Informatics, Columbia University, New York, NY
(b) College of Information Studies, University of Maryland, College Park, MD
(c) National Library of Medicine, Bethesda, MD
Corresponding author:
Laura A. Slaughter, PhD
Department of Biomedical Informatics
Columbia University
622 West 168th Street, VC-5
New York, New York 10032-3720 USA
Office: 212-305-6940
FAX: 212-305-3302
E-mail address: laura.slaughter@dbmi.columbia.edu
ABSTRACT
Objective: The aim of this study was to identify the semantic relationships in health consumers’
questions, in physicians’ answers, and between questions and answers, in order to lay the
foundation for intelligent systems that support health consumers in finding and understanding
medical information.
Methods: We manually identified semantic relationship instances within twelve question-answer
pairs from Ask-the-Physician Web sites, using the relationship types and structure of the
Unified Medical Language System (UMLS) Semantic Network as the starting relationship
inventory. We calculated the frequency of occurrence of each semantic relationship class.
Conceptual graphs were generated, joining concepts together through
the semantic relationships identified. We then analyzed whether the representations of physicians’
answers exactly match the form of the question representations. Lastly, we examined
characteristics of the physician answer conceptual graphs.
Results: We identified 97 relationship instances in the questions and 334 relationship instances
in the answers. The most frequently identified relationship type in both questions and answers is
brings_about (causal). We examined the relationship instances in the answers that contain a
concept also expressed in the question and found that they most often use the following
relationship types: brings_about, isa, co_occurs_with, diagnoses, and treats. Of the
relationship instances identified in the answers, 74% did not contain a concept expressed in the
question. For each answer, these relationship instances formed large graphs that contain a “focal
point” concept, usually one that also occurs in the question, with numerous semantic
relationships connecting it to concepts not expressed in the question.
Conclusion: We observed that the interconnecting patterns in semantic representations of
questions and answers possess specific characteristics that can be exploited to improve
retrieval strategies. For example, we determined that both consumers and physicians often express
causative relationships and that these play a key role in leading to further related concepts.
Keywords: Semantic Processing, Public Health, Unified Medical Language System, Information
Retrieval, Natural Language Processing
1. INTRODUCTION
Recent research in medical information processing has focused on health care consumers.
These users often experience frustration while seeking online information [1,2,3], owing to their
limited understanding of medical concepts and their unfamiliarity with effective search strategies.
Semantic relationships provide a way of addressing these issues. Semantic information can guide
the user by suggesting concepts not overtly expressed in an initial query. For example, imagine
that a user asks an online question-answering system whether exercise helps prevent osteoporosis
and, after receiving an initial answer, wishes to obtain more information. The semantic
relationship prevents in the proposition representing the question, namely “exercise prevents
osteoporosis”, can support this effort: a system might combine prevents with osteoporosis to
retrieve additional ways of preventing this disorder.
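The osteoporosis example can be sketched in a few lines of Python. This is only a minimal illustration, not the method used in this study: the Proposition type, the toy KNOWLEDGE list, and the functions build_index and suggest_related are all hypothetical names introduced here, and the stored propositions are invented for illustration rather than drawn from the analyzed texts.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposition(NamedTuple):
    """A concept-relationship-concept triple, e.g. exercise prevents osteoporosis."""
    subject: str
    relation: str
    obj: str


# Toy knowledge base; entries are illustrative assumptions, not study data.
KNOWLEDGE = [
    Proposition("exercise", "prevents", "osteoporosis"),
    Proposition("calcium intake", "prevents", "osteoporosis"),
    Proposition("vitamin D", "prevents", "osteoporosis"),
    Proposition("osteoporosis", "brings_about", "bone fracture"),
]


def build_index(props):
    """Index subjects by (relation, object) so related concepts can be looked up."""
    index = defaultdict(list)
    for p in props:
        index[(p.relation, p.obj)].append(p.subject)
    return index


def suggest_related(question, index):
    """Return other concepts that stand in the same relation to the same object."""
    candidates = index.get((question.relation, question.obj), [])
    return [c for c in candidates if c != question.subject]


question = Proposition("exercise", "prevents", "osteoporosis")
suggestions = suggest_related(question, build_index(KNOWLEDGE))
print(suggestions)  # ['calcium intake', 'vitamin D']
```

Indexing by the pair (relation, object) mirrors the idea in the text: the relationship expressed in the question, together with one of its concepts, serves as the key for finding concepts the user did not mention.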
This paper presents an analysis of semantic relationships that were manually extracted
from questions asked by health consumers and from the answers provided by physicians as found
on Ask-a-Doctor Web sites. The Semantic Network from the Unified Medical Language System
(UMLS) [4,5] served as the initial version (version 0) of an inventory of semantic relationship
types, which was modified in the course of coding the relationship types identified in the health
consumer texts.
A simple frequency analysis of the semantic relationships occurring in all texts leads into
an investigation of patterns within questions and within answers, and finally of patterns of
semantic relationships that connect the two. Patterns of semantic relationships within answers
are of interest because they provide a useful starting point for constructing query strategies
that involve semantic information. The implied relationships linking questions to answers provide
a basis for identifying the external knowledge necessary to understand answers.