Using Syntactic Distributional Patterns for
Data-Driven Answer Extraction from the Web
Alejandro Figueroa1 and John Atkinson2,⋆
1 Deutsches Forschungszentrum für Künstliche Intelligenz - DFKI,
Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
2 Department of Computer Sciences, Universidad de Concepción, Concepción, Chile
alejandro@coli.uni-sb.de, atkinson@inf.udec.cl
⋆ This research is sponsored by FONDECYT, Chile, under grant number 1040469,
"Un Modelo Evolucionario de Descubrimiento de Conocimiento Explicativo desde
Textos con Base Semántica con Implicaciones para el Análisis de Inteligencia."
Abstract. In this work, a data-driven approach for extracting answers
from web snippets is presented. Answers are identified by matching the
contextual distributional patterns of the expected answer type (EAT) with
those of the answer candidates. These distributional patterns are learnt
directly from previously annotated tuples {question, sentence, answer}, and
the learning mechanism is based on the principles of language acquisition.
Results show that the performance of this linguistically motivated,
data-driven approach is encouraging.
Keywords: Natural Language Processing, Question Answering.
1 Introduction
The increasing amount of information on the Web has led search engines to
deal with huge amounts of data, as users of all kinds have turned to them for
retrieval. Nowadays, search engines do not focus solely on retrieving documents
relevant to a user's particular request. They also provide other services (e.g.,
Group Search, News Search, Glossary); hence, the growing complexity of users'
requests has steered research toward Question Answering (QA) systems. These
aim to answer natural language (NL) questions posed by users by searching for
the answer in a set of documents available on the Web. QA is a challenging task due
to the ambiguity of language and the complexity of the linguistic phenomena
that can be found in NL documents.
Typical questions to answer are those that look for named entities as answers
(e.g., locations, persons, dates, organizations). Nevertheless, QA systems are not
restricted to these kinds of questions. They also try to deal with more complex
ones, which may require demanding reasoning while the system is looking
for the answer [11].
Usually, QA systems start by analyzing the query [4,7] in order to determine
the EAT. The EAT allows the QA system to narrow the search space [8], while
it ranks documents, sentences, or sequences of words in which the answer is
supposed to be. This set of likely answers is called the answer candidates. In this
last step of the zooming process, the QA system must decide which are the most
suitable answers to the triggering query. The extraction and ranking of answer
candidates is traditionally based on frequency counting, pattern matching, and
the detection of different orderings of the query words, called paraphrases [6,7,8].
Answer extraction modules attempt to take advantage of the redundancy provided
by different information sources. This redundancy significantly increases the
probability of finding a paraphrase in which the answer can be readily identified.
Normally, QA systems extract these paraphrases at the sentence level [10]. The
rules for identifying paraphrases can be written manually or learnt automatically
[6,10], and they can consist of pre-parsed trees [10] or simple string-based
manipulations [6]. In general, paraphrases are learnt by retrieving sentences
that contain previously annotated question-answer pairs. For example, in [10],
anchor terms (e.g., "Lennon 1980") are sent to the Web in order to retrieve
sentences that contain both query and answer terms. Then, patterns are extracted
from this set of sentences, with their likelihood being proportional to their
redundancy on the Web [7]. In most cases, the new set of retrieved sentences is
matched against these paraphrases in order to extract new answers. At the same
time, a huge set of paraphrases [6] considerably decreases the need for deep
linguistic processing, such as anaphora or synonym resolution. In some cases, it
reduces extraction to pattern matching by means of regular expressions [10]. As
a result, strategies based on paraphrases tend to perform better when questions
aim for a named entity as an answer (locations, names, organizations), but they
perform poorly when they aim for noun phrases [10].
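To make this paraphrase-based extraction concrete, the following minimal Python
sketch induces surface patterns from sentences containing an anchor question
term and its known answer, scores them by their redundancy, and re-applies them
as regular expressions to new sentences. It is only an illustration of the
general technique, not the actual method of [10]: the slot markers, the context
window, and the scoring are simplifying assumptions.

    import re
    from collections import Counter

    def learn_patterns(sentences, q_term, a_term):
        """Induce surface patterns from sentences containing both an anchor
        question term (e.g., "Lennon") and its known answer (e.g., "1980").
        The anchors are replaced by slots, and each pattern is scored in
        proportion to its redundancy across the retrieved sentences."""
        counts = Counter()
        for s in sentences:
            if q_term in s and a_term in s:
                generalized = s.replace(q_term, "<Q>").replace(a_term, "<A>")
                # Keep a short window from the question slot to one token
                # past the answer slot as the candidate pattern.
                m = re.search(r"<Q>.{1,40}?<A>\s+\S+", generalized)
                if m:
                    counts[m.group(0)] += 1
        total = sum(counts.values())
        return {p: c / total for p, c in counts.items()} if total else {}

    def apply_pattern(pattern, q_term, sentence):
        """Instantiate a learned pattern for a new question term and extract
        whatever fills the answer slot in a freshly retrieved sentence."""
        parts = re.split(r"(<Q>|<A>)", pattern)
        regex = "".join(
            re.escape(q_term) if p == "<Q>"
            else r"(?P<answer>\w[\w\s]*?)" if p == "<A>"
            else re.escape(p)
            for p in parts
        )
        m = re.search(regex, sentence)
        return m.group("answer").strip() if m else None

    patterns = learn_patterns(
        ["Lennon was shot in 1980 in New York."], "Lennon", "1980"
    )
    for p in patterns:  # e.g., "<Q> was shot in <A> in"
        print(apply_pattern(p, "Palme", "Palme was shot in 1986 in Stockholm."))

Applied to the new sentence above, the learned pattern extracts "1986", which
illustrates why such patterns transfer well for named-entity answers with
stereotyped contexts but degrade for freer noun-phrase answers.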
Due to the huge number of possible paraphrases, statistical methods are also
used for extracting answers. In [5], a strategy for answering questions is
learnt directly from data. This strategy conceives the answer extraction problem
as a binary classification problem in which text snippets are labelled as
correct or incorrect. The classifier is based on a set of features ranging from
lexical n-grams to parse trees. The major problem of statistical approaches is
that they frequently return inexact answers, which usually consist of substrings
of the answer, the answer surrounded by some context words, or strings highly
similar to the answer.
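The following sketch hints at this binary formulation: a linear classifier
(logistic regression, i.e., the two-class case of Maximum Entropy) is trained
over lexical n-gram features to label snippets as correct or incorrect. The
training data is invented, and the n-gram features merely stand in for the
richer feature set, up to parse trees, used in [5].

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical training snippets for "When was John Lennon shot?",
    # labelled 1 if they contain a correct answer and 0 otherwise.
    snippets = [
        "John Lennon was shot on 8 December 1980 in New York City.",
        "Lennon was murdered in 1980 outside his apartment.",
        "Lennon released several solo albums during the 1970s.",
        "The Beatles broke up before Lennon's solo career took off.",
    ]
    labels = [1, 1, 0, 0]

    # Lexical unigrams and bigrams as features for the binary classifier.
    classifier = make_pipeline(
        CountVectorizer(ngram_range=(1, 2)),
        LogisticRegression(max_iter=1000),
    )
    classifier.fit(snippets, labels)
    print(classifier.predict(["Lennon was shot dead in December 1980."]))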
Nevertheless, it is still unclear how each technique contributes to dealing
with the linguistic phenomena that QA systems face while searching for the
answer. One solution may involve a trade-off between the implementation of
rule-based systems and easily re-trainable data-driven ones. In [10], a strategy
for combining the output of different kinds of answer extractors is introduced.
This re-ranker is based on a Maximum Entropy linear classifier, trained on a set
of 48 different types of features such as the ranking in each answer extraction
module, redundancy, negative feedback, etc. Results show that a good strategy
for combining answer extractors, each based mainly on a different approach, can
significantly improve the overall performance of QA systems [11].
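A minimal sketch of such a combination: each candidate answer receives a linear
score over a handful of illustrative features (its rank in each extractor, its
redundancy), and the scores are normalised with a softmax. The feature names
and weights below are assumptions made for illustration; the actual re-ranker
in [10] draws on 48 feature types.

    import math

    def rerank(candidates, weights):
        """Combine the evidence for each answer candidate with a linear model
        and normalise the scores with a softmax, mimicking (very loosely) a
        Maximum Entropy re-ranker over the outputs of several extractors."""
        scores = {
            answer: sum(weights.get(f, 0.0) * v for f, v in feats.items())
            for answer, feats in candidates.items()
        }
        z = sum(math.exp(s) for s in scores.values())
        return sorted(((math.exp(s) / z, a) for a, s in scores.items()),
                      reverse=True)

    # Hypothetical features: reciprocal rank in two extractors plus redundancy.
    candidates = {
        "1980": {"rank_pattern": 1.0, "rank_statistical": 0.5, "redundancy": 12},
        "1975": {"rank_pattern": 0.2, "rank_statistical": 1.0, "redundancy": 3},
    }
    weights = {"rank_pattern": 1.5, "rank_statistical": 1.0, "redundancy": 0.1}
    print(rerank(candidates, weights))  # highest-probability answer first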
Strategies based on paraphrases aim to find a re-writing of the query within
the text in which the answer is easily identified. Their main drawback is that
whenever the answer occurs in a context that does not match any re-writing rule, it will