Summary Natural Language Generation Exam Notes

This document contains notes and summaries covering the content of the course Natural Language Generation within the Artificial Intelligence Master at Utrecht University.

  • November 22, 2022
  • 2022/2023
Examples of NLG applications
• Weather forecast, road maintenance, automatic journalism, reporting on sports results, textual feedback on
health, agents and dialogue systems, financial reporting for companies, image labelling
• NLG can produce higher-quality texts than mail-merge, especially when there’s lots of variation in output texts
• NLG systems can be easier to update when the content or structure of generated documents needs to change regularly

NLG systems’ pipeline
• Most common architecture in NLG systems is a three-stage pipeline with following stages:
o Text planning: combines content determination and discourse planning
o Sentence planning: combines sentence aggregation, lexicalization, and referring expression generation (REG)
o Linguistic realization: involves syntactic, morphological, and orthographic processing
• Issue of intermediate representations: how inputs and outputs of different stages should be represented and
what to pass from one stage to the other
• Data analytics and interpretation: making sense of the input data; this step is often left out of the pipeline
• Content Determination: decide on content and structure of text
⁃ Content selection:
⁃ Deciding what information should be communicated in the text
⁃ Creating a set of messages from system’s inputs or underlying data sources
⁃ Largely consists of filtering and summarizing input data
⁃ Messages created are expressed in some formal language that labels and distinguishes entities,
concepts and relations in domain
⁃ Represent each message as an attribute–value matrix; each describes some relation that holds
between those entities or concepts specified as arguments of that relation
⁃ Most NLG systems base content determination on domain-specific rules acquired from domain
experts; this is easier and stays more faithful to human-written texts
⁃ Document structure:
⁃ How should I organize this content as a text? What order do I say things in? What rhetorical
structure?
⁃ Impose ordering and structure over set of messages to be generated
⁃ Structuring the messages produced by content determination into a coherent text
⁃ Text plans (output of text planner):
⁃ Usually represented as trees whose leaf nodes specify individual messages, and whose internal
nodes show how messages are conceptually grouped together
⁃ Most common strategy is to represent messages in a form as close as possible to the
representation used for sentence plans
⁃ The clustering decisions made in tree will have an impact on determination of sentence and
paragraph boundaries in resulting text
⁃ Sentence plans:
⁃ Classic template systems simply insert parameters into boilerplate without doing any further
processing (newer systems might perform limited linguistic processing as well)
⁃ Abstract sentential representations: represent sentence plans by using an abstract
representation language which specifies the content words (nouns, verbs, adjectives and adverbs)
of a sentence, and how they are related
⁃ Sentence Planning Language (SPL): characterizes the sentence by means of named attributes and
their values, and allows values themselves to consist of named attributes and their values
• Microplanning: decide how to linguistically express text (which words, sentences, etc. to use; how to identify
objects, actions, times)
⁃ Input: a tree-structured text plan whose leaf nodes are messages
⁃ Output: a new text plan whose leaf nodes are combinations of messages that will eventually be realized
as sentences
⁃ Lexical/syntactic choice: which words and linguistic structures to use?
⁃ Lexicalization: deciding which specific words and phrases should be chosen to express domain
concepts and relations which appear in messages
⁃ Often simply done by hard coding a specific word or phrase for each domain concept or relation
⁃ Sometimes improve fluency by allowing NLG system to vary words used to express a concept or
relation, either to achieve variety or accommodate subtle pragmatic distinctions
⁃ Especially important when NLG system produces output texts in multiple languages
⁃ Aggregation: how should information be distributed across sentences and paragraphs?
⁃ Sentence aggregation: grouping messages together into sentences; not strictly necessary, but
when done well it can significantly enhance fluency and readability
⁃ Reference: how should text refer to objects and entities?
⁃ REG: task of selecting words or phrases (linguistic forms) to identify domain entities
⁃ Unlike lexicalization, REG is usually formalized as a discrimination task, where system needs to
communicate sufficient information to distinguish one domain entity from others; this requires
taking contextual factors into account
⁃ Goal is to include enough information in description to enable hearer to unambiguously identify
target entity
• Linguistic Realization:
⁃ Applying rules of grammar to produce a text which is syntactically, morphologically, and orthographically
correct
⁃ Generating grammatically correct sentences to communicate messages
⁃ Realizer:
⁃ Module where knowledge about grammar of NL is encoded
⁃ Activates syntactic component, morphological component and orthographic component
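
The stages above can be sketched end to end on a toy example. This is only an illustrative sketch, not the architecture of any particular system: the rainfall data, the relation names (`rain_days`, `rain_deviation`), the 10 mm threshold and all function names are assumptions invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical input data (invented for illustration): monthly rainfall.
DATA = {"month": "March", "rain_mm": 12, "avg_rain_mm": 40, "rain_days": 3}

@dataclass
class Message:
    """A message as an attribute-value structure: a relation and its arguments."""
    relation: str
    args: dict

@dataclass
class PlanNode:
    """Text plan tree: internal nodes group messages, leaf nodes hold one message."""
    label: str
    children: list = field(default_factory=list)
    message: Optional[Message] = None

def determine_content(data):
    """Content selection: filter and summarize the input data into messages."""
    messages = [Message("rain_days", {"month": data["month"], "count": data["rain_days"]})]
    diff = data["rain_mm"] - data["avg_rain_mm"]
    if abs(diff) >= 10:  # assumed domain rule: only report notable deviations
        direction = "below" if diff < 0 else "above"
        messages.append(Message("rain_deviation", {"month": data["month"], "direction": direction}))
    return messages

def plan_text(messages):
    """Document structuring: impose an order and group the messages in a tree."""
    order = {"rain_deviation": 0, "rain_days": 1}
    ordered = sorted(messages, key=lambda m: order[m.relation])
    return PlanNode("rainfall_summary", children=[PlanNode(m.relation, message=m) for m in ordered])

def microplan(plan):
    """Lexicalization (hard-coded words) and aggregation (same subject, one sentence)."""
    sentence_plans = []
    for leaf in plan.children:
        m = leaf.message
        if m.relation == "rain_deviation":
            word = "drier" if m.args["direction"] == "below" else "wetter"
            pred = {"verb": "be", "complement": f"{word} than average"}
        elif m.relation == "rain_days":
            pred = {"verb": "have", "count": m.args["count"], "noun": "rainy day"}
        else:
            raise ValueError(f"no lexicalization for {m.relation}")
        subject = m.args["month"]
        if sentence_plans and sentence_plans[-1]["subject"] == subject:
            sentence_plans[-1]["predicates"].append(pred)  # aggregate into previous sentence
        else:
            sentence_plans.append({"subject": subject, "predicates": [pred]})
    return sentence_plans

PAST_TENSE = {"be": "was", "have": "had"}

def realize(sentence_plan):
    """Realization: verb and noun morphology, word order, then orthography."""
    parts = []
    for pred in sentence_plan["predicates"]:
        phrase = PAST_TENSE[pred["verb"]]
        if "count" in pred:
            noun = pred["noun"] + ("" if pred["count"] == 1 else "s")
            phrase += f" {pred['count']} {noun}"
        else:
            phrase += f" {pred['complement']}"
        parts.append(phrase)
    text = sentence_plan["subject"] + " " + " and ".join(parts)
    return text[0].upper() + text[1:] + "."

def generate(data):
    plan = plan_text(determine_content(data))
    return " ".join(realize(sp) for sp in microplan(plan))

print(generate(DATA))  # March was drier than average and had 3 rainy days.
```

Each stage only sees the output of the previous one, which makes the intermediate representations (messages, plan tree, sentence plans) explicit.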

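The view of REG as a discrimination task can be illustrated with a small sketch. The notes do not name an algorithm; the code below is a simplified attribute-by-attribute selection, loosely in the spirit of incremental approaches to REG. The entities, their attributes and the preference order are all hypothetical.

```python
# Hypothetical domain: every entity is described by attribute-value pairs.
ENTITIES = {
    "d1": {"type": "dog", "colour": "black", "size": "small"},
    "d2": {"type": "dog", "colour": "white", "size": "small"},
    "c1": {"type": "cat", "colour": "black", "size": "small"},
}

# Assumed order in which attributes are tried.
PREFERENCE = ["type", "colour", "size"]

def refer(target, entities, preference=PREFERENCE):
    """Pick attributes of the target until every other entity is ruled out."""
    distractors = {e for e in entities if e != target}
    description = {}
    for attr in preference:
        if not distractors:
            break
        value = entities[target][attr]
        ruled_out = {e for e in distractors if entities[e].get(attr) != value}
        if ruled_out:  # keep the attribute only if it removes some distractor
            description[attr] = value
            distractors -= ruled_out
    # None signals that these attributes cannot single out the target.
    return description if not distractors else None

def to_phrase(description):
    """Turn the selected attributes into a simple definite noun phrase."""
    modifiers = [description[a] for a in ("size", "colour") if a in description]
    return " ".join(["the", *modifiers, description.get("type", "one")])

print(to_phrase(refer("d1", ENTITIES)))  # the black dog
print(to_phrase(refer("c1", ENTITIES)))  # the cat
```

The description for `d1` needs the colour because another dog is present, while `c1` is already unique by type; this is the contextual dependence mentioned above.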
Building NLG systems
• Need knowledge of language and application:
⁃ Imitate a corpus of human-written texts
⁃ Manually examine, or use learning if corpus is large enough
⁃ Ask domain experts, although they’re better at critiquing what system does
⁃ Experiments with users, very nice in principle, but lots of work
• Evaluation of output texts:
⁃ Does system help people? Do people like the texts and believe they are useful? When to compare output texts
with human texts?
• Requirement analysis and system specification:
⁃ Developer uses a collection of example inputs and associated output texts to describe to users the system
she proposes to build
⁃ Corpus-based approach where corpus contains examples of system inputs and corresponding output texts
and should cover full range of texts expected to be produced by system, including boundary, unusual and
typical cases
• Analyzing information content of corpus texts:
⁃ Important step: identify parts of human-authored corpus texts conveying info not available to NLG
system; this analysis requires classifying each sentence of a corpus text into one of following categories:
⁃ Unchanging text: text always present in the output; easiest to generate
⁃ Directly available data: text with info already in input data (or DB/KB)
⁃ Computable data: text with info that can be derived from input data via computation or reasoning
⁃ Unavailable data: text with info not present in or derivable from input data; causes most problems
and impossible to generate (if not in input, can’t be in output)
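
Once corpus sentences have been labelled by hand with the four categories above, a short tally shows how much of the corpus the system could actually produce. This is a minimal sketch: the example sentences, their labels and the helper names are invented for illustration.

```python
from collections import Counter
from enum import Enum

class Source(Enum):
    UNCHANGING = "unchanging text"
    DIRECT = "directly available data"
    COMPUTABLE = "computable data"
    UNAVAILABLE = "unavailable data"

# Hypothetical hand-annotated corpus sentences (invented for illustration).
ANNOTATED = [
    ("Monthly weather summary.", Source.UNCHANGING),
    ("Total rainfall was 12 mm.", Source.DIRECT),
    ("That is 28 mm below the average.", Source.COMPUTABLE),
    ("Farmers were worried about the dry spell.", Source.UNAVAILABLE),
]

def coverage(annotated):
    """Fraction of corpus sentences that falls into each category."""
    counts = Counter(label for _, label in annotated)
    total = len(annotated)
    if total == 0:
        return {source: 0.0 for source in Source}
    return {source: counts[source] / total for source in Source}

def unavailable_sentences(annotated):
    """Sentences the system cannot generate from its input data."""
    return [text for text, label in annotated if label is Source.UNAVAILABLE]

for source, share in coverage(ANNOTATED).items():
    print(f"{source.value}: {share:.0%}")
print(unavailable_sentences(ANNOTATED))
```

The list of unavailable sentences is the part to discuss with users or domain experts, since that content has to be dropped or the input data extended.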


Metrics

Seller: massimilianogarzoni (via Stuvia)