This document contains notes and summaries covering the content of the course Natural Language Generation within the Artificial Intelligence Master at Utrecht University.
Natural Language Generation course notes, March 2022
Lecture 1: Introduction
What’s NLG
• NLG systems are computer algorithms/systems which produce texts in
English or other human languages
• Input is data (raw or analyzed)
⁃ The input can itself be text, but NLG usually does not include machine translation (MT)
• Output is text:
⁃ sentences, reports, explanations, etc.
• Two aims:
⁃ Understanding language production (Theoretical NLG)
⁃ Building practically useful systems (Practical NLG)
Language technology
• From data to meaning: speech -> speech recognition -> NLU -> meaning
• From meaning to data: meaning -> NLG -> text -> speech synthesis ->
speech
Ex. 1: Weather forecast
• Input: numerical weather predictions
⁃ From supercomputer running a numerical weather simulation
• Output: textual weather forecast
⁃ Users often prefer some NLG texts over human texts
⁃ More consistent, better word choice
Ex. 2: Road maintenance
• Forecasts for gritting and other winter road maintenance procedures
• Input is 15 parameters over space and time
⁃ Temperature, wind speed, rain, etc
⁃ Over thousands of points on a grid
⁃ Over 24 hours (20-min interval)
• Generated text for each of these
• Issues:
⁃ Weather terms can be context dependent
⁃ Light rain in Ireland vs light rain in the Sahara
⁃ Aggregating over a huge set of locations
⁃ Being brief yet truthful and informative
⁃ The risk of false negatives
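A minimal sketch of the first issue (context-dependent weather terms), assuming a term is chosen relative to what is normal for a location. The climate names, the typical rainfall values and the ratio thresholds are illustrative assumptions, not values from the course. The last lines only work out how many input values the stated setup yields per grid point.

```python
# Hypothetical "typical daily rainfall" per climate, in mm (illustrative values only).
TYPICAL_DAILY_RAIN_MM = {"wet_climate": 4.0, "desert_climate": 0.1}

def describe_rain(amount_mm: float, climate: str) -> str:
    """Pick a rain term relative to what is normal for the given climate."""
    if amount_mm <= 0:
        return "no rain"
    ratio = amount_mm / TYPICAL_DAILY_RAIN_MM[climate]
    if ratio < 0.5:  # assumed threshold
        return "light rain"
    if ratio < 2.0:  # assumed threshold
        return "moderate rain"
    return "heavy rain"

# The same amount of rain gets a different term in a different context.
print(describe_rain(1.0, "wet_climate"))
print(describe_rain(1.0, "desert_climate"))

# Size of the input per grid point: 15 parameters, 24 hours at 20-minute intervals.
time_steps = 24 * 60 // 20
values_per_point = 15 * time_steps
print(time_steps, values_per_point)
```

With thousands of grid points, this is why the system has to aggregate heavily before it can say anything brief.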
Ex. 3: BabyTalk
• Goal: summarize clinical data about premature babies in neonatal ICU
• Input: sensor data (blood pressure, heart rate); records of actions/
observations by medical staff
• Output: multi-paragraph text summaries
⁃ BT45: 45 mins data, for doctors
⁃ BT-Nurse: 12 hrs data, for nurses
⁃ BT-Family: 24 hrs data, for parents
• Issues here:
⁃ How to decide on evaluative terms like “stable”
⁃ How to avoid omitting clinically relevant info
⁃ How to generate a coherent narrative
⁃ How to be clear about the timeline
Ex. 4: ScubaText system
• Demo system for scuba divers
• Input is dive computer data
⁃ Depth-time profile of scuba dive
• Output is feedback to diver
⁃ Mistakes, what to do better next time
⁃ Encouragement of things done well
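A rough sketch of how a depth-time profile could be turned into feedback. This is not the actual ScubaText method: the function names, the sample profile and the ascent-rate limit are all assumptions made for illustration. The idea is only to show the step from raw data (time, depth) to a detected event and then to a sentence of criticism or encouragement.

```python
from typing import List, Tuple

# Assumed illustrative limit in metres per minute; not taken from the notes.
MAX_ASCENT_RATE_M_PER_MIN = 10.0

def fast_ascents(profile: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Return (start_minute, rate) for each segment that ascends faster than the limit.

    profile is a list of (time_in_minutes, depth_in_metres) samples in time order.
    """
    flagged = []
    for (t0, d0), (t1, d1) in zip(profile, profile[1:]):
        rate = (d0 - d1) / (t1 - t0)  # positive when the diver moves upward
        if rate > MAX_ASCENT_RATE_M_PER_MIN:
            flagged.append((t0, rate))
    return flagged

def feedback(profile: List[Tuple[float, float]]) -> str:
    """Produce one sentence: a mistake to fix, or encouragement if none was found."""
    problems = fast_ascents(profile)
    if not problems:
        return "Well done: your ascent stayed within the assumed rate limit."
    start, rate = problems[0]
    return (f"Around minute {start:g} you ascended at about {rate:.0f} m/min. "
            "Next time, try to ascend more slowly.")

# Hypothetical dive: descent to 18 m, a sudden rise to 6 m, then a slow finish.
profile = [(0, 0), (5, 18), (20, 18), (21, 6), (25, 5), (30, 0)]
print(feedback(profile))
```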
Other NLG apps
• Automatic journalism
• Reporting on sports results
• Textual feedback on health
• Agents and dialogue systems
• Financial reporting for companies
• Image labelling
NLG systems’ pipeline
• Data analytics and interpretation:
⁃ Making sense of the data
• Document planning:
⁃ Decide on content and structure of text
⁃ Content selection:
⁃ Of all the things I could inform you about, which should be
chosen?
⁃ Depends on what is important, what is easy to say, what makes
good narrative
⁃ Document structure:
⁃ How should I organize this content as a text?
⁃ What order do I say things in?
⁃ What rhetorical structure?
• Microplanning:
⁃ Decide how to linguistically express text (which words, sentences, etc.
to use; how to identify objects, actions, times)
⁃ Lexical/syntactic choice:
⁃ Which words and linguistic structures to use?
⁃ Aggregation:
⁃ How should information be distributed across sentences and
paragraphs?
⁃ Reference:
⁃ How should the text refer to objects and entities?
• Linguistic Realization:
⁃ Grammatical details:
⁃ Form “legal” English sentences based on decisions made in
previous stages
⁃ Obey sublanguage & genre constraints
⁃ Structure:
⁃ Inserting line breaks
⁃ Form legal HTML, RTF, or whatever output format is desired
⁃ Simple linguistic processing:
⁃ Capitalize first word of sentence
⁃ Subject-verb agreement
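The four stages above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not how any particular NLG system is implemented: the function names, the sample readings, the "largest range is most important" selection rule and the tiny plural lexicon are all hypothetical.

```python
from typing import Dict, List

PLURAL_NOUNS = {"wind gusts"}  # hypothetical mini-lexicon used for agreement

def analyse(readings: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Data analytics: reduce each raw series to a few summary facts."""
    return {name: {"min": min(v), "max": max(v)} for name, v in readings.items()}

def plan_document(facts: Dict[str, Dict[str, float]], max_messages: int = 2) -> List[str]:
    """Document planning: select content (largest range first) and fix its order."""
    ranked = sorted(facts, key=lambda n: facts[n]["max"] - facts[n]["min"], reverse=True)
    return ranked[:max_messages]

def microplan(selected: List[str], facts: Dict[str, Dict[str, float]]) -> List[dict]:
    """Microplanning: choose words for each message; the list is later aggregated."""
    clauses = []
    for name in selected:
        lo, hi = facts[name]["min"], facts[name]["max"]
        clauses.append({"subject": name, "plural": name in PLURAL_NOUNS,
                        "verb": "range", "complement": f"from {lo:g} to {hi:g}"})
    return clauses

def realise(clauses: List[dict]) -> str:
    """Realization: subject-verb agreement, aggregation with 'and', capitalization."""
    parts = []
    for c in clauses:
        verb = c["verb"] if c["plural"] else c["verb"] + "s"
        parts.append(f"{c['subject']} {verb} {c['complement']}")
    if not parts:
        return ""
    sentence = " and ".join(parts)
    return sentence[0].upper() + sentence[1:] + "."

def generate(readings: Dict[str, List[float]]) -> str:
    facts = analyse(readings)
    return realise(microplan(plan_document(facts), facts))

readings = {"temperature": [2.0, 5.0, 9.0],
            "wind gusts": [10.0, 12.0, 11.0],
            "humidity": [80.0, 80.0, 81.0]}
print(generate(readings))
```

Humidity is dropped at the document planning stage because it varies least, which mirrors the content selection question "of all the things I could inform you about, which should be chosen?".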
Multimodal NLG
• Sometimes output is speech (i.e. spoken)
• Text may be combined with visualizations
⁃ Produce separately, or
⁃ Tight integration
⁃ Text refers to graphic, or graphs have text annotations
• Combined methods are often preferred
Building NLG systems
• Need knowledge of language and the application:
⁃ Where does it come from?
⁃ Imitate a corpus of human-written texts
⁃ Manually examine
⁃ Use learning if corpus is large enough
⁃ Ask domain experts
⁃ Experts bad at explaining what they do
⁃ Better at critiquing what system does
⁃ Experiments with users
⁃ Very nice in principle, but a lot of work
• Evaluation of output texts:
⁃ Does system help people?
⁃ Do people like the texts and believe they are useful?
⁃ How do the output texts compare with human-written texts?
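One simple way to compare an output text with a human text is word overlap. The sketch below is an assumption for illustration (the notes do not name a specific metric): it computes an F1 score over shared tokens, with a hypothetical function name and example sentences.

```python
from collections import Counter

def token_overlap_f1(generated: str, reference: str) -> float:
    """F1 over lowercased whitespace tokens shared by the two texts."""
    gen = Counter(generated.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((gen & ref).values())  # tokens present in both, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

score = token_overlap_f1("light rain in the morning", "light rain this morning")
print(round(score, 3))
```

A score like this only measures surface similarity to one human text. It does not answer the first two questions (whether the system helps people and whether they like the texts), which need studies with users.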
NLG vs. NLU
• NLG is about generating/producing rather than understanding language
⁃ Term “Natural Language Processing” (NLP) sometimes denotes NLU,
sometimes all of language technologies
• NLG and NLU are often combined:
⁃ Chatbots, machine translation and automated text summarization