This document contains notes and summaries covering the content of the course Natural Language Generation within the Artificial Intelligence Master at Utrecht University.
Natural Language Generation course notes, March 2022
Lecture 1: Introduction
What’s NLG
• NLG systems are computer algorithms/systems which produce texts in
English or other human languages
• Input is data (raw or analyzed)
⁃ Often text; NLG usually does not include machine translation (MT)
• Output is text:
⁃ sentences, reports, explanations, etc.
• Two aims:
⁃ Understanding language production (Theoretical NLG)
⁃ Building practically useful systems (Practical NLG)
Language technology
• From data to meaning: speech → speech recognition → NLU → meaning
• From meaning to data: meaning → NLG → text → speech synthesis →
speech
Ex. 1: Weather forecast
• Input: numerical weather predictions
⁃ From supercomputer running a numerical weather simulation
• Output: textual weather forecast
⁃ Users often prefer some NLG texts over human texts
⁃ More consistent, better word choice
Ex. 2: Road maintenance
• Forecasts for gritting and other winter road maintenance procedures
• Input is 15 parameters over space and time
⁃ Temperature, wind speed, rain, etc.
⁃ Over thousands of points on a grid
⁃ Over 24 hours (20-min interval)
• Generated text for each of these
• Issues:
⁃ Weather terms can be context dependent
⁃ Light rain in Ireland vs light rain in the Sahara
⁃ Aggregating over a huge set of locations
⁃ Being brief yet truthful and informative
⁃ The risk of false negatives
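The context-dependence issue above can be illustrated with a minimal sketch. The thresholds, the region table and the function name describe_rain are hypothetical, invented for illustration only; they are not taken from the course or from any real forecasting system.

```python
# Hypothetical upper limits (mm of rain per hour) for calling rain "light".
# The numbers are illustrative assumptions, not meteorological definitions.
LIGHT_RAIN_LIMIT_MM_PER_HOUR = {
    "Ireland": 2.5,
    "Sahara": 0.5,
}


def describe_rain(mm_per_hour: float, region: str) -> str:
    """Choose a rain term whose meaning depends on the region's climate."""
    if mm_per_hour <= 0:
        return "no rain"
    limit = LIGHT_RAIN_LIMIT_MM_PER_HOUR[region]
    return "light rain" if mm_per_hour <= limit else "heavy rain"


# The same measurement gets a different word in a different context.
print(describe_rain(1.0, "Ireland"))
print(describe_rain(1.0, "Sahara"))
```

Under these assumed limits, 1.0 mm per hour is worded as "light rain" for Ireland and as "heavy rain" for the Sahara, which is the kind of lexical choice the system has to get right for each location.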
Ex. 3: BabyTalk
• Goal: summarize clinical data about premature babies in neonatal ICU
• Input: sensor data (blood pressure, heart rate); records of actions/
observations by medical staff
• Output: multi-paragraph text summaries
⁃ BT45: 45 mins data, for doctors
⁃ BT-Nurse: 12 hrs data, for nurses
⁃ BT-Family: 24 hrs data, for parents
• Issues here:
⁃ How to decide on evaluative terms like “stable”
⁃ How to avoid omitting clinically relevant info
⁃ How to generate a coherent narrative
⁃ How to be clear about the timeline
Ex. 4: ScubaText system
• Demo system for scuba divers
• Input is dive computer data
⁃ Depth-time profile of scuba dive
• Output is feedback to diver
⁃ Mistakes, what to do better next time
⁃ Encouragement of things done well
Other NLG apps
• Automatic journalism
• Reporting on sports results
• Textual feedback on health
• Agents and dialogue systems
• Financial reporting for companies
• Image labelling
NLG systems’ pipeline
• Data analytics and interpretation:
⁃ Making sense of the data
• Document planning:
⁃ Decide on content and structure of text
⁃ Content selection:
⁃ Of all the things I could inform you about, which should be
chosen?
⁃ Depends on what is important, what is easy to say, what makes
good narrative
⁃ Document structure:
⁃ How should I organize this content as a text?
⁃ What order do I say things in?
⁃ What rhetorical structure?
• Microplanning:
⁃ Decide how to linguistically express text (which words, sentences, etc.
to use; how to identify objects, actions, times)
⁃ Lexical/syntactic choice:
⁃ Which words and linguistic structures to use?
⁃ Aggregation:
⁃ How should information be distributed across sentences and
paragraphs?
⁃ Reference:
⁃ How should the text refer to objects and entities?
• Linguistic Realization:
⁃ Grammatical details:
⁃ Form “legal” English sentences based on decisions made in
previous stages
⁃ Obey sublanguage & genre constraints
⁃ Structure:
⁃ Inserting line breaks
⁃ Form legal HTML, RTF, or whatever output format is desired
⁃ Simple linguistic processing:
⁃ Capitalize first word of sentence
⁃ Subject-verb agreement
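The stages above can be sketched as a chain of small functions. This is a toy illustration under stated assumptions, not the architecture of any system from the lecture: the Message class, its fields, the importance scores and all function names are hypothetical, and the data analytics stage is skipped by starting from ready-made messages.

```python
from dataclasses import dataclass


@dataclass
class Message:
    subject: str       # entity the message is about
    predicate: str     # verb phrase, written for a singular subject
    importance: float  # hypothetical score: higher means more worth reporting


def plan_document(messages, max_messages=2):
    """Document planning: select the top messages, most important first."""
    ranked = sorted(messages, key=lambda m: m.importance, reverse=True)
    return ranked[:max_messages]


def microplan(messages):
    """Microplanning (aggregation): merge consecutive messages with the same subject."""
    specs = []
    for m in messages:
        if specs and specs[-1][0] == m.subject:
            specs[-1][1].append(m.predicate)
        else:
            specs.append((m.subject, [m.predicate]))
    return specs


def realize(specs):
    """Realization: join predicates, capitalize the first word, add a full stop."""
    sentences = []
    for subject, predicates in specs:
        body = subject + " " + " and ".join(predicates)
        sentences.append(body[0].upper() + body[1:] + ".")
    return " ".join(sentences)


def generate(messages):
    return realize(microplan(plan_document(messages)))


messages = [
    Message("the wind", "will be light", 0.2),
    Message("the temperature", "will drop below zero", 0.9),
    Message("the temperature", "will stay low overnight", 0.7),
]
text = generate(messages)
print(text)  # The temperature will drop below zero and will stay low overnight.
```

Content selection drops the low-importance wind message, aggregation folds the two temperature messages into one sentence, and realization handles capitalization and punctuation. Real systems make each of these decisions with far richer knowledge, for example grammar-based realizers that also handle subject-verb agreement.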
Multimodal NLG
• Sometimes output is speech (i.e. spoken)
• Text may be combined with visualizations
⁃ Produce separately, or
⁃ Tight integration
⁃ Text refers to graphic, or graphs have text annotations
• Combined methods are often preferred
Building NLG systems
• Need knowledge of language and the application:
⁃ Where does it come from?
⁃ Imitate a corpus of human-written texts
⁃ Manually examine
⁃ Use learning if corpus is large enough
⁃ Ask domain experts
⁃ Experts bad at explaining what they do
⁃ Better at critiquing what system does
⁃ Experiments with users
⁃ Very nice in principle, but a lot of work
• Evaluation of output texts:
⁃ Does system help people?
⁃ Do people like the texts and believe they are useful?
⁃ How do the output texts compare with human texts?
NLG vs. NLU
• NLG is about generating/producing rather than understanding language
⁃ Term “Natural Language Processing” (NLP) sometimes denotes NLU,
sometimes all of language technologies
• NLG and NLU are often combined:
⁃ Chatbots, machine translation and automated text summarization