Natural Language Processing - Concise Summary per Lecture


A more concise summary of the course Natural Language Processing (NLP), MSc AI.

Lecture 1
Properties of natural language
● Compositional: the meaning of a sentence is determined by the meanings of its individual words
and the way they are combined/composed
○ The principle of compositionality: the meaning of a whole expression (semantics) is a
function of the meaning of its parts and the way they are put together (syntax); a toy
sketch follows this list
● Arbitrary: there is no inherent or logical relationship between the form of a word or expression and
its meaning
○ There is no inherent reason why the sounds “d”, “o”, “g” arranged in that particular order
should convey the meaning of the four-legged animal
● Creative: one can generate new and meaningful expressions that have not previously
been encountered or explicitly learned → e.g., selfie
● Displaced: we can refer to things that are not directly perceivable or present
○ Also referred to as displacement
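
The principle of compositionality can be made concrete with a small sketch. This is my own toy illustration (not from the lecture): word meanings are simple Python values, and the meaning of "every NOUN VERBs" is computed from those parts plus the rule that combines them.

```python
# Toy illustration (hypothetical, not from the lecture): word meanings are
# simple Python values, and the meaning of "every NOUN VERBs" is computed
# from those parts plus the rule that combines them.
lexicon = {
    "dog":   {"fido", "rex"},      # noun meaning: the set of entities it denotes
    "cat":   {"whiskers"},
    "barks": {"fido", "rex"},      # verb meaning: the set of entities that bark
    "purrs": {"whiskers"},
}

def every(noun: str, verb: str) -> bool:
    """Composition rule for 'every NOUN VERBs': true iff every entity
    denoted by the noun is also in the verb's set."""
    return lexicon[noun] <= lexicon[verb]

print(every("dog", "barks"))  # True:  every dog barks
print(every("dog", "purrs"))  # False: not every dog purrs
```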

What does an NLP system need to know?
● Language consists of many levels of structure
● Humans fluently integrate all of these in producing and understanding language → ideally, so
would a computer!

Definitions
● Morphology: study of words and their parts or smallest meaningful units
○ i.e., prefixes, suffixes, and base words
● Parts of speech: word classes or grammatical categories like noun, verb, adjective, adverb,
pronoun, preposition, conjunction and interjection
● Syntax: rules that govern the arrangement of words and phrases in a sentence, including rules for
word order, word agreement and the formation of phrases (such as noun phrases, verb phrases
and adjective phrases)
● Semantics: meaning of words, phrases, sentences
● Pragmatics/discourse: analysis of extended stretches of language use, such as conversations,
texts, and narratives in their social/cultural contexts (several of these levels are illustrated in the
sketch below)
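
A minimal sketch of how several of these levels show up in practice, using spaCy as an example toolkit. This is an assumption on my part (the lecture does not prescribe a library), and the `en_core_web_sm` model must be downloaded separately.

```python
# Sketch of the levels of analysis on one sentence using spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The dogs chased a ball in the park.")

for token in doc:
    print(
        f"{token.text:8}"
        f" lemma={token.lemma_:8}"   # morphology: base form of the word
        f" pos={token.pos_:6}"       # part of speech (noun, verb, ...)
        f" dep={token.dep_:10}"      # syntax: dependency relation to its head
        f" head={token.head.text}"
    )

# Syntax at the phrase level: noun phrases found by the parser
print([chunk.text for chunk in doc.noun_chunks])
```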

NLP is concerned with giving computers the ability to understand text and spoken words in much the
same way as humans can
● Combines computational linguistics with statistical, Machine Learning and Deep Learning models
● Represent language in a way that a computer understands it → representing input
● Process language in a way that is useful for humans → generating output
● Understanding language structure and language use → computational modelling

Why is NLP hard?
1. Ambiguity at many levels
○ Word senses: bank (noun: place to deposit money, verb: to bounce off of something)
○ Parts of speech: chair (noun: seat or person in charge of an organization, verb: act as a
chairperson)
○ Syntactic structure: I saw a man with a telescope (who had the telescope?)
○ Quantifier scope: every child loves some movie
○ Multiple: I saw her duck (saw as in the past tense of see, or as in the cutting tool)
➢ To resolve ambiguity:
■ Non-probabilistic models → return all possible analyses


■ Probabilistic models → return the best possible analysis: only good if the probabilities
are accurate
2. Sparse data due to Zipf’s law
○ Word counts/frequencies in a large corpus are very unevenly distributed
○ The rank-frequency distribution is an inverse relation → to see what's really going on, plot
it on logarithmic axes (see the sketch at the end of this list)




○ Regardless of how large our corpus is, there will be a lot of infrequent (and
zero-frequency) words
3. Variation
○ There are many different languages, dialects, accents
○ People use slang
4. Expressivity
○ Not only can one form have different meanings (ambiguity), but the same meaning can
be expressed with different forms
5. Context dependence
○ Correct interpretation is context dependent
○ Depends on groundedness
6. Unknown representation
○ Difficulty in representing the meaning and structure of language in a way that it can be
understood by machines
7. Natural language is not only written but often spoken and grounded (i.e., our language
understanding is influenced by our sensory and motor experiences in the physical world)
○ We use emojis, we have sign language/dialects
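
A small sketch of point 2 (Zipf's law and sparsity): count word frequencies in a plain-text corpus and inspect the rank-frequency relation on logarithmic axes. The file name `corpus.txt` is a placeholder for whatever large text file you have locally.

```python
# Sketch: Zipf's law and data sparsity on a plain-text corpus.
# "corpus.txt" is a placeholder for any large text file.
from collections import Counter
import math

with open("corpus.txt", encoding="utf-8") as f:
    words = f.read().lower().split()

counts = Counter(words)
ranked = counts.most_common()          # [(word, freq), ...] by descending frequency

# Rank-frequency relation: on log-log axes this is roughly a straight line.
for rank, (word, freq) in enumerate(ranked[:10], start=1):
    print(f"rank={rank:2}  freq={freq:6}  log_rank={math.log(rank):.2f}  "
          f"log_freq={math.log(freq):.2f}  word={word}")

# Sparsity: no matter how large the corpus, most word types are rare.
singletons = sum(1 for f in counts.values() if f == 1)
print(f"{singletons / len(counts):.1%} of word types occur exactly once")
```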

Lecture 2
Human (natural) language
● Language can be manipulated to say infinite things (recursion)
● But the brain is finite → so there must be some finite set of rules
● We can manipulate these rules to say anything (e.g., things that don't exist or that are totally
abstract)
● There’s a critical period for acquiring language → children need to receive real input
● Language is interconnected with other cognitive abilities

There is structure underlying language
● This is separate from the words we use and the things we say → e.g., we can come up with a
plural for non-existent words (wug → wugs)
● Structure dictates how we use language
● We implicitly know rules about structure → a community of speakers (e.g., Dutch people) share a
roughly consistent set of these implicit rules
○ All the utterances we can generate from these rules are grammatical
○ But, people don’t fully agree as they have their own idiolect → grammaticality is graded




Subject, Verb, and Object appear in SVO order
● Subject pronouns → I, she, he, they
● Object pronouns → me, her, him, them
● Sentences can be grammatical without meaning
● Not everyone is equally strict about some wh- constraints

Why do we even need rules?
● Grammaticality rules accept useless utterances
○ And block out perfectly communicative ones (e.g., me cupcake at)
● A basic fact about language is that we can say anything
○ If we ignore the rules because we know what was probably intended, we are actually
limiting possibilities!
○ Rules give us expressivity

Before self-supervised learning, the way to approach NLP was to understand the human
language system and try to imitate it
● Now we have language models (LMs) like GPT that catch on to a lot of language patterns →
extract linguistic information
● LLMs often don’t care about word order
● LMs are not engineered around discrete linguistic rules but the pre-training process is not just a
bunch of surface-level memorization → there’s syntactic knowledge but this is complicated
● But there is no ground truth for how language works

Meaning plays a role in linguistic structure
● There’s a lot of rich information in words that affect the final structure of language
● The rich semantics of words is always playing a role in forming and applying the rules of
language

How we train our models these days (pipeline figure from the slides not reproduced here; a minimal sketch follows):
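
Since the figure is not reproduced, the following is only a sketch under the assumption that "how we train our models these days" refers to self-supervised pre-training, i.e. training a neural LM to predict the next token. The tiny GRU model and toy corpus are purely illustrative, not the architecture used in practice.

```python
# Minimal sketch of self-supervised pre-training as next-token prediction
# (an assumption about what the missing figure showed; toy model and data).
import torch
import torch.nn as nn

text = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = {w: i for i, w in enumerate(sorted(set(text)))}
ids = torch.tensor([vocab[w] for w in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.emb(x))
        return self.out(h)             # next-token logits at every position

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Self-supervision: the targets are just the input shifted by one token.
inputs, targets = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)
for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits.view(-1, len(vocab)), targets.view(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
print(f"final training loss: {loss.item():.3f}")
```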




Differential object marking
● Structurally, anything can be an object, but many languages have a special syntactic way of
dealing with this
● LMs are also aware of these gradations

Outlandish meanings are not impossible to express but not all structure-word combinations are possible
● Marking less plausible things more prominently is a pervasive feature of grammar




Meaning is not always compositional (in the strict sense)
● Language is full of idioms and metaphors/wisdoms
● We’re constantly using constructions that we couldn’t get from just a syntactic + semantic parse
● And even mixed constructions that can compositionally take arguments (e.g., the bigger, the better;
he won't X, let alone Y)
● Construction grammar provides unique insight into neural LMs
● The meaning of words is sensitive and influenced by context
● Fine-grained lexical semantics in LMs

A big question in NLP: how to strike the balance?
● Language is characterized by the fact that it’s an amazingly abstract system → and we want our
models to capture that
● But meaning is so rich and multifaceted → high-dimensional spaces are much better at capturing
these specificities and subtleties than any rules
● Unsolved question: where do deep learning models stand?
● “While language is full of both broad generalizations and item-specific properties, linguists have
been dazzled by the quest for general patterns”

Lecture 3
Corpora in NLP
● Definition: body of utterances, as words or sentences, assumed to be representative and used for
lexical, grammatical or other linguistic analysis
● Often also include metadata: side information about where the language comes from, such as author,
date, topic, publication
● Corpora with linguistic annotations → humans marked categories or structures describing their
syntax or meaning

Sentiment analysis: predict sentiment in text → positive/negative/neutral etc.
● Is hard because sentiment is a measure of a person’s private state, which is unobservable
● Sometimes words (e.g., amazing) are an indicator of sentiment, but many times it requires deep
world + contextual knowledge
● Text classification: a mapping h from input data x (drawn from instance space X) to a label (or
labels) y from some enumerable output space Y (see the sketch below)
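
A minimal sketch of such a mapping h for sentiment, using a bag-of-words classifier from scikit-learn. The choice of library is my own assumption, and the toy reviews and labels are made up.

```python
# Sketch of text classification h: X -> Y for sentiment (toy, made-up data).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = ["an amazing, heartfelt film", "utterly boring and too long",
               "I loved every minute", "a complete waste of time"]
train_labels = ["pos", "neg", "pos", "neg"]     # gold labels y from output space Y

h = make_pipeline(CountVectorizer(), LogisticRegression())
h.fit(train_texts, train_labels)                # learn the mapping h: X -> Y

# A mix of positive and negative cues; the prediction depends on learned weights,
# illustrating why single-word indicators are often not enough.
print(h.predict(["what an amazing waste of time"]))
```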

Why is data-driven evaluation important?
● Good science requires controlled experimentation
● Good engineering requires benchmarks
● Your intuitions about typical inputs are probably wrong
● We also need text corpora to help our systems work well

Annotations
● Supervised learning
● To evaluate and compare sentiment analyzers, we need reviews with gold labels (+ or -)
● These can be either
○ Derived automatically from the original data artifact (metadata such as star ratings)
○ Added by human annotator
■ Issue to consider: how consistent are they?
■ Inter-annotator agreement (IAA) → a small sketch follows below
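
A small sketch of checking annotator consistency with Cohen's kappa via scikit-learn. Kappa is one common agreement measure (the lecture notes do not name a specific one), and the toy annotations are made up.

```python
# Sketch: inter-annotator agreement on toy sentiment labels.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "neg", "pos", "neu", "neg", "pos"]
annotator_b = ["pos", "neg", "neu", "neu", "neg", "neg"]

# Raw agreement ignores chance; kappa corrects for agreement expected by chance.
raw = sum(a == b for a, b in zip(annotator_a, annotator_b)) / len(annotator_a)
kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"raw agreement = {raw:.2f}, Cohen's kappa = {kappa:.2f}")
```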


