Natural Language Processing - Mandatory Readings Summary

Institution: Vrije Universiteit Amsterdam (VU)

A summary of all the mandatory readings per lecture for the course Natural Language Processing, MSc AI.

Uploaded on 30 December 2024
Number of pages: 23
Written in 2022/2023
Type: Summary
Seller: tararoopram

Lectures 1 and 2
Jacob Eisenstein, Natural Language Processing, Introduction (p. 1-9)
Natural language processing is the set of methods for making human language accessible to
computers. In computational linguistics, language is the object of study, whereas natural
language processing is focused on the design and analysis of computational algorithms and
representations for processing natural human language. The goal of natural language
processing is to provide new computational capabilities around human language. Contemporary
approaches to natural language processing rely heavily on machine learning, which makes it
possible to build complex computer programs from examples. Much of today’s natural language
processing research can be thought of as applied machine learning. However, natural language
processing has characteristics that distinguish it from many of machine learning’s other
application domains.

Text data is fundamentally discrete, with meaning created by combinatorial arrangements of
symbolic units. Although the set of words is discrete, new words are always being created and
the distribution over words (and other linguistic elements) resembles that of a power law (Zipf).
A consequence is that natural language processing algorithms must be especially robust to
observations that do not occur in the training data. Language is compositional: units such as
words can combine to create phrases, which can combine by the very same principles to create
larger phrases.
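To make the robustness point concrete, here is a minimal sketch (not from Eisenstein) that ranks words by frequency, which is the quantity a Zipf plot puts on its axes, and measures the share of test tokens never seen in training. The function names `rank_frequency` and `oov_rate` and the toy sentences are invented for illustration; a corpus this small cannot show an actual power law.

```python
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent first (ties broken alphabetically)."""
    counts = Counter(tokens)
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ordered, start=1)]


def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocab = set(train_tokens)
    unseen = sum(1 for token in test_tokens if token not in vocab)
    return unseen / len(test_tokens)


# Toy data, invented for illustration.
train = "the cat sat on the mat and the dog sat on the rug".split()
test = "the bird sat on the windowsill".split()

print(rank_frequency(train)[:3])  # [(1, 'the', 4), (2, 'on', 2), (3, 'sat', 2)]
print(oov_rate(train, test))      # 2 of the 6 test tokens are unseen
```

On a realistically large corpus, Zipf's law predicts that a word's frequency is roughly inversely proportional to its rank, so a long tail of rare words keeps producing unseen tokens at test time.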

The goal of artificial intelligence is to build software and robots with the same range of abilities
as humans. Natural language processing is relevant to this goal in several ways. First, the capacity
for language is one of the central features of human intelligence, and is therefore a prerequisite
for artificial intelligence. Second, much of artificial intelligence research is dedicated to the
development of systems that can reason from premises to a conclusion, but such algorithms are
only as good as what they know. Natural language processing is a potential solution to this
“knowledge bottleneck”, by acquiring knowledge from texts, and perhaps also from
conversations. Conversely, natural language understanding cannot be achieved in isolation from
knowledge and reasoning. Yet the history of artificial intelligence has been one of increasing
specialization.

Computer science is also relevant to natural language processing. Large datasets of unlabeled
text can be processed more quickly by parallelization techniques. Natural language is often
communicated in spoken form, and speech recognition is the task of converting an audio signal
to text. From one perspective, this is a signal processing problem, which might be viewed as a
preprocessing step before natural language processing can be applied. However, context plays
a critical role in speech recognition by human listeners. For this reason, speech recognition is
often integrated with text analysis, particularly with statistical language models, which quantify
the probability of a sequence of text.
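A statistical language model scores a sequence with the chain rule. The sketch below assumes the simplest common choice, a bigram model with maximum-likelihood estimates and sentence-boundary markers `<s>` and `</s>`, trained on the toy "I am Sam" sentences used in SLP chapter 3. The helper names are hypothetical, and real systems add smoothing so that unseen bigrams do not receive probability zero.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Return a function giving maximum-likelihood bigram probabilities P(word | prev)."""
    history_counts, bigram_counts = Counter(), Counter()
    for sentence in sentences:
        padded = ["<s>"] + sentence + ["</s>"]
        history_counts.update(padded[:-1])                  # each token as a left context
        bigram_counts.update(zip(padded[:-1], padded[1:]))  # adjacent (prev, word) pairs

    def prob(prev, word):
        if history_counts[prev] == 0:
            return 0.0  # unseen history; a smoothed model would back off instead
        return bigram_counts[(prev, word)] / history_counts[prev]

    return prob


def sequence_probability(prob, sentence):
    """Chain rule under the bigram (first-order Markov) approximation."""
    padded = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for prev, word in zip(padded[:-1], padded[1:]):
        p *= prob(prev, word)
    return p


corpus = [s.split() for s in ["I am Sam", "Sam I am", "I do not like green eggs and ham"]]
prob = train_bigram_lm(corpus)
print(prob("<s>", "I"))                                # 2/3, about 0.667
print(sequence_probability(prob, "I am Sam".split()))  # (2/3)(2/3)(1/2)(1/2) = 1/9, about 0.111
```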

Natural language processing raises some particularly salient issues around ethics, fairness, and
accountability.
● Access. Who is natural language processing designed to serve?
● Bias. Does language technology learn to replicate social biases from text corpora, and
does it reinforce these biases as seemingly objective computational conclusions?
● Labor. Whose text and speech comprise the datasets that power natural language
processing, and who performs the annotations? Are the benefits of this technology
shared with all the people whose work makes it possible?
● Privacy and internet freedom. What is the impact of large-scale text processing on the
right to free and private communication? What is the potential role of natural language
processing in regimes of censorship or surveillance?

A recurring topic of debate is the relative importance of machine learning and linguistic
knowledge. On one extreme, advocates of “natural language processing from scratch” propose
to use machine learning to train end-to-end systems that transmute raw text into any desired
output structure: e.g., a summary, database, or translation. On the other extreme, the core work
of natural language processing is sometimes taken to be transforming text into a stack of
general-purpose linguistic structures: from subword units called morphemes, to word-level
parts-of-speech, to tree-structured representations of grammar, and beyond, to logic-based
representations of meaning.

Many natural language processing problems can be written mathematically in the form of
optimization:

$$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta)$$

where x is the input, which is an element of a set X; y is the output, which is an element of a
set Y(x); Ψ is a scoring function (also called the model), which maps from the set X × Y to the
real numbers; θ is a vector of parameters for Ψ; and ŷ is the predicted output, which is chosen to
maximize the scoring function.

Because the outputs are usually discrete in language processing problems, search often relies
on the machinery of combinatorial optimization. Because the parameters are usually
continuous, learning algorithms generally rely on numerical optimization to identify vectors of
real-valued parameters that optimize some function of the model and the labeled data.
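The argmax formulation can be sketched in Python under two assumptions not stated above: Ψ is linear in θ (a dot product between θ and a feature vector f(x, y)), and Y(x) is small enough to enumerate exhaustively. The feature function, labels, and weights below are hypothetical; in practice θ is learned by numerical optimization, and Y(x) is often far too large for brute-force search, which is where combinatorial optimization comes in.

```python
from typing import Callable, Dict, Iterable

Features = Dict[str, float]


def score(theta: Features, feats: Features) -> float:
    """Linear scoring function Psi(x, y; theta) = theta . f(x, y)."""
    return sum(theta.get(name, 0.0) * value for name, value in feats.items())


def predict(x: str, candidates: Iterable[str],
            feature_fn: Callable[[str, str], Features],
            theta: Features) -> str:
    """y_hat = argmax over y in Y(x) of Psi(x, y; theta), by exhaustive search."""
    return max(candidates, key=lambda y: score(theta, feature_fn(x, y)))


def word_label_features(x: str, y: str) -> Features:
    """Hypothetical feature function: counts of (lowercased word, label) pairs."""
    feats: Features = {}
    for word in x.lower().split():
        key = f"{word}|{y}"
        feats[key] = feats.get(key, 0.0) + 1.0
    return feats


# Hypothetical hand-set weights; a real system would learn theta from labeled data.
theta = {"great|POS": 2.0, "boring|NEG": 1.5, "not|NEG": 1.0, "plot|NEG": 0.2}
print(predict("A great movie", ["POS", "NEG"], word_label_features, theta))            # POS
print(predict("Boring plot , not great", ["POS", "NEG"], word_label_features, theta))  # NEG
```

In the second example, POS scores 2.0 (from "great") while NEG scores 1.5 + 0.2 + 1.0 = 2.7, so NEG is the argmax.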

Catherine Anderson, Essentials of Linguistics 6.1-6.5 and 7.1-7.5
A word is a free form that has a meaning. Many words are made up of meaningful small units
called morphemes. Some morphemes are free: they can appear in isolation. (This means that
some words are also morphemes.) But some morphemes can only ever appear when they’re
attached to something else; these are called bound morphemes. In English, the most common
bound morphemes are suffixes and prefixes, which can be affixed to words to derive new words,
or can convey grammatical information via inflection. Although English has a very productive
system of derivational morphology, its inflectional morphology is quite sparse, whereas many
other languages, including many Indigenous languages, have rich inflectional morphology.

We can categorize words according to their behaviour, and distinguish categories that are open to
new members (open-class or lexical categories, i.e., content words such as nouns, verbs and
adjectives) from categories that are not (closed-class or functional categories, i.e., function
words such as determiners, pronouns and prepositions). Compounding is a very productive means
of creating new words in English by combining two free morphemes. While most compounds are
endocentric and have a head that determines the meaning and category of the word, the meaning
of an exocentric compound has drifted over time, leaving the compound without a head.

Lecture 3
SLP Chapters 2, 3, 4
ELIZA was an early natural language processing system (essentially a simple chatbot) that could
carry on a limited conversation with a user by imitating the responses of a Rogerian
psychotherapist. It used pattern matching to recognize phrases and translate them into suitable
outputs.
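ELIZA-style pattern matching can be sketched with Python's `re` module. The rules below are modelled on the ELIZA substitutions quoted in SLP chapter 2 (e.g., "I'm depressed" becomes "I AM SORRY TO HEAR YOU ARE DEPRESSED"); the helper name `respond` and the fallback reply are assumptions for illustration, not ELIZA's original script.

```python
import re

# (pattern, response template) pairs, tried in order; the first match wins.
RULES = [
    (r".*\bI'M (depressed|sad)\b.*", r"I AM SORRY TO HEAR YOU ARE \1"),
    (r".*\bI AM (depressed|sad)\b.*", r"WHY DO YOU THINK YOU ARE \1"),
    (r".*\ball\b.*", "IN WHAT WAY"),
    (r".*\balways\b.*", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
]


def respond(utterance: str) -> str:
    """Return the response of the first rule whose pattern matches the utterance."""
    for pattern, template in RULES:
        match = re.match(pattern, utterance, flags=re.IGNORECASE)
        if match:
            # expand() fills \1 with the captured word, e.g. "depressed"
            return match.expand(template).upper()
    return "PLEASE GO ON"  # assumed fallback when no rule fires


print(respond("Men are all alike."))                        # IN WHAT WAY
print(respond("They're always bugging us."))                # CAN YOU THINK OF A SPECIFIC EXAMPLE
print(respond("He says I'm depressed much of the time."))   # I AM SORRY TO HEAR YOU ARE DEPRESSED
```

Rule order matters: a more specific pattern must come before a generic one, or the generic one will always fire first.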

● Regular expressions: specify strings we might want to extract from a document
○ play an important part in text normalization
● Text normalization: converting text to a more convenient, standard form
○ Tokenization: separating out or tokenizing words from running text
■ For processing tweets or texts we’ll need to tokenize emoticons or
hashtags
○ Lemmatization: determining whether two words have the same root, despite their
surface differences
■ E.g., sang, sung, and sings are forms of the verb sing.
■ Sang, sung, and sings are called wordforms; sing is their lemma.
○ Stemming: a simpler version of lemmatization in which we mainly just strip
suffixes from the end of the word
○ Sentence segmentation: breaking up a text into individual sentences, using cues
like periods or exclamation points
● Finally, we’ll need to compare words and other strings → edit distance: measures how
similar two strings are based on the number of edits (insertions, deletions, substitutions)
it takes to change one string into the other. It is an algorithm with applications throughout
language processing, from spelling correction to speech recognition to coreference
resolution.
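Minimum edit distance is usually computed with dynamic programming over a table of prefix distances. The sketch below assumes unit costs for insertions and deletions and a configurable substitution cost: `sub_cost=1` gives the standard Levenshtein distance, while SLP chapter 2 also describes a variant in which a substitution costs 2 (equivalent to one deletion plus one insertion). The function name is hypothetical.

```python
def min_edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Dynamic-programming edit distance with unit insert/delete costs."""
    n, m = len(source), len(target)
    # dist[i][j] = cheapest way to turn source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j characters
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,                               # deletion
                dist[i][j - 1] + 1,                               # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),   # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))              # 5
print(min_edit_distance("intention", "execution", sub_cost=2))  # 8
```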

Regular expressions
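● Concatenation: putting characters in sequence (e.g., /woodchuck/ matches the string woodchuck)
● Regular expressions are case sensitive: /s/ matches a lowercase s but not an uppercase S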
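Concatenation, case sensitivity, and the role of regular expressions in text normalization can be illustrated with Python's `re` module. The tokenizer pattern, sentence splitter, and suffix stripper below are deliberately crude, hypothetical sketches: the splitter wrongly breaks after abbreviations such as "Dr.", and the suffix stripper cannot relate irregular forms like sang to sing, which is what lemmatization is for.

```python
import re

text = "Woodchucks chuck wood. Do woodchucks like #NLP :) ? I think so!"

# Concatenation: /woodchuck/ matches w, o, o, d, c, h, u, c, k in sequence.
# Matching is case sensitive, so the capitalized "Woodchucks" is missed...
print(re.findall(r"woodchuck", text))     # ['woodchuck']
# ...unless a character class allows both cases of the first letter.
print(re.findall(r"[wW]oodchuck", text))  # ['Woodchuck', 'woodchuck']

# Hypothetical tokenizer: hashtags, two emoticons, words, then any other
# non-space symbol as its own token (alternatives are tried left to right).
TOKEN_PATTERN = re.compile(r"#\w+|:\)|:\(|\w+|[^\w\s]")


def tokenize(s):
    return TOKEN_PATTERN.findall(s)


def split_sentences(s):
    """Naive segmentation: split at whitespace that follows '.', '!' or '?'."""
    return re.split(r"(?<=[.!?])\s+", s.strip())


def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer such as Porter's."""
    return re.sub(r"(ing|ed|es|s)$", "", word.lower())


print(tokenize("Do woodchucks like #NLP :) ?"))  # ['Do', 'woodchucks', 'like', '#NLP', ':)', '?']
print(split_sentences(text))
print([crude_stem(w) for w in ["Woodchucks", "sings", "sang"]])  # ['woodchuck', 'sing', 'sang']
```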
