Natural Language Processing - Mandatory Readings Summary

A summary of all the mandatory readings per lecture for the course Natural Language Processing, MSc AI.

December 30, 2024
Academic year 2022/2023

Lecture 1 and 2
Jacob Eisenstein, Natural Language Processing, Introduction (p. 1-9)
Natural language processing is the set of methods for making human language accessible to
computers. In computational linguistics, language is the object of study, whereas natural
language processing is focused on the design and analysis of computational algorithms and
representations for processing natural human language. The goal of natural language
processing is to provide new computational capabilities around human language. Contemporary
approaches to natural language processing rely heavily on machine learning, which makes it
possible to build complex computer programs from examples. Much of today’s natural language
processing research can be thought of as applied machine learning. However, natural language
processing has characteristics that distinguish it from many of machine learning’s other
application domains.

Text data is fundamentally discrete, with meaning created by combinatorial arrangements of
symbolic units. Although the set of words is discrete, new words are always being created and
the distribution over words (and other linguistic elements) resembles that of a power law (Zipf).
A consequence is that natural language processing algorithms must be especially robust to
observations that do not occur in the training data. Language is compositional: units such as
words can combine to create phrases, which can combine by the very same principles to create
larger phrases.
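The rank-frequency relationship behind Zipf's law can be sketched by counting words and sorting them by frequency. In its simplest form (exponent 1), Zipf's law predicts that the r-th most frequent word occurs roughly f(1)/r times. The toy sentence below is a hypothetical example and far too small to exhibit a power law. The sketch only shows the bookkeeping, plus a count of word types that occur exactly once (hapax legomena), the long tail that makes robustness to rare and unseen words necessary.

```python
from collections import Counter

def rank_frequency(text: str) -> list[tuple[int, str, int]]:
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(text.lower().split())
    # most_common() sorts by descending count; ties keep first-seen order
    return [(rank, word, n) for rank, (word, n) in enumerate(counts.most_common(), start=1)]

def zipf_prediction(top_count: int, rank: int) -> float:
    """Zipf's law with exponent 1: f(r) is approximately f(1) / r."""
    return top_count / rank

# Hypothetical toy corpus; real corpora need millions of tokens to show the law
text = "the cat sat on the mat and the dog sat on the log"
table = rank_frequency(text)
top = table[0][2]
for rank, word, n in table[:4]:
    print(f"{rank:>2} {word:<4} observed={n} zipf~{zipf_prediction(top, rank):.2f}")

# Types seen exactly once (hapax legomena)
hapaxes = [w for _, w, n in table if n == 1]
print("types:", len(table), "hapaxes:", len(hapaxes))
```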

The goal of artificial intelligence is to build software and robots with the same range of abilities
as humans. Natural language processing is relevant to this goal in several ways. First, the capacity
for language is one of the central features of human intelligence, and is therefore a prerequisite
for artificial intelligence. Second, much of artificial intelligence research is dedicated to the
development of systems that can reason from premises to a conclusion, but such algorithms are
only as good as what they know. Natural language processing is a potential solution to this
“knowledge bottleneck”: knowledge can be acquired from texts, and perhaps also from
conversations. Natural language understanding cannot be achieved in isolation from knowledge
and reasoning. Yet the history of artificial intelligence has been one of increasing specialization.

Computer science is also relevant to natural language processing. Large datasets of unlabeled
text can be processed more quickly by parallelization techniques.

Natural language is often communicated in spoken form, and speech recognition is the task of
converting an audio signal to text. From one perspective, this is a signal processing problem,
which might be viewed as a preprocessing step before natural language processing can be applied.
However, context plays a critical role in speech recognition by human listeners. For this reason,
speech recognition is often integrated with text analysis, particularly with statistical language
models, which quantify the probability of a sequence of text.
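How a statistical language model quantifies the probability of a sequence can be sketched minimally. The sketch assumes a bigram model with maximum-likelihood estimates, P(w_i | w_{i-1}) = C(w_{i-1} w_i) / C(w_{i-1}), trained on the small "I am Sam" corpus used in SLP Chapter 3, with <s> and </s> marking sentence boundaries. There is no smoothing, so any unseen bigram drives the whole sentence probability to zero.

```python
from collections import Counter

def train_bigram(sentences: list[str]):
    """Count contexts C(w_{i-1}) and bigrams C(w_{i-1} w_i), padding with <s> and </s>."""
    context_counts, bigram_counts = Counter(), Counter()
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts

def sentence_prob(sent: str, context_counts, bigram_counts) -> float:
    """P(sentence) approximated as the product of MLE bigram probabilities."""
    tokens = ["<s>"] + sent.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        if context_counts[prev] == 0:
            return 0.0  # unseen context: the MLE is undefined, so treat it as zero
        prob *= bigram_counts[(prev, word)] / context_counts[prev]
    return prob

corpus = ["I am Sam", "Sam I am", "I do not like green eggs and ham"]
ctx, big = train_bigram(corpus)
print(sentence_prob("I am Sam", ctx, big))  # 1/9, about 0.111 (= 2/3 * 2/3 * 1/2 * 1/2)
print(sentence_prob("Sam am I", ctx, big))  # 0.0, because "Sam am" never occurs
```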




Natural language processing raises some particularly salient issues around ethics, fairness, and
accountability.
● Access. Who is natural language processing designed to serve?
● Bias. Does language technology learn to replicate social biases from text corpora, and
does it reinforce these biases as seemingly objective computational conclusions?
● Labor. Whose text and speech comprise the datasets that power natural language
processing, and who performs the annotations? Are the benefits of this technology
shared with all the people whose work makes it possible?
● Privacy and internet freedom. What is the impact of large-scale text processing on the
right to free and private communication? What is the potential role of natural language
processing in regimes of censorship or surveillance?

A recurring topic of debate is the relative importance of machine learning and linguistic
knowledge. On one extreme, advocates of “natural language processing from scratch” propose
to use machine learning to train end-to-end systems that transmute raw text into any desired
output structure: e.g., a summary, database, or translation. On the other extreme, the core work
of natural language processing is sometimes taken to be transforming text into a stack of
general-purpose linguistic structures: from subword units called morphemes, to word-level
parts-of-speech, to tree-structured representations of grammar, and beyond, to logic-based
representations of meaning.

Many natural language processing problems can be written mathematically in the form of
optimization:

$$\hat{y} = \operatorname*{argmax}_{y \in \mathcal{Y}(x)} \Psi(x, y; \theta)$$

where x is the input, which is an element of a set X; y is the output, which is an element of a
set Y(x); Ψ is a scoring function (also called the model), which maps from the set X × Y to the
real numbers; θ is a vector of parameters for Ψ; and ŷ is the predicted output, which is chosen to
maximize the scoring function.

Because the outputs are usually discrete in language processing problems, search often relies
on the machinery of combinatorial optimization. Because the parameters are usually
continuous, learning algorithms generally rely on numerical optimization to identify vectors of
real-valued parameters that optimize some function of the model and the labeled data.
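The optimization view ŷ = argmax over Y(x) of Ψ(x, y; θ) can be sketched for the simplest case, where Y(x) is small enough to enumerate. The sketch assumes a linear scoring function Ψ(x, y; θ) = θ · f(x, y), an illustrative choice the reading does not prescribe here. The task (hypothetical) is labeling a review as POS or NEG, with bag-of-words features conjoined with the candidate label. The weights θ are hand-set for illustration, not learned.

```python
from collections import Counter

def features(x: str, y: str) -> Counter:
    """f(x, y): each word of x paired with the candidate label y (hypothetical feature design)."""
    return Counter((word, y) for word in x.lower().split())

def score(x: str, y: str, theta: dict) -> float:
    """Linear scoring function Psi(x, y; theta) = theta . f(x, y)."""
    return sum(theta.get(feat, 0.0) * value for feat, value in features(x, y).items())

def predict(x: str, labels: list[str], theta: dict) -> str:
    """y_hat = argmax over Y(x); here Y(x) is a fixed, enumerable label set."""
    return max(labels, key=lambda y: score(x, y, theta))

# Hand-set (not learned) weights, for illustration only
theta = {("great", "POS"): 2.0, ("boring", "NEG"): 1.5, ("not", "NEG"): 0.5}
labels = ["POS", "NEG"]
print(predict("a great film", labels, theta))          # POS
print(predict("boring and not funny", labels, theta))  # NEG
```

Enumeration only works because there are two candidate outputs. For structured outputs such as tag sequences or parse trees, Y(x) grows exponentially with the input, which is why search relies on combinatorial optimization. Learning would set θ by numerical optimization on labeled data instead of by hand.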

Catherine Anderson, Essentials of Linguistics 6.1-6.5 and 7.1-7.5
A word is a free form that has a meaning. Many words are made up of meaningful small units
called morphemes. Some morphemes are free: they can appear in isolation. (This means that
some words are also morphemes.) But some morphemes can only ever appear when they’re
attached to something else; these are called bound morphemes. In English, the most common
bound morphemes are suffixes and prefixes, which can be affixed to words to derive new words,
or can convey grammatical information via inflection. Although English has a very productive
system of derivational morphology, its inflectional morphology is quite sparse, unlike that of
many Indigenous languages, which have rich inflectional systems.


We can categorize words according to their behaviour. Some categories are open to new members
(open-class or lexical categories, i.e., content words such as nouns, verbs, adjectives and
adverbs), and others are not (closed-class or functional categories, i.e., function words such as
determiners, pronouns, prepositions and conjunctions). Compounding is a very productive means of
creating new words in English by combining two free morphemes. Most compounds are endocentric:
they have a head that determines the meaning and category of the whole word (a blackboard is a
kind of board). In exocentric compounds, the meaning of the compound has drifted over time,
leaving the compound without a head (a redhead is not a kind of head).

Lecture 3
SLP Chapters 2, 3, 4
ELIZA was an early natural language processing system (an early chatbot) that could carry
on a limited conversation with a user by imitating the responses of a Rogerian psychotherapist.
It used pattern matching to recognize phrases in the user's input and transform them into
suitable outputs.
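ELIZA-style pattern matching can be sketched with Python's `re` module. The first three rules mirror the substitution examples in SLP Chapter 2 (the "I'm depressed", "all", and "always" rules). The `my ...` rule, the pronoun-swap table `REFLECTIONS`, and the fallback reply are hypothetical additions. The real ELIZA script was far larger and ranked keywords by priority rather than using a fixed rule order.

```python
import re

# First three rules modeled on the ELIZA substitutions in SLP Chapter 2; the "my" rule is hypothetical.
# Checked in order; the first pattern that matches produces the reply.
RULES = [
    (r"\bI'?M (depressed|sad)\b", "I AM SORRY TO HEAR YOU ARE {0}"),
    (r"\ball\b", "IN WHAT WAY"),
    (r"\balways\b", "CAN YOU THINK OF A SPECIFIC EXAMPLE"),
    (r"\bmy (.+?)[.!?]*$", "YOUR {0}"),
]

# Hypothetical first-to-second-person swaps applied to captured text
REFLECTIONS = {"i": "you", "me": "you", "my": "your", "am": "are"}

def reflect(fragment: str) -> str:
    """Swap pronouns word by word so the reply is addressed to the user."""
    return " ".join(REFLECTIONS.get(word.lower(), word) for word in fragment.split())

def respond(utterance: str) -> str:
    """Fill the template of the first matching rule with the reflected capture groups."""
    for pattern, template in RULES:
        match = re.search(pattern, utterance, flags=re.IGNORECASE)
        if match:
            return template.format(*(reflect(g).upper() for g in match.groups()))
    return "PLEASE GO ON"  # hypothetical fallback when no rule matches

print(respond("Men are all alike."))                       # IN WHAT WAY
print(respond("Well my boyfriend made me come here."))     # YOUR BOYFRIEND MADE YOU COME HERE
print(respond("He says I'm depressed much of the time."))  # I AM SORRY TO HEAR YOU ARE DEPRESSED
```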

● Regular expressions: specify strings we might want to extract from a document
○ play an important part in text normalization
● Text normalization: converting text to a more convenient, standard form
○ Tokenization: separating out or tokenizing words from running text
■ For processing tweets or texts we’ll need to tokenize emoticons or
hashtags
○ Lemmatization: determining whether two words have the same root, despite their
surface differences
■ E.g., sang, sung, and sings are forms of the verb sing.
■ sing is the lemma these forms share; sang, sung, and sings are its wordforms
○ Stemming: a simpler version of lemmatization in which we mainly just strip
suffixes from the end of the word
○ Sentence segmentation: breaking up a text into individual sentences, using cues
like periods or exclamation points
● Finally, we need to compare words and other strings. Edit distance measures how
similar two strings are based on the number of edits (insertions, deletions, substitutions)
it takes to change one string into the other. Minimum edit distance is computed with dynamic
programming and has applications throughout language processing, from spelling correction to
speech recognition to coreference resolution.
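The normalization steps and edit distance listed above can be sketched using only Python's `re` module. The tokenizer pattern, the suffix list for the toy stemmer, and the tiny lemma table are hypothetical simplifications; real systems use trained tokenizers, the Porter stemmer, or a morphological lexicon. Edit distance uses the standard dynamic-programming recurrence with insertions and deletions costing 1. The `sub_cost` parameter switches between substitution cost 1 and the cost-2 variant that SLP also discusses.

```python
import re

# Hypothetical tokenizer pattern: hashtags, @-mentions, a few emoticons,
# words with an optional internal apostrophe, then any other single symbol.
TOKEN_RE = re.compile(r"#\w+|@\w+|[:;]-?[()DP]|\w+(?:'\w+)?|[^\w\s]")

def tokenize(text: str) -> list[str]:
    """Tokenization: return the tokens of running text (whitespace is skipped)."""
    return TOKEN_RE.findall(text)

def split_sentences(text: str) -> list[str]:
    """Sentence segmentation: split after . ! or ? when whitespace and a capital follow.
    Naive: it also splits wrongly after abbreviations such as 'Dr.'."""
    return [s for s in re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip()) if s]

# Hypothetical suffix list for a toy stemmer (real systems use e.g. the Porter stemmer)
SUFFIXES = ("ing", "ed", "s")

def stem(word: str) -> str:
    """Stemming: strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

# Hypothetical lemma lexicon; lookup is needed because "sang" has no suffix to strip
LEMMAS = {"sang": "sing", "sung": "sing", "sings": "sing"}

def lemmatize(word: str) -> str:
    """Lemmatization: map a wordform to its lemma, defaulting to the lowercased form."""
    return LEMMAS.get(word.lower(), word.lower())

def edit_distance(source: str, target: str, sub_cost: int = 1) -> int:
    """Minimum edit distance by dynamic programming; insertions and deletions cost 1."""
    n, m = len(source), len(target)
    # d[i][j] holds the distance between source[:i] and target[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i  # delete the first i characters of source
    for j in range(1, m + 1):
        d[0][j] = j  # insert the first j characters of target
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            d[i][j] = min(
                d[i - 1][j] + 1,  # deletion
                d[i][j - 1] + 1,  # insertion
                d[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return d[n][m]

print(tokenize("Loving #NLP :) it's great!"))
print(split_sentences("It works. Does it? Yes!"))
print(stem("walking"), stem("sang"), lemmatize("sang"))
print(edit_distance("intention", "execution"), edit_distance("intention", "execution", sub_cost=2))
```

Comparing `stem("sang")` (unchanged) with `lemmatize("sang")` (`sing`) shows why suffix stripping alone cannot relate irregular forms to their lemma.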

Regular expressions
● Concatenation: putting characters in sequence
● Case sensitivity: regular expressions are case sensitive, so /s/ matches lowercase s but not uppercase S
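Both properties can be illustrated with Python's `re` module, whose basic syntax matches the /pattern/ examples in SLP. The woodchuck sentence is a made-up example in the spirit of SLP's. A concatenated pattern matches only that exact character sequence, and matching is case sensitive unless a character class such as `[wW]` or the `re.IGNORECASE` flag allows both cases.

```python
import re

text = "Woodchucks are rodents; a woodchuck is also called a groundhog."

# Concatenation: the pattern matches w, o, o, d in exactly that order
print(re.findall(r"wood", text))                             # ['wood']
# Case sensitivity: lowercase /wood/ above skipped the capital W in "Woodchucks"
print(re.findall(r"Wood", text))                             # ['Wood']
# A character class (disjunction) accepts either case at one position
print(re.findall(r"[wW]oodchucks?", text))                   # ['Woodchucks', 'woodchuck']
# Or turn off case sensitivity for the whole pattern
print(re.findall(r"woodchuck", text, flags=re.IGNORECASE))   # ['Woodchuck', 'woodchuck']
```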




Seller: tararoopram (Stuvia)
