Natural Language Processing - slides
Lecture 1
Natural Language
● Natural language is quite complex
● Yet, children acquire language easily and can understand and produce utterances
they have never heard, despite limited input (the Poverty of the Stimulus argument)
● How is this possible?
○ Language is innate?
○ Learn through imitation?
○ Learn through interaction?
○ Language just like any cognitive faculty, but with more input?
Language Acquisition
Natural language is:
● Compositional
○ The meaning of a sentence is determined by the meanings of its individual words
and the way they are combined/composed
■ "The cat is on the mat." → the animal referred to as a "cat" is located "on"
top of the object referred to as a "mat."
● Arbitrary
○ There is no inherent or logical relationship between the form of a word or
expression and its meaning
■ "Dog" → refers to the domesticated four-legged animal we commonly
associate with the word "dog," but there is no inherent reason why the
sounds "d", "o", and "g" arranged in that particular order should convey
that meaning
● Creative
○ Ability of speakers of a natural language to generate new and meaningful
expressions that may not have been previously encountered or explicitly learned
■ “Selfie”
● Displaced
○ Ability of speakers to refer to things that are not directly perceivable or present
■ "Yesterday, I went to the store and bought some groceries."
What does an NLP system need to know?
● Language consists of many levels of structure.
● Humans fluently integrate all of these in producing and understanding language.
● Ideally, so would a computer!
● Morphology
○ Study of words and their smallest meaningful parts, or morphemes
■ prefixes, suffixes and base words
● Parts of speech
○ Word classes or grammatical categories such as noun, verb, adjective, adverb,
pronoun, preposition, conjunction, and interjection (see the tagging sketch after
this list)
● Syntax
○ Rules that govern the arrangement of words and phrases in a sentence, including
rules for word order, word agreement (e.g., subject-verb agreement), and the
formation of phrases (such as noun phrases, verb phrases, and adjective
phrases)
● Semantics
○ Meaning of words, phrases, sentences
● Pragmatics/discourse
○ Analysis of extended stretches of language use, such as conversations, texts,
and narratives, in their social and cultural contexts
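
A minimal sketch of two of these levels in practice: tokenizing a sentence and tagging its parts of speech with NLTK. The toolkit choice and resource names are assumptions, not something the slides prescribe.

    import nltk

    # One-time model downloads (resource names can vary across NLTK versions)
    nltk.download("punkt", quiet=True)
    nltk.download("averaged_perceptron_tagger", quiet=True)

    sentence = "The cat is on the mat."
    tokens = nltk.word_tokenize(sentence)  # split the sentence into word tokens
    tagged = nltk.pos_tag(tokens)          # assign a word class to each token

    print(tagged)
    # e.g. [('The', 'DT'), ('cat', 'NN'), ('is', 'VBZ'), ('on', 'IN'),
    #       ('the', 'DT'), ('mat', 'NN'), ('.', '.')]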
What is Natural Language Processing?
● Core technologies:
○ Language modeling / text generation
○ Sequence / POS tagging
○ Syntactic parsing
○ Named Entity Recognition (NER)
○ Coreference resolution
○ Word sense disambiguation
○ Semantic role labeling
● Natural language processing (NLP) refers to the branch of computer science—and more
specifically, the branch of artificial intelligence or AI—concerned with giving computers
the ability to understand text and spoken words in much the same way human beings
can.
● NLP combines computational linguistics—rule-based modeling of human language—with
statistical, machine learning, and deep learning models. Together, these technologies
enable computers to process human language in the form of text or voice data and to
‘understand’ its full meaning, complete with the speaker or writer’s intent and sentiment.
What is Natural Language Processing?
● Represent language in a way that a computer can process it
○ representing input
● Process language in a way that is useful for humans
○ generating output
● Understanding language structure and language use
○ computational modelling
Why is NLP hard?
1. Ambiguity
2. Sparse data due to Zipf’s Law
3. Variation
4. Expressivity
5. Context dependence
6. Unknown representation
Ambiguity at many levels
● Word senses: bank (noun: place where people deposit money or verb: to bounce off of
something)
● Part of speech: chair (noun: seat, person in charge of an organization or verb: act as
chairperson)
● Syntactic structure: I saw a man with a telescope (either I had the telescope or the man
did; see the toy grammar sketch below)
● Quantifier scope: Every child loves some movie (every child loves at least one movie or
every child loves one particular movie)
● Multiple: I saw her duck ("saw" as the past tense of "see" or as cutting with a hand tool;
"duck" as the animal or the act of ducking)
● How can we model ambiguity, and choose the correct analysis in context?
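
To make the attachment ambiguity concrete, here is a sketch using a toy grammar with NLTK's chart parser; the grammar is illustrative and invented for this example, not taken from the slides.

    import nltk

    # Toy grammar in which a PP can attach to either a VP or an NP
    grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    NP  -> Pro | Det N | NP PP
    VP  -> V NP | VP PP
    PP  -> P NP
    Pro -> 'I'
    Det -> 'a'
    N   -> 'man' | 'telescope'
    V   -> 'saw'
    P   -> 'with'
    """)

    parser = nltk.ChartParser(grammar)
    for tree in parser.parse("I saw a man with a telescope".split()):
        print(tree)
    # Prints two trees: one where the PP attaches to the VP (I used the
    # telescope) and one where it attaches to the NP (the man had it)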
What can we do about ambiguity?
● Non-probabilistic methods (FSMs for morphology, CKY parsers for syntax)
○ Return all possible analyses
● Probabilistic models (HMMs for POS tagging, PCFGs for syntax) and algorithms (Viterbi,
probabilistic CKY)
○ Return the single best analysis (see the Viterbi sketch below)
● But the “best” analysis is only good if our probabilities are accurate. Where do they come
from?
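
As an illustration of picking the single best analysis, here is a minimal Viterbi sketch for HMM POS tagging. The two-tag model and all probabilities below are toy values invented for this example; in practice they are estimated from tagged corpora, which is exactly where the accuracy question above bites.

    from math import log

    states = ["NOUN", "VERB"]
    start = {"NOUN": 0.6, "VERB": 0.4}                 # P(first tag)
    trans = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},       # P(tag_i | tag_{i-1})
             "VERB": {"NOUN": 0.8, "VERB": 0.2}}
    emit  = {"NOUN": {"chair": 0.7, "acts": 0.1},      # P(word | tag)
             "VERB": {"chair": 0.2, "acts": 0.9}}

    def viterbi(words):
        # v[t][s]: log-probability of the best tag sequence ending in s at step t
        v = [{s: log(start[s]) + log(emit[s][words[0]]) for s in states}]
        back = []
        for w in words[1:]:
            col, ptr = {}, {}
            for s in states:
                prev = max(states, key=lambda p: v[-1][p] + log(trans[p][s]))
                col[s] = v[-1][prev] + log(trans[prev][s]) + log(emit[s][w])
                ptr[s] = prev
            v.append(col)
            back.append(ptr)
        # Recover the best sequence by following back-pointers from the end
        tag = max(states, key=lambda s: v[-1][s])
        tags = [tag]
        for ptr in reversed(back):
            tag = ptr[tag]
            tags.append(tag)
        return list(reversed(tags))

    print(viterbi(["chair", "acts"]))  # -> ['NOUN', 'VERB']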
Statistical NLP
● Like most other parts of AI, NLP is dominated by statistical methods
○ Typically more robust than earlier rule-based methods
○ Relevant statistics/probabilities are learned from data
○ Normally requires lots of data about any particular phenomenon
Sparse data due to Zipf’s Law
● Word frequencies in large text corpora are highly skewed: a few words are very
frequent, most are rare
○ Takeaway: a word’s frequency is roughly inversely proportional to its rank in the
frequency list
■ To really see what’s going on, plot rank against frequency on logarithmic
axes: the curve is then approximately a straight line
● Assume “word” is a string of letters separated by spaces (a great oversimplification…)
● Zipf’s law
○ Summarizes the behaviour above: the frequency of the r-th most frequent word
is proportional to 1/r
○ Implications
■ Regardless of how large our corpus is, there will be a lot of infrequent
(and zero-frequency) words
■ In fact, the same holds for many other levels of linguistic structure (e.g.,
syntactic rules in a CFG)
■ This means we need to find clever ways to estimate probabilities for
things we have rarely or never seen (a quick empirical check is sketched
below)
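
With the whitespace-separated definition of “word” above, the skew is easy to check empirically. A minimal sketch (the corpus filename is a placeholder assumption):

    from collections import Counter

    with open("corpus.txt", encoding="utf-8") as f:
        counts = Counter(f.read().lower().split())

    for rank, (word, freq) in enumerate(counts.most_common(10), start=1):
        # Under Zipf's law, rank * freq stays roughly constant
        print(f"{rank:>4} {word:<15} {freq:>8}  rank*freq = {rank * freq}")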
Variation