Natural Language Processing - Summary
Tilburg University - School of Humanities and Digital Sciences
Master Course
1 Building blocks of NLP
1.1 Introduction
1.1.1 What is NLP?
NLP is the field concerned with enabling machines to process, understand and
produce natural languages.
Natural language is:
• Conventional
• A set of related systems
• Redundant
• Subject to change
• Context dependent
Natural language is not:
• Formal logic
• A programming language
• Machine language
1.1.2 Why do we need linguistics?
Linguists have developed a rich toolbox to describe and formalize many complex
phenomena in natural languages. These methods can be used to automate
natural language processing, better describe how a system works and evaluate
it sensibly.
1.1.3 Why do we need computer science?
Computer scientists have developed several efficient and scalable algorithms
and formalisms to automate processes. These formalisms and algorithms make it
possible to perform tasks accurately, quickly and efficiently.
1.1.4 Why do we need statistics?
Natural languages are often usefully characterized in terms of probability
distributions over discrete units (words, sequential information, meaning...).
Statistics provides tools to manipulate probability distributions correctly and
to use this information appropriately.
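As a minimal sketch of what a probability distribution over discrete units can look like, the Python snippet below estimates a unigram distribution over words by relative frequency (the maximum-likelihood estimate). The function name and the example sentence are illustrative assumptions; practical systems also apply smoothing so that unseen words do not receive zero probability.

```python
from collections import Counter

def unigram_distribution(tokens):
    """Relative-frequency (maximum-likelihood) estimate of P(word)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

tokens = "the cat sat on the mat".split()
dist = unigram_distribution(tokens)
print(dist["the"])  # 2 of 6 tokens -> 0.3333333333333333
print(dist["cat"])  # 1 of 6 tokens -> 0.16666666666666666
```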
1.1.5 What are the goals of NLP?
• Recognize a language
• Infer its component symbols, their roles, the rules for combining them and
their meaning
• Formalize the rules for combining symbols
• Combine atomic meanings
• Produce complex sentences
• Process large portions of text
• Translate from one language to another
1.1.6 The Turing test
A behaviorist test of verbal intelligence: if the linguistic behavior of a
machine cannot be distinguished from that of a person, the machine is
considered intelligent.
1.1.7 Part of a whole
Language is the most natural way in which people interact. NLP gives efficient
and scalable access to these interactions and allows machines to interact with
people automatically and intuitively. It is a core component of AI.
1.2 Levels of analysis
1.2.1 Phonology
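The discipline concerned with studying linguistic sounds to construct inventories
of sounds with a linguistic role (when do acoustic differences correspond to
linguistic differences?). This is different from phonetics, which studies the
physical production and perception of speech sounds regardless of their role
in a language.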
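One classic way to decide whether an acoustic difference has a linguistic role is the minimal pair test: if replacing a single sound changes the word (pat vs. bat), the two sounds contrast as distinct phonemes. The sketch below searches a small lexicon for minimal pairs; the ASCII transcriptions and function names are hypothetical simplifications, not a standard phonetic alphabet.

```python
from itertools import combinations

def is_minimal_pair(a, b):
    """True if two transcriptions have equal length and differ in exactly one segment."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1

def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for every minimal pair in the lexicon."""
    pairs = []
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if is_minimal_pair(t1, t2):
            # Index of the single segment where the two words differ.
            i = next(k for k, (x, y) in enumerate(zip(t1, t2)) if x != y)
            pairs.append((w1, w2, t1[i], t2[i]))
    return pairs

# Hypothetical broad transcriptions: one tuple element per sound segment.
lexicon = {
    "pat": ("p", "ae", "t"),
    "bat": ("b", "ae", "t"),
    "pit": ("p", "ih", "t"),
    "spat": ("s", "p", "ae", "t"),
}
print(minimal_pairs(lexicon))
# [('pat', 'bat', 'p', 'b'), ('pat', 'pit', 'ae', 'ih')]
```

With a narrow (phonetic) transcription, the "p" in "pat" would be written as aspirated and the "p" in "spat" as unaspirated. English has no minimal pair that separates them, which is why phonology treats them as one phoneme while phonetics still describes the difference.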
1.2.2 Segmentation
The task of splitting text or speech (harder, why?) into atomic symbols (letters,
phonemes, morphemes, words, chunks...). Why is this relevant? Are there
differences across languages?
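A minimal sketch of rule-based word segmentation for English-like text, assuming that words are runs of letters or digits (optionally with an internal apostrophe) and that every punctuation mark is its own token. The regular expression and function name are illustrative assumptions, not a complete tokenizer. The last call hints at why segmentation differs across languages: Chinese is written without spaces between words, so the same rule returns the whole sentence as a single token.

```python
import re

# Word characters with an optional apostrophe part ("don't"),
# or any single character that is neither a word character nor whitespace.
TOKEN_RE = re.compile(r"\w+(?:'\w+)?|[^\w\s]")

def tokenize(text):
    """Split text into word and punctuation tokens using TOKEN_RE."""
    return TOKEN_RE.findall(text)

print(tokenize("I don't like it, do you?"))
# ['I', "don't", 'like', 'it', ',', 'do', 'you', '?']
print(tokenize("猫坐在垫子上"))  # "the cat sits on the mat": no spaces between words
# ['猫坐在垫子上']
```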
1.2.3 Morphology
The study of how words are built up from smaller meaning-bearing units, the
morphemes. Morphological complexity varies cross-linguistically: some languages
have simple morphological systems, others very complex ones.
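To make the idea of morphemes concrete, the sketch below splits a word by greedily stripping affixes from a small, hypothetical list of English prefixes and suffixes. It is deliberately naive: it ignores spelling changes and has no lexicon to check that the remaining stem is a real word.

```python
# Hypothetical, tiny affix inventory; a real morphological analyzer would also
# need a lexicon of stems and spelling rules.
PREFIXES = ["un", "re"]
SUFFIXES = ["ness", "ing", "ed", "s"]

def segment(word):
    """Strip at most one known prefix, then repeatedly strip known suffixes.

    An affix is only removed if at least three characters of stem remain.
    """
    prefix = []
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p) + 2:
            prefix = [p]
            word = word[len(p):]
            break
    suffixes = []
    stripped = True
    while stripped:
        stripped = False
        for s in SUFFIXES:
            if word.endswith(s) and len(word) > len(s) + 2:
                suffixes.insert(0, s)  # keep suffixes in surface order
                word = word[: -len(s)]
                stripped = True
                break
    return prefix + [word] + suffixes

print(segment("unkindness"))  # ['un', 'kind', 'ness']
print(segment("repainted"))   # ['re', 'paint', 'ed']
print(segment("reading"))     # ['re', 'ading']: wrong, "re" is not a prefix here
```

The last example shows the limits of pure affix stripping: without a lexicon the heuristic cannot tell that "reading" is "read" + "ing".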
1.2.4 Syntax
The discipline which studies the set of rules, principles and processes for combining
symbols according to the structure of the language. It determines whether a sentence
is well-formed in a language. Can you give an example of a syntactic rule in
English and find a language with a different rule for the same phenomenon?
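One way to operationalize "determining whether a sentence is well-formed" is a grammar plus a recognizer. The sketch below uses the CYK algorithm over a hypothetical toy context-free grammar in Chomsky Normal Form (every rule rewrites a category either as two categories or as one word). The grammar, category names and function name are assumptions chosen for illustration.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky Normal Form.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}
LEXICON = {
    "the": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "sees": {"V"},
    "chases": {"V"},
    "she": {"NP"},
}

def is_well_formed(words, start="S"):
    """CYK recognizer: True if the grammar derives the word sequence from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i:j+1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, w in enumerate(words):
        table[i][i] = set(LEXICON.get(w, set()))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):  # split point: words[i:k+1] + words[k+1:j+1]
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]

print(is_well_formed("the dog chases the cat".split()))  # True
print(is_well_formed("dog the chases cat the".split()))  # False
```

The rule NP -> Det N encodes that, in this fragment of English, the determiner precedes the noun; reversing that order leaves no analysis for the whole sentence.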
1.2.5 Lexical semantics
The discipline which is concerned with describing the meaning of single symbolic
units (words, morphemes, collocations). It aims to classify and decompose lexical
items, compare lexical semantic structures cross-linguistically, and understand
similarities across items.
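A minimal sketch of decomposing lexical items and comparing them, in the style of componential analysis: each word is represented as a set of hypothetical semantic features, and similarity is the Jaccard overlap between feature sets. The feature inventory is invented for illustration, and a missing feature is read as its negative value (boy lacks ADULT).

```python
# Hypothetical semantic features; real lexical resources use far richer relations.
FEATURES = {
    "man": {"HUMAN", "ADULT", "MALE"},
    "woman": {"HUMAN", "ADULT", "FEMALE"},
    "boy": {"HUMAN", "MALE"},
    "stallion": {"EQUINE", "ADULT", "MALE"},
}

def jaccard(a, b):
    """Size of the feature intersection divided by the size of the union."""
    return len(a & b) / len(a | b)

def most_similar(word):
    """Return the other lexical item with the highest feature overlap."""
    others = (w for w in FEATURES if w != word)
    return max(others, key=lambda w: jaccard(FEATURES[word], FEATURES[w]))

print(jaccard(FEATURES["man"], FEATURES["woman"]))  # 0.5
print(most_similar("man"))  # boy
```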
1.2.6 Compositional semantics
The study of how atomic meanings are combined into larger meaningful units,
such as sentences, paragraphs, and so on. Why can we not simply string atomic
meanings together?
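The sketch below illustrates why stringing atomic meanings together is not enough: "dog bites man" and "man bites dog" contain exactly the same atomic units, yet they mean different things. Following the function-application idea used in formal semantics, the hypothetical lexicon treats a transitive verb as a function that takes the object first and then the subject; the lexicon and function names are assumptions for illustration.

```python
from collections import Counter

# Hypothetical toy lexicon: nouns denote entities; the transitive verb denotes
# a function that takes its object first and then its subject.
LEXICON = {
    "dog": "DOG",
    "man": "MAN",
    "bites": lambda obj: lambda subj: ("BITE", subj, obj),
}

def compose_svo(sentence):
    """Compose a subject-verb-object sentence as verb(object)(subject)."""
    subj, verb, obj = sentence.split()
    return LEXICON[verb](LEXICON[obj])(LEXICON[subj])

a, b = "dog bites man", "man bites dog"
print(Counter(a.split()) == Counter(b.split()))  # True: same atomic units
print(compose_svo(a))  # ('BITE', 'DOG', 'MAN'): the dog does the biting
print(compose_svo(b))  # ('BITE', 'MAN', 'DOG'): the man does the biting
```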
1.2.7 Pragmatics
The analysis of how context influences meaning, encompassing semantics, linguistic
knowledge of participants, situational context, shared knowledge, goals
and intent.
1.2.8 I made her duck: ambiguity everywhere
• Morpho-syntactic: her (dative v. possessive) and duck (noun or verb)
• Semantic: make (cook, create, cause, transform) and duck (bird, action of
avoiding)
• Syntactic: make (transitive v. ditransitive)
• Phonological: I-eye, made-maid
Luckily, there’s redundancy (where? How to leverage it?).
1.3 Methods
1.3.1 Rule systems
Write rules by hand to cover relevant phenomena.
Advantages:
• Robust: you can write a rule for a rare event and it will be applied whenever
that event occurs
• Good way to incorporate intuitions and domain knowledge into a system
• No need for large data sets
Disadvantages:
• Expensive to write
• Require domain knowledge
• Rigid when dealing with ambiguity
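A minimal sketch of a hand-written rule system: a part-of-speech tagger that tries regular-expression rules in order and falls back to a default tag. The rules, tag names and default are hypothetical. The example shows both sides of the trade-offs listed above: a rule fires every time its pattern occurs, but it is rigid, so "family" is tagged as an adverb simply because it ends in "-ly".

```python
import re

# Hypothetical hand-written rules, tried in order; the first match wins.
RULES = [
    (re.compile(r"^(the|a|an)$", re.IGNORECASE), "DET"),
    (re.compile(r"^\d+(\.\d+)?$"), "NUM"),
    (re.compile(r"^[A-Za-z]+ly$"), "ADV"),
    (re.compile(r"^[A-Za-z]+ing$"), "VERB"),
    (re.compile(r"^[A-Za-z]+ed$"), "VERB"),
    (re.compile(r"^[A-Z][a-z]+$"), "PROPN"),
]

def tag(tokens, default="NOUN"):
    """Assign each token the label of the first matching rule, else `default`."""
    tags = []
    for tok in tokens:
        for pattern, label in RULES:
            if pattern.match(tok):
                tags.append((tok, label))
                break
        else:  # no rule matched
            tags.append((tok, default))
    return tags

print(tag("The dog quickly crossed 3 roads".split()))
# [('The', 'DET'), ('dog', 'NOUN'), ('quickly', 'ADV'),
#  ('crossed', 'VERB'), ('3', 'NUM'), ('roads', 'NOUN')]
print(tag(["family"]))  # [('family', 'ADV')]: the suffix rule misfires
```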