Extensive summary of the slides of the course Natural Language Processing, given at Tilburg University School of Humanities and Digital Sciences. It is an optional course for Data Science and Cognitive Sciences and Artificial Intelligence.
Unfortunately, this summary is just a copy-paste version of the lecture slides.
Seller: sophievink
Natural Language Processing - Summary
Tilburg University - School of Humanities and Digital Sciences
Master Course
1 Building blocks of NLP
1.1 Introduction
1.1.1 What is NLP?
NLP is the field concerned with enabling machines to process, understand and
produce natural languages.
Natural language is          Natural language is not
Conventional                 Formal logic
A set of related systems     A programming language
Redundant                    Machine language
Subject to change
Context dependent
1.1.2 Why do we need linguistics?
Linguists have developed a rich toolbox to describe and formalize many complex
phenomena in natural languages. These methods can be used to automate
natural language processing, better describe how a system works and evaluate
it sensibly.
1.1.3 Why do we need computer science?
Computer scientists have developed many efficient and scalable algorithms
and formalisms to automate processes. These formalisms and algorithms can be
used to perform tasks accurately, quickly and in an optimal way.
1.1.4 Why do we need statistics?
Natural languages are often usefully characterized in terms of probability
distributions over discrete units (words, sequential information, meaning...).
Statistics provides tools to manipulate probability distributions correctly and
to use this information appropriately.
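As a minimal sketch of what a probability distribution over discrete units can look like, the snippet below estimates a unigram distribution over words by relative frequency. The toy corpus, the whitespace tokenization and the function name unigram_distribution are assumptions made for illustration; they do not come from the course material.

```python
from collections import Counter

def unigram_distribution(tokens):
    """Estimate P(word) by relative frequency (maximum likelihood)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

# Toy corpus, invented for illustration only.
corpus = "the duck saw the maid and the maid saw the duck".split()
probs = unigram_distribution(corpus)

print(probs["the"])         # "the" accounts for 4 of the 11 tokens
print(sum(probs.values()))  # the probabilities sum to 1 (up to rounding)
```

Relative frequency is only the simplest possible estimator; it assigns zero probability to any word that does not occur in the corpus.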
1.1.5 What are the goals of NLP?
• Recognize a language
• Infer its component symbols, their roles, the rules for combining them and
their meaning
• Formalize the rules for combining symbols
• Combine atomic meanings
• Produce complex sentences
• Process large portions of text
• Translate from one language to another
1.1.6 The Turing test
A behaviorist test of verbal intelligence: when the linguistic behavior of a
machine cannot be distinguished from that of a person, then the machine is
intelligent.
1.1.7 Part of a whole
Language is the most natural way in which people interact. NLP gives efficient
and scalable access to these interactions and allows machines to interact
automatically with people in an intuitive way. It is a core component of AI.
1.2 Levels of analysis
1.2.1 Phonology
The discipline concerned with studying linguistic sounds in order to construct
inventories of sounds with a linguistic role (when do acoustic differences
correspond to linguistic differences?). This is different from phonetics, which
studies the physical properties of speech sounds themselves.
1.2.2 Segmentation
The task of splitting text or speech (harder, why?) into atomic symbols (letters,
phonemes, morphemes, words, chunks...). Why is this relevant? Are there
differences across languages?
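A minimal sketch of word-level segmentation for written English, comparing plain whitespace splitting with a small regular expression that also separates punctuation. The pattern and the function name tokenize are illustrative assumptions, not a method prescribed by the course.

```python
import re

def tokenize(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+(?:'\w+)?  matches a word, optionally with an internal apostrophe ("didn't")
    # [^\w\s]       matches any single character that is neither word nor whitespace
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

sentence = "I made her duck, didn't I?"
print(sentence.split())    # whitespace only: punctuation stays glued to words
print(tokenize(sentence))  # punctuation becomes separate tokens
```

Both approaches assume that word boundaries are marked in writing. That assumption does not hold for continuous speech, nor for writing systems that do not put spaces between words, which is one reason why segmentation differs across languages.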
1.2.3 Morphology
The study of how words are built up from smaller meaning-bearing units, the
morphemes. Morphological complexity varies cross-linguistically: some
languages have simple morphological systems, others have very complex ones.
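To make the idea of splitting a word into morphemes concrete, here is a deliberately naive sketch that strips one suffix from an English word. The suffix list, the minimum stem length and the function name split_suffix are hypothetical choices for illustration; real morphological analysis requires far more than this.

```python
# Hypothetical, deliberately tiny suffix inventory (illustration only).
SUFFIXES = ["ness", "ing", "ed", "s"]

def split_suffix(word):
    """Return (stem, suffix) for the first listed suffix that matches, else (word, "")."""
    for suffix in SUFFIXES:
        # Require a stem of at least 3 characters so that "sing" is not split.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)], suffix
    return word, ""

for w in ["walked", "ducks", "kindness", "sing", "string"]:
    print(w, split_suffix(w))
```

The last word shows the limits of the approach: "string" is wrongly split into "str" + "ing", because a surface match on letters says nothing about whether the pieces actually bear meaning.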
1.2.4 Syntax
The discipline which studies the set of rules, principles and processes for
combining symbols according to the structure of the language. Syntax determines
whether a sentence is well-formed in a language. Can you give an example of a
syntactic rule in English and find a language with a different rule for the
same phenomenon?
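One way to picture a well-formedness check is a recognizer over a small set of combination rules. The sketch below uses the CYK algorithm with a toy grammar in Chomsky normal form; the grammar, the lexicon and the function name is_well_formed are invented for illustration and cover only a handful of English sentences.

```python
from itertools import product

# Toy grammar in Chomsky normal form (assumption, for illustration only).
BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
LEXICON = {"the": {"Det"}, "duck": {"N"}, "maid": {"N"}, "saw": {"V"}}

def is_well_formed(words):
    """CYK recognition: True if the toy grammar derives the word sequence as S."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the categories that can span words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):  # split point between left and right part
                for left, right in product(table[i][k], table[k + 1][j]):
                    parent = BINARY.get((left, right))
                    if parent:
                        table[i][j].add(parent)
    return "S" in table[0][n - 1]

print(is_well_formed("the maid saw the duck".split()))
print(is_well_formed("maid the saw duck the".split()))
```

The first call accepts the sentence and the second rejects it, even though both contain exactly the same words: only the order, and therefore the structure, differs.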
1.2.5 Lexical semantics
The discipline which is concerned with describing the meaning of single symbolic
units (words, morphemes, collocations). It aims to classify and decompose
lexical items, compare lexical semantic structures cross-linguistically, and
understand similarities across items.
1.2.6 Compositional semantics
The study of how atomic meanings are combined into larger meaningful units,
such as sentences, paragraphs, and so on. Why can we not simply string atomic
meanings together and leave it at that?
1.2.7 Pragmatics
The analysis of how context influences meaning, encompassing semantics,
linguistic knowledge of participants, situational context, shared knowledge,
goals and intent.
1.2.8 I made her duck: ambiguity everywhere
• Morpho-syntactic: her (dative v. possessive) and duck (noun or verb)
• Semantic: make (cook, create, cause, transform) and duck (bird, action of
avoiding)
• Syntactic: make (transitive v. ditransitive)
• Phonological: I-eye, made-maid
Luckily, there’s redundancy (where? How to leverage it?).
1.3 Methods
1.3.1 Rule systems
Write rules by hand to cover relevant phenomena.
Advantages:
• Robust: you can write a rule for a rare event and it will be applied anytime
that event occurs
• A good way to incorporate intuitions and domain knowledge into a system
• No need for large datasets
Disadvantages:
• Expensive to write
• Require domain knowledge
• Rigid when dealing with ambiguity
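A minimal sketch of a hand-written rule system, using the ambiguous word "duck" from section 1.2.8. The two rules, the word lists and the function name tag_duck are hypothetical and chosen only to illustrate the trade-offs listed above.

```python
DETERMINERS = {"the", "a", "my", "his"}  # hypothetical word list
VERB_CUES = {"to", "must", "will"}       # hypothetical word list

def tag_duck(tokens):
    """Tag each occurrence of 'duck' using two hand-written context rules."""
    tags = []
    for i, token in enumerate(tokens):
        if token.lower() != "duck":
            continue
        previous = tokens[i - 1].lower() if i > 0 else ""
        if previous in DETERMINERS:
            tags.append((i, "NOUN"))       # Rule 1: determiner + duck
        elif previous in VERB_CUES:
            tags.append((i, "VERB"))       # Rule 2: 'to' or modal + duck
        else:
            tags.append((i, "AMBIGUOUS"))  # no rule applies
    return tags

print(tag_duck("I saw the duck".split()))
print(tag_duck("You must duck".split()))
print(tag_duck("I made her duck".split()))
```

The rules need no training data and fire every time their context occurs, but "I made her duck" falls through both of them: covering it would mean writing more rules by hand, and no single rule can resolve a sentence that is genuinely ambiguous.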