Computational Linguistics (800962-B-6)
Summary Midterm 2021/2022
Made by Saskia Kriege
Lecture 1-7
CLASS 1 – INTRODUCTION
KC – Introduction
Computational Linguistics
= field concerned with using automatic computational methods to analyze and synthesize
natural languages (text, speech, gestures)
Natural Languages
Study of NL is the domain of linguistics, which has developed several tools to formalize and
describe languages.
Used to automate language processing, better describe how a system works and evaluate it
sensibly.
We can build upon tools from linguistics such that our automatic tools can help with NL
Rely on developed tools
Automatic Computational Methods
Several efficient and scalable algorithms and formalisms needed.
Algorithms and formalisms are used to automate language analyses; these algorithms have
typically been developed in computer science
Own algorithms not needed when there are already very efficient ones we can use
Those algorithms are optimised;
Used to perform tasks accurately and quickly, in an optimal way (reduced complexity as
much as possible)
Analyze and Synthesize
NLs are often usefully characterized in terms of probability distributions over discrete units
(words, sequential information, meaning,..)
Statistics provides tools to manipulate probability distributions correctly and use this
information to resolve ambiguity.
Explain NLs in terms of statistics (probability of a word occurring, model sequencing,
represent meaning,..)
- Linguistics → characteristics
- Computer science, complexity → algorithms
- Statistics → tools to manipulate probability distributions, respecting everything that’s
important about those distributions. To deal with inherent ambiguity
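Example (a minimal sketch, not from the lecture): a probability distribution over words can be estimated from relative frequencies. The toy corpus and the function name `unigram_probabilities` are made up for illustration.

```python
from collections import Counter


def unigram_probabilities(tokens):
    """Estimate P(word) as the relative frequency of each word in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Toy corpus, invented for this example (11 tokens).
tokens = "the dog saw the duck and the duck saw the dog".split()
probs = unigram_probabilities(tokens)

# "the" occurs 4 times out of 11 tokens.
print(probs["the"])
# The probabilities of all words sum to 1 (up to floating-point rounding).
print(sum(probs.values()))
```

The same counting idea extends to sequences (pairs of words instead of single words), which is one way to model sequential information.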
What is and isn’t NL?
Is:
- Conventional
- Set of related systems
- Redundant
- Subject to change → fast changing
- Context dependent → same thing in different contexts can be interpreted in very
different ways
Isn’t:
- Formal logic → conventional and set of related systems yes, but not redundant,
changes very slowly, can be context dependent but in principle not
- Programming language → conventional and set of related systems yes, can be
redundant, is subject to change, but not context dependent
- Machine language → interaction with machine is conventional, not redundant, not
subject to change, not context dependent
Those three are not context dependent
Those characteristics make automatic analysis hard
Goals
- Infer the component symbols of a language, roles, rules for combining, meaning of
symbols
- Formalize rules for combining symbols
Learn from linguistics
- Combine atomic meanings
- Understand large portions of text
- Produce complex sentences, needs to be context appropriate
Why should we care?
Language is the most natural way in which we interact.
Emojis are language in a way.
We find it natural to interact via language. It is what makes us human.
CL provides tools to study these interactions and supports the implementation of automatic
tools to interact using language.
KC - Levels of Analysis
Phonology
→Studies linguistic sounds to construct inventories of sounds with linguistic roles
Basic unit is a sound with linguistic value
/l/ and /r/ different in English
Sound can be different but meaning same
Phonetics more focused on variations on sound than linguistic role
Phonology = study of sounds, especially different patterns in different languages. Find
linguistic roles to sounds → phonemes
Phonetics = studies production, transmission, reception of sound. How human sounds are
made → phones
Segmentation
Task of splitting text or speech into symbols (letters, phonemes, morphemes, words,
chunks,..) of appropriate granularity.
Granularity is level of detail
Finding boundaries where one ends and another starts
When is it relevant?
- Context understanding
Differences across language?
- Different boundaries
- Some languages put a lot in one word (person, plural, etc)
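Example (a rough sketch, not from the lecture): the same text segmented at two granularities, words and characters. The regular expression is a simplifying assumption that works for space-separated text like English; languages with different boundary conventions need other methods.

```python
import re


def segment_words(text):
    """Split text into word tokens and single punctuation marks."""
    # \w+ matches a run of letters/digits; [^\w\s] matches one punctuation character.
    return re.findall(r"\w+|[^\w\s]", text)


def segment_characters(word):
    """Split a word into its individual characters (finer granularity)."""
    return list(word)


sentence = "I made her duck."
words = segment_words(sentence)
print(words)                        # ['I', 'made', 'her', 'duck', '.']
print(segment_characters("duck"))   # ['d', 'u', 'c', 'k']
```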
Morphology
How words are built up from smaller meaning-bearing units (morphemes)
Morphological complexity varies cross-linguistically; some languages have simple
morphological systems, others crazy complex ones
In ‘words’, ‘word’ is a morpheme, and so is the ‘-s’, which marks the plural
Morpheme = smallest meaningful unit of language (in, come, -ing → incoming)
How many meaning units you pack into a word varies → in Latin you can pack a lot into one unit
In English, meaning is spread out over multiple words
Analytic morphology = meaning is spread over several words, English
Synthetic morphology = a lot of meaning is packed into one unit, Latin
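Example (a naive sketch, not from the lecture): splitting a word into morphemes by stripping known affixes. The affix lists, the stem list and the "restore a dropped e" rule are all assumptions chosen to cover the two examples in these notes; real morphological analysis needs far richer rules.

```python
# Hypothetical mini-inventories, just enough for the examples in these notes.
PREFIXES = ("in",)
SUFFIXES = ("ing", "s")
KNOWN_STEMS = {"come", "word"}


def split_morphemes(word):
    """Split a word into prefix, stem and suffix using simple affix stripping."""
    parts = []
    stem = word
    # Strip at most one prefix, keeping a stem of at least 3 letters.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= 3:
            parts.append(prefix + "-")
            stem = stem[len(prefix):]
            break
    # Strip at most one suffix, again keeping at least 3 letters.
    suffix_part = None
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= 3:
            suffix_part = "-" + suffix
            stem = stem[: -len(suffix)]
            break
    # Spelling repair: "com" + "ing" comes from "come", so restore the dropped "e".
    if stem not in KNOWN_STEMS and stem + "e" in KNOWN_STEMS:
        stem += "e"
    parts.append(stem)
    if suffix_part:
        parts.append(suffix_part)
    return parts


print(split_morphemes("incoming"))  # ['in-', 'come', '-ing']
print(split_morphemes("words"))     # ['word', '-s']
```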
Syntax
Set of rules, principles and processes for combining symbols according to the structure of
language.
Asserts whether a sentence is well-formed in a language
Deals with structure, not concerned with meaning. Combining units
Structure + rules that make structure good
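Example (a toy sketch, not from the lecture): checking well-formedness from structure alone. The lexicon, the category labels and the single sentence rule (DET NOUN VERB, optionally followed by DET NOUN) are invented for illustration and cover only a tiny fragment of English.

```python
import re

# Hypothetical mini-lexicon mapping words to syntactic categories.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "duck": "NOUN", "idea": "NOUN",
    "sees": "VERB", "eats": "VERB",
}

# Toy rule: a sentence is DET NOUN VERB, optionally followed by DET NOUN.
SENTENCE_PATTERN = re.compile(r"DET NOUN VERB( DET NOUN)?")


def is_well_formed(sentence):
    """Return True if the category sequence of the sentence matches the toy rule."""
    words = sentence.lower().split()
    if any(word not in LEXICON for word in words):
        return False
    tags = " ".join(LEXICON[word] for word in words)
    return SENTENCE_PATTERN.fullmatch(tags) is not None


print(is_well_formed("the dog sees a duck"))    # True
print(is_well_formed("the idea eats the dog"))  # True: odd meaning, fine structure
print(is_well_formed("dog the sees duck a"))    # False
```

The second sentence shows that syntax judges structure only: it is well-formed even though its meaning is strange.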
Lexical Semantics
Describes the meaning of single symbolic units (words, morphemes, collocations)
Aims to classify and decompose lexical items, compare lexical semantic structures cross-
linguistically, understand similarities across items
Compositional Semantics
How atomic meanings are combined into larger meaningful units, such as sentences,
paragraphs, a book, etc.
Meaning of a phrase is determined by combining the meaning of its subphrases, using
rules driven by syntactic structure
How smaller parts are combined to form larger meaningful units
Pragmatics
How context influences meaning, encompassing semantics, linguistic knowledge of
participants, situational context, shared knowledge, goals and intent
I made her duck; ambiguity everywhere
- Morpho-syntactic; her (dative vs possessive) and duck (noun or verb)
- Semantic; make (cook, create, cause, transform) and duck (bird, action of avoiding)
- Syntactic: make (transitive vs ditransitive)
- Phonological; I-eye, made-maid.
Once we commit to a certain interpretation, fewer interpretations remain possible