100% tevredenheidsgarantie Direct beschikbaar na betaling Zowel online als in PDF Je zit nergens aan vast
logo-home
Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1 €4,99   In winkelwagen

Samenvatting

Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1

2 beoordelingen
 233 keer bekeken  10 aankopen

Dit is een samenvatting van het eerste deeltentamen van het vak Taaltheorie en Taalverwerking van de Universiteit van Amsterdam. De samenvatting is een mix van stof uit het boek (Jurafsky) en extra stof uit de colleges. De samenvatting is op volgorde van de colleges

Voorbeeld 3 van de 25  pagina's

  • Onbekend
  • 18 april 2020
  • 25
  • 2015/2016
  • Samenvatting
book image

Titel boek:

Auteur(s):

  • Uitgave:
  • ISBN:
  • Druk:
Alle documenten voor dit vak (2)

2  beoordelingen

review-writer-avatar

Door: ramtinbashardoost80 • 4 maanden geleden

review-writer-avatar

Door: luuk90 • 2 jaar geleden

avatar-seller
kimgouweleeuw
Taaltheorie en Taalverwerking Samenvatting 1
Lecture 1a
Reasons for studying natural language processing (NLP) within AI:
 You want a computer to communicate with users in their terms.
 There is a vast store of information recorded in natural language that can be accessible via
computers: news, government reports, social media etc.
 Three main types of applications:
o Enabling human-computer communication
o Improving human-human communication
o Doing useful processing of text or speech

Formal (FLs) and Natural (NLs) languages:
 Formal (computer) languages: e.g. Java, Prolog, Python, HTML
 Natural (human) languages: e.g. Dutch, English, Spanish, German, Japanese
 ‘Languages’ that represent the behavior of a machine or a system: e.g. think about
‘communicating’ with a vending machine via coin insertions and button presses: pressButton1

Syntax trees:
 Syntax: Describes the structural properties of the language. Natural language is much more
complicated than the formal languages used for the artificial languages of logics and computer
programs.




Natural language semantics:
 Semantics: Represents the meaning of words and sentences. Logic is a good candidate.
 Consider the sentence:
o Every student has access to a computer
 The meaning of this can be expressed as two different logical formulas:
o ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
 Every student has access to their own computer
o ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
 There is a computer to which every student has access to
 Problem: How can (either of) these formulas be mechanically generated from a syntax tree for
the original sentence?

FLs and NLs:
 There are close relationships between FLs and NLs, but also important differences:
o FLs can be pinned down by a precise definition.
o NLs are fluid, fuzzy at the edges and constantly evolving.
o NLs are riddled with ambiguity at all levels.
 This is normally avoidable in FLs.

,Ambiguity in Natural Language:
 Phonological ambiguity: ‘an ice lolly’ vs. ‘a nice lolly’
 Lexical ambiguity: ‘bass’ has at least two meanings: fish and musical instrument
 Syntactic ambiguity: two possible syntax trees for ‘complaints about referees multiplying
o Could mean:
 There are more and more complaints about referees
 Referees are multiplying and there are complaints about that
 Semantic ambiguity: ‘Please use all available doors while boarding the train’ vs. ‘Please fill all
sections in the questionnaire’
o First sentence: Doesn’t mean that you personally have to use every single door
o Second sentence: Does mean that you have to fill in every single section

Levels of Language Complexity:
 Some languages features are ‘more complex’ (harder to describe, harder to process) than
others. We can classify languages on a scale of complexity known as the Chomsky Hierarchy:
o Regular languages: Those whose phrases can be ‘recognized’ by a finite state machine
o Context-free languages: Many aspects of NLs can be described at this level (also used
for most programming languages)
o Context-sensitive languages: Some NLs involve features at this level of complexity
o Recursively enumerable languages: All languages that can in principle be defined via
mechanical rules.

Formal Languages:
 A formal language is a set of strings
 Each string is composed of symbols from a set called an alphabet (or a vocabulary)
o Examples of alphabets:
 English letters: Σ = {a, b, c . . . z}
 Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 Programming language ‘tokens’: Σ = {if, while, x, ==}
 ‘Primitive’ actions performed by a machine or system, e.g. a vending machine
Σ = {insert50c, pressButton1, ...}
 Words in (some fragment of) a natural language: Σ = {the, an, a, dog, cat,
sleeps}
o Examples of strings over alphabets:
 Let Σ1 = {0, 1} be an alphabet. Then 01101, 000001, 1101 are strings over Σ1.
— in fact, all binary numbers are strings over Σ1
 Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face
are strings over Σ2, as are fffff and agagag.
 Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3 but
bee, dad or cabbage are not.
 Let Σ4 = {♠, ©, ♣} be an alphabet. Then ♠♠ and ♣©♣ are strings over Σ4.
 The length of a string is the number of token symbols from the alphabet it contains
o Examples:
 The length of ‘face’ over Σ2 = {a, b, c, d, e, f, g} is 4
 The length of ‘face’ over Σ3 = {ba, ca, fa, ce ,fe, ge} is 2
 The string of length 0 is called the empty string, denoted by ε (epsilon).
 Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the
order in which they occur in s.
 An initial substring is called prefix and a final substring a suffix.

, o Let unthinkable be a string over Σ = {a, b, c . . . x, y, z} Then, ε, un, unth, unthinkable
are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings
include nthi, inka, bl.
 Σ* denotes the set of all strings over an alphabet Σ
o Σ is always infinite, regardless of the number of symbols Σ contains.
 We may now define a formal language L over an alphabet Σ as any subset of Σ*: L ⊆ Σ*
o Examples: let Σ = (a, b, c … x, y, z}. Then Σ* is the set of strings over the Latin alphabet
and the following subsets of Σ* are possible formal languages:
 The set of strings consisting of consonants (medeklinker) only
 The set of strings containing at least one vowel (klinker) and one consonant
 The set of strings whose length is less than 9 symbols
 The set {one, two, tree, four, five, six, seven, eight, nine, ten}
 The set of all English words
 The empty set

Ways to define a Formal Language:
 Given an alphabet Σ and the infinite set Σ* of formal languages it can give rise to, how can we
select a particular formal language?
o Direct mathematical definition: Σ = {a, b, c}, L1 = {aa, bb, abc, abcc}, L2 = {a nbn |n > 0}
o Formalisms (formal expressions and grammars): sets of rules
o Automata: computational devices for computing languages
 Specify some machine for testing whether a string is legal or not
 Formalisms and automata allow us to distinguish a formal language of interest (a set of
strings) from other possible languages over a given alphabet
o They capture the patterns that characterize a language
o As such, they act as a definition of the language they capture
 From an abstract point of view, a natural language, like Dutch, is a set of strings
(sounds/letters/words etc.)
 Therefore, formalisms and automata can help us to model aspects of natural languages

Regular expressions:
 Regular expressions are a formal notation for characterizing sets of strings that follow a fairly
simple regular pattern.
 We can construct regular expressions over an alphabet Σ as follows:
(these regular expressions are written in mathematical notation)




o We often ignore the dot in concatenation, and simply write ab
o Disjunction (or union) may be written as a|b a+b or a∪b
o a+ is the set of a-strings with at least one a (same as a*a or aa*)
o an can be used to abbreviate the concatenation of a with itself n times
o the notation Σ* can be seen as abbreviating (a|b|…)* for any symbol a, b, … in Σ
 Examples: Let Σ = {a, b, c … x, y, z}
o me(o)*w mew, meow, meooow, meooooooooooow etc.

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

Verzekerd van kwaliteit door reviews

Verzekerd van kwaliteit door reviews

Stuvia-klanten hebben meer dan 700.000 samenvattingen beoordeeld. Zo weet je zeker dat je de beste documenten koopt!

Snel en makkelijk kopen

Snel en makkelijk kopen

Je betaalt supersnel en eenmalig met iDeal, creditcard of Stuvia-tegoed voor de samenvatting. Zonder lidmaatschap.

Focus op de essentie

Focus op de essentie

Samenvattingen worden geschreven voor en door anderen. Daarom zijn de samenvattingen altijd betrouwbaar en actueel. Zo kom je snel tot de kern!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper kimgouweleeuw. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €4,99. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 72841 samenvattingen verkocht

Opgericht in 2010, al 14 jaar dé plek om samenvattingen te kopen

Start met verkopen
€4,99  10x  verkocht
  • (2)
  Kopen