Samenvatting

Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1

Name: Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1
SKU: doc_689012
Rating: 3.50 (2 reviews)
Author: kimgouweleeuw

2 beoordelingen

233 keer bekeken 10 keer verkocht

Instelling
Universiteit Van Amsterdam (UvA)

Boek
Speech and Language Processing

Dit is een samenvatting van het eerste deeltentamen van het vak Taaltheorie en Taalverwerking van de Universiteit van Amsterdam. De samenvatting is een mix van stof uit het boek (Jurafsky) en extra stof uit de colleges. De samenvatting is op volgorde van de colleges

[Meer zien]

Voorbeeld 3 van de 25 pagina's

Bekijk voorbeeld

Heel boek samengevat? Onbekend
Geupload op 18 april 2020
Aantal pagina's 25
Geschreven in 2015/2016
Type Samenvatting

tttv
kunstmatige intelligentie
taaltheorie en taalverwerking

2 beoordelingen

Door: ramtinbashardoost80 • 8 maanden geleden

Door: luuk90 • 2 jaar geleden

Volgen

kimgouweleeuw Lid sinds 4 jaar 86 documenten verkocht

€4,99

Ook beschikbaar in voordeelbundel v.a. €9,49

In winkelwagen

Op verlanglijstje

100% tevredenheidsgarantie
Direct beschikbaar na betaling
Zowel online als in PDF
Je zit nergens aan vast

Ook beschikbaar in voordeelbundel (1)

Samenvatting Taaltheorie en Taalverwerking

€ 9,98 € 9,49

8x verkocht

2 items

1. Samenvatting - Samenvatting taaltheorie en taalverwerking deeltentamen 1
2. Samenvatting - Samenvatting taaltheorie en taalverwerking deeltentamen 2
Meer zien

Taaltheorie en Taalverwerking Samenvatting 1
Lecture 1a
Reasons for studying natural language processing (NLP) within AI:
 You want a computer to communicate with users in their terms.
 There is a vast store of information recorded in natural language that can be accessible via
computers: news, government reports, social media etc.
 Three main types of applications:
o Enabling human-computer communication
o Improving human-human communication
o Doing useful processing of text or speech

Formal (FLs) and Natural (NLs) languages:
 Formal (computer) languages: e.g. Java, Prolog, Python, HTML
 Natural (human) languages: e.g. Dutch, English, Spanish, German, Japanese
 ‘Languages’ that represent the behavior of a machine or a system: e.g. think about
‘communicating’ with a vending machine via coin insertions and button presses: pressButton1

Syntax trees:
 Syntax: Describes the structural properties of the language. Natural language is much more
complicated than the formal languages used for the artificial languages of logics and computer
programs.

Natural language semantics:
 Semantics: Represents the meaning of words and sentences. Logic is a good candidate.
 Consider the sentence:
o Every student has access to a computer
 The meaning of this can be expressed as two different logical formulas:
o ∀x.(student(x) ⇒ ∃y.(computer(y) ∧ hasAccessTo(x, y)))
 Every student has access to their own computer
o ∃y.(computer(y) ∧ ∀x.(student(x) ⇒ hasAccessTo(x, y)))
 There is a computer to which every student has access to
 Problem: How can (either of) these formulas be mechanically generated from a syntax tree for
the original sentence?

FLs and NLs:
 There are close relationships between FLs and NLs, but also important differences:
o FLs can be pinned down by a precise definition.
o NLs are fluid, fuzzy at the edges and constantly evolving.
o NLs are riddled with ambiguity at all levels.
 This is normally avoidable in FLs.

,Ambiguity in Natural Language:
 Phonological ambiguity: ‘an ice lolly’ vs. ‘a nice lolly’
 Lexical ambiguity: ‘bass’ has at least two meanings: fish and musical instrument
 Syntactic ambiguity: two possible syntax trees for ‘complaints about referees multiplying
o Could mean:
 There are more and more complaints about referees
 Referees are multiplying and there are complaints about that
 Semantic ambiguity: ‘Please use all available doors while boarding the train’ vs. ‘Please fill all
sections in the questionnaire’
o First sentence: Doesn’t mean that you personally have to use every single door
o Second sentence: Does mean that you have to fill in every single section

Levels of Language Complexity:
 Some languages features are ‘more complex’ (harder to describe, harder to process) than
others. We can classify languages on a scale of complexity known as the Chomsky Hierarchy:
o Regular languages: Those whose phrases can be ‘recognized’ by a finite state machine
o Context-free languages: Many aspects of NLs can be described at this level (also used
for most programming languages)
o Context-sensitive languages: Some NLs involve features at this level of complexity
o Recursively enumerable languages: All languages that can in principle be defined via
mechanical rules.

Formal Languages:
 A formal language is a set of strings
 Each string is composed of symbols from a set called an alphabet (or a vocabulary)
o Examples of alphabets:
 English letters: Σ = {a, b, c . . . z}
 Decimal digits: Σ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
 Programming language ‘tokens’: Σ = {if, while, x, ==}
 ‘Primitive’ actions performed by a machine or system, e.g. a vending machine
Σ = {insert50c, pressButton1, ...}
 Words in (some fragment of) a natural language: Σ = {the, an, a, dog, cat,
sleeps}
o Examples of strings over alphabets:
 Let Σ1 = {0, 1} be an alphabet. Then 01101, 000001, 1101 are strings over Σ1.
— in fact, all binary numbers are strings over Σ1
 Let Σ2 = {a, b, c, d, e, f, g} be an alphabet. Then bee, dad, cabbage, and face
are strings over Σ2, as are fffff and agagag.
 Let Σ3 = {ba, ca, fa, ce, fe, ge} be an alphabet. Then face is a string over Σ3 but
bee, dad or cabbage are not.
 Let Σ4 = {♠, ©, ♣} be an alphabet. Then ♠♠ and ♣©♣ are strings over Σ4.
 The length of a string is the number of token symbols from the alphabet it contains
o Examples:
 The length of ‘face’ over Σ2 = {a, b, c, d, e, f, g} is 4
 The length of ‘face’ over Σ3 = {ba, ca, fa, ce ,fe, ge} is 2
 The string of length 0 is called the empty string, denoted by ε (epsilon).
 Given a string s, a substring of s is a string formed by taking contiguous symbols of s in the
order in which they occur in s.
 An initial substring is called prefix and a final substring a suffix.

, o Let unthinkable be a string over Σ = {a, b, c . . . x, y, z} Then, ε, un, unth, unthinkable
are prefixes, while ε, e, able, thinkable, and unthinkable are suffixes. Other substrings
include nthi, inka, bl.
 Σ* denotes the set of all strings over an alphabet Σ
o Σ is always infinite, regardless of the number of symbols Σ contains.
 We may now define a formal language L over an alphabet Σ as any subset of Σ*: L ⊆ Σ*
o Examples: let Σ = (a, b, c … x, y, z}. Then Σ* is the set of strings over the Latin alphabet
and the following subsets of Σ* are possible formal languages:
 The set of strings consisting of consonants (medeklinker) only
 The set of strings containing at least one vowel (klinker) and one consonant
 The set of strings whose length is less than 9 symbols
 The set {one, two, tree, four, five, six, seven, eight, nine, ten}
 The set of all English words
 The empty set

Ways to define a Formal Language:
 Given an alphabet Σ and the infinite set Σ* of formal languages it can give rise to, how can we
select a particular formal language?
o Direct mathematical definition: Σ = {a, b, c}, L1 = {aa, bb, abc, abcc}, L2 = {a nbn |n > 0}
o Formalisms (formal expressions and grammars): sets of rules
o Automata: computational devices for computing languages
 Specify some machine for testing whether a string is legal or not
 Formalisms and automata allow us to distinguish a formal language of interest (a set of
strings) from other possible languages over a given alphabet
o They capture the patterns that characterize a language
o As such, they act as a definition of the language they capture
 From an abstract point of view, a natural language, like Dutch, is a set of strings
(sounds/letters/words etc.)
 Therefore, formalisms and automata can help us to model aspects of natural languages

Regular expressions:
 Regular expressions are a formal notation for characterizing sets of strings that follow a fairly
simple regular pattern.
 We can construct regular expressions over an alphabet Σ as follows:
(these regular expressions are written in mathematical notation)

o We often ignore the dot in concatenation, and simply write ab
o Disjunction (or union) may be written as a|b a+b or a∪b
o a+ is the set of a-strings with at least one a (same as a*a or aa*)
o an can be used to abbreviate the concatenation of a with itself n times
o the notation Σ* can be seen as abbreviating (a|b|…)* for any symbol a, b, … in Σ
 Examples: Let Σ = {a, b, c … x, y, z}
o me(o)*w mew, meow, meooow, meooooooooooow etc.

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

Verzekerd van kwaliteit door reviews

Stuvia-klanten hebben meer dan 700.000 samenvattingen beoordeeld. Zo weet je zeker dat je de beste documenten koopt!

Snel en makkelijk kopen

Je betaalt supersnel en eenmalig met iDeal, creditcard of Stuvia-tegoed voor de samenvatting. Zonder lidmaatschap.

Focus op de essentie

Samenvattingen worden geschreven voor en door anderen. Daarom zijn de samenvattingen altijd betrouwbaar en actueel. Zo kom je snel tot de kern!

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Je krijgt een PDF, die direct beschikbaar is na je aankoop. Het gekochte document is altijd, overal en oneindig toegankelijk via je profiel.

Tevredenheidsgarantie: hoe werkt dat?

Onze tevredenheidsgarantie zorgt ervoor dat je altijd een studiedocument vindt dat goed bij je past. Je vult een formulier in en onze klantenservice regelt de rest.

Van wie koop ik deze samenvatting?

Stuvia is een marktplaats, je koop dit document dus niet van ons, maar van verkoper kimgouweleeuw. Stuvia faciliteert de betaling aan de verkoper.

Zit ik meteen vast aan een abonnement?

Nee, je koopt alleen deze samenvatting voor €4,99. Je zit daarna nergens aan vast.

Is Stuvia te vertrouwen?

4,6 sterren op Google & Trustpilot (+1000 reviews)

Afgelopen 30 dagen zijn er 52355 samenvattingen verkocht

Opgericht in 2010, al 14 jaar dé plek om samenvattingen te kopen

Start met verkopen

Populaire Universiteiten

Populaire Hogescholen

Populaire Scholen

Populaire samengevatte studieboeken voor Communicatie en Taal

Populaire samengevatte studieboeken voor Economie en Bedrijf

Populaire samengevatte studieboeken voor Exact en Informatica

Populaire samengevatte studieboeken voor Gedrag en Maatschappij

Populaire samengevatte studieboeken voor Gezondheid en Geneeskunde

Populaire samengevatte studieboeken voor Onderwijs en Opvoeding

Populaire samengevatte studieboeken voor Recht en Bestuur

De beste samenvattingen om je Wft-diploma te behalen

De beste samenvattingen om je theorie examens te behalen

De beste samenvattingen voor je cursus in de Veiligheidsbranche

De beste samenvattingen voor Gezondheid & Hygiëne cursussen

De beste samenvattingen voor zakelijke cursussen

De beste samenvattingen voor je PABO WisCAT cursus

Populaire vakken

Populaire vakken

Populaire vakken

Boekverslagen en samenvattingen

Verkoper

Samenvatting

Samenvatting Taaltheorie en Taalverwerking Deeltentamen 1

Document informatie

Onderwerpen

Gekoppeld boek

Meer samenvattingen voor studieboek

Geschreven voor

2 beoordelingen

Verkoper

Ontvangen beoordelingen

Voorbeeld van de inhoud

Voordelen van het kopen van samenvattingen bij Stuvia op een rij:

Verzekerd van kwaliteit door reviews

Snel en makkelijk kopen

Focus op de essentie

Veelgestelde vragen

Wat krijg ik als ik dit document koop?

Tevredenheidsgarantie: hoe werkt dat?

Van wie koop ik deze samenvatting?

Zit ik meteen vast aan een abonnement?

Is Stuvia te vertrouwen?