Summary

Summary of all lectures


Course
Text Mining (L_PABAALG002)

Institution
Vrije Universiteit Amsterdam (VU)

I made this concise summary and got an 8.5. I hope you will too!


Uploaded on 10 October 2023
Number of pages: 16
Written in 2021/2022
Type: Summary

Author: gideonrouwendaal


Lecture 1: Introduction
Computational linguistics: algorithms that model language data, e.g., similarity, information value
and sequence probabilities (mathematical view)

Natural Language Processing (NLP): engineering to address aspects of natural language, e.g.,
tokenization, lemmatization, compound splitting, syntactic parsing, entity detection, sentiment
analysis… (engineering view)

NLP Toolkits: software packages and resources that provide and/or combine collections of NLP
modules

Language applications: machine translation, summarization, chat bots, text mining

Text mining: from unstructured text to structured data (information or knowledge)
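A minimal sketch of the idea, assuming a single hand-written regular expression (the pattern, the function name `extract_birthplaces` and the example sentences are hypothetical): free text goes in, a list of structured records comes out. Real text-mining systems chain several NLP modules instead of relying on one pattern.

```python
import re

# Hypothetical pattern: capitalized name(s), "was born in", a capitalized place name
BORN_IN = re.compile(r"((?:[A-Z][a-z]+ )*[A-Z][a-z]+) was born in ([A-Z][a-z]+)")


def extract_birthplaces(text: str) -> list[dict[str, str]]:
    """Turn unstructured text into structured (person, birthplace) records."""
    return [
        {"person": person, "birthplace": place}
        for person, place in BORN_IN.findall(text)
    ]


text = "Anna Smit was born in Utrecht. Her brother Piet was born in Leiden."
print(extract_birthplaces(text))
# [{'person': 'Anna Smit', 'birthplace': 'Utrecht'}, {'person': 'Piet', 'birthplace': 'Leiden'}]
```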

Lecture 2: Linguistics and Natural Language Processing
Subdiscipline | Medium or unit | Natural language model
Phonetics, phonology | Sounds | Automatic speech recognition
Morphology | Words, word formation | Part-of-speech taggers, lemmatizers, compound splitters
Syntax | Sentences, grammatical structure and function | Syntactic parsers, chunkers
Semantics | Meaning | Semantic parsers
Pragmatics | Language use in context | Context and domain models

Methods: introspection, behaviorism, empirical (experimental and stochastic), mathematical models.
Resources: lexicons (dictionary as database), grammars, data collections and annotations, data models.
We use minimal information to express a lot: given the context, "riots in Amsterdam" tells us exactly
which riots are meant. Without context, language data (spoken words, etc.) is difficult to understand.

- Morphology: the study of the form and structure of words. Words are composed of morphemes.
A morpheme is the smallest meaning-bearing unit (e.g., talked consists of 2 morphemes: talk
(activity) and -ed (past)).

Different types of morphemes:

- Free morphemes: occur independently (e.g., boy, sing)
- Bound morphemes: attached to another morpheme and cannot be used independently
(English -s: boys; Dutch -s/-en: appels/appelen)
- Affixes: prefixes (e.g., ge- in gelopen), infixes (e.g., -s- in burgemeesterspost) and suffixes (e.g., -je in loopje)
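A toy segmenter can illustrate the difference between free and bound morphemes. This sketch assumes a tiny hand-made inventory of free roots and English suffixes (`FREE_MORPHEMES`, `BOUND_SUFFIXES` and `segment` are hypothetical names); it only handles a root plus one suffix and ignores spelling changes.

```python
# Toy morpheme inventory (assumption): free roots and English bound suffixes
FREE_MORPHEMES = {"talk", "boy", "sing", "walk"}
BOUND_SUFFIXES = ("ed", "ing", "s")


def segment(word: str) -> list[str]:
    """Split a word into a free morpheme plus at most one bound suffix."""
    if word in FREE_MORPHEMES:
        return [word]                       # free morpheme: can stand alone
    for suffix in BOUND_SUFFIXES:
        root = word[: -len(suffix)]
        if word.endswith(suffix) and root in FREE_MORPHEMES:
            return [root, "-" + suffix]     # bound morpheme needs a host root
    return [word]                           # not analysable with this inventory


for w in ("talked", "boys", "sing", "singing"):
    print(w, segment(w))
# talked ['talk', '-ed']
# boys ['boy', '-s']
# sing ['sing']
# singing ['sing', '-ing']
```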

Some other basic terms:

- Root or base: an unanalysable morpheme, expressing the basic lexical content of a word.
Also defined as ‘what is left of a complex form when affixes are stripped’.
- Stem: consists of at least a root and can contain one or more derivational affixes, e.g., “aardigste” → stem “aardig”
/ root “aard”

- Lemma: an entry in a dictionary: the base (singular) form for nouns (“stemmetje” → “stem”) and the
infinitive form for verbs (“stemde” → “stemmen”)

The difference between a stem and a lemma is that a stem does not have to be an actual word, whereas
a lemma is always an actual word of the language.
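The stem/lemma difference can be sketched with a crude suffix stripper next to a lexicon lookup. Both the suffix list and the lemma table are illustrative assumptions; this is not the Porter stemmer or any library's lemmatizer.

```python
# Crude suffix stripper (assumption; far simpler than the Porter stemmer)
SUFFIXES = ("ing", "ed", "es", "s")

# Hypothetical lemma lexicon: inflected form -> dictionary entry
LEMMAS = {"argues": "argue", "argued": "argue", "arguing": "argue"}


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least 3 characters.
    The result does not have to be an existing word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def lemmatize(word: str) -> str:
    """Look the form up in the lexicon; unknown forms are returned unchanged."""
    return LEMMAS.get(word, word)


for w in ("argues", "argued", "arguing"):
    print(w, crude_stem(w), lemmatize(w))
# argues argu argue
# argued argu argue
# arguing argu argue
```

The stripper maps all three forms to "argu", which is not a word; the lookup returns the dictionary entry "argue".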

Words have a part-of-speech (PoS), which specifies the typical phrase structures in which they can be
the head. Open class (open to word formation and neologisms): Noun (N, boat), Verb (V, float),
Adjective (A, large/fast), Adverb (Adv, very/largely). New words are invented every day and other words
are forgotten; there are millions of open class words if we include specialized language. Closed class (you cannot
invent a new closed class word): Pronoun (PRN, he/him/…), Preposition (P, in/at/from…). Closed classes are
relatively fixed and change only slowly over generations; they form a small set of fewer than a hundred words.

Word modification: from a root, base or stem, different forms are derived. Inflection expresses syntactic
properties such as person (1, 2, 3), number (singular/plural), gender, tense… Derivation changes
semantic and grammatical properties, e.g., capable → incapable. Compounding: “beach head”. Combinations:
aircraft-carriers (compounding plus inflection). Word formation is very productive, so our lexicon is potentially infinite: the number of
unseen compounds detected in German and Dutch newspapers grows linearly with the number of
newspapers over time, the number of names for new chemical compounds and proteins grows rapidly every
year, and new products are launched every year.
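Because compounding is so productive, compounds are often split by lookup rather than listed. A minimal recursive splitter, assuming a toy lexicon and the Dutch linking morphemes -s- and -en- (all names hypothetical), returns the first segmentation it finds; a real splitter would rank competing segmentations, e.g., by word frequency.

```python
from functools import lru_cache
from typing import Optional

# Toy lexicon and Dutch linking morphemes (assumptions for illustration)
LEXICON = {"burgemeester", "post", "beach", "head", "aircraft", "carrier"}
LINKERS = ("s", "en")


@lru_cache(maxsize=None)
def split_compound(word: str) -> Optional[tuple[str, ...]]:
    """Return the first segmentation of word into lexicon words, optionally
    joined by a linking morpheme, or None if no segmentation exists."""
    if word in LEXICON:
        return (word,)
    for i in range(1, len(word)):
        head, rest = word[:i], word[i:]
        if head not in LEXICON:
            continue
        tail = split_compound(rest)                 # head + rest directly
        if tail is not None:
            return (head, *tail)
        for linker in LINKERS:                      # head + linker + rest
            if rest.startswith(linker):
                tail = split_compound(rest[len(linker):])
                if tail is not None:
                    return (head, linker, *tail)
    return None


print(split_compound("burgemeesterspost"))  # ('burgemeester', 's', 'post')
print(split_compound("beachhead"))          # ('beach', 'head')
print(split_compound("aircraftcarrier"))    # ('aircraft', 'carrier')
```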

Zipfian distribution (Zipf's law): the frequency of a word in a frequency-ranked list is approximately equal to the frequency
of the most frequent word divided by the word's rank. The most frequent words also tend to be short and to have
many different meanings.
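In formula form, Zipf's law predicts f(r) ≈ f(1) / r for the word at rank r. The sketch below ranks words with `collections.Counter` and pairs each observed frequency with that prediction (`zipf_table` is a hypothetical helper); the token list is constructed by hand to follow the law exactly, whereas real corpora only approximate it.

```python
from collections import Counter


def zipf_table(tokens: list[str], top: int = 5) -> list[tuple[int, str, int, float]]:
    """Rank words by frequency and pair each observed count with the
    Zipf prediction f(1) / r."""
    ranked = Counter(tokens).most_common(top)
    if not ranked:
        return []
    f1 = ranked[0][1]                       # frequency of the most frequent word
    return [
        (rank, word, freq, f1 / rank)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


# Hand-made token list that follows Zipf's law exactly (real text only approximates it)
tokens = ["the"] * 12 + ["of"] * 6 + ["a"] * 4 + ["cow"] * 3
for row in zipf_table(tokens, top=4):
    print(row)
# (1, 'the', 12, 12.0)
# (2, 'of', 6, 6.0)
# (3, 'a', 4, 4.0)
# (4, 'cow', 3, 3.0)
```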

Lexicon of forms: lists all common base forms with their part-of-speech, inflectional paradigm
(plural, singular, person, tense) and typical (conventional) derived forms. It covers inflectional morphemes (-s, -ed)
and derivational morphemes (-ation, -ity, -ly).
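A lexicon of forms can be represented as a dictionary keyed by base form. This sketch assumes a simplified English setup (hypothetical entries, no spelling rules such as consonant doubling) in which regular suffixes generate the paradigm and irregular forms stored in the lexicon override them.

```python
# Hypothetical lexicon of forms: base form -> PoS, irregular forms, derived forms
LEXICON = {
    "walk": {"pos": "VERB", "irregular": {}, "derived": ["walker"]},
    "go": {"pos": "VERB", "irregular": {"3sg": "goes", "past": "went"}, "derived": []},
    "boat": {"pos": "NOUN", "irregular": {}, "derived": []},
}

# Regular English inflectional suffixes per PoS (assumption: no spelling rules)
REGULAR = {
    "VERB": {"3sg": "s", "past": "ed"},
    "NOUN": {"plural": "s"},
}


def paradigm(base: str) -> dict[str, str]:
    """Generate the inflectional paradigm of a base form; irregular forms
    stored in the lexicon override the regular suffix rules."""
    entry = LEXICON[base]
    forms = {cat: base + suffix for cat, suffix in REGULAR[entry["pos"]].items()}
    forms.update(entry["irregular"])
    return forms


for base in LEXICON:
    print(base, LEXICON[base]["pos"], paradigm(base), LEXICON[base]["derived"])
# walk VERB {'3sg': 'walks', 'past': 'walked'} ['walker']
# go VERB {'3sg': 'goes', 'past': 'went'} []
# boat NOUN {'plural': 'boats'} []
```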

Morphology in computational linguistics is used for: analyzing complex words by identifying their component parts
(anti+dis+establishment+…); analyzing the grammatical information encoded in words, e.g., part-of-speech
= VERB and inflectional information = [PERSON 3, NUMBER singular, TENSE present]; and obtaining the
stem or root, to reduce the size of the data and to find the word in the lexicon.
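Morphological analysis runs in the other direction: from a word form to its stem, part-of-speech and inflectional features. A minimal rule-based sketch, assuming a toy base-form lexicon and a hypothetical suffix-to-feature table that only covers regular English inflection:

```python
from typing import Optional

# Toy lexicon of base forms with their PoS (assumption)
BASES = {"walk": "VERB", "boat": "NOUN"}

# Hypothetical (PoS, suffix) -> inflectional features, regular English only
SUFFIX_FEATURES = {
    ("VERB", "s"): {"PERSON": 3, "NUMBER": "singular", "TENSE": "present"},
    ("VERB", "ed"): {"TENSE": "past"},
    ("NOUN", "s"): {"NUMBER": "plural"},
}


def analyze(form: str) -> Optional[tuple[str, str, dict]]:
    """Return (stem, PoS, features) for a word form, or None if unknown."""
    if form in BASES:
        return form, BASES[form], {}        # base form: features left unspecified
    for (pos, suffix), features in SUFFIX_FEATURES.items():
        stem = form[: -len(suffix)]
        # the suffix only counts if the remaining stem is a base of that PoS
        if form.endswith(suffix) and BASES.get(stem) == pos:
            return stem, pos, dict(features)
    return None


print(analyze("walks"))   # ('walk', 'VERB', {'PERSON': 3, 'NUMBER': 'singular', 'TENSE': 'present'})
print(analyze("boats"))   # ('boat', 'NOUN', {'NUMBER': 'plural'})
```

Checking the PoS of the stem is what keeps the same suffix -s apart as a verb ending (walks) and a noun ending (boats).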

Part-of-speech tagging: the task is to assign a part-of-speech category to every token and to add the
lemma. The main challenge is data sparseness for specific languages and domains. PoS tagging reaches
an accuracy of around 95-96% over all tokens when training and testing. Remaining issues are long-distance
dependencies and genuine ambiguities, annotation errors and unknown words. A relatively high
proportion of sentences contains at least one error, and these errors can propagate: a wrong PoS may lead to a
wrong word sense, named entity…
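Why does 95-96% token accuracy still leave many sentences with an error? Under the simplifying assumption that tagging errors on different tokens are independent, a sentence of n tokens is fully correct with probability p^n for per-token accuracy p, so P(at least one error) = 1 - p^n (`p_sentence_has_error` is a hypothetical helper):

```python
def p_sentence_has_error(token_accuracy: float, n_tokens: int) -> float:
    """Probability that a sentence of n_tokens contains at least one tagging
    error, assuming errors on different tokens are independent."""
    return 1.0 - token_accuracy ** n_tokens


for n in (10, 20, 30):
    print(n, round(p_sentence_has_error(0.96, n), 2))
# 10 0.34
# 20 0.56
# 30 0.71
```

So even at 96% token accuracy, more than half of 20-token sentences would contain at least one wrong tag under this assumption.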

Multiword expression: fixed idioms (an apple a day keeps the doctor away), less fixed idioms
(shooting from the hip), slots (X, let alone Y), collocations (running engine, running a programme)
and selectional restrictions (a glass of …)
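Fixed multiword expressions can be detected by greedy longest-match lookup and merged into a single token. This sketch assumes a small hand-made MWE list (hypothetical); slots, collocations and selectional restrictions would need more flexible patterns than exact token sequences.

```python
# Hypothetical MWE lexicon: fixed token sequences to treat as one unit
MWES = {
    ("shooting", "from", "the", "hip"),
    ("let", "alone"),
}
MAX_LEN = max(len(mwe) for mwe in MWES)


def merge_mwes(tokens: list[str]) -> list[str]:
    """Greedy longest match: join known multiword expressions with '_'."""
    out: list[str] = []
    i = 0
    while i < len(tokens):
        # try the longest candidate first, down to two tokens
        for n in range(min(MAX_LEN, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in MWES:
                out.append("_".join(tokens[i:i + n]))
                i += n
                break
        else:  # no MWE starts at position i
            out.append(tokens[i])
            i += 1
    return out


print(merge_mwes("he was shooting from the hip again".split()))
# ['he', 'was', 'shooting_from_the_hip', 'again']
```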

- Syntax: we experience a sentence as a complete grammatical structure. We can freely
combine words into phrases or constituents and we have a strong intuition about the
grammaticality of these structures within a sentence.

Phrase: a word or a group of words that functions as a single unit within a grammatical hierarchy.
A phrase is built around a head lexical item and has a certain syntactic behaviour, e.g., she → noun
phrase (NP), the head is a pronoun; a very beautiful morning → NP, the head is a noun; chases the cat
→ verb phrase (VP), the head is a verb. The head of a phrase is the element that determines the
syntactic function of the whole phrase.

Syntactic elements

Phrasal categories: noun phrase (NP), prepositional phrase (PP), verb phrase (VP), adverbial phrase (AdvP), adjectival phrase (AP)
Lexical categories: noun (N), pronoun (Pr), adjective (A), adverb (Adv), verb (V), preposition (P)
Phrase structures can be nested. The nesting is hierarchical and involves head-modifier relations.
For example:

- Very nice = adjective phrase or AP (head is an adjective (A))
- A very nice looping = NP (head is a noun (N))
- Performs a nice looping = VP (head is a verb (V))
- With a long stick = prepositional phrase or PP (head is a preposition (P))
- The cow performs a very nice looping with a long stick = sentence (S)
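The nested example above can be written as a tree in which every phrase points to its head. In this sketch trees are plain tuples, the head rules are an assumption (for instance that the VP heads S), the determiner category "Det" is added because the category list does not include one, and the PP is attached to the VP; `head_word` and `to_brackets` are hypothetical helpers.

```python
# Head rules (assumption): which child category heads each phrase type
HEAD_RULES = {
    "S": ("VP",), "NP": ("N", "Pr"), "VP": ("V",),
    "PP": ("P",), "AP": ("A",), "AdvP": ("Adv",),
}


def is_word(node: tuple) -> bool:
    """A leaf is a (lexical category, word) pair."""
    return len(node) == 2 and isinstance(node[1], str)


def head_word(node: tuple) -> str:
    """Follow head-child links down to the lexical head of a phrase."""
    if is_word(node):
        return node[1]
    label, *children = node
    for child in children:
        if child[0] in HEAD_RULES[label]:
            return head_word(child)
    raise ValueError(f"no head found for {label}")


def to_brackets(node: tuple) -> str:
    """Render the tree in labelled-bracket notation."""
    if is_word(node):
        return node[1]
    label, *children = node
    return f"[{label} " + " ".join(to_brackets(c) for c in children) + "]"


looping = ("NP", ("Det", "a"), ("AP", ("Adv", "very"), ("A", "nice")), ("N", "looping"))
stick = ("NP", ("Det", "a"), ("AP", ("A", "long")), ("N", "stick"))
sentence = (
    "S",
    ("NP", ("Det", "The"), ("N", "cow")),
    ("VP", ("V", "performs"), looping, ("PP", ("P", "with"), stick)),
)

print(head_word(looping))   # looping
print(head_word(sentence))  # performs
print(to_brackets(sentence))
# [S [NP The cow] [VP performs [NP a [AP very nice] looping] [PP with [NP a [AP long] stick]]]]
```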

Phrase functions: subject, object, main verb, modifier, adjunct…
Phrase functions and the different categories can be modelled together
in a syntactic tree.

Grammatical subject: agrees with the main verb.

Grammatical objects: obligatory NPs or PPs needed to form a grammatical sentence.

[Figure: syntax tree with dependency labels]

The most important types of predicates in terms of obligatory arguments
(the complementation, i.e., what is needed to obtain a grammatical
structure):

Valency | Predicate | Complementation | Example
Intransitive | walk.v | NP.subject | The cow walks
Transitive | perform.v | NP.subject, NP.direct object | The cow performs a looping
Transitive | count.v | NP.subject, PP(on).pp-object | The cow is counting on a big applause
Transitive | be.v | NP.subject, NP.object/AP.object | This cow is a phenomenon / This cow is phenomenal
Ditransitive | give.v | NP.subject, NP.direct object, NP.indirect object | The cow gives the spectators an unforgettable day

A lexicon provides a list of verbs with their complementation patterns.
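Such a lexicon can be sketched as a mapping from verb lemma to its obligatory arguments, here mirroring rows of the valency table (the names `VALENCY`, `valency_class` and `missing_arguments` are hypothetical). Checking which obligatory arguments are still missing is the kind of test a parser needs when it matches the phrases around a main verb.

```python
# Hypothetical valency lexicon mirroring rows of the table above:
# verb lemma -> obligatory arguments (complementation pattern)
VALENCY = {
    "walk": ["NP.subject"],
    "perform": ["NP.subject", "NP.direct_object"],
    "count": ["NP.subject", "PP(on).pp_object"],
    "give": ["NP.subject", "NP.direct_object", "NP.indirect_object"],
}
VALENCY_CLASS = {1: "intransitive", 2: "transitive", 3: "ditransitive"}


def valency_class(verb: str) -> str:
    """Classify a verb by its number of obligatory arguments."""
    return VALENCY_CLASS[len(VALENCY[verb])]


def missing_arguments(verb: str, found: set[str]) -> list[str]:
    """Obligatory arguments of verb not found in the sentence; an empty
    list means the complementation is satisfied."""
    return [arg for arg in VALENCY[verb] if arg not in found]


print(valency_class("give"))                         # ditransitive
print(missing_arguments("perform", {"NP.subject"}))  # ['NP.direct_object']
```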

Phrase structure parsers: look up the words of a sentence in the lexicon to find a candidate for the main
verb. Get the obligatory arguments of that verb. Match the structure of the surrounding phrases with the