Computational Linguistics (800962-B-6)
2021/2022
Summary Final
Lecture 8-12
Written by Saskia Kriege

Computational Linguistics Lecture 8 – Parsing

YOUTUBE – PCFGs
Probabilities are assigned to rules.
Every non-terminal has a number of possible expansions, whose probabilities sum to 1.
A rule's probability is the conditional probability of the expansion given the particular non-terminal.
Probability of an entire tree = product of the probabilities of the rules used in it.

Non-terminal → a sequence of non-terminals, or a single terminal
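
A minimal sketch of this bookkeeping in Python (the grammar and its probabilities are made up for illustration):

```python
# Hypothetical toy PCFG: each non-terminal maps to {expansion: probability}.
# An expansion is a tuple of non-terminals, or a single terminal.
pcfg = {
    "S":   {("NP", "VP"): 1.0},
    "NP":  {("Det", "N"): 0.7, ("i",): 0.3},
    "VP":  {("V", "NP"): 1.0},
    "Det": {("an",): 1.0},
    "N":   {("elephant",): 1.0},
    "V":   {("shot",): 1.0},
}

# The probabilities of all possible expansions of a non-terminal sum to 1.
for lhs, expansions in pcfg.items():
    total = sum(expansions.values())
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
```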


PARSING
Probabilities, chunks, dependencies

PCFG = probabilistic context-free grammar
Tells us not only whether a sentence is well-formed under the grammar, but also delivers the likeliest parse for the target sentence given the grammar.
From declarative grammar to procedure: given a sequence of words, we want to find the best structure the grammar assigns to it.
Statistical parsing uses a probabilistic model of syntax in order to assign probabilities to each parse tree.
Resolving syntactic ambiguity.
A PCFG differs from a CFG by augmenting each rule with a conditional probability.

Structural Ambiguity
When more than one parse tree can be assigned to a sentence based on the rules in the grammar.
Two kinds: attachment ambiguity and coordination ambiguity.

Attachment ambiguity (what do we attach it to?)
"I shot an elephant in my pyjamas" →
"I" = NP; "shot an elephant" begins the VP.
What does "in my pyjamas" attach to? To the subject "I" (I was wearing my pyjamas), or to "an elephant" (the elephant was in my pyjamas when I shot it)?
Or is the whole "shot an elephant in my pyjamas" one VP, with "elephant" as direct object and the prepositional phrase modifying the verb?
We need semantic/lexical information to disambiguate this (the meaning of "elephant").
Or are those structures simply more or less likely in the language? Then we don't need meaning, only the likelihood of a structure in the language.
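
A small sketch of this ambiguity, assuming NLTK is available (the toy grammar below is made up): a plain CFG parser returns both attachments.

```python
import nltk

# Toy grammar (made up): the PP "in my pyjamas" can attach to the VP
# (I was in my pyjamas) or to the object NP (the elephant was).
grammar = nltk.CFG.fromstring("""
    S   -> NP VP
    VP  -> V NP | VP PP
    NP  -> NP PP | Det N | 'i'
    PP  -> P NP
    Det -> 'an' | 'my'
    N   -> 'elephant' | 'pyjamas'
    V   -> 'shot'
    P   -> 'in'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("i shot an elephant in my pyjamas".split()):
    print(tree)  # two trees: VP attachment vs. NP attachment
```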

Coordination ambiguity: "and" (what does the coordination cover?)
"Tasty sandwiches and flowers make my grandma happy"
What does "tasty" attach to?
Are both the sandwiches and the flowers tasty (one coordinated NP modified by "tasty")?
Or only the sandwiches (then two NPs are coordinated, with "tasty" inside the first)?
Considering structure alone, both are possible.
Flowers are not tasty, so we bring in world knowledge to disambiguate and prefer the attachment to "sandwiches" only.
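
The coordination case can be sketched the same way (again a made-up toy grammar, NLTK assumed): structure alone licenses both readings.

```python
import nltk

# Toy grammar (made up): "tasty" can scope over the whole coordination
# or over "sandwiches" alone.
grammar = nltk.CFG.fromstring("""
    NP   -> Adj NP | NP Conj NP | N
    Adj  -> 'tasty'
    Conj -> 'and'
    N    -> 'sandwiches' | 'flowers'
""")

parser = nltk.ChartParser(grammar)
for tree in parser.parse("tasty sandwiches and flowers".split()):
    tree.pretty_print()  # [tasty [sandwiches and flowers]] vs.
                         # [[tasty sandwiches] and flowers]
```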

More structures can be assigned to one sentence by the same grammar.

CFGs → don't consider meaning, only structural coherence.

CKY tells us whether the sentence is well-formed under this grammar.
If we want to pick a specific parse, we need more:
to get the likeliest parse structure, we need some way to assign probabilities to the different options and pick the most likely one.


Parsing as Recognition
CKY only tells us whether a sentence is well-formed given the grammar, not which parse best captures the sentence structure.
Sometimes only one parse is possible, but how do we choose if two or more are valid?
PCFGs: which parse best captures the sentence structure?
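
A minimal CKY recognizer sketch in plain Python (the grammar encoding below is made up for illustration; the grammar is assumed to be in CNF): it answers only the yes/no question, which is exactly why we need something extra to choose among parses.

```python
from itertools import product

def cky_recognize(words, lexical, binary):
    """CKY recognition for a toy CFG in CNF (a sketch).
    lexical: {terminal: set of A with rule A -> terminal}
    binary:  {(B, C): set of A with rule A -> B C}
    Returns True iff 'S' derives the whole sentence."""
    n = len(words)
    # table[i][j] = set of non-terminals deriving words[i:j]
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, C in product(table[i][k], table[k][j]):
                    table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n]

# Toy CNF grammar (hypothetical)
lexical = {"i": {"NP"}, "shot": {"V"}, "an": {"Det"}, "elephant": {"N"}}
binary = {("NP", "VP"): {"S"}, ("V", "NP"): {"VP"}, ("Det", "N"): {"NP"}}
print(cky_recognize("i shot an elephant".split(), lexical, binary))  # True
```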

Resolving Ambiguity
Rules in CFGs can be augmented with probabilities indicating how likely the RHS is given the LHS, in order to compute the likelihood of each parse tree given the grammar.

Assign probabilities to the different production rules, and use them to assign a probability to each parse tree.

Probabilistic CFGs are CFGs whose rules are augmented with probabilities.

Extending CFG to PCFG
In a CFG in CNF, each rule has the form X → YZ or X → y.
In a PCFG in CNF, each rule is augmented with a probability indicating how likely the RHS is given the LHS:

X → YZ [ p(YZ | X) ] or X → y [ p(y | X) ]

Difference between CFG and PCFG:
A CFG has the rules: a list of rules that defines the set of all well-formed sentences in a language.
A PCFG additionally adds a probability of how likely an RHS is given the LHS.
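
In NLTK this difference is literally a bracketed probability after each rule; a minimal sketch with made-up probabilities (NLTK checks that the probabilities per LHS sum to 1):

```python
import nltk

# Toy PCFG: each rule carries p(RHS | LHS) in brackets.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP      [1.0]
    NP  -> Det N      [0.7]
    NP  -> 'i'        [0.3]
    VP  -> V NP       [1.0]
    Det -> 'an'       [1.0]
    N   -> 'elephant' [1.0]
    V   -> 'shot'     [1.0]
""")
print(grammar)
```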

Prob of parse tree T
Product of probs of each rule used to expand non-terminals in the parse tree T
Given rules LHS_i → RHS_i, the probability of any parse tree over a sentence is defined as:

P(T, S) = ∏_i p(RHS_i | LHS_i)

with the product running over all rules used to expand the non-terminals in T.
The probabilities can be multiplied because rule expansions are independent of context and of each other (cf. the bag-of-words assumption; conditional independence).

Pick one with highest probability
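
A toy computation of the tree probability as a product of rule probabilities (the parse and the probabilities are made up, matching the sketches above):

```python
from math import prod

# Rules used in one hypothetical parse of "i shot an elephant",
# as (LHS, RHS, p(RHS | LHS)) triples.
rules_used = [
    ("S",   "NP VP",    1.0),
    ("NP",  "i",        0.3),
    ("VP",  "V NP",     1.0),
    ("V",   "shot",     1.0),
    ("NP",  "Det N",    0.7),
    ("Det", "an",       1.0),
    ("N",   "elephant", 1.0),
]

# P(T) = product of the probabilities of all rules used in T.
p_tree = prod(p for _, _, p in rules_used)
print(p_tree)  # 0.3 * 0.7 = 0.21 (up to float rounding)
```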

Joint prob of parse tree and sentence
Loop through all the rules used in the parse tree, multiplying p(RHS | LHS) for each.

Simplify a bit
p(Ti, s) represents:
- the joint probability of sentence s and parse tree Ti
- the probability of parse tree Ti alone, since
p(Ti, s) = p(s|Ti) p(Ti)
boils down to
p(Ti, s) = p(Ti)
as p(s|Ti) is 1, because a parse tree includes all the words in s.

p(s|Ti) is a constant (1), so it can be left out of the equation.

Argmax
Out of all parse trees that can parse s (s is called the yield of a parse tree T), the disambiguation algorithm picks the best parse tree T̂, the one with the highest probability given s:

T̂(s) = argmax_{T : yield(T) = s} p(T | s)
     = argmax_{T : yield(T) = s} p(T, s) / p(s)
     = argmax_{T : yield(T) = s} p(T, s)
     = argmax_{T : yield(T) = s} p(T)

with T such that yield(T) = s, since we don't care about parse trees with a different yield than the sentence we're parsing (p(s) is a constant for a fixed sentence, and p(T, s) = p(T) as shown above).

In the estimation we kick out the sentence term, because we don't care about parse trees with a different yield; those wouldn't produce meaningful results.
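
NLTK's ViterbiParser implements exactly this argmax via probabilistic CKY; a sketch reusing the same made-up toy PCFG as above:

```python
import nltk

# Same toy PCFG as in the earlier sketch (probabilities made up).
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP      [1.0]
    NP  -> Det N      [0.7]
    NP  -> 'i'        [0.3]
    VP  -> V NP       [1.0]
    Det -> 'an'       [1.0]
    N   -> 'elephant' [1.0]
    V   -> 'shot'     [1.0]
""")

# Probabilistic CKY: yields the highest-probability tree whose yield
# is exactly the input sentence.
parser = nltk.ViterbiParser(grammar)
for tree in parser.parse("i shot an elephant".split()):
    print(tree.prob(), tree)  # 0.3 * 0.7 = 0.21 (up to float rounding)
```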

Prob of a sentence
PCFGs assign probabilities to strings of words, i.e. sentences, just like a language model would.
If a sentence is unambiguous, its probability under the PCFG is the probability of its only parse tree with yield s.

If s is ambiguous, then its probability is the sum over the probabilities of all parse trees with yield s.
(Cf. HMMs: the forward algorithm instead of the decoding algorithm; all options put together instead of only the best one.)
Probability of a certain sentence → consider every interpretation.
Best one → compare those interpretations and take the one with the highest probability.
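
A sketch of the sum version as an inside-style probabilistic CKY in plain Python (the grammar encoding and probabilities are made up): summing at each cell gives the sentence probability, while taking the max instead would give the single best parse.

```python
from collections import defaultdict

def inside_cky(words, lexical, binary):
    """Inside-probability CKY for a toy PCFG in CNF (a sketch).
    lexical: {terminal: {A: p(A -> terminal)}}
    binary:  {(B, C): {A: p(A -> B C)}}
    Returns p(sentence): the SUM over all parse trees with this yield."""
    n = len(words)
    table = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        for A, p in lexical.get(w, {}).items():
            table[i][i + 1][A] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for B, pB in table[i][k].items():
                    for C, pC in table[k][j].items():
                        for A, p in binary.get((B, C), {}).items():
                            # Summing here = sentence probability
                            # (forward-style); taking the max instead
                            # would give the best parse (decoding-style).
                            table[i][j][A] += p * pB * pC
    return table[0][n]["S"]

lexical = {"i": {"NP": 0.3}, "shot": {"V": 1.0},
           "an": {"Det": 1.0}, "elephant": {"N": 1.0}}
binary = {("NP", "VP"): {"S": 1.0}, ("V", "NP"): {"VP": 1.0},
          ("Det", "N"): {"NP": 0.7}}
print(inside_cky("i shot an elephant".split(), lexical, binary))  # ≈ 0.21
```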

Cognitive science → studies have shown that people are sensitive to the kind of probabilities across constituents that a PCFG captures, but a Markov chain does not → garden path sentences.
"The horse raced past the barn fell" → "fell" doesn't fit in the parse tree built so far.
The parse tree has to be adjusted while analysing the sentence.
"The student forgot the solution was on the back of the book"
"The student forgot the solution" is a perfectly fine sentence on its own,
but the sentence isn't over, so you have to update the structure you had inferred for it.

People form structural expectations; if these are violated, there are processing costs.

AI → for a conversational agent that should be human-like, such processing costs should be incorporated.
If the agent should do better than humans, don't incorporate such costs.

Cf. n-gram statistics in language modelling.
