Computational Linguistics (800962-B-6)
2021/2022
Summary Final
Lecture 8-12
Written by Saskia Kriege
Lecture 8
Computational Linguistics Lecture 8 – Parsing
YOUTUBE – PCFGs
Probabilities assigned to rules
Each non-terminal has a number of possible expansions, whose probabilities sum to 1.
Each rule's probability is the conditional probability of that particular expansion given the non-terminal on the left-hand side.
Probability of the entire tree = product of the probabilities of the rules used in it.
A rule rewrites a non-terminal into a sequence of non-terminals or a single terminal.
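A minimal sketch in Python (my own illustration, not from the lecture; the grammar and probabilities are invented) of representing such a PCFG: rules grouped by their left-hand side, each expansion paired with a probability, plus a check that the probabilities per non-terminal sum to 1.

# Hypothetical toy PCFG: each non-terminal (LHS) maps to its possible
# expansions (RHS), each paired with a probability p(RHS | LHS).
pcfg = {
    "S":   [(("NP", "VP"), 1.0)],
    "NP":  [(("Det", "N"), 0.7), (("I",), 0.3)],
    "VP":  [(("V", "NP"), 0.6), (("VP", "PP"), 0.4)],
    "PP":  [(("P", "NP"), 1.0)],
    "Det": [(("an",), 0.5), (("my",), 0.5)],
    "N":   [(("elephant",), 0.5), (("pyjamas",), 0.5)],
    "V":   [(("shot",), 1.0)],
    "P":   [(("in",), 1.0)],
}

def check_probabilities(grammar, tol=1e-9):
    """Check that each non-terminal's expansion probabilities sum to 1."""
    for lhs, expansions in grammar.items():
        total = sum(prob for _, prob in expansions)
        if abs(total - 1.0) > tol:
            raise ValueError(f"Probabilities for {lhs} sum to {total}, not 1")

check_probabilities(pcfg)  # passes for this toy grammar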
PARSING
Probabilities, chunks, dependencies
PCFG = probabilistic context-free grammar
A PCFG not only tells whether a sentence is well-formed under the grammar, it also delivers the likeliest parse for the target sentence given the grammar.
From declarative knowledge (the grammar) to a procedure: given a sequence of words, we want to find the best structural analysis the grammar allows for it.
Statistical parsing uses a probabilistic model of syntax in order to assign probabilities to
each parse tree.
Resolving syntactic ambiguity
A PCFG differs from a CFG by augmenting each rule with a conditional probability.
Structural Ambiguity
When more than one parse tree can be assigned to a sentence based on the rules in the
grammar.
Attachment ambiguity and coordination ambiguity
Attachment ambiguity: what do we attach a phrase to?
'I shot an elephant in my pyjamas' →
'I' = NP (the subject)
'shot an elephant' = the verb and its object
What does 'in my pyjamas' attach to? To the subject 'I', or to 'an elephant' (the elephant was in my pyjamas when I shot it)?
Or to the whole VP, with 'elephant' as direct object and the prepositional phrase modifying this?
We need semantics/lexical information to disambiguate this (the meaning of 'elephant').
Or is one of those structures simply more likely in the language? Then we don't need meaning, only the likelihood of the structure in the language.
Coordination ambiguity: and (what does the coordination cover?)
'Tasty sandwiches and flowers make my grandma happy'
What does 'tasty' attach to?
Are both the sandwiches and the flowers tasty?
Or only the sandwiches → then two NPs are coordinated
Considering only structure, both are possible.
Flowers are not tasty, so we bring in world knowledge to disambiguate and prefer the attachment to sandwiches only.
More structures can be assigned to one sentence by the same grammar.
CFG → doesn't consider meaning, only structural coherence.
CKY can be used to check whether a sentence is well-formed under this grammar.
If we want to pick a specific parse, we need more:
to get to this level of structure and obtain the likeliest parse, we need some way to assign probabilities to the different options and pick the most likely one.
Parsing as Recognition
CKY only tells us whether a sentence is well-formed given the grammar, not which parse best captures the sentence structure (see the recognition sketch below).
Sometimes only one parse is possible, but how do we choose if two or more are valid?
PCFGs tell us which parse best captures the sentence structure.
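A minimal CKY recognition sketch in Python (my own illustration, with an invented toy grammar in CNF): it only answers whether the words can be derived from the start symbol, and says nothing about which parse is best.

# Toy CNF grammar: binary_rules maps (B, C) to the non-terminals that can
# expand to B C; unary_rules maps a terminal word to its pre-terminals.
binary_rules = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"):  {"VP"},
}
unary_rules = {
    "I": {"NP"}, "shot": {"V"}, "an": {"Det"}, "elephant": {"N"},
}

def cky_recognize(words, start="S"):
    n = len(words)
    # chart[i][j] = set of non-terminals that can derive words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(unary_rules.get(w, set()))
    for span in range(2, n + 1):                # width of the span
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):           # split point
                for B in chart[i][k]:
                    for C in chart[k][j]:
                        chart[i][j] |= binary_rules.get((B, C), set())
    return start in chart[0][n]

print(cky_recognize("I shot an elephant".split()))  # True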
Resolving Ambiguity
Rules in CFGs can be augmented with probabilities indicating how likely the RHS is given the LHS, in order to compute the likelihood of each parse tree given the grammar.
Probabilities are attached to the different production rules and used to assign a probability to each parse tree.
Probabilistic CFGs (PCFGs) are CFGs whose rules are augmented with probabilities.
Extending CFG to PCFG
In a CFG in CNF, each rule has the form X → Y Z or X → y.
In a PCFG in CNF, each rule is augmented with a probability indicating how likely the RHS is given the LHS:
X → Y Z [ p(Y Z | X) ] or X → y [ p(y | X) ]
Difference between CFG and PCFG:
A CFG has the rules: a list of rules that defines the set of all well-formed sentences in a language.
A PCFG also adds a probability of how likely each RHS is given its LHS.
Prob of parse tree T
= product of the probabilities of each rule used to expand the non-terminals in the parse tree T.
Given rules LHS → RHS, the probability of any parse tree over a sentence is defined as
P(T, S) = ∏_i p(RHS_i | LHS_i), the product over all rules i used in T.
The probabilities can be multiplied because each rule application is assumed to be independent of context and of the other rule applications (conditional independence, in the same spirit as the bag-of-words assumption).
Pick the parse tree with the highest probability.
This is the joint probability of the parse tree and the sentence:
for each candidate parse tree, multiply the probabilities of its rules (RHS given LHS).
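A small sketch of that product (my own illustration, with invented rule probabilities): the tree is written as nested tuples and its probability is the product of the probabilities of the rules it uses.

# Hypothetical rule probabilities p(RHS | LHS), keyed by (LHS, RHS).
rule_prob = {
    ("S",   ("NP", "VP")):  1.0,
    ("NP",  ("I",)):        0.3,
    ("NP",  ("Det", "N")):  0.7,
    ("VP",  ("V", "NP")):   0.6,
    ("V",   ("shot",)):     1.0,
    ("Det", ("an",)):       0.5,
    ("N",   ("elephant",)): 0.5,
}

def tree_prob(tree):
    """Probability of a parse tree = product of the probabilities of its rules.

    A tree is a nested tuple (label, child1, child2, ...); a leaf is a plain string.
    """
    if isinstance(tree, str):              # a terminal word adds no rule of its own
        return 1.0
    lhs, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    p = rule_prob[(lhs, rhs)]              # p(RHS | LHS) for this expansion
    for child in children:
        p *= tree_prob(child)
    return p

tree = ("S",
        ("NP", "I"),
        ("VP", ("V", "shot"), ("NP", ("Det", "an"), ("N", "elephant"))))
print(tree_prob(tree))  # 1.0 * 0.3 * 0.6 * 1.0 * 0.7 * 0.5 * 0.5 = 0.0315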
Simplify a bit
p(Ti, s) represents:
- the joint probability of sentence s and parse tree Ti
- which equals the probability of parse tree Ti alone, since
p(Ti, s) = p(s | Ti) p(Ti)
boils down to
p(Ti, s) = p(Ti)
because p(s | Ti) = 1: a parse tree includes all the words of s.
Since p(s | Ti) is a constant (1), it can be left out of the equation.
Argmax
Out of all parse trees that can parse s (s is called the yield of a parse tree T), the disambiguation algorithm picks the best parse tree T^ with the highest probability given s:
T^ = argmax over T with yield(T) = s of p(T | s) = argmax p(T, s) = argmax p(T)
We restrict T to yield(T) = s, since we don't care about parse trees with a different yield than the sentence we're parsing.
In the maximization we can drop the sentence, because we only compare parse trees whose yield is s; parse trees with a different yield won't produce meaningful results.
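As a hedged illustration of this argmax in practice (assuming the NLTK library is available; the grammar and its probabilities are invented for the pyjamas example above), NLTK's ViterbiParser returns the most probable parse of a sentence under a PCFG.

import nltk

# Toy PCFG for the attachment-ambiguity example; probabilities are invented,
# and each non-terminal's rule probabilities sum to 1.
grammar = nltk.PCFG.fromstring("""
    S   -> NP VP        [1.0]
    VP  -> V NP         [0.6]
    VP  -> VP PP        [0.4]
    PP  -> P NP         [1.0]
    NP  -> Det N        [0.5]
    NP  -> NP PP        [0.2]
    NP  -> 'I'          [0.3]
    Det -> 'an'         [0.5]
    Det -> 'my'         [0.5]
    N   -> 'elephant'   [0.5]
    N   -> 'pyjamas'    [0.5]
    V   -> 'shot'       [1.0]
    P   -> 'in'         [1.0]
""")

parser = nltk.ViterbiParser(grammar)   # dynamic programming over the PCFG
sentence = "I shot an elephant in my pyjamas".split()
for tree in parser.parse(sentence):    # yields the highest-probability tree
    print(tree.prob())
    tree.pretty_print()

Under these (made-up) probabilities the VP attachment wins, i.e. the prepositional phrase modifies the shooting rather than the elephant.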
Prob of a sentence
PCFGs assign probabilities to strings of words, i.e. sentences, just like a language model would.
If sentence is unambiguous, its probability under the PCFG is the probability of its only parse
tree with yield s
If s is ambiguous, then its probability is the sum over the probs of all parse trees with yield s
This is like the forward algorithm instead of the decoding algorithm: all options put together versus the best option.
Probability of a certain sentence → consider every interpretation and sum them.
Best one → compare the interpretations and take the one with the highest probability.
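A tiny sketch of the two uses, reusing the tree_prob function from the earlier sketch and assuming candidate_trees holds every parse tree whose yield is the sentence s:

def sentence_prob(candidate_trees):
    """p(s) = sum of p(T) over all parse trees T with yield(T) = s (sum over options)."""
    return sum(tree_prob(t) for t in candidate_trees)

def best_parse(candidate_trees):
    """T^ = argmax p(T) over all parse trees T with yield(T) = s (best option)."""
    return max(candidate_trees, key=tree_prob)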
Cognitive Science → studies have shown that people are sensitive to the kind of probabilities across constituents that a PCFG captures, but that a Markov chain does not → garden path sentences.
'The horse raced past the barn fell' → 'fell' doesn't fit into the parse tree built so far.
The parse tree has to be adjusted while analysing the sentence.
'The student forgot the solution was on the back of the book'
'The student forgot the solution' is a perfectly fine sentence on its own,
but the sentence is not over, so you have to update the structure you inferred for it.
People form structural expectations; if these are violated, there are processing costs.
AI → for a conversational agent that is human-like, such costs should be incorporated.
If it should be better than humans, don't incorporate such costs.
Compare: n-gram statistics as used in language modelling.