AI P&T reading material second half
Chapter 8: Reasoning with Uncertainty
8.1: Probability
When an agent makes decisions and is uncertain about the outcomes of its actions, it is gambling on
the outcomes. The view of probability as a measure of belief is known as Bayesian
probability or subjective probability. We are assuming that the uncertainty is epistemological –
pertaining to an agent’s beliefs of the world – rather than ontological – how the world is.
Each variable has a domain which is the set of values that the variable can take. A discrete
variable has a domain that is a finite or countable set. A primitive proposition is an assignment of a
value to a variable, or an inequality between variables and values or between variables
(e.g., A=true, X<7 or Y>Z). Propositions are built from primitive propositions using logical connectives.
If X is a random variable, a probability distribution, P(X), over X is a function from the domain
of X into the real numbers such that, given a value x∈domain(X), P(x) is the probability of the
proposition X=x. The distribution over all worlds, P(X1,…,Xn) is called the joint probability
distribution.
Axioms for probability
- 0 ≤ P (α ) for any proposition α. That is, the belief in any proposition cannot be negative
- P ( τ )=1 if τ is a tautology. That is, if τ is true in all possible worlds, its probability is 1
- P(α ∨ β) = P(α) + P(β) if α and β are mutually exclusive propositions; that is, if ¬(α ∧ β) is a tautology. In other words, if two propositions cannot both be true, the probability of their disjunction is the sum of their probabilities.
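For example (an illustrative derivation, not stated above): α and ¬α are mutually exclusive and α ∨ ¬α is a tautology, so P(α) + P(¬α) = P(α ∨ ¬α) = 1, which gives P(¬α) = 1 − P(α).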
The measure of belief in proposition ℎ given proposition e is called the conditional
probability of ℎ given e, written P(h∣e). A proposition e representing the conjunction of all of the
agent’s observations of the world is called evidence. Given evidence e, the conditional
probability P(h∣e) is the agent’s posterior probability of ℎ. The probability P(h) is the prior
probability of ℎ and is the same as P(h∣true) because it is the probability before the agent has
observed anything.
Evidence e, where e is a proposition, will rule out all possible worlds that are incompatible with e. The conditional probability P(h | e) is only defined if P(e) > 0.
Chain rule: P(α1 ∧ α2 ∧ … ∧ αn) = ∏_{i=1}^{n} P(αi | α1 ∧ … ∧ αi−1)
Bayes’ rule: P(h | e) = P(e | h) ∗ P(h) / P(e)
- P(e | h) is the likelihood
- P(h) is the prior probability of the hypothesis h
- P(h | e) is the posterior probability of the hypothesis h
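To make the rule concrete, here is a minimal Python sketch; the prior of 0.01 and the likelihoods 0.9 and 0.2 are made-up numbers for illustration, not values from the text.

prior_h = 0.01                   # P(h)
likelihood_e_given_h = 0.9       # P(e | h)
likelihood_e_given_not_h = 0.2   # P(e | not h)

# P(e) by reasoning by cases on h
p_e = likelihood_e_given_h * prior_h + likelihood_e_given_not_h * (1 - prior_h)

# Bayes' rule: P(h | e) = P(e | h) * P(h) / P(e)
posterior_h = likelihood_e_given_h * prior_h / p_e
print(posterior_h)   # about 0.043: the evidence raises belief in h from 0.01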
The expected value of f, written E_P(f), with respect to probability P is E_P(f) = ∑_{w∈Ω} f(w) ∗ P(w).
Information content / entropy of random variable X:
H(X) = ∑_{x∈domain(X)} −P(X=x) ∗ log2 P(X=x)
Conditional entropy:
H(X | Y) = ∑_y P(Y=y) ∗ ∑_x −P(X=x | Y=y) ∗ log2 P(X=x | Y=y)
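A small Python sketch of these two formulas; the distributions used are illustrative assumptions, not from the text.

import math

def entropy(dist):
    # H(X) = sum over x of -P(X=x) * log2 P(X=x), for dist = {value: probability}
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)

def conditional_entropy(p_y, p_x_given_y):
    # H(X | Y) = sum over y of P(Y=y) * H(X | Y=y)
    return sum(p_y[y] * entropy(p_x_given_y[y]) for y in p_y)

# Illustrative numbers:
print(entropy({"a": 0.5, "b": 0.5}))   # 1.0 bit: a fair binary variable
print(conditional_entropy(
    {"t": 0.5, "f": 0.5},
    {"t": {"a": 0.9, "b": 0.1}, "f": {"a": 0.1, "b": 0.9}},
))   # about 0.47 bits: observing Y reduces the uncertainty about X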
8.2: Independence
A useful way to limit the amount of information required is to assume that each variable only directly
depends on a few other variables.
Random variable X is conditionally independent of random variable Y given a set of random
variables Zs if P(X | Y, Zs) = P(X | Zs).
Variables X and Y are unconditionally independent if P(X,Y)=P(X)P(Y).
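The unconditional case can be checked directly from a joint table, as in this Python sketch; the joint distribution below is an illustrative assumption, not from the text.

from itertools import product

# An illustrative joint distribution over two variables X and Y
joint = {("x0", "y0"): 0.06, ("x0", "y1"): 0.14,
         ("x1", "y0"): 0.24, ("x1", "y1"): 0.56}

p_x, p_y = {}, {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, 0) + p   # marginal P(X=x)
    p_y[y] = p_y.get(y, 0) + p   # marginal P(Y=y)

# X and Y are independent iff P(X=x, Y=y) = P(X=x) * P(Y=y) for every pair
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-9
                  for x, y in product(p_x, p_y))
print(independent)   # True for this particular table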
8.3: Belief networks
A belief network is a directed model of conditional dependence among a set of random variables.
Define the parents of random variable Xi, written parents(Xi), to be a minimal set of predecessors
of Xi in the total ordering such that the other predecessors of Xi are conditionally independent
of Xi given parents(Xi). Thus Xi probabilistically depends on each of its parents, but is independent of
its other predecessors. That is, parents(Xi)⊆{X1,…,Xi−1} such that
P(Xi | X1, …, Xi−1) = P(Xi | parents(Xi)). This gives the chain rule:
P(X1, X2, …, Xn) = ∏_{i=1}^{n} P(Xi | parents(Xi))
The probability over all of the variables, P(X1,X2,…,Xn), is called the joint probability distribution. A
belief network defines a factorization of the joint probability distribution into a product of
conditional probabilities.
Thus, a belief network/ Bayesian network consists of
- a DAG, where each node is labelled by a random variable
- a domain for each random variable
- a set of conditional probability distributions giving P(X∣parents(X)) for each variable X.
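The following Python sketch shows one way such a network might be represented and how the factorization recovers a joint probability; the Tampering/Fire/Alarm structure and all of the numbers are illustrative assumptions, not values from the text.

# Hypothetical DAG: Tampering and Fire have no parents, Alarm depends on both
parents = {"Tampering": [], "Fire": [], "Alarm": ["Tampering", "Fire"]}

# Conditional probability tables: assignment of the parents -> P(variable = True)
cpt = {
    "Tampering": {(): 0.02},
    "Fire": {(): 0.01},
    "Alarm": {(True, True): 0.5, (True, False): 0.85,
              (False, True): 0.99, (False, False): 0.0001},
}

def prob(var, value, assignment):
    # P(var = value | parents(var)) under a full assignment of values to variables
    p_true = cpt[var][tuple(assignment[p] for p in parents[var])]
    return p_true if value else 1 - p_true

def joint(assignment):
    # Chain rule: P(X1, ..., Xn) = product over i of P(Xi | parents(Xi))
    result = 1.0
    for var, value in assignment.items():
        result *= prob(var, value, assignment)
    return result

print(joint({"Tampering": False, "Fire": True, "Alarm": True}))
# = P(Tampering=False) * P(Fire=True) * P(Alarm=True | Tampering=False, Fire=True)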
The most common probabilistic inference task is to compute the posterior distribution of a query
variable, or variables, given some evidence, where the evidence is a conjunction of assignments of
values to some of the variables.
To represent a domain in a belief network, the designer of a network must consider the following
questions:
- What are the relevant variables?
o what the agent may observe in the domain.
o what information the agent is interested in knowing the posterior probability of.
o other hidden variables or latent variables that will not be observed or queried but
make the model simpler.
- What values should these variables take?
o level of detail at which the agent should reason to answer the sorts of queries that
will be encountered.
- What is the relationship between the variables?
o add arcs to the graph
- How does the distribution of a variable depend on its parents?
o conditional probability distributions
Belief networks have often been called causal networks and provide a representation of causality that
takes noise and probabilities into account.
8.4: Probabilistic inference
The main approaches for probabilistic inference in belief networks are:
- Exact inference
o probabilities are computed exactly.
- Approximate inference
o probabilities are only approximated. These methods are characterized by the different guarantees they provide:
▪ Guaranteed bounds on the probabilities: they return a range [l, u], where the exact probability p is guaranteed to satisfy l ≤ p ≤ u.
▪ Probabilistic bounds on the error produced: such algorithms might guarantee that, for example, the error is within 0.1 of the correct answer 95% of the time.
▪ Best effort: they could make a best effort to produce an approximation that may be good enough, even though there may be cases where they do not work very well. One such class of techniques is called variational inference, where the idea is to find an approximation to the problem that is easy to compute.
The variable elimination (VE) algorithm can be adapted to find the posterior distribution for a
variable in a belief network given conjunctive evidence. The algorithm is based on the notion that a
belief network specifies a factorization of the joint probability distribution. A function of variables is
called a factor. The VE algorithm for belief networks manipulates factors to compute posterior
probabilities.
A factor is a function from a set of random variables into a number. A factor f on variables X1,…,Xj is
written as f(X1,…,Xj). The variables X1,…,Xj are the variables of the factor f, and f is a factor on X1,
…,Xj.
Some of the variables of a factor can be assigned to values to make a new factor on the other
variables. This operation is called conditioning on the values of the variables that are assigned.
Factors can be multiplied together. Suppose f1 and f2 are factors, where f1 is a factor that contains
variables X1,…,Xi and Y1,…,Yj, and f2 is a factor with variables Y1,…,Yj and Z1,…,Zk, where Y1,…,Yj are
the variables common to f1 and f2. The product of f1 and f2, written f1∗f2, is a factor on the union
of the variables, namely X1,…,Xi,Y1,…,Yj,Z1,…,Zk, defined by:
(f1∗f2)(X1, …, Xi, Y1, …, Yj, Z1, …, Zk) = f1(X1, …, Xi, Y1, …, Yj) ∗ f2(Y1, …, Yj, Z1, …, Zk)
The remaining operation is to sum out a variable in a factor. Given factor f(X1,…,Xj), summing out a
variable, say X1, results in a factor on the other variables, X2,…,Xj, defined by
(∑_{X1} f)(X2, …, Xj) = f(X1=v1, X2, …, Xj) + ⋯ + f(X1=vk, X2, …, Xj), where {v1, …, vk} is the domain of X1.
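As one way to picture these three operations, here is a minimal Python sketch of factors as tables; the class name and representation are assumptions for illustration, not the book’s code.

class Factor:
    # A factor as a table: maps each assignment (a tuple of values, in the order
    # of self.variables) to a number.
    def __init__(self, variables, table):
        self.variables = variables
        self.table = table

    def condition(self, var, value):
        # Assign var = value, giving a factor on the remaining variables
        i = self.variables.index(var)
        new_vars = self.variables[:i] + self.variables[i+1:]
        new_table = {k[:i] + k[i+1:]: v for k, v in self.table.items() if k[i] == value}
        return Factor(new_vars, new_table)

    def multiply(self, other):
        # Pointwise product on the union of the two factors' variables
        shared = [v for v in self.variables if v in other.variables]
        new_vars = self.variables + [v for v in other.variables if v not in shared]
        new_table = {}
        for k1, v1 in self.table.items():
            a1 = dict(zip(self.variables, k1))
            for k2, v2 in other.table.items():
                a2 = dict(zip(other.variables, k2))
                if all(a1[v] == a2[v] for v in shared):   # rows agree on shared variables
                    key = k1 + tuple(a2[v] for v in other.variables if v not in shared)
                    new_table[key] = v1 * v2
        return Factor(new_vars, new_table)

    def sum_out(self, var):
        # Sum over the values of var, giving a factor on the other variables
        i = self.variables.index(var)
        new_vars = self.variables[:i] + self.variables[i+1:]
        new_table = {}
        for k, v in self.table.items():
            new_table[k[:i] + k[i+1:]] = new_table.get(k[:i] + k[i+1:], 0) + v
        return Factor(new_vars, new_table)

For example, multiplying a factor on (A, B) with a factor on (B, C) gives a factor on (A, B, C), and summing out B from that product leaves a factor on (A, C).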
Given evidence Y1=v1, …, Yj=vj, and query variable or variables Q, the problem of computing the
posterior distribution on Q can be reduced to the problem of computing the probability of
conjunctions:
P(Q | Y1=v1, …, Yj=vj) = P(Q, Y1=v1, …, Yj=vj) / ∑_Q P(Q, Y1=v1, …, Yj=vj)
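A minimal Python sketch of this last normalization step; the unnormalized values below are illustrative assumptions, not from the text.

# Unnormalized values P(Q=q, Y1=v1, ..., Yj=vj) for each value q of the query variable
unnormalized = {"q0": 0.0007, "q1": 0.0021}

total = sum(unnormalized.values())               # = P(Y1=v1, ..., Yj=vj)
posterior = {q: p / total for q, p in unnormalized.items()}
print(posterior)   # {'q0': 0.25, 'q1': 0.75}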