Week 1
The sample space S is the set of all possible outcomes of one random experiment. The possible outcomes
depend on your chosen experiment. An event A is a subset of the sample space S, and we say that A
occurred if the actual outcome is in A.
Probability maps random events to numbers in the range [0, 1]. In essence, probability is a function: its codomain is [0, 1], and its domain is the set of events, a collection of subsets of S denoted {Aᵢ ⊆ S, i ∈ I}.
Let A be an event for an experiment with a finite sample space S in which each outcome is equally likely. Then the naive definition of probability is:

\[ P_{\text{Naive}}(A) = \frac{\text{number of outcomes favorable to } A}{\text{number of outcomes in } S} = \frac{|A|}{|S|} \]
The naive definition is applicable when there is symmetry in the problem that makes outcomes equally likely (e.g., a deck of cards); when the outcomes are equally likely by design (e.g., a survey); or when the naive definition serves as a useful null model (we assume it applies, just to see what predictions it would yield).
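As a concrete check of the naive definition, the sketch below counts hands of cards with Python's math.comb (a minimal illustration; the at-least-one-ace event is an assumed example, not taken from these notes):

```python
from math import comb

# Naive probability: P(A) = |A| / |S| when all outcomes are equally likely.
# Example event A: a 5-card hand from a standard 52-card deck contains
# at least one ace.
total_hands = comb(52, 5)    # |S|: every 5-card hand is equally likely
no_ace_hands = comb(48, 5)   # hands built entirely from the 48 non-aces
p_at_least_one_ace = 1 - no_ace_hands / total_hands
print(p_at_least_one_ace)    # ≈ 0.341
```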
Some properties of events:
• Commutative laws:
A ∩ B = B ∩ A, A ∪ B = B ∪ A
• Associative laws:
(A ∪ B) ∪ C = A ∪ (B ∪ C), (A ∩ B) ∩ C = A ∩ (B ∩ C)
• Distributive laws:
(A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C), (A ∩ B) ∪ C = (A ∪ C) ∩ (B ∪ C)
• De Morgan’s laws (checked in the sketch after this list):
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ, (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
– Here, Aᶜ and Bᶜ are the complements of A and B.
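Since events are just sets, these laws can be sanity-checked directly with Python's built-in sets (a small sketch; the particular sets are arbitrary):

```python
# Verify De Morgan's laws on a small sample space S.
S = set(range(10))
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(E):
    return S - E   # E^c relative to the sample space S

assert complement(A | B) == complement(A) & complement(B)  # (A ∪ B)^c = A^c ∩ B^c
assert complement(A & B) == complement(A) | complement(B)  # (A ∩ B)^c = A^c ∪ B^c
```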
In some problems, we can count the number of possibilities using the multiplication rule; both sampling with and without replacement (below) are applications of it. Consider a compound experiment consisting of two sub-experiments, A and B. Suppose that A has a possible outcomes, and for each of these outcomes B has b possible outcomes. Then the compound experiment has ab possible outcomes.
Consider n objects and making k choices from them, one at a time with replacement. Then there are nᵏ possible outcomes. Here, order matters.
Consider n objects and making k choices from them, one at a time without replacement. Then there are n(n − 1) ··· (n − k + 1) possible outcomes for 1 ≤ k ≤ n, and 0 possibilities for k > n. Here, order matters.
The factorial n! counts the number of different orders in which you can arrange n distinct items; this is choosing all n items without replacement, where order matters.
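These ordered-sampling counts are easy to confirm by brute-force enumeration with itertools (a sketch; assumes Python 3.8+ for math.perm):

```python
from itertools import product, permutations
from math import perm, factorial

n, k = 5, 3
items = range(n)

# With replacement, order matters: n^k outcomes.
assert len(list(product(items, repeat=k))) == n ** k

# Without replacement, order matters: n(n-1)...(n-k+1) = n!/(n-k)! outcomes.
assert len(list(permutations(items, k))) == perm(n, k)

# Arranging all n distinct items: n! orders.
assert len(list(permutations(items))) == factorial(n)
```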
The binomial coefficient \binom{n}{k} counts the number of ways to choose a subset of a given size from a larger set, where order is ignored: the number of ways to choose k objects out of n.

\[ \binom{n}{k} = \frac{n!}{k!\,(n-k)!} \]
The quantity n!/(n − k)! counts how many different ordered sequences of k objects you can make out of n.
To choose k times from a set of n objects with replacement, where order does not matter, there are \binom{n+k-1}{k} ways. However, this result should not be used in the naive definition of probability except in very special circumstances, because the resulting multisets are generally not equally likely.
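The two unordered counts can be checked the same way (a sketch; math.comb requires Python 3.8+):

```python
from itertools import combinations, combinations_with_replacement
from math import comb

n, k = 5, 3
items = range(n)

# Without replacement, order ignored: C(n, k) subsets of size k.
assert len(list(combinations(items, k))) == comb(n, k)

# With replacement, order ignored: C(n+k-1, k) multisets of size k.
assert len(list(combinations_with_replacement(items, k))) == comb(n + k - 1, k)
```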
A story proof is a proof by interpretation. For counting problems, this often means counting the same
thing in two different ways.
Some properties:
\[ \binom{n}{k} = \binom{n}{n-k} \]

\[ n \binom{n-1}{k-1} = k \binom{n}{k} \]

\[ \binom{m+n}{k} = \sum_{j=0}^{k} \binom{m}{j} \binom{n}{k-j} \qquad \text{(Vandermonde's identity)} \]

\[ (x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k} \qquad \text{(binomial theorem)} \]

\[ \sum_{k=0}^{n} k^2 \binom{n}{k}^2 = n^2 \binom{2(n-1)}{n-1} \]
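All five identities can be verified numerically for small m, n, k (a quick sketch; the chosen values are arbitrary):

```python
from math import comb

m, n, k = 4, 6, 5

# Symmetry: choosing k to include equals choosing n-k to exclude.
assert comb(n, k) == comb(n, n - k)

# "Team captain": pick a captain then the rest, or the team then its captain.
assert n * comb(n - 1, k - 1) == k * comb(n, k)

# Vandermonde: split the k chosen objects between the two groups.
assert comb(m + n, k) == sum(comb(m, j) * comb(n, k - j) for j in range(k + 1))

# Binomial theorem, evaluated at x = 2, y = 3.
x, y = 2, 3
assert (x + y) ** n == sum(comb(n, i) * x**i * y**(n - i) for i in range(n + 1))

# Sum of k^2 C(n,k)^2 equals n^2 C(2(n-1), n-1).
assert sum(i**2 * comb(n, i)**2 for i in range(n + 1)) == n**2 * comb(2 * (n - 1), n - 1)
```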
General definition of probability: a probability space consists of a sample space S and a probability
function P which takes an event A ⊆ S as input and returns P(A), a real number between 0 and 1, as
output. The function P must satisfy the following axioms:
1. P(∅) = 0, P(S) = 1
2. If A₁, A₂, … are disjoint events, then

\[ P\Big(\bigcup_{j=1}^{\infty} A_j\Big) = \sum_{j=1}^{\infty} P(A_j) \]
Probability has the following properties, for any events A and B:
1. P(Aᶜ) = 1 − P(A)
2. If A ⊆ B, then P(A) ≤ P(B)
3. P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Inclusion-exclusion principle: for any events A₁, …, Aₙ,

\[ P\Big(\bigcup_{i=1}^{n} A_i\Big) = \sum_{i=1}^{n} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \sum_{i<j<k} P(A_i \cap A_j \cap A_k) - \cdots + (-1)^{n+1} P(A_1 \cap \cdots \cap A_n) \]
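A classic application is de Montmort's matching problem (an assumed example, not covered above): n labeled cards are shuffled, Aᵢ is the event that card i lands in position i, and inclusion-exclusion collapses to an alternating series:

```python
from math import factorial, exp

# Each j-fold intersection has probability (n-j)!/n!, and there are C(n, j)
# of them, so inclusion-exclusion gives
#   P(A_1 ∪ ... ∪ A_n) = sum_{j=1}^{n} (-1)^(j+1) / j!
def p_at_least_one_match(n):
    return sum((-1) ** (j + 1) / factorial(j) for j in range(1, n + 1))

print(p_at_least_one_match(10))  # ≈ 0.6321
print(1 - exp(-1))               # limit 1 - 1/e, already accurate for small n
```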
Week 2
Conditional probability: if A and B are events with P(B) > 0, then the conditional probability of A given B, denoted by P(A|B), is defined as

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]
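The definition can be applied directly by enumerating a finite sample space (a minimal sketch; the two-dice event is an assumed example):

```python
from itertools import product
from fractions import Fraction

# Two fair dice: compute P(sum = 8 | first die = 3) from the definition.
S = list(product(range(1, 7), repeat=2))        # 36 equally likely outcomes
A = {s for s in S if s[0] + s[1] == 8}          # event: the sum is 8
B = {s for s in S if s[0] == 3}                 # event: first die shows 3

def p(E):
    return Fraction(len(E), len(S))             # naive probability

print(p(A & B) / p(B))                          # P(A|B) = (1/36)/(6/36) = 1/6
```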
Properties of conditional probability:
• Conditional probability is a probability, as it satisfies:
– P(S|B) = 1 and P(∅|B) = 0
– P(∪ᵢ Aᵢ | B) = Σᵢ P(Aᵢ|B) for Aᵢ ∩ Aⱼ = ∅ (for all i ≠ j)
• In general, P(A|B) ≠ P(B|A)
• Bayes’ Rule: suppose P(A), P(B) > 0; then P(A|B) = P(B|A)P(A) / P(B)
• Law of Total Probability (LOTP): let disjoint Aᵢ’s be a partition of the sample space S, such that ∪ᵢ Aᵢ = S and P(Aᵢ) > 0; then P(B) = Σᵢ P(B|Aᵢ)P(Aᵢ)
• Bayes’ Rule + LOTP (see the sketch after this list):

\[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)} \]
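The last formula is the usual recipe for diagnostic-test problems. A minimal sketch (all numbers are illustrative assumptions, not from the notes):

```python
# A: patient has the disease; B: the test is positive.
p_A = 0.01                  # prevalence P(A)
p_B_given_A = 0.95          # sensitivity P(B|A)
p_B_given_Ac = 0.05         # false-positive rate P(B|A^c)

# LOTP: P(B) = P(B|A)P(A) + P(B|A^c)P(A^c)
p_B = p_B_given_A * p_A + p_B_given_Ac * (1 - p_A)

# Bayes' Rule: P(A|B) = P(B|A)P(A) / P(B)
print(p_B_given_A * p_A / p_B)   # ≈ 0.161: most positives are false positives
```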
The empty set and the sample space are always independent of any other event. Events A and B are independent if ’additional’ information about B does not change the probability of event A.
A and B are independent if
P (A ∩ B) = P (A)P (B),
which implies that P (A|B) = P (A) (if P (B) > 0) and P (B|A) = P (B) (if P (A) > 0).
A and B are conditionally independent given C if
P (A ∩ B|C) = P (A|C)P (B|C),
which implies that P (A|C, B) = P (A|C) (if P (B|C) > 0) and P (B|C, A) = P (B|C) (if P (A|C) > 0) where
P (A|C, B) = P (A|C ∩ B).
Conditional independence does not imply independence, and vice versa (see the sketch after the notation list below). Conditional independence given E does not imply conditional independence given Eᶜ. In practice, we sometimes use the following notations:
• P(A₁, A₂, …, Aₙ) = P(A₁ ∩ A₂ ∩ ··· ∩ Aₙ) = P(∩ᵢ Aᵢ)
• P(C|A₁, A₂, …, Aₙ) = P(C|A₁ ∩ A₂ ∩ ··· ∩ Aₙ) = P(C| ∩ᵢ Aᵢ)
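The sketch below (an assumed example) shows conditional independence without independence: two flips of a randomly chosen coin, fair or biased, are independent given the coin, yet dependent unconditionally, because the first flip carries information about which coin was chosen:

```python
from fractions import Fraction

# C: coin choice (fair with P(H) = 1/2, or biased with P(H) = 3/4), each
# picked with probability 1/2.  A: first flip is heads; B: second is heads.
half, p_fair, p_bias = Fraction(1, 2), Fraction(1, 2), Fraction(3, 4)

p_A = half * p_fair + half * p_bias         # LOTP over the coin choice
p_B = p_A                                   # by symmetry
p_AB = half * p_fair**2 + half * p_bias**2  # flips independent given the coin

print(p_AB == p_A * p_B)                    # False: A and B are dependent
print(p_AB, p_A * p_B)                      # 13/32 vs 25/64
```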
A, B and C are independent if
• Pairwise independence:
– P (A ∩ B) = P (A)P (B), P (A ∩ C) = P (A)P (C), P (C ∩ B) = P (C)P (B)
– Pairwise independence does not necessarily imply independence (see the sketch below)
• and additionally:
– P (A ∩ B ∩ C) = P (A)P (B)P (C)
For n events to be independent, the above definition extends: the product condition must hold not only for each pair and triplet, but also for each quadruplet, quintuplet, and so on.
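The standard counterexample for pairwise-but-not-mutual independence uses two fair coin flips (a sketch; the XOR event C is the usual textbook choice):

```python
from itertools import product
from fractions import Fraction

# A: first flip heads; B: second flip heads; C: the two flips differ.
S = list(product("HT", repeat=2))           # 4 equally likely outcomes
def p(E):
    return Fraction(len(E), len(S))

A = {s for s in S if s[0] == "H"}
B = {s for s in S if s[1] == "H"}
C = {s for s in S if s[0] != s[1]}

assert p(A & B) == p(A) * p(B)              # pairwise independent
assert p(A & C) == p(A) * p(C)
assert p(B & C) == p(B) * p(C)
print(p(A & B & C) == p(A) * p(B) * p(C))   # False: A ∩ B ∩ C is empty
```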
Some more properties: