Probability Theory
Freeke Boerrigter
Chapter 1 & 2 – Probability and Counting
Naïve definition of probability: Pnaïve(A) = |A| / |S|
A – an event (a subset of S); |A| is the number of outcomes in A
S – sample space, the set of all possible outcomes; |S| is the number of outcomes in S
Ac – the complement of A, the event that A does not occur
Pnaïve(Ac) = 1 – Pnaïve(A)
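As a minimal sketch of the naïve definition, assuming a fair six-sided die as a toy example, the probabilities can be computed in Python by counting outcomes:

from fractions import Fraction

# Rolling a fair six-sided die: S = sample space, A = "the roll is even"
S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}

p_A = Fraction(len(A), len(S))   # Pnaive(A) = |A| / |S| = 1/2
p_Ac = 1 - p_A                   # Pnaive(Ac) = 1 - Pnaive(A) = 1/2
print(p_A, p_Ac)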
The binomial coefficient formula: (n choose k) = n! / ((n − k)! k!), which counts the number of ways to pick k items out of a total set of n.
Choosing the complement: for any nonnegative integers n and k with k ≤ n we have (n choose k) = (n choose n − k).
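Both facts are easy to check numerically; a small sketch using the standard-library math.comb:

import math

n, k = 10, 3
# (n choose k) = n! / ((n - k)! k!)
assert math.comb(n, k) == math.factorial(n) // (math.factorial(n - k) * math.factorial(k))
# Choosing the complement: (n choose k) = (n choose n - k)
assert math.comb(n, k) == math.comb(n, n - k)
print(math.comb(n, k))   # 120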
Non-naïve definition of probability – a probability space consists of a sample
space S and a probability function P which takes an event A ⊆ S as input and
returns P(A), a real number between 0 and 1, as output. The function P must
satisfy the following axioms:
1. P(∅) = 0 and P(S) = 1
2. If A1, A2, … are disjoint events, then P(A1 ∪ A2 ∪ …) = P(A1) + P(A2) + … -> the probability of a union of disjoint events is the sum of their individual probabilities
Multiplication Rule -> if A and B are independent (the occurrence of one does not change the probability of the other), then P(A ∩ B) = P(A)P(B)
Some properties of probability:
P(Ac) = 1 – P(A)
If A⊆ B, then P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
P(S) = P(A) + P(Ac) = 1
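These properties can all be verified by direct counting; a minimal sketch, again assuming the naïve definition on a fair die:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die

def P(event):
    # Naive probability: |event| / |S|
    return Fraction(len(event), len(S))

# Note: in Python, | is set union and & is set intersection
A, B = {1, 2, 3}, {3, 4}
assert P(set()) == 0 and P(S) == 1            # axiom 1
assert P(S - A) == 1 - P(A)                   # complement rule
assert A <= A | B and P(A) <= P(A | B)        # if A ⊆ B, then P(A) ≤ P(B)
assert P(A | B) == P(A) + P(B) - P(A & B)     # inclusion-exclusion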
Chapter 2 – Conditional Probability
Two events are dependent if knowing that one of them occurred changes the probability of the other.
The Conditional Probability of A given B is P(A|B) = P(A ∩ B) / P(B), where P(B) > 0.
We call P(A) the prior probability of A and P(A|B) the posterior probability of A.
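As a toy illustration (fair-die example assumed), the conditional probability can be computed directly from counts:

from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # the roll is even
B = {4, 5, 6}   # the roll is at least 4

# P(A|B) = P(A ∩ B) / P(B), which under the naive definition is |A ∩ B| / |B|
p_A_given_B = Fraction(len(A & B), len(B))
print(p_A_given_B)   # 2/3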
Bayes’ Rule: P(A|B) = P(B|A) P(A) / P(B)
The Law of Total Probability (LOTP): P(B) = ∑ P(B|Ai) P(Ai), summing over i = 1, …, n, where A1, …, An partition the sample space S.
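A minimal sketch combining Bayes’ rule with LOTP, using hypothetical numbers (a test with 95% sensitivity, 90% specificity, and a 1% base rate, all assumed purely for illustration):

# Partition of S: D (has the condition) and Dc (does not)
p_D = 0.01                  # prior P(D), assumed base rate
p_pos_given_D = 0.95        # P(+|D), assumed sensitivity
p_pos_given_Dc = 0.10       # P(+|Dc) = 1 - specificity, assumed

# LOTP: P(+) = P(+|D)P(D) + P(+|Dc)P(Dc)
p_pos = p_pos_given_D * p_D + p_pos_given_Dc * (1 - p_D)

# Bayes' rule: posterior P(D|+) = P(+|D)P(D) / P(+)
p_D_given_pos = p_pos_given_D * p_D / p_pos
print(round(p_D_given_pos, 4))   # ~0.0876: still small despite a positive test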
Simpson’s paradox occurs when groups of data show a particular trend, but when the data is combined, the trend reverses.
Example: on Saturday you get 7/8 points (87.5%) and your friend gets 2/2 points (100%). On Sunday, you get 1/2 points (50%) and your friend gets 5/8 points (62.5%). On both days your friend has a higher proportion of points; however, when you combine the days, you have 8/10 points and your friend has 7/10 points. This is the paradox: combining groups of data reverses the trend.
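The arithmetic from the example can be checked directly (same numbers as above):

from fractions import Fraction

def rate(got, of):
    return Fraction(got, of)

# (points earned, points possible) per day, from the example above
you    = {"Sat": (7, 8), "Sun": (1, 2)}
friend = {"Sat": (2, 2), "Sun": (5, 8)}

# Per day, your friend does better ...
assert rate(*friend["Sat"]) > rate(*you["Sat"])
assert rate(*friend["Sun"]) > rate(*you["Sun"])

# ... but combined, you do better: 8/10 vs 7/10
assert rate(7 + 1, 8 + 2) > rate(2 + 5, 2 + 8)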
Chapter 3 – Random Variables and their Distributions
Probability Mass Function (PMF) is a function that gives the probability that
a discrete random variable is exactly equal to some value.
Bernoulli Distribution has only two outcomes, success or failure. An example is tossing a fair coin: getting heads has chance p = 0.5 and getting tails has chance 1 − p = 1 − 0.5 = 0.5.
A random variable X has the Bernoulli distribution with parameter p if P(X = 1) = p and P(X = 0) = 1 − p, where 0 < p < 1. We write X ~ Bern(p).
Binomial Distribution counts the number of successes when a Bernoulli trial is repeated multiple times; each trial has two possible outcomes.
Let X be the number of successes, and let n and p be the parameters, where n is a positive integer (the number of independent trials) and 0 < p < 1 (the success probability of each trial). We write X ~ Bin(n, p).
The PMF of X if X ~ Bin(n, p) is P(X = k) = (n choose k) p^k (1 − p)^(n−k) for k = 0, 1, …, n.
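A minimal sketch of this PMF in Python (Bern(p) falls out as the n = 1 case):

import math

def binom_pmf(k, n, p):
    # P(X = k) for X ~ Bin(n, p)
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(1, 1, 0.5))    # Bern(0.5): P(X = 1) = 0.5
print(binom_pmf(3, 10, 0.5))   # P(exactly 3 heads in 10 fair flips)
# The PMF sums to 1 over k = 0, ..., n
assert abs(sum(binom_pmf(k, 10, 0.3) for k in range(11)) - 1) < 1e-12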
Hypergeometric Distribution gives the probability of obtaining a certain number of successes when drawing a sample of fixed size without replacement from a finite population.
X ~ HGeom(w, b, n) -> w and b come from the w white and b black balls in the urn, and n is the number of balls drawn.
The PMF of X if X ~ HGeom(w, b, n) is P(X = k) = (w choose k)(b choose n − k) / (w + b choose n) for integers k satisfying 0 ≤ k ≤ w and 0 ≤ n − k ≤ b, and P(X = k) = 0 otherwise.
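A corresponding sketch of this PMF (urn sizes chosen arbitrarily for illustration):

import math

def hgeom_pmf(k, w, b, n):
    # P(X = k) for X ~ HGeom(w, b, n): k white balls in a draw of n,
    # without replacement, from an urn with w white and b black balls
    if not (0 <= k <= w and 0 <= n - k <= b):
        return 0
    return math.comb(w, k) * math.comb(b, n - k) / math.comb(w + b, n)

print(hgeom_pmf(2, 5, 7, 4))   # P(2 white) drawing 4 from 5 white + 7 black
# The PMF sums to 1 over all k
assert abs(sum(hgeom_pmf(k, 5, 7, 4) for k in range(5)) - 1) < 1e-12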
Discrete Uniform Distribution says that all outcomes are equally likely out of a finite, nonempty set of numbers C.
X ~ DUnif(C) if the number X is chosen uniformly at random from C.
The PMF of X if X ~ DUnif(C) is P(X = x) = 1 / |C| for x ∈ C (and 0 otherwise).
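A one-line check of this PMF, with an arbitrary example set:

from fractions import Fraction

C = {1, 3, 5, 7, 9}              # any finite, nonempty set of numbers
pmf = Fraction(1, len(C))        # P(X = x) = 1/|C| for each x in C
assert pmf * len(C) == 1         # the PMF sums to 1 over C
print(pmf)                       # 1/5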
Cumulative Distribution Function (CDF) is the function F(x) = P(X ≤ x), which gives the probability that a random variable takes a value less than or equal to x. Unlike the PMF, it is defined for every random variable, discrete or continuous.
Any function of a random variable is also a random variable itself.
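For a discrete random variable, the CDF is just the running sum of the PMF; a minimal sketch for X ~ Bin(10, 0.5):

import math
from itertools import accumulate

n, p = 10, 0.5
pmf = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
cdf = list(accumulate(pmf))      # F(x) = P(X <= x) for x = 0, ..., n

print(cdf[3])                    # F(3) = P(X <= 3)
assert abs(cdf[-1] - 1) < 1e-12  # F(n) = 1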