Week 1a1 Math/Statistics refresher
Math refresher
Functions part 1: linear and quadratic functions, polynomials, working with powers
Linear functions:
A function is a relationship between an input or set of inputs and an output. We write this as $y = f(x)$.
If the function is linear, we have $y = b_0 + b_1 x$, where $y$ and $x$ are called variables and $b_0$ (the intercept) and $b_1$ (the slope) are parameters.
Quadratic functions:
A linear function is often not flexible enough to describe the relationship between two series accurately. A polynomial adds higher-order powers of $x$ to the function:
$$y = b_0 + b_1 x + b_2 x^2 + b_3 x^3 + \dots + b_n x^n$$
Setting $n = 2$ is sufficient in many cases and gives the quadratic function
$$y = ax^2 + bx + c$$
where $a$, $b$, $c$ are the parameters that describe the shape of the function. If $a$ is positive, the function is ∪-shaped; if $a$ is negative, it is ∩-shaped.
The Roots of Quadratic functions:
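A quadratic equation $ax^2 + bx + c = 0$ has two roots (which may coincide, or be complex when $b^2 - 4ac < 0$). The roots can be obtained either by factoring the equation (contracting it into parentheses), or by using the abc-formula:
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$
- $y = x^2 - 4x = x(x - 4)$, so $x = 0 \lor x = 4$
- $y = x^2 + x - 6 = (x + 3)(x - 2)$, so $x = -3 \lor x = 2$
- $y = x^2 - 3x + 1$: here factoring doesn't work, so use the abc-formula:
$$x = \frac{-(-3) - \sqrt{(-3)^2 - 4 \cdot 1 \cdot 1}}{2 \cdot 1} = \frac{3 - \sqrt{5}}{2} \;\lor\; x = \frac{-(-3) + \sqrt{(-3)^2 - 4 \cdot 1 \cdot 1}}{2 \cdot 1} = \frac{3 + \sqrt{5}}{2}$$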
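The three examples above can be checked numerically. The following is only a minimal sketch of the abc-formula in Python; the function name `quadratic_roots` is a hypothetical helper and not part of the course material, and it returns real roots only.

```python
import math


def quadratic_roots(a, b, c):
    """Return the real roots of a*x**2 + b*x + c = 0 as a sorted tuple."""
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic equation")
    disc = b * b - 4 * a * c  # the discriminant b^2 - 4ac
    if disc < 0:
        return ()  # no real roots (the two roots are complex)
    root = math.sqrt(disc)
    x1 = (-b - root) / (2 * a)
    x2 = (-b + root) / (2 * a)
    return tuple(sorted((x1, x2)))


print(quadratic_roots(1, -4, 0))  # (0.0, 4.0)
print(quadratic_roots(1, 1, -6))  # (-3.0, 2.0)
print(quadratic_roots(1, -3, 1))  # (3 - sqrt(5))/2 and (3 + sqrt(5))/2
```

When the discriminant is zero the two roots coincide, and the tuple contains the same value twice.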
Powers of numbers or of variables:
$x^3 = x \cdot x \cdot x$; here we call 3 the index.
Seven frequently used rules:
- $x^0 = 1$: anything (other than zero) to the power 0 is equal to 1
- $x^{-2} = \frac{1}{x^2}$
- $(xy)^3 = x^3 y^3$
- $x^2 \cdot x^3 = x^5$ and $\frac{x^3}{x^2} = x$
- $(x^2)^3 = x^6$
- $\left(\frac{x}{y}\right)^n = \frac{x^n}{y^n}$
- $x^{1/2} = \sqrt{x}$ and $x^{1/n} = \sqrt[n]{x}$
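A quick way to convince yourself of the power rules is to try them on concrete numbers. The sketch below does this in Python; the values of `x`, `y` and `n` are arbitrary choices made for illustration.

```python
import math

x, y, n = 2.0, 5.0, 3  # arbitrary illustrative values

# Each entry compares the left-hand side of a rule with its right-hand side.
checks = {
    "x^0 = 1": math.isclose(x ** 0, 1.0),
    "x^-2 = 1/x^2": math.isclose(x ** -2, 1 / x ** 2),
    "(xy)^3 = x^3 y^3": math.isclose((x * y) ** 3, x ** 3 * y ** 3),
    "x^2 * x^3 = x^5": math.isclose(x ** 2 * x ** 3, x ** 5),
    "x^3 / x^2 = x": math.isclose(x ** 3 / x ** 2, x),
    "(x^2)^3 = x^6": math.isclose((x ** 2) ** 3, x ** 6),
    "(x/y)^n = x^n / y^n": math.isclose((x / y) ** n, x ** n / y ** n),
    "x^(1/2) = sqrt(x)": math.isclose(x ** 0.5, math.sqrt(x)),
}

for rule, holds in checks.items():
    print(f"{rule}: {holds}")
```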
Functions part 2: exponential/logarithmic functions
The Exponential function, e:
It is sometimes the case that the relationship between two variables is best described by an exponential function. For example, when a variable grows (or shrinks) at a rate proportional to its current value, we would write $y = e^x$, where $e$ is a constant, approximately 2.71828.
Logarithms:
Logarithms were invented to simplify cumbersome calculations, since exponents can be added or subtracted, which is easier than multiplying or dividing the original numbers. Consider the power relationship $2^3 = 8$.
Using logarithms, we would write this as $\log_2 8 = 3$, or ‘the log to the base 2 of 8 is 3’.
More generally, if $a^b = c$, then we can also write $\log_a c = b$.
A log to base e is known as a Natural logarithm, denoted interchangeably by ln(y) or log(y). Taking a
natural logarithm is the inverse of taking an exponential, so sometimes the exponential function is
called the antilog.
For positive variables x and y:
- $\ln(xy) = \ln x + \ln y$ and $\ln(x/y) = \ln x - \ln y$
- $\ln(x^n) = n \ln x$
- $\ln(1) = 0$ and $\ln(e) = 1$
- $\ln(1/y) = \ln(1) - \ln(y) = -\ln(y)$
- $\ln(e^x) = e^{\ln(x)} = x$
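The log rules can also be verified numerically. This is a small sketch using Python's standard `math` module, where `math.log` is the natural logarithm; the values of `x`, `y` and `n` are arbitrary positive numbers chosen for illustration.

```python
import math

x, y, n = 3.0, 7.0, 4  # arbitrary positive illustrative values

print(math.log2(8))      # 3.0, i.e. log to the base 2 of 8
print(math.log(1))       # 0.0
print(math.log(math.e))  # 1.0

# Each entry compares both sides of one of the rules listed above.
log_checks = [
    math.isclose(math.log(x * y), math.log(x) + math.log(y)),
    math.isclose(math.log(x / y), math.log(x) - math.log(y)),
    math.isclose(math.log(x ** n), n * math.log(x)),
    math.isclose(math.log(1 / y), -math.log(y)),
    math.isclose(math.log(math.exp(x)), x),
    math.isclose(math.exp(math.log(x)), x),
]
print(all(log_checks))
```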
Sigma notation
If we wish to add together several numbers (or observations from variables), the sigma or summation operator can be very useful. Σ means ‘add up all of the following elements.’ For instance, we might write $\sum_{i=1}^{4} x_i$, where the $i$ subscript is an index, 1 is the lower limit and 4 is the upper limit of the sum. This would mean adding all of the values of $x$ from $x_1$ to $x_4$.
Properties of the Sigma operator:
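- $\sum_{i=1}^{n} x_i + \sum_{i=1}^{n} z_i = \sum_{i=1}^{n} (x_i + z_i)$
- $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
- $\sum_{i=1}^{n} x_i z_i \neq \sum_{i=1}^{n} x_i \sum_{i=1}^{n} z_i$ (in general)
- $\sum_{i=1}^{n} x = x + x + \dots + x = nx$ (summing a constant $x$)
- $\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n = n\bar{x}$ (where $\bar{x}$ is the mean of the $x_i$)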
Example:
$$\sum_{j=2}^{4} (j^2 + j + 3) = (2^2 + 2 + 3) + (3^2 + 3 + 3) + (4^2 + 4 + 3) = 47$$
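The worked sum and the properties of the sigma operator translate directly into Python's built-in `sum`. In this sketch the lists `x` and `z` and the constant `c` are made-up illustrative data, not values from the course.

```python
import math

# The worked example: j runs over 2, 3, 4.
total = sum(j ** 2 + j + 3 for j in range(2, 5))
print(total)  # 47

# Made-up data to illustrate the properties.
x = [1.0, 4.0, 2.5, 0.5]
z = [2.0, 3.0, 1.0, 6.0]
c = 3.0
n = len(x)
x_bar = sum(x) / n  # the mean of x

sum_of_sums = math.isclose(sum(x) + sum(z), sum(xi + zi for xi, zi in zip(x, z)))
constant_out = math.isclose(sum(c * xi for xi in x), c * sum(x))
products_differ = sum(xi * zi for xi, zi in zip(x, z)) != sum(x) * sum(z)
sum_is_n_mean = math.isclose(sum(x), n * x_bar)

print(sum_of_sums, constant_out, products_differ, sum_is_n_mean)
```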
Pi notation:
Similar to the use of sigma to denote sums, the pi operator ($\prod$) is used to denote repeated multiplications. For example,
$$\prod_{i=1}^{n} x_i = x_1 x_2 \dots x_n$$
means ‘multiply together all of the $x_i$ for each value of $i$ between the lower and upper limits.’ It also follows that
$$\prod_{i=1}^{n} (c x_i) = c^n \prod_{i=1}^{n} x_i$$
Example:
$$\prod_{i=3}^{6} i = 3 \cdot 4 \cdot 5 \cdot 6 = 360$$
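The same example can be reproduced with `math.prod` from the Python standard library (available from Python 3.8). The list `x` and the constant `c` below are made-up values used only to illustrate the $c^n$ property.

```python
import math

# The worked example: i runs over 3, 4, 5, 6.
product = math.prod(range(3, 7))
print(product)  # 360

# Made-up data to illustrate prod(c * x_i) = c^n * prod(x_i).
x = [2.0, 0.5, 4.0]
c = 3.0
lhs = math.prod(c * xi for xi in x)
rhs = c ** len(x) * math.prod(x)
print(math.isclose(lhs, rhs))
```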
Differential calculus
A mathematical derivative measures the rate of change of one variable with respect to another. Consider a variable $y$ that is a function $f$ of another variable $x$, i.e. $y = f(x)$. The derivative of $y$ with respect to $x$ is written:
$$\frac{dy}{dx} = \frac{df(x)}{dx} \quad \text{or} \quad f'(x)$$
For non-linear functions, the gradient at a given point is the slope of the tangent line at that point.
Differentiation: the basics:
The derivative of a power function of $x$:
if $y = cx^n$ then $\frac{dy}{dx} = cnx^{n-1}$
The derivative of the (natural) log of $x$ is given by $1/x$:
$$\frac{d(\log(x))}{dx} = \frac{1}{x}$$
The derivative of $e^x$ is $e^x$.
Four rules for derivatives:
- The derivative of a sum is equal to the sum of the derivatives of the individual parts: if $y = f(x) + g(x)$ then $\frac{dy}{dx} = f'(x) + g'(x)$
- The derivative of a product of two functions, $y = f(x)g(x)$, is given by $\frac{dy}{dx} = f'(x)g(x) + f(x)g'(x)$
- The derivative of a quotient of two functions, $y = \frac{f(x)}{g(x)}$, is given by $\frac{dy}{dx} = \frac{f'(x)g(x) - g'(x)f(x)}{g(x)^2}$
- Suppose we would like to differentiate a function of a function, $y = f(g(x))$. Then the chain rule says: $\frac{dy}{dx} = \frac{dy}{dg} \cdot \frac{dg}{dx}$
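One way to check the product, quotient and chain rules is to compare them with a numerical approximation of the derivative. The sketch below assumes $f(x) = x^3$ and $g(x) = e^x$ purely for illustration; `numerical_derivative` is a hypothetical helper based on a central difference, not a method from the course.

```python
import math


def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of the derivative of func at x."""
    return (func(x + h) - func(x - h)) / (2 * h)


def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2  # power rule: d(x^3)/dx = 3x^2


g = math.exp        # g(x) = e^x
g_prime = math.exp  # the derivative of e^x is e^x

x0 = 1.0
product = f_prime(x0) * g(x0) + f(x0) * g_prime(x0)
quotient = (f_prime(x0) * g(x0) - g_prime(x0) * f(x0)) / g(x0) ** 2
chain = f_prime(g(x0)) * g_prime(x0)  # dy/dg evaluated at g(x0), times dg/dx

results = {
    "product rule": math.isclose(
        numerical_derivative(lambda x: f(x) * g(x), x0), product, rel_tol=1e-6),
    "quotient rule": math.isclose(
        numerical_derivative(lambda x: f(x) / g(x), x0), quotient, rel_tol=1e-6),
    "chain rule": math.isclose(
        numerical_derivative(lambda x: f(g(x)), x0), chain, rel_tol=1e-6),
}
print(results)
```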
Higher order derivatives:
It is possible to differentiate a function more than once to obtain second-order, third-order, and higher derivatives. The notation for the second-order derivative, usually just called the second derivative, is
$$\frac{d^2y}{dx^2} = f''(x) = \frac{d\left(\frac{dy}{dx}\right)}{dx}$$
The second derivative can be interpreted as the gradient of the gradient of a function, i.e. the rate of change of the gradient. First and second derivatives are useful when optimizing functions. At a stationary point, where the first derivative is zero, a negative second derivative indicates a maximum and a positive second derivative indicates a minimum.
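As a worked illustration (an example chosen here, reusing the quadratic $y = x^2 - 4x$ from the section on roots), the first derivative locates the turning point and the second derivative classifies it:

```latex
\begin{align*}
y &= x^2 - 4x \\
\frac{dy}{dx} &= 2x - 4 = 0 \quad\Rightarrow\quad x = 2 \\
\frac{d^2y}{dx^2} &= 2 > 0 \quad\Rightarrow\quad \text{minimum at } x = 2,\ y = 2^2 - 4 \cdot 2 = -4
\end{align*}
```

This agrees with the shape rule for quadratics: the coefficient on $x^2$ is positive, so the function is ∪-shaped and its turning point is a minimum.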
Partial differentiation:
In the case where $y$ is a function of more than one variable, it may be of interest to determine the effect that changes in each of the individual $x$ variables would have on $y$. This is exactly the situation in linear regression models:
$$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i$$
We calculate these partial derivatives one at a time, treating all of the other variables as if they were constants.
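Applied to the regression equation with three explanatory variables, treating the other variables as constants gives one partial derivative per variable, and each one is simply the corresponding slope parameter:

```latex
\begin{align*}
y_i &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i} + \varepsilon_i \\
\frac{\partial y_i}{\partial x_{1i}} &= \beta_1, \qquad
\frac{\partial y_i}{\partial x_{2i}} = \beta_2, \qquad
\frac{\partial y_i}{\partial x_{3i}} = \beta_3
\end{align*}
```

So $\beta_1$ can be read as the change in $y$ for a one-unit change in $x_1$, holding $x_2$ and $x_3$ fixed.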
Statistics refresher
Random variables:
A random variable is any variable whose value cannot be predicted exactly. The population is the set of all possible values of the random variable. There are discrete and continuous random variables.
- discrete: a specific set of possible values (events), e.g. throwing a die. It is a variable with a countable number of distinct values.
- continuous: a continuous range of values, e.g. the temperature. A numerical variable that can take any value within an interval is continuous (e.g., 427.21 grams). Sometimes we round a continuous measurement to an integer (e.g., 427 grams), but that does not make the data discrete.
Probability distributions:
A probability distribution of a discrete random variable lists all possible values (events) and the probability that each value will occur. A cumulative distribution function (CDF) gives the probability that the random variable is less than or equal to a particular value. For continuous random variables, the probability distribution becomes a probability density function (PDF), also called a density function or simply a density.
Example: the Normal distribution!
A discrete PDF shows the probability of each X-value, while the CDF shows the cumulative sum of probabilities, adding from the smallest to the largest X-value. The figure illustrates a discrete PDF and the corresponding CDF. Notice that the CDF approaches 1, and the PDF values of X sum to 1.
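The relationship between a discrete PDF and its CDF can be sketched in a few lines of Python. The example assumes a fair six-sided die, which is an illustration chosen here and not the distribution shown in the figure; exact fractions are used so that the probabilities sum to exactly 1.

```python
from fractions import Fraction
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6]   # possible outcomes of a fair die (assumed example)
pmf = [Fraction(1, 6)] * 6    # probability of each outcome
cdf = list(accumulate(pmf))   # running (cumulative) sum of the probabilities

for v, p, big_f in zip(values, pmf, cdf):
    print(f"x = {v}: P(X = x) = {p}, P(X <= x) = {big_f}")

print(sum(pmf))  # 1
print(cdf[-1])   # 1
```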
For sketching a normal distribution:
Exercise R.22:
A scalar multiple of a normally-distributed random variable also has a normal distribution. A random
variable X has a normal distribution with mean 5 and variance 10. Sketch the distribution of Z = X / 2.
Answer:
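The mean and variance of the random variable X are given (5 and 10), as is the relationship between Z and X (Z = X/2).
So, the mean of Z is: $E(Z) = \mu_Z = E\left(\frac{X}{2}\right) = \frac{1}{2} E(X) = \frac{1}{2} \times 5 = 2.5$
The variance of Z is: $Var(Z) = Var\left(\frac{X}{2}\right) = \frac{1}{4} Var(X) = \frac{1}{4} \times 10 = 2.5$
So $Z \sim N(2.5, 2.5)$. When we sketch this, the mean (the middle) of the distribution is at 2.5. The variance is equal to 2.5, so we let the graph extend $2 \times Var(Z) = 2 \times 2.5 = 5$ to either side of the mean: the sketch starts at $2.5 - 5 = -2.5$ and ends at $2.5 + 5 = 7.5$. Since the standard deviation is $\sqrt{2.5} \approx 1.58$, this range spans about 3.2 standard deviations on each side and so covers almost all of the probability mass.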
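The answer can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8+), which supports division by a constant. This is only a sketch of a check; the variable names and the sketch range of −2.5 to 7.5 follow the answer above.

```python
import math
from statistics import NormalDist

# X has mean 5 and variance 10; NormalDist takes the standard deviation.
X = NormalDist(mu=5, sigma=math.sqrt(10))
Z = X / 2  # scaling by a constant gives another normal distribution

print(Z.mean)      # 2.5
print(Z.variance)  # 2.5, up to floating-point rounding
print(Z.stdev)     # about 1.58

# Share of the probability mass lying inside the sketch range.
coverage = Z.cdf(7.5) - Z.cdf(-2.5)
print(coverage)    # close to 1
```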
Expected values and variance:
The expected value, expectation, or mean of a random variable Y, $E(Y)$, is the long-run average value of the random variable.
- Discrete: suppose Y takes n possible values $y_1, y_2, \dots, y_n$ and $\Pr[Y = y_i] = p_i$; then:
$$E(Y) = y_1 p_1 + y_2 p_2 + \dots + y_n p_n = \sum_{i=1}^{n} y_i p_i$$
This is called the population mean.
- Continuous: the probability-weighted average of the possible outcomes of the random variable
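For the discrete case, the formula $E(Y) = \sum y_i p_i$ is a one-line computation. The sketch below assumes Y is the number of heads in two tosses of a fair coin, an example chosen here for illustration.

```python
from fractions import Fraction

# Assumed example: number of heads in two fair coin tosses.
outcomes = [0, 1, 2]
probs = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]

# The probabilities of a valid distribution must sum to 1.
assert sum(probs) == 1

# E(Y) is the probability-weighted sum of the outcomes.
expected = sum(y * p for y, p in zip(outcomes, probs))
print(expected)  # 1
```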