Linear algebra is essential in the field of deep learning, as it is used to represent and manipulate high-dimensional data, and to optimise the parameters of deep neural networks.
- A scalar is a single value, such as a number or a constant. It can be any real or complex
number.
- A vector is an array of numbers or scalars.
o The magnitude of a vector is its length (the length of the arrow when the vector is drawn as one), written |v⃗| and computed as the Euclidean norm of the vector.
- A matrix is a rectangular array of numbers or scalars. It can be used to represent a linear
transformation or a system of linear equations.
o Matrix multiplication is not commutative: in general, A × B is not the same as B × A.
o The determinant of a matrix is a scalar value that represents the factor by which the matrix scales volumes as a linear transformation. A matrix is invertible exactly when its determinant is non-zero, and the determinant is used when computing the inverse, as in the sketch below.
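As a quick sketch of the invertibility check (NumPy is an assumed tool here; the notes name no library):

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])

    det = np.linalg.det(A)        # 1*4 - 2*3 = -2
    # A non-zero determinant means A is invertible (in practice,
    # compare against a small tolerance rather than exactly zero).
    if abs(det) > 1e-12:
        A_inv = np.linalg.inv(A)
        print(A_inv @ A)          # approximately the identity matrix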
- A tensor is a multi-dimensional array of numbers or scalars. It can be used to represent high-
dimensional data, such as images or videos.
o Tensor contraction and the tensor product are the two most common operations on tensors (a code sketch follows this list).
1. Tensor contraction is the process of summing over a set of indices to
reduce the number of dimensions in a tensor.
2. Tensor product is the operation of combining two or more tensors to form
a new tensor.
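A minimal NumPy sketch of both operations (the shapes and values are illustrative assumptions):

    import numpy as np

    T = np.arange(24).reshape(2, 3, 4)    # a rank-3 tensor
    M = np.ones((4, 5))

    # Tensor contraction: sum over a shared index to drop dimensions.
    # The last axis of T is contracted with the first axis of M,
    # producing a (2, 3, 5) tensor.
    C = np.tensordot(T, M, axes=([2], [0]))

    # Tensor (outer) product: combine two tensors into a larger one.
    u = np.array([1, 2])
    v = np.array([3, 4, 5])
    P = np.tensordot(u, v, axes=0)        # P[i, j] = u[i] * v[j]
    print(C.shape, P.shape)               # (2, 3, 5) (2, 3)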
- The dot product/inner product is a way of multiplying two vectors together.
o It is a scalar value that can be used to measure the similarity between two vectors or
the angle between them.
o Given vectors v⃗ = [a1, a2, a3] and w⃗ = [b1, b2, b3], the dot product is v⃗ ⋅ w⃗ = a1 b1 + a2 b2 + a3 b3.
o The dot product of two vectors equals the magnitude of one vector times the magnitude of the other times the cosine of the angle between them: v⃗ ⋅ w⃗ = |v⃗| |w⃗| cos θ (see the sketch below).
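A short NumPy sketch of the dot product and the cosine identity above (the example vectors are arbitrary):

    import numpy as np

    v = np.array([1.0, 2.0, 3.0])
    w = np.array([4.0, 5.0, 6.0])

    dot = np.dot(v, w)    # 1*4 + 2*5 + 3*6 = 32
    # cos(theta) from the identity v . w = |v| |w| cos(theta)
    cos_theta = dot / (np.linalg.norm(v) * np.linalg.norm(w))
    print(dot, cos_theta)  # cosine near 1: the vectors point in similar directions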
- A matrix-vector product is a way of multiplying a matrix and a vector together. The result is a vector that represents the linear transformation of the input vector by the matrix.
o The resulting vector has as many entries as the matrix has rows; for the product to be defined, the input vector must have as many entries as the matrix has columns.
o The elements of the resulting vector are obtained by taking the dot product of each row of the matrix with the vector.
o Example
1. Let A = [[1, 2], [3, 4]] and x⃗ = [5, 6]
2. A x⃗ = [1×5 + 2×6, 3×5 + 4×6] = [17, 39]
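The worked example above, reproduced as a NumPy sketch:

    import numpy as np

    A = np.array([[1, 2],
                  [3, 4]])
    x = np.array([5, 6])

    # Each entry is the dot product of a row of A with x:
    # row 0: 1*5 + 2*6 = 17, row 1: 3*5 + 4*6 = 39
    print(A @ x)   # [17 39]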
- Matrix-matrix multiplication is a way of multiplying two matrices together. The resulting
matrix represents the composition of the two original matrices as linear transformations.
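A small sketch of composition (the two transformations are illustrative choices): applying B and then A to a vector gives the same result as multiplying by the single matrix A @ B.

    import numpy as np

    A = np.array([[0, -1],
                  [1,  0]])   # rotate 90 degrees counter-clockwise
    B = np.array([[1,  1],
                  [0,  1]])   # shear along the x-axis
    x = np.array([1, 0])

    print(A @ (B @ x))                   # [0 1]
    print((A @ B) @ x)                   # [0 1] -- same result
    print(np.array_equal(A @ B, B @ A))  # False: not commutative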
- A norm is a function that assigns a scalar value to a vector or a matrix. It can be used to
measure the size or distance of a vector or a matrix.
o The most common norm used in linear algebra is the Euclidean (L2) norm, the square root of the sum of the squared components.
o Other norms include the L1 norm, which is the sum of the absolute values of the components, and the max norm, which is the largest absolute value of the components. These norms can be used to measure the sparsity of a vector or matrix or to bound its largest entry (see the sketch below).
o Norms are used in deep learning to measure the size or distance of the parameters of
the neural network, and to regularize the model to prevent overfitting.
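A NumPy sketch of the three norms mentioned above (the vector is arbitrary):

    import numpy as np

    v = np.array([3.0, -4.0, 0.0])

    l2   = np.linalg.norm(v)              # Euclidean: sqrt(9 + 16) = 5.0
    l1   = np.linalg.norm(v, ord=1)       # L1: |3| + |-4| + |0| = 7.0
    linf = np.linalg.norm(v, ord=np.inf)  # max norm: max(3, 4, 0) = 4.0
    print(l2, l1, linf)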
Applications
- Linear algebra is used throughout neural networks and deep learning architectures.
- Linear algebra concepts such as matrix-vector products, matrix-matrix multiplication, and
norms are used in the computation of forward and backward propagation in neural networks.
- Tensor operations such as tensor contraction and tensor product are used in convolutional
neural networks and recurrent neural networks to extract features from images and
sequences.
- Linear algebra concepts and operations are also used in optimization algorithms such as
gradient descent and stochastic gradient descent to adjust the parameters of the neural
network.
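As a minimal sketch of how these pieces appear together in a forward pass (the layer sizes, random weights, and ReLU activation are illustrative assumptions, not anything specified in the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # 3 inputs -> 4 units
    W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # 4 inputs -> 2 units

    x = np.array([0.5, -1.0, 2.0])
    h = np.maximum(0, W1 @ x + b1)   # matrix-vector product + ReLU
    y = W2 @ h + b2                  # second matrix-vector product
    print(y)

    # Norms of the weights are often added to the loss as a regulariser:
    l2_penalty = np.linalg.norm(W1) ** 2 + np.linalg.norm(W2) ** 2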
Lecture 2: Calculus Refresher
Calculus is essential in the field of deep learning, as it is used to optimise the parameters of deep
neural networks and to study the properties of activation functions used in these networks.
- The derivative of a function is a measure of the rate of change of the function at a certain
point.
o f′(a) = df(a)/dx = lim_{x→a} (f(x) − f(a)) / (x − a) = lim_{h→0} (f(a+h) − f(a)) / h
o f′(x) is written in prime notation, and df(x)/dx in Leibniz notation.
o There are several rules for computing the derivatives of basic and combined functions, including:
1. Power rule: (x^n)′ = n·x^(n−1)
2. Sum rule: (f + g)′ = f′ + g′
3. Product rule: (f·g)′ = f′·g + f·g′
4. Quotient rule: (f/g)′ = (f′·g − f·g′)/g²
(The chain rule, the most important of these for deep learning, is covered separately below.)
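A tiny sketch checking the limit definition numerically (the test function and step size are illustrative):

    def derivative(f, a, h=1e-6):
        # forward-difference approximation from the limit definition
        return (f(a + h) - f(a)) / h

    f = lambda x: x ** 2
    print(derivative(f, 3.0))   # close to the exact value 2*3 = 6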
o A partial derivative is the derivative of a multivariable function with respect to one variable, while keeping the other variables constant. It measures the rate of change of the output of the function with respect to one of its inputs, while ignoring the effect of the other inputs.
1. f′_{xi}(x1, x2, …, xn) = ∂f/∂xi (x1, x2, …, xn)
o A gradient is a vector of partial derivatives of a multivariable function.
1. It represents the direction of the steepest ascent of the function, and can
be used in optimisation algorithms like gradient descent to update the
parameters of a model and improve its accuracy.
2. Let f(x1, x2, …, xn); then the gradient of f is ∇f = [∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn]
3. Example (checked numerically in the sketch below)
Let f(x, y) = x² − y²
Partial derivatives:
o ∂f/∂x = 2x
o ∂f/∂y = −2y
Gradient: ∇f = [∂f/∂x, ∂f/∂y] = [2x, −2y]
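The example above, verified numerically with finite differences (the step size is an illustrative choice):

    def f(x, y):
        return x ** 2 - y ** 2

    def grad_f(x, y, h=1e-6):
        # central differences approximate each partial derivative
        dfdx = (f(x + h, y) - f(x - h, y)) / (2 * h)
        dfdy = (f(x, y + h) - f(x, y - h)) / (2 * h)
        return [dfdx, dfdy]

    print(grad_f(1.0, 2.0))   # approximately [2*1, -2*2] = [2.0, -4.0]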
- Chain rule
o The derivative of the composition of two or more functions is equal to the derivative
of the outer function evaluated at the inner function, multiplied by the derivative of
the inner function.
o d/dx f(g(x)) = df(g(x))/dx = df(u)/du ⋅ du/dx, where u = g(x)
o Example
1. f(x) = sin(x), g(x) = x²
2. df(g(x))/dx = cos(x²) ⋅ 2x
o The chain rule is a crucial concept in deep learning because it allows us to compute the gradient of complex functions, which are often represented as the composition of multiple simpler functions.
o The gradient is used in optimisation algorithms like gradient descent to update the weights of a deep learning model and improve its accuracy.
o By applying the chain rule, we can find the gradient of the loss function with respect
to the parameters of the model, which can be used to update the parameters in a
direction that reduces the loss.
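A minimal sketch of the chain rule applied by hand, matching the sin(x²) example above (the input value is arbitrary):

    import math

    x = 0.5
    u = x ** 2          # inner function g(x) = x^2
    y = math.sin(u)     # outer function f(u) = sin(u)

    # chain rule: dy/dx = df/du * du/dx = cos(x^2) * 2x
    dy_dx = math.cos(u) * (2 * x)
    print(dy_dx)        # cos(0.25) * 1.0, approximately 0.9689

This multiply-the-local-derivatives pattern is what backpropagation repeats layer by layer to obtain the gradient of the loss with respect to every parameter.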