Week 3 Review
Recurrent Neural Networks

Typical neural networks don’t have access to any sort of memory: decisions are independent of one another

RNNs keep track of previous outputs and use them as inputs

Ideal for sequence learning (time-series events)

Memory through ‘recurrent’ connections

Info from the previous step is fed back into the network as input

Can be seen as a series of multiple ANNs connected through time (also
called ‘unfolding’)

Passage of the hidden state (h_t) from one time step to the next helps the network remember previous states.

The hidden state is multiplied by its own weight matrix and added to the same
layer at the next sequence step.

h_t = σ(U x_t + W h_{t−1})

x_t = current input (weighted by input matrix U)

h_{t−1} = previous hidden state

W = recurrent weight matrix

σ = non-linear activation

Take the hidden state calculated above and run another linear combination with a matrix V (ϕ = another non-linear activation):

y_t = ϕ(V h_t) = overall output




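A minimal NumPy sketch of one step of this recurrence. The notes don’t fix σ, ϕ, or the layer sizes, so tanh, an identity output activation, and small random weights are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 3))  # input-to-hidden weights (hidden size 4, input size 3)
W = rng.normal(size=(4, 4))  # hidden-to-hidden (recurrent) weights
V = rng.normal(size=(2, 4))  # hidden-to-output weights (output size 2)

def rnn_step(x_t, h_prev):
    """One step: h_t = σ(U x_t + W h_{t-1}), y_t = ϕ(V h_t)."""
    h_t = np.tanh(U @ x_t + W @ h_prev)  # σ = tanh (assumption)
    y_t = V @ h_t                        # ϕ = identity (assumption)
    return h_t, y_t

h = np.zeros(4)            # initial hidden state h_0
x_t = rng.normal(size=3)   # current input x_t
h, y = rnn_step(x_t, h)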
Forward Propagation of Hidden State

The current hidden state is a function of all prior hidden states and all prior
and current inputs




Can be written as a composite function

This allows us to calculate the gradients
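For example, unrolling three steps (assuming an initial state h_0) makes the composition explicit:

h_3 = σ(U x_3 + W σ(U x_2 + W σ(U x_1 + W h_0)))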

Back Propagation Through Time

Same as BP, only with connections in time

Partial derivatives allow us to update the weights

Error function (MSE) has one more summation (over time):

E = ½ ∑_{t=1}^{N} ∑_{i=1}^{N} ‖y_t^(i) − g_t^(i)‖²

Apply the chain rule to incorporate the hidden state:

k is the time step
θ is the weights of the RNN (w_ij)

∂L_t/∂θ = ∑_{k=1}^{t} (∂L_t/∂h_t · ∂h_t/∂h_k · ∂h_k/∂θ)











h_t and h_k are hidden states that occur at two different time steps.

We are multiplying by the partial derivative of one with respect to the other, ∂h_t/∂h_k.

∂h_t/∂h_k depends on the derivative of the activation function, which we’ve seen generates values smaller than one.





When using the chain rule to connect the loss function to parameters in a prior timestep, this connection must be made through every hidden state between the loss and the parameter at that timestep.
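A minimal sketch of that walk through the hidden states, reusing the tanh assumption from the earlier sketch. It accumulates the gradient of a final-step squared-error loss with respect to the recurrent matrix W only (U, V, and biases are omitted for brevity); all sizes and data are illustrative:

import numpy as np

rng = np.random.default_rng(1)
T, n_in, n_h = 5, 3, 4
U = rng.normal(size=(n_h, n_in)) * 0.1
W = rng.normal(size=(n_h, n_h)) * 0.1
xs = rng.normal(size=(T, n_in))
target = rng.normal(size=n_h)   # hypothetical target g for the last step

# Forward pass: keep every hidden state, since BPTT revisits them all.
hs = [np.zeros(n_h)]
for t in range(T):
    hs.append(np.tanh(U @ xs[t] + W @ hs[-1]))

# Loss on the last hidden state: L = 1/2 * ||h_T - target||^2
delta = hs[-1] - target          # dL/dh_T

# Backward pass: the chain runs through every intermediate hidden state.
dW = np.zeros_like(W)
for t in reversed(range(T)):
    dpre = delta * (1 - hs[t + 1] ** 2)  # through tanh': 1 - tanh^2
    dW += np.outer(dpre, hs[t])          # dh_k/dW contribution at step k = t
    delta = W.T @ dpre                   # pass gradient back to h_{t-1}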

Vanishing/Exploding gradients

Gradients quickly shrink to negligible values (vanish) when the weights in W are < 1

Timesteps further removed from the network’s output have little to no influence.

Gradients grow exponentially, leading to nonsensical results (explode), when the weights are > 1

Sigmoid derivative

Activation function: σ(x) = 1/(1 + e^{−x})

Derivative: σ′ = σ(1 − σ)

Max value of dσ/dx = 0.25

A network with 5 steps through time would mean that we multiply by the derivative of sigma 5 times.

0.25^5 ≈ 0.00098 ← very small
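A quick check of how fast that best-case factor shrinks as sequences grow:

for steps in (5, 10, 20):
    print(steps, 0.25 ** steps)
# 5  0.0009765625
# 10 9.5367431640625e-07
# 20 9.094947017729282e-13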
Gradient clipping

Evaluate the norm and rescale to within an allowed threshold

The threshold hyperparameter has to be selected for each case

In gradient clipping, if the magnitude of a gradient is greater than a
predefined threshold, we can simply scale the gradient’s magnitude back
while maintaining its direction.
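A minimal sketch of norm-based clipping; the threshold of 1.0 is an arbitrary illustration, not a recommended value:

import numpy as np

def clip_by_norm(grad, threshold=1.0):
    """Rescale grad onto the threshold if its norm exceeds it, keeping direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])   # norm 5.0
print(clip_by_norm(g))     # [0.6 0.8] -- norm 1.0, same direction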

Initialization

Ensure the eigenvalues of the recurrent weight matrix are equal to one.

Initialize the weight matrix to have eigenvalues = 1:

an identity matrix (ones on the diagonal)

or an orthogonal matrix
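Sketches of both options in NumPy. The orthogonal case uses a QR decomposition of a random matrix, a common construction assumed here rather than taken from the notes:

import numpy as np

n_h = 4
W_identity = np.eye(n_h)   # ones on the diagonal; eigenvalues all exactly 1

A = np.random.default_rng(2).normal(size=(n_h, n_h))
Q, _ = np.linalg.qr(A)     # Q is orthogonal: Q.T @ Q = I
W_orthogonal = Q           # eigenvalues all have magnitude 1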



