Typical neural networks don’t have access to any sort of memory: decisions
are independent of one another
RNNs keep track of previous outputs and use them as inputs
Ideal for sequence learning (time-series events)
memory through ‘recurrent’ connections
info from the previous step is fed back into the network as input
Can be seen as a series of multiple ANNs connected through time (also
called ‘unfolding’)
Passage of the hidden state ($h_t$) from one time step to the next helps the network
remember previous states.
The hidden state is multiplied by its own weight matrix and added to the same
layer at the next sequence step.
$h_t = \sigma(U x_t + W h_{t-1})$
$x_t$ = current input, $U$ = input weight matrix
$h_{t-1}$ = previous hidden state, $W$ = recurrent weight matrix
$\sigma$ = non-linear activation
Take the hidden state calculated above and run another linear combination
with a matrix $V$ ($\phi$ = another non-linear activation)
$y_t = \phi(V h_t)$ = overall output
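A minimal NumPy sketch of these two equations; the layer sizes, the choice of tanh for $\sigma$ and sigmoid for $\phi$, and all variable names are illustrative assumptions, not anything fixed by the notes.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the notes)
input_size, hidden_size, output_size = 3, 4, 2

rng = np.random.default_rng(0)
U = rng.normal(size=(hidden_size, input_size))   # input -> hidden weights
W = rng.normal(size=(hidden_size, hidden_size))  # recurrent (hidden -> hidden) weights
V = rng.normal(size=(output_size, hidden_size))  # hidden -> output weights

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def rnn_step(x_t, h_prev):
    """One recurrence step: h_t = sigma(U x_t + W h_{t-1}), y_t = phi(V h_t)."""
    h_t = np.tanh(U @ x_t + W @ h_prev)  # sigma = tanh (an assumption)
    y_t = sigmoid(V @ h_t)               # phi = sigmoid (an assumption)
    return h_t, y_t

h_t, y_t = rnn_step(rng.normal(size=input_size), np.zeros(hidden_size))
```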
Forward Propagation of Hidden State
The current hidden state is a function of all prior hidden states and all prior
and current inputs
Week 3 Review 1
Can be written as a composite function
This allows us to calculate the gradients
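For example, unrolling the recurrence above over three steps (taking $h_0$ as the initial hidden state) makes the composite structure explicit:

$h_3 = \sigma(U x_3 + W h_2) = \sigma\big(U x_3 + W\,\sigma(U x_2 + W\,\sigma(U x_1 + W h_0))\big)$

Differentiating this nested expression with the chain rule is what backpropagation through time does.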
Back Propagation Through Time
Same as BP, only with connections in time
Partial derivatives allow us to update the weights
Error function (MSE) has one more summation
$E = \frac{1}{2}\sum_{t=1}^{N}\sum_{i=1}^{N}\left\lVert y_t^{(i)} - g_t^{(i)}\right\rVert^2$
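A small NumPy sketch of that summed loss; the prediction array y, the target array g, and their shapes are made-up placeholders.

```python
import numpy as np

# y[t, i]: prediction for output unit i at time step t; g[t, i]: its target
# (placeholder values, assumed shape = time steps x output units)
y = np.array([[0.2, 0.7], [0.5, 0.1], [0.9, 0.4]])
g = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 0.0]])

# E = 1/2 * sum over time steps and output units of (y - g)^2
E = 0.5 * np.sum((y - g) ** 2)
```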
Back Propagation through time
apply chain rule to incorporate hidden state
$k$ is the time step
$\theta$ denotes the weights of the RNN ($w_{ij}$)
$\frac{\partial L_t}{\partial \theta} = \sum_{k=1}^{t} \frac{\partial L_t}{\partial h_t}\,\frac{\partial h_t}{\partial h_k}\,\frac{\partial h_k}{\partial \theta}$
$h_t$ and $h_k$ are hidden states at two different time steps. We are
multiplying by the partial derivative of one with respect to the other,
$\frac{\partial h_t}{\partial h_k}$.
$\frac{\partial h_t}{\partial h_k}$ depends on the derivative of the activation function, which
we’ve seen generates values smaller than one.
When using the chain rule to connect the loss function to parameters at a
prior timestep, this connection must be made through every hidden state
between the loss and the parameter at that timestep.
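Writing that chain out explicitly for the recurrence $h_j = \sigma(U x_j + W h_{j-1})$ given above, the factor $\frac{\partial h_t}{\partial h_k}$ becomes a product of one Jacobian per intervening step:

$\frac{\partial h_t}{\partial h_k} = \prod_{j=k+1}^{t} \frac{\partial h_j}{\partial h_{j-1}} = \prod_{j=k+1}^{t} \mathrm{diag}\!\left(\sigma'(U x_j + W h_{j-1})\right) W$

Each factor contains the activation derivative, so the product shrinks (or blows up) as the gap between $k$ and $t$ grows.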
Vanishing/Exploding Gradients
Gradients quickly shrink to negligible values (vanish) when the recurrent weights $W$ are less than 1 in magnitude
Timesteps further removed from the network’s output have little to no
influence.
Gradients grow exponentially, leading to nonsensical results
(explode)
Derivative of the sigmoid: $\sigma' = (1 - \sigma)\sigma$
Max value of $\frac{d\sigma}{dx}$ = 0.25
A network with 5 steps through time would mean that we multiply by
the derivative of sigma 5 times.
$0.25^5 \approx 0.00098$ ← very small
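A few lines of NumPy reproducing that shrinkage; the input value of 0 (where the sigmoid derivative peaks) and the 5-step horizon are just illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is at z = 0
max_deriv = sigmoid(0.0) * (1.0 - sigmoid(0.0))
print(max_deriv)       # 0.25
print(max_deriv ** 5)  # ~0.00098: the factor after 5 steps through time
```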
Gradient clipping
Evaluate the norm and rescale to within an allowed threshold
the threshold hyperparameter has to be selected for each case
In gradient clipping, if the magnitude of a gradient is greater than a
predefined threshold, we can simply scale the gradient’s magnitude back
while maintaining its direction.
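A minimal clip-by-norm sketch in NumPy; the threshold value and the example gradient are placeholders.

```python
import numpy as np

def clip_by_norm(grad, threshold):
    """Rescale grad so its L2 norm is at most threshold, keeping its direction."""
    norm = np.linalg.norm(grad)
    if norm > threshold:
        grad = grad * (threshold / norm)
    return grad

g = np.array([3.0, 4.0])      # norm = 5
print(clip_by_norm(g, 1.0))   # [0.6, 0.8]: norm 1, same direction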
Initialization
Ensure the eigenvalues of the recurrent weight matrix are equal to one.
Initialize the recurrent weight matrix to have eigenvalues of 1:
an identity matrix (ones on the diagonal)
or an orthogonal matrix
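A sketch of both options, assuming a square recurrent weight matrix of size hidden_size; the size and random seed are arbitrary.

```python
import numpy as np

hidden_size = 4

# Option 1: identity initialization (ones on the diagonal, all eigenvalues equal to 1)
W_identity = np.eye(hidden_size)

# Option 2: orthogonal initialization (QR of a random matrix; every eigenvalue
# has magnitude 1, so repeated multiplication neither shrinks nor blows up)
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(hidden_size, hidden_size)))
W_orthogonal = Q
```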