Summary

Summary Reinforcement Learning, ISBN: 9780262193986 Reinforcement Learning (6013B0359Y)

Author: feanne1

Course
Reinforcement Learning (6013B0359Y)

Institution
Universiteit Van Amsterdam (UvA)

Book
Reinforcement Learning

This is an extensive summary of the Reinforcement Learning lectures, together with a number of tips and notes. Like the course, the summary is in English, and many formulas have been included.

Whole book covered? Yes
Published on 24 March 2023
Number of pages: 28
Written in 2022/2023
Type: Summary

reinforcement learning
markov process
monte carlo
q learning
marl
agents

Book title: Reinforcement Learning

Author(s): Richard S. Sutton, Andrew G. Barto

Edition: Unknown
ISBN: 9780262193986

Study programme
Econometrics

Reinforcement Learning Summary
Lecture 1

RL = learning what to do in order to maximize a numerical reward
Map each situation to the action with the highest reward

Problem of RL:
• Sense of the state of the environment
• Take actions -> affect the state
• Goal relating to the state

Learning from interactions, directly from its environment

Exploration-Exploitation dilemma:
Exploitation: profit from your experience
Exploration: look for better options in the future

Elements of RL:
1. Policy
   Behavior of the learning agent at a given time
   Action to be taken in each state
   Policies may be stochastic
2. Reward Rt
   Defines the goal of the problem
   Reward signal
3. Value function V(s)
   Total amount of reward expected to accumulate over the future, starting from state s
   Long-run desirability of the state, considering future states & rewards
   Actions are chosen based on value judgements
   Values have to be estimated
4. Model of the environment
   Model-free: trial-and-error learner
   Model-based: given state & action, the model predicts next state & reward

→ What action to take as a function of the state signal
→ Learn while interacting

Multi-armed bandits

k-armed bandit → k options & 1 situation: non-associative feedback problem
k: number of actions
t: time step
At: action at t
q* (a): true value of action a (expected reward) = E[Rt | At = a]

q* (a) unknown → We use Qt (a) as an estimation at time t

Action-value methods
Types of actions:
• Greedy actions: exploiting, so choose the a with the highest Qt(a)
• Exploring actions

Qt(a) = (sum of rewards received when a was selected before time t) / Nt(a)
Nt(a) = number of times a has been selected until time t

If Nt(a) = 0, then Qt(a) = c, some default value
If Nt(a) → ∞, then Qt(a) → q*(a)

Selection:
1. Random selection: P[At = a] = 1/k
2. Greedy action selection: At = argmax_a Qt(a)
3. ε-greedy action selection: with prob. ε select randomly from all actions with equal probability, otherwise greedy
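
The three selection rules above can be sketched in a single function. This is a minimal illustration, not the course's implementation: the name `select_action`, the `rng` parameter, and the tie-breaking rule (lowest index wins) are assumptions made for the example.

```python
import random

def select_action(q_values, epsilon, rng=random):
    """Epsilon-greedy selection over a list of estimates Q_t(a).

    epsilon = 1.0 gives purely random selection (P[A_t = a] = 1/k),
    epsilon = 0.0 gives purely greedy selection.
    Ties in the greedy case go to the lowest index (an assumption).
    """
    k = len(q_values)
    if rng.random() < epsilon:
        # Explore: every action is equally likely
        return rng.randrange(k)
    # Exploit: action with the highest current estimate
    return max(range(k), key=lambda a: q_values[a])
```

With `epsilon=0.0` the call `select_action([0.1, 0.9, 0.3], 0.0)` always picks the second action, since it has the highest estimate.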

Offline computing: all data already available; recomputing the full average at every step is computationally inefficient
Qn = (R1 + … + Rn-1) / (n - 1)
Qn+1 = (R1 + … + Rn) / n

Online computing:
Qn+1 = Qn + 1/n [Rn - Qn]
new estimate = old estimate + step size (target - old estimate)
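
A short sketch of the online rule, assuming the step size is 1/n as in the sample-average case. The function name `incremental_mean` and the parameter `q1` (the initial estimate) are hypothetical names chosen for this example. It shows that the running update reproduces the ordinary average without storing past rewards.

```python
def incremental_mean(rewards, q1=0.0):
    """Apply Q_{n+1} = Q_n + (1/n) * (R_n - Q_n) over a reward sequence."""
    q = q1
    for n, r in enumerate(rewards, start=1):
        # new estimate = old estimate + step size * (target - old estimate)
        q = q + (r - q) / n
    return q
```

For the rewards 1, 2, 6 the estimate moves 1 → 1.5 → 3, which is the plain average of the three rewards. The starting value `q1` drops out after the first update because the first step size is 1.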

Non-stationary: reward probabilities change over time
Give more weight to recent rewards than to long-past rewards: Qn+1 = Qn + α [Rn - Qn] with constant α

Varying step size αn(a): convergence for αn(a) = 1/n
& no convergence for αn(a) = α (constant): Qn+1 keeps varying with the most recent rewards
Sample average: the bias from the initial value disappears once all actions have been selected at least once
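
To make the difference between the two step-size choices concrete, here is a small sketch under assumed conditions: a single action whose reward jumps from 0 to 1 halfway through, and a constant step size of 0.1. The helper `track` and both variable names are hypothetical.

```python
def track(rewards, step_size):
    """Run Q <- Q + step_size(n) * (R_n - Q) from Q = 0 and return the final Q."""
    q = 0.0
    for n, r in enumerate(rewards, start=1):
        q += step_size(n) * (r - q)
    return q

rewards = [0.0] * 50 + [1.0] * 50               # reward level shifts halfway
sample_avg = track(rewards, lambda n: 1.0 / n)  # all rewards weighted equally
constant = track(rewards, lambda n: 0.1)        # recent rewards weighted more
```

The sample average ends near 0.5 because it still counts the old rewards in full, while the constant step size ends close to the current reward level of 1.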

Optimistic initial value for Q1(a) → forces the agent to select all options at least once
→ Qt(a) then moves down to its proper level
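
The effect of an optimistic start can be sketched with a purely greedy learner. The example below makes simplifying assumptions that are not in the notes: rewards are deterministic (each action always pays its true value), estimates are sample averages, and ties go to the lowest index. The function name `greedy_run` is hypothetical.

```python
def greedy_run(true_rewards, q1, steps):
    """Greedy selection with sample-average updates, starting from Q_1(a) = q1."""
    k = len(true_rewards)
    q = [q1] * k          # current estimates Q_t(a)
    n = [0] * k           # selection counts N_t(a)
    chosen = []
    for _ in range(steps):
        a = max(range(k), key=lambda i: q[i])  # greedy, lowest index on ties
        r = true_rewards[a]                    # deterministic reward (assumption)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]
        chosen.append(a)
    return chosen, q

optimistic, _ = greedy_run([0.2, 0.8, 0.5], q1=5.0, steps=6)
neutral, _ = greedy_run([0.2, 0.8, 0.5], q1=0.0, steps=6)
```

With `q1=5.0` every action is tried once before the learner settles on the best one. With `q1=0.0` the first action already looks better than the untouched ones, so the learner never leaves it.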

Lecture 2

Observations on multi-armed bandits

Optimistic initial value Q1(a):
• Qn+1(a) = Qn(a) + α [Rn(a) - Qn(a)], Q1(a) = c: influence of c is high for small α
• Qn+1(a) = Qn(a) + 1/n [Rn(a) - Qn(a)]: whatever the value of Q1(a) = c, Q2(a) = R1(a)
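
Both bullets can be checked by unrolling the updates. The following is a short worked derivation, writing the constant step size as α and the initial estimate as c. The closed form for constant α is the standard exponential recency-weighted average; it is not stated explicitly in this part of the summary.

```latex
% Step size 1/n: the first update overwrites c entirely
Q_2(a) = Q_1(a) + \tfrac{1}{1}\,[R_1(a) - Q_1(a)] = R_1(a)

% Constant step size \alpha: c is only forgotten geometrically
Q_{n+1}(a) = (1-\alpha)^n\, c + \sum_{i=1}^{n} \alpha\,(1-\alpha)^{n-i}\, R_i(a)
```

For small α the factor (1-α)^n shrinks slowly, so c keeps influencing the estimate for many steps.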

Greedy (with average reward value):
• t > t0 → At = a0: action a0 keeps being selected as long as Qt(a0) > 0, and if q*(a0) > 0
• t → ∞ → Qt(a0) → q*(a0), for the 'absorbing' action

ε-Greedy (with average reward value):
• t → ∞ → Qt(a) → q*(a), for all actions
• t → ∞ → P[At = a*] = (1 - ε) + ε/k, with a* = argmax_a q*(a): the optimal action
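
The limiting probability of picking the optimal action splits into two cases, assuming the estimates have converged so that the greedy action is a*. The numbers in the example (ε = 0.1, k = 10) are assumed values for illustration only.

```latex
P[A_t = a^*] = \underbrace{(1-\varepsilon)}_{\text{greedy step picks } a^*}
             + \underbrace{\varepsilon \cdot \tfrac{1}{k}}_{\text{random step happens to pick } a^*}

% Example with \varepsilon = 0.1 and k = 10:
P[A_t = a^*] = 0.9 + 0.1/10 = 0.91
```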

Greedy selection (optimistic initial value, exponential recency-weighted average):
• Higher average reward (high Rt: slow decrease of Qt), the 'absorbing' action is often the optimal action

Do better by giving exploitation priority and, when we explore:
• Avoid low-reward actions (no random selections)
• Good choices for pi: select greedy with high prob. for p1
• Low-reward actions will still be selected

Upper-confidence-bound action selection
Explore non-greedy actions with high potential & keep exploring, also in the long-run

Behaviour:
• Sqrt-term measure of uncertainty, like confidence interval
• Upper-bound value of an action increases in time, even when it is not selected
• When selected the uncertainty term decreases
• Subtle favoring of less-frequent selected actions
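
The selection formula itself is not present in this text. The sketch below assumes the standard rule from Sutton and Barto, At = argmax_a [Qt(a) + c · sqrt(ln t / Nt(a))], where c > 0 controls the amount of exploration. The function name `ucb_action`, the default `c=2.0`, and the choice to pick untried actions first are assumptions for the example.

```python
import math

def ucb_action(q_values, counts, t, c=2.0):
    """Return argmax_a [ Q_t(a) + c * sqrt(ln t / N_t(a)) ] for t >= 1.

    Actions with N_t(a) == 0 are treated as maximally uncertain
    and are selected first (lowest index first).
    """
    for a, n in enumerate(counts):
        if n == 0:
            return a
    scores = [q + c * math.sqrt(math.log(t) / n)
              for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

With estimates `[1.0, 0.9]` and counts `[100, 1]` at `t=101`, the rarely tried second action wins because its square-root term is much larger. Setting `c=0` removes the uncertainty term and reduces the rule to greedy selection.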

Multi-armed bandit problem: non-associative task: find or track the best action for a single situation (or state), either stationary or non-stationary

Contextual Bandits
Associative task: find best action for multiple situations (or states), i.e. learn a policy
Associative search: trial-and-error learning and association of actions to situations
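
One simple way to picture the associative case is to keep a separate set of action-value estimates per situation. The sketch below is an illustration under that assumption, not a method taken from the notes; the class name `AssociativeBanditAgent` and its methods are hypothetical.

```python
from collections import defaultdict

class AssociativeBanditAgent:
    """Keeps one sample-average estimate Q(s, a) per (state, action) pair."""

    def __init__(self, k):
        self.k = k
        self.q = defaultdict(lambda: [0.0] * k)  # estimates per state
        self.n = defaultdict(lambda: [0] * k)    # selection counts per state

    def update(self, state, action, reward):
        # Same incremental rule as the plain bandit, applied within one state
        self.n[state][action] += 1
        self.q[state][action] += (reward - self.q[state][action]) / self.n[state][action]

    def greedy_policy(self, state):
        # The learned policy: map each state to its best-looking action
        qs = self.q[state]
        return max(range(self.k), key=lambda a: qs[a])

agent = AssociativeBanditAgent(k=2)
agent.update("red", 0, 1.0)
agent.update("blue", 1, 1.0)
```

After these two updates the agent prefers action 0 in state "red" and action 1 in state "blue", which is what learning a policy means here. The actions do not yet influence which state comes next; that is the step to the full RL problem.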

Full RL associative task: Actions affect next situation (or state)
