Reinforcement Learning + Markov Decision Processes
Reinforcement learning generally ✔️✔️given inputs x and reinforcement signals z, we want to learn a function that produces the real output y from the input; the z values are only used to guide the learning of that function
y = f(x), guided by z
Markov Decision Process ✔️✔️in reinforcement learning we want our agent to learn a ___ ___ ___.
For this we need to discretize the states, the time and the actions.
states in MDP ✔️✔️states are the set of tokens representing every state one could be in (a state can be included even if we never reach it)
model in MDP ✔️✔️aka the transition function
the rules of the game: a function T(s, a, s') of a state, an action, and another state - it gives the probability of transitioning to state s' given that you were in state s and took action a
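The transition function described above can be sketched as a lookup table; the states, actions, and probabilities here are invented for illustration:

```python
# Toy transition model: T[(s, a)] maps to a distribution over next states.
# The 0.8/0.2 split mimics an action that occasionally fails (made-up numbers).
T = {
    ("s0", "right"): {"s1": 0.8, "s0": 0.2},
    ("s1", "right"): {"goal": 1.0},
}

def transition_prob(s, a, s_next):
    """P(s' | s, a): probability of landing in s_next from s after action a."""
    return T.get((s, a), {}).get(s_next, 0.0)

print(transition_prob("s0", "right", "s1"))  # 0.8
```

Unlisted (state, action) pairs simply get probability 0, which keeps the table sparse.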
actions in MDP ✔️✔️the things you can do, or are allowed to do, in a particular state (e.g. up, down, left, right)
how to get around the Markovian property and why the workaround could be bad ✔️✔️you can make the state remember everything you need from the past
but this means every state might be visited only once, which would make it hard to learn anything
properties of Markov decision processes ✔️✔️- only the present matters
- the rules don't change over time (stationary)
reward in MDP ✔️✔️- a scalar value for being in a state - if you reach the goal you get a dollar; if you reach the bad state you lose a dollar
- different ways to write rewards: R(s), R(s,a), R(s,a,s')
- usually a delayed reward
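A quick sketch of the three reward signatures listed above; the states and the -0.04 step value are assumptions chosen for illustration:

```python
# R(s): a reward just for being in a state.
def R_state(s):
    return {"goal": 1.0, "pit": -1.0}.get(s, -0.04)  # small penalty elsewhere

# R(s, a): reward for taking action a in state s (here it ignores a).
def R_state_action(s, a):
    return R_state(s)

# R(s, a, s'): reward for the whole transition (here it depends only on s').
def R_transition(s, a, s_next):
    return R_state(s_next)
```

The three forms are interchangeable in expressive power; the richer signatures just let you price specific actions or transitions differently.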
policy in MDP ✔️✔️a function that takes in a state and returns an action (as a command)
- not a plan (a sequence of actions), just the action to take in whatever state you are in
- the next best thing to a plan when transitions are stochastic
- kinda looks like a vector field
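A policy in this sense is just a state-to-action mapping; the grid coordinates and actions below are a made-up example:

```python
# Policy as a lookup table: state -> action, like arrows drawn on a grid
# (the "vector field" picture from the card above).
policy = {
    (0, 0): "right", (0, 1): "right", (0, 2): "up",
    (1, 0): "up",    (1, 2): "up",
}

def act(state):
    """pi(s): the command for whatever state we find ourselves in."""
    return policy[state]

print(act((1, 0)))  # up
```

Note it answers "what do I do here?", not "what is my whole route?" - that is exactly the plan-versus-policy distinction.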
how to find the solution in MDP ✔️✔️find the optimal policy that maximizes the long term expected
reward
given a bunch of states (x), actions, and rewards (z), find the function that gives the optimal action (y)
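One standard way to find that optimal policy is value iteration; this is a minimal sketch on an invented three-state chain (gamma, rewards, and transitions are all assumptions):

```python
# Value iteration on a toy deterministic chain: s0 -> s1 -> goal.
gamma = 0.9
states = ["s0", "s1", "goal"]
actions = ["right", "stay"]
# Deterministic transitions: T[(s, a)] = next state.
T = {("s0", "right"): "s1", ("s0", "stay"): "s0",
     ("s1", "right"): "goal", ("s1", "stay"): "s1",
     ("goal", "right"): "goal", ("goal", "stay"): "goal"}
R = {"s0": -0.04, "s1": -0.04, "goal": 1.0}  # small step penalty, goal pays off

V = {s: 0.0 for s in states}
for _ in range(100):  # repeat Bellman backups until the values settle
    V = {s: R[s] + gamma * max(V[T[(s, a)]] for a in actions) for s in states}

# The greedy policy with respect to the converged values is optimal here.
policy = {s: max(actions, key=lambda a: V[T[(s, a)]]) for s in states}
print(policy["s0"], policy["s1"])  # right right
```

The long-term expected reward shows up in the discounted max over successor values; the policy then just reads off the best action per state.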
temporal credit assignment problem ✔️✔️- refers to the fact that rewards, especially in fine-grained state-action spaces, can arrive with a long temporal delay
- such reward signals only very weakly affect the temporally distant states that preceded them
- almost as if the influence of a reward gets more and more diluted over time, which can lead to bad convergence properties in the RL mechanism
- many steps must be performed by any iterative reinforcement-learning algorithm to propagate the influence of delayed reinforcement back to all the states and actions that had an effect on it
why do you have a small negative reward for each step before terminating? ✔️✔️- similar to walking across a hot beach into the ocean - it encourages you to end the game rather than stay where you are
why do minor changes matter in MDP? ✔️✔️- because making the step reward less negative could lead you to end up in the bad area more often than a harsher penalty would
- if the step reward is too harsh, then the bad terminal outcome may be better than staying in the game
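The trade-off above can be shown with toy arithmetic (the distances, step penalties, and terminal rewards are all invented): an agent two steps from the goal (+1) but only one step from the pit (-1).

```python
# Total return for each route as a function of the per-step penalty.
def return_via_goal(step_reward):
    return 2 * step_reward + 1.0   # two penalized steps, then the goal

def return_via_pit(step_reward):
    return 1 * step_reward - 1.0   # one penalized step, then the pit

print(return_via_goal(-0.04) > return_via_pit(-0.04))  # True: mild penalty, goal wins
print(return_via_goal(-3.0) > return_via_pit(-3.0))    # False: harsh penalty, pit wins
```

With a mild -0.04 penalty the goal route returns 0.92 versus -1.04 for the pit; at -3.0 per step the goal route costs -5.0 versus -4.0, so jumping into the pit becomes the "better" outcome - exactly the minor-change sensitivity the card describes.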
what part of MDP can incorporate our domain knowledge? ✔️✔️the reward - how important it is to
get to the end