Exam (worked solutions)

SO 2 Markov Decision Processes

Uploaded on 30 October 2024
Number of pages: 5
Written in 2024/2025
Type: Exam (worked solutions)
Contains: Questions and answers

Seller: CertifiedGrades

SO 2 Markov Decision Processes

What is a Markov decision process (MDP) and what are its components? ✔️✔️An MDP is a model for
sequential decision problems.

It consists of:

Decision epochs

System states

Actions

Transition probabilities: depend only on present state and present action.

Rewards
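The five components listed above can be gathered into one small data structure. The following is a minimal sketch, not a definitive implementation: the class name `MDP`, the helper `is_well_formed`, and the two-state maintenance example are all hypothetical, and the sketch assumes transition probabilities and rewards that do not change with the decision epoch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MDP:
    epochs: range      # decision epochs T (a finite horizon in this sketch)
    states: tuple      # system states S
    actions: tuple     # actions A
    transition: dict   # (s, a) -> {next_state: probability}
    reward: dict       # (s, a) -> immediate reward


# Hypothetical two-state example; all numbers are illustrative only.
toy = MDP(
    epochs=range(1, 4),
    states=("ok", "broken"),
    actions=("run", "repair"),
    transition={
        ("ok", "run"): {"ok": 0.9, "broken": 0.1},
        ("ok", "repair"): {"ok": 1.0},
        ("broken", "run"): {"broken": 1.0},
        ("broken", "repair"): {"ok": 0.8, "broken": 0.2},
    },
    reward={
        ("ok", "run"): 5.0,
        ("ok", "repair"): -1.0,
        ("broken", "run"): 0.0,
        ("broken", "repair"): -3.0,
    },
)


def is_well_formed(mdp: MDP) -> bool:
    """Check that every (state, action) pair has a PMF over states and a reward."""
    for s in mdp.states:
        for a in mdp.actions:
            pmf = mdp.transition.get((s, a))
            if pmf is None or (s, a) not in mdp.reward:
                return False
            if any(z not in mdp.states or p < 0 for z, p in pmf.items()):
                return False
            if abs(sum(pmf.values()) - 1.0) > 1e-9:
                return False
    return True


print(is_well_formed(toy))  # True
```

Keying both tables by the pair (s, a) mirrors the statement that transition probabilities depend only on the present state and the present action.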

What are decision epochs? What is our notation for them and what restrictions do we impose?
✔️✔️Decision epochs are the points in time at which decisions are made and actions are taken. T denotes
the set of all decision epochs. We consider models where T = {t_0, t_1, ...} is a countable set, so it can be indexed by (a subset of) the natural numbers ℕ.

Finite horizon: T = {1, ..., N}, a finite set of integers.

Infinite horizon: T = ℕ.
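As a small illustration of the two cases, the epoch set can be sketched as a finite range or as a lazy, unbounded iterator. The function names are hypothetical, and starting the infinite horizon at 1 is an assumption made here to match the finite case.

```python
from itertools import count, islice


def finite_horizon(n: int) -> range:
    """Decision epochs T = {1, ..., N} for a finite-horizon model."""
    return range(1, n + 1)


def infinite_horizon():
    """Decision epochs as a never-ending iterator (assumed to start at 1)."""
    return count(1)


print(list(finite_horizon(5)))              # [1, 2, 3, 4, 5]
print(list(islice(infinite_horizon(), 3)))  # [1, 2, 3]
```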

What are actions? What is our notation for them and what restrictions do we impose? ✔️✔️Actions
are the effects on the future behaviour of the system caused by the agent's decisions. A denotes the set
of all actions available to the decision maker and is called the action space. Y_t is the random variable
representing the action taken at time t (it is random because, even given all available information, the decision can still be randomized).

We only consider models where the action set is finite.

What are states? What is our notation for them and what restrictions do we impose? ✔️✔️The state of
a system is the information about the system, past and present, which together with future actions
enables us to predict the system's future behaviour (uniquely in a statistical sense, that is, up to its distribution).

S denotes the set of all states the system can be in. We restrict attention to the case where the state space is finite.
N_s is the number of states.

A(s) ⊂ A is the set of all admissible actions when the system is in state s ∈ S.

What are the transition probabilities? What is our notation for them and what restrictions do we impose?
✔️✔️p_t(· | s, a) is a parametrized family of PMFs on the state space, indexed by the current state s and
the current action a.

p_t(z | s, a) is the probability of the process transitioning to state z at time t+1, conditional on the system being
in state s and action a being taken at time t.

Transitions from one state to another obey a state-action Markov property assumption:
P(X_{t+1} = s_{t+1} | X_0 = s_0, Y_0 = a_0, ..., X_t = s_t, Y_t = a_t) = p_t(s_{t+1} | s_t, a_t)

Essentially, given the present state of the process and the present action taken, the future of the process
is independent of the past system states and actions.
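The Markov property can be made concrete with a short simulation in which the next state is sampled using only the present state and action. This is a sketch under stated assumptions: the table `P`, the function `step`, and the state and action names are hypothetical, and the probabilities are taken to be the same at every epoch.

```python
import random

# Hypothetical stationary transition PMFs p(. | s, a); values are illustrative only.
P = {
    ("ok", "run"): {"ok": 0.9, "broken": 0.1},
    ("ok", "repair"): {"ok": 1.0},
    ("broken", "run"): {"broken": 1.0},
    ("broken", "repair"): {"ok": 0.8, "broken": 0.2},
}


def step(state, action, rng):
    """Sample the next state from p(. | state, action); no earlier history is consulted."""
    pmf = P[(state, action)]
    next_states = list(pmf)
    weights = [pmf[z] for z in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)  # seeded so that repeated runs give the same trajectory
state = "ok"
trajectory = [state]
for action in ["run", "run", "repair", "run"]:
    state = step(state, action, rng)
    trajectory.append(state)
print(trajectory)
```

Because `step` receives only the pair (state, action), the sampled trajectory cannot depend on anything that happened before the present epoch, which is exactly what the assumption requires.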

What are rewards? What is our notation for them and what restrictions do we impose? ✔️✔️Rewards
are the immediate consequences of the actions taken. r_t(s, a) ∈ ℝ is the reward received at time t if, at that time, the
system is in state s and the agent selects action a.
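A reward function in this notation can be sketched as a lookup from (t, s, a) to a real number. The table, the function name `reward`, and the sample path below are hypothetical, and the sketch assumes the reward does not vary with t.

```python
# Hypothetical reward table; the numbers are illustrative and not from the source.
BASE_REWARD = {
    ("ok", "run"): 5.0,
    ("ok", "repair"): -1.0,
    ("broken", "run"): 0.0,
    ("broken", "repair"): -3.0,
}


def reward(t: int, s: str, a: str) -> float:
    """r_t(s, a): real-valued reward for being in state s and choosing action a at epoch t.

    Assumption for this sketch: the reward does not vary with t, so t is unused.
    """
    return BASE_REWARD[(s, a)]


# A realized sequence of (epoch, state, action) triples, also hypothetical.
path = [(1, "ok", "run"), (2, "broken", "repair"), (3, "ok", "run")]
rewards = [reward(t, s, a) for t, s, a in path]
print(rewards)  # [5.0, -3.0, 5.0]
```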

What are decision rules? ✔️✔️Informally, a decision rule is a procedure for selecting an action in each
state at a specified decision epoch. When selecting an action, the rule has access to
the present state along with all past states and actions.

Formally, a general decision rule assigns to each history a probability distribution on the action set A. We consider four rule classes:

History dependent randomized (HR)

History dependent deterministic (HD)

Memoryless randomized (MR)

Memoryless deterministic (MD)

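The rule classes are related as follows: MR ⊂ HR ⊃ HD ⊃ MD. In addition, MD ⊂ MR, since a deterministic rule is a randomized rule that puts probability 1 on a single action.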
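The four classes differ in their input (present state or full history) and their output (a single action or a distribution over actions). The sketch below illustrates this with hypothetical rules and helper names. It assumes a history is stored as a tuple (s_1, a_1, ..., s_t) that ends with the present state.

```python
def md_rule(state):
    """Memoryless deterministic (MD): present state -> action."""
    return "repair" if state == "broken" else "run"


def mr_rule(state):
    """Memoryless randomized (MR): present state -> distribution over actions."""
    if state == "broken":
        return {"repair": 0.7, "run": 0.3}
    return {"run": 1.0}


def hd_rule(history):
    """History dependent deterministic (HD): history -> action.

    Repairs only if the last two observed states were both "broken".
    """
    states = history[::2]  # states sit at even positions of (s_1, a_1, ..., s_t)
    if len(states) >= 2 and states[-1] == states[-2] == "broken":
        return "repair"
    return "run"


def as_randomized(rule):
    """Embed a deterministic rule in a randomized class as a point mass."""
    return lambda x: {rule(x): 1.0}


def as_history_dependent(rule):
    """Embed a memoryless rule in a history dependent class: use only the present state."""
    return lambda history: rule(history[-1])


history = ("ok", "run", "broken", "run", "broken")
print(hd_rule(history))                        # repair
print(as_randomized(hd_rule)(history))         # {'repair': 1.0}  (an HR rule)
print(as_history_dependent(mr_rule)(history))  # {'repair': 0.7, 'run': 0.3}  (an HR rule)
```

The two wrappers correspond to the inclusions: `as_randomized` takes HD into HR, and `as_history_dependent` takes MD into HD and MR into HR.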
