SO 2 Markov Decision Processes
What is a Markov decision process (MDP) and what are its components? ✔️✔️An MDP is a model for
sequential decision problems.
It consists of:
Decision epochs
System states
Actions
Transition probabilities: depend only on present state and present action.
Rewards
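The five components can be collected in a single container. The following is a minimal sketch, not a standard API: the class name FiniteMDP, its field names, and the two-state maintenance example are all hypothetical, and the transition probabilities and rewards are assumed not to depend on t, to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FiniteMDP:
    """Illustrative container for the five components of a finite MDP."""
    epochs: tuple      # decision epochs T, e.g. (1, 2, ..., N)
    states: tuple      # state space S
    actions: dict      # actions[s] = admissible actions A(s)
    transition: dict   # transition[(s, a)] = {next_state: probability}
    reward: dict       # reward[(s, a)] = immediate reward

    def validate(self):
        """Return True if every p(. | s, a) is a PMF on S and has a reward."""
        for s in self.states:
            for a in self.actions[s]:
                pmf = self.transition[(s, a)]
                if any(z not in self.states or p < 0 for z, p in pmf.items()):
                    return False
                if abs(sum(pmf.values()) - 1.0) > 1e-9:
                    return False
                if (s, a) not in self.reward:
                    return False
        return True


# Hypothetical example: a machine that is either working or broken.
toy = FiniteMDP(
    epochs=(1, 2, 3),
    states=("ok", "broken"),
    actions={"ok": ("run",), "broken": ("run", "repair")},
    transition={
        ("ok", "run"): {"ok": 0.9, "broken": 0.1},
        ("broken", "run"): {"broken": 1.0},
        ("broken", "repair"): {"ok": 1.0},
    },
    reward={("ok", "run"): 1.0, ("broken", "run"): 0.0, ("broken", "repair"): -2.0},
)
```

Calling toy.validate() checks that each (state, admissible action) pair has a valid PMF over next states and a reward entry.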
What are decision epochs? What is our notation for them and what restrictions do we impose?
✔️✔️Decision epochs are the points in time at which decisions are made and actions are taken. T denotes
the set of all decision epochs. We consider models where T = {t0, t1, ...} is a countable set, so it can be indexed by the natural numbers ℕ.
Finite horizon: T = {1, ..., N}, a finite set of integers.
Infinite horizon: T = ℕ.
What are actions? What is our notation for them and what restrictions do we impose? ✔️✔️Actions
are the effects on the future behaviour of the system caused by the agent's decisions. A denotes the set
of all actions available to the decision maker and is called the action space. Y_t is the random variable
representing the action taken at time t (it is random because, even given all available information, the decision can still be randomized).
We only consider models where the action set is finite.
What are states? What is our notation for them and what restrictions do we impose? ✔️✔️The state of
a system is the information about the system, past and present, which together with future actions
enables us to predict the future behaviour of the system (uniquely in a statistical sense, that is, in distribution).
S denotes the set of all states the system can be in. We restrict attention to the case where the state space is finite.
N_s is the number of states.
For each state s ∈ S, A(s) ⊂ A is the set of all admissible actions when the system is in state s.
What are the transition probabilities? What is our notation for them and what restrictions do we impose?
✔️✔️p_t(· | s, a) is a parameterized family of PMFs on the state space, indexed by the current state s and
the current action a.
p_t(z | s, a) is the probability of the process transitioning to state z at time t+1, conditional on the system being
in state s and action a being taken at time t.
Transitions from one state to another obey a state-action Markov property. Writing X_t for the state at time t:
P(X_{t+1} = s_{t+1} | X_0 = s_0, Y_0 = a_0, ..., X_t = s_t, Y_t = a_t) = p_t(s_{t+1} | s_t, a_t)
Essentially, given the present state of the process and the present action taken, the future of the process
is independent of the past system states and actions taken.
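The Markov property can be seen in a simulation: the sampler below looks only at the present state and action, never at the earlier path. This is a minimal sketch under assumptions not in the notes. The kernel P, the state and action names, and the helpers is_pmf and step are hypothetical, and the kernel is taken to be the same at every epoch.

```python
import random

# Hypothetical stationary kernel: P[(s, a)] maps next state z -> p(z | s, a).
P = {
    ("ok", "run"): {"ok": 0.9, "broken": 0.1},
    ("broken", "run"): {"broken": 1.0},
    ("broken", "repair"): {"ok": 1.0},
}


def is_pmf(pmf, tol=1e-9):
    """Check that the probabilities are non-negative and sum to one."""
    return all(p >= 0 for p in pmf.values()) and abs(sum(pmf.values()) - 1.0) <= tol


def step(state, action, rng):
    """Sample the next state using only the present state and action."""
    pmf = P[(state, action)]
    next_states = list(pmf)
    weights = [pmf[z] for z in next_states]
    return rng.choices(next_states, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the run is reproducible
state = "ok"
path = [state]
for _ in range(5):
    # Illustrative choice of action: repair when broken, otherwise run.
    action = "repair" if state == "broken" else "run"
    state = step(state, action, rng)
    path.append(state)
```

Because step receives no history argument, the sampled next state cannot depend on anything other than the current pair (s, a), which is exactly the assumption stated above.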
What are rewards? What is our notation for them and what restrictions do we impose? ✔️✔️Rewards
are the immediate consequences of actions taken. r_t(s, a) ∈ ℝ is the reward received at time t if the
system is in state s and the agent selects action a, both at time t.
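As a small illustration, the reward function can be stored as a lookup table and evaluated along a sequence of (state, action) pairs. The table r, the trajectory, and the helper total_reward are hypothetical. The rewards are assumed not to depend on t, and summing them is used here only to show the lookup in action.

```python
# Hypothetical stationary reward table: r[(s, a)] is the immediate reward.
r = {("ok", "run"): 1.0, ("broken", "run"): 0.0, ("broken", "repair"): -2.0}


def total_reward(trajectory):
    """Sum r(s, a) over a list of (state, action) pairs, one pair per epoch."""
    return sum(r[(s, a)] for s, a in trajectory)


# Four epochs: run twice, repair after a breakdown, then run again.
trajectory = [("ok", "run"), ("ok", "run"), ("broken", "repair"), ("ok", "run")]
total = total_reward(trajectory)  # 1.0 + 1.0 - 2.0 + 1.0
```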
What are decision rules? ✔️✔️Informally, a decision rule is a procedure for selecting an action in each
state at a specified decision epoch. When selecting an action, the rule has access to
the present state along with all past states and actions.
Formally, a general decision rule assigns to each history a probability distribution on the action set A. We consider four classes of rules:
History dependent randomized (HR)
History dependent deterministic (HD)
Memoryless randomized (MR)
Memoryless deterministic (MD)
The rule classes are related as follows: MD ⊂ MR ⊂ HR and MD ⊂ HD ⊂ HR.
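The difference between the classes shows up in what a rule takes as input and what it returns. The sketch below is illustrative only: the rule contents, the state and action names, and the helpers md_as_mr and sample_action are hypothetical. A history is assumed to be a tuple (s0, a0, s1, a1, ..., st) ending in the present state. An HR rule, not shown, would map such a history to a distribution rather than to a single action.

```python
import random

# MD: maps the present state to a single action.
md_rule = {"ok": "run", "broken": "repair"}

# MR: maps the present state to a distribution on actions.
mr_rule = {"ok": {"run": 1.0}, "broken": {"run": 0.3, "repair": 0.7}}


def hd_rule(history):
    """HD: pick one action deterministically from the whole history."""
    state = history[-1]
    if state != "broken":
        return "run"
    # Hypothetical rule: repair only if no repair was carried out earlier.
    return "run" if "repair" in history else "repair"


def md_as_mr(rule):
    """View a deterministic rule as a degenerate randomized rule."""
    return {s: {a: 1.0} for s, a in rule.items()}


def sample_action(rule, state, rng):
    """Draw an action from a memoryless randomized rule."""
    dist = rule[state]
    actions = list(dist)
    return rng.choices(actions, weights=[dist[a] for a in actions], k=1)[0]
```

md_as_mr shows why every deterministic rule is also a randomized one: it places probability 1 on the chosen action. Similarly, a memoryless rule is a history-dependent rule that ignores everything except the last entry of the history.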