Markov Decision Processes Verified Solutions
Markov decision processes ✔️✔️- MDPs formally describe an environment for reinforcement learning
- environment is fully observable
- current state completely characterizes the process
- Almost all RL problems can be formalised as MDPs
- optimal control primarily deals with continuous MDPs
- Partially observable problems can be converted into MDPs
- Bandits are MDPs with one state
Markov Property ✔️✔️- The future is independent of the past given the present: a state S_t is Markov if and only if P[S_{t+1} | S_t] = P[S_{t+1} | S_1, ..., S_t]
- The state captures all relevant information from the history
- Once the state is known, the history can be thrown away
- The state is a sufficient statistic of the future
State Transition Matrix ✔️✔️- For a Markov state s and successor state s', the state transition probability is P_{ss'} = P[S_{t+1} = s' | S_t = s]
- The state transition matrix P defines transition probabilities from all states s to all successor states s', with each row of the matrix summing to 1
Markov Process ✔️✔️- A Markov process is a memoryless random process, i.e. a sequence of random states S_1, S_2, ... with the Markov property
- A Markov process (or Markov chain) is a tuple <S, P> (sampled in the sketch below)
- S is a (finite) set of states
- P is a state transition probability matrix
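As a small illustration (not from the original notes), the sketch below builds a toy three-state Markov chain <S, P> with NumPy and samples a trajectory; the state names and transition probabilities are invented for the example.

import numpy as np

# Hypothetical 3-state Markov chain <S, P>; states and probabilities are made up.
states = ["sunny", "cloudy", "rainy"]
P = np.array([[0.8, 0.15, 0.05],   # transitions out of "sunny"
              [0.3, 0.4,  0.3 ],   # transitions out of "cloudy"
              [0.2, 0.3,  0.5 ]])  # transitions out of "rainy"
assert np.allclose(P.sum(axis=1), 1.0)   # each row is a probability distribution

rng = np.random.default_rng(0)

def sample_chain(start, length):
    """Sample S_1, S_2, ... using only the current state (the Markov property)."""
    s, path = start, [states[start]]
    for _ in range(length - 1):
        s = rng.choice(len(states), p=P[s])
        path.append(states[s])
    return path

print(sample_chain(start=0, length=10))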
Markov reward process ✔️✔️- A Markov reward process is a Markov chain with values
- A Markov reward process is a tuple <S, P, R, γ>
- S is a finite set of states
- P is a state transition probability matrix
- R is a reward function, R_s = E[R_{t+1} | S_t = s]
- γ is a discount factor, γ ∈ [0, 1] (a toy MRP is sketched below)
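A minimal sketch of the tuple <S, P, R, γ> as arrays, reusing the invented chain from above; the reward values and discount factor are assumptions made up for illustration, not from the notes.

import numpy as np

# Hypothetical Markov reward process <S, P, R, γ>; all numbers are made up.
S = ["sunny", "cloudy", "rainy"]
P = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.4,  0.3 ],
              [0.2, 0.3,  0.5 ]])   # state transition probability matrix
R = np.array([1.0, 0.0, -1.0])      # reward function, R_s = E[R_{t+1} | S_t = s]
gamma = 0.9                         # discount factor γ ∈ [0, 1]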
Return ✔️✔️- The return G_t is the total discounted reward from time-step t: G_t = R_{t+1} + γ R_{t+2} + ... = Σ_{k≥0} γ^k R_{t+k+1} (computed in the sketch below)
- The discount γ is the present value of future rewards
- The value of receiving reward R after k+1 time-steps is γ^k R
- This values immediate reward above delayed reward
- γ close to 0 leads to "myopic" evaluation
- γ close to 1 leads to "far-sighted" evaluation
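A tiny sketch of the return formula; the reward sequence and discount below are made-up numbers used only to show the arithmetic.

# Return G_t = R_{t+1} + γ R_{t+2} + γ^2 R_{t+3} + ...; the reward sequence is made up.
gamma = 0.9
rewards = [1.0, 0.0, -1.0, 1.0, 1.0]            # R_{t+1}, R_{t+2}, ...

G_t = sum(gamma**k * r for k, r in enumerate(rewards))
print(G_t)   # γ close to 0 weights only the first rewards; γ close to 1 weights them all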
Discount ✔️✔️- mathematically convenient to discount rewards
- Avoids infinite returns in cyclic Markov Processes
- Uncertainty about the future may not be fully represented
- if reward is financial, immediate rewards may earn more interest than delayed rewards
- animal/human behavior shows preference for immediate reward
- sometimes possible to use undiscounted Markov reward processes if all sequences terminate
Value Function ✔️✔️- The value function v(s) gives the long-term value of state s
- The state value function v(s) of an MRP is the expected return starting from state s: v(s) = E[G_t | S_t = s] (estimated by sampling in the sketch below)
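Since v(s) is defined as an expected return, one way to make that concrete is to average sampled returns. The Monte Carlo sketch below reuses the invented toy MRP and truncates episodes at a fixed horizon; it is only an illustration, not part of the original material.

import numpy as np

# Hypothetical MRP (same toy numbers as above), used to illustrate v(s) = E[G_t | S_t = s].
P = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.4,  0.3 ],
              [0.2, 0.3,  0.5 ]])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9
rng = np.random.default_rng(0)

def mc_value(s, episodes=500, horizon=100):
    """Estimate v(s) by averaging sampled returns, truncated at a fixed horizon."""
    total = 0.0
    for _ in range(episodes):
        state, g, discount = s, 0.0, 1.0
        for _ in range(horizon):
            g += discount * R[state]            # reward received for leaving `state`
            discount *= gamma
            state = rng.choice(len(R), p=P[state])
        total += g
    return total / episodes

print([round(mc_value(s), 2) for s in range(3)])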
Bellman Equation for MRPs ✔️✔️the value function can be decomposed into two parts:
- the immediate reward R_{t+1}
- the discounted value of the successor state, γ v(S_{t+1}) (applied as an iterative backup in the sketch below)
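This decomposition is the Bellman expectation equation for MRPs, v(s) = E[R_{t+1} + γ v(S_{t+1}) | S_t = s] = R_s + γ Σ_{s'} P_{ss'} v(s'). One way to see it at work is to iterate the backup until the values stop changing; the MRP below is the same invented toy example, not from the notes.

import numpy as np

# Same hypothetical MRP as above.
P = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.4,  0.3 ],
              [0.2, 0.3,  0.5 ]])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9

v = np.zeros(len(R))
for _ in range(1000):
    v_prev, v = v, R + gamma * P @ v    # Bellman backup: v(s) = R_s + γ Σ_{s'} P_{ss'} v(s')
    if np.max(np.abs(v - v_prev)) < 1e-10:
        break
print(v)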
Bellman Equation in Matrix Form ✔️✔️- The Bellman equation can be expressed concisely using matrices:
v = R + γPv
where v is a column vector with one entry per state (solved directly in the sketch below)
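Because v = R + γPv rearranges to (I - γP)v = R, the value function of a small MRP can be solved for directly (at roughly cubic cost in the number of states). A sketch with the same invented toy numbers:

import numpy as np

# Same hypothetical MRP; v = R + γPv rearranges to (I - γP)v = R, solved directly.
P = np.array([[0.8, 0.15, 0.05],
              [0.3, 0.4,  0.3 ],
              [0.2, 0.3,  0.5 ]])
R = np.array([1.0, 0.0, -1.0])
gamma = 0.9

v = np.linalg.solve(np.eye(len(R)) - gamma * P, R)
print(v)   # should match the fixed point of the iterative backup above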