Exam

Reinforcement Learning + Markov Decision Processes


Uploaded on October 30, 2024
Number of pages: 12
Written in 2024/2025
Type: Exam
Contains: Questions and answers


Seller: CertifiedGrades


Reinforcement learning generally ✔️✔️given inputs x and outputs z, where the outputs z are used to
learn a function of the input that predicts a secondary output y

y = f(x), given z

Markov Decision Process ✔️✔️in reinforcement learning we want our agent to learn a ___ ___ ___.

For this we need to discretize the states, the time and the actions.

states in MDP ✔️✔️states are the set of tokens that represent every state that one could be in (can
include a state even if we never go there)

model in MDP ✔️✔️aka transition function

the rules of the game: a function of a state s, an action a, and another state s' that gives the
probability of transitioning to s' given that you were in s and took action a, i.e. T(s,a,s') = P(s' | s,a)
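
As a rough illustration of the transition function described above, the sketch below stores P(s' | s,a) in a nested dictionary. The state names, actions, and probabilities are hypothetical and chosen only for the example; they are not taken from the notes.

```python
import random

# Hypothetical 3-state example: names and probabilities are made up.
# T[(s, a)] maps each next state s' to P(s' | s, a).
T = {
    ("start", "right"): {"middle": 0.8, "start": 0.2},
    ("middle", "right"): {"goal": 0.8, "middle": 0.2},
    ("middle", "left"): {"start": 0.8, "middle": 0.2},
}

def transition_prob(s, a, s_next):
    """Return P(s_next | s, a); unlisted next states have probability 0."""
    return T.get((s, a), {}).get(s_next, 0.0)

def sample_next_state(s, a, rng=random):
    """Draw one next state according to the model."""
    outcomes = T[(s, a)]
    return rng.choices(list(outcomes), weights=list(outcomes.values()), k=1)[0]

# Each (state, action) row must be a valid probability distribution.
for (s, a), outcomes in T.items():
    assert abs(sum(outcomes.values()) - 1.0) < 1e-9
```

The model depends only on the current state and action, which is the "only the present matters" property in code form.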

actions in MDP ✔️✔️the things you can do (or are allowed to do) in a particular state, e.g. up, down, left, right


how to get around the Markovian property and why the workaround could be bad ✔️✔️you can make
the state remember everything you need from the past

but this means that you might visit every state only once, which would make it hard to learn anything

properties of Markov decision making ✔️✔️- only the present matters

- the rules don't change over time (stationary)

reward in MDP ✔️✔️- a scalar value for being in a state: if you get to the goal you gain a dollar, and if
you get to the bad state you lose a dollar

- the reward can be defined in different ways: R(s), R(s,a), R(s,a,s')

- usually delayed reward

policy in MDP ✔️✔️a function that takes in a state and returns an action (as a command)

- not a sequence of actions but just an action to take in a particular state

kinda the next best thing

- kinda looks like a vector field
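
To make the difference between a policy and a sequence of actions concrete, here is a minimal sketch using a plain lookup table. The 2x2 grid, its coordinates, and the `stay` action are assumptions made for illustration.

```python
# Hypothetical 2x2 grid; cells are (row, col) and the actions are illustrative.
policy = {
    (0, 0): "right",
    (0, 1): "down",
    (1, 0): "right",
    (1, 1): "stay",  # treated as the goal cell in this example
}

def act(state):
    """A policy answers one question: what do I do in this state?"""
    return policy[state]

# A plan is a fixed sequence of actions. If a noisy transition drops the
# agent somewhere the plan did not expect, the plan has no answer, but the
# policy still does.
plan = ["right", "down"]
unexpected_state = (1, 0)
print(act(unexpected_state))
```

Drawing one arrow per cell from this table is what gives the "vector field" picture.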

how to find the solution in MDP ✔️✔️find the optimal policy that maximizes the long-term expected
reward

given a bunch of states (x), actions, and rewards (z), find the function that gives the optimal action (y)
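
One common way to find such a policy is value iteration; the notes do not name a specific algorithm, so treat the following as a sketch under assumptions. The 4-state corridor, the deterministic `step` model, and the discount factor `GAMMA` are all hypothetical choices for the example.

```python
GAMMA = 0.9            # discount factor: an assumption, not from the notes
STATES = [0, 1, 2, 3]  # hypothetical corridor; state 3 is the terminal goal
TERMINAL = 3
ACTIONS = ["left", "right"]

def step(s, a):
    """Deterministic model: return (next_state, reward)."""
    s_next = max(s - 1, 0) if a == "left" else s + 1
    reward = 1.0 if s_next == TERMINAL else 0.0
    return s_next, reward

def value_iteration(tol=1e-9):
    """Repeat Bellman optimality backups until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            if s == TERMINAL:
                continue
            best = max(r + GAMMA * V[s2] for s2, r in (step(s, a) for a in ACTIONS))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V

def greedy_policy(V):
    """Pick, in each non-terminal state, the action with the best backed-up value."""
    def q(s, a):
        s2, r = step(s, a)
        return r + GAMMA * V[s2]
    return {s: max(ACTIONS, key=lambda a: q(s, a)) for s in STATES if s != TERMINAL}

V = value_iteration()
print(greedy_policy(V))
```

Under these assumptions every non-terminal state prefers `right`, and the values shrink by a factor of `GAMMA` for each extra step away from the goal.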

temporal credit assignment problem ✔️✔️- refers to the fact that rewards, especially in fine-grained
state-action spaces, can be delayed by many time steps

- such reward signals only very weakly affect the temporally distant states that preceded them

- it is almost as if the influence of a reward gets more and more diluted over time, and this can lead to bad
convergence properties of the RL mechanism

- any iterative reinforcement-learning algorithm must perform many steps to propagate the influence of
delayed reinforcement to all states and actions that have an effect on that reinforcement
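
The propagation delay can be seen in a small sketch: a corridor where the only reward sits at the far end, evaluated with synchronous backups. The corridor length `N`, the fixed "always move right" policy, and the discount factor `GAMMA` are assumptions made for the example.

```python
GAMMA = 0.9  # discount factor: an assumption, not from the notes
N = 6        # hypothetical corridor with states 0..5; reward only on entering state 5

def sweep(V):
    """One synchronous backup for the 'always move right' policy."""
    new_V = V[:]
    for s in range(N - 1):
        reward = 1.0 if s + 1 == N - 1 else 0.0
        new_V[s] = reward + GAMMA * V[s + 1]
    return new_V

V = [0.0] * N
sweeps_until_start_sees_reward = 0
while V[0] == 0.0:
    V = sweep(V)
    sweeps_until_start_sees_reward += 1

print(sweeps_until_start_sees_reward, V[0])
```

Each sweep pushes the reward information back by exactly one state, so the start state needs as many sweeps as it is steps away from the reward, and the value that finally arrives has been shrunk by `GAMMA` at every step.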

why do you have a small negative reward for each step before terminating? ✔️✔️- similar to walking
across a hot beach into the ocean: it encourages you to end the game and not stay where you are

why do minor changes to the reward matter in MDP? ✔️✔️- because making the reward function less
negative could lead you to end up in the bad area more often than a harsher reward would

- if the reward is too harsh, then the bad outcome may be better than staying in the game
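
The "too harsh" case can be checked with simple arithmetic. The sketch below compares two options from a single state: walking to the goal, or stepping straight into the bad terminal state. The path length, the reward values, and the undiscounted sum are all hypothetical.

```python
def best_choice(step_reward, steps_to_goal=5, goal_reward=1.0, bad_reward=-1.0):
    """Compare the undiscounted return of two options from one state.

    Walking to the goal pays step_reward on every move except the last,
    which pays goal_reward. The bad terminal state is one move away.
    """
    return_goal = (steps_to_goal - 1) * step_reward + goal_reward
    return_bad = bad_reward
    return "head for goal" if return_goal > return_bad else "take bad exit"

for r in (-0.04, -0.4, -2.0):
    print(r, best_choice(r))
```

With these numbers the agent heads for the goal as long as the step reward stays above -0.5; below that, ending the game in the bad state costs less than continuing to walk.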

what part of MDP can incorporate our domain knowledge? ✔️✔️the reward - how important it is to
get to the end
