Week 7 Review

Automated planning / behavior synthesis / sequential decision making.

Search problems - find a path from V (start vertex) to G (end vertex).

Graph representation: vertices connected by directed edges, where the edges are actions between vertices, together with a labeling function that maps each edge to the corresponding action and the cost of that action. A cheapest-path search over such a graph is sketched below.
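As a concrete sketch of this setup, here is a minimal uniform-cost search over such a labeled graph. The graph, its vertex names, and the action labels and costs are all made up for the example.

```python
import heapq

# Hypothetical labeled graph: vertex -> list of (next vertex, action, cost).
# The labeling function from the notes is encoded directly on each edge.
GRAPH = {
    "V": [("A", "drive V->A", 2), ("B", "drive V->B", 1)],
    "A": [("G", "drive A->G", 4)],
    "B": [("G", "drive B->G", 1)],
    "G": [],
}

def uniform_cost_search(graph, start, goal):
    """Return (total cost, list of actions) for a cheapest path, or None."""
    frontier = [(0, start, [])]   # (cost so far, vertex, actions taken)
    best = {}                     # cheapest known cost per expanded vertex
    while frontier:
        cost, vertex, actions = heapq.heappop(frontier)
        if vertex == goal:
            return cost, actions
        if best.get(vertex, float("inf")) <= cost:
            continue
        best[vertex] = cost
        for nxt, action, step_cost in graph[vertex]:
            heapq.heappush(frontier, (cost + step_cost, nxt, actions + [action]))
    return None

print(uniform_cost_search(GRAPH, "V", "G"))  # (2, ['drive V->B', 'drive B->G'])
```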
Option 1: naive representation (atomic)

n cities, p packages, t trucks, d drivers. Since each package, truck, and driver can be at any of the n cities, we need at least n^(p+t+d) vertices.

Creating or maintaining this representation is a problem (see the blow-up computed below).
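A quick back-of-the-envelope computation (with hypothetical problem sizes) shows why the atomic representation blows up while the factored one stays small:

```python
# Hypothetical problem sizes, chosen only to illustrate the blow-up.
n, p, t, d = 10, 11, 12, 3   # cities, packages, trucks, drivers

atomic_states = n ** (p + t + d)   # every joint assignment of locations
print(f"atomic vertices: {atomic_states:.2e}")   # 1.00e+26
factored_vars = p + t + d          # one location variable per entity
print(f"factored variables: {factored_vars}")    # 26
```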
Option 2: factored representations (successor functions)

Factor the state representation into a bunch of variables. This allows actions that focus on specific variables, i.e. only modeling the effect of an action on one state variable.

The representation is more concise, which makes it easier to express and reduces memory usage.

Problems include repetition of variables, and the successor functions will need to be updated when new things show up.

In a factored representation, we will have a separate variable for each entity of interest. So we will have loc_t1 (location of truck 1), loc_t2, and so on... Similarly: loc_p1 (location of package 1), loc_p2, and so on...

We need location variables for 12 trucks and 11 packages - Total = 12 + 11 = 23 variables, as in the sketch below.
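A minimal sketch of a factored state, assuming dict-based location variables and made-up city names; the drive helper is hypothetical and only illustrates a successor function that touches a single variable:

```python
# Factored state: one location variable per entity of interest.
# Entity and city names are hypothetical.
state = {
    **{f"loc_t{i}": "CityA" for i in range(1, 13)},  # 12 truck locations
    **{f"loc_p{i}": "CityB" for i in range(1, 12)},  # 11 package locations
}
print(len(state))  # 23 variables, matching 12 + 11 above

# An action's successor function only touches the variables it affects:
def drive(state, truck, dest):
    """Move one truck; every other variable is left unchanged."""
    return {**state, f"loc_t{truck}": dest}

state2 = drive(state, 1, "CityC")
print(state2["loc_t1"], state2["loc_p1"])  # CityC CityB
```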
The outcome of our actions is non-deterministic: taking an action might take us to our desired state, or it can take us to some random state. The cause of the stochasticity may be our own actions or the other objects in the environment.

A policy tells an agent what action to take at any given state.

The next state only depends on a finite portion of the history (the Markov property).

A terminal state is a state where the world "ends" and all actions have no effect. Actions cannot change the state or produce rewards once a terminal state is reached.
Infinite horizons are easier to solve but can lead to infinite rewards.

The optimal policy for a finite horizon problem depends on time, but for an infinite horizon problem the optimal policy does not depend on time. For infinite horizon problems, π* is independent of the time step, and it is also independent of the starting state.

For which value (or values) of gamma (γ) does discounted reward behave like additive reward? γ = 1 means no discount, and the utility is simply the addition of all the rewards (see the sketch below).

The expected utility (value) of a policy is the expected sum of (discounted) rewards.
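A small sketch of the discounted return U = sum_t γ^t · r_t over a made-up reward sequence, showing that γ = 1 recovers the plain additive sum:

```python
def discounted_return(rewards, gamma):
    """U = sum_t gamma**t * r_t over a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

rewards = [1.0, 2.0, 3.0]                 # hypothetical reward sequence
print(discounted_return(rewards, 0.9))    # 1 + 0.9*2 + 0.81*3 = 5.23
print(discounted_return(rewards, 1.0))    # 6.0 -- same as sum(rewards)
```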
Transition function - provides a distribution over the next states that may occur when the agent performs a given action in a given state. Mathematically it is written as P(S' | S, a).

Reward function - specifies the reward that the agent receives in a given state; it maps states to real numbers.
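Putting the pieces together, here is a minimal sketch of a two-state MDP with a dict-based transition function P(S' | S, a), a state reward function, a policy, and iterative policy evaluation of the expected discounted utility. All state names, probabilities, and rewards are made up for illustration.

```python
# Hypothetical two-state MDP; probabilities and rewards are made up.
P = {  # P[s][a] = {s_next: probability}
    "s0": {"go":   {"s1": 0.8, "s0": 0.2},
           "stay": {"s0": 1.0}},
    "s1": {"go":   {"s1": 1.0},   # s1 is absorbing in this toy example
           "stay": {"s1": 1.0}},
}
R = {"s0": 0.0, "s1": 1.0}            # reward function: state -> real number
policy = {"s0": "go", "s1": "stay"}   # policy: state -> action
gamma = 0.9

# Iterative policy evaluation: V(s) = R(s) + gamma * sum_s' P(s'|s,a) * V(s')
V = {s: 0.0 for s in P}
for _ in range(100):
    V = {s: R[s] + gamma * sum(p * V[s2]
                               for s2, p in P[s][policy[s]].items())
         for s in P}
print(V)  # expected utility (value) of the policy from each state
```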