Multi Agent Systems Course VU
Exam Goals
April 12, 2021
1 Game Theory
1.1 Pay-off matrix, BR, Pareto, dominated strategies and IESDS
• We have a Pay-off matrix to determine payoff of simultaneous moves of 2 players. This can be put as
normal-form game which is a tuple of (N,A,u) with N being set of n players, A being actions (each
elements in A called action profile) and utility function (payoff u for each agent depends on the joint
actions of all agents).
• The best response in a pay-off game is the action agent i can choose to get the highest increase in
utility when the action of agent j is known. For a continuous state space, we find the maximum utility
for agent i by taking the partial derivative with respect to action of agent i and setting the derivative
to zero:
ui (x, y) = 2(x + y + bxy) − x2
Where y is the known action of other agent
ui
= 2(1 + by) − 2x = 0
x
xopt = 1 + by
• Pareto optimality is a solution property. A strategy a is Pareto dominated by another strategy a0
if the utility for a0 is higher or equal than for a with at least 1 instance which is strictly higher. A
Pareto dominant strategy is the highest mutual pay-off (but not necessarily the highest social welfare);
deviating from this strategy could increase utility for player 1 but will then decrease for player 2.
• Strictly vs weak dominance: if a set of strategies I can chose are all dominant over another set of
strategies then the first set is strictly dominant. Weakly dominant is for some point equal and other
points dominant.
• An elimination strategy we can use is IESDS: we look at what not to do; if a row or column is strictly
dominant over other row, we delete other row. We can do this recursively. By the nature of first looking
at a dominated column, than row, than column etc, we could end up with an equilibrium which does
not have to imply Pareto optimality, since the Pareto optimal strategy could be deleted.
1.2 Nash equilibrium, Vickrey auction
• A Nash equilibrium is a solution concept based on conditions instead of an algorithm. Nash can be
strict(>) or weak (≥); Nash supposes players have no regrets even though they know what other player
will do; best response to all other agents. This may not be the response with highest welfare (think
of prisoner’s dilemma). Pure Nash can be strict or weak; mixed Nash is always weak. For mixed NE,
agent makes opponent indifferent (for matrix games only). There alwyas exist a NE; but this NE may
be a mixed Nash.
• Computation of Mixed NE: assign p and (1-p) to choices of player 1, assign q and (1-q) to player 2.
Calculate utility via the pay-off matrix. For example with Battle of the Sexes: player 1 wants to choose
p so that player 2 is indifferent (which is the definition of mixed Nash), see fig. 1 for the calculation.
1
, Figure 1: Calculating utility for choosing mixed Nash equilibrium. Utility is 2/3, which is lower than choosing
the same movie; so mixed Nash is not always optimal.
• However, a mixed Nash does not always exists; for example in the Prisoner’s dilemma, the probability
of p and q end up to be higher than 1; this is not possible. Since for every game, a NE must exists,
this implies that the Prisoner’s dilemma has a Pure NE.
• Vickrey auction: n players do a bid for an item; the highest bid will get the item and only has to
pay the second-highest price. This is interesting since no players will have regrets afterwards and the
winner will have positive utility.
1.3 Repeated games, ultimatum game, extensive form, backward induction,
SPNE
• Repeated games: One-state games, states do not change (rock-paper-scissors). Maximize total reward
of finite known number of repetitions. Infinite number of repetitions; maximize discounted total rewards
(future rewards less valuable then present rewards):
inf
X
R= δ t−1 · rt
t=1
Using mathematical derivation, we can rewrite expected R to:
1
E(R) = (r1 + δ · r2 + δ 2 · r3 ...)
1−δ
Xinf
E(R) = δ t−1 · rt
t=1
• Repeated Prisoner’s Dilemma has an interesting strategy: Grim Trigger strategy.
Start by cooperating (best social welfare) until someone defects; then always defect.
If we look at table 1, we can derive the utility for always cooperating and always defecting:
Coop Defect
C 3,3 4,1
D 1,4 2,2
Table 1: Form of Prisoner’s dilemma
2