Lecture 3 · Game Theory & Nash Equilibria

1 · the setup

What is a game?

Same machinery as decision theory — but we replace the faceless environment with a roomful of self-interested, rational agents.

🎙 the one change

Basically we refine the environment: instead of an abstract environment, we have a collection of self-interested and rational agents.

★Definition — a game

A game is a set of agents O = {o₁,…,oₘ} where each agent oᵢ has a set of actions Aᵢ (its pure strategies), plus a utility function that gives every agent a payoff for each combination of actions.

U : A₁ × … × Aₘ → ℝᵐ
U(ā)ᵢ = the utility agent oᵢ gets when everyone plays the action profile ā

Agents needn't commit to one action. A strategy lets them randomise:

⤮Pure vs mixed strategy

A pure strategy = commit to one action. A (mixed) strategy sᵢ = a probability distribution over Aᵢ — flip a weighted coin over your actions. A strategy profile s̄ = (s₁,…,sₘ) picks one strategy per agent.
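To make the definitions concrete, here is a minimal sketch of how a mixed strategy profile turns into expected payoffs. The dictionary encoding and the name `expected_utility` are illustrative assumptions, not notation from the lecture; the payoffs are those of the Driving game introduced in the next section.

```python
from fractions import Fraction
from itertools import product

# Assumed encoding: payoffs[(a1, a2)] = (u1, u2), one entry per action profile.
DRIVING = {
    ("Left", "Left"): (1, 1), ("Left", "Right"): (-1, -1),
    ("Right", "Left"): (-1, -1), ("Right", "Right"): (1, 1),
}

def expected_utility(payoffs, profile):
    """Expected payoff vector when every agent samples independently.

    profile is a list of dicts mapping action -> probability (one mixed
    strategy per agent). A pure strategy is the special case {action: 1}.
    """
    m = len(profile)
    totals = [Fraction(0)] * m
    for actions in product(*(s.keys() for s in profile)):
        prob = Fraction(1)
        for strategy, action in zip(profile, actions):
            prob *= strategy[action]
        for i in range(m):
            totals[i] += prob * payoffs[actions][i]
    return totals

half = Fraction(1, 2)
coin = {"Left": half, "Right": half}
pure_left = {"Left": Fraction(1)}

print(expected_utility(DRIVING, [pure_left, pure_left]))  # both commit to Left
print(expected_utility(DRIVING, [coin, coin]))            # both flip a fair coin
```

Exact fractions are used so that later equality checks (ties between strategies) are not blurred by floating-point rounding.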

Writing it down: normal form

Just like decision matrices, we lay out a (2-player) game in a table, its normal form. Rows = Agent 1's actions, columns = Agent 2's actions, and each cell holds both payoffs: Agent 1, Agent 2.

ℹ What normal form hides

Normal form flattens the structure: in chess one "action" would be an entire game of moves. When that structure matters you'd use an extensive representation instead — but for this lecture, the table is enough.

2 · four games to know

A little zoo of games

Three famous 2×2 games cover the whole spectrum from pure cooperation to pure competition.

Driving game — pure coordination

1 ↓ / 2 →	Left	Right
Left	1, 1	−1, −1
Right	−1, −1	1, 1

Two drivers approach each other; they're both happy if they pick the same side. A pure coordination game: every agent gets the same payoff (U(ā)ᵢ = U(ā)ⱼ), so there's no reason to compete — coordination helps everyone.

Matching Pennies — zero-sum

1 ↓ / 2 →	Heads	Tails
Heads	1, −1	−1, 1
Tails	−1, 1	1, −1

Player 1 wants to match, Player 2 wants to mismatch. A zero-sum game: two agents, and in every cell the payoffs sum to zero — one wins exactly what the other loses. Purely competitive. Remember this game — your exam question is a 99-action version of it.

🎙 zero-sum is everywhere

Chess, checkers — pretty much all table games are zero-sum games. And all sports are typically zero-sum: one team wins, the other loses.

Prisoner's Dilemma — a bit of both

1 ↓ / 2 →	Cooperate	Defect
Cooperate	−1, −1	−4, 0
Defect	0, −4	−3, −3

Two suspects are interrogated separately. "Cooperate" = stay silent, i.e. cooperate with your partner; "Defect" = betray them by confessing. The most realistic kind of game: it mixes cooperation and competition. (The numbers are arbitrary; only their order matters.) We'll return to its unsettling solution in §4.

◇ Check yourself

In every cell of a game the two payoffs add up to 0. What kind of game is it?

Payoffs summing to zero in every cell = a zero-sum (purely competitive) game. Coordination games have equal payoffs; the prisoner's dilemma is neither.

3 · solution concepts · part 1

Analysing a game: Pareto

In decision theory we hunted for the optimal action. In a game there's a twist.

🎙 the twist that changes everything

The optimal strategy of an agent depends also on what the other agents do.

Because of that, there's no single "best." Instead we study solution concepts — sets of outcomes that are interesting in some defined sense. The first takes an outside observer's view.

★Pareto dominance & optimality

Profile s̄ Pareto dominates t̄ if it's at least as good for every agent and strictly better for at least one. s̄ is Pareto optimal if nothing Pareto-dominates it. (It's exactly decision theory's dominance, lifted to many agents.)

🎙 who was Pareto?

Here "Pareto" is the name of a person — an Italian mathematician and social scientist; the name of this relationship derives from his.

! Why it's weak (just like dominance)

In a zero-sum game every profile is Pareto optimal — one agent's gain is the other's loss, so no profile beats another for both. So Pareto optimality often can't single anything out. We need a concept that takes the agent's own view.

4 · solution concepts · part 2 · the big one

Nash equilibrium

The most important solution concept in all of game theory — and the one your exam asks you to define.

Start from one agent's point of view. Suppose you knew everyone else's strategies (write them s̄₋ᵢ). Then your problem collapses to ordinary decision-under-risk: just pick what maximises your utility.

★Best response

Given the others' fixed strategies s̄₋ᵢ, a best response is a strategy that maximises your own payoff: sᵢ* ∈ argmax_sᵢ U(sᵢ, s̄₋ᵢ)ᵢ.

★Nash equilibrium

A strategy profile s̄ is a Nash equilibrium when every agent is simultaneously playing a best response to everyone else. Strict if each best response is the unique best; weak if some agent is tied between strategies.

🎙 why it's called stability

If I select the best response to the others, then I have no reason for changing my decision — as long as the other agents do not change theirs. No incentive to deviate unilaterally.

🎙 it's not just a board game — GANs

Generative Adversarial Networks use exactly this idea: a generator tries to produce images that fool a discriminator, which tries not to be fooled. That's a game. Training them is computing a Nash equilibrium — the point where the generator reproduces the real data and the discriminator can no longer tell real from fake.

See it live 🎮

Below is a 2×2 game. Pick a classic, or edit any payoff. Each agent's best responses light up in their colour (P1 blue in each column, P2 magenta in each row); a cell where both are best-responding is a pure Nash equilibrium.

🎮 Game Explorer

A pure Nash equilibrium = a cell highlighted in both colours at once.

[Interactive widget: an editable 2×2 payoff table with rows P1 (Left, Right) and columns P2 (Left, Right). Legend: P1 best response, P2 best response, Nash equilibrium.]

🔍 Why these cells? — the best-response test

It isn't magic — it's just mechanising the professor's best-response test, one player at a time.

① Blue = P1's best replies. Fix what P2 does — i.e. look down one column at a time. In that column, which row gives P1 (the first number) the most? That cell lights blue. Translation: "if P2 plays this, P1's best answer is here."

② Magenta = P2's best replies. Now fix what P1 does — look across one row at a time. Which column gives P2 (the second number) the most? That cell lights magenta.

③ Green = both at once = Nash. A cell that is blue and magenta means P1 is best-responding to P2's choice and P2 is best-responding to P1's choice — simultaneously. Neither can do better by moving alone, so the best-response "walk" gets stuck here. That's exactly the definition of a Nash equilibrium.

Two examples worth memorising

Driving game (try it above): the two pure Nash equilibria, (Left,Left) and (Right,Right), exactly coincide with the Pareto-optimal outcomes. Everyone agreeing on a side is stable and good.

! The Prisoner's Dilemma: the famous tragedy

Its unique Nash equilibrium is (Defect, Defect) → payoff (−3, −3). But that's Pareto-dominated by (Cooperate, Cooperate) → (−1, −1)! Rational self-interest drives both to an outcome that's worse for everyone. (Switch the Explorer to "Prisoner's Dilemma" and watch only one cell light up green.)

↻How the professor finds it: the "dynamical process"

Don't test all four cells blindly — walk the table. Start at (Cooperate, Cooperate). Ask Agent 1: "given Agent 2 cooperates, is cooperating my best response?" No — defecting gives 0 > −1, so move to (Defect, Cooperate). Now ask Agent 2 the same: cooperating isn't their best response either, so move to (Defect, Defect). Here the walk stops — neither wants to move. A cell where the best-response walk gets stuck is a Nash equilibrium. Try the same walk in the Explorer.

◇ Check yourself

A strategy profile is a Nash equilibrium when…

Nash = mutual best responses: nobody can do better by unilaterally deviating. Maximising the total isn't required (see the prisoner's dilemma); "better for at least one" describes Pareto, not Nash.

5 · existence & mixed strategies

Do they even exist?

Four natural questions: do Nash equilibria always exist, how many are there, why are they interesting, and how do we compute them?

✓Nash's Theorem (existence)

In any game with finitely many agents, each with finitely many actions, at least one Nash equilibrium exists. (The proof uses Brouwer's fixed-point theorem — you won't be asked to prove it.)

But wait: try Matching Pennies in the Explorer and no cell lights up green. Run the professor's best-response walk and it never stops. In every pure cell the loser wants to switch, so it cycles forever. There's no Nash equilibrium in pure strategies! How does that square with the theorem?

★The answer: go mixed

The theorem promises an equilibrium — just not necessarily a pure one. Matching Pennies has a unique mixed equilibrium: both players play (½ Heads, ½ Tails). Randomising 50/50 makes you unpredictable, so the opponent can't exploit you — neither can do better by deviating.

s̄ = ( (½ Heads, ½ Tails), (½ Heads, ½ Tails) )
the unique Nash equilibrium of Matching Pennies
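One way to check a claimed mixed equilibrium, sketched below: since expected payoff is linear in a player's own probabilities, a mixed strategy is a best response exactly when no pure action earns strictly more against the opponent's mix. The helper names `payoff_vs` and `is_nash` are assumptions for illustration.

```python
from fractions import Fraction

# Assumed encoding: payoffs[(a1, a2)] = (u1, u2).
MATCHING_PENNIES = {
    ("Heads", "Heads"): (1, -1), ("Heads", "Tails"): (-1, 1),
    ("Tails", "Heads"): (-1, 1), ("Tails", "Tails"): (1, -1),
}
ACTIONS = ["Heads", "Tails"]

def payoff_vs(payoffs, player, action, opponent_mix):
    """Expected payoff of `player` (0 or 1) for a pure action against a mixed opponent."""
    total = Fraction(0)
    for other, prob in opponent_mix.items():
        cell = (action, other) if player == 0 else (other, action)
        total += prob * payoffs[cell][player]
    return total

def is_nash(payoffs, mix1, mix2):
    """True if neither player has a pure deviation that strictly improves on their mix."""
    for player, own, opp in ((0, mix1, mix2), (1, mix2, mix1)):
        values = {a: payoff_vs(payoffs, player, a, opp) for a in ACTIONS}
        current = sum(own.get(a, 0) * values[a] for a in ACTIONS)
        if max(values.values()) > current:
            return False
    return True

fair = {"Heads": Fraction(1, 2), "Tails": Fraction(1, 2)}
always_heads = {"Heads": Fraction(1)}

print(is_nash(MATCHING_PENNIES, fair, fair))          # the 50/50 profile
print(is_nash(MATCHING_PENNIES, always_heads, fair))  # P2 would now switch to Tails
```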

🎙 rock–paper–scissors: a unique (but not strict) mixed equilibrium

"Always play rock" is a terrible strategy, easily exploited. The only sensible play is to randomise uniformly: rock, paper, scissors each with probability ⅓. That's a Nash equilibrium, and it is the only one. It is not strict, though: against a uniform opponent every strategy earns the same expected payoff (0), so a unilateral deviation neither helps nor hurts. What pins the equilibrium down is exploitability: any non-uniform mix gives the opponent a reply that beats it on average, so uniform play is the only strategy that cannot be exploited.

How many?

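Not unique in general. The slides note that once you have two equilibria s̄ and t̄, you can interpolate between them:

p·s̄ + (1−p)·t̄ is an equilibrium for every p ∈ [0, 1]
→ two equilibria imply infinitely many

Mind the scope: this convexity is guaranteed in two-player zero-sum games, not in general. In the Driving game, (Left, Left) and (Right, Right) are both equilibria, yet if both drivers play Left with probability 0.3, each strictly prefers to switch to Right (0.4 against −0.4), so that mixture is not an equilibrium.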

6 · meaning & computation

Why care, and can we compute it?

Beyond "stability," Nash equilibria have a beautiful interpretation in two-player zero-sum games — and that's also where they're easy to compute.

Recall maximin from Lecture 1 — the pessimist who maximises their worst case. Lift it to games:

sᵢ^maximin = argmax_sᵢ min_s₋ᵢ U(sᵢ, s₋ᵢ)ᵢ
play as if the others will do their worst to you

ℹ maximin vs minimax

Maximin = "I maximise my utility assuming everyone else gangs up to minimise it." Minimax is the dual, aimed at a specific agent j: the other agents choose their strategies to hold j's best achievable payoff as low as possible, and j then best-responds. In a two-player zero-sum game these two collapse onto the same answer.

★The Minimax Theorem (von Neumann)

In any finite two-player zero-sum game, Nash equilibria coincide with the minimax = maximin strategies. Each player's maximin value equals their minimax value — that common number is the value of the game (one number that characterises the whole game). So a Nash equilibrium is what you get when both players play as cautious pessimists.
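A small numerical illustration of the theorem on Matching Pennies. It is only a sketch: a grid search over probabilities (resolution 1/100, an arbitrary choice) stands in for the exact optimisation, and the function names are assumptions. Checking only pure replies in the inner step is enough because a linear function over mixed strategies attains its optimum at a pure one.

```python
from fractions import Fraction

# Player 1's payoffs in Matching Pennies; player 2 gets the negative (zero-sum).
U1 = {
    ("Heads", "Heads"): 1, ("Heads", "Tails"): -1,
    ("Tails", "Heads"): -1, ("Tails", "Tails"): 1,
}
ACTIONS = ("Heads", "Tails")
GRID = [Fraction(k, 100) for k in range(101)]  # assumed search resolution

def worst_case_for_p1(p):
    """P1 plays Heads with probability p; P2 replies as harshly as possible."""
    return min(p * U1[("Heads", c)] + (1 - p) * U1[("Tails", c)] for c in ACTIONS)

def best_case_for_p1(q):
    """P2 plays Heads with probability q; P1 then picks the best reply."""
    return max(q * U1[(r, "Heads")] + (1 - q) * U1[(r, "Tails")] for r in ACTIONS)

maximin_p = max(GRID, key=worst_case_for_p1)   # P1's cautious strategy
minimax_q = min(GRID, key=best_case_for_p1)    # P2's strategy that caps P1
maximin_value = worst_case_for_p1(maximin_p)
minimax_value = best_case_for_p1(minimax_q)

print(maximin_p, maximin_value)
print(minimax_q, minimax_value)
```

Both searches land on probability ½ and the two values agree, which is the value of the game for player 1.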

🎙 this is the interpretation of Nash

Why is it rational to assume your opponent is actively out to hurt you? Because in a zero-sum game it's true — when one of us wins, the other loses, so anything that lowers my utility raises yours. So believing you'll play against me, and best-responding to that, is exactly sensible — and that pessimistic, worst-case reasoning converges to a Nash equilibrium. That's the answer to "discuss the interpretation of a Nash equilibrium": at least in two-player zero-sum games, it's the natural outcome of mutual caution, not just an abstract fixed point.

Computation

The professor lays out three tiers — and the dividing line is whether you can still reduce the problem to a linear program:

Tier 1

2-player zero-sum → easy ✅

The minimax theorem turns it into a linear program, and LPs are solvable in polynomial time. So a Nash equilibrium here is computable efficiently — in P.
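For reference, one standard way to write that linear program for agent 1 is sketched below. The symbols x (agent 1's mixed strategy, with x_a the probability of action a) and v (the payoff agent 1 can guarantee) are names chosen here, not taken from the slides.

```latex
\begin{aligned}
\max_{x,\,v}\quad & v \\
\text{s.t.}\quad & \sum_{a \in A_1} x_a \, U(a,b)_1 \;\ge\; v
      \quad \text{for every } b \in A_2, \\
& \sum_{a \in A_1} x_a = 1, \qquad x_a \ge 0 \quad \text{for every } a \in A_1 .
\end{aligned}
```

Each constraint says "whatever pure action b agent 2 picks, my expected payoff is at least v", so maximising v is exactly the maximin problem. Agent 2's program is the mirror image, and LP duality makes the two optimal values equal: the value of the game.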

Tier 2

Zero-sum, >2 players → harder

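With more than two players you can no longer reduce to an LP. The problem lands in PPAD, a class that sits between P and NP (more precisely, between their function-problem versions FP and FNP). It is believed, but not proven, that the hard problems in PPAD have no polynomial-time algorithm.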

Tier 3

Non-zero-sum → hardest ❌

Even with just 2 players, a non-zero-sum game becomes a non-linear program (you directly maximise utility differences). In general no polynomial-time algorithm is known.

! Exam-ready one-liner

2-player zero-sum: reduces to linear programming → in P (efficient). The moment you break that — more than two players, or non-zero-sum — the LP reduction dies and the problem is believed intractable (the general 2-player case is PPAD-complete). Existence ≠ easy computation.

◇ Check yourself

Computing a Nash equilibrium is efficient (in P) for which games?

The minimax theorem turns two-player zero-sum games into a linear program → P. General games are believed intractable (PPAD-complete) — existence (Nash's theorem) doesn't imply easy computation.

7 · a friendlier equilibrium

Correlated equilibria

A broader notion that's both more cooperative and easier to compute than Nash.

★The idea

Add a trusted coordination mechanism — a shared random signal with joint distribution π — and let each agent's strategy σᵢ depend on the private signal it sees. It's a correlated equilibrium if no agent can do better by ignoring its recommendation. Every Nash equilibrium is a correlated equilibrium, but not the reverse.

Battle of the Sexes: a couple prefer being together but disagree on where — Ballet or Fight.

1 ↓ / 2 →	Ballet	Fight
Ballet	2, 1	0, 0
Fight	0, 0	1, 2

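The game has two pure Nash equilibria, (Ballet, Ballet) and (Fight, Fight), but each favours one partner, and without communication neither is an obvious pick. Its remaining Nash equilibrium is mixed, and it pays each player just 2/3. But add a fair coin both can see: "Heads → both Fight, Tails → both Ballet." Now they always end up together, for an expected payoff of (2 + 1)/2 = 1.5 each, far better than 2/3. The shared coin lets them coordinate.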
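The two payoffs quoted here can be reproduced with a short sketch. The mixed-equilibrium probabilities (agent 1 plays Ballet with probability 2/3, agent 2 with probability 1/3) come from the usual indifference conditions; the function names and the encoding of a signal as a distribution over cells are assumptions for illustration.

```python
from fractions import Fraction

BOS = {
    ("Ballet", "Ballet"): (2, 1), ("Ballet", "Fight"): (0, 0),
    ("Fight", "Ballet"): (0, 0), ("Fight", "Fight"): (1, 2),
}
ACTIONS = ["Ballet", "Fight"]

def expected(payoffs, dist):
    """Expected payoff pair under a distribution over action profiles (cells)."""
    u1 = sum(pr * payoffs[cell][0] for cell, pr in dist.items())
    u2 = sum(pr * payoffs[cell][1] for cell, pr in dist.items())
    return u1, u2

def obeys(payoffs, dist):
    """Correlated-equilibrium check: told `rec`, no agent gains by playing `alt`."""
    for rec in ACTIONS:
        for alt in ACTIONS:
            gain1 = sum(pr * (payoffs[(alt, b)][0] - payoffs[(a, b)][0])
                        for (a, b), pr in dist.items() if a == rec)
            gain2 = sum(pr * (payoffs[(a, alt)][1] - payoffs[(a, b)][1])
                        for (a, b), pr in dist.items() if b == rec)
            if gain1 > 0 or gain2 > 0:
                return False
    return True

# Mixed Nash: independent randomisation, so cell probabilities multiply.
p, q = Fraction(2, 3), Fraction(1, 3)
mixed = {
    ("Ballet", "Ballet"): p * q, ("Ballet", "Fight"): p * (1 - q),
    ("Fight", "Ballet"): (1 - p) * q, ("Fight", "Fight"): (1 - p) * (1 - q),
}
# Shared fair coin: the two agents always land in the same cell.
coin = {("Ballet", "Ballet"): Fraction(1, 2), ("Fight", "Fight"): Fraction(1, 2)}

print(expected(BOS, mixed), obeys(BOS, mixed))
print(expected(BOS, coin), obeys(BOS, coin))
```

The mixed profile also passes `obeys`, which matches the statement that every Nash equilibrium is a correlated equilibrium.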

🎙 separate rooms vs one table

Picture the Nash version: you and I reason in two separate rooms. Not knowing what the other will pick, each of us randomises — and a big chunk of the time we miss each other (one picks Ballet, the other Fight) and walk away with 0. That wasted disagreement is exactly what drags the average down to 2/3. The correlated version is different: we sit at the same table and agree in advance, "let's flip a coin and both follow it." We never miss. Same game, but a shared signal turns wasted mismatches into guaranteed coordination.

🚦You use this every day: the traffic light

At a crossroads, two drivers must coordinate or crash. The Nash mindset is "guess what the other will do." The correlated mindset is: don't leave it to the drivers — install a shared signal (the traffic light) that tells each who goes. You're acting as the game's designer, adding a mechanism that makes coordination effortless.

✓Better behaved than Nash in every way

Correlated equilibria always exist, and they can be found via linear programming — so essentially every reasonable question (does one exist? is it unique? is it Pareto-optimal? what payoff is guaranteed?) is answerable efficiently, in P. That's a stark contrast with Nash, where those same questions are hard. And since every Nash equilibrium is itself a correlated equilibrium, you lose nothing by working with the broader notion.

★study like the exam

The Exam Lab

Lecture 3 is the heart of the DSS exam. The first question below is your actual sample exam — fully worked.

📋 Format — straight from the professor

He said it himself: there's no official past DSS paper yet (the course just changed hands), but the structure mirrors Advanced Data Management — open questions where you discuss the topic. His exact example: "tell me what is a Nash equilibrium and discuss about its interpretation." So practise the skeleton: ① Define precisely in your own words · ② Apply / discuss the interpretation · ③ Motivate. No theorem proofs required — state & use them.

★ Your sample exam · Nash equilibrium

Define, in your own words but as precisely as possible, the notion of a Nash equilibrium. Then: Alice and Bob each bet on a natural number between 1 and 99. Alice wins if the two numbers coincide; Bob wins if they differ. Alice claims there is no Nash equilibrium; Bob claims Alice is wrong. Who is correct? Motivate.

① DEFINE · A Nash equilibrium is a strategy profile (one strategy, pure or mixed, per player) in which each player's strategy is a best response to the others'. Equivalently: no player can increase their utility by unilaterally changing their strategy.

② APPLY · This is a finite two-player game (2 players, 99 actions each), essentially Matching Pennies scaled to 99: Alice wants to match, Bob to mismatch (zero-sum). Two facts:

• By Nash's theorem (finite players + finite actions), at least one equilibrium exists.
• There is no pure equilibrium: for any fixed pair of numbers the loser can always deviate. If they coincide, Bob switches to differ; if they differ, Alice switches to match. No pure profile is stable.

③ DECIDE · Bob is correct that a Nash equilibrium exists. Alice is only partially right: there is no pure equilibrium, but the unique equilibrium is in mixed strategies, with each player picking a number uniformly at random from 1…99. Motivate: if Alice is uniform, every Bob choice is equally (un)likely to match, so he can't improve by deviating, and symmetrically for Alice. Neither has an incentive to change → it's an equilibrium.
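Both halves of this answer can be checked by brute force. The sketch below assumes payoffs of +1 for a win and −1 for a loss (the exam text only says who wins); the function names are illustrative.

```python
from fractions import Fraction

N = 99
NUMBERS = range(1, N + 1)

def alice_payoff(a, b):
    """Assumed payoffs: +1 if the numbers coincide (Alice wins), else -1.
    Bob's payoff is the negative, so the game is zero-sum."""
    return 1 if a == b else -1

def has_pure_equilibrium():
    """Test every cell: is each player best-responding to the other's number?"""
    for a in NUMBERS:
        for b in NUMBERS:
            alice_ok = all(alice_payoff(a, b) >= alice_payoff(x, b) for x in NUMBERS)
            bob_ok = all(-alice_payoff(a, b) >= -alice_payoff(a, y) for y in NUMBERS)
            if alice_ok and bob_ok:
                return True
    return False

uniform = Fraction(1, N)
# Expected payoff of each pure choice against a uniformly random opponent.
alice_vs_uniform_bob = {a: sum(uniform * alice_payoff(a, b) for b in NUMBERS)
                        for a in NUMBERS}
bob_vs_uniform_alice = {b: sum(uniform * -alice_payoff(a, b) for a in NUMBERS)
                        for b in NUMBERS}

print(has_pure_equilibrium())
print(set(alice_vs_uniform_bob.values()), set(bob_vs_uniform_alice.values()))
```

Every number earns the same expected payoff against a uniform opponent, so no deviation improves on uniform play. This confirms that uniform/uniform is an equilibrium; uniqueness needs the separate minimax argument.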

★ His literal example · "define a Nash equilibrium and discuss its interpretation"

Define, in your own words, a Nash equilibrium, and discuss its interpretation.

① DEFINE · A Nash equilibrium is a strategy profile in which every player is playing a best response to the others. Equivalently, no player can increase their utility by unilaterally changing their strategy. It is strict if each player's best response is unique, weak if some player is tied between equally good options.

② INTERPRET · (a) Stability: nobody has any incentive to deviate alone, so the situation is self-enforcing. (b) Pessimism / minimax: in a two-player zero-sum game the minimax theorem says Nash = maximin = minimax. The equilibrium is exactly what you get when each player cautiously assumes the opponent is out to minimise their payoff, which is rational there because one's gain is the other's loss (the common value is the value of the game). (c) Does it happen in practice? Yes, e.g. GANs train by computing a Nash equilibrium between a generator and a discriminator.

③ CAVEATS · By Nash's theorem at least one always exists (finite players + actions), but possibly only in mixed strategies (Matching Pennies, rock–paper–scissors). It need not be unique, and crucially it need not be Pareto-optimal: the prisoner's dilemma's equilibrium (Defect, Defect) is worse for everyone than cooperating. So stability does not imply efficiency.

Question · Game types & Pareto

Define a zero-sum game and a pure coordination game. Classify Matching Pennies and the Driving game. What does Pareto optimality tell you in a zero-sum game, and why? Motivate.

① DEFINE · A zero-sum game has two agents and, in every action profile, the payoffs sum to 0 (purely competitive). A pure coordination game has all agents receiving the same payoff in every profile (purely cooperative).

② APPLY · Matching Pennies is zero-sum (each cell sums to 0). The Driving game is a pure coordination game (both get 1, or both get −1).

③ MOTIVATE · In a zero-sum game every strategy profile is Pareto optimal: any gain for one agent is an exact loss for the other, so no profile is better for both. Hence Pareto optimality is uninformative there, much like dominance in decision theory.

Question · The dilemma

Define best response and Nash equilibrium (strict vs weak). For the Prisoner's Dilemma — (C,C)=−1,−1; (C,D)=−4,0; (D,C)=0,−4; (D,D)=−3,−3 — find the Nash equilibrium and explain why it is paradoxical. Motivate.

① DEFINE · A best response to the others' fixed strategies maximises your own utility. A Nash equilibrium is a profile where every player plays a best response; strict if each is the unique best response, weak if some player is tied.

② APPLY · Whatever the opponent does, Defect beats Cooperate (0 > −1 and −3 > −4). So Defect is a strictly dominant strategy for both → the unique Nash equilibrium is (Defect, Defect) = (−3, −3).

③ MOTIVATE · It is paradoxical because (D,D) is Pareto-dominated by (Cooperate, Cooperate) = (−1, −1): both players would be better off cooperating, yet rational self-interest pushes each to defect. Stability (no incentive to deviate) does not imply efficiency.

Question · Existence & computation

State Nash's existence theorem and the Minimax theorem. What is the computational complexity of finding a Nash equilibrium in (a) two-player zero-sum games and (b) general games? Discuss.

① STATE · Nash's theorem: every game with finitely many agents and finitely many actions has at least one Nash equilibrium (possibly mixed). Minimax theorem: in any finite two-player zero-sum game, Nash equilibria coincide with the maximin = minimax strategies, whose common value is the value of the game.

② COMPLEXITY · (a) Two-player zero-sum: the minimax theorem reduces it to linear programming → solvable efficiently, in P. (b) General games: a non-linear problem, believed intractable (PPAD-complete).

③ DISCUSS · So existence (guaranteed by Nash) does not imply efficient computation: only the zero-sum case is known to be easy. Practical solvers handle real instances despite the worst-case hardness.

Question · Correlated equilibria

Define a correlated equilibrium and explain how, in the Battle of the Sexes, a shared fair coin beats the mixed-Nash payoff. How do Nash and correlated equilibria relate? Motivate.

① DEFINE · A correlated equilibrium adds a shared coordination mechanism (a joint distribution π over signals) plus strategies σ that map each agent's signal to an action, such that no agent can do better by deviating from its recommended action.

② APPLY · Battle of the Sexes' mixed Nash pays each 2/3: reasoning in separate rooms, the players randomise and often miss each other (landing on a 0,0 mismatch), which drags the average down. With a fair coin both observe ("Heads → both Fight, Tails → both Ballet") they sit at one table and always coordinate, for expected payoff (2+1)/2 = 1.5 each, well above 2/3.

③ MOTIVATE · Every Nash equilibrium is a correlated equilibrium, but not vice versa: the shared signal enlarges what's achievable (think of a traffic light coordinating drivers). They also behave better computationally: correlated equilibria always exist and are found by linear programming, so uniqueness / Pareto-optimality / guaranteed-payoff questions are all in P, a stark contrast with Nash.

🗣 Say these out loud (cover the page)

• Define a Nash equilibrium in one precise sentence — then discuss its interpretation (stability + the zero-sum pessimist/minimax reading + GANs).
• Why does Matching Pennies have no pure equilibrium — and what is its mixed one? Same for rock–paper–scissors.
• Prisoner's dilemma: what's the equilibrium, and why is it a "tragedy"? Walk the best-response "dynamical process" to find it.
• Zero-sum vs coordination vs prisoner's dilemma — one line each.
• The three computation tiers: 2-player zero-sum (P) → zero-sum >2 players → non-zero-sum. Where does the LP reduction break?
• How does a correlated equilibrium let players beat their mixed-Nash payoff — and why is it better-behaved computationally?
