Lecture 1 · Decision Theory Foundations

1 · where it all starts

Decisions → support systems

Before any maths, the professor sets up one chain of ideas. Everything in the course hangs off it.

It is a cascade of three concepts, each building on the last:

Step 1

Decision

The behaviour of choosing among alternatives — and the result of that choice. Ordering at a restaurant is both the act of choosing and the dish you get.

Step 2

Decision-making

The process of arriving at a decision from a set of stimuli — the events or conditions that trigger you to choose.

Step 3

Support system

Any system — a program, or even rules written on paper — whose goal is to improve decision-making.

↯ The logic of the whole course

We care about decision support systems (DSS). To improve decision-making, we first have to understand decisions: what they are, how they're made, how they should be made, and how to write them down mathematically. That's why a course on "support systems" spends its time on decision theory.

🎙 from the lecture

A decision support system is any system whose goal is to improve decision-making processes… these are not intended to replace humans, but rather to support humans.

That word support matters. A DSS isn't meant to replace the doctor or the manager — it's meant to help them decide better. Which leads straight to the first big question of the field…

2 · the map

What is decision theory?

The study of how agents make — or should make — decisions in a decision-making setting.

Two words in that definition are doing all the work, and the professor highlights both.

"Make" vs "should make"

👁 Descriptive (behavioural): how we make decisions

Studies how real humans actually decide, biases and mistakes included. How do doctors really choose to treat? How do analysts really decide to invest?

✓ Normative: how we should make decisions

Starts from assumptions (e.g. "the agent is rational") and derives the optimal behaviour those assumptions imply. This course is mostly normative.

🎙 why normative

We don't want systems that are copies of humans, because sometimes humans make wrong decisions. We want systems that can help us improve.

But we still study the descriptive side a little — because a support system has to support a human, and you can't support someone if you don't understand how they reason.

One field, many settings

Decision theory is huge. Which sub-field you're in depends entirely on the shape of the setting — how many agents, and how they relate:

The setting	The sub-field
One agent vs. the environment	Decision theory ← we start here
Many agents, each self-interested	Non-cooperative game theory
Many agents forced to cooperate	Coalitional game theory
Many agents + one central planner	Social choice theory
An agent learning by interacting	Reinforcement learning

Same theory, different "instantiations" of the setting. Lectures 2–3 move into game theory.

◇ Check yourself

A team must cooperate to split the reward from a shared project. Which sub-field is that?

Agents that are required to work together (and share the payoff) put you in coalitional game theory — exactly the professor's "do the project in groups" example.

3 · writing it down

The model & the decision matrix

We take the simplest setting — one agent vs. the environment — and turn it into maths.

The professor models everything abstractly, with just three pieces:

A = { a₁, …, aₙ }: the AGENT = its set of actions (the rows)
S = { s₁, …, sₘ }: the ENVIRONMENT = its set of states (the columns)
O(a, s) → outcome: the OUTCOME = a function of (your action, the world's state)

🎙 the doctor example

The doctor is the agent; the world of patients is the environment. We model the doctor by the actions they can take: treat with medicine A, treat with medicine B, or do not treat. The states are: sick with this disease, sick with that one, or not sick at all.

The heart of it: you choose the row (your action); the world chooses the column (the state). You only ever control half of what happens. The outcome sits where your row meets the world's column.

	state s₁	…	state sₘ
action a₁	O₁₁	…	O₁ₘ
⋮		⋱
action aₙ	Oₙ₁	…	Oₙₘ

A decision matrix: n actions × m states. (Only works when both sets are finite — otherwise the table is infinite.)

A concrete one: the insurance decision

Should a DSS advise you to buy fire insurance for your house? Two actions, two states:

	🔥 Fire	No fire
Buy insurance	No house, +100 000€	Keep house (paid premium)
No insurance	No house, +100€	Keep house, +100€

Insurance is brilliant in the Fire column and a mild waste in the No fire column. So which row do you commit to — before you know which column the world picks?
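The three pieces of the model (A, S, O) map onto plain data structures. The following is a minimal sketch in Python, assuming the insurance table above; the names `ACTIONS`, `STATES`, `OUTCOME` and `outcome` are illustrative and do not come from the lecture.

```python
# Hypothetical encoding of the insurance decision matrix (names are illustrative).
ACTIONS = ["Buy insurance", "No insurance"]   # A: the agent's actions (rows)
STATES = ["Fire", "No fire"]                  # S: the environment's states (columns)

# O(a, s): one outcome per (action, state) cell, copied from the table above.
OUTCOME = {
    ("Buy insurance", "Fire"): "No house, +100 000€",
    ("Buy insurance", "No fire"): "Keep house (paid premium)",
    ("No insurance", "Fire"): "No house, +100€",
    ("No insurance", "No fire"): "Keep house, +100€",
}


def outcome(action: str, state: str) -> str:
    """Look up the cell where the agent's row meets the world's column."""
    return OUTCOME[(action, state)]


# The agent fixes the row; the world then fixes the column.
for a in ACTIONS:
    print(a, "->", [outcome(a, s) for s in STATES])
```

A dictionary keyed by (action, state) pairs is one possible choice; a nested list indexed by row and column would work equally well for a finite matrix.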

! The fork that splits the whole lecture

Notice what's missing: nothing tells us how likely a fire is. Everything from here depends on one question:

Do you know the probabilities of the states?
• No idea → Decision under ignorance (§5)
• You have a probability distribution → Decision under risk (§6)

🎙 a surprising warning

This setting [ignorance] seems simpler, because we don't deal with probabilities — but in fact it is harder, in the sense that there is no single optimal behaviour.

4 · the one assumption

Rationality & utility

Normative theory needs a starting assumption. That assumption is rationality — and it's more precise than the everyday word.

Step 1 — preferences that make sense

We assume the agent can rank outcomes by preference, and that this ranking is a pre-order: it obeys two common-sense rules.

Reflexivity: o ≤ o, i.e. any outcome is at least as good as itself (tautological)

Transitivity: if o_i ≤ o_j and o_j ≤ o_k, then o_i ≤ o_k

🎙 transitivity, plainly

If you prefer bananas to apples, and oranges to bananas, then reasonably you'll prefer oranges to apples.

Step 2 — pre-order vs. linear order

A pre-order allows incomparable things. A linear order (like the numbers) doesn't — any two items can always be compared.

🎙 incomparability

I can say a cheeseburger is better than salmon — but I cannot say whether an airplane is better than a cheeseburger. They're just not comparable.

For decision-making we add completeness: inside our setting, every pair of outcomes must be comparable. If you can't compare two outcomes, you haven't finished specifying the problem.

Completeness: for any o_i, o_j — either o_i ≤ o_j or o_j ≤ o_i

Step 3 — collapse it into a number: utility

★ The utility function

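Reflexivity + transitivity + completeness, together, let us line the outcomes up (at least when there are finitely many of them) and assign each a real number U such that U(oᵢ) ≤ U(oⱼ) exactly when oⱼ is (weakly) preferred to oᵢ. Drop transitivity or completeness and no such U can exist, because the real numbers themselves are transitively and completely ordered.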
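One way to see how a ranking turns into numbers is to count, for each outcome, how many outcomes it is at least as good as. The sketch below assumes a finite outcome set and uses the fruit preferences from the lecture quote; the function names and the counting construction are illustrative choices, not the professor's method.

```python
from itertools import product


def is_rational_ranking(outcomes, weakly_below):
    """Check reflexivity, completeness and transitivity of a preference.

    weakly_below(x, y) is True when y is at least as good as x (x ≤ y).
    """
    reflexive = all(weakly_below(o, o) for o in outcomes)
    complete = all(weakly_below(x, y) or weakly_below(y, x)
                   for x, y in product(outcomes, repeat=2))
    transitive = all(weakly_below(x, z)
                     for x, y, z in product(outcomes, repeat=3)
                     if weakly_below(x, y) and weakly_below(y, z))
    return reflexive and complete and transitive


def utility_from_ranking(outcomes, weakly_below):
    """U(o) = number of outcomes that o is at least as good as."""
    return {o: sum(weakly_below(x, o) for x in outcomes) for o in outcomes}


# Stated preferences: apples ≤ bananas ≤ oranges (and hence apples ≤ oranges).
outcomes = ["apples", "bananas", "oranges"]
PREFS = {("apples", "bananas"), ("bananas", "oranges"), ("apples", "oranges")}


def below(x, y):
    return x == y or (x, y) in PREFS


assert is_rational_ranking(outcomes, below)
U = utility_from_ranking(outcomes, below)
print(U)
```

The resulting numbers are only one valid choice: any other assignment with the same order represents the same preference.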

🎙 why utility is everything

The utility function is the central notion. Once you've derived it, you can forget about the preference — and even forget the outcomes. Algorithms work on the utility function.

An agent is rational if it acts to maximise its utility. So the messy insurance outcomes ("lose the house but get 100 000€"…) collapse into bare numbers we can compute with:

	🔥 Fire	No fire
Buy insurance	1	4
No insurance	−100	5

This agent ranks the outcomes 5 ▸ 4 ▸ 1 ▸ −100. "No insurance, no fire" is best; "no insurance, fire" is catastrophic.

Utilities are personal: I might rate cheeseburger above salmon while you do the opposite. Different agents → different utility functions. There's no universal "correct" number.

One subtlety the professor flags: the scale

Ordinal

Only the order matters. The numbers are arbitrary — add 1000 to all of them and it's the same utility function. No averaging allowed.

Cardinal

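The distances between numbers carry meaning, so only positive affine rescalings (U → αU + β with α > 0) are harmless: adding 1000 to every value keeps the distances, but an uneven order-preserving relabelling (say, cubing each value) breaks them. You can average, which is required for expected utility (§6).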

! Easy exam point

Expected utility (§6) takes a weighted average of utilities — so it needs a cardinal scale. On an ordinal scale an average is meaningless, because the spacing between the numbers means nothing. State this in one sentence and you've earned marks.
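A small numerical sketch of why averaging is meaningless on an ordinal scale: relabelling the utilities with a strictly increasing function keeps every ranking of outcomes intact, yet it can flip which action has the higher average. The two actions, their utilities and the equal weights below are hypothetical, chosen only to make the flip visible.

```python
import math

# Hypothetical utilities for two actions over two states (not from the lecture).
utilities = {"A": (0.0, 10.0), "B": (4.0, 5.0)}
weights = (0.5, 0.5)  # illustrative averaging weights


def weighted_average(values, weights):
    """Plain weighted average of a row of utilities."""
    return sum(w * v for w, v in zip(weights, values))


def best_by_average(table):
    """Action whose utilities have the highest weighted average."""
    return max(table, key=lambda a: weighted_average(table[a], weights))


# A strictly increasing relabelling: the order 0 < 4 < 5 < 10 is preserved,
# so on an ordinal scale this is "the same" utility function.
relabelled = {a: tuple(math.sqrt(v) for v in vals)
              for a, vals in utilities.items()}

print(best_by_average(utilities))   # choice with the original numbers
print(best_by_average(relabelled))  # choice after the order-preserving relabelling
```

With these numbers the first call selects A and the second selects B, although no outcome changed its place in the ranking.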

◇ Check yourself

Your utilities are 1, 4, 5, −100 on an ordinal scale. You add 1000 to every value. What happened?

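On an ordinal scale only the order matters, so a uniform shift leaves the ranking, and therefore the utility representation, unchanged. (A uniform shift also preserves the distances, so a cardinal scale survives it too; what a cardinal scale cannot survive is an uneven relabelling that keeps the order but stretches the gaps.)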

5 · no probabilities

Decision under ignorance

You know the actions, the states and the utilities — but nothing about how likely each state is. "Simpler" to set up, but genuinely harder to solve.

First tool: dominance

★ Dominance: a relation between actions

(Preference was between outcomes; dominance is between actions.) Action a_j weakly dominates a_i if, in every state, a_j's outcome is at least as good (by utility). It strongly dominates if it's also strictly better in at least one state.

🎙 in plain words

B weakly dominates A when, no matter the actual state of the environment, action B always gives you an at-least-as-preferable outcome.

The golden rule: a rational agent should never pick a dominated action. Here's the menu — order before you know if the chef is any good:

	Good chef	Bad chef
Monkfish	4	1
Hamburger	3	3
No main course	2	2

No main course (2, 2) is strongly dominated by Hamburger (3, 3): the burger wins in both columns, so a rational agent strikes No main course off the menu. But dominance can't choose between Monkfish and Hamburger, because each is better in a different column.
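The dominance check on the menu can be sketched in a few lines. The code follows the definitions given in these notes (weak: at least as good in every state; strong: weak plus strictly better in at least one state); the function and variable names are illustrative.

```python
# Utilities from the menu table: (Good chef, Bad chef).
MENU = {
    "Monkfish": (4, 1),
    "Hamburger": (3, 3),
    "No main course": (2, 2),
}


def weakly_dominates(b, a, table):
    """b is at least as good as a in every state."""
    return all(ub >= ua for ub, ua in zip(table[b], table[a]))


def strongly_dominates(b, a, table):
    """Weak dominance plus strictly better in at least one state
    (the definition used in these notes)."""
    return weakly_dominates(b, a, table) and any(
        ub > ua for ub, ua in zip(table[b], table[a]))


def dominated_actions(table):
    """Every action that some other action strongly dominates."""
    return {a for a in table for b in table
            if a != b and strongly_dominates(b, a, table)}


print(dominated_actions(MENU))
```

Only "No main course" is returned; Monkfish and Hamburger both survive, which is the point where dominance stops being able to help.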

! Why ignorance is "harder"

Dominance is a very weak rule — it deletes the obviously bad, but rarely names a winner. To go further you must add an extra assumption about the agent's attitude, and each assumption gives a different rule. There is no single best one — only the requirement that every rule must agree with dominance.

The rules — a playground 🎛️

The professor's medication example: you feel ill but don't know the cause. Each rule below selects one action from this table (utilities as in the §6 table; under ignorance there is no probability row).

	Bacterial	Viral	Stress	Worst case
Take antibiotics	1	−1	−1	−1
Take anti-fever	0.5	0.5	−0.5	−0.5
No medication	−1	−1	0	−1

Maximin

Maximise the worst case. Take each action's minimum, then pick the biggest. The cautious, risk-averse agent.

Maximax

Maximise the best case. Take each action's maximum, then pick the biggest. The optimist.
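Maximin and maximax differ only in which end of each row they look at. A minimal sketch, assuming the medication utilities from the §6 table; the names are illustrative.

```python
# Utilities per state (Bacterial, Viral, Stress), taken from the §6 table.
MEDICATION = {
    "Antibiotics": (1, -1, -1),
    "Anti-fever": (0.5, 0.5, -0.5),
    "No medication": (-1, -1, 0),
}


def maximin(table):
    """Pick the action whose worst-case utility is largest."""
    return max(table, key=lambda a: min(table[a]))


def maximax(table):
    """Pick the action whose best-case utility is largest."""
    return max(table, key=lambda a: max(table[a]))


print("maximin:", maximin(MEDICATION))
print("maximax:", maximax(MEDICATION))
```

On this table the cautious rule selects Anti-fever (worst case −0.5) and the optimistic rule selects Antibiotics (best case 1), so the two rules already disagree.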

Minimax Regret

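Regret = the best possible utility in that state − your utility, i.e. how much you lose by not having picked the best action for the state that actually occurs. Take each action's maximum regret, then pick the smallest. Regret is a loss function, a notion that is big in ML.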
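A sketch of minimax regret on the medication utilities from the §6 table. The sign convention is an assumption made here: regret is measured as the non-negative shortfall (best utility in the state minus the utility obtained), so that "minimise the worst regret" means minimising the largest shortfall. Names are illustrative.

```python
# Utilities per state (Bacterial, Viral, Stress), taken from the §6 table.
MEDICATION = {
    "Antibiotics": (1, -1, -1),
    "Anti-fever": (0.5, 0.5, -0.5),
    "No medication": (-1, -1, 0),
}


def regret_table(table):
    """Regret per cell: best utility in that state minus the utility obtained."""
    n_states = len(next(iter(table.values())))
    best = [max(row[s] for row in table.values()) for s in range(n_states)]
    return {a: tuple(best[s] - row[s] for s in range(n_states))
            for a, row in table.items()}


def minimax_regret(table):
    """Pick the action whose largest regret is smallest."""
    regrets = regret_table(table)
    return min(regrets, key=lambda a: max(regrets[a]))


print(regret_table(MEDICATION))
print(minimax_regret(MEDICATION))
```

Here the largest regrets are 1.5 for Antibiotics, 0.5 for Anti-fever and 2 for No medication, so the rule selects Anti-fever.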

Averaging (OWA)

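Compromise: sort each action's utilities from best to worst, weight them by rank position, and take the weighted average (an ordered weighted average, OWA). The weights are NOT probabilities (under ignorance there are none); they only encode optimism: all the weight on the best rank gives maximax, all on the worst gives maximin.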
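A sketch of the averaging rule, reading OWA as an ordered weighted average: each action's utilities are sorted from best to worst and the weights attach to rank positions rather than to states. The utilities come from the §6 table; the weight vector (0.5, 0.3, 0.2) is a hypothetical choice used only for illustration.

```python
# Utilities per state (Bacterial, Viral, Stress), taken from the §6 table.
MEDICATION = {
    "Antibiotics": (1, -1, -1),
    "Anti-fever": (0.5, 0.5, -0.5),
    "No medication": (-1, -1, 0),
}


def owa(values, weights):
    """Ordered weighted average: weights attach to ranks (best first), not states."""
    ranked = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ranked))


def owa_choice(table, weights):
    """Pick the action with the highest ordered weighted average."""
    return max(table, key=lambda a: owa(table[a], weights))


print(owa_choice(MEDICATION, (0.5, 0.3, 0.2)))  # illustrative optimism weights
print(owa_choice(MEDICATION, (1, 0, 0)))        # all weight on the best rank
print(owa_choice(MEDICATION, (0, 0, 1)))        # all weight on the worst rank
```

The last two calls reproduce maximax and maximin respectively, which shows that the weights express an attitude between optimism and pessimism rather than a belief about the states.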

! Two traps the professor loves

1. OWA weights are not probabilities — under ignorance there are none. They encode the agent's optimism, nothing more.

2. The indifference principle (give every state 1/|S|, then maximise expected utility) quietly invents probabilities. Popular, but contested: if you truly know nothing, why assume every state is equally likely?
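The second trap can be made concrete: the indifference principle manufactures a uniform distribution and then runs expected utility on it. A sketch assuming the medication utilities from the §6 table; the names are illustrative.

```python
# Utilities per state (Bacterial, Viral, Stress), taken from the §6 table.
MEDICATION = {
    "Antibiotics": (1, -1, -1),
    "Anti-fever": (0.5, 0.5, -0.5),
    "No medication": (-1, -1, 0),
}


def indifference_choice(table):
    """Give every state probability 1/|S|, then maximise expected utility."""
    n_states = len(next(iter(table.values())))
    p = 1 / n_states  # the invented uniform probability
    expected = {a: sum(p * u for u in row) for a, row in table.items()}
    return max(expected, key=expected.get), expected


choice, expected = indifference_choice(MEDICATION)
print(choice, expected)
```

With these utilities the uniform prior selects Anti-fever, whereas the probabilities given in §6 select No medication: the invented 1/|S| is doing real work in the decision.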

✓ The headline of §5

Under ignorance there is no single best rule. Each is justified by a different assumption, and they can disagree. The only universal law: stay coherent with dominance.

6 · probabilities arrive

Decision under risk

Now the agent has a probability distribution over the states. Suddenly everything gets clean — and there's one rule to rule them all.

p : S → [0, 1] with Σ_s p(s) = 1
A probability distribution over states, e.g. "fire is 5% likely, no fire 95%".

Two flavours of that p: frequentist (the real long-run frequency) or subjective (the agent's degree of belief). Both feed the same machine.

★ Expected utility

The expected utility of an action is the probability-weighted average of the utilities it can produce — "how much utility do I expect on average if I take it?"

EU(a) = Σ_s p(s) · U(O(a, s))

✓ The contrast with §5, and the headline of §6

Under ignorance: many rules, no winner. Under risk: exactly one rule everyone agrees on.

Expected Utility Maximisation — pick the action with the highest EU.

Run the medication example again with probabilities 0.05 / 0.15 / 0.8:

	Bacterial (.05)	Viral (.15)	Stress (.8)	EU
Antibiotics	1	−1	−1	−0.9
Anti-fever	0.5	0.5	−0.5	−0.3
No medication	−1	−1	0	−0.2 ✓

The winner is No medication, even though it is the worst or joint-worst choice in two of the three columns! It wins because it is the best choice in Stress, and Stress is overwhelmingly likely (0.8): EU rewards what's safest in the world that probably happens. That counter-intuitive result is exactly the kind of thing the professor likes you to explain.
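The EU column can be reproduced in a few lines. This is a sketch assuming the utilities and probabilities of the table above; the names `P`, `MEDICATION` and `expected_utility` are illustrative.

```python
# Probabilities and utilities per state (Bacterial, Viral, Stress) from the table.
P = (0.05, 0.15, 0.8)
MEDICATION = {
    "Antibiotics": (1, -1, -1),
    "Anti-fever": (0.5, 0.5, -0.5),
    "No medication": (-1, -1, 0),
}

# A probability distribution must sum to 1 (checked with a float tolerance).
assert abs(sum(P) - 1.0) < 1e-9


def expected_utility(utilities, probabilities):
    """EU(a) = sum over states of p(s) * U(O(a, s))."""
    return sum(p * u for p, u in zip(probabilities, utilities))


eu = {a: expected_utility(row, P) for a, row in MEDICATION.items()}
best = max(eu, key=eu.get)

for action, value in eu.items():
    print(f"{action}: {value:.2f}")
print("EU-maximising action:", best)
```

Changing `P` is a quick way to see how sensitive the recommendation is to the probabilities: the utilities stay fixed and only the weights on the columns move.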

! Don't forget the scale

EU averages utilities — so it needs a cardinal scale (§4). On an ordinal scale the average is meaningless.

Why is EU "the" rule? Two theorems

It isn't arbitrary. Two famous results prove a rational agent must behave like an EU-maximiser. The professor won't ask you to prove them — just to state and discuss them.

1 · Von Neumann–Morgenstern (VNM)

A lottery is a probability distribution over utilities; under risk, every action is a lottery. If preferences over lotteries satisfy four axioms — completeness, transitivity, continuity, independence — then there's a utility function U (unique up to positive affine transformation) with L ≤ M ⟺ EU(L) ≤ EU(M). In words: the agent provably acts as if maximising expected utility.
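Why "unique up to positive affine transformation" does not change any decision: a short derivation, assuming the EU definition from this section and writing the rescaled utility as U' = αU + β with α > 0 (the symbols α, β and U' are introduced here for illustration only).

```latex
\begin{align*}
U'(o)  &= \alpha\,U(o) + \beta, \qquad \alpha > 0 \\
EU'(a) &= \sum_{s} p(s)\,U'\bigl(O(a,s)\bigr) \\
       &= \alpha \sum_{s} p(s)\,U\bigl(O(a,s)\bigr) + \beta \sum_{s} p(s) \\
       &= \alpha\,EU(a) + \beta
       \qquad \text{since } \textstyle\sum_{s} p(s) = 1 \\
EU'(a) \le EU'(b) &\iff EU(a) \le EU(b)
       \qquad \text{because } \alpha > 0
\end{align*}
```

Every action's expected utility is stretched and shifted by the same amounts, so the ranking of actions, and therefore the EU-maximising choice, stays the same.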

The catch: VNM assumes the probabilities are already given. But where does p come from?

2 · Savage (Subjective EU)

Starting only from preferences over actions (plus axioms like the "sure-thing principle"), with finitely many states, there exists a unique subjective probability Q over states and a utility function such that preference matches EU comparison. So the probabilities aren't an input — they're derived from how the agent chooses. That's the "subjective" in subjective EU.

! Worth a sentence in the exam

Savage's Q is the agent's belief and may differ from the true probability. De Finetti's "Dutch book" argument requires a rational agent's beliefs to be coherent (to obey the probability axioms), but neither it nor Savage's theorem guarantees that they match the truth, and the assumption that a single, knowable probability even exists isn't universally accepted (hence imprecise / multiple-priors models).

◇ Check yourself

What's the single most important difference between VNM and Savage?

Both justify expected-utility maximisation. The leap: VNM assumes a given probability, while Savage constructs a unique subjective probability from preferences alone — answering "where does p come from?"

★ study like the exam

The Exam Lab

Knowing the theory isn't enough — you have to answer his way. Here's how the exam works, the answer skeleton, then four full worked examples in Lecture-1 territory.

📋 How the DSS exam works

Written exam on the course contents — mandatory, you must score ≥ 18 to pass. Then an optional essay: an individual academic document on a topic related to the course, discussed orally, worth up to +4 points. Good news: he will not ask you to prove theorems — he wants understanding, not derivations.

✍️ The answer skeleton (copy this every time)

① Define the concept "in your own words but as precisely as possible," using the exact vocabulary (utility, dominance, EU, cardinal/ordinal).
② Apply it to his scenario: show the computation/reasoning.
③ Decide & motivate: state what the rational agent does (or who's right) and why.

Question · Decision under risk

Define, in your own words but as precisely as possible, expected utility and the rule of expected-utility maximisation. Then an agent faces three states — Bacterial (p=.05), Viral (p=.15), Stress (p=.8) — with utilities Antibiotics (1, −1, −1), Anti-fever (.5, .5, −.5), No medication (−1, −1, 0). Which action should a rational agent take? Motivate.

① DEFINE: Under risk the states carry a probability distribution p. The expected utility of action a is the probability-weighted average of its outcome-utilities, EU(a) = Σ_s p(s)·U(O(a,s)). EU maximisation says a rational agent picks the action with the greatest EU. It needs a cardinal scale: averaging ordinal utilities is meaningless.

② APPLY

EU(Antibiotics) = .05·1 + .15·(−1) + .8·(−1) = −0.90
EU(Anti-fever) = .05·.5 + .15·.5 + .8·(−.5) = −0.30
EU(No medication) = .05·(−1) + .15·(−1) + .8·0 = −0.20

③ DECIDE: No medication has the highest EU (−0.20), so a rational agent takes no medication, even though it is the worst or joint-worst option in two of the three states. The reason: it is the best option in the most probable state (Stress, .8), where both medications are costly, and EU rewards what's safest in expectation.

Question · Conceptual

Explain the difference between decision under ignorance and decision under risk. Under ignorance, is there a single "best" decision rule? Motivate.

① DEFINE: Under risk the agent has a probability distribution over the states; under ignorance it has no information about how likely they are, only which states are possible and which outcomes they produce.

② CONTRAST: Under risk there is essentially one agreed rule, expected-utility maximisation (justified by VNM and Savage). Under ignorance there are many (maximin, maximax, OWA averaging, minimax regret, the indifference principle), each justified by a different assumption about the agent's attitude.

③ DECIDE: No. Under ignorance there is no single best rule. The only universal constraint is coherence with dominance (never choose a dominated action). Which rule is "best" depends on assumptions added beyond bare rationality, so it's a modelling choice, not a theorem.

Question · Decision under ignorance

Define weak and strong dominance. Then for Monkfish (Good 4, Bad 1), Hamburger (3, 3), No main course (2, 2): identify dominated actions, and say whether dominance alone determines the choice. Motivate.

① DEFINE: aⱼ weakly dominates aᵢ if U(O(aᵢ,s)) ≤ U(O(aⱼ,s)) for every state s; it strongly dominates if the inequality is strict for at least one state. A rational agent never picks a dominated action.

② APPLY: No main course (2, 2) is strongly dominated by Hamburger (3, 3), which is strictly better in both states, so it's discarded. Monkfish (4, 1) and Hamburger (3, 3) do not dominate each other (each wins in a different state).

③ DECIDE: No, dominance alone isn't enough. It removes "No main course" but can't rank the two survivors. Dominance is a very weak rule; to choose between Monkfish and Hamburger we need a further rule (maximin, or EU if probabilities are known).

Question · Foundations

State the Von Neumann–Morgenstern theorem: its assumptions, what it establishes, and its main limitation. Discuss.

① DEFINE: A lottery is a probability distribution over utilities; under risk every action is a lottery. The theorem assumes preference ≤ over lotteries satisfies completeness, transitivity, continuity (if L ≤ M ≤ N, some mix of L and N is indifferent to M) and independence (mixing both sides with a third lottery preserves the preference).

② ESTABLISH: Then a utility function U exists, unique up to positive affine transformation, with L ≤ M ⟺ EU(L) ≤ EU(M). So an agent obeying the axioms provably behaves as an EU-maximiser, and its scale is cardinal.

③ DISCUSS: Significance: it makes EU maximisation the rational rule under risk (not one option among many, as under ignorance). Limitation: it assumes the probabilities are already given, which is exactly what Savage's theorem removes by deriving a subjective probability from preferences alone.

🗣 Say these out loud (cover the page)

• What's the cascade decision → decision-making → DSS, and why "support" not "replace"?
• Define a utility function; which three assumptions make it possible?
• Why does expected utility require a cardinal scale?
• Give the menu matrix: which action does dominance remove, and why can't it finish the job?
• State maximin, maximax, minimax regret — what attitude does each encode?
• Why is the indifference principle "contested"?
