Decision theory
The normative study of how a single rational agent should choose among actions whose outcomes carry different value (utility) — it prescribes what an ideal agent ought to do, not what people actually do.
- “Normative” = it prescribes what an ideal rational agent ought to do — not what real people do (that is Lecture 5).
- Three ingredients: a set of actions, the possible states/outcomes, and a utility over those outcomes.
- Two regimes: decision under risk (probabilities known → maximise expected utility) vs decision under ignorance (no probabilities → apply a rule).
- It is the yardstick the rest of the course measures everything against.
🎯 In the exam: Always decide first which regime you are in, risk (probabilities given) or ignorance (no probabilities), because that choice picks the tool.
Utility
A real-valued function U(o) measuring how much an agent prefers outcome o; acting rationally means choosing so as to maximise it (a concave U over money = risk-averse).
- Only order and relative spacing matter: utility is an interval scale, so any positive affine transform (a·U+b, a>0) represents the same preferences.
- It need not be money; a concave utility over money encodes risk aversion (a sure €50 preferred to a 50/50 shot at €100 or nothing).
- For a utility to exist at all, preferences must be complete and transitive (part of the VNM axioms).
- The recurring thread: utility becomes the reward in L4, the DSS’s objective in L6, and net benefit in L7.
🎯 In the exam: Note that utility is on an interval scale and that concavity = risk aversion.
Expected utility
The probability-weighted average of an action’s outcome utilities; under risk (probabilities known) a rational agent picks the action with the highest expected utility — justified by the von Neumann–Morgenstern axioms.
- VNM theorem: if your preferences over lotteries obey four axioms (completeness, transitivity, continuity, independence) you behave as if maximising expected utility.
- This is the rule for decision under risk (probabilities given) — not ignorance.
- Maximising expected value (money) is the special case where U is linear; expected utility lets you encode a risk attitude.
- L5 shows humans systematically break this (Allais paradox, prospect theory) — exactly why the course pivots to behavior.
Example: If action A wins or loses with probability 0.5 each, EU(A) = 0.5·U(win)+0.5·U(lose); compute EU for each action and choose the largest.
🎯 In the exam: The classic prompt is to define expected utility, give the formula, and cite the VNM justification.
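The expected-utility rule can be sketched in a few lines. This is a minimal illustration, not course material: the two actions, the euro amounts and the square-root utility are assumptions chosen to show how a concave U turns a tie in expected value into a preference for the sure thing.

```python
import math

def expected_utility(lottery, utility):
    """Probability-weighted average of outcome utilities.

    lottery: list of (probability, outcome) pairs whose probabilities sum to 1.
    """
    return sum(p * utility(x) for p, x in lottery)

# Hypothetical actions: a sure 50 euros versus a 50/50 gamble on 100 or 0.
actions = {
    "sure_50": [(1.0, 50.0)],
    "gamble_100": [(0.5, 100.0), (0.5, 0.0)],
}

def linear(x):
    return x              # risk-neutral: EU equals expected money value

def concave(x):
    return math.sqrt(x)   # assumed concave utility: risk-averse

def best_action(actions, utility):
    """Pick the action with the highest expected utility."""
    return max(actions, key=lambda a: expected_utility(actions[a], utility))
```

Under `linear` both actions score 50, so expected value cannot separate them; under `concave` the sure €50 wins, which is the risk-aversion point made in the Utility card.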
Decision under ignorance
Choosing when the outcome probabilities are unknown: you cannot take an expectation, so you apply a decision rule that encodes your attitude to the unknown.
- Maximin (Wald): maximise the worst case — pessimistic / cautious.
- Maximax: maximise the best case — optimistic.
- Hurwicz: blend worst and best with an optimism coefficient α.
- Minimax-regret (Savage): minimise the largest regret (gap from the best you could have done in each state).
- Laplace / insufficient reason: assume all states equally likely, then maximise expected utility.
Example: When investing with no idea of the odds, maximin picks the option whose worst outcome is least bad.
🎯 In the exam: Be able to compute each rule from a payoff table and state the attitude it encodes.
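The five rules can be computed mechanically from a payoff table. The sketch below uses a made-up table (three hypothetical investments, three states) and takes α in Hurwicz to be the weight on the best case, which is one common convention and an assumption here.

```python
# Hypothetical payoff table: rows = actions, columns = states of the world.
payoffs = {
    "bonds":  [3, 3, 3],
    "stocks": [-4, 4, 12],
    "fund":   [1, 5, 8],
}

def maximin(t):
    # Pessimistic: best worst case.
    return max(t, key=lambda a: min(t[a]))

def maximax(t):
    # Optimistic: best best case.
    return max(t, key=lambda a: max(t[a]))

def hurwicz(t, alpha):
    # alpha = optimism coefficient (weight on the best case).
    return max(t, key=lambda a: alpha * max(t[a]) + (1 - alpha) * min(t[a]))

def minimax_regret(t):
    # Regret = gap between the best payoff in a state and what you got.
    n_states = len(next(iter(t.values())))
    best = [max(t[a][s] for a in t) for s in range(n_states)]
    return min(t, key=lambda a: max(best[s] - t[a][s] for s in range(n_states)))

def laplace(t):
    # Treat all states as equally likely and maximise the average.
    return max(t, key=lambda a: sum(t[a]) / len(t[a]))
```

On this table maximin picks bonds, maximax picks stocks, and Hurwicz (α = 0.5), minimax-regret and Laplace all pick the fund: the same data gives different answers depending on the attitude the rule encodes.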
Bayesian network
A directed acyclic graph whose nodes are random variables and whose edges encode direct dependence; each node stores P(X|parents) and the product of these gives the full joint distribution compactly.
- Directed and acyclic (a DAG): arrows go from parent (cause) to child (effect); no cycles.
- Each node only needs P(node | its parents), so a sparse graph means dramatically fewer parameters.
- It encodes independencies: a node is independent of its non-descendants given its parents (the local Markov property).
- Main use is inference: observe some variables, compute the posterior over the rest (diagnosis, prediction).
Example: Rain → WetGrass ← Sprinkler: store P(Rain), P(Sprinkler) and P(Wet|Rain,Sprinkler) instead of one 8-row joint table.
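A minimal sketch of the factorisation and of inference by enumeration on this three-node network. It assumes Rain and Sprinkler are both root nodes, as the arrows Rain → WetGrass ← Sprinkler indicate, and all probability values are invented for illustration.

```python
from itertools import product

# Hypothetical CPT numbers for the collider Rain -> WetGrass <- Sprinkler.
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET = {  # P(Wet=True | Rain, Sprinkler)
    (True, True): 0.99,
    (True, False): 0.9,
    (False, True): 0.8,
    (False, False): 0.0,
}

def bern(p, value):
    """Probability that a binary variable with P(True)=p takes `value`."""
    return p if value else 1.0 - p

def joint(rain, sprinkler, wet):
    # Product of P(node | parents) along the graph.
    return (bern(P_RAIN, rain)
            * bern(P_SPRINKLER, sprinkler)
            * bern(P_WET[(rain, sprinkler)], wet))

def posterior_rain_given_wet():
    # Inference by enumeration: sum out Sprinkler, then normalise.
    num = sum(joint(True, s, True) for s in (True, False))
    den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return num / den
```

Observing wet grass raises the probability of rain above its prior of 0.2, which is the "observe some variables, compute the posterior over the rest" use named above. Enumeration like this scales exponentially with the number of variables.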
Conditional independence
X and Y are conditionally independent given Z when, once Z is known, Y carries no extra information about X (P(X|Y,Z)=P(X|Z)) — the property that lets the network factorise.
- It is conditional: X and Y can be dependent overall yet independent once Z is fixed (and vice-versa).
- Fewer dependencies ⇒ fewer edges ⇒ smaller tables ⇒ a compact model.
- “d-separation” lets you read these independencies straight off the graph’s shape.
- It is the reason the joint isn’t a full 2ⁿ table and why inference can sometimes be efficient.
Example: Given it’s raining (Z), whether the sprinkler is on (Y) tells you nothing more about the season (X).
Decision network
A Bayesian network extended with decision nodes and a utility node; solving it means choosing the decisions that maximise expected utility (also called an influence diagram).
- Three node types: chance (ovals = random variables), decision (rectangles = your choices), utility (diamond = the objective).
- Solving it = pick the decision(s) that maximise expected utility given the evidence.
- It directly unifies L1 (expected utility) with L2’s compact representation.
- It is the basis of the “probabilistic” model paradigm catalogued in Lecture 6.
🎯 In the exam: Name the three node types and state that the output is the EU-maximising decision.
Inference is #P-hard
Even though the network is compact to store, computing an exact marginal or posterior is #P-hard in general (as hard as counting a Boolean formula’s solutions) — so in practice we use approximate inference.
- #P is the counting analogue of NP: not “is there a solution?” but “how many?” — believed even harder than NP-complete.
- Exact inference is only efficient for special structures (e.g. trees / low treewidth).
- So in practice we use approximate inference: sampling (MCMC), variational methods, loopy belief propagation.
- The lesson generalises: a good representation does not guarantee cheap computation — a recurring DSS theme.
🎯 In the exam: The key word is #P-hard; the takeaway is “use approximate inference”. (Complexity was flagged as lighter for the exam.)
Game theory
The study of decisions among several rational agents where each one’s payoff depends on the others’ choices, so each must reason about the others’ reasoning.
- Key shift from L1: the outcome of your action depends on what others do, so you must reason about their reasoning.
- Non-cooperative game theory: players cannot make binding agreements; each maximises its own payoff.
- Central questions: what is a “solution” to a game? (Nash) and can players do better by coordinating? (correlated equilibrium).
- Games can be zero-sum (pure competition) or general-sum (mixed motives, like the prisoner’s dilemma).
🎯 In the exam: The course focuses on non-cooperative games with the Nash equilibrium as the solution concept.
Normal form
A representation of a game as a matrix giving each player’s payoff for every combination of the players’ strategies (it assumes simultaneous moves).
- Also called strategic form; it assumes simultaneous moves (neither sees the other’s choice first).
- Contrast with extensive form (a game tree) used for sequential moves.
- From the matrix you find dominant strategies, best responses, and equilibria.
- A strategy may be pure (one action) or mixed (a probability distribution over actions).
Example: The prisoner’s dilemma is a 2×2 matrix where (Defect, Defect) is the equilibrium even though (Cooperate, Cooperate) is better for both.
Nash equilibrium
A strategy profile in which no player can increase their own payoff by changing strategy unilaterally — every player is simultaneously best-responding. It always exists (possibly mixed) but need not be efficient.
- “Unilateral”: hold the others fixed — if no single player gains by deviating alone, it is an equilibrium.
- Existence (Nash’s theorem): every finite game has at least one equilibrium, possibly in mixed strategies.
- It need not be unique, Pareto-efficient, or “fair” — the prisoner’s dilemma equilibrium is worse for both than cooperating.
- Interpretation: a self-enforcing convention / fixed point of mutual best response — not necessarily a prediction of real behaviour.
- Computing one is generally hard (PPAD-complete in general games).
Example: Prisoner’s dilemma → (Defect, Defect): neither prisoner gains by switching alone, though both prefer (Cooperate, Cooperate).
🎯 In the exam: The prof’s verbatim prompt is “define a Nash equilibrium and discuss its interpretation.” Hit three points: mutual best response · always exists (maybe mixed) · not always efficient.
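The "no unilateral gain" test is easy to check by brute force in a small normal-form game. A minimal sketch, assuming a standard-looking set of prisoner's dilemma payoffs (the numbers 3/0/5/1 are illustrative, not from the lecture):

```python
from itertools import product

# Hypothetical prisoner's dilemma payoffs (higher is better): (row, column).
ACTIONS = ["Cooperate", "Defect"]
PAYOFF = {
    ("Cooperate", "Cooperate"): (3, 3),
    ("Cooperate", "Defect"): (0, 5),
    ("Defect", "Cooperate"): (5, 0),
    ("Defect", "Defect"): (1, 1),
}

def is_pure_nash(a_row, a_col):
    """True if neither player gains by deviating alone."""
    u_row, u_col = PAYOFF[(a_row, a_col)]
    # Hold the column player fixed and try every row deviation.
    row_ok = all(PAYOFF[(d, a_col)][0] <= u_row for d in ACTIONS)
    # Hold the row player fixed and try every column deviation.
    col_ok = all(PAYOFF[(a_row, d)][1] <= u_col for d in ACTIONS)
    return row_ok and col_ok

def pure_nash_equilibria():
    return [p for p in product(ACTIONS, repeat=2) if is_pure_nash(*p)]
```

The only profile that survives is (Defect, Defect), even though (Cooperate, Cooperate) pays both players more: an equilibrium that is not Pareto-efficient. This check covers pure strategies only.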
Mixed strategies
A strategy that plays each action with some probability rather than deterministically; allowing mixtures is what guarantees every finite game has at least one Nash equilibrium.
- A pure strategy is the special case that puts probability 1 on a single action.
- In a mixed equilibrium each player is indifferent among the actions they randomise over — that indifference is what makes it stable.
- Why needed: games like rock-paper-scissors or matching pennies have no pure-strategy equilibrium.
- Connects to L4’s stochastic policies and to the adversarial dynamics of GAN training mentioned in lecture.
🎯 In the exam: Explain why mixed strategies are needed (unpredictability) and note that they guarantee existence.
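The indifference condition can be turned into a small calculation for 2×2 games. A minimal sketch on matching pennies, assuming payoffs of ±1; the closed-form expressions only apply when the game has no pure equilibrium and the denominators are non-zero.

```python
# Matching pennies: the row player wins (+1) on a match, the column player on a mismatch.
ROW = [[1, -1], [-1, 1]]    # ROW[i][j]: row player's payoff for actions (i, j)
COL = [[-1, 1], [1, -1]]    # COL[i][j]: column player's payoff

def has_pure_equilibrium(row, col):
    """Check all four cells for mutual best response."""
    for i in range(2):
        for j in range(2):
            row_best = row[i][j] >= row[1 - i][j]
            col_best = col[i][j] >= col[i][1 - j]
            if row_best and col_best:
                return True
    return False

def mixed_equilibrium(row, col):
    """Return (p, q): p = P(row plays action 0), q = P(column plays action 0).

    Each player's mix is chosen to make the *other* player indifferent.
    """
    # p solves: p*col[0][0] + (1-p)*col[1][0] == p*col[0][1] + (1-p)*col[1][1]
    p = (col[1][1] - col[1][0]) / (col[0][0] - col[1][0] - col[0][1] + col[1][1])
    # q solves: q*row[0][0] + (1-q)*row[0][1] == q*row[1][0] + (1-q)*row[1][1]
    q = (row[1][1] - row[0][1]) / (row[0][0] - row[0][1] - row[1][0] + row[1][1])
    return p, q
```

For matching pennies there is no pure equilibrium and both players mix 50/50, so neither can be exploited by a predictable opponent.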
Correlated equilibrium
An equilibrium with respect to a shared random signal: given its private recommendation no player wants to deviate. It generalises Nash and can reach joint payoffs that beat every Nash equilibrium.
- A trusted “mediator” privately recommends an action to each player; following it is a best response if everyone else follows too.
- Every Nash equilibrium is a correlated equilibrium, but not vice-versa — the set is strictly larger.
- It can achieve expected payoffs that beat every Nash equilibrium.
- It is computationally easier to find than Nash (it is a linear program).
Example (traffic light): given your light colour you have no incentive to deviate, and collisions are avoided.
Knowledge spectrum
Sequential decision-making = acting over time where today’s choice changes tomorrow’s state; which method you use depends on what you know — model → dynamic programming · no model → reinforcement learning · simulator → MCTS.
- The knowledge spectrum: know the model → plan with dynamic programming · no model → learn with reinforcement learning · have a simulator → search with MCTS.
- A policy π maps each state to an action; the goal is the policy maximising expected discounted return.
- The discount γ trades immediate vs future reward and keeps infinite-horizon sums finite.
- Everything in the lecture hinges on the Bellman optimality condition.
🎯 In the exam: The prof said he will NOT ask for formulas here, only intuition (e.g. “explain Bellman optimality in intuitive terms”).
MDP ⟨S,A,P,R,γ⟩
A Markov Decision Process — states, actions, stochastic transitions P(s′|s,a), reward and discount γ — obeying the Markov property: the next state depends only on the current state and action.
- Markov property: the future depends only on the current state and action, not the whole past — this is what makes it solvable.
- Transitions are stochastic: the same action can lead to different next states (the uncertainty thread continuing from L2).
- Reward R(s,a) is the per-step payoff; the agent maximises the discounted sum (the “return”).
- γ near 0 = myopic / short-sighted; γ near 1 = far-sighted.
Example: A robot on a grid: state = cell, actions = N/E/S/W, moves may “slip”, reward = +10 at the goal.
Bellman equation
The recursive optimality condition: a state’s optimal value is the maximum over actions of the immediate reward plus the discounted expected value of the next state. It is the identity every Lecture-4 algorithm solves.
- It is a self-consistency condition: the optimal value is defined in terms of itself one step later.
- Q*(s,a) is the same idea for state-action pairs; the optimal policy is π*(s) = argmaxₐ Q*(s,a).
- Solving this equation is solving the MDP; value iteration, policy iteration and Q-learning all do it.
- The “max” makes it nonlinear, which is why we iterate rather than solve in closed form.
🎯 In the exam: The prof’s example was “explain the Bellman optimality condition intuitively”: emphasise the now-vs-future split, not the algebra.
Value / policy iteration
Dynamic-programming algorithms that, when the model is known, repeatedly apply the Bellman update until they converge to the optimal value function and policy.
- Value iteration: repeatedly apply the Bellman update to V until it stops changing, then read off the greedy policy.
- Policy iteration: alternate evaluating the current policy and improving it greedily — fewer but heavier steps.
- Defining assumption: you must KNOW P and R (the full model).
- Guaranteed to converge to the optimal policy for a finite discounted MDP.
Example: The lesson’s robot corridor converges to V(1)=5.39, V(2)=7.1, V(3)=9 with policy “always go right”.
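The notes quote the converged values but not the corridor's parameters. The value-iteration sketch below assumes a deterministic three-cell corridor with a terminal goal to the right of cell 3, a step cost of −1, a +10 bonus on entering the goal and γ = 0.9. These are reverse-engineered assumptions, chosen because they reproduce the quoted numbers; the lecture's actual setup may differ.

```python
GAMMA = 0.9                      # assumed discount factor
STATES = [1, 2, 3]
GOAL = 4                         # assumed terminal cell right of cell 3
ACTIONS = {"left": -1, "right": +1}

def step(s, a):
    """Known deterministic model: returns (next_state, reward)."""
    s2 = min(max(s + ACTIONS[a], 1), GOAL)       # walls at both ends
    reward = -1.0 + (10.0 if s2 == GOAL else 0.0)
    return s2, reward

def backup(V, s, a):
    """One-step lookahead: immediate reward plus discounted next value."""
    s2, r = step(s, a)
    return r + GAMMA * V[s2]

def value_iteration(tol=1e-10):
    V = {s: 0.0 for s in STATES + [GOAL]}        # V[GOAL] stays 0 (terminal)
    while True:
        delta = 0.0
        for s in STATES:
            best = max(backup(V, s, a) for a in ACTIONS)   # Bellman update
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Read off the greedy policy from the converged values.
    policy = {s: max(ACTIONS, key=lambda a: backup(V, s, a)) for s in STATES}
    return V, policy
```

Under these assumptions the loop settles after a handful of sweeps on V(3) = 9, V(2) = −1 + 0.9·9 = 7.1 and V(1) = −1 + 0.9·7.1 = 5.39, with "right" greedy everywhere.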
Reinforcement learning
Learning an optimal policy from sampled experience when the model is unknown (e.g. Q-learning), trading off exploration against exploitation.
- Model-free: the agent never builds P explicitly; it learns values directly from experience tuples (s, a, r, s′).
- Q-learning nudges Q(s,a) toward r + γ·maxₐ′ Q(s′,a′), a sampled Bellman update.
- Exploration vs exploitation: you must try new actions to learn but exploit known-good ones to score (e.g. ε-greedy).
- Common distinction: off-policy (Q-learning) vs on-policy (SARSA).
🎯 In the exam: The key contrast with DP is that RL has no model and learns from interaction.
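A minimal tabular Q-learning sketch with ε-greedy exploration. The environment is an assumed deterministic three-cell corridor (step cost −1, +10 on reaching a terminal goal, γ = 0.9); the learning rate, ε, episode count and seed are arbitrary illustrative choices. The agent only calls `step` to sample transitions and never inspects the model.

```python
import random

GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.2      # assumed hyperparameters
STATES, GOAL = [1, 2, 3], 4
ACTIONS = {"left": -1, "right": +1}

def step(s, a):
    """Environment the agent can only sample from."""
    s2 = min(max(s + ACTIONS[a], 1), GOAL)
    return s2, -1.0 + (10.0 if s2 == GOAL else 0.0)

def q_learning(episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in STATES + [GOAL] for a in ACTIONS}
    for _ in range(episodes):
        s = 1
        for _ in range(100):                       # step cap per episode
            if rng.random() < EPSILON:             # explore
                a = rng.choice(list(ACTIONS))
            else:                                  # exploit current estimates
                a = max(ACTIONS, key=lambda b: Q[(s, b)])
            s2, r = step(s, a)
            # Sampled Bellman target built from the experience tuple (s, a, r, s2).
            target = r + GAMMA * max(Q[(s2, b)] for b in ACTIONS)
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
            if s == GOAL:
                break
    return Q
```

Because the update bootstraps from the max over next actions rather than from the action actually taken, this is the off-policy variant; replacing the max with the next sampled action's value would give SARSA.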
Monte Carlo Tree Search
A planning method that, given a simulator, estimates action values by sampling many look-ahead roll-outs and selectively growing the search toward promising branches (selection via UCB).
- Four steps per iteration: Selection → Expansion → Simulation (roll-out) → Backpropagation.
- UCB balances trying under-explored moves vs exploiting moves that look good (the explore/exploit theme again).
- Needs a generative model / simulator, not the full probability tables — a middle ground between DP and RL.
- Powers AlphaGo / AlphaZero when paired with learned value and policy networks.
🎯 In the exam: Be able to name the four steps and say what UCB is for.
Normative → descriptive (the pivot)
The shift from normative models (how an agent should decide — L1–4) to descriptive ones (how people actually decide). A DSS does not operate in isolation, so it must model human decision-making, not only optimal decisions.
- Normative models (decision theory, game theory, MDPs/RL) say how agents should decide — enough only if systems acted in isolation.
- But a DSS’s recommendations are interpreted by humans who may ignore or misinterpret them and have cognitive limits.
- So a DSS must account for not just the optimal decision but actual human decision-making behaviour.
- Behavioral decision theory is far less structured than normative theory — a set of important ideas, not one clean axiom system.
🎯 In the exam: Frame the normative→descriptive pivot; it is the conceptual hinge of the whole syllabus.
Failures of rationality
A set of empirical experiments showing humans systematically violate expected-utility theory: preference reversal, the Allais paradox (violating the independence axiom) and the Ellsberg paradox (ambiguity aversion).
- Preference reversal: people choose A over B yet price B above A → no single stable utility U fits both tasks (preferences aren’t representation-invariant).
- Allais paradox: typical choices (A≻B, D≻C) violate the independence axiom; preferences shouldn’t depend on a shared/irrelevant alternative.
- Ellsberg paradox: people prefer the urn with known proportions → they separate risk (known probabilities) from ambiguity (unknown) and are ambiguity-averse.
- Two readings: humans are irrational, or the classical rationality assumptions are too strong — either way it reshapes DSS design.
Example (Ellsberg): bet on “black”, then on “red”; most people pick the 49/51 known urn both times, which is internally inconsistent.
🎯 In the exam: Name the three violations and what each breaks; stress risk-vs-ambiguity for Ellsberg.
Bounded rationality
Herbert Simon’s principle that agents are limited by cognitive resources, time and incomplete information, so perfect optimisation is often infeasible; they optimise only over the set of feasible (computable) strategies C.
- Agents are bounded by cognitive resources, time constraints and incomplete information.
- Optimal decisions are often computationally infeasible (combinatorial optimisation, planning, equilibrium computation — many are NP-hard).
- Classical π* = argmax_π E[U] becomes π ≈ argmax_{π∈C} E[U] over computable strategies C.
- Critique: it still requires optimising and doesn’t say which rules humans actually use → that motivates heuristics next.
🎯 In the exam: Attribute to Simon; give the feasible-set view and note it doesn’t say which rules humans use.
Procedural rationality
Rubinstein’s proposal to judge the decision procedure, not just the outcome: a decision rule is a mapping f : I → A from available information to actions, so rationality depends on the computational structure used.
- Classical and bounded rationality both focus on outcomes; Rubinstein focuses on the procedure.
- A state s is described by a set of cues c(s); a heuristic maps cue-sets to actions, h(c(s)) ≈ argmaxₐ U(O(a,s)).
- It replaces optimisation with rule-based computation.
- This frames the specific decision rules studied next — the heuristics.
🎯 In the exam: Define a decision rule f : I → A and the procedural (vs outcome) view; name Rubinstein.
Heuristic decision rules
The specific simple rules that replace optimisation with sequential, rule-based search over cues: satisficing, take-the-best, the recognition heuristic, and elimination-by-aspects.
- Satisficing (Simon): pick the first action whose utility clears an aspiration level θ — early stopping; depends on the order of actions.
- Take-the-best: evaluate cues in order, stop at the first cue that discriminates — depends on the order of cues.
- Recognition heuristic: if you recognise one option and not the other, choose the recognised one (e.g. infer the larger city by name recognition).
- Elimination-by-aspects (Tversky): pick a cue by importance, eliminate options that fail it, repeat — sequential filtering.
🎯 In the exam: Name the four rules and each one’s distinguishing feature (satisficing = order of actions; take-the-best = order of cues).
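Two of these rules are short enough to sketch directly. The function names, the flat scores and the city cues below are hypothetical illustrations. The sketch is only meant to show early stopping and why the order of actions (satisficing) or cues (take-the-best) matters.

```python
def satisfice(actions, utility, aspiration):
    """Return the first action whose utility clears the aspiration level."""
    for a in actions:                 # the order of actions matters
        if utility(a) >= aspiration:
            return a                  # early stop: later options are never inspected
    return None                       # nothing was good enough

def take_the_best(option_a, option_b, cues):
    """Compare two options cue by cue; stop at the first cue that discriminates.

    cues: functions ordered from most to least important, each returning
    a number for an option (higher = better).
    """
    for cue in cues:                  # the order of cues matters
        va, vb = cue(option_a), cue(option_b)
        if va != vb:
            return option_a if va > vb else option_b
    return None                       # no cue discriminates

# Hypothetical data for illustration.
flat_scores = {"flat_A": 6, "flat_B": 8, "flat_C": 9}
cities = {"X": {"capital": 0, "airport": 1}, "Y": {"capital": 0, "airport": 0}}
city_cues = [lambda c: cities[c]["capital"], lambda c: cities[c]["airport"]]
```

With an aspiration level of 7, `satisfice` returns flat_B and never looks at the better flat_C. `take_the_best` ignores the tied "capital" cue and decides on "airport" alone, without weighing cues against each other.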
From heuristics to biases
Because a heuristic only approximates the optimum it carries an error ε(s)=U(a*)−U(h(c(s)))≥0; a bias is when this approximation error is systematic (directional) rather than random.
- Heuristics cut computational, memory and information cost, but they approximate, so ε(s)=U(a*)−U(h(c(s))) > 0.
- A random error averages out: E[ε]=0; a bias does not: E[ε]≥0.
- So biases are directional, predictable and statistically regular deviations.
- Behavioral decision theory studies these regularities empirically.
🎯 In the exam: Give the formal contrast E[ε]=0 (random) vs E[ε]≥0 (bias); that is why biases are predictable.
Cognitive biases (taxonomy)
A large catalogue of systematic biases grouped into six families; not all stem from heuristics, but all are predictable enough that a DSS can be designed to counter them.
- Judgement under uncertainty: availability, representativeness, anchoring, over/optimism bias.
- Probabilistic reasoning: base-rate neglect, conjunction fallacy, gambler’s fallacy, confirmation, illusion of control.
- Memory & attention: hindsight, salience, recency, the framing effect, selective perception.
- Temporal: present bias, hyperbolic discounting, status-quo bias, sunk-cost fallacy, planning fallacy.
- Social/attribution: attribution error, in-group, authority, halo, self-serving.
- Decision/choice: loss aversion, endowment, decoy, preference reversal, ambiguity aversion.
🎯 In the exam: Name the six categories; for any bias, define it + give an example (the define→apply pattern).
Ecological rationality
Gigerenzer’s view that there is no universally optimal decision rule — rationality is relative to the environment, and a heuristic’s quality is its expected performance over an environment distribution.
- Classical optimisation assumes complete information, stable environments and accurate models; real ones are uncertain, noisy and non-stationary.
- Under uncertainty, simple heuristics can generalise better than highly optimised procedures, which may overfit (“less-is-more”).
- Quality is judged ecologically: Perf(h,E) = E_{s∼E}[U(h(s))] over an environment distribution E.
- Some heuristics are well-adapted to specific environments; none is universally best.
🎯 In the exam: State that rationality is environment-relative and give the less-is-more / overfitting argument; name Gigerenzer.
Fast-and-frugal heuristics & trees
The mechanism behind ecological rationality: fast-and-frugal heuristics use sequential cue search, limited cue integration, early stopping and no global optimisation. The classic example is the fast-and-frugal tree (FFT).
- Four defining features: sequential information search, limited cue integration, early stopping, no global optimisation.
- Ecological rationality is the criterion; fast-and-frugal heuristics are the mechanisms (take-the-best, recognition are examples).
- An FFT is a decision tree with sequential binary cue checks and a possible exit at every node — decide after inspecting only a subset of cues.
- Medical triage: chest pain? → admit; else abnormal ECG? → admit; else high BP? → medium risk; else discharge — no probabilistic aggregation.
🎯 In the exam: List the four fast-and-frugal properties and describe an FFT (early exit, only a subset of cues).
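The triage tree above translates almost line for line into code. This is a direct transcription of that bullet for illustration only; the function and argument names are made up.

```python
def triage(chest_pain, abnormal_ecg, high_bp):
    """Fast-and-frugal tree: binary cue checks in a fixed order,
    with a possible exit at every node."""
    if chest_pain:
        return "admit"          # exit 1: later cues are never inspected
    if abnormal_ecg:
        return "admit"          # exit 2
    if high_bp:
        return "medium risk"    # exit 3
    return "discharge"          # final exit after the last cue
```

No cue is weighted or combined with another: a patient with chest pain is admitted whatever the ECG or blood pressure says, which is what "limited cue integration" and "no probabilistic aggregation" mean in practice.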
Naturalistic decision making (RPD)
Klein’s account of how real experts decide under time pressure in dynamic, uncertain settings without explicit optimisation. The Recognition-Primed Decision (RPD) model formalises it.
- NDM shifts focus from environment structure to real-world expert behaviour under time pressure.
- How do experts decide well without optimisation? Answer: recognition + experience + mental simulation.
- RPD model: (1) recognise a familiar situation → (2) retrieve a plausible action → (3) mentally simulate its consequences → (4) execute if satisfactory.
- Options are evaluated sequentially, not optimised simultaneously — so RPD is ecological and connects to satisficing.
🎯 In the exam: Give the four RPD steps and note it is sequential/satisficing-like, not optimisation; name Klein.
Prospect theory
Kahneman & Tversky’s model of preference under risk: it replaces a stable utility U(x) with a value function V(x−r) over gains/losses relative to a reference point r, so an identical outcome can be a gain or a loss depending on context.
- It targets the deeper failure: U may not be stable across contexts; the issue is how outcomes are represented & evaluated, not just computed.
- Reference dependence: the same outcome is a gain or a loss depending on r (x ↦ V(x−r)).
- S-shaped value function: V(x)=x^α for x≥0, −λ(−x)^β for x<0, with α,β∈(0,1), so it is concave for gains and convex for losses.
- It complements process models (bounded rationality, heuristics): “Behavioral DT = Computation + Representation”.
Example: A “+€1000 raise” feels like a loss if you expected +€2000; the reference point flips the sign.
🎯 In the exam: Write U(x) → V(x−r); stress reference dependence and the S-shape (concave gains / convex losses).
Loss aversion & probability weighting
The two further departures of prospect theory: losses loom larger than equal gains (|V(−x)| > V(x), λ>1), and objective probabilities are replaced by a nonlinear weighting w(p).
- Loss aversion: the value curve is steeper for losses, |V(−x)| > V(x) for x>0 (λ≈2); losses hurt more than equal gains please.
- It explains the endowment effect, status-quo bias and risk aversion in gains.
- Probability weighting: p → w(p), nonlinear; small probabilities are over-weighted, medium/high ones under-weighted (why we buy both lottery tickets and insurance).
- Decision value PU(L)=Σ w(pᵢ)V(xᵢ) ≠ EU(L); limitation: prospect theory models valuation, not the decision process.
🎯 In the exam: State λ>1 (loss aversion) and the w(p) shape; note PU ≠ EU and that PT covers valuation, not process.
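The value function, the weighting function and the decision value PU(L) = Σ w(pᵢ)V(xᵢ) can be sketched together. The parameter values (α = β = 0.88, λ = 2.25, weighting curvature 0.61) are the commonly cited Tversky and Kahneman (1992) estimates, and the one-parameter form of w(p) is one of several in use. Both are assumptions here, not something these notes specify.

```python
ALPHA, BETA, LAMBDA = 0.88, 0.88, 2.25   # assumed curvature and loss-aversion values
GAMMA_W = 0.61                            # assumed curvature of w(p)

def value(x, r=0.0):
    """S-shaped value over gains/losses relative to the reference point r."""
    d = x - r
    return d ** ALPHA if d >= 0 else -LAMBDA * (-d) ** BETA

def weight(p):
    """Inverse-S probability weighting (one common one-parameter form)."""
    return p ** GAMMA_W / (p ** GAMMA_W + (1 - p) ** GAMMA_W) ** (1 / GAMMA_W)

def prospect_value(lottery, r=0.0):
    """PU(L) = sum of w(p_i) * V(x_i) over (probability, outcome) pairs."""
    return sum(weight(p) * value(x, r) for p, x in lottery)
```

With these numbers a €100 loss weighs more than twice as much as a €100 gain, a €1000 raise evaluated against an expected €2000 has negative value, and a 1% chance is weighted at roughly 5%.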
DSS Architecture & Design
Formal view of a DSS
The view of a DSS as a function D : 𝓘×𝓢×𝓤 → 𝓐 from information, world-state and user to a recommended action — its output is advice, leaving the final decision to the human.
- 𝓘 = available information/data · 𝓢 = state of the world/problem · 𝓤 = the user (skills, preferences, context) · 𝓐 = the action/recommendation space.
- Output ≠ decision: the system recommends; the human decides and is accountable.
- Including 𝓤 is exactly what makes it a support system rather than an autonomous agent.
- It generalises the decision rules of L1 into a system-level mapping.
🎯 In the exam: Write the signature and stress that 𝓤 (the user) is an input and the output is a recommendation.
4-part architecture
The canonical DSS structure — a data, a knowledge, a model and a user-interface subsystem joined by a workflow — built to separate representation from computation.
- Data subsystem: stores and serves the raw inputs/observations.
- Knowledge subsystem: domain rules, constraints and expert knowledge.
- Model subsystem: the reasoning/computation engine — where the L1–L5 paradigms live.
- User interface: how recommendations and their rationale are communicated — critical for trust and usability.
- Modularity payoff: you can swap the model without rebuilding the data layer or the UI.
🎯 In the exam: List the four components and the “separate representation from computation” principle.
Model paradigms
The families of model that can fill the model subsystem — knowledge-based, model-driven, probabilistic, machine-learning, sequential or hybrid — i.e. every earlier lecture reused as a building block.
- Knowledge-based: explicit rules / expert systems — transparent but brittle.
- Model-driven: optimisation and decision-theoretic models (L1).
- Probabilistic: Bayesian / decision networks (L2).
- Machine-learning: models fit from data.
- Sequential: MDP/RL planners (L4).
- Hybrid: combinations of the above.
- Trade-offs everywhere: accuracy vs transparency vs data needs vs robustness — no single best choice.
🎯 In the exam: Map each paradigm back to its lecture and note the accuracy/transparency trade-off.
Human factors & trust
The design constraints coming from the user — cognitive load, framing, and trust calibration: matching the user’s reliance to the system’s real reliability (avoiding over- and under-reliance).
- Trust calibration: ideally reliance tracks actual reliability — trust the system when it’s right, override it when it’s wrong.
- Over-reliance / automation bias: blindly following the system, including its errors.
- Under-reliance / algorithm aversion: ignoring a system that is actually better than you.
- Design levers: transparency, explanations, communicating uncertainty, and managing cognitive load and framing.
- This sets up Lecture 7, where trust and reliance become things you must measure.
🎯 In the exam: Define trust calibration and name the two failure modes (over- and under-reliance).
Why evaluation is hard
Assessing whether a DSS genuinely improves decisions means judging the whole human-plus-system — which is socio-technical and multi-dimensional — not just the model’s predictive accuracy.
- Why it’s hard: the thing being evaluated is socio-technical (a human+model team), both outcomes and process can be judged, and quality is inherently multi-dimensional.
- Predictive metrics (accuracy, etc.) are necessary but far from sufficient.
- You must also check calibration, cost/utility (net benefit), fairness, robustness to shift, explainability, usability and adoption.
- You must also evaluate the human-DSS interaction itself: the team can be worse than either part alone.
🎯 In the exam: Lead with “good DSS ≠ good predictor” and enumerate the extra evaluation dimensions.
Confusion-matrix metrics
Metrics built from the 2×2 table of true/false positives & negatives — precision=TP/(TP+FP), recall=TP/(TP+FN), specificity, F1 — where plain accuracy misleads when one class is rare.
- Accuracy = (TP+TN)/total — but “always predict the majority” scores high on imbalanced data (the “accuracy is lying” trap).
- Precision: of those flagged positive, how many really are (the cost of false alarms).
- Recall / sensitivity: of the real positives, how many you catch (the cost of misses).
- Specificity = TN/(TN+FP); F1 = harmonic mean of precision & recall; balanced accuracy averages recall across classes.
- Which to optimise depends on the costs of the two error types — a decision-theoretic choice.
Example: Rare disease (1% prevalence): predicting “healthy” for everyone is 99% accurate but has 0% recall, so it is useless.
🎯 In the exam: Define precision vs recall and explain when accuracy misleads.
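The rare-disease trap can be checked with a few lines of arithmetic. A minimal sketch, assuming 1000 patients at 1% prevalence and the convention that a ratio with a zero denominator is reported as 0 (libraries differ on this):

```python
def metrics(tp, fp, fn, tn):
    """Confusion-matrix metrics from the four cell counts."""
    def ratio(num, den):
        return num / den if den else 0.0   # assumed convention for 0/0
    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": ratio(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }

# 1000 patients, 10 sick; the classifier says "healthy" for everyone.
always_healthy = metrics(tp=0, fp=0, fn=10, tn=990)
```

Accuracy comes out at 0.99 while recall and F1 are 0 and balanced accuracy is 0.5, no better than a coin flip across the two classes.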
Calibration & ROC
Calibration = predicted probabilities match observed frequencies (a reliability diagram on the diagonal); ROC/AUC compares performance across all thresholds and net-benefit analysis weighs the cost of errors.
- Calibration vs discrimination: a model can rank cases well (high AUC) yet output badly-scaled probabilities.
- Reliability diagram: plot predicted vs observed frequency; on the diagonal = well-calibrated.
- ROC curve: true-positive vs false-positive rate across all thresholds; AUC summarises it threshold-free.
- Net benefit / decision-curve analysis turns a threshold into expected utility by weighing false-positive vs false-negative costs — the utility thread from L1.
🎯 In the exam: Distinguish calibration (probabilities honest) from discrimination (ranking / AUC).
Fairness & robustness
The concerns beyond accuracy — distribution shift (covariate/label/concept), out-of-distribution robustness, fairness criteria (often mutually incompatible) and explanation quality.
- Distribution shift: covariate shift (inputs change), label shift (base rates change), concept shift (the input→output relation changes).
- Robustness / OOD: behaviour on inputs unlike the training data, and whether it fails gracefully.
- Fairness criteria — demographic parity, equalized odds, calibration-within-groups — are provably incompatible in general; you cannot satisfy all at once.
- Explainability: explanations must be faithful and actually understood — perceived understanding ≠ real understanding.
🎯 In the exam: Name the three shift types and note that fairness criteria are mutually incompatible.
Human–DSS interaction
Evaluation of the human-DSS team — which can perform worse than either part alone — distinguishing trust from reliance and reliance from agreement.
- Complementarity is not automatic — a strong model + a human can underperform either one solo.
- Trust (an attitude) ≠ reliance (a behaviour) ≠ agreement (did they follow it this time).
- Appropriate reliance = following when the system is right and overriding when it is wrong (the calibrated trust from L6).
- Methods: RCTs, A/B tests, simulation studies, longitudinal field studies; plus usability (e.g. SUS) and adoption.
🎯 In the exam: Stress “the team can be worse than either alone” and the trust / reliance / agreement distinctions.
🧵 Threads: the ideas that recur
💰 Utility
The value every decision tries to maximise.
🎲 Uncertainty
We reason over probability distributions, never certainties.
👥 Other agents
Others’ goals interact with yours.
🧠 Humans aren’t rational
The fact that flips theory into real systems.
⚖️ Optimal ≠ effective
An optimal answer a human won’t accept is useless.
🔁 The big shape
The arc of the whole course, in one breath.
🧮 Decision rules: Lecture 1 quick reference
| Rule | What it does | Attitude |
|---|---|---|
| Maximin | Maximise the worst-case outcome | Pessimistic / cautious |
| Maximax | Maximise the best-case outcome | Optimistic |
| Hurwicz | Blend worst & best with optimism coefficient α | Tunable |
| Minimax-regret | Minimise the largest regret vs the best you could’ve done | Avoids “if only…” |
| Laplace | Assume all states equally likely → maximise expected utility | Indifferent |
| Expected utility | Probabilities known → maximise Σ P·U | Rational under risk |