We have the rule. Now run it.
Lecture 1 gave us decision rules. None of them told us how to implement the thing on a computer. That's this lecture.
So far we saw the principle of decision under ignorance and decision under risk, and different decision rules. However, we have not discussed how these rules can be implemented — and this will be the main subject.
The professor focuses on decision under risk, for a good reason:
Under ignorance you just have a table of actions × states. Any rule (maximin, regret…) only needs you to walk the cells — that's roughly quadratic, nothing clever. Under risk, expected-utility maximisation needs the probability of every state, and that's where the real computational drama lives.
A quick word on complexity
The professor was explicit: this complexity section is NOT examinable. He includes it only so the words "P", "NP", "#P-hard" don't feel alien when they appear later. Read it once for the vocabulary, then move on. Everything from §3 onward is examinable.
A decision rule is an algorithm (maximin is an algorithm; "compute every EU and take the max" is an algorithm). So we care how many resources — time and space — it needs, in the worst case, as the input grows.
Complexity theory studies the resources a problem needs, independent of any specific algorithm. The cast you'll meet:
P
Problems solvable in polynomial time — intuitively, "solvable efficiently." Sorting (n²), matrix multiply (n³).
NP
Problems whose proposed solution can be checked efficiently by a verifier given a certificate. "Is this formula true? here's a proof — verify it."
P = NP?
Is finding a solution as hard as checking one? Almost certainly no — but unproven. The famous ~$1M open question.
#P ("sharp-P")
Not yes/no but counting: how many certificates (solutions) does an NP problem have? This is the one that bites us when computing expected utility (§3).
EXP · PSPACE
Solvable in exponential time (EXP), or using polynomial memory (PSPACE). Bigger, slower neighbourhoods.
C-hard / C-complete
C-complete = the hardest problems in C. C-hard = at least as hard as anything in C. So "#P-hard" = at least as hard as the hardest counting problem.
Finding a proof for a theorem was quite difficult, but once the professor showed you a proof, it is easier to understand. Is finding a solution as hard as checking one? Intuitively no — checking is easier — but we do not know this for sure.
When states explode
Expected utility needs the probability of every state. The trouble starts when a state is built from many smaller variables.
A composite state is a state defined by a combination of atomic variables. With n binary variables there are 2ⁿ states — and you must specify 2ⁿ − 1 probabilities (the last is fixed because they sum to 1).
Imagine a decision support system for a satellite. It's powered by a battery, charged by a solar panel. There can be a battery failure (B) or a solar-panel failure (S); either causes an electrical-system failure (E); which can cause a trajectory deviation (D) or a communication loss (C).
Five binary variables — B, S, E, D, C — give 2⁵ = 32 states, so 31 probabilities to pin down. The obvious move is to store the whole decision matrix as one big table. Fine for 32… but:
Its size grows as O(2ⁿ). That breaks down when the device is memory-constrained (a satellite isn't a data centre) or when states are built from hundreds or thousands of variables. And actions can be composite too, making it worse.
Play with the explosion yourself — drag n upward and watch the naive table run away:
Full table vs. a Bayesian Network (§4). How many probability values must you store?
For the satellite the real BN needs just 10 values (the n·2ᵈ figure is a worst-case bound). The gap only widens with n.
Bayesian Networks
Don't store one giant joint distribution — break it into small local pieces along a graph.
A Bayesian Network has two parts: (1) a directed acyclic graph (DAG) whose nodes are the variables; (2) for each node, a conditional probability table giving the distribution of that node for every combination of its parents' values.
Here's the satellite as a Bayesian Network: B and S cause E; E causes D and C. Tap a node to see its local table and how few numbers it needs.
Instead of 31 numbers, the network needs 1 + 1 + 4 + 2 + 2 = 10. In general, with n nodes and max in-degree d, storage is O(n · 2ᵈ) — usually far below O(2ⁿ).
Getting the joint back
To actually compute EU you still need joint probabilities. Rebuild them by multiplying the local tables, following a topological order (roots → leaves):
A bonus: BNs handle partial observability — if you only observe some variables, you marginalise (sum) over the unobserved ones:
Saving space does not save time. To compute EU you must unroll the local tables back into the full joint, which is exponential. Exact inference — computing P(H|E), or the EU — is #P-hard: it's as hard as counting the satisfying assignments of a SAT formula. Not believed solvable efficiently.
Approximate inference — e.g. Markov-Chain Monte Carlo (MCMC). Simulate instead of computing exactly. (Detailed in the Unsupervised Learning course.)
Decision Networks
A Bayesian Network only models states. A decision problem also has actions and utilities — so we extend it.
An extension of a Bayesian Network with three kinds of node and three kinds of edge. It gives a compact representation of the whole decision problem — states and actions.
○ Chance node
A random variable — exactly like a Bayesian-Network node. Makes up the states.
▢ Action node
Represents an action (a decision) the agent can take.
◇ Utility node
Represents an outcome / utility. Can only be a leaf (nothing depends on it).
Conditional edge
Ends in a chance node. Same meaning as in a BN — it factorises the joint distribution.
Information edge
Ends in an action node. Means the action is taken knowing the value of the parent.
Functional edge
Ends in a utility node. Conditional + functional edges together are called causal edges.
The professor's example: three diagnostic tests O₁, O₂, O₃ hint at a Disease; we decide whether to Treat?; the utility depends on the disease and our treatment choice.
Solid arrows from Disease to the tests are conditional; the dashed arrows into Treat? are information edges (we decide knowing the test results); arrows into U are functional.
A DN compactly decomposes states and actions (e.g. in chess the "action" is a whole game — decompose it into per-move actions). But, exactly like BNs, running EU maximisation needs unrolling the network → it's again #P-hard. Efficient practical algorithms exist; the worst case stays hard.
You do not need to implement all this about decision networks by yourself — there are libraries in Python that do it. For example the pyAgrum library, which is quite well supported and efficient.
The Exam Lab
Lecture 2 in his define → apply → motivate format. Note carefully what's in and what's out.
Out: the computational-complexity primer (P, NP, #P, …) — context only, he won't ask it. In: composite states & the 2ⁿ blow-up, Bayesian Networks (definition, factorisation, storage savings, why inference is still hard, marginalisation), and Decision Networks (the 3 node types & 3 edge types). Same skeleton as always: ① define · ② apply · ③ motivate, and no theorem proofs.
Define a Bayesian Network. For the satellite (B, S → E → D, C, all binary), say how many probability values it needs versus the full table, write the joint factorisation, and motivate why it is more compact.
P(B,S,E,D,C) = P(B)·P(S)·P(E|B,S)·P(D|E)·P(C|E).O(n·2ᵈ) (here d=2) instead of O(2ⁿ) — 10 vs 31, and the gap grows fast with n.A Bayesian Network represents states far more compactly than a full table. Does this efficiency carry over to computing expected utility? State the complexity of exact inference and motivate.
P(H|E), or the EU) is #P-hard — equivalent to counting the satisfying assignments of a SAT instance, a problem we don't believe is solvable efficiently.Define a Decision Network (influence diagram): its three node types and three edge types. How does it extend a Bayesian Network, and what does it let us decompose?
What is a composite state, and why does it make expected-utility maximisation hard? Use the satellite example and motivate with the storage cost.
O(2ⁿ), which is infeasible on memory-constrained devices or with hundreds of variables. That blow-up is exactly why we move to compact representations (Bayesian / Decision Networks) — though, as noted, they fix the space problem, not the time one.
• Why does decision under risk (not ignorance) raise the interesting computational question?
• What two parts make up a Bayesian Network? Write the joint factorisation.
• Satellite: 31 vs 10 — where does each number come from?
• Why is exact inference #P-hard even with a compact BN?
• Name the 3 node types and 3 edge types of a Decision Network.
• What does "marginalise" mean, and when do you need it?