Lecture 2 · Representing Decisions: Bayesian & Decision Networks

1from theory to machine

We have the rule. Now run it.

Lecture 1 gave us decision rules. None of them told us how to implement the thing on a computer. That's this lecture.

🎙 the transition (start of lecture 2)

So far we saw the principle of decision under ignorance and decision under risk, and different decision rules. However, we have not discussed how these rules can be implemented — and this will be the main subject.

The professor focuses on decision under risk, for a good reason:

↯Why ignorance is boring (computationally)

Under ignorance you just have a table of actions × states. Any rule (maximin, regret…) only needs you to walk the cells — that's roughly quadratic, nothing clever. Under risk, expected-utility maximisation needs the probability of every state, and that's where the real computational drama lives.

2vocabulary detour

A quick word on complexity

✎Not in the exam — but worth a skim

The professor was explicit: this complexity section is NOT examinable. He includes it only so the words "P", "NP", "#P-hard" don't feel alien when they appear later. Read it once for the vocabulary, then move on. Everything from §3 onward is examinable.

A decision rule is an algorithm (maximin is an algorithm; "compute every EU and take the max" is an algorithm). So we care how many resources — time and space — it needs, in the worst case, as the input grows.

Complexity theory studies the resources a problem needs, independent of any specific algorithm. The cast you'll meet:

Class · tap+

P

Problems solvable in polynomial time — intuitively, "solvable efficiently." Sorting (n²), matrix multiply (n³).

Class · tap+

NP

Problems whose proposed solution can be checked efficiently by a verifier given a certificate. "Is this formula true? here's a proof — verify it."

Class · tap+

P = NP?

Is finding a solution as hard as checking one? Almost certainly no — but unproven. The famous ~$1M open question.

Class · tap+

#P ("sharp-P")

Not yes/no but counting: how many certificates (solutions) does an NP problem have? This is the one that bites us when computing expected utility (§3).

Class · tap+

EXP · PSPACE

Solvable in exponential time (EXP), or using polynomial memory (PSPACE). Bigger, slower neighbourhoods.

Class · tap+

C-hard / C-complete

C-complete = the hardest problems in C. C-hard = at least as hard as anything in C. So "#P-hard" = at least as hard as the hardest counting problem.

🎙 the intuition for P vs NP

Finding a proof for a theorem was quite difficult, but once the professor showed you a proof, it is easier to understand. Is finding a solution as hard as checking one? Intuitively no — checking is easier — but we do not know this for sure.

3the real problem · examinable

When states explode

Expected utility needs the probability of every state. The trouble starts when a state is built from many smaller variables.

★Composite state

A composite state is a state defined by a combination of atomic variables. With n binary variables there are 2ⁿ states — and you must specify 2ⁿ − 1 probabilities (the last is fixed because they sum to 1).

🎙 the satellite example

Imagine a decision support system for a satellite. It's powered by a battery, charged by a solar panel. There can be a battery failure (B) or a solar-panel failure (S); either causes an electrical-system failure (E); which can cause a trajectory deviation (D) or a communication loss (C).

Five binary variables — B, S, E, D, C — give 2⁵ = 32 states, so 31 probabilities to pin down. The obvious move is to store the whole decision matrix as one big table. Fine for 32… but:

!The table doesn't scale

Its size grows as O(2ⁿ). That breaks down when the device is memory-constrained (a satellite isn't a data centre) or when states are built from hundreds or thousands of variables. And actions can be composite too, making it worse.

Play with the explosion yourself — drag n upward and watch the naive table run away:

📦 Storage Explorer

Full table vs. a Bayesian Network (§4). How many probability values must you store?

variables 5 (atomic components of a state) max in-degree d 2 (most parents any node has)

Full table 2ⁿ − 131

Bayesian Network ≤ n · 2ᵈ20

For the satellite the real BN needs just 10 values (the n·2ᵈ figure is a worst-case bound). The gap only widens with n.

◇ Check yourself

A state is built from 10 independent binary variables. How many probabilities does the full joint distribution require?

n binary variables give 2ⁿ states, so 2ⁿ − 1 free probabilities (the last is fixed since they sum to 1). For n=10 that's 1023 — already showing why the table doesn't scale.

4compact storage · examinable

Bayesian Networks

Don't store one giant joint distribution — break it into small local pieces along a graph.

★Definition

A Bayesian Network has two parts: (1) a directed acyclic graph (DAG) whose nodes are the variables; (2) for each node, a conditional probability table giving the distribution of that node for every combination of its parents' values.

Here's the satellite as a Bayesian Network: B and S cause E; E causes D and C. Tap a node to see its local table and how few numbers it needs.

B Battery failS Solar-panel failE Electrical failD Trajectory dev.C Comms loss

✓The win

Instead of 31 numbers, the network needs 1 + 1 + 4 + 2 + 2 = 10. In general, with n nodes and max in-degree d, storage is O(n · 2ᵈ) — usually far below O(2ⁿ).

Getting the joint back

To actually compute EU you still need joint probabilities. Rebuild them by multiplying the local tables, following a topological order (roots → leaves):

P(X₁,…,Xₙ) = ∏_i P( Xᵢ | Pa(Xᵢ) )each variable, conditioned on its parents

A bonus: BNs handle partial observability — if you only observe some variables, you marginalise (sum) over the unobserved ones:

P(X₁,…,X_k) = Σ_{x_k+1} … Σ_xₙ ∏_i P( Xᵢ | Pa(Xᵢ) )sum out everything you didn't observe

!The catch — and the big exam point

Saving space does not save time. To compute EU you must unroll the local tables back into the full joint, which is exponential. Exact inference — computing P(H|E), or the EU — is #P-hard: it's as hard as counting the satisfying assignments of a SAT formula. Not believed solvable efficiently.

↯So what do we do in practice?

Approximate inference — e.g. Markov-Chain Monte Carlo (MCMC). Simulate instead of computing exactly. (Detailed in the Unsupervised Learning course.)

◇ Check yourself

A Bayesian Network slashes storage from 2ⁿ to about n·2ᵈ. Does it also make computing expected utility fast?

The compression is purely about storage. To compute EU you must unroll back to the exponential joint, and exact inference is #P-hard (≈ counting SAT solutions). Practice leans on approximate inference like MCMC.

5adding actions · examinable

Decision Networks

A Bayesian Network only models states. A decision problem also has actions and utilities — so we extend it.

★Decision Network (a.k.a. Influence Diagram)

An extension of a Bayesian Network with three kinds of node and three kinds of edge. It gives a compact representation of the whole decision problem — states and actions.

Node · tap+

○ Chance node

A random variable — exactly like a Bayesian-Network node. Makes up the states.

Node · tap+

▢ Action node

Represents an action (a decision) the agent can take.

Node · tap+

◇ Utility node

Represents an outcome / utility. Can only be a leaf (nothing depends on it).

Edge · tap+

Conditional edge

Ends in a chance node. Same meaning as in a BN — it factorises the joint distribution.

Edge · tap+

Information edge

Ends in an action node. Means the action is taken knowing the value of the parent.

Edge · tap+

Functional edge

Ends in a utility node. Conditional + functional edges together are called causal edges.

The professor's example: three diagnostic tests O₁, O₂, O₃ hint at a Disease; we decide whether to Treat?; the utility depends on the disease and our treatment choice.

chance node action node utility node causal edge information edge

Solid arrows from Disease to the tests are conditional; the dashed arrows into Treat? are information edges (we decide knowing the test results); arrows into U are functional.

✓What it buys you — and the familiar catch

A DN compactly decomposes states and actions (e.g. in chess the "action" is a whole game — decompose it into per-move actions). But, exactly like BNs, running EU maximisation needs unrolling the network → it's again #P-hard. Efficient practical algorithms exist; the worst case stays hard.

🎙 you don't build this by hand

You do not need to implement all this about decision networks by yourself — there are libraries in Python that do it. For example the pyAgrum library, which is quite well supported and efficient.

◇ Check yourself

In a Decision Network, an edge that ends in an action node is called…

Edges into an action node are information edges: they say the decision is made knowing the parent's value. Conditional edges end in chance nodes; functional edges end in utility nodes.

★study like the exam

The Exam Lab

Lecture 2 in his define → apply → motivate format. Note carefully what's in and what's out.

📋 What's examinable here

Out: the computational-complexity primer (P, NP, #P, …) — context only, he won't ask it. In: composite states & the 2ⁿ blow-up, Bayesian Networks (definition, factorisation, storage savings, why inference is still hard, marginalisation), and Decision Networks (the 3 node types & 3 edge types). Same skeleton as always: ① define · ② apply · ③ motivate, and no theorem proofs.

Question · Bayesian Networks

Define a Bayesian Network. For the satellite (B, S → E → D, C, all binary), say how many probability values it needs versus the full table, write the joint factorisation, and motivate why it is more compact.

① DEFINEA Bayesian Network is a pair: a directed acyclic graph whose nodes are the variables, and, for each node, a conditional probability table giving its distribution for every combination of its parents' values. It factorises a joint distribution into local conditional ones.

② APPLYFull table: 2⁵ = 32 states → 31 values. The network needs one value per parent-combination per node:

B: 1 · S: 1 · E|B,S: 4 · D|E: 2 · C|E: 2 → total = 10

Joint: P(B,S,E,D,C) = P(B)·P(S)·P(E|B,S)·P(D|E)·P(C|E).

③ MOTIVATEIt is compact because it exploits conditional independence: each variable depends only on its parents, so storage is O(n·2ᵈ) (here d=2) instead of O(2ⁿ) — 10 vs 31, and the gap grows fast with n.

Question · The catch

A Bayesian Network represents states far more compactly than a full table. Does this efficiency carry over to computing expected utility? State the complexity of exact inference and motivate.

① CLAIMNo. The space saving does not translate into a time saving.

② MOTIVATETo compute expected utility we must unroll the local tables back into the full joint distribution, whose size is exponential in the number of variables. Exact inference (computing P(H|E), or the EU) is #P-hard — equivalent to counting the satisfying assignments of a SAT instance, a problem we don't believe is solvable efficiently.

③ PRACTICESo in practice we use approximate inference (e.g. Markov-Chain Monte Carlo). Libraries such as pyAgrum implement this and work well on real cases, though the worst case stays hard.

Question · Decision Networks

Define a Decision Network (influence diagram): its three node types and three edge types. How does it extend a Bayesian Network, and what does it let us decompose?

① DEFINEA Decision Network extends a Bayesian Network to whole decision problems. Nodes: chance (random variables, as in a BN), action (decisions), utility (outcomes; leaves only). Edges: conditional (into a chance node — factorises the distribution), information (into an action node — the action is taken knowing the parent's value), functional (into a utility node). Conditional + functional = causal edges.

② EXTENDA BN models only states; a DN adds actions and utilities, so it represents the entire decision problem and lets us decompose both states and actions (e.g. a whole chess game decomposed into per-move actions).

③ MOTIVATELike BNs it saves space, but EU maximisation still needs unrolling → #P-hard in the worst case; practical solvers (pyAgrum) exploit the structure.

Question · Composite states

What is a composite state, and why does it make expected-utility maximisation hard? Use the satellite example and motivate with the storage cost.

① DEFINEA composite state is a state defined by a combination of atomic variables. With n binary variables there are 2ⁿ states, needing 2ⁿ − 1 probabilities.

② APPLYThe satellite's state is (B, S, E, D, C) — 5 binary variables → 2⁵ = 32 states → 31 probabilities. EU maximisation must consider the probability of every state.

③ MOTIVATEStoring/scanning the full table is O(2ⁿ), which is infeasible on memory-constrained devices or with hundreds of variables. That blow-up is exactly why we move to compact representations (Bayesian / Decision Networks) — though, as noted, they fix the space problem, not the time one.

🗣 Say these out loud (cover the page)

• Why does decision under risk (not ignorance) raise the interesting computational question?
• What two parts make up a Bayesian Network? Write the joint factorisation.
• Satellite: 31 vs 10 — where does each number come from?
• Why is exact inference #P-hard even with a compact BN?
• Name the 3 node types and 3 edge types of a Decision Network.
• What does "marginalise" mean, and when do you need it?

Making it computable

🎯 What this lecture is really about

We have the rule. Now run it.

A quick word on complexity

P

NP

P = NP?

#P ("sharp-P")

EXP · PSPACE

C-hard / C-complete

When states explode

Bayesian Networks

Getting the joint back

Decision Networks

○ Chance node

▢ Action node

◇ Utility node

Conditional edge

Information edge

Functional edge

The Exam Lab