Why it's needed
Every standard optimisation model answers the same question: given a set of constraints, what's the best decision you can make? That question carries a quiet assumption — the constraints are fixed, external, indifferent to your choice. But what if one of your constraints is itself a rational agent with their own objective?
A regulator sets prices. A market player responds by adjusting their own behaviour to maximise their own profit. A network operator chooses which links to defend. An attacker re-routes around the defended ones. In every case the second agent is not a scenario drawn from a distribution — they are an optimiser reacting to the first agent's decision. Standard single-level optimisation cannot model this. You need a framework where one optimisation problem is nested inside another.
- Toll design: a road authority sets prices; commuters choose routes to minimise their own travel cost (including the toll).
- Algorithm configuration: an outer loop sets solver parameters; the inner problem is the solver running to completion on an instance with those parameters.
- Supply chain contracting: a manufacturer sets wholesale prices; retailers maximise their own margins by choosing stocking quantities.
- Network interdiction: a defender removes links; an attacker re-routes optimally around the gaps.
The hard problem: the leader cannot enumerate all possible follower responses and pick the best one — the follower's optimal response depends on what the leader does, and the leader's feasible set depends on what the follower would do. The two optimisation problems are coupled.
The two naïve approaches both fail:
- Optimise jointly over (x, y), treating the follower's variable as one the leader controls. The resulting y is whatever suits the leader, and a self-interested follower will not play it.
- Fix a guess of the follower's behaviour and optimise x against that guess. The guess goes stale the moment x changes, because the follower re-optimises in response.
Neither approach captures the real structure: the leader and follower each optimise their own objective, the follower reacts after observing the leader's choice, and the leader must commit knowing in advance how the follower will react.
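The first failure mode can be seen in a toy sketch (the payoff tables below are made up for illustration; each agent minimises their own cost):

```python
# Hypothetical payoffs: leader picks x in {0, 1}, follower picks y in {0, 1}.
# F[(x, y)] is the leader's cost, f[(x, y)] the follower's cost; both minimise.
F = {(0, 0): 1, (0, 1): 3, (1, 0): 4, (1, 1): 0}
f = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 9}

# Naive joint optimisation: pretend the leader controls both variables.
joint = min(F, key=F.get)                       # (1, 1), leader cost 0

# But given x, the follower plays their own argmin, not the leader's choice.
def best_response(x):
    return min([0, 1], key=lambda y: f[(x, y)])

# At x = 1 the follower plays y = 0 (f = 2 beats f = 9), so the "joint
# optimum" never materialises; the leader actually pays F[(1, 0)] = 4.
realised = F[(joint[0], best_response(joint[0]))]

# The bilevel answer anticipates the reaction: x = 0 induces y = 1, cost 3.
bilevel_x = min([0, 1], key=lambda x: F[(x, best_response(x))])
print(joint, realised, bilevel_x)               # → (1, 1) 4 0
```

The joint "optimum" promises the leader a cost of 0 but delivers 4; anticipating the follower's reaction yields the genuinely achievable cost of 3.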
What it does
Bilevel optimisation models the interaction as a two-stage game where the leader moves first and the follower responds rationally. The framework has four pieces:
- The leader's decision (x) — the variable the leader controls and commits to first: toll levels, defender allocations, wholesale prices, solver parameters.
- The follower's decision (y) — the variable the follower chooses after observing x: route choices, attack plans, stocking quantities, solver behaviour.
- The follower's optimisation problem — given x, the follower solves their own problem and produces their best response y*(x). This is the nested constraint.
- The leader's optimisation problem — the leader maximises (or minimises) their own objective F(x, y*(x)), but can only choose among values of x for which the follower's response y*(x) actually exists and is optimal for the follower.
The leader does not pick any (x, y) pair they like — they pick x and then the follower pins down y. The leader's feasible set is therefore not a geometric region they can explore freely; it is the set of rational-response pairs, one per value of x.
Core idea
In the notation below, x is the leader's decision vector, y is the follower's decision vector, F is the leader's objective function, f is the follower's objective function, and C is the set of joint feasibility constraints that both must satisfy.
minimise over x:  F(x, y)                  ← leader minimises their own cost
subject to:       y ∈ argmin_y' f(x, y')   ← follower optimises given x
                  (x, y) ∈ C               ← joint feasibility
- x — the leader's committed decision (tolls, defender allocations, wholesale prices)
- y — the follower's response (route choices, attack plan, stocking quantities)
- F(x, y) — what the leader is trying to minimise or maximise (revenue, network flow, profit)
- f(x, y) — what the follower is trying to minimise (travel cost, rerouting cost, margin)
- argmin f — the set of follower decisions that are optimal for the follower given x
- C — hard constraints that both leader and follower must satisfy jointly (capacity, budget)
In the vocabulary above: the inner constraint "y ∈ argmin_y' f(x, y')" is not a modelling choice — it is the mathematical statement that the follower will not be coerced into a suboptimal decision. The leader must find an x such that, when the follower responds rationally, the outcome F(x, y*(x)) is as good as possible for the leader.
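A minimal numeric sketch of this structure, using made-up quadratic objectives (the functions, the closed-form response, and the search grid are illustrative assumptions, not from the example domains above):

```python
# Hypothetical objectives:
#   follower:  f(x, y) = y**2 - x*y      -> closed form y*(x) = x/2
#   leader:    F(x, y) = (x-1)**2 + (y-1)**2
# The leader would love (x, y) = (1, 1), but the follower pins y to x/2,
# so the leader can only search along the rational-response curve.

def follower_best_response(x):
    # argmin over y of (y**2 - x*y); setting the derivative 2y - x = 0 gives y = x/2
    return x / 2

def leader_objective(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2

# Grid search over the leader's decision; the follower's reaction is
# substituted in, never chosen freely by the leader.
xs = [i / 1000 for i in range(-2000, 2001)]
best_x = min(xs, key=lambda x: leader_objective(x, follower_best_response(x)))
print(best_x, follower_best_response(best_x))   # → 1.2 0.6
```

Note the leader's ideal point (1, 1) is unreachable: every feasible pair lies on the curve y = x/2, so the best the leader can do is x = 1.2, which induces y = 0.6.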
Concrete example
A road authority controls two highway routes. Their objective is to maximise toll revenue. Commuters choose whichever route minimises their own travel cost, which includes the toll and their private driving time. The authority must commit to toll levels before observing which routes commuters take.
Step by step
The setup. Route A has low driving time (30 minutes) but the authority can set a toll. Route B is the free surface street with high driving time (50 minutes) and no toll. Each commuter will choose whichever minimises (driving time + toll cost).
What goes wrong without bilevel structure. The authority sets toll = £8, expecting all commuters to pay it. But at £8, Route A costs commuters (30 min equivalent + £8) while Route B costs (50 min equivalent + £0). If commuters value time at £0.30/min, Route B costs only £15, cheaper than £8 + £9 = £17. All commuters divert. Toll revenue: £0.
What the bilevel model produces. The authority models the commuter routing decision as a nested optimisation problem: for any toll t it sets, each commuter minimises their own cost. The authority's outer problem searches over toll values that, once commuters respond optimally, maximise revenue. The answer: set toll = £5.99, just below the £6 indifference threshold (the toll at which £9 of driving time plus the toll equals Route B's £15). At this toll Route A costs commuters £14.99 against Route B's £15, so every commuter takes Route A and pays the toll, earning the authority £5.99 per commuter. This is the bilevel-optimal solution — the best the authority can achieve knowing the follower will respond rationally.
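The outer search can be sketched numerically. The commuter count of 100 is a hypothetical addition; the other numbers follow the example:

```python
# Assumptions from the example: time valued at £0.30/min, Route A takes
# 30 min plus a toll t, Route B takes 50 min and is free. N is hypothetical.
N = 100
VALUE_OF_TIME = 0.30

def demand_on_A(t):
    cost_A = 30 * VALUE_OF_TIME + t     # £9 of time plus the toll
    cost_B = 50 * VALUE_OF_TIME         # £15 of time, no toll
    # Strictly cheaper wins; on an exact tie we assume the pessimistic
    # tie-break (commuters avoid the tolled road).
    return N if cost_A < cost_B else 0

def revenue(t):
    return t * demand_on_A(t)

print(revenue(8.0))                     # → 0.0: the naive £8 toll diverts everyone

# Leader's outer problem: search tolls in 1p steps up to £8,
# with the follower's response nested inside the objective.
tolls = [p / 100 for p in range(0, 801)]
best = max(tolls, key=revenue)
print(best)                             # → 5.99, just under the £6 threshold
```

The inner `demand_on_A` is the follower's argmin; the outer `max` over tolls is the leader's problem evaluated only at rational-response pairs.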
The symbols, for the record
- x = t — the toll (leader's decision variable, one-dimensional in this example)
- y_i — each commuter i's route choice (follower's decision)
- F(x, y) = t × (number of commuters on Route A) — the authority's revenue
- f(x, y_i) = driving time on chosen route + toll if Route A — commuter i's cost
- The nested constraint: each y_i minimises f(x, y_i) given the toll x.
Common misreads
The most common confusion is with two-stage stochastic programming, which also has an inner problem. In two-stage programming, a single decision-maker splits their decisions across time: a first-stage decision before uncertainty is revealed, and a second-stage recourse decision after. The inner problem represents adapting to a scenario drawn from a probability distribution — the "follower" is not an agent, it's a passively revealed outcome.
In bilevel, the follower is a rational optimiser with their own objective function. A commuter choosing the cheapest route is not a scenario — they will actively change their behaviour in response to the leader's toll level. The bilevel constraint "y ∈ argmin_y' f(x, y')" asserts that the follower is optimising, not that a random draw has occurred. If you replace the follower with a probability distribution over outcomes, you have left bilevel and entered stochastic programming.
The second misread confuses bilevel with min-max robust optimisation. In min-max, a single decision-maker chooses an action to perform as well as possible against the worst-case scenario. The inner "max" problem finds the adversarial scenario — but that adversary is assumed to do whatever is worst for the decision-maker.
In bilevel, the follower minimises their own objective, which is not necessarily the leader's worst case. A commuter minimising travel cost may or may not produce the outcome worst for the toll authority, depending on the network structure. Bilevel does not assume adversarial intent; it assumes rational self-interest. The two formulations coincide only in zero-sum games — games where the follower's gain is exactly the leader's loss. In the general case, treating a rational follower as a worst-case adversary will produce overly conservative (and often infeasible) leader decisions.
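The difference is visible on the toll example. A sketch under the same hypothetical numbers (100 commuters, £9 of time on Route A, £15 on Route B): a self-interested commuter pays the toll whenever Route A is still cheaper for them, while a worst-case adversary avoids the tolled road whenever that hurts the authority:

```python
N = 100  # hypothetical commuter count

def self_interested_demand(t):
    # Bilevel follower: minimises their own travel cost (£9 + toll vs £15).
    return N if 9.0 + t < 15.0 else 0

def adversarial_demand(t):
    # Min-max "follower": picks whichever demand minimises the leader's revenue.
    return min(N, 0, key=lambda d: t * d)

tolls = [p / 100 for p in range(0, 801)]
bilevel_best = max(tolls, key=lambda t: t * self_interested_demand(t))
minmax_best = max(tolls, key=lambda t: t * adversarial_demand(t))
print(bilevel_best, minmax_best)   # → 5.99 0.0
```

Under the min-max assumption every positive toll earns nothing, so the authority concludes no toll is worth charging; the bilevel model, crediting commuters with self-interest rather than spite, finds the £5.99 toll.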
A subtler misread: the leader doesn't get to choose y directly, even if they know the follower's constraints. The leader's feasible set only contains (x, y) pairs where y is the follower's optimal response to x — not just any feasible response. If the follower has multiple optimal responses (ties), the problem statement must specify which one the leader assumes (usually either optimistic — the follower picks the tie-break that helps the leader — or pessimistic — the worst tie-break for the leader). This tie-breaking assumption matters and must be stated explicitly.
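A sketch of why the tie-break must be stated, again with hypothetical numbers: at the indifference toll t = £6 both routes cost a commuter exactly £15, so the follower's argmin contains both routes and the model has to say which one the leader assumes:

```python
N = 100  # hypothetical commuter count

def demand_on_A(t, tie_break):
    cost_A = 9.0 + t   # £9 of driving time (30 min at £0.30/min) plus toll
    cost_B = 15.0      # £15 of driving time (50 min), no toll
    if cost_A < cost_B:
        return N       # strict preference: tie-break irrelevant
    if cost_A > cost_B:
        return 0
    # Exact tie: both routes are optimal for the follower, so the
    # modeller's tie-break assumption decides the outcome.
    return N if tie_break == "optimistic" else 0

print(6.0 * demand_on_A(6.0, "optimistic"))    # → 600.0: tie broken in the leader's favour
print(6.0 * demand_on_A(6.0, "pessimistic"))   # → 0.0: worst tie-break for the leader
```

The same toll yields £600 or £0 depending purely on the stated assumption, which is why the £5.99 solution sits strictly below the threshold: it does not rely on a favourable tie-break.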
Where it shows up
The diagnostic question: does the inner decision-maker have their own objective function, or are they a passive scenario drawn from a distribution? If the former, the problem is bilevel.