Why it's needed
Every standard optimisation model answers the same question: given a set of constraints, what's the best decision you can make? That question carries a quiet assumption — the constraints are fixed, external, indifferent to your choice. But what if one of your constraints is itself a rational agent with their own objective?
A regulator sets prices. A market player responds by adjusting their own behaviour to maximise their own profit. A network operator chooses which links to defend. An attacker re-routes around the defended ones. In every case the second agent is not a scenario drawn from a distribution — they are an optimiser reacting to the first agent's decision. Standard single-level optimisation cannot model this. You need a framework where one optimisation problem is nested inside another.
- Toll design: a road authority sets prices; commuters choose routes to minimise their own travel cost (including the toll).
- Algorithm configuration: an outer loop sets solver parameters; the inner problem is the solver running to completion on an instance with those parameters.
- Supply chain contracting: a manufacturer sets wholesale prices; retailers maximise their own margins by choosing stocking quantities.
- Network interdiction: a defender removes links; an attacker re-routes optimally around the gaps.
The hard problem: the leader cannot enumerate all possible follower responses and pick the best one — the follower's optimal response depends on what the leader does, and the leader's feasible set depends on what the follower would do. The two optimisation problems are coupled.
The two naïve approaches both fail:
- Optimise jointly over (x, y), treating the follower's variable as one the leader controls. The resulting y is whatever suits the leader, and a self-interested follower will not play it.
- Fix a guess of the follower's behaviour and optimise x against that guess. The guess goes stale the moment x changes, because the follower re-optimises in response.
Neither approach captures the real structure: the leader and follower each optimise their own objective, the follower reacts after observing the leader's choice, and the leader must commit knowing in advance how the follower will react.
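The first failure mode can be seen in a toy sketch (the payoff tables below are made up for illustration; each agent minimises their own cost):

```python
# Hypothetical payoffs: leader picks x in {0, 1}, follower picks y in {0, 1}.
# F[(x, y)] is the leader's cost, f[(x, y)] the follower's cost; both minimise.
F = {(0, 0): 1, (0, 1): 3, (1, 0): 4, (1, 1): 0}
f = {(0, 0): 3, (0, 1): 1, (1, 0): 2, (1, 1): 9}

# Naive joint optimisation: pretend the leader controls both variables.
joint = min(F, key=F.get)                       # (1, 1), leader cost 0

# But given x, the follower plays their own argmin, not the leader's choice.
def best_response(x):
    return min([0, 1], key=lambda y: f[(x, y)])

# At x = 1 the follower plays y = 0 (f = 2 beats f = 9), so the "joint
# optimum" never materialises; the leader actually pays F[(1, 0)] = 4.
realised = F[(joint[0], best_response(joint[0]))]

# The bilevel answer anticipates the reaction: x = 0 induces y = 1, cost 3.
bilevel_x = min([0, 1], key=lambda x: F[(x, best_response(x))])
print(joint, realised, bilevel_x)               # → (1, 1) 4 0
```

The joint "optimum" promises the leader a cost of 0 but delivers 4; anticipating the follower's reaction yields the genuinely achievable cost of 3.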
What it does
Bilevel optimisation models the interaction as a two-stage game where the leader moves first and the follower responds rationally. The framework has four pieces:
- The leader's decision (x) — the variable the leader controls and commits to first: toll levels, defender allocations, wholesale prices, solver parameters.
- The follower's decision (y) — the variable the follower chooses after observing x: route choices, attack plans, stocking quantities, solver behaviour.
- The follower's optimisation problem — given x, the follower solves their own problem and produces their best response y*(x). This is the nested constraint.
- The leader's optimisation problem — the leader maximises (or minimises) their own objective F(x, y*(x)), but can only choose among values of x for which the follower's response y*(x) actually exists and is optimal for the follower.
The leader does not pick any (x, y) pair they like — they pick x and then the follower pins down y. The leader's feasible set is therefore not a geometric region they can explore freely; it is the set of rational-response pairs, one per value of x.
Core idea
In the notation below, x is the leader's decision vector, y is the follower's decision vector, F is the leader's objective function, f is the follower's objective function, and C is the set of joint feasibility constraints that both must satisfy.
minimise over x:  F(x, y)                  ← leader minimises their own cost
subject to:       y ∈ argmin_y' f(x, y')   ← follower optimises given x
                  (x, y) ∈ C               ← joint feasibility
- x — the leader's committed decision (tolls, defender allocations, wholesale prices)
- y — the follower's response (route choices, attack plan, stocking quantities)
- F(x, y) — what the leader is trying to minimise or maximise (revenue, network flow, profit)
- f(x, y) — what the follower is trying to minimise (travel cost, rerouting cost, margin)
- argmin f — the set of follower decisions that are optimal for the follower given x
- C — hard constraints that both leader and follower must satisfy jointly (capacity, budget)
In the vocabulary above: the inner constraint "y ∈ argmin_y' f(x, y')" is not a modelling choice — it is the mathematical statement that the follower will not be coerced into a suboptimal decision. The leader must find an x such that, when the follower responds rationally, the outcome F(x, y*(x)) is as good as possible for the leader.
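A minimal numeric sketch of this structure, using made-up quadratic objectives (the functions, the closed-form response, and the search grid are illustrative assumptions, not from the example domains above):

```python
# Hypothetical objectives:
#   follower:  f(x, y) = y**2 - x*y      -> closed form y*(x) = x/2
#   leader:    F(x, y) = (x-1)**2 + (y-1)**2
# The leader would love (x, y) = (1, 1), but the follower pins y to x/2,
# so the leader can only search along the rational-response curve.

def follower_best_response(x):
    # argmin over y of (y**2 - x*y); setting the derivative 2y - x = 0 gives y = x/2
    return x / 2

def leader_objective(x, y):
    return (x - 1) ** 2 + (y - 1) ** 2

# Grid search over the leader's decision; the follower's reaction is
# substituted in, never chosen freely by the leader.
xs = [i / 1000 for i in range(-2000, 2001)]
best_x = min(xs, key=lambda x: leader_objective(x, follower_best_response(x)))
print(best_x, follower_best_response(best_x))   # → 1.2 0.6
```

Note the leader's ideal point (1, 1) is unreachable: every feasible pair lies on the curve y = x/2, so the best the leader can do is x = 1.2, which induces y = 0.6.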
Concrete example
A road authority controls two highway routes. Their objective is to maximise toll revenue. Commuters choose whichever route minimises their own travel cost, which includes the toll and their private driving time. The authority must commit to toll levels before observing which routes commuters take.
Step by step
The setup. Route A has low driving time (30 minutes) but the authority can set a toll. Route B is the free surface street with high driving time (50 minutes) and no toll. Each commuter will choose whichever minimises (driving time + toll cost).
What goes wrong without bilevel structure. The authority sets toll = £8, expecting all commuters to pay it. But at £8, Route A costs commuters (30 min equivalent + £8) while Route B costs (50 min equivalent + £0). If commuters value time at £0.30/min, Route B costs only £15, cheaper than £8 + £9 = £17. All commuters divert. Toll revenue: £0.
What the bilevel model produces. The authority models the commuter routing decision as a nested optimisation problem: for any toll t it sets, each commuter minimises their own cost. The authority's outer problem searches over toll values that, once commuters respond optimally, maximise revenue. The answer: set toll = £5.99, just below the £6 indifference threshold (the toll at which £9 of driving time plus the toll equals Route B's £15). At this toll Route A costs commuters £14.99 against Route B's £15, so every commuter takes Route A and pays the toll, earning the authority £5.99 per commuter. This is the bilevel-optimal solution — the best the authority can achieve knowing the follower will respond rationally.
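The outer search can be sketched numerically. The commuter count of 100 is a hypothetical addition; the other numbers follow the example:

```python
# Assumptions from the example: time valued at £0.30/min, Route A takes
# 30 min plus a toll t, Route B takes 50 min and is free. N is hypothetical.
N = 100
VALUE_OF_TIME = 0.30

def demand_on_A(t):
    cost_A = 30 * VALUE_OF_TIME + t     # £9 of time plus the toll
    cost_B = 50 * VALUE_OF_TIME         # £15 of time, no toll
    # Strictly cheaper wins; on an exact tie we assume the pessimistic
    # tie-break (commuters avoid the tolled road).
    return N if cost_A < cost_B else 0

def revenue(t):
    return t * demand_on_A(t)

print(revenue(8.0))                     # → 0.0: the naive £8 toll diverts everyone

# Leader's outer problem: search tolls in 1p steps up to £8,
# with the follower's response nested inside the objective.
tolls = [p / 100 for p in range(0, 801)]
best = max(tolls, key=revenue)
print(best)                             # → 5.99, just under the £6 threshold
```

The inner `demand_on_A` is the follower's argmin; the outer `max` over tolls is the leader's problem evaluated only at rational-response pairs.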
The symbols, for the record
- x = t — the toll (leader's decision variable, one-dimensional in this example)
- y_i — each commuter i's route choice (follower's decision)
- F(x, y) = t × (number of commuters on Route A) — the authority's revenue
- f(x, y_i) = driving time on chosen route + toll if Route A — commuter i's cost
- The nested constraint: each y_i minimises f(x, y_i) given the toll x.
Common misreads
The most common confusion is with two-stage stochastic programming, which also has an inner problem. In two-stage programming, a single decision-maker splits their decisions across time: a first-stage decision before uncertainty is revealed, and a second-stage recourse decision after. The inner problem represents adapting to a scenario drawn from a probability distribution — the "follower" is not an agent, it's a passively revealed outcome.
In bilevel, the follower is a rational optimiser with their own objective function. A commuter choosing the cheapest route is not a scenario — they will actively change their behaviour in response to the leader's toll level. The bilevel constraint "y ∈ argmin_y' f(x, y')" asserts that the follower is optimising, not that a random draw has occurred. If you replace the follower with a probability distribution over outcomes, you have left bilevel and entered stochastic programming.
The second misread confuses bilevel with min-max robust optimisation. In min-max, a single decision-maker chooses an action to perform as well as possible against the worst-case scenario. The inner "max" problem finds the adversarial scenario — but that adversary is assumed to do whatever is worst for the decision-maker.
In bilevel, the follower minimises their own objective, which is not necessarily the leader's worst case. A commuter minimising travel cost may or may not produce the outcome worst for the toll authority, depending on the network structure. Bilevel does not assume adversarial intent; it assumes rational self-interest. The two formulations coincide only in zero-sum games — games where the follower's gain is exactly the leader's loss. In the general case, treating a rational follower as a worst-case adversary will produce overly conservative (and often infeasible) leader decisions.
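The difference is visible on the toll example. A sketch under the same hypothetical numbers (100 commuters, £9 of time on Route A, £15 on Route B): a self-interested commuter pays the toll whenever Route A is still cheaper for them, while a worst-case adversary avoids the tolled road whenever that hurts the authority:

```python
N = 100  # hypothetical commuter count

def self_interested_demand(t):
    # Bilevel follower: minimises their own travel cost (£9 + toll vs £15).
    return N if 9.0 + t < 15.0 else 0

def adversarial_demand(t):
    # Min-max "follower": picks whichever demand minimises the leader's revenue.
    return min(N, 0, key=lambda d: t * d)

tolls = [p / 100 for p in range(0, 801)]
bilevel_best = max(tolls, key=lambda t: t * self_interested_demand(t))
minmax_best = max(tolls, key=lambda t: t * adversarial_demand(t))
print(bilevel_best, minmax_best)   # → 5.99 0.0
```

Under the min-max assumption every positive toll earns nothing, so the authority concludes no toll is worth charging; the bilevel model, crediting commuters with self-interest rather than spite, finds the £5.99 toll.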
A subtler misread: the leader doesn't get to choose y directly, even if they know the follower's constraints. The leader's feasible set only contains (x, y) pairs where y is the follower's optimal response to x — not just any feasible response. If the follower has multiple optimal responses (ties), the problem statement must specify which one the leader assumes (usually either optimistic — the follower picks the tie-break that helps the leader — or pessimistic — the worst tie-break for the leader). This tie-breaking assumption matters and must be stated explicitly.
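A sketch of why the tie-break must be stated, again with hypothetical numbers: at the indifference toll t = £6 both routes cost a commuter exactly £15, so the follower's argmin contains both routes and the model has to say which one the leader assumes:

```python
N = 100  # hypothetical commuter count

def demand_on_A(t, tie_break):
    cost_A = 9.0 + t   # £9 of driving time (30 min at £0.30/min) plus toll
    cost_B = 15.0      # £15 of driving time (50 min), no toll
    if cost_A < cost_B:
        return N       # strict preference: tie-break irrelevant
    if cost_A > cost_B:
        return 0
    # Exact tie: both routes are optimal for the follower, so the
    # modeller's tie-break assumption decides the outcome.
    return N if tie_break == "optimistic" else 0

print(6.0 * demand_on_A(6.0, "optimistic"))    # → 600.0: tie broken in the leader's favour
print(6.0 * demand_on_A(6.0, "pessimistic"))   # → 0.0: worst tie-break for the leader
```

The same toll yields £600 or £0 depending purely on the stated assumption, which is why the £5.99 solution sits strictly below the threshold: it does not rely on a favourable tie-break.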
Where it shows up
The diagnostic question: does the inner decision-maker have their own objective function, or are they a passive scenario drawn from a distribution? If the former, the problem is bilevel.