Every industrial optimisation problem has the same form: find the cheapest (or fastest, or best) way to do something, as long as certain rules hold. Rules like these:
- Keep the budget under $10M.
- Don't exceed 5,000 gallons per hour.
- Total production must equal demand exactly.
The “as long as” part is where things go wrong. Turn a solver loose on the objective alone and it will cheerfully break every rule; clamp down on the rules by brute force and the answer stops being cheap. Either way you get one half of what you asked for.
What you actually want is both: the cheapest answer and every rule respected, at the same time.
On a realistic industrial problem, that combination is genuinely hard. Two classical tricks have been tried: put a price on breaking each rule, or levy a fine that grows with the violation. Each, alone, breaks for its own reason. The augmented Lagrangian is what you get when you stop choosing between them.
The augmented Lagrangian carries both tricks at once inside a loop, but gives them different jobs.
- The price does the real work of enforcing the rule. It changes every round until it lands on the right value.
- The fine stays small and fixed. Its only job is to stop the price from swinging wildly between rounds.
Each round:
- Solve the problem with the current price and the small fixed fine, both sitting in the objective.
- Measure how much the rule was broken in that solution.
- Nudge the price by an amount proportional to the violation: up if the rule was broken, down if it was obeyed with room to spare. The fine stays put.
- Solve again.
Round by round, the price homes in on its true value. Once it stops changing, the rule holds exactly. You never guessed the price in advance, and you never sent the fine to infinity. The iteration found both.
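The loop above can be sketched in a few lines of Python. Everything here is invented for illustration: a toy problem with a closed-form inner solve, so the whole round fits on screen.

```python
# Toy problem (invented): minimise f(x) = (x - 2)^2 subject to the rule
# h(x) = x - 1 = 0. The exact answer is x = 1 with shadow price lambda = 2
# (stationarity: 2(x - 2) + lambda = 0 at x = 1 gives lambda = 2).

def solve_inner(lam, rho):
    """Minimise (x-2)^2 + lam*(x-1) + (rho/2)*(x-1)^2 in closed form."""
    # Setting the derivative 2(x-2) + lam + rho*(x-1) to zero:
    return (4.0 - lam + rho) / (2.0 + rho)

lam = 0.0          # the price: starts at zero, updated every round
rho = 1.0          # the fine: small and fixed for the whole run
for _ in range(60):
    x = solve_inner(lam, rho)   # solve with current price + fixed fine
    violation = x - 1.0         # measure how much the rule was broken
    lam += rho * violation      # nudge the price; the fine stays put

print(round(x, 6), round(lam, 6))   # 1.0 2.0
```

Note what never happened: ρ stayed at 1 throughout, and λ was never guessed. The loop found λ = 2 on its own, and the rule x = 1 holds exactly at convergence.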
For the math-curious, here's the same idea in formal notation. Each symbol maps directly to the plain-English version above.
Take a constrained problem in standard form: minimise f(x) subject to h(x) = 0. Read f(x) as the cost you want to minimise, and h(x) as the rule-violation vector (zero exactly when the rule holds).
The ordinary Lagrangian attaches a price λ (the "multiplier") to the violation:
L(x, λ) = f(x) + λᵀ h(x)
The second term is literally "price × violation", the per-unit charge from the price trick. If λ is set to the right shadow price, minimising this surrogate gives the correct answer with no constraint in sight.
The augmented Lagrangian adds a quadratic fine with weight ρ:
Lρ(x, λ) = f(x) + λᵀ h(x) + (ρ/2) • ||h(x)||²
That third term is the fine: a cost proportional to the squared violation. ρ is kept small; on its own it cannot enforce the rule, but it damps how far each round's solution can jump from the last.
The algorithm alternates two steps that mirror the loop above:
- Solve: minimise Lρ over x with λ held fixed.
- Update the price: λ := λ + ρ • h(x). The nudge is exactly ρ times the violation just observed.
For inequality constraints g(x) ≤ 0, the price-plus-fine terms become a one-sided penalty (ρ/2) • (max(0, g(x) + λ/ρ))², and the price update is clipped at zero: λ := max(0, λ + ρ • g(x)). The structure is unchanged: outer loop updates multipliers, inner loop minimises the surrogate.
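The inequality case can be sketched on another invented toy: minimise x² subject to g(x) = 1 − x ≤ 0 (that is, x ≥ 1), whose exact answer is x = 1 with multiplier λ = 2.

```python
# Toy inequality (invented): minimise x^2 subject to g(x) = 1 - x <= 0.
# Exact answer: x = 1, multiplier lambda = 2.

rho = 1.0   # small, fixed fine weight
lam = 0.0   # inequality multiplier, kept non-negative
for _ in range(60):
    # Inner solve: min over x of  x^2 + (rho/2) * max(0, g(x) + lam/rho)^2.
    # In this toy the one-sided penalty stays active (g(x) + lam/rho > 0),
    # so stationarity 2x = rho*(1 - x + lam/rho) gives a closed form:
    x = (rho + lam) / (2.0 + rho)
    # Outer update, clipped at zero as inequality multipliers must be:
    lam = max(0.0, lam + rho * (1.0 - x))

print(round(x, 6), round(lam, 6))   # 1.0 2.0
```

Same shape as the equality loop: inner solve, observe violation, nudge the multiplier, fine untouched. The only new ingredients are the one-sidedness of the penalty and the clip at zero.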
Hestenes and Powell showed independently in 1969 that ρ does not need to diverge to enforce feasibility. A finite ρ, combined with the multiplier update, drives h(x) to zero at convergence. The multiplier λ, not the penalty ρ, is what makes the constraint bind at the solution. In the vocabulary above: the price does the enforcing. The fine only stabilises.
The setup. Plant A and Plant B each plan their own weekly production schedule. Both buy the same raw material from one supplier, who sells no more than 10,000 kg per week in total across the two plants. Neither plant will share its full plan with the other. So a neutral coordinator sits in the middle: it doesn't schedule anything, it just sets one shared price for the raw material each week. Each plant uses that price to decide how much to buy.
Try price only (plain Lagrangian relaxation). The coordinator quotes a price. Each plant runs its own schedule with the raw material costed at that price. The coordinator sees the combined purchase, notes any cap violation, and adjusts the price for the next round. The catch: scheduling is full of yes/no decisions (run this batch, skip that line). When the price crosses a threshold, a plant's purchase flips discretely. The combined purchase jumps between "way over" the cap and "way under". The price bounces with it. No convergence.
Now add the fine (augmented Lagrangian). Each plant's objective carries both the price and a small quadratic fine on how far the combined purchase strayed from the cap last round. The fine is too small to enforce the cap on its own; that's still the price's job. Its role is to punish big swings between rounds. The oscillation damps. The price settles. The cap holds. The coordinator updates the price after each round, by an amount proportional to the violation observed.
The symbols, for the record. The price is λ. The fine weight is ρ, kept small. Each plant's objective is (its own cost) + λ • (its share of combined purchase) + (ρ/2) • (max(0, combined purchase − cap))². The coordinator updates λ := max(0, λ + ρ • (observed violation)) each round.
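A numeric sketch of the coordination loop, with one big simplification: smooth quadratic plant benefits stand in for the real yes/no scheduling decisions, so each plant's inner solve has a closed form. All numbers (the benefit slopes, the cap of 10 units) are invented.

```python
# Two plants share one supplier cap. Plant i's standalone objective is
# -(a_i*q - q^2/2): left alone, plant A buys 8 units and plant B buys 6,
# total 14 against a cap of 10. The right shared price turns out to be
# lambda = 2, giving q_A = 6, q_B = 4, landing exactly on the cap.

CAP = 10.0
A, B = 8.0, 6.0           # invented benefit slopes for the two plants
rho = 0.5                 # small fixed fine weight
lam = 0.0                 # the coordinator's one shared price
qa = qb = 0.0             # last round's observed purchases

def plant(a, lam, rho, other_last):
    # Each plant minimises its own cost plus price plus fine, seeing only
    # the OTHER plant's last observed purchase, never its plan:
    #   -(a*q - q^2/2) + lam*q + (rho/2)*(q + other_last - CAP)^2
    # Setting the derivative to zero and clipping at q >= 0:
    q = (a - lam - rho * (other_last - CAP)) / (1.0 + rho)
    return max(0.0, q)

for _ in range(200):
    qa_new = plant(A, lam, rho, qb)
    qb_new = plant(B, lam, rho, qa)
    qa, qb = qa_new, qb_new
    # Coordinator sees only the combined purchase and nudges the price:
    lam = max(0.0, lam + rho * (qa + qb - CAP))

print(round(qa, 4), round(qb, 4), round(lam, 4))   # 6.0 4.0 2.0
```

The fine term is what keeps each round's purchases near last round's, so the price can settle instead of chasing jumps. With real integer schedules the inner solves would be MILPs rather than one-line formulas, but the coordinator's side of the loop is unchanged.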
Lagrangian relaxation uses only the price, not the fine. It gives a lower bound on the best possible answer and a family of candidate plans, but on problems with yes/no decisions the price oscillates and nothing settles. Augmented Lagrangian needs both halves working together: drop the fine and you get Lagrangian relaxation's oscillations back; drop the price update and you get a pure penalty method's numerical problems. Neither half alone carries the convergence guarantee.
A pure penalty method uses only the fine, not the price. To force a rule to hold, the fine must be astronomical, and at astronomical sizes the solver becomes numerically unstable: tiny rounding errors get amplified into huge penalty swings. The augmented Lagrangian gets away with a small fine because the price, not the fine, is what makes the rule hold. Cranking ρ up higher and higher defeats the whole point of the method.
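The contrast is visible on a toy problem (invented for illustration): minimise (x − 2)² subject to x = 1. A penalty-only solver leaves a residual violation of 2/(2 + ρ), which shrinks only as ρ blows up; fold in the right price λ = 2 and the same small ρ is exact.

```python
def penalty_only(rho):
    # min (x-2)^2 + (rho/2)*(x-1)^2  ->  2(x-2) + rho*(x-1) = 0
    return (4.0 + rho) / (2.0 + rho)      # = 1 + 2/(2 + rho): never feasible

def with_price(rho, lam=2.0):
    # min (x-2)^2 + lam*(x-1) + (rho/2)*(x-1)^2
    return (4.0 - lam + rho) / (2.0 + rho)

print(penalty_only(1.0))    # 1.666...: rule badly broken at small rho
print(penalty_only(1e6))    # ~1.000002: near-feasible only at a huge rho
print(with_price(1.0))      # 1.0 exactly, with the same small rho
```

The penalty-only route buys each extra digit of feasibility with another order of magnitude of ρ, and the solver's conditioning pays for it. The price buys exact feasibility at ρ = 1.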
On a non-convex problem (anything with integer variables qualifies), the method finds a locally best answer, not necessarily the globally best one. Different starting guesses can land in different local solutions. The standard fix is to run the loop from several starting points and keep the best answer found, or to layer a global-search method on top. Formally, the method converges to a Karush-Kuhn-Tucker (KKT) stationary point. The CALADIN and Mix-CALADIN families inherit this caveat and are typically deployed with multi-start strategies.
The rule λ := λ + ρ • (violation observed) looks simple, but it is not a knob you can retune by feel. Every part of the formula has a mathematical derivation behind it (it's a steepest-ascent step on the dual of the augmented problem). Replacing it with a plausible-sounding heuristic, for example "multiply λ by 1.1 whenever the violation grows", breaks the math, and the method can stop converging. Legitimate accelerations like Nesterov momentum on λ exist, but they must be derived from the underlying theory, not guessed.
A finite penalty is enough, as long as it rides on top of the right multiplier. The multiplier does the enforcing; the penalty only stabilises.