Column Generation (CG) is an algorithmic technique for solving large-scale linear programs (LPs) and mixed integer programs (MIPs) that would be computationally intractable if all variables (columns) were included from the start. The core idea: instead of solving the full problem with all possible variables, CG works with a small subset of variables (the Restricted Master Problem, or RMP) and iteratively asks a pricing subproblem whether any excluded variable could improve the current solution. If the pricing subproblem identifies such a variable, that column is added to the RMP and the process repeats. Optimality is proven when the pricing subproblem confirms that no improving column exists, at which point the RMP solution is also optimal for the full problem.
Column Generation is the workhorse behind the largest and most practically significant OR deployments in existence. Airline crew scheduling (assigning crew members to all flights in a network) involves millions of possible crew pairings; CG prices only the pairings with potential value. Hospital nurse rostering with full-week schedule patterns, cutting stock problems for paper mills and steel plants, vehicle routing with structured routes, and portfolio optimisation with factor models all rely on CG for tractability at scale.
The problem: Assign crew members to all flights in a network over a month so that each flight is covered, each crew member works legal schedules, and total crew costs are minimised. The number of possible crew pairings (valid combinations of flights one crew can work together) is often in the millions.
Naive approach (fails): Enumerate all possible pairings, set up variables for each, and solve the MIP. The model would have millions of variables before branching even starts.
Column Generation approach: Start with a small set of "obvious" crew pairings (e.g., simple round-trip flights). Solve the RMP to assign crews to these pairings. Ask the pricing subproblem: "Is there a crew pairing not currently in the model that would reduce total cost if added?" The subproblem often takes the form of a shortest-path problem over the flight network. If one is found, add it to the RMP and re-solve. Repeat until no improving pairing exists.
Result: The algorithm generates only the pairings that matter, solving a problem with millions of implicit variables by working with dozens or hundreds of explicit ones.
Confusion 1: CG versus Benders Decomposition. Benders Decomposition splits a problem into a master problem (over integer variables) and a subproblem (over continuous variables), adding Benders cuts (rows) to the master problem from the subproblem's dual. Column Generation splits a problem into a master problem (over a subset of variables) and a pricing subproblem (that finds new variables), adding columns to the master problem. Both decompose; the direction differs. Benders adds rows; Column Generation adds columns. Confusing the two leads to misapplication: Benders works best when the master problem is over integers and the subproblem is a continuous LP; CG works best when the number of variables is large but the pricing problem is efficiently solvable.
Confusion 2: The pricing subproblem is a detail, not an afterthought. The most common practitioner error is treating the pricing subproblem as a mechanical component. In practice, the pricing subproblem is where the domain knowledge lives: it encodes the combinatorial structure of what constitutes a valid column (a legal crew pairing, a feasible route, a valid cutting pattern). Designing an efficient pricing subproblem often requires deeper OR expertise than formulating the master problem.
Confusion 3: CG is only for "exponentially many variables." Practitioners assume CG applies only when the number of variables is inherently astronomical (all possible routes, all possible schedules). In fact, CG applies whenever the LP relaxation has a tractable pricing subproblem, even for problems with a moderate number of variables, because starting with a small feasible basis and pricing incrementally can be computationally superior to solving the full LP from scratch.
When CG is applied within a branch-and-bound tree to solve the integer version of the problem, the technique is called Branch-and-Price. At each node of the branch-and-bound tree, CG is used to solve the LP relaxation, and branching occurs on fractional integer variables. Branch-and-Price is the state-of-the-art method for the largest crew scheduling, vehicle routing, and cutting stock problems solved in production today.
Column generation terminates when the pricing subproblem finds no improving column. If the pricing problem is itself intractable, column generation does not help — the bottleneck shifts from the master to the subproblem.