Modelling Concepts · First introduced 14 Apr 2026

Predict-then-Optimize

A forecast trained to be accurate and a forecast trained to make good decisions are not the same forecast. — Elmachtoub and Grigas, Smart "Predict, then Optimize" (2022)

A two-phase workflow in which an ML model forecasts uncertain parameters, and those forecasts are then passed as fixed inputs to an optimisation solver. Training the model with a standard accuracy loss rather than a decision-error loss can yield forecasts that make the solver's solution infeasible or suboptimal at execution.

Core idea

The predict-then-optimize workflow is a natural way to combine machine learning and optimisation: use ML to forecast uncertain parameters (e.g., demand, travel times, failure rates), then pass those forecasts to an optimisation solver to find the best decisions given those predictions. The critical pitfall is that standard ML training (minimising mean squared error or cross-entropy) does not directly minimise the downstream decision error. The solver reacts to where the errors fall, not to their average size: a forecast that is accurate in aggregate (small MSE) but biased in specific regions of the parameter space can make the solver's solution infeasible or severely suboptimal, with no warning from the ML accuracy metrics.
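A minimal sketch of this gap, using made-up numbers rather than anything from a real system. The "solver" here is a hypothetical stand-in that simply picks the cheaper of two routes, and regret (the extra true cost incurred by acting on the forecast) stands in for decision error. The names choose_route, regret, forecast_a and forecast_b are illustrative assumptions.

```python
def mse(pred, true):
    """Mean squared error between forecast and true parameters."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def choose_route(costs):
    """Toy stand-in for a solver: return the index of the cheapest route."""
    return min(range(len(costs)), key=lambda i: costs[i])


def regret(pred, true):
    """Extra true cost paid by deciding on `pred` instead of `true`."""
    return true[choose_route(pred)] - true[choose_route(true)]


# Hypothetical true travel costs for two routes.
true_costs = [10.0, 11.0]

# Forecast A is numerically close but flips the ranking of the routes.
forecast_a = [10.6, 10.4]
# Forecast B is numerically far off but keeps the ranking intact.
forecast_b = [8.0, 13.0]

mse_a, mse_b = mse(forecast_a, true_costs), mse(forecast_b, true_costs)
regret_a, regret_b = regret(forecast_a, true_costs), regret(forecast_b, true_costs)

print(f"forecast A: MSE={mse_a:.2f}, regret={regret_a:.1f}")
print(f"forecast B: MSE={mse_b:.2f}, regret={regret_b:.1f}")
```

In this toy setup the forecast with the smaller MSE is the one that leads to the wrong route, because only the ordering of the costs matters to the decision. Real problems have richer structure, but the same mismatch between accuracy loss and decision loss can arise.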


Concrete example
Scenario

An e-commerce company uses an ML model to forecast demand per product. The model is trained with MSE loss and achieves 5% MAPE (mean absolute percentage error). The forecast is passed to a network design optimisation problem: "Allocate inventory across warehouses to minimise cost + stockout penalty." The optimisation solver treats the forecast as ground truth. At execution, it turns out the forecast systematically overestimated demand for certain high-margin products in certain regions. The solver allocates inventory accordingly, but real demand is lower, resulting in excess inventory and write-offs. The 5% MAPE metric did not catch this bias because MAPE averages absolute errors over all products and regions: large one-sided errors on a small subset are diluted by small errors everywhere else, and the metric says nothing about the direction of the error.
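The dilution effect in this scenario can be sketched with synthetic numbers. Everything below is an assumption made for illustration: 100 products with identical true demand, 10 of them labelled high_margin and overforecast by 20%, and the remaining 90 off by 3% in alternating directions. The check itself, comparing overall MAPE against signed error per segment, is one simple diagnostic and not the only possible one.

```python
def mape(forecast, actual):
    """Mean absolute percentage error, as a fraction (0.05 == 5%)."""
    return sum(abs(f - a) / a for f, a in zip(forecast, actual)) / len(actual)


def mean_signed_pct_error(forecast, actual):
    """Mean signed percentage error; positive means overforecasting."""
    return sum((f - a) / a for f, a in zip(forecast, actual)) / len(actual)


# Synthetic data: every product has a true demand of 100 units.
actual = [100.0] * 100
segment = ["high_margin"] * 10 + ["regular"] * 90

# High-margin products are overforecast by 20%; regular products are
# off by 3% with alternating sign, so they carry no systematic bias.
forecast = [120.0] * 10 + [103.0 if i % 2 == 0 else 97.0 for i in range(90)]

overall_mape = mape(forecast, actual)

# Signed error per segment exposes what the aggregate metric hides.
bias_by_segment = {}
for name in ("high_margin", "regular"):
    idx = [i for i, s in enumerate(segment) if s == name]
    bias_by_segment[name] = mean_signed_pct_error(
        [forecast[i] for i in idx], [actual[i] for i in idx]
    )

# Units the solver would over-allocate to the high-margin segment.
excess_units = sum(
    f - a for f, a, s in zip(forecast, actual, segment) if s == "high_margin"
)

print(f"overall MAPE: {overall_mape:.3f}")
print(f"signed bias by segment: {bias_by_segment}")
print(f"excess high-margin units: {excess_units:.0f}")
```

With these made-up numbers the overall MAPE is about 4.7%, which looks healthy, while the high-margin segment carries a one-sided error of +20% that feeds directly into the allocation.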


One-line version

Training an ML model for accuracy and training it for decision quality are not the same optimisation. A model with lower MSE can produce worse decisions than one with higher MSE whose errors leave the solver's solution feasible and near-optimal, and accuracy metrics give no signal about this.


Related concepts