The predict-then-optimize workflow is a natural way to combine machine learning and optimisation: use ML to forecast uncertain parameters (e.g., demand, travel times, failure rates), then pass those forecasts to an optimisation solver to find the best decisions given those predictions. The critical pitfall is that standard ML training (minimising mean squared error or cross-entropy) does not minimise the downstream decision error. A forecast that is accurate in aggregate (small MSE) but biased in specific regions of the parameter space can make the solver's solution infeasible or severely suboptimal, and the ML accuracy metrics give no warning.
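The two-stage shape of the workflow can be sketched in a few lines. Everything here is illustrative: the "model" is a naive last-value forecast and the "solver" a greedy toy allocator, standing in for a trained ML model and a real optimisation solver. The key structural point is that stage 2 consumes the forecast as if it were exact.

```python
# A minimal predict-then-optimize pipeline (all names and numbers are
# illustrative, not a real system).

def forecast_demand(recent_sales):
    # Stand-in for a trained ML model: naive "last value" forecast.
    return list(recent_sales)

def optimise_allocation(demand, capacity):
    # Toy "solver": allocate shared warehouse capacity to products
    # greedily, never allocating more than the forecast for a product.
    # It treats the forecast as ground truth -- the crux of the pitfall.
    remaining = capacity
    allocation = []
    for d in demand:
        a = min(d, remaining)
        allocation.append(a)
        remaining -= a
    return allocation

recent_sales = [60, 80, 40]
forecast = forecast_demand(recent_sales)
print(optimise_allocation(forecast, capacity=150))  # -> [60, 80, 10]
```

Nothing in stage 2 knows, or can know, how wrong the forecast is; any bias in stage 1 is passed through as a confident decision.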
Consider an e-commerce company that uses an ML model to forecast demand per product. The model is trained with MSE loss and achieves 5% MAPE (mean absolute percentage error). The forecast is passed to a network optimisation problem: "Allocate inventory across warehouses to minimise cost + stockout penalty." The solver treats the forecast as ground truth. At execution, it turns out the forecast was biased towards overestimating demand for certain high-margin products in certain regions. The solver allocates inventory accordingly, but real demand is lower, resulting in excess inventory and write-offs. The 5% MAPE metric did not catch this bias because it was distributed across products and regions, averaging out in aggregate.
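A toy reconstruction of this failure mode makes the arithmetic concrete. All numbers are hypothetical: four products with true demand 100 each, a forecast with a +5% bias on two products and a -5% bias on the other two (so the bias cancels in aggregate), and an asymmetric cost where a unit of stockout costs four times a unit of excess inventory.

```python
# Toy reconstruction of the failure mode above (numbers hypothetical).
# Per-product bias cancels in aggregate, so MAPE looks fine, but the
# decision (order = forecast) is wrong on every product.

true_demand = [100, 100, 100, 100]
forecast    = [105, 105,  95,  95]   # +5% / -5% bias, cancelling overall

mape = sum(abs(f - d) / d for f, d in zip(forecast, true_demand)) / len(true_demand)
print(f"MAPE: {mape:.0%}")           # -> MAPE: 5%

# Decision cost of ordering q against true demand d:
#   holding  * max(q - d, 0)  for excess inventory (write-offs)
# + stockout * max(d - q, 0)  for lost sales
holding, stockout = 1.0, 4.0

def decision_cost(orders):
    return sum(holding * max(q - d, 0) + stockout * max(d - q, 0)
               for q, d in zip(orders, true_demand))

print(decision_cost(forecast))       # -> 50.0  (cost of trusting the forecast)
print(decision_cost(true_demand))    # -> 0.0   (cost with a perfect forecast)
```

The accuracy metric reports a uniform, unremarkable 5% error, while every single allocation decision is off and the asymmetric stockout penalty amplifies the under-forecast side.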
Training an ML model for accuracy and training it for decision quality are not the same optimisation. A model with lower MSE can produce worse decisions than one with higher MSE but better feasibility structure — accuracy metrics give no signal about this.
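The claim that lower MSE can coexist with worse decisions is easy to exhibit numerically. In this sketch (hypothetical numbers again), stockouts cost four times holding, so a slight under-forecast is a worse decision than a larger over-forecast, even though its MSE is a quarter of the other's.

```python
# Sketch: lower MSE does not imply lower decision cost.
# Asymmetric costs: a unit of stockout costs 4x a unit of holding.

true_demand = 100.0
holding, stockout = 1.0, 4.0

def mse(forecast):
    return (forecast - true_demand) ** 2

def decision_cost(order):
    over  = max(order - true_demand, 0.0)   # excess inventory
    under = max(true_demand - order, 0.0)   # lost sales
    return holding * over + stockout * under

fa, fb = 96.0, 108.0                # forecast A vs forecast B
print(mse(fa), decision_cost(fa))   # -> 16.0 16.0  "accurate" but costly
print(mse(fb), decision_cost(fb))   # -> 64.0 8.0   4x the MSE, half the cost
```

Forecast A wins on every standard accuracy metric and loses on the only metric that reaches the business: the cost of the decision it induces. This is the gap that decision-focused (a.k.a. smart predict-then-optimize) training objectives aim to close.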