Key
✓ positive GPU gave clear speedup
~ mixed partial gains or caveats
not GPU relevant baseline, not GPU-native
Showing 6 papers
SAT Solving
GaloisSAT: Differentiable Boolean Satisfiability Solving via Finite Field Algebra
arXiv →Hybrid GPU-CPU CDCL solver from NVIDIA, U. Utah, and CMU. A differentiable solving engine using finite field algebra runs on GPU to prune the search space — handling variable selection and backbone identification — while a traditional CDCL solver closes the proof on CPU. Tested on SAT Competition 2024 benchmarks against Kissat and CaDiCaL: 8.41× speedup on satisfiable instances, 1.29× on unsatisfiable. The asymmetry is telling — GPU helps most where heuristic guidance matters, less so where proof complexity dominates. First production-grade GPU-CDCL hybrid with benchmark results at this scale.
GPUSAT2: An Improved GPU-Based #SAT Model Counter
Springer →Model counting (#SAT, not decision SAT) using dynamic programming over tree decompositions, mapped onto GPU via OpenCL. The decomposition structure is why GPU works here: subproblems at the same tree level are independent and can be evaluated in parallel. The positive result is earned by problem structure — not by accelerating CDCL. Significant speedup over CPU for instances with bounded treewidth.
Constraint Programming
Exploring the Use of GPUs in Constraint Solving
Springer →Assesses the feasibility of parallelising constraint propagation and arc consistency on GPU. Implements a hybrid CPU-GPU solver and shows competitive performance against sequential baselines. Honest about where GPU helps (bulk constraint checking) and where it doesn't (propagation ordering, sequential fixpoint detection). One of the more careful early papers in this space. The Fioretto/Pontelli group continued this line of work.
Local Search
Adaptive Large Neighborhood Search on the GPU
EJOR →First published GPU parallelisation of ALNS. The destroy-and-repair structure is a natural GPU fit: thousands of candidate moves can be evaluated simultaneously, with no sequential dependencies in the evaluation phase. Strong results on vehicle routing benchmarks. One of the cleaner examples of GPU wins in combinatorial search because the algorithm's inner loop is already batch-friendly.
cuOpt: GPU-Accelerated Route Optimization
NVIDIA →NVIDIA's production GPU solver for VRP-class problems using GPU-accelerated LNS. Holds 23+ world records on routing benchmarks, 10x–5000x speedups over CPU depending on instance. Recently open-sourced on GitHub. Worth tracking as the state of the art in what GPU combinatorial search looks like when optimised for real hardware rather than ported from CPU. Full docs at developer.nvidia.com/cuopt.
Branch & Bound
A GPU-Based Backtracking Algorithm for Permutation Combinatorial Problems
Springer →GPU branch and bound for permutation problems. B&B has an irregular tree shape, so bounding is data-parallel but branching is not. The approach uses a work pool: many subproblems are evaluated (bounded) on GPU simultaneously, branching decisions made on CPU. The speedup comes from bounding, not from search itself. A reasonable hybrid pattern and the one most likely to generalise to other constraint-based B&B.