DAILY INTELLIGENCE BRIEF
Decision Optimisation Radar
Sunday, 05 April 2026 • Daily Edition
Industry & Tool Updates
TOOL UPDATE #1
Kinaxis Maestro Agent Studio: No-Code Composable AI Agents for Supply Chain
April 2026 | Source: Kinaxis Press Release
Kinaxis launched Maestro Agent Studio, giving supply chain teams a no-code interface to compose AI agents grounded in their real operating context, using the same data, workflows, and tools planners already rely on. The studio works with leading LLMs, including OpenAI's GPT models and Google's Gemini, while keeping agent behaviour anchored in Maestro's trusted data, intelligence, and governance frameworks.
Why it matters for enterprises: No-code agent orchestration entering supply chain planning signals that the agentic planning era is moving beyond pilots into mainstream enterprise adoption. The governance-first design addresses the trust barrier that has slowed AI deployment in high-stakes planning environments.
TOOL UPDATE #2
Google DeepMind Gemma 4: Open Reasoning Model with Native Agentic Capabilities
April 2, 2026 | Source: Google DeepMind Blog
Google DeepMind released Gemma 4: four model variants (2.3B-31B parameters) under the Apache 2.0 licence. Reported AIME 2026 math performance jumped from 20.8% to 89.2%, and LiveCodeBench from 29.1% to 80.0%. Native function calling, structured JSON output, and system-instruction support make the models directly usable in agentic planning and LLM+OR pipelines without additional scaffolding, and they are deployable on-premise on a single H100 GPU.
Why it matters for enterprises: Apache 2.0 removes commercial barriers for deploying capable reasoning models in OR/planning systems. A 7B-class model solving mathematical reasoning at near-frontier level dramatically lowers the cost of LLM+OR hybrid architectures and enables on-premise deployment for sensitive enterprise planning workloads.
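Structured JSON output is what makes a model like this directly usable in an LLM+OR pipeline: the model proposes a machine-readable problem spec, and a deterministic solver makes the actual decision. A minimal Python sketch, with an entirely hypothetical JSON schema and a toy brute-force solver standing in for a real one:

```python
import json
from itertools import combinations

# Hypothetical example of the pattern: a reasoning model emits a structured
# JSON problem spec (this schema is invented for illustration); the pipeline
# validates it, then hands it to a deterministic solver, so the LLM never
# produces the final decision directly.
llm_output = """
{
  "problem": "knapsack",
  "capacity": 10,
  "items": [
    {"name": "a", "weight": 4, "value": 7},
    {"name": "b", "weight": 5, "value": 9},
    {"name": "c", "weight": 6, "value": 8}
  ]
}
"""

def solve_from_llm_json(raw: str) -> list[str]:
    spec = json.loads(raw)                   # hard failure on malformed JSON
    assert spec["problem"] == "knapsack"     # schema check before solving
    items, cap = spec["items"], spec["capacity"]
    best_value, best_pick = 0, []
    for r in range(len(items) + 1):          # exact search; fine at toy scale
        for combo in combinations(items, r):
            if sum(i["weight"] for i in combo) <= cap:
                value = sum(i["value"] for i in combo)
                if value > best_value:
                    best_value = value
                    best_pick = [i["name"] for i in combo]
    return best_pick

print(solve_from_llm_json(llm_output))  # ['a', 'b']
```

The key design point is the validation boundary: the model's output is treated as untrusted input to be schema-checked, never as a final answer.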
INDUSTRY UPDATE #3
Microsoft 2026 Wave 1: Agentic AI Reaches General Availability in Dynamics 365
April 2026 | Source: Windows News AI
Microsoft's 2026 Wave 1 reached general availability, embedding agentic AI across Dynamics 365 and Power Platform. Features include supervised learning (human-validated AI decisions) and reinforcement learning (outcome-optimised autonomous workflows), enabling agentic systems that pursue defined goals across multi-step processes without constant human direction.
Why it matters for enterprises: Embedding RL-driven decision agents into one of the world's most widely deployed ERP suites signals that prescriptive analytics is moving from specialist OR teams into everyday business workflows. This may be the mass-market tipping point for decision automation.
INDUSTRY UPDATE #4
NVIDIA + Global Industrial Software Giants: AI-Era Manufacturing Intelligence
April 2026 | Source: NVIDIA Investor Relations
NVIDIA announced partnerships with FANUC, HD Hyundai, Honda, JLR, KION, Mercedes-Benz, MediaTek, PepsiCo, Samsung, SK hynix, and TSMC to deploy CUDA-X and GPU-accelerated industrial software across design, engineering, and manufacturing. The collaboration targets real-time optimisation of complex industrial workflows using GPU-native computation.
Why it matters for enterprises: GPU acceleration moving into industrial decision loops, not just model training, signals that the cuOpt/cuSolver stack is becoming an infrastructure layer for operational AI. Enterprises that adopt GPU-native optimisation now will have structural speed advantages in planning cycles.
RESEARCH PAPER #1
iScheduler: Reinforcement Learning-Driven Continual Optimisation for Large-Scale Resource Investment
arXiv: 2602.06064 | cs.AI / math.OC
iScheduler formulates the Resource Investment Problem as a Markov decision process over decomposed subproblems, bypassing exact MIP/CP formulations that become intractable on large instances. The paper also releases L-RIPLIB, an industrial-scale benchmark of 1,000 instances with 2,500-10,000 tasks derived from real cloud-platform workloads, filling a critical gap in large-scale scheduling benchmarks.
What problem it solves: Exact OR approaches (MIP/CP) fail to scale to industrial-size resource investment scheduling. MDP decomposition + RL handles instances where exact methods cannot complete in reasonable time.
Why it matters: Combining MDP decomposition with RL on a new industrial benchmark marks a practical advance in agentic planning for large-scale scheduling. L-RIPLIB will likely become a standard benchmark for the field.
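To make the MDP framing concrete, here is a toy sketch, not iScheduler's actual formulation: state = unscheduled tasks plus per-machine finish times, action = assign a task to a machine, reward = negative makespan. A one-step-lookahead greedy policy stands in for the learned RL policy; tasks, machines, and durations are invented:

```python
from itertools import product

tasks = {"t1": 3, "t2": 2, "t3": 4, "t4": 1}   # task -> duration
machines = {"m1": 0, "m2": 0}                   # machine -> finish time

def step(state, action):
    """Apply (task, machine); reward penalises the resulting makespan."""
    remaining, finish = state
    task, machine = action
    finish = dict(finish)
    finish[machine] += remaining[task]
    remaining = {t: d for t, d in remaining.items() if t != task}
    return (remaining, finish), -max(finish.values())

def greedy_policy(state):
    """Stand-in for a trained policy: pick the action whose next-state
    makespan is smallest (one-step lookahead)."""
    remaining, finish = state
    return min(product(remaining, finish),
               key=lambda a: max(step(state, a)[0][1].values()))

state = (tasks, machines)
while state[0]:                                 # roll out until all assigned
    state, _ = step(state, greedy_policy(state))

print(max(state[1].values()))  # makespan 6 under this myopic policy
```

On this instance the greedy rollout reaches a makespan of 6 against an optimum of 5; closing that kind of gap on industrial-scale instances is exactly what a learned policy is trained to do.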
RESEARCH PAPER #2
LLMs as End-to-End Combinatorial Optimisation Solvers
arXiv: 2509.16865 | NeurIPS 2025
A two-stage training strategy, supervised fine-tuning on domain-specific solver data followed by Feasibility-and-Optimality-Aware RL (FOARL), enables a 7B LLM with lightweight LoRA modules to outperform DeepSeek-R1 and GPT-o1 on seven combinatorial optimisation problems, including VRP, TSP, and scheduling variants.
What problem it solves: Shows that small fine-tuned LLMs can rival frontier reasoning models on CO problems, pointing to cost-effective, deployable LLM+OR hybrids without frontier model licensing costs.
Why it matters: The FOARL objective, which explicitly penalises constraint violations, is a practical template for production use cases where feasibility is non-negotiable.
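The paper's exact objective is not reproduced here, but the core idea of a feasibility-and-optimality-aware reward can be sketched in a few lines: penalise constraint violation explicitly, so infeasible solutions are steeply dominated during RL fine-tuning. All numbers and the penalty weight below are illustrative:

```python
def foarl_style_reward(route_cost: float,
                       capacity_used: float,
                       capacity_limit: float,
                       penalty: float = 100.0) -> float:
    """Illustrative reward: optimality term (negative cost) plus a steep
    linear penalty per unit of capacity violation."""
    violation = max(0.0, capacity_used - capacity_limit)
    return -route_cost - penalty * violation

# A feasible but longer route dominates a shorter, infeasible one:
feasible = foarl_style_reward(route_cost=120.0, capacity_used=95.0,
                              capacity_limit=100.0)
infeasible = foarl_style_reward(route_cost=90.0, capacity_used=110.0,
                                capacity_limit=100.0)
print(feasible, infeasible)  # -120.0 -1090.0
```

With the penalty weight set high enough, the policy gradient pushes the model toward feasibility first and cost reduction second, which is the behaviour production planning systems need.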
RESEARCH PAPER #3
ThinkTwice: Improving LLM Reasoning and Refinement via Online Policy Optimisation
arXiv cs.AI | April 2026
ThinkTwice substantially improves both reasoning and self-refinement performance over competitive online policy optimisation baselines across five mathematical reasoning benchmarks, using models including Qwen3-4B and Olmo3-7B. The two-phase RL objective, which separates initial reasoning from refinement, unlocks better generalisation without increasing model scale.
What problem it solves: LLMs that can reason and then self-refine are more useful in OR contexts where an initial formulation may be infeasible or suboptimal. ThinkTwice trains this two-phase capability explicitly.
Why it matters: Better mathematical reasoning in open models directly improves capacity for LLM-driven problem formulation and solution search in optimisation contexts. Self-refinement is particularly valuable for OR agents that need to iterate on infeasible or suboptimal solutions.
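The reason-then-refine pattern can be illustrated, very loosely and without any LLM, as a two-phase loop over an OR task: phase 1 drafts a solution, phase 2 inspects a feasibility check and repairs violations. The capacity constraint and repair rule below are invented for the sketch:

```python
CAPACITY = 2  # illustrative constraint: max jobs per worker

def propose(jobs, workers):
    """Phase 1 'reasoning': a naive first draft (everything on one worker)."""
    return {job: workers[0] for job in jobs}

def violations(assignment, workers):
    """Feasibility check: which workers exceed CAPACITY?"""
    loads = {w: 0 for w in workers}
    for w in assignment.values():
        loads[w] += 1
    return [w for w, load in loads.items() if load > CAPACITY]

def refine(assignment, workers):
    """Phase 2 'refinement': move excess jobs off overloaded workers."""
    for w in violations(assignment, workers):
        overloaded = [j for j, a in assignment.items() if a == w]
        for job in overloaded[CAPACITY:]:
            target = min(workers,
                         key=lambda x: sum(a == x for a in assignment.values()))
            assignment[job] = target
    return assignment

jobs, workers = ["j1", "j2", "j3", "j4"], ["w1", "w2"]
final = refine(propose(jobs, workers), workers)
print(violations(final, workers))  # [] -> feasible after one refinement pass
```

Training a model to perform both phases explicitly, rather than hoping refinement emerges on its own, is the behaviour an OR agent needs when its first formulation comes back infeasible.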
Upcoming Conferences

| Conference | Key Date | Location | Relevance |
| CPAIOR 2026 | May 26-29, 2026 | Rabat, Morocco | Premier CP+AI+OR integration conference; 23rd edition |
| INFORMS Analytics+ 2026 | April 2026 (ongoing) | | Applied OR and analytics; Gurobi platinum sponsor |
| Gurobi Summit EMEA 2026 | October 2026 | Prague, Czech Republic | MIP/solver practitioner community gathering |
Daily Synthesis
The infrastructure layer for decision intelligence is maturing from research prototype to production-grade toolchain, and the pace is accelerating across every layer of the stack simultaneously.
Open reasoning models have crossed a threshold. Gemma 4's jump from 20.8% to 89.2% on AIME 2026 math, under Apache 2.0 and deployable on a single H100, removes the main licensing and infrastructure arguments against embedding capable reasoning models in enterprise OR pipelines.
Agentic orchestration is entering the enterprise mainstream. Kinaxis Maestro Agent Studio and Microsoft Dynamics 365 Wave 1 both ship no-code agentic workflows to non-specialist users; the audience for decision automation is no longer just OR teams.
The boundary between exact and learned optimisation is dissolving. iScheduler's MDP decomposition + RL for large-scale scheduling and the LLM-as-CO-solver paradigm both demonstrate that "use an exact solver or learn a heuristic" is no longer a binary choice: hybrid architectures can outperform either alone.
GPU acceleration is becoming operational infrastructure. NVIDIA's industrial partnerships with FANUC, Honda, TSMC, and PepsiCo signal that GPU-native computation is moving from the data centre into the decision loop: real-time optimisation of physical operations at scale.
CPAIOR 2026 arrives in seven weeks. Rabat in late May is the first major proving ground where the research signals in today's radar will be stress-tested against practitioner reality. Papers on LLM+OR, GPU-accelerated CP, and agentic planning will shape the field's trajectory over the next 12 months.
For practitioners: The convergence of open reasoning models, no-code agentic orchestration, and GPU-accelerated exact solvers means that decision automation at scale is shifting from a moonshot to a near-term engineering problem. Organisations that invest now in connecting their operational data to these toolchains, rather than waiting for the stack to "mature further", will capture asymmetric first-mover advantages as the pieces lock together.
Generated by Decision Optimisation Radar, automated daily scan | 05 April 2026