Industry Signals
What's happening?
Five of Six US Military Branches Now on GenAI.mil as April 30 Transition Deadline Approaches
The Department of the Navy (DoN) has issued a directive requiring all commands and organizations to complete their transition to GenAI.mil, the US military's shared enterprise AI platform, by April 30. Five of the six main military branches have now designated the system as their primary AI environment. GenAI.mil handles Controlled Unclassified Information (CUI) at Impact Level 5 (IL-5), the highest classification tier for unclassified workloads.
Why it matters
When the world's largest employer mandates a single AI environment with a hard deadline, it signals that the era of fragmented AI tool adoption across large organizations is ending. Defense contractors and GovTech vendors that are not integrated with GenAI.mil before April 30 will lose access to the market overnight. For enterprise AI buyers outside government, this is the clearest preview yet of what mandated AI platform consolidation looks like in practice.
Read source →
Salesforce Agentforce Hits 6,000 Enterprise Customers, a 48% Quarter-Over-Quarter Rise
Salesforce reported that Agentforce now has 6,000 enterprise customers, up 48% in a single quarter. The figure is widely read as confirmation that agentic AI (systems designed to take multi-step autonomous actions rather than simply responding to single prompts) has moved from pilots into production at scale. Perplexity also entered the enterprise agentic market this week, launching an AI agent called Computer at its Ask 2026 developer conference, positioned as a direct competitor to Microsoft Copilot and Agentforce.
Why it matters
A 48% quarter-over-quarter growth rate at 6,000 enterprise accounts is a platform-shift signal, not a feature-adoption signal. Enterprise procurement teams are now allocating dedicated budget for agentic workflows, not just AI assistants. For organizations that have not yet evaluated an agentic platform, the evaluation window is narrowing: a second serious competitor entering the market means pricing will become more competitive, but category lock-in will accelerate. Every one of those 6,000 deployments also needs a harness (the complete infrastructure governing a production AI agent: tool authorisation, guardrails, state management, rollback, and observability; first explained April 10, 2026) before it is production-ready.
Read source →
Models & Tools
What's new?
Muse Spark: Natively Multimodal Reasoning Model Built on MSL's New Architecture Stack
Unlike previous Meta models, Muse Spark was built from scratch on MSL's new infrastructure rather than iterated from the Llama line. The model processes text, images, and voice in a unified input stream. Benchmark results place it as competitive with leading models from OpenAI, Anthropic, and Google across many task categories, though it does not consistently surpass them. Meta's distribution reach of approximately three billion users means even a broadly competitive model carries significant strategic weight in both enterprise and consumer AI markets.
Read source →
Zhipu AI Ships GLM-5.1: Open-Source 744B Mixture of Experts Claims Top SWE-Bench Score
Z.ai released GLM-5.1 under the Massachusetts Institute of Technology (MIT) license, making it the most capable freely redistributable Large Language Model (LLM) currently available. The architecture uses a Mixture of Experts (MoE) design: 744 billion total parameters with only 40 billion active on any given forward pass, keeping inference costs comparable to a far smaller dense model. The context window spans 200,000 tokens. On SWE-Bench Pro, a real-world software engineering benchmark, GLM-5.1 reportedly outperforms GPT-5.4, Claude Opus 4.6, and Gemini 3.1 Pro. The MIT license with no use restrictions makes this a significant milestone for the open-source AI ecosystem.
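The cost implication of the MoE design can be sketched with simple arithmetic. Parameter counts are from the release; treating per-token inference cost as proportional to active parameters is a simplification that ignores routing and memory overhead:

```python
# Rough cost comparison: Mixture of Experts (MoE) vs a dense model.
# Per-token inference compute scales with the parameters *active* on a
# forward pass, so a 744B-total / 40B-active MoE runs roughly like a
# 40B dense model per token (simplified: routing overhead ignored).

def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Fraction of parameters touched on a single forward pass."""
    return active_params_b / total_params_b

glm_51 = active_fraction(total_params_b=744, active_params_b=40)
print(f"GLM-5.1 activates {glm_51:.1%} of its parameters per token")
```

Note that total parameters still dominate memory: all 744B weights must be resident (or paged) even though only about 5% are exercised per token.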
Read source →
Microsoft Releases MAI Trio: Cheaper Foundation Models for Speech Recognition, Text-to-Speech, and Image Generation
Microsoft AI (MAI) announced three new foundational models: MAI-Transcribe-1 for speech recognition at $0.36 per hour, MAI-Voice-1 for text-to-speech at $22 per one million characters, and MAI-Image-2 for image generation at $5 per one million input tokens. All three are priced substantially below comparable offerings from Google and OpenAI, and Microsoft is positioning them as the cost-effective default for enterprise workloads requiring high volumes of multimodal processing.
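At these list prices, a workload cost estimate is straightforward arithmetic. The prices below are from the announcement; the example volumes are hypothetical:

```python
# Back-of-envelope monthly cost for the three MAI models at the
# announced list prices. The workload volumes in the example call
# are hypothetical, not from the announcement.

PRICES = {
    "transcribe_per_hour": 0.36,      # MAI-Transcribe-1, USD per audio hour
    "voice_per_million_chars": 22.0,  # MAI-Voice-1, USD per 1M characters
    "image_per_million_tokens": 5.0,  # MAI-Image-2, USD per 1M input tokens
}

def monthly_cost(audio_hours: float, tts_chars: float, image_tokens: float) -> float:
    """Total USD for one month of mixed multimodal usage."""
    return (
        audio_hours * PRICES["transcribe_per_hour"]
        + tts_chars / 1_000_000 * PRICES["voice_per_million_chars"]
        + image_tokens / 1_000_000 * PRICES["image_per_million_tokens"]
    )

# e.g. 500 audio hours, 10M TTS characters, 2M image tokens:
print(f"${monthly_cost(500, 10_000_000, 2_000_000):,.2f}")
```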
Read source →
Safety
What's dangerous?
Large Reasoning Models Used as Autonomous Jailbreak Agents Achieve 97% Attack Success Rate
A study in Nature Communications evaluated DeepSeek-R1, Gemini 2.5 Flash, Grok 3 Mini, and Qwen3 235B as autonomous adversarial agents directed to attack other Large Language Models (LLMs). Across all model pairings the overall attack success rate reached 97.14%, demonstrating that reasoning capabilities designed to solve complex problems can be systematically repurposed as jailbreak engines. The gap between target models was stark: Claude Sonnet 4.6 registered a 2.86% harm score, while DeepSeek-V3 reached 90%, suggesting alignment robustness varies enormously across providers. The paper also concludes that design-level countermeasures such as scaffolded AI friction (deliberate interface design choices that require users to actively engage before accepting AI outputs, preserving critical thinking against passive AI over-reliance; first explained April 7, 2026) are insufficient as standalone defences when attackers can automate reasoning-driven adversarial strategies at scale.
Read source →
Agentic AI Systems Require Contextual Red-Teaming, Not Just Prompt-Level Testing
Palo Alto Networks' security research team published an analysis arguing that traditional red-teaming, focused on crafting adversarial prompts against a single model, is insufficient for agentic AI deployments. Such systems can invoke tools, read databases, browse web pages, and chain actions across sessions, meaning a successful attack can trigger irreversible real-world consequences far beyond a harmful text response. The proposed approach begins with a capability-mapping phase: identifying which tools an agent can call, what data it can access, what actions are irreversible, and what constraints govern its autonomy, before any adversarial probing begins. The team positions contextual red-teaming as the new minimum standard for any production agentic deployment.
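The capability-mapping phase can be made concrete as a small inventory structure. The field names and example tools below are illustrative sketches, not part of Palo Alto Networks' published methodology:

```python
# A minimal capability map for an agent: enumerate tools, data access,
# irreversibility, and approval constraints *before* adversarial probing.
# All names here are hypothetical examples.

from dataclasses import dataclass, field

@dataclass
class ToolCapability:
    name: str
    data_scopes: list        # what data the tool can read or write
    irreversible: bool       # can its effects be undone after the fact?
    requires_approval: bool  # is a human in the loop before execution?

@dataclass
class CapabilityMap:
    tools: list = field(default_factory=list)

    def attack_surface(self):
        """Tools a red team should probe first: irreversible and unattended."""
        return [t.name for t in self.tools
                if t.irreversible and not t.requires_approval]

cmap = CapabilityMap(tools=[
    ToolCapability("search_kb", ["internal_docs"], irreversible=False, requires_approval=False),
    ToolCapability("send_email", ["customer_pii"], irreversible=True, requires_approval=False),
    ToolCapability("issue_refund", ["billing"], irreversible=True, requires_approval=True),
])
print(cmap.attack_surface())
```

Sorting probes by irreversibility-without-approval mirrors the article's point: the worst agentic failures are the ones that cannot be rolled back and were never reviewed.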
Read source →
Policy
What's regulated?
White House National Artificial Intelligence Legislative Framework Calls for Federal Preemption of State Laws
The Trump Administration published its National Policy Framework for Artificial Intelligence on March 20, 2026, providing Congress with a set of legislative recommendations that form the Administration's desired legal architecture for AI in the United States. The central ask is federal preemption of conflicting state AI laws, on the grounds that fifty separate state regimes would fragment compliance requirements and undermine US global competitiveness. Stated priorities include removing barriers to AI innovation, protecting free speech and First Amendment rights, advancing workforce development, and ensuring child safety. The document explicitly cautions Congress against vague liability standards and open-ended regulatory mandates. Legal analysts note the framework does not carry the force of law but signals clearly where the Administration will resist or support specific legislative proposals.
Read source →
AI Enforcement Accelerates as Congress Stalls: States and Private Plaintiffs Fill the Regulatory Gap
A Morgan Lewis analysis published in April 2026 finds that the absence of a comprehensive federal AI statute has not slowed enforcement activity: it has diversified it. Federal agencies are applying existing sectoral authorities to AI in finance, healthcare, and consumer protection. State legislatures are enacting targeted measures covering deepfakes, algorithmic hiring decisions, and automated decision systems. Private litigants are advancing novel legal theories in intellectual property, defamation, and negligence. The net effect is that organizations deploying AI face mounting, overlapping compliance obligations across jurisdictions, even before Congress passes any dedicated AI legislation.
Read source →
Applications
What's working?
Customer Service GenAI Deployments Are Delivering Measurable Returns, but Only When Integrated Into the Workflow
Deloitte's 2026 State of Artificial Intelligence (AI) in the Enterprise report, based on 2,800 business leaders, finds that 74% of organisations say their most advanced GenAI initiative is meeting or exceeding Return on Investment (ROI) targets, with the clearest gains concentrated in customer service automation, document intelligence, and knowledge management. A parallel Harvard Business School study found AI-assisted workers completed tasks 25% faster with 40% higher quality scores, but only where the tool was embedded in the existing workflow rather than running as a separate application. The consistent failure mode: organisations deploying GenAI as a standalone chat interface without process integration report little to no measurable financial impact, a finding corroborated by MIT's NANDA initiative, which concluded 95% of pilot programmes fail to reach production ROI.
Read source →
NotebookLM: Upload Any Document and Have a Conversation With It (Free, No Engineering Required)
Google NotebookLM lets any knowledge worker upload Portable Document Format (PDF) files, slides, articles, or meeting transcripts and then ask questions, generate summaries, and produce audio overviews of the material, all grounded strictly in the uploaded documents with no hallucinated external facts. The tool is free, requires no technical setup, and is particularly effective for people dealing with dense reports, research papers, or board packs who need to extract key points quickly. A practical workflow: upload a lengthy report before a meeting, ask "what are the three decisions this document is asking me to make?", and use the structured answer to prepare. NotebookLM became a Google Workspace core service with enterprise-grade data protection in February 2025, meaning uploaded content is not used to train Google's models.
Read source →
Cloudera Agent Studio: Build, Test, and Deploy Multi-Agent Workflows, from Low-Code to High-Code
Cloudera Agent Studio is an enterprise platform for building, testing, and deploying multi-agent workflows entirely within your existing Cloudera environment: no external data egress, no additional governance risk. It covers the full stack: dynamic multi-step planning, multi-agent collaboration with graphical trace visibility, and sandboxed execution. Long-horizon agents can pursue objectives across dozens of sequential decisions over hours or days while maintaining context. Tool access connects natively to Cloudera Data Flow, Data Warehouse, and Data Visualisation as callable agents. Built in collaboration with NVIDIA, it supports NVIDIA NIM inference and the Nemotron model family. Observability is built in via OpenTelemetry integration and real-time debugging, making it essentially an out-of-the-box harness (the complete infrastructure governing a production AI agent: tool authorisation, guardrails, state management, rollback, and observability; first explained April 10, 2026) for enterprise teams.
Explore the tool →
Term of the Day
What does it actually mean?
Harness Engineering
Agents & Agentic AI
Harness engineering is the discipline of designing the complete infrastructure that governs how an Artificial Intelligence (AI) agent operates in production. A harness is not the agent itself; it is everything around the agent: the tools it is allowed to call, the constraints that define its action space, the guardrails that prevent irreversible or harmful steps, the feedback loops that let it self-correct, and the observability layer that lets engineers inspect its full reasoning trace. As enterprises move from AI pilots to production deployments (Salesforce Agentforce now has 6,000 enterprise customers), harness engineering is the emerging discipline that separates reliable production agents from brittle demos.
Why Practitioners Misread This
Most teams treat "harness" as a synonym for a thin prompt wrapper or a basic retry loop. That misses the point entirely. A production harness must address at least five layers: tool authorisation (which tools can the agent invoke, and under what conditions?), state management (how does the agent track what it has already done, and what is still pending?), error recovery (what happens when a tool fails mid-task: does the agent retry, escalate, or abandon?), rollback (can actions the agent has taken be undone if something goes wrong?), and observability (can engineers replay the agent's full decision trace after the fact?). Teams that skip harness engineering and ship a raw agent to production are running autonomous systems with no circuit breakers. The failure mode is not a crash; it is a silent sequence of plausible-looking actions that produces a wrong or harmful outcome with no clear audit trail.
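The five layers can be sketched in a few dozen lines. Everything below is a hypothetical illustration, not a real framework; a production harness would persist its journal, enforce richer policies, and isolate tool execution:

```python
# Minimal harness sketch covering the five layers: tool authorisation,
# state management, error recovery, rollback, and observability.
# All class and function names are illustrative.

import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("harness")

class Harness:
    def __init__(self, allowed_tools):
        self.allowed_tools = allowed_tools  # 1. tool authorisation
        self.journal = []                   # 2. state management, replayable trace

    def call(self, tool, undo, *args, retries=1):
        name = getattr(tool, "__name__", str(tool))
        if name not in self.allowed_tools:  # authorisation gate
            raise PermissionError(f"{name} is not an authorised tool")
        last_exc = None
        for attempt in range(retries + 1):  # 3. error recovery (bounded retries)
            try:
                result = tool(*args)
                self.journal.append((name, args, undo))  # record for rollback/replay
                log.info("ok: %s%r -> %r", name, args, result)  # 5. observability
                return result
            except Exception as exc:
                last_exc = exc
                log.warning("attempt %d of %s failed: %s", attempt + 1, name, exc)
        raise RuntimeError(f"{name} failed after {retries + 1} attempts") from last_exc

    def rollback(self):                     # 4. rollback, most recent action first
        while self.journal:
            name, args, undo = self.journal.pop()
            if undo:
                undo(*args)
            log.info("rolled back %s%r", name, args)
```

The journal doubles as the audit trail the paragraph above calls for: every action the agent took, in order, with enough information to undo it.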
Research Papers
What's being researched?
AI Scientist-v2: First Fully Autonomous System to Have a Paper Accepted at a Peer-Reviewed Venue
AI Scientist-v2, developed by Sakana AI, uses an agentic tree-search loop to autonomously propose hypotheses, design experiments, analyze results, and author complete research papers without any human contribution at the writing stage. A paper fully generated by the system has been accepted at a major peer-reviewed venue, which researchers and journal editors are calling a confirmed first. The result has reopened debate about authorship norms, reproducibility standards, and what peer review certifies when the work was not written by a human researcher.
Read source →
TurboQuant: Reducing Codebook Memory Overhead in Vector Quantization (ICLR 2026)
Presented at the International Conference on Learning Representations (ICLR) 2026, TurboQuant addresses a practical bottleneck that emerges when Large Language Models (LLMs) are compressed for inference on lower-cost hardware. As quantization becomes standard practice for on-device and cost-constrained deployments, the memory required to store vector quantization codebooks grows into a meaningful fraction of total model memory. TurboQuant introduces an algorithm that reduces codebook storage without significant accuracy degradation, making it directly relevant to on-device deployments such as those enabled by today's LiteRT-LM launch.
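To see how codebook storage grows into a meaningful memory fraction, a back-of-envelope estimate helps. All sizes below are hypothetical illustrations, not TurboQuant's actual configuration:

```python
# Rough vector-quantization codebook memory estimate. A model quantized
# with per-group codebooks must store every codebook's centroids in
# addition to the compressed weight indices. Numbers are hypothetical.

def codebook_bytes(num_codebooks: int, entries_per_codebook: int,
                   vector_dim: int, bytes_per_value: int = 2) -> int:
    """Memory for storing all codebook centroids (fp16 by default)."""
    return num_codebooks * entries_per_codebook * vector_dim * bytes_per_value

# e.g. one 256-entry, 8-dimensional fp16 codebook per weight group,
# with 100,000 groups across the model:
total = codebook_bytes(num_codebooks=100_000, entries_per_codebook=256, vector_dim=8)
print(f"{total / 2**20:.0f} MiB of codebooks")
```

Hundreds of mebibytes of pure codebook overhead is exactly the on-device bottleneck the paper targets: shrinking that term without degrading the centroids' accuracy.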
Read source →