GenAI Radar -- Thursday, May 14, 2026

📡 Industry Signals

What's happening?

risk · Cloud Security Alliance · 4 min

Shadow AI agents are now the largest unaudited attack surface in most enterprises

The enterprise assumption that an AI governance policy controls the deployed agent estate has broken against the data. A May 2026 Cloud Security Alliance study finds that 82% of organisations discovered at least one agent unknown to the security team, with 65% confirming a data-exposure incident; more than 80% of Fortune 500 companies run active agents built with low-code tools, yet only 10% have a management strategy.

Three enterprise controls are missing: an AI agent inventory registering every deployed agent with owner and data-access scope; a model governance policy extended to cover unmanaged agents as a distinct risk class; and procurement controls applied to low-code agent platforms on the same terms as third-party SaaS contracts. Ask your Chief Information Security Officer (CISO): how many agents are running against enterprise data, who owns each, and what is their authorisation?

Why it matters:

Pull the AI agent inventory from your IT department this week. If no formal inventory exists, that confirms the finding. Brief the Technology Committee on the timeline to stand one up, naming a Data Governance or Enterprise Risk owner. Your CISO should be able to confirm within 72 hours which agents are touching regulated or sensitive data.
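
As a rough illustration of what an inventory record needs to hold, the sketch below models one entry per agent with an owner, a data-access scope, and an approval flag, then answers the three CISO questions. The class name, field names, sample agents, and the set of "sensitive" data classes are all hypothetical and stand in for whatever your organisation already uses.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class AgentRecord:
    """One row in a hypothetical AI agent inventory."""
    name: str
    owner: Optional[str]                                  # None if nobody is accountable
    data_scopes: Set[str] = field(default_factory=set)    # data classes the agent can reach
    authorised: bool = False                              # formally approved for use?


# Illustrative set of regulated or sensitive data classes (assumption).
SENSITIVE = {"hr", "finance", "customer_pii"}

inventory = [
    AgentRecord("invoice-bot", "finance-ops", {"finance"}, authorised=True),
    AgentRecord("lunch-poll", "facilities", {"calendar"}, authorised=True),
    AgentRecord("lowcode-helper", None, {"customer_pii"}),
]

# How many agents are running, which have no owner or approval,
# and which touch sensitive data?
agent_count = len(inventory)
ungoverned = [a.name for a in inventory if a.owner is None or not a.authorised]
touching_sensitive = [a.name for a in inventory if a.data_scopes & SENSITIVE]

print(agent_count)          # 3
print(ungoverned)           # ['lowcode-helper']
print(touching_sensitive)   # ['invoice-bot', 'lowcode-helper']
```

Keeping owner and authorisation as explicit fields means a missing value is itself a finding, rather than something that has to be inferred from absent paperwork.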

ships · SAP Newsroom · 4 min

SAP's autonomous ERP launch forces a renegotiation of every ERP services contract

Large-enterprise ERP (Enterprise Resource Planning) implementations have been priced on the assumption that business process exceptions require human decision-making and billable consulting hours. SAP unveiled its Autonomous Enterprise at Sapphire 2026 on May 12, launching SAP Autonomous Suite with 50 domain-specific Joule Assistants across finance, supply chain, procurement, and human resources (HR) that execute processes end to end without human checkpoints.

Three enterprise changes follow: the statement of work (SOW) with any SAP system integrator now needs to be priced by which process steps automation handles rather than by billable hours; the workforce impact assessment for affected roles needs scoping before contract signature; and the model governance policy needs to cover Joule Assistants as deployed AI agents with audit-trail requirements. Pull your current SAP renewal scope and ask your Enterprise Architecture lead: which process steps have Joule Assistants, and is that reflected in the SOW?

Why it matters:

Brief your SAP programme lead this week: map every process domain in scope against SAP's Joule Assistant capability list and return a revised statement of work that prices automation-first. Before signing any renewal or net-new implementation contract, require a written position on which billable hours are displaced by Joule and what the revised all-in cost model looks like.

risk · Adversa AI · 3 min

New GPT-5.4 bypass technique invalidates enterprise red-team baselines for deployed models

Enterprise red-team protocols assume that safety guardrails tested at deployment remain stable; a new attack class breaks that assumption. Adversa AI researchers demonstrated Involuntary In-Context Learning (IICL) against GPT-5.4 in May 2026, embedding adversarial context that bypasses safety guardrails without triggering standard detection; across more than 60 test scenarios, the technique achieved a 60% attacker success rate.

Three enterprise controls need updating: the red-team runbook for any deployed OpenAI model should include IICL-class prompt-injection scenarios; the eval harness for GPT-based production deployments needs IICL coverage before any model version switch; and data loss prevention (DLP) rules applied to AI interfaces need to account for adversarial in-context manipulation, not only static content filters. Ask your CISO: is IICL-class injection covered in your current red-team runbook for GPT-5.x deployments, and when did you last run a live test?

Why it matters:

Update your red-team runbook before the next model version switch. Pull the eval harness test set for any production GPT-5.x deployment and add IICL-class scenarios this sprint. Your CISO should confirm the coverage gap is closed before you sign any OpenAI contract renewal or expand the GPT-5.x footprint into regulated-data environments.

🧠 Models & Tools

What's new?

Google / Vertex AI · 3 min

Gemini 3.1 Pro Preview extends Google's reasoning lead ahead of I/O 2026

Gemini 3.1 Pro Preview is now in limited availability for Gemini Enterprise customers on Vertex AI, arriving two weeks before Google I/O 2026 (May 19). The model is Google's most capable reasoning model in the Gemini 3 series, with enterprise-grade service level objectives (SLOs) in preview, native support for connecting custom Model Context Protocol (MCP) servers for private data access, and expanded context length supporting complex multi-document reasoning. Gemini Enterprise customers gain generally available SLOs for 3.1 Pro while the model itself remains in preview, covering availability and latency commitments. Alongside 3.1 Pro, Google released Gemma 4 open-weight models (gemma-4-26b and gemma-4-31b) via the Gemini API for enterprise teams needing on-premises or private cloud deployment.

What it enables:

Teams evaluating Google as a primary frontier vendor should run Gemini 3.1 Pro Preview against their existing benchmark suite before I/O 2026 announcements on May 19 reset the comparison baseline. The MCP server support is the key enterprise feature: it unlocks governed access to internal knowledge bases, ticketing systems, and data warehouses without routing sensitive data through public APIs.

OpenAI / TechCrunch · 3 min

GPT-5.5 Instant becomes free-tier default, shifting the enterprise cost conversation

OpenAI released GPT-5.5 Instant on May 5 as the new default model for free-tier ChatGPT users, replacing the previous default that lagged frontier reasoning by several generations. GPT-5.5 Instant delivers improved accuracy, reduced hallucination rates, and notably less verbose formatting than its predecessors, changes targeted specifically at business workflows rather than conversational use. The practical implication for enterprise procurement teams is strategic. When employees can access frontier-tier reasoning for free through personal accounts, the business case for per-seat enterprise licensing narrows to two questions, data sovereignty and governance, rather than capability access. Organisations whose primary rationale for ChatGPT Enterprise has been "employees need the best model" should reassess the argument; the best model is now the free model for many workloads.

What it enables:

Use this release as the forcing function for an enterprise AI access audit. Identify which departments are using personal free-tier accounts for business tasks, what data is being routed through those accounts, and whether the current enterprise licensing structure reflects actual governed usage versus shadow access. The audit finding typically shifts the procurement conversation from capability to compliance.

🚀 Applications

What's working?

Enterprise · Microsoft Security Blog · 4 min

Microsoft Agent 365 gives enterprises a single governance pane for every deployed AI agent

Microsoft Agent 365 reached general availability on May 1, priced at USD 15 per user per month (or bundled in Microsoft 365 E7 at USD 99 per user per month). The product provides a centralised registry of every AI agent running in the organisation: Microsoft Copilot Studio agents, Azure AI Foundry agents, third-party agents, and shadow agents detected by Defender and Intune, with real-time visibility into activity, health, and risk signals. Multi-cloud registry sync with Amazon Web Services (AWS) Bedrock and Google Cloud enables IT teams to inventory agents across all three major platforms. Defender integration surfaces unmanaged agents, including locally-installed coding assistants. The product sits directly above the governance gap identified in Signal 1: it is the management layer most enterprises do not yet have, and Microsoft is charging USD 15 per user per month to supply it.

What it proves:

AI agent governance is now a line item, not a process. Chief Information Officers (CIOs) evaluating Agent 365 should request a proof-of-concept against their current environment before purchase: the discovery run alone typically surfaces more unmanaged agents than the security team has on record. Compare the discovery count to your existing AI agent inventory; the delta is the risk exposure number you need for the Technology Committee paper.
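
The comparison between a discovery run and the recorded inventory can be sketched as a set difference. The function name and the sample agent names below are hypothetical; nothing here reflects how Agent 365 itself reports results.

```python
from typing import Dict, Set


def inventory_delta(discovered: Set[str], recorded: Set[str]) -> Dict[str, Set[str]]:
    """Compare agents found by a discovery run with agents on record."""
    return {
        "unmanaged": discovered - recorded,  # running, but not in the inventory
        "stale": recorded - discovered,      # in the inventory, but not observed
    }


# Hypothetical sample data.
discovered = {"invoice-bot", "lowcode-helper", "local-coding-assistant"}
recorded = {"invoice-bot", "retired-chatbot"}

delta = inventory_delta(discovered, recorded)
print(sorted(delta["unmanaged"]))  # ['local-coding-assistant', 'lowcode-helper']
print(sorted(delta["stale"]))      # ['retired-chatbot']
```

Comparing names rather than subtracting counts matters: in this sample the raw counts differ by only one, while two agents are actually unmanaged, because a stale record masks part of the gap.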

Personal · CNBC / Google · 3 min

Google Gemini Intelligence reframes Android as a personal AI system rather than an app launcher

At its Android Show on May 12, Google announced Gemini Intelligence for Android, a system-level AI layer that replaces the traditional app-centric interaction model. Google's Android chief told CNBC the company is "transitioning from an operating system to an intelligence system." Gemini Intelligence integrates across the phone's core functions, including notifications, calendar, email, maps, and third-party apps, responding to natural-language queries and taking multi-step actions across apps without the user switching between them. Google also unveiled Googlebooks, new premium Android-powered laptops from Acer, ASUS, and Lenovo, co-timed with the intelligence layer. The practical implication for personal productivity is straightforward: the interface for a large segment of Android's 3 billion devices shifts from icon grids to conversational AI, with Google I/O 2026 on May 19 expected to deepen the integration further.

Try this:

If you use an Android device, join the Gemini Intelligence beta before Google I/O next Tuesday. The highest-value workflow to test is cross-app action: ask Gemini to read an inbound email, check calendar availability, draft a reply with a proposed meeting time, and send it, end to end, without opening a single app. The result shows whether the intelligence layer has genuinely replaced the app model or is still a wrapper around it.

Developer · Google Cloud / Hugging Face · 3 min

Gemini Enterprise with custom MCP servers unlocks governed private-data access for agent builders

Gemini Enterprise now supports custom Model Context Protocol (MCP) servers, enabling enterprise development teams to connect Gemini agents to private internal systems (knowledge bases, ticketing systems, code repositories, and customer relationship management (CRM) platforms) without routing sensitive data through Google's own APIs. Developers write a single MCP server exposing internal tools and data; the server then works with any MCP-compatible agent framework. Gemini Enterprise handles authentication via OAuth 2.0 and enforces the organisation's existing access controls on every tool call, so the governance model follows the data, not the agent. For teams already running MCP servers for other agent platforms (Claude Code, Goose, ChatGPT), this means Gemini Enterprise joins the same tool ecosystem without a separate integration build.

Try this:

If your team has built MCP servers for internal tools, run a compatibility check against Gemini Enterprise this week; most existing servers require no changes. The value is in benchmarking Gemini 3.1 Pro against your current frontier model on internal-data retrieval tasks with the same tool layer, giving you a true apples-to-apples capability comparison rather than a general benchmark result.
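
To make "access controls on every tool call" concrete, here is a minimal sketch of a handler for a single MCP `tools/call` request. MCP messages are JSON-RPC 2.0, and the request and result shapes below follow the protocol's tool-call format; the tool names, the role model, and the `handle_tools_call` function are hypothetical, and this is not how Gemini Enterprise or any MCP SDK implements it. A real server would use an MCP SDK and also handle transport, initialisation, and tool listing.

```python
import json
from typing import Set

# Hypothetical internal tools: tool name -> (required role, handler).
TOOLS = {
    "search_tickets": ("support", lambda args: "tickets matching %r" % args["query"]),
    "read_payroll": ("hr", lambda args: "payroll summary"),
}


def handle_tools_call(raw_request: str, caller_roles: Set[str]) -> str:
    """Handle one JSON-RPC 2.0 `tools/call` request and return the JSON response."""
    request = json.loads(raw_request)
    params = request["params"]

    if params["name"] not in TOOLS:
        # -32602 is the standard JSON-RPC "Invalid params" error code.
        error = {"code": -32602, "message": "Unknown tool: " + params["name"]}
        return json.dumps({"jsonrpc": "2.0", "id": request["id"], "error": error})

    required_role, handler = TOOLS[params["name"]]
    if required_role not in caller_roles:
        # The access check runs on every call, before the tool executes.
        result = {"content": [{"type": "text", "text": "access denied"}], "isError": True}
    else:
        text = handler(params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}

    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": result})


request = json.dumps({
    "jsonrpc": "2.0", "id": 1, "method": "tools/call",
    "params": {"name": "read_payroll", "arguments": {}},
})
denied = json.loads(handle_tools_call(request, caller_roles={"support"}))
allowed = json.loads(handle_tools_call(request, caller_roles={"hr"}))
print(denied["result"]["isError"], allowed["result"]["isError"])  # True False
```

Because the check sits in the server rather than in the agent, the same rule applies whichever MCP-compatible client makes the call.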

💡 Term of the Day

What does it actually mean?

Authorisation Surface

Governance · Agent Operations

The authorisation surface of a deployed AI agent is the total set of permissions, data-access rights, tool capabilities, and Application Programming Interface (API) scopes that the agent holds at any point in production. It encompasses every data store the agent can read or write, every external service it can call, every action it can take without further human approval, and every downstream system it can reach through chained tool calls. Managing the authorisation surface is the operational discipline of ensuring an agent can do only what it was procured, tested, and approved to do. In a large enterprise, the authorisation surface of a single agent may span financial records, HR data, customer information, and internal messaging systems simultaneously, even if no single human employee would hold all those access rights at once. The surface grows with each new integration, and most enterprises have no process for auditing it after initial deployment.

Often mistaken for:

Most practitioners confuse the authorisation surface with the API key list for an agent's integrations. The key list is a necessary but insufficient description. The full authorisation surface includes everything reachable through chained calls: if an agent can call a calendar API, which can access meeting notes, which link to a shared drive, the agent's effective authorisation surface includes that drive even if it was never granted direct access. The second common misreading is treating the surface as static after deployment. In practice, it grows continuously as teams add new integrations and tool extensions, often without any review. Shadow AI compounds this: agents built by individual teams with low-code platforms accumulate permissions informally, with no centralised record. An agent inventory (see Signal 1) is the precondition for knowing your authorisation surface; you cannot govern what you have not mapped.
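
The chained-call example above (calendar API, then meeting notes, then shared drive) amounts to graph reachability. The sketch below computes an agent's effective authorisation surface with a breadth-first search; the graph contents and the `effective_surface` function are illustrative assumptions, not a description of any particular product.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical access graph: each key can reach the resources listed.
ACCESS: Dict[str, List[str]] = {
    "agent": ["calendar_api"],
    "calendar_api": ["meeting_notes"],
    "meeting_notes": ["shared_drive"],
    "shared_drive": [],
}


def effective_surface(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every resource reachable from `start` through chained access."""
    seen: Set[str] = set()
    queue = deque(graph.get(start, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        queue.extend(graph.get(node, []))
    seen.discard(start)  # a cycle back to the agent is not a new resource
    return seen


direct = set(ACCESS["agent"])                  # what the key list shows
effective = effective_surface(ACCESS, "agent")  # what the agent can actually reach
print(sorted(direct))               # ['calendar_api']
print(sorted(effective - direct))   # ['meeting_notes', 'shared_drive']
```

The gap between the two printed sets is the part of the surface that a review of API keys alone would miss.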

⚠️ Safety & Policy

What's being governed?

Safety · ISO / ISACA · 3 min

ISO 42001 certification gains enterprise traction as the operational AI governance benchmark

The International Organization for Standardization (ISO)/International Electrotechnical Commission (IEC) 42001:2023 standard, the world's first international AI management system standard, is seeing accelerating enterprise certification activity in 2026. Hudson Talent Solutions achieved ISO 42001 certification, announced May 13, citing it as the operational framework for governing AI auditability, accountability, and third-party risk across its AI deployments. ISACA's concurrent research finding that AI adoption is accelerating faster than response capability gives the certification momentum a practical urgency: the standard maps directly to the internal audit, risk committee, and board-level assurance questions that enterprise governance teams are now receiving from regulators and clients. ISO 42001's seven control domains, which include AI policy, risk management, data governance, internal audit, and continuous improvement cycles, match those that enterprise risk management frameworks apply to other technology categories.

What it signals:

Request a gap analysis against ISO 42001 from your internal audit team before the next Audit Committee cycle. The standard's seven control domains map cleanly to existing enterprise risk frameworks, which means the lift for an ISO 27001-certified organisation is narrower than it appears. The audit-readiness argument for your Risk Committee: ISO 42001 certification is now a credible signal to regulators, clients, and Works Councils that AI governance is operational rather than aspirational.

Policy · EU / Help Net Security · 3 min

EU AI Act full enforcement arrives August 2, 2026 with only 37% of enterprises ready

The European Union (EU) AI Act becomes fully applicable on August 2, 2026, roughly 80 days from today. Under full enforcement, high-risk AI systems (covering hiring, credit scoring, biometric identification, critical infrastructure, law enforcement, and education) must have completed conformity assessments, registered in the EU database, and implemented the required human oversight, logging, and documentation controls. A Help Net Security survey published May 14 finds only 37% of organisations have a formal AI governance policy in place, leaving 63% of enterprises exposed to compliance risk. The Omnibus simplification agreed May 7 adjusted some deadlines and thresholds for smaller providers but did not change the August 2 date for enterprises deploying high-risk systems. The enforcement body in each EU member state will be operational by that date.

The compliance angle:

Map your AI system inventory against the EU AI Act high-risk categories before June 1. Any system touching hiring, performance evaluation, credit decisions, or access to essential services in EU jurisdictions is in scope and needs a completed conformity assessment, not just a policy document. Your General Counsel and Data Protection Officer should jointly own the registry; if that joint ownership has not been formalised, do it this week.
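
The timeline and the scoping rule above can be checked with a few lines of arithmetic. In this sketch the dates come from this issue, while the sample systems and their tags are hypothetical; the use-case list mirrors the wording above and is not a legal classification.

```python
from datetime import date

issue_date = date(2026, 5, 14)        # date of this issue
mapping_deadline = date(2026, 6, 1)   # suggested inventory-mapping date
fully_applicable = date(2026, 8, 2)   # EU AI Act full applicability

days_to_mapping = (mapping_deadline - issue_date).days
days_to_enforcement = (fully_applicable - issue_date).days
print(days_to_mapping, days_to_enforcement)  # 18 80

# Use cases named above as in scope (illustrative, not legal advice).
HIGH_RISK_USES = {
    "hiring",
    "performance evaluation",
    "credit decisions",
    "access to essential services",
}

# Hypothetical inventory: system name -> (use case, deployed in the EU?)
systems = {
    "cv-screener": ("hiring", True),
    "meeting-summariser": ("note taking", True),
    "loan-scorer": ("credit decisions", False),
}

in_scope = sorted(
    name for name, (use, in_eu) in systems.items()
    if in_eu and use in HIGH_RISK_USES
)
print(in_scope)  # ['cv-screener']
```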

📄 Research Papers

What's being researched?

arXiv 2605.13779 · 4 min

MinT: serving millions of fine-tuned model variants over shared base models cuts handoff steps by 18x

MinT (MindLab Toolkit) introduces a managed infrastructure model for training and serving millions of Low-Rank Adaptation (LoRA) policies over a small number of expensive shared base models. Instead of materialising each fine-tuned policy as a full merged model checkpoint, MinT keeps the base model resident in memory and moves only the exported LoRA adapter through its lifecycle: rollout, update, export, evaluation, serving, and rollback. Key measurements from the paper: adapter-only handoff reduces the step count by 18.3x on a 4-billion-parameter dense model and 2.85x on a 30-billion-parameter MoE (Mixture of Experts) model; the system supports catalogs of 10^6 (1 million) addressable policies on a single deployment; and concurrent multi-policy training shortens wall time by 1.77x. For enterprise teams running multiple fine-tuned model variants (by business unit, language, or task domain) over a shared foundation model, MinT's architecture cuts both compute cost and the operational complexity of managing large adapter catalogs.

If this holds:

Teams maintaining more than five fine-tuned variants of the same base model should benchmark MinT's adapter-serving architecture against their current full-checkpoint approach. The 18x handoff reduction is the headline, but the operational benefit is the shared base model: instead of running N separate model instances, you run one base model and swap adapters. At enterprise scale, that changes the inference budget line materially.
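
The "one base model, swapped adapters" idea rests on the standard LoRA formulation, in which a layer's output is the base projection plus a low-rank correction: y = W x + scale * B (A x). The toy sketch below shows two policies sharing one untouched base matrix. It is a generic illustration of LoRA, not MinT's implementation; the adapter names, matrices, and sizes are made up.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def lora_forward(base_w, adapter, x):
    """Compute y = W x + scale * B (A x) without modifying W."""
    a, b, scale = adapter
    base_out = matvec(base_w, x)
    delta = matvec(b, matvec(a, x))
    return [y + scale * d for y, d in zip(base_out, delta)]


# One shared base weight matrix, kept resident (2x2 identity for illustration).
BASE_W = [[1.0, 0.0],
          [0.0, 1.0]]

# Hypothetical catalog of rank-1 adapters: policy name -> (A, B, scale).
ADAPTERS = {
    "finance": ([[1.0, 0.0]], [[1.0], [0.0]], 0.5),
    "hr":      ([[0.0, 1.0]], [[0.0], [1.0]], 1.0),
}

x = [2.0, 3.0]
outputs = {name: lora_forward(BASE_W, adapter, x) for name, adapter in ADAPTERS.items()}
print(outputs)  # {'finance': [3.0, 3.0], 'hr': [2.0, 6.0]}

# Why adapters are cheap to move: parameters for one d x d layer at rank r.
d, r = 4096, 8
full_params = d * d           # merged checkpoint for this layer
adapter_params = 2 * d * r    # A is r x d, B is d x r
print(full_params // adapter_params)  # 256
```

Swapping policies here means choosing a different small `(A, B, scale)` tuple while `BASE_W` stays in place, which is the property that makes large adapter catalogs practical over a single resident base model.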

arXiv 2605.12411 · 3 min

Predicting AI agent decisions from limited interactions opens a path to lightweight governance audits

This paper demonstrates that it is possible to predict the decision-making patterns of an AI agent from a small number of observed interactions, using text-based representations of the agent's behaviour. The finding has a direct governance implication: you do not need to log and audit every agent action to characterise the agent's behavioural envelope. A structured sample of interactions, processed through the paper's classification framework, produces a reliable model of which action classes the agent will and will not take. For enterprise teams tasked with auditing deployed agents against their approved scopes, this approach offers a viable alternative to exhaustive interaction logs, which are costly to store and difficult to interpret at scale. The paper validates the method across multiple agent architectures and task types, with classification accuracy holding above the threshold needed for an internal audit use case.

If this holds:

Enterprise governance and internal audit teams should request a proof-of-concept from their AI platform team: apply the paper's sampling methodology to one deployed agent in a controlled environment and compare the predicted behavioural envelope against the agent's documented scope. If the method confirms the approved scope is not being exceeded, it becomes a lightweight ongoing audit mechanism that is cheaper than full-log review at scale.
