Agentic Workflow Modelling

The Evolution Path

Every business process is a hybrid of two kinds of work:

Instruction-ready work — work where you can give a machine explicit instructions: simple if/then rules, algorithmic transformations, data parsing, API calls, analytical tasks.
Intuition-based work — unstructured language, messy human interactions, subjective judgment. For most of computing history, this resisted machine automation entirely. Large Language Models (LLMs) changed that: they can now handle significant portions of it, up to a point.

The goal is not to maximise intelligence in the system. It is to push intelligence to the edges and keep everything between those edges deterministic, repeatable, and auditable. Agentic workflows add value only where deterministic execution is not feasible. Most teams get this wrong, and that is where most initiatives fail.

The Deterministic Boundary Test

Every process has a deterministic skeleton. The task is finding how deep it goes — for some workflows only the top-level sequence is fixed; for others you can push it down to individual field extractions and routing rules. Apply one question to every step:

Can this step be written as explicit code?

If yes, build it as a fixed workflow — use a coding agent (Claude Code, Cursor) to implement the logic if it's complex. The output should be deterministic code, not an LLM deciding at runtime. If no, the step needs judgment or handles inputs that genuinely can't be scripted: use the LLM only there, and keep everything else deterministic.

But even deterministic steps have a boundary. Each one was written for the inputs you anticipated, not the ones you didn't — a malformed field, an unexpected format, an edge case you've never seen. That's where deterministic code fails. If the LLM handles the else branch of each step, the system degrades gracefully instead of breaking. When the LLM catches the same edge case repeatedly, write deterministic code for it and retire the fallback. The boundary expands: the system grows more deterministic over time, not less.

These steps form a natural three-stage path from understanding a process to building it right: Describe, Automate, Fill Gaps.

Example: Customer Support Email Triage

A support email arrives. Here's how to apply the three stages.

Stage 1: Describe

A support email arrives. I need to identify the customer, understand their issue, check their account history, categorise the issue type, and either resolve it directly or pass it to a human with context.

Assess: which steps can be scripted?

Deterministic ✓ → Stage 2: Automate

Parse email: extract sender, subject, body
Look up customer by email in the database
Extract order ID via regex
Fetch order status from the orders API
Route to escalation queue if SLA breached

Needs LLM ✓ → Stage 3: Fill Gaps

Classify issue from free-text (billing? shipping? defect?)
Draft a response tailored to the customer's tone and context
Decide whether an exception (refund, replacement) is warranted

Stages 2 & 3: Build it

Stage 2 (Automate): a coding agent (Claude Code, Cursor) turns the five deterministic steps into reliable pipeline code. Stage 3 (Fill Gaps): the LLM handles the three steps that genuinely can't be scripted — classification, response drafting, and exception judgment.
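As a minimal sketch of what Stage 2 might produce, the Python below covers three of the five deterministic steps: parsing the email, extracting the order ID, and the SLA routing rule. The order-ID pattern (ORD- followed by six digits), the 24-hour SLA, and all function names are assumptions made for illustration. The customer lookup and the orders API call are left out because they depend on systems the text does not describe.

```python
import re
from datetime import datetime, timedelta
from email import message_from_string
from email.utils import parseaddr

ORDER_ID_RE = re.compile(r"\bORD-\d{6}\b")  # hypothetical order-ID format
SLA = timedelta(hours=24)                   # hypothetical SLA window


def parse_email(raw: str) -> dict:
    """Extract sender address, subject, and body from a plain-text email."""
    msg = message_from_string(raw)
    _, sender = parseaddr(msg.get("From", ""))
    return {
        "sender": sender,
        "subject": msg.get("Subject", ""),
        "body": msg.get_payload(),
    }


def extract_order_id(text: str):
    """Return the first order ID found by the regex, or None if there is none."""
    match = ORDER_ID_RE.search(text)
    return match.group(0) if match else None


def choose_queue(received_at: datetime, now: datetime) -> str:
    """Static routing rule: escalate once the SLA window has been exceeded."""
    return "escalation" if now - received_at > SLA else "standard"
```

Each function has one input and one output, so each can be unit-tested on its own before any LLM is involved.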

And at every deterministic step, the boundary matters. What if the email is a forwarded chain or the order ID is in an image? The deterministic code handles the expected case; the LLM catches what falls outside it. When the same edge case repeats, write a parser for it and retire the fallback.

A nuance: some tasks skip Stages 1 and 2 entirely

This framework assumes you're starting from a deterministic process: something with a clear, repeatable structure you first understand, then encode, then generalise. That fits well for invoice processing, customer triage, report generation, data pipelines.

Some tasks were never deterministic to begin with — writing code, research, open-ended planning — and the agent is the only viable approach from the start. The lesson still applies: don't add an agent to something genuinely deterministic, and don't rigidly script something inherently open-ended.

The Generalisation Step

Agentic workflow modelling is fundamentally an exercise in abstraction. We identify common structure, isolate variation points, and apply cognition only where abstraction breaks down.

Once your workflow is working (deterministic steps built and tested, LLM filling the gaps and catching the unknowns), the next question is scope. Can one workflow handle more than one process? Look across your business for similar processes. Where they share the same steps, keep them fixed. Where they diverge, apply the same question from Stage 2: can this difference be handled deterministically (a routing rule, a conditional, a lookup)? If yes, code it. If no, the step genuinely requires judgment that varies by context: that's where the agent decides. One generalised workflow replaces many one-off automations.

The recipe: map multiple similar processes, identify where they diverge, and at each divergence point assess whether the routing can be scripted. If yes, code it. If no, that's where the agent decides.

Implementing This at Scale

Stage 1: Process Decomposition

Strip away the "AI magic" and document the process as raw engineering: inputs, outputs, state changes, and decision trees. The deliverable is a flow diagram or pseudocode that describes the strict sequence of events.

Stages 2 and 3: The Deterministic Boundary

When evaluating the workflow, every step falls into one of three execution types:

Execution Type	When to Use	Tooling
Pure Deterministic Code	Field extraction, API calls, database lookups, static routing	Python, TypeScript (built with a coding agent)
LLM as Core Engine	Unstructured text classification, synthesis, judgment calls that can't be scripted	Structured outputs: JSON mode, Instructor, Pydantic
LLM as Catch-All	The else branch of deterministic steps: malformed inputs, formatting anomalies, edge cases not yet in the deterministic parser	Fallback handlers, telemetry alerts, refactor triggers

The distinction between rows 2 and 3 matters: "LLM as Core Engine" is a planned, permanent use of the LLM for steps that were never deterministic. "LLM as Catch-All" is a temporary fallback for steps that are deterministic but haven't yet covered a specific edge case. The second kind should shrink over time as you harden the boundary.
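The table names JSON mode, Instructor, and Pydantic as tooling for structured outputs. The sketch below uses only the Python standard library to show the underlying idea: the model's answer is accepted only if it parses as JSON and names one of a fixed set of labels. The label set and the issue_type field name are assumptions taken from the triage example, not a prescribed schema.

```python
import json
from enum import Enum


class IssueType(str, Enum):
    BILLING = "billing"
    SHIPPING = "shipping"
    DEFECT = "defect"


def parse_classification(llm_output: str) -> IssueType:
    """Accept the model's answer only if it is valid JSON naming a known label."""
    data = json.loads(llm_output)         # raises ValueError on malformed JSON
    return IssueType(data["issue_type"])  # raises ValueError on an unknown label
```

Anything that fails validation can be retried or sent to a human, so free text never leaks into the deterministic steps downstream.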

The Cost of Cognition

Deterministic and agentic execution have fundamentally different cost profiles. Treat agentic computation as an expensive resource: deploy it only where deterministic execution genuinely cannot do the job.

Execution Type	Cost	Latency	Reliability
Deterministic	Lowest	Lowest	Highest
Agentic	Highest	Highest	Variable

Every step you keep deterministic is a step you run faster, cheaper, and with predictable output. The LLM budget goes further when it is spent only where code and rules genuinely cannot do the job.

Engineering the Expanding Boundary

The hardening loop has a concrete engineering implementation: wrap every deterministic step in a try/except block, route failures to an LLM handler, log every trigger, and refactor when a pattern repeats.

Prompt pattern for the exception handler:

"The deterministic parser failed to extract the Order ID from this input. Analyse the string, extract the ID, and label the structural anomaly format so it can be added to the parser."

The label instruction is critical: without it you collect resolved cases but no dataset to refactor from.
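One way the hardening loop could look in Python is sketched below. The parser, the order-ID format, and the handler are hypothetical stand-ins; in particular llm_handler is a stub where a real system would call a model with the prompt above and parse its reply.

```python
import logging
import re
from collections import Counter

logger = logging.getLogger("fallback")
anomaly_counts = Counter()  # anomaly label -> number of times the fallback saw it

ORDER_ID_RE = re.compile(r"ORD-\d{6}")  # hypothetical order-ID format


def parse_order_id(text: str) -> str:
    """Deterministic parser: raises ValueError on anything it was not written for."""
    match = ORDER_ID_RE.search(text)
    if match is None:
        raise ValueError("no order ID found")
    return match.group(0)


def llm_handler(text: str) -> dict:
    """Stub for a model call that returns the extracted ID and an anomaly label."""
    return {"order_id": None, "anomaly": "unrecognised_format"}


def extract_with_fallback(text: str, handler=llm_handler):
    """Try the deterministic parser first; route failures to the handler and log them."""
    try:
        return parse_order_id(text)
    except ValueError:
        result = handler(text)
        anomaly_counts[result["anomaly"]] += 1  # dataset for the next refactor
        logger.warning("fallback triggered: %s", result["anomaly"])
        return result["order_id"]
```

When one label in anomaly_counts keeps growing, that is the signal to extend parse_order_id and let the fallback go quiet for that case.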

The Agent Router

When consolidating multiple similar processes into one generalised workflow, every divergence point needs a router. Apply the same assess-first rule: never use an LLM router if a deterministic one will do.

Both routes converge back into a deterministic code pipeline. The LLM router's job is narrow: classify intent into a fixed set of options (structured output, not free text), then hand off. The routing decision itself should never be open-ended.
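A minimal sketch of the assess-first rule for routers, assuming a fixed set of three routes and a handful of keyword rules (both hypothetical). The llm_classify argument stands in for a model call; whatever it returns is checked against the fixed option set before the pipeline continues.

```python
from typing import Callable, Optional

ROUTES = ("billing", "shipping", "defect")  # hypothetical fixed option set

# Hypothetical keyword rules; a real system would derive these from its own data.
KEYWORDS = {
    "invoice": "billing",
    "refund": "billing",
    "tracking": "shipping",
    "broken": "defect",
}


def deterministic_route(text: str) -> Optional[str]:
    """Return a route if a keyword rule matches, otherwise None."""
    lowered = text.lower()
    for keyword, target in KEYWORDS.items():
        if keyword in lowered:
            return target
    return None


def route(text: str, llm_classify: Callable[[str], str]) -> str:
    """Use the deterministic router first; ask the LLM only when no rule matches."""
    choice = deterministic_route(text)
    if choice is None:
        choice = llm_classify(text)
    if choice not in ROUTES:
        raise ValueError(f"router returned an option outside the fixed set: {choice!r}")
    return choice
```

Both branches return a value from the same fixed set, so the code after the router does not need to know which one made the decision.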

One metric to track

As the boundary expands over time, the proportion of inputs hitting the LLM fallback should fall. Track fallback rate (percentage of inputs that reach the LLM catch-all) per deterministic step. If the rate isn't declining after hardening cycles, the refactor loop isn't closing.

Example threshold: if a step's fallback rate exceeds 5% over a rolling 10,000-input window, raise an automated alert. That rate signals structural drift in input formats, not random noise, and warrants an immediate refactor cycle rather than leaving the LLM to absorb it indefinitely.
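The example threshold can be tracked with a small rolling-window counter, sketched here with the standard library. The class name and the choice to alert only once the window is full are assumptions, not part of the original rule.

```python
from collections import deque


class FallbackMonitor:
    """Track the fallback rate of one deterministic step over a rolling window."""

    def __init__(self, window: int = 10_000, threshold: float = 0.05):
        self.events = deque(maxlen=window)  # True = input reached the LLM catch-all
        self.threshold = threshold

    def record(self, used_fallback: bool) -> None:
        self.events.append(used_fallback)

    @property
    def rate(self) -> float:
        return sum(self.events) / len(self.events) if self.events else 0.0

    def should_alert(self) -> bool:
        # Alert only once the window is full, so a few early samples cannot trigger it.
        return len(self.events) == self.events.maxlen and self.rate > self.threshold
```

One monitor per deterministic step keeps the metric attributable: a rising rate points at the specific parser that needs a refactor cycle.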

Workflows vs Chatbots

Both use LLMs, but they're fundamentally different things:

Agentic Workflow

Has a start and an end
The agents decide when to stop
Goal: complete a defined task
Can reflect, iterate, find different paths

Start → Agent A → Agent B → Done

Chatbot

No defined end: runs until user stops
The user decides when to stop
Goal: continuous conversation
Reactive: responds to user input

User ↔ Bot ↔ User ↔ ∞

What Makes Workflows Powerful

Even though workflows have a defined end, they're not rigid. Agents within them can reflect on their work, retry with improvements, and choose different execution paths. Structure plus flexibility is what separates them from both chatbots and fixed-script automation.

The Modelling Playbook

Every workflow has two dimensions: the process (what steps happen and in what order) and the agents (who decides and acts at each node). The diagram below shows what each lens looks like in practice.

The Shift in Thinking

Process-centric asks "what are the steps?" Agent-centric asks "who handles this, and what can they decide?" Start with the process to get the structure. Then identify which nodes need an agent, and leave everything else deterministic.

Design Checklist: How to Spec a Workflow

Deterministic

List all tasks, start to finish
Map sequence + dependencies
Document inputs/outputs per step
Define decision points (if/then rules)
Write execution functions
Visualise as flowchart
Validate + test

Done once. Iterate only if tests fail.

Agentic

Agent capabilities: what each agent can do, decide, and act on
Goal structure: objectives and how success is measured
Environment constraints: boundaries agents must operate within

This is a spec: it describes what you configure. But what makes it agentic is what happens after deployment.

The Operational Cycle: What Makes It Agentic

The spec above is just the starting point. What makes a system agentic is that it runs in a loop: each outcome feeds back and shapes what the agent does next. (This adapting happens within a single run — carrying it across runs needs memory, since the model itself isn't retrained.) For the loop to work, the agent has to remember the run so far: each step's result is collected and passed into the next LLM call. Without that running record, every call starts blank and the feedback has nowhere to land.
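The running record can be as simple as a list that is rebuilt into every prompt. In the sketch below, llm and act are hypothetical callables standing in for a model call and a tool call, and the "DONE" sentinel is an assumed convention for the agent signalling that it has finished.

```python
from typing import Callable, List


def run_agent(
    goal: str,
    llm: Callable[[str], str],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> List[str]:
    """Loop until the model says DONE, feeding every earlier result into each call."""
    history: List[str] = []  # the running record of this run
    for _ in range(max_steps):
        prompt = f"Goal: {goal}\nSo far:\n" + "\n".join(history)
        action = llm(prompt)  # the model sees every earlier result
        if action == "DONE":
            break
        result = act(action)
        history.append(f"{action} -> {result}")
    return history
```

The history lives only inside one call to run_agent, which matches the point above: nothing carries over to the next run unless it is stored somewhere else.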

Why This Matters

A deterministic workflow runs the same way every time. That is the right choice for stable, auditable processes. An agentic workflow adapts with every pass through the loop within a run: outcomes feed back into the next step, so the agent adjusts its reasoning, tries different paths, and gets closer to its goal. Carrying those lessons across runs requires adding memory.

Architectural Blueprinting

Architectural blueprinting is the practice of translating a workflow into a diagram that anyone on the team can read and build from. The conventions below define the shared visual language — you’ll see them applied in the Risk Assessment case study that follows.

Four Rules for Clear Diagrams

1. Standard Symbols

Rectangles for tasks, diamonds for decisions, arrows for flow. Pick a convention and stick to it. Consistency beats creativity in diagrams.

2. Clear Labelling

Do: "Fetch User Preferences"
Don't: "Get Data"
Names should tell you what happens without reading docs.

3. Show Inputs & Outputs

Every task consumes something and produces something. Label your arrows or add data annotations. This reveals dependencies and helps with debugging.

4. Right Granularity

Start high-level ("Process Order"), zoom in when needed ("Check Inventory" → "Process Payment"). Match detail level to your audience: executives and engineers need different diagrams.

Example: Granularity Levels

The Agentic Litmus Test

The agentic litmus test comes down to a single question:

Is the execution path fixed at design time, or determined at runtime?

You can apply this test to an entire workflow or zoom into a single task within it.

The Signal: If a component follows a predefined sequence of steps with well-defined inputs and outputs, it should be deterministic. If the component must decide what information to gather, which tools to use, which actions to take next, or when the task is complete, that is where an agent belongs.
The Reality: A workflow can use AI at every single node and remain entirely deterministic. Conversely, a single autonomous task embedded within an otherwise rigid pipeline can be genuinely agentic.

A useful rule of thumb:

Known process → Workflow

Known goal, unknown process → Agent

Case Study: Risk Assessment

A bank needs to evaluate a loan application and produce a risk decision. To do this, it may need to collect customer data, validate it, calculate a risk score, categorise the risk, and make a final decision.

The question is not what to do. The question is how to structure the work: does every application follow the same fixed sequence, or does the system decide what to investigate based on what it finds?

Version 1: Deterministic

The workflow is defined at design time. Five steps, always in the same order, always the same path.

Both applicants go through the same process. Only the outcome differs.

Applicant A

Collect → Validate → Score → Categorise → Approve

Applicant B

Collect → Validate → Score → Categorise → Reject

This is appropriate when regulations require consistency, inputs are well understood, and every decision must be auditable by the same standard.
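In code, the deterministic version is a fixed tuple of functions applied in order. The sketch below is illustrative only: the field names, the debt-to-income scoring formula, and the 0.4 threshold are all assumptions, not a real credit policy.

```python
from functools import reduce

# All field names, the scoring formula, and the threshold are hypothetical.


def collect(application: dict) -> dict:
    return dict(application)


def validate(data: dict) -> dict:
    if data["income"] <= 0 or data["debt"] < 0:
        raise ValueError("invalid application data")
    return data


def score(data: dict) -> dict:
    return {**data, "score": data["debt"] / data["income"]}


def categorise(data: dict) -> dict:
    return {**data, "category": "low" if data["score"] < 0.4 else "high"}


def decide(data: dict) -> str:
    return "Approve" if data["category"] == "low" else "Reject"


# The path is fixed at design time: same five steps, same order, every time.
PIPELINE = (collect, validate, score, categorise, decide)


def assess(application: dict) -> str:
    return reduce(lambda value, step: step(value), PIPELINE, application)
```

Two applicants with different figures get different outcomes, but both pass through exactly the same five functions.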

Version 2: Agentic

Instead of a fixed sequence, a planner decides what to investigate next based on what it discovers. The path is not set at the start; it emerges from the evidence.

Two applications arrive. The planner chooses a completely different investigation path for each.

Run 1: Property investment

Check location risk

Finding: high flood risk

Check insurance costs

Finding: extremely high

Decision: Reject

Path: Location → Insurance → Decision (2 steps)

Run 2: Development application

Check market conditions

Finding: stable

Check construction costs

Finding: rising rapidly

Check developer financials

Finding: weak balance sheet

Decision: Reject

Path: Market → Cost → Financials → Decision (3 steps)

Different paths. Different number of steps. Same overall goal.

Why This Is Truly Agentic

In the deterministic version, the execution path is known at design time:

Collect → Validate → Score → Categorise → Decision

In the agentic version, only the goal is known at design time:

Produce a risk decision.

Everything else is determined at runtime. The system must figure out:

Which investigations are needed
How many investigations are needed
What order to conduct them in
When enough evidence has been gathered to decide

Those decisions are made by the planner at runtime, based on what each investigation reveals. That is what makes it genuinely agentic.
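By contrast, the agentic version has no fixed list of steps in the code. A rough sketch, assuming the planner is a callable that looks at the findings so far and returns the next investigation, or None when it has enough evidence. In a real system the planner would be an LLM call; the names here are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Findings = Dict[str, str]


def investigate(
    planner: Callable[[Findings], Optional[str]],
    tools: Dict[str, Callable[[], str]],
    max_steps: int = 10,
) -> Tuple[List[str], Findings]:
    """Let the planner pick each investigation based on what has been found so far."""
    findings: Findings = {}
    path: List[str] = []
    for _ in range(max_steps):
        next_check = planner(findings)  # None means "enough evidence, decide now"
        if next_check is None:
            break
        findings[next_check] = tools[next_check]()
        path.append(next_check)
    return path, findings
```

The only things fixed at design time are the loop, the available tools, and the step limit. Which tools run, in what order, and how many of them is decided by the planner on each pass.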

Enterprise Parallel

The same distinction appears in enterprise due diligence. A deterministic approach assigns every acquisition the same four reviews:

Financial → Legal → Security → ESG → Decision

An agentic approach starts with a goal and follows the evidence. The first finding reveals cybersecurity concerns, so the next step is infrastructure. That reveals legacy systems, so the next step is compliance exposure, then third-party vendors. The investigation path emerges from what is discovered, not from a checklist defined before the work began.

The Lesson

The risk assessment case study is a direct illustration of the agentic litmus test. In the deterministic version, the path is fixed at design time. In the agentic version, the system decides what to investigate next, what information to gather, and when enough evidence exists to conclude. Every case may require a completely different investigation path while pursuing the same objective. That is the test.

Real Agentic Patterns

Once a workflow decides its own steps, real patterns emerge. Here are four architectures used in practice:

1. Goal-Setting Loop

The workflow starts with a goal, not a step. It plans tasks, executes them with tools, evaluates results, and loops until the goal is met.

2. Group Chat Pattern (e.g. AutoGen)

A chat manager coordinates multiple agents: here an assistant and a user proxy. AutoGen (a multi-agent framework from Microsoft) popularised this pattern. The assistant works; the proxy acts as a stand-in for the human and judges when the job is done.

3. Worker + Critic (Nested)

A worker generates output, a critic evaluates it, and a user proxy decides when the result is good enough.
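A minimal sketch of the worker and critic loop, with worker and critic as hypothetical callables. Here the critic's approval flag plus a round limit stand in for the user proxy's "good enough" decision, which is a simplification of the pattern described above.

```python
from typing import Callable, Tuple


def refine(
    task: str,
    worker: Callable[[str, str], str],
    critic: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate worker and critic until the critic approves or rounds run out."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = worker(task, feedback)      # the worker sees the last critique
        approved, feedback = critic(draft)  # the critic returns (ok?, critique)
        if approved:
            return draft, round_no
    return draft, max_rounds
```

The round limit matters in practice: without it, a worker and critic that never agree would loop indefinitely.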

4. Crew Pattern (e.g. CrewAI)

A manager receives the goal and delegates to specialised agents, each with its own sub-tasks and tools. Agents can work in parallel, which is key to efficient agentic systems, although parallel execution is opt-in: CrewAI defaults to sequential. The pattern scales to many agents.

Parallel execution warning. When agents run concurrently and write to a shared memory object or state context, their outputs can conflict or overwrite each other. Parallel execution requires a deterministic orchestration layer, such as a directed acyclic graph (DAG) or map-reduce, to synchronise the join before a manager synthesises results. Treat shared state as a critical section: one writer at a time, or use isolated output slots that the manager merges.
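The isolated-slot option can be sketched with Python's standard thread pool. Each agent is a hypothetical callable that returns its own result rather than writing to shared state, and the join collects results in a fixed order before the manager merges them. This is a map-reduce style illustration, not how any particular framework implements it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_parallel(agents: Dict[str, Callable[[str], str]], task: str) -> Dict[str, str]:
    """Run every agent concurrently; each result lands in its own slot."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(agent, task) for name, agent in agents.items()}
        # Deterministic join: results are collected in a fixed (sorted) order.
        return {name: futures[name].result() for name in sorted(futures)}


def manager_merge(outputs: Dict[str, str]) -> str:
    """Single writer: only the manager combines the isolated outputs."""
    return "\n".join(f"{name}: {text}" for name, text in outputs.items())
```

Because no agent touches another agent's slot, there is nothing to lock, and the merged output is the same regardless of which agent finishes first.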

The Common Thread

Every agentic pattern shares three traits: goals are set at runtime (not hardcoded), steps are generated (not predefined), and feedback loops drive iteration (not just retry-on-failure). The differences are in who coordinates: a goal loop, a chat manager, a critic, or a crew manager.

Agent Building Blocks

When modelling a workflow, you're wiring together agents of different types. Seven common ones, ordered from simple to sophisticated:

Choosing the Right Building Block

Match agent type to the job: Direct/Augmented for simple single-step tasks. Knowledge/RAG when accuracy matters more than creativity. Evaluation when quality needs a second pass. Routing when you need to dispatch across specialists. Planning when the steps themselves are unknown.

The Principle

The goal is not to maximise intelligence. The goal is to minimise the amount of intelligence required.

Every successful agentic system is mostly deterministic with carefully placed islands of cognition. The intelligence is not the architecture. It is a component within one.

Lesson Recap

What You Now Know

Conceptual modelling: how to think about complex processes beyond simple linear steps (evolution path, generalisation, the agentic litmus test)
Agent roles: how to define distinct agent types and responsibilities, from direct prompt to action planning, and wire them into workflows
Parallel processing: how agentic patterns like crew managers and parallelisation let multiple agents work simultaneously for efficiency
Deterministic vs agentic: when each approach fits, what makes a workflow truly agentic (runtime planning, dynamic paths, feedback loops), and how to evolve one into the other