The Core Idea
Some requests are too big for one prompt, and you cannot split them up ahead of time either, because you do not know the parts until you have read the request. "Plan a street party", "work out whether our town needs a new library", "turn this messy brief into a launch plan": each breaks into different pieces, and which pieces you need depends entirely on the specific request. The orchestrator-workers pattern is built for exactly this. One agent, the orchestrator, reads the request, decides on the spot what subtasks it needs and who should handle them, hands each subtask to a specialist worker, and finally stitches the workers' results into a single answer.
A lead detective works a case the same way. No two cases are alike, so there is no fixed checklist to follow. The lead sizes up what is in front of them and decides what this case needs: perhaps a forensics sweep, a round of witness interviews, a look through the financial records. Each line of inquiry goes to the right specialist. Then the lead draws every finding together into one theory of the case. The specialists never see the whole picture, and the lead never dusts for prints. Two kinds of role, one coordinating mind.
The Two Roles
The Orchestrator (the coordinator)
Reads the request, breaks it into subtasks at runtime, assigns each to a worker, then synthesises the results. It owns the plan and the final assembly, never the detailed work.
The Workers (the specialists)
Each handles one subtask in its area, and only that. A worker knows nothing of the others and never sees the whole job. Swap in a new kind of worker and the orchestrator can start using it.
The Orchestrator's Two Jobs
Everything the orchestrator does falls into one of two jobs, and they bookend the whole run.
1. Decompose and delegate, at runtime
This is the move that defines the pattern. The orchestrator does not follow a fixed recipe. It looks at the actual request and decides, then and there, what the subtasks are and which specialist each one needs. Ask it to plan a street party and it might call for a caterer, a logistics lead, and someone on publicity. Ask it to research a policy question and it would pick an entirely different set. Because deciding the parts is itself a judgement call, a large language model (LLM) sits at the centre doing the deciding.
2. Synthesise
When the workers report back, the orchestrator combines their separate pieces into one coherent result. This goes well beyond gluing sections together. It reconciles overlaps, fills gaps, puts the material in a sensible order, and produces something whole that answers the original request. The workers each saw a sliver. The orchestrator is the only part of the system that ever holds the complete picture.
Dynamic, Not Static: the Difference from Parallelisation
This pattern can look like parallelisation. In both, several agents work on different parts at the same time. The difference is where the split is decided. With parallelisation you decide the parts in advance and they are the same on every run, the way a cookie cutter presses out the same shape from any dough. Here the orchestrator decides the parts at runtime, fresh for each request, the way our lead detective sizes up each new case from scratch. That single shift, from a fixed split to a decided one, is the whole pattern.
Which to use. Parallelisation is faster, cheaper, and simpler whenever the breakdown is known and fixed. Orchestrator-workers is the one to reach for when the breakdown itself depends on the request and cannot be written down in advance. The price for that flexibility is an extra LLM call to plan, plus the unpredictability that comes with letting a model decide the shape of the work.
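The contrast fits in a few lines. This is a minimal sketch with stand-in functions instead of real model calls: fake_planner is a hypothetical placeholder for the LLM that would decide the split, and its keyword rules exist only so the example runs offline.

```python
# Parallelisation: the split is written down before any request arrives.
FIXED_SUBTASKS = ["summarise", "check facts", "check tone"]


def static_split(request):
    # Same parts on every run, whatever the request says.
    return list(FIXED_SUBTASKS)


def fake_planner(request):
    # Hypothetical stand-in for the orchestrator's LLM call.
    # A real planner would reason about the request; this one matches keywords.
    text = request.lower()
    if "party" in text:
        return ["catering", "logistics", "publicity"]
    if "library" in text:
        return ["usage data", "budget", "resident survey"]
    return ["general research"]


def dynamic_split(request, planner=fake_planner):
    # Orchestrator-workers: the parts are decided per request, at runtime.
    return planner(request)


party = "Plan a street party"
library = "Work out whether our town needs a new library"

print(static_split(party) == static_split(library))    # True: same split every time
print(dynamic_split(party) == dynamic_split(library))  # False: split depends on the request
```

If static_split would do the job for every request you expect, the simpler pattern is probably enough.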
Orchestrator-Workers in Code
Below is the pattern on one request: plan a neighbourhood street party. The orchestrator decides the subtasks, a worker handles each, and the orchestrator synthesises a plan. The blocks form one complete program. Install the openai and python-dotenv packages, set an OPENAI_API_KEY in your environment or in a .env file, and it runs as shown.
Start with the plumbing: a small chat() helper, the request, and a tiny helper that pulls a JSON list out of a model reply even when the model adds stray text around it.
import os, json

from openai import OpenAI
from dotenv import load_dotenv

load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

def chat(user_prompt, system_prompt="You are a helpful assistant.",
         max_tokens=400, temperature=None):
    kwargs = dict(
        model="gpt-4",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        max_tokens=max_tokens,
    )
    if temperature is not None:
        kwargs["temperature"] = temperature
    return client.chat.completions.create(**kwargs).choices[0].message.content

REQUEST = "Plan a neighbourhood street party for about 80 people on a Saturday afternoon."

def _json_list(text):
    """Pull the JSON list out of a reply, tolerating fences or stray prose."""
    start, end = text.find("["), text.rfind("]")
    return json.loads(text[start:end + 1])
The orchestrator's first job, decompose, is a single LLM call that returns a plan as data. It is told to reply with a JSON list of subtasks, each naming a specialist and the task for them. This is the dynamic step: a different request would come back with a different list.
# orchestrator, job 1: decompose at runtime
def plan(request):
    raw = chat(
        user_prompt=f"Request:\n{request}",
        system_prompt=(
            "You are a project lead. Break the request into 3 to 5 independent "
            "subtasks, each owned by one specialist. Reply with ONLY a JSON list, "
            'each item {"specialist": "...", "task": "..."}. No prose, no code fence.'
        ),
        max_tokens=400,
        temperature=0.3,
    )
    return _json_list(raw)
Each worker is one LLM call that takes on the specialist role the orchestrator named and does only its slice. It sees the overall goal for context, but its job is the single task it was handed.
# the worker: one specialist, one subtask
def work(specialist, task, request):
    return chat(
        user_prompt=f"Overall goal: {request}\n\nYour task: {task}",
        system_prompt=(
            f"You are a {specialist}. Carry out your task concisely and "
            "practically, as 4 to 6 short bullet points."
        ),
        max_tokens=300,
        temperature=0.7,
    )
The orchestrator's second job, synthesise, is one more LLM call that takes every worker's output and weaves a single plan. The instruction to integrate rather than stack is what keeps the result coherent instead of a pile of disconnected sections.
# orchestrator, job 2: synthesise the results
def synthesise(request, results):
    bundle = "\n\n".join(
        f"From the {r['specialist']} (on: {r['task']}):\n{r['output']}" for r in results
    )
    return chat(
        user_prompt=f"Original request:\n{request}\n\nSpecialist contributions:\n{bundle}",
        system_prompt=(
            "You are the project lead. Weave the specialist contributions into one "
            "clear, well-structured plan that answers the original request. Add a "
            "short intro, a section per area, and a closing checklist. Do not just "
            "stack the pieces, integrate them."
        ),
        max_tokens=600,
        temperature=0.4,
    )
The run function is the orchestration itself: decompose, send each subtask to a worker, then synthesise. The workers here are independent, so they could run at the same time the way parallelisation does. The difference is that the orchestrator chose them a moment ago rather than you choosing them in advance.
def run(request):
    subtasks = plan(request)
    print(f"[Orchestrator] split the request into {len(subtasks)} subtasks:")
    for s in subtasks:
        print(f" - {s['specialist']}: {s['task']}")
    results = []
    for s in subtasks:
        print(f"[Worker: {s['specialist']}] working...")
        results.append({**s, "output": work(s["specialist"], s["task"], request)})
    print("[Orchestrator] synthesising the final plan...")
    return synthesise(request, results)

if __name__ == "__main__":
    final = run(REQUEST)
    print("\n=== FINAL PLAN ===\n" + final)
Run it and the orchestrator's choices show up first, then the workers, then the woven plan. Notice that nobody hard-coded "caterer" or "logistics". The orchestrator read the request and decided on them:
Sample output:
[Orchestrator] split the request into 5 subtasks:
- Event Planner: Plan the overall event, including schedule, activities, layout
- Caterer: Plan, prepare, and serve food and drinks for 80 people
- Entertainment Coordinator: Arrange music, games, and other entertainment
- Logistics Manager: Permits, street closures, setup and cleanup
- Marketing Specialist: Create and distribute invitations and announcements
[Worker: Event Planner] working...
[Worker: Caterer] working...
[Worker: Entertainment Coordinator] working...
[Worker: Logistics Manager] working...
[Worker: Marketing Specialist] working...
[Orchestrator] synthesising the final plan...
=== FINAL PLAN ===
Introduction: A neighbourhood street party is a great way to build community...
Section 1 - Event design: layout, a Saturday-afternoon schedule, and
activities for all ages, agreed first with a small organising group.
Section 2 - Catering: a crowd-friendly menu for 80 with dietary options,
quantities estimated, suppliers and serving stations arranged.
Section 3 - Entertainment: live music or a DJ, games for children and
adults, with an optional evening film.
Section 4 - Logistics: permits and a road-closure request, a safety plan,
and clear notice to every resident.
Section 5 - Promotion: printed invitations two weeks ahead plus posts on
local community channels, with RSVPs tracked.
Closing checklist: committee meeting, layout and schedule, menu and
suppliers, entertainment booked, permits and closures, invitations sent.
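Because the workers are independent, the sequential loop in run could be swapped for a thread pool. What follows is a minimal sketch rather than part of the program above: run_workers_in_parallel and stub_work are hypothetical names, and stub_work stands in for the real work() function so the example runs without an API key. Threads are a reasonable fit here on the assumption that each worker spends most of its time waiting on a network call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_workers_in_parallel(subtasks, request, work_fn, max_workers=5):
    """Run one worker per subtask on a thread pool, keeping the plan's order."""
    def one(subtask):
        output = work_fn(subtask["specialist"], subtask["task"], request)
        return {**subtask, "output": output}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, even if calls finish out of order.
        return list(pool.map(one, subtasks))


def stub_work(specialist, task, request):
    # Hypothetical stand-in for the lesson's work() so the sketch runs offline.
    return f"{specialist} notes on: {task}"


demo_plan = [
    {"specialist": "Caterer", "task": "Plan food for 80 people"},
    {"specialist": "Logistics Manager", "task": "Arrange permits and setup"},
]
results = run_workers_in_parallel(demo_plan, "Plan a street party", stub_work)
for r in results:
    print(r["specialist"], "->", r["output"])
```

In the real program you would pass work as work_fn. An exception raised inside any worker surfaces when the results are collected, so one failed call still stops the run unless you catch it inside one().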
Where It Goes Wrong
A bad decomposition sinks everything. The orchestrator's split is the foundation the whole run stands on. If it misses a subtask or carves the work badly, no worker can put back what was never asked for, and the synthesis inherits the gap. The decompose step deserves your most careful prompt and your most capable model, because every later step trusts it.
The structured handoff can break. The orchestrator hands its plan to ordinary code as data, here as JSON. Models sometimes wrap that in code fences or add a sentence of explanation, which would crash a naive parser. Read it defensively, as the _json_list helper does, and keep a fallback for when the plan does not come back as valid data at all.
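One way to add that fallback is sketched below. parse_plan is a hypothetical helper, and the choice of fallback (hand the whole request to a single generalist) is an assumption for illustration. Re-asking the model or failing loudly are equally valid choices.

```python
import json


def parse_plan(raw, request):
    """Return a list of {"specialist", "task"} dicts, or a one-item fallback plan."""
    # Assumed fallback: treat the whole request as a single subtask.
    fallback = [{"specialist": "generalist planner", "task": request}]

    start, end = raw.find("["), raw.rfind("]")
    if start == -1 or end <= start:
        return fallback  # no JSON list anywhere in the reply

    try:
        items = json.loads(raw[start:end + 1])
    except json.JSONDecodeError:
        return fallback  # brackets found, but the contents are not valid JSON

    # Keep only items that have the two string fields the workers rely on.
    valid = [
        item for item in items
        if isinstance(item, dict)
        and isinstance(item.get("specialist"), str)
        and isinstance(item.get("task"), str)
    ]
    return valid or fallback


reply = 'Here is the plan:\n[{"specialist": "Caterer", "task": "Plan food"}]\nHope that helps!'
print(parse_plan(reply, "Plan a party"))
print(parse_plan("Sorry, I cannot do that.", "Plan a party"))
```

The first call recovers the list despite the surrounding prose. The second returns the fallback plan, so the run degrades to a single worker instead of crashing.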
Open-ended means unpredictable cost. Because the orchestrator decides how many subtasks to spawn, a sprawling request can balloon into many worker calls, and a confused one into nonsense subtasks. Cap the number of subtasks, sanity-check the plan before running it, and keep an eye on the spend.
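A cap and a basic sanity check might look like the sketch below. check_plan and MAX_SUBTASKS are hypothetical names, the limit of 5 mirrors the "3 to 5" in the planning prompt, and the sketch assumes the plan has already been parsed into a list of dicts. It catches only blank and repeated tasks. Spotting nonsense subtasks would need a human or a second model call.

```python
MAX_SUBTASKS = 5  # assumed budget, matching the "3 to 5" asked for in the planning prompt


def check_plan(subtasks, max_subtasks=MAX_SUBTASKS):
    """Drop blank or duplicate subtasks, cap the count, and report total LLM calls."""
    seen, cleaned = set(), []
    for s in subtasks:
        task = str(s.get("task", "")).strip()
        key = task.lower()
        if not task or key in seen:
            continue  # skip blank tasks and case-insensitive repeats
        seen.add(key)
        cleaned.append(s)

    capped = cleaned[:max_subtasks]
    # One planning call, one call per worker, one synthesis call.
    total_calls = 1 + len(capped) + 1
    return capped, total_calls


plan = [{"specialist": f"Specialist {i}", "task": f"Task {i}"} for i in range(8)]
plan.append({"specialist": "Duplicate", "task": "task 0"})

capped, calls = check_plan(plan)
print(len(capped), calls)  # 5 7
```

Truncating keeps the first subtasks the orchestrator listed, which is a blunt rule. Asking the orchestrator to re-plan within the limit is a gentler alternative at the cost of another call.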
Synthesis can flatten or paper over. Folding many voices into one risks losing detail, or smoothing over a genuine conflict between two workers. If one worker says the budget is tight and another plans a brass band, the synthesiser should surface that tension, not quietly average it away. Ask it to flag disagreements rather than blend them.
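In prompt terms, that request can be an extra instruction appended to the synthesiser's system prompt. The sketch below shows one possible wording. synthesis_system_prompt and the "Open conflicts" heading are assumptions for illustration, and no wording guarantees that the model will comply.

```python
def synthesis_system_prompt(flag_conflicts=True):
    """Build the synthesiser's system prompt, optionally asking it to surface conflicts."""
    prompt = (
        "You are the project lead. Weave the specialist contributions into one "
        "clear, well-structured plan that answers the original request."
    )
    if flag_conflicts:
        # Assumed wording: ask for disagreements to be listed, not averaged away.
        prompt += (
            " If two contributions disagree or cannot both be satisfied, do not "
            "blend them. List each disagreement under a final heading "
            "'Open conflicts', naming the specialists involved."
        )
    return prompt


print(synthesis_system_prompt())
```

A fixed heading also gives ordinary code something to look for: if the final plan contains an "Open conflicts" section, it can be routed to a person before anyone acts on it.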
When to Reach for It
Reach for this pattern when the request is open-ended and its parts are not known until you read it: research questions, planning tasks, anything where each instance breaks down differently. Skip it when the breakdown is fixed and known in advance, where plain parallelisation is simpler, cheaper, and just as good. The test is one question: can you write the list of subtasks before you see the request? If yes, you do not need an orchestrator. If no, this is the pattern that earns its keep.
Lesson Recap
What You Now Know
- The pattern: one orchestrator reads a request, splits it into subtasks at runtime, delegates each to a specialist worker, and synthesises their outputs into one answer
- The two roles: the orchestrator owns the plan and the final assembly, the workers each own one subtask and never see the whole job
- The orchestrator's two jobs: decompose and delegate at the start, synthesise at the end
- Dynamic, not static: the defining trait is that the split is decided per request at runtime, which is why an LLM does the deciding
- The difference from parallelisation: parallelisation uses a fixed split you set in advance, this pattern lets the orchestrator decide the split fresh each time
- Synthesis is real work: the orchestrator integrates the pieces rather than stacking them, and is the only part that holds the whole picture
- Main risks: a poor decomposition dooms the run, the data handoff can break, cost is unpredictable, and synthesis can hide conflicts
- The test for using it: if you cannot write the subtask list before seeing the request, this is the pattern you want