The World of Autonomous Guided Vehicles

The Core Insight

Navigation is Solved. Coordination is Not.

Before any algorithm, you need to understand what AGV systems are actually up against.

The Navigation Problem

Get this robot from A to B

One robot. One destination. An empty map. Dijkstra's algorithm finds the shortest path in milliseconds. This problem is solved. It has been solved since 1959. Every undergraduate computer science student can implement it in an afternoon. Every robotics framework ships it out of the box.

The Coordination Problem

Move everything, efficiently, without gridlock

Fifty robots. Hundreds of tasks. Shared corridors. Shared chargers. Battery levels that decline, tasks that arrive unpredictably, and robots that occasionally fail. Now the paths interact, the assignments compete, and the system can deadlock. This problem is not solved.

Every vendor will show you a robot smoothly avoiding an obstacle. That's navigation, and it's the relatively easy part. What you don't hear often is fifty robots working effectively while competing for three charging stations, while tasks keep arriving. That's coordination. This series is about coordination.

Taxonomy

Not All Robots Are Equal

The term "AGV" is often used loosely. The distinctions matter, because the vehicle type determines the constraints, and the constraints determine which algorithms apply.

🏭

AGV

Fixed-path

🤖

AMR

Free-ranging

🚂

Tugger / Train

High-volume lanes

📦

ASRS

Rack-bound

Autonomous Guided Vehicle (AGV): The original

NavigationMagnetic tape, wire, or laser reflectors

Path flexibilityFixed routes, predefined at installation

Speed0.5 – 2.5 m/s typical

PayloadUp to tens of tonnes

Deployment era1950s to present

Map changesRequire physical modification

The classic AGV follows fixed infrastructure: wire embedded in the floor, magnetic tape, or laser reflectors mounted on walls. The routing graph is fixed at installation. This is a constraint, but also a guarantee: path conflicts reduce to traffic management on a known topology, which is far easier to reason about than free-ranging motion. Classic AGVs dominate heavy-industry settings (automotive plants, paper mills, hospitals) where routes are stable and payloads are large. The planning problem is well-defined because the map is fixed. What varies is the schedule and the traffic.

Decision Architecture

Decisions Across Four Time Horizons

AGV systems make decisions across four time horizons. The layers are coupled, not independent. A decision made at one layer creates constraints that the layers below must absorb. Getting the diagnosis right means knowing which layer the root cause lives in, not just where the symptom appears. Higher layers can optimise more broadly but must commit before conditions are fully known. Lower layers react to current state but with limited reach.

Layer	Typical decisions	Information inputs	Failure mode
Strategic Months – years	How many robots, chargers, lanes? Where to place infrastructure?	Business forecasts, sales history, SKU growth projections Noisy: assumption-heavy, high latency	Under-provisioning. The system hits a ceiling no algorithm can optimise around.
Tactical Days – weeks	Zone layout, traffic rule design, charger placement, shift structure	Historical flow rates, seasonal patterns, shift schedules Known shape, uncertain timing	Congestion. Rules designed for 10 robots fail when 50 are deployed.
Operational Minutes – hours	Assign task to robot, dispatch order, charge scheduling	Real-time Warehouse Management System (WMS) order pool Orders are real, deadlines are fixed	Sub-optimisation. Chasing one hot order while starving the rest of the fleet.
Real-time Seconds	Conflict-free paths, deadlock avoidance, local replanning	LiDAR, encoders, Inertial Measurement Units (IMUs), Vehicle-to-Vehicle (V2V) comms Sensor ground truth: current state only	Local optima. Avoiding a collision now can cause a deadlock 10 seconds later.
Recovery On-event	Robot failure, blocked path, missed deadline, unexpected obstacle	Fault signals, sensor anomalies, timeout alerts High certainty: something has already gone wrong	Escalation failure. Automated recovery loops without resolution, compounding the original fault.

The layers interact through a simple but consequential structure: upstream layers define constraints, downstream layers absorb the consequences. A strategic miscalculation, say too few chargers, becomes a constraint the operational layer cannot optimise around. Robots queue for charging slots. The operational layer degrades, and that pressure propagates into real-time performance: longer waits generate downstream path conflicts. The failure surfaces as a routing problem. The root cause is a capacity planning decision made months earlier. Getting the architecture right at each layer is what separates a working deployment from a permanent incident. The rest of this series addresses each layer in turn.

The Anatomy

Three Things Every AGV System Has

Every AGV deployment, from a single robot in a lab to a thousand-vehicle warehouse, is defined by the same three components. Understanding them precisely is the prerequisite for choosing the right solutions.

🚗

The Fleet

Number of vehicles and their types
Battery capacity and charge rate
Maximum speed and turning radius
Payload capacity and physical dimensions
Current position, orientation, and operational state
Sensor suite and localisation accuracy

🗺️

The Environment

Map topology: graph, grid, or continuous
Static obstacles: walls, racks, pillars
Dynamic obstacles: humans, forklift traffic
Charging station locations and count
Speed zones and one-way corridors
Communication dead zones

📋

The Mission Stream

Task type: pickup, delivery, inspection
Origin and destination locations
Priority and time windows
Arrival pattern: batch, online, or mixed
Order cancellation and modification rate
Dependencies between tasks

Core Problems

Six Problems Inside Every AGV System

Every AGV deployment is solving at least six distinct problems simultaneously. Each has its own difficulty structure, its own solutions, and its own failure modes.

🗺️

Path Planning

Single-robot · Static map

One robot, one destination, a static map: Dijkstra's algorithm finds the optimal path in polynomial time. This problem is fast to solve and well-understood.

The challenge is not finding a path, but executing it while the world changes. Other robots move, humans cross corridors, loads shift. A path computed at t=0 may be invalid by the time execution begins.

Most real systems combine offline shortest-path planning on a static topology with online local replanning. The offline pass gives a route; the online layer handles everything that moved since.

This is still a single-robot problem. The difficulty begins when paths interact.

In production: the topology graph is often built manually. Getting it right (one-way corridors, turn costs, weight limits by route) takes a lot of time before a robot moves. And when paths fail in the field, the algorithm is rarely the cause. Most failures trace back to localisation: the robot knows where it wants to go, but not where it is.

🔀

Multi-Agent Path Finding

All robots · Conflict-free

Unlike single-robot planning, the problem is to assign a path to every robot such that no two robots occupy the same space at the same time.

This problem scales poorly with robot count and conflict density. In worst-case formulations, exact solutions become intractable as fleet size grows. In practice, solvers perform well for small fleets and low-conflict environments. As scale increases, one can trade off a small loss in solution quality for speed using a suboptimal approach.

At warehouse scale, optimality is usually the wrong target. Task streams have enough variance that a fast, slightly suboptimal plan outperforms an optimal plan that arrives too late. Tune for latency, not solution quality.

In practice: most production systems never run a full MAPF solver. Priority rules (higher-priority robot has right of way) are simpler, faster, and sufficient for most warehouse densities. MAPF becomes relevant when density is high enough that priority rules cause too many robots to wait.

📌

Task Assignment

Robots × Tasks · Online

n tasks, k robots: minimise total travel cost. The offline batch case can be solved efficiently. In practice, the problem is online: tasks arrive continuously, robots become available unpredictably, and some tasks have time windows. Every assignment commits capacity that future tasks may need, and no exact solution stays optimal once the system starts changing.

Planning two or three steps ahead is possible but brittle. Uncertainty in arrivals, availability, and travel times quickly invalidates any lookahead plan. Most mature systems therefore assign the next task and re-optimise continuously as tasks complete and new ones arrive: not elegant, but robust to the variability of real task streams.

In practice: the hardest part of task assignment is the integration layer: tasks come from a Warehouse Management System (WMS) or Enterprise Resource Planning (ERP) system with its own priorities and cancellations. The assignment logic is rarely the bottleneck.

🚦

Traffic Management

Deadlock · Contention · Flow

Deadlock is the silent killer of AGV systems. It occurs when a set of robots form a cycle of mutual waiting: four robots, each waiting for the one ahead to clear, forming a closed loop. Nobody moves. The system halts.

Deadlock is a resource allocation failure.

Detecting deadlock is easy: find cycles in the wait-for graph. Recovering from it requires intervention: automated recovery or, in some cases, manual robot relocation. Preventing deadlock by design is much harder. It requires routing decisions that prevent cycles from forming.

Most real deployments use zone-based control (one robot per zone at a time), sacrificing throughput for the guarantee of no gridlock. The alternative, deadlock-free routing, requires careful design of the network topology.

The VDA 5050 standard (Verband der Automobilindustrie 5050) is the emerging protocol for AGV communication. It standardises how fleet management software sends routing and zone-reservation commands, a critical piece of real deployments.

⚡

Charging Strategy

Energy · Scheduling · Prediction

A robot that dies mid-aisle does not just stop. It blocks every robot behind it. Charging is therefore a constraint, not an afterthought. Threshold-based strategies (charge when battery drops below 20%) are simple and predictable, but produce a charging stampede at shift boundaries when all robots hit the threshold simultaneously. Predictive strategies estimate future workload to schedule charging proactively, but require a model of the task stream that typically takes months of operational data to calibrate. The hybrid approach (threshold with workload-aware priority override) is where most mature systems land.

The non-obvious constraint: charger count is often the true system bottleneck, not robot count. Adding robots without adding chargers creates contention at the charging stations that eliminates the throughput gain.

📊

Fleet Sizing

Capacity · Queuing · Variance

Simulation-based

Fleet sizing starts with a simple formula. Take average task duration, multiply by task rate, then divide by target utilisation. This underestimates by 30 to 50% in almost every real deployment. The missing term is variance. Queueing theory, specifically the M/M/c model, shows that as utilisation approaches capacity, wait times grow non-linearly. A system running at 85% utilisation does not have 1.7x the wait time of one running at 50%. It may have 5x or 10x, because variance in arrival and service times creates queues that clear slowly. The honest answer to "how many robots do I need?" is to run a discrete-event simulation with realistic variance before committing.

Rule of thumb: size for 70–75% utilisation target, not 90%. The throughput difference is small. The stability difference is enormous, especially when task arrival variance spikes during peak demand.

Ways of Modelling

Seven Lenses, One System

The model you choose constrains every solution you can reach. Experienced practitioners layer models, using different representations for different parts of the problem. None of these is "the right model." Each makes different trade-offs.

Graph Models

Directed weighted graph · Zone-based · Time-expanded

The graph model is the workhorse of industrial AGV systems. The environment is abstracted as a directed graph: nodes are locations (intersections, pick stations, chargers), edges are traversable segments with travel time and capacity attributes. Path planning is shortest-path search. Traffic management is resource reservation on edges and nodes. The key variant is the time-expanded graph, where each node is replicated for every timestep. A robot's path then becomes a path through this 3D structure (space × time), and conflicts become impossible by construction. Time-expanded graphs make Conflict-Based Search (CBS)-style conflict resolution explicit in the model itself.

Strengths

Matches how real facilities are laid out
Path planning is fast: Dijkstra on known topology
Easy to reason about zones, one-way corridors
Well-supported in most fleet management systems like Open Robotics Middleware Framework (OpenRMF)

Limitations

Graph must be manually constructed and maintained
Doesn't handle free-ranging AMRs naturally
Time-expanded variant grows large quickly (space × time)
Continuous motion (turning radius) requires approximation

Objectives

The Objective Is a Choice

Every AGV system is optimising something. The hard part is that the obvious objectives conflict, and which one you sacrifice is a product decision, not an algorithm one.

Throughput & Efficiency

Tasks per hour: the primary throughput metric for most warehouses
Makespan: time to complete a fixed batch of tasks. Dominant in manufacturing
Total travel distance: proxy for time wasted on movement rather than work
Utilisation rate: fraction of time each robot is doing productive work

Cost & Resources

Fleet size: capital cost. The output of fleet sizing is a lower bound, not a target
Energy consumption: battery cost, charger infrastructure, grid load
Infrastructure footprint: number of chargers, zone density, lane layout
Charging interruptions: how often robots are pulled from tasks to charge

Resilience & Robustness

Deadlock rate: how often the system halts and requires intervention
Replanning latency: how quickly the system recovers when a robot fails or a path is blocked
System availability: uptime fraction. Often the contractual metric with the customer
Graceful degradation: does losing one robot reduce throughput by 1/n, or cause a cascade?

These clusters pull against each other. Pushing utilisation toward capacity is efficient on paper, but queueing theory shows wait times blow up non-linearly as you approach the limit. Throughput collapses before the robots run out of work. Minimising fleet size raises utilisation, which raises wait times, which kills throughput. Optimising energy may route robots in ways that increase conflict density. There is no single objective that dominates. Every deployment makes a choice about what to sacrifice, and that choice shapes every algorithm decision downstream.

Open Source Ecosystem

What You Can Actually Use

A survey of the open source stack. For each tool: what it is genuinely good at, where it runs out of road, and what you will have to build yourself.

This is not an exhaustive list. If a tool you use or know about is missing, I'd like to hear about it.

OpenRMF

Open Robotics Middleware Framework: fleet management layer

Free ROS 2

The most complete open source fleet management system available. OpenRMF handles multi-robot task allocation, traffic management, lift and door integration, and provides a REST and WebSocket API for integration with warehouse management systems. Built on ROS 2 (Robot Operating System 2). It models environments as annotated graphs (the "navigation graph") and handles reservation-based conflict avoidance. Not plug-and-play, expect a significant configuration and integration effort, but the architecture is sound and the community is active.

Best for Multi-vendor fleets Fleet coordination VDA 5050 integration

ROS 2 / Nav2

Robot Operating System 2 + Navigation stack

Free C++ / Python

Nav2 is the navigation stack built on ROS 2. It handles everything a single Autonomous Mobile Robot (AMR) needs to move autonomously: map loading, localisation, global path planning (A*, Dijkstra, Smac Planner), local obstacle avoidance (DWB, MPPI), and recovery behaviours. Nav2 is the de facto standard for AMR navigation. Multi-robot coordination requires OpenRMF on top. The Python API (nav2_simple_commander) makes it accessible for prototyping. Production deployments typically use C++ nodes.

Best for Single-robot navigation AMR prototyping Localisation (SLAM)

OR-Tools

Google Operations Research tools

Free Python

Google's optimisation library, and the best open source option for task assignment, routing, and scheduling problems in AGV systems. The linear assignment solver handles the Hungarian algorithm. The Vehicle Routing Problem (VRP) solver handles multi-robot, multi-task routing with time windows and capacity constraints. The CP-SAT constraint programming solver handles complex scheduling with arbitrary constraints. All three have first-class Python APIs. OR-Tools is where you turn when you have a well-defined optimisation problem and need an exact or near-exact solution at offline planning time.

Best for Task assignment Route optimisation Offline scheduling

SimPy

Discrete-event simulation framework

Free Python

SimPy is a Python discrete-event simulation framework and the right tool for answering "what would happen if..." before committing to a deployment. Model your AGV system as a SimPy environment: robots as processes, chargers as resources, tasks as events. Run thousands of simulations with varying parameters and measure throughput, wait times, and battery behaviour. SimPy simulates rather than optimises, but simulation is often the more honest answer to fleet sizing, strategy comparison, and bottleneck identification. If you are not simulating before deploying, you are guessing.

Best for Fleet sizing Strategy comparison Bottleneck analysis

libMultiRobotPlanning

MAPF algorithms: CBS, ECBS, SIPP

Free C++

A clean C++ implementation of the core Multi-Agent Path Finding (MAPF) algorithms: Conflict-Based Search (CBS), Enhanced CBS (ECBS), and Safe Interval Path Planning (SIPP). Primarily a research library, well-structured, well-documented, and a good starting point for understanding how these algorithms behave on real maps. Not production-hardened, but invaluable for experimenting with MAPF and benchmarking your own implementations against published baselines.

Best for MAPF research Algorithm benchmarking CBS prototyping

Gazebo / Isaac Sim

Physics-based robot simulation

Free / Commercial ROS 2

Gazebo (open source, ROS 2 native) and NVIDIA Isaac Sim (commercial, GPU-accelerated) provide physics-based simulation of robots and environments. Unlike SimPy, these simulators model the physical robot, sensors, actuators, collision dynamics. Use them to test navigation stacks before putting a real robot in the warehouse. Isaac Sim adds photo-realistic rendering and domain randomisation for training perception models. Both integrate with Nav2 and OpenRMF. Gazebo is the default starting point; Isaac Sim is worth the complexity if you are training deep learning models on synthetic data.

Best for Navigation testing Sensor simulation Pre-deployment validation

What does not exist yet

Unlike scheduling problems, timetabling, or vehicle routing, there is no dedicated open source library that addresses all aspects of AGV coordination: task assignment, routing, charging, and the decisions that connect them, in a way that lets you design and tune the system as you see fit. That layer does not exist as a reusable tool. If you know of one, I'd like to hear about it. It seems every serious deployment builds it from scratch, using bespoke algorithms or generic optimisation solvers like Gurobi or OR-Tools.

Where AGVs Work

Real-World Deployments

AGV systems operate at scale across very different industries. The algorithms are similar. The dominant constraint in each domain is not. Knowing which problem type your deployment most resembles tells you where to invest your effort.

📦

E-commerce Fulfilment

High-density, high-throughput, constantly changing inventory locations. The canonical AMR use case, pioneered by Kiva Systems (now Amazon Robotics). Robots carry entire shelving pods to a stationary human picker, flipping the traditional model of the human walking to the goods. The dominant challenge is coordinating hundreds of robots in a shared space where every route intersects.

🏭

Automotive Manufacturing

Heavy-payload AGVs moving partially assembled vehicles between production stations. Routes are fixed. Sequences are not. The dominant challenge is synchronisation: a robot that arrives late at a station holds up the entire line. Failure handling is the other critical concern, because there is no slack in a just-in-time production schedule to absorb an AGV that stops mid-floor.

🏥

Hospitals

Medicine, linen, meals, and sterile supplies, all moving through the same corridors as patients and staff. Key constraints: priority overrides for emergency supplies, human-robot interaction in uncontrolled environments, and elevator scheduling across floors.

✈️

Airports and Baggage

Automated baggage handling between check-in, sorting, and aircraft loading. Schiphol, Frankfurt, and Changi operate some of the largest and oldest AGV installations in continuous service. The defining constraint is the hard time window: the plane leaves whether or not the baggage system kept up. Outdoor apron environments add weather variability that indoor deployments never face.

🚢

Container Ports

AGVs move shipping containers between ship-to-shore cranes and the container yard. The Port of Rotterdam's Maasvlakte II terminal has operated over 300 AGVs continuously since 2014, one of the largest fleets in the world. The defining challenge is sequencing: hundreds of vehicles sharing a constrained quay, where a single deadlock can delay a vessel and cascade across the port schedule.

💊

Pharmaceutical Warehousing

Temperature-controlled storage, high-value inventory, and strict batch traceability requirements set by regulators. Throughput demands are modest, but error tolerance is effectively zero. Every move must be logged against a batch record. AGVs must integrate with Enterprise Resource Planning (ERP) systems and the quality management layer. The integration and audit trail challenge is harder than anything on the routing side.

The Hard Parts

What Deployment in Production Teaches You

Academic models assume robots do what they're told, maps don't change, and networks don't drop packets. None of that is true. These challenges rarely make it into textbooks, but they are what actually determine whether a deployment succeeds or fails.

Localisation drift and map staleness Robustness

A robot with a wrong estimate of its own position is more dangerous than a stopped robot. Simultaneous Localisation and Mapping (SLAM) accumulates drift over time: the robot's estimate of its position diverges from reality, slowly and continuously. In a warehouse where shelves are moved seasonally or racks are shifted during the night shift, the stored map can be wrong the moment you load it. You need a strategy for map maintenance (re-mapping) and localisation correction (fiducial markers, infrastructure-based anchors) before you need a fancy path planner. The best algorithm in the world cannot navigate a robot to the right location if the robot does not know where it is.

What is a fiducial marker? A printed pattern (typically an ArUco marker) mounted at known positions on walls or shelves. When the robot's camera recognises one, it knows exactly where that landmark sits in the world and uses it to reset its accumulated drift. Infrastructure-based anchors do the same thing with radio (Ultra-Wideband) or LiDAR reflectors instead of cameras. Think of them as indoor GPS checkpoints: the denser the coverage, the tighter the localisation error stays.

Failure modes and recovery Reliability

Robots fail. The question is not whether but when, and what happens when they do. A robot stopped mid-aisle blocks every robot behind it. A robot that fails mid-load cannot be safely moved without human intervention. A robot that loses network connection mid-task must either continue autonomously (risky) or stop and wait (blocks everything). Most fleet management systems have inadequate answers to these scenarios because failure modes are hard to test systematically. Design for failure first. The most important architectural question is not "how do we assign tasks?" but "what happens when the system is in a degraded state and we need to keep moving?"

The re-optimisation trigger problem Planning

When the world changes (a robot fails, a task is cancelled, a new high-priority task arrives) you face a choice: replan from scratch, or patch the current plan. Replanning from scratch gives you the best plan for the current state, but robots in motion have to be halted or rerouted mid-journey, which has a cost. Patching preserves ongoing commitments but leaves you with a suboptimal plan that accumulates drift over a shift. Too frequent replanning causes thrashing: robots constantly change course and make little forward progress. Too infrequent means you're executing a plan built for a world that no longer exists. The right replanning frequency is a function of task volatility, and it is almost never the same between deployments.

Warehouse Management System integration Integration

In practice, the dominant pain in an AGV deployment is often not routing or assignment. It is integration with the Warehouse Management System (WMS). The WMS owns the task stream. It decides what needs to move, from where, to where, and by when. The AGV fleet management system needs to consume that task stream in real time, report completion, handle exceptions, and reconcile inventory state. Every WMS speaks a different dialect. Every integration is custom. The VDA 5050 standard helps for the robot communication layer, but the WMS interface remains bespoke in almost every deployment. Budget at least as much time for integration as for the robotics.

Human-robot interaction in shared spaces Safety

In the lab, robots operate alone. In the warehouse, they share aisles with forklifts, pickers on foot, and visiting maintenance crews who might not have read the safety briefing. Robots must detect, classify, and respond to humans, stopping, waiting, taking an alternative path. This is harder than it sounds. A robot that stops conservatively for every uncertain sensor reading will have terrible throughput. A robot that is too aggressive is a safety liability. The right policy depends on the space: a pedestrian-only aisle has different requirements than an aisle shared with counterbalance forklifts. Regulatory requirements (ISO 3691-4 for driverless industrial trucks) constrain the design space further. Get a safety engineer involved before you have a working robot, not after.

KPI definition and measurement Operations

Every AGV project reports "robot utilisation" as a key metric. It is almost always the wrong metric. A robot that is 90% "utilised" because it is waiting at a charger is not contributing to throughput. The metric you actually care about is task throughput per unit time, picks per hour, deliveries per shift, pallets moved. Secondary metrics: task wait time (how long between task creation and robot assignment), travel efficiency (distance travelled with load versus without), and system availability (fraction of time the fleet is operational). Define these metrics before deployment. The instrumentation required to measure them is non-trivial, and retrofitting observability to a live system is painful.

The World of
Autonomous Guided Vehicles

Navigation is Solved. Coordination is Not.

Get this robot from A to B

Move everything, efficiently, without gridlock

Not All Robots Are Equal

Autonomous Guided Vehicle (AGV): The original

Autonomous Mobile Robot (AMR): The modern default

Tugger / Train: High-throughput lanes

Automated Storage and Retrieval System (ASRS): Rack-bound

Decisions Across Four Time Horizons

Three Things Every AGV System Has

The Fleet

The Environment

The Mission Stream

Six Problems Inside Every AGV System

Seven Lenses, One System

Strengths

Limitations

The Objective Is a Choice

What You Can Actually Use

Real-World Deployments

E-commerce Fulfilment

Automotive Manufacturing

Hospitals

Airports and Baggage

Container Ports

Pharmaceutical Warehousing

What Deployment in Production Teaches You

The World ofAutonomous Guided Vehicles

Navigation is Solved. Coordination is Not.

Get this robot from A to B

Move everything, efficiently, without gridlock

Not All Robots Are Equal

Autonomous Guided Vehicle (AGV): The original

Autonomous Mobile Robot (AMR): The modern default

Tugger / Train: High-throughput lanes

Automated Storage and Retrieval System (ASRS): Rack-bound

Decisions Across Four Time Horizons

Three Things Every AGV System Has

The Fleet

The Environment

The Mission Stream

Six Problems Inside Every AGV System

Seven Lenses, One System

Strengths

Limitations

The Objective Is a Choice

What You Can Actually Use

Real-World Deployments

E-commerce Fulfilment

Automotive Manufacturing

Hospitals

Airports and Baggage

Container Ports

Pharmaceutical Warehousing

What Deployment in Production Teaches You

The World of
Autonomous Guided Vehicles