🤖 Robotics · Operations Research · Systems Engineering

The World of
Autonomous Guided Vehicles

From warehouse floors to airport tarmacs: how mathematics moves fleets of robots through constrained, contested space without stopping, colliding, or deadlocking.

Navigation is Solved. Coordination is Not.

Before any algorithm, you need to understand what AGV systems are actually up against.

The Navigation Problem

Get this robot from A to B

One robot. One destination. An empty map. Dijkstra's algorithm finds the shortest path in milliseconds. This problem is solved. It has been solved since 1959. Every undergraduate computer science student can implement it in an afternoon. Every robotics framework ships it out of the box.

The Coordination Problem

Move everything, efficiently, without gridlock

Fifty robots. Hundreds of tasks. Shared corridors. Shared chargers. Battery levels that decline, tasks that arrive unpredictably, and robots that occasionally fail. Now the paths interact, the assignments compete, and the system can deadlock. This problem is not solved.

Every vendor will show you a robot smoothly avoiding an obstacle. That's navigation, and it's the relatively easy part. What you rarely see is fifty robots working effectively while competing for three charging stations as tasks keep arriving. That's coordination. This series is about coordination.

Not All Robots Are Equal

The term "AGV" is often used loosely. The distinctions matter, because the vehicle type determines the constraints, and the constraints determine which algorithms apply.

  • 🏭 AGV: Fixed-path
  • 🤖 AMR: Free-ranging
  • 🚂 Tugger / Train: High-volume lanes
  • 📦 ASRS: Rack-bound

Autonomous Guided Vehicle (AGV): The original

[Figure: fixed tape network with stations S1 to S4. An AGV waits when the intersection ahead is occupied; there is no tape elsewhere to follow, so no autonomous rerouting.]

  • Navigation: Magnetic tape, wire, or laser reflectors
  • Path flexibility: Fixed routes, predefined at installation
  • Speed: 0.5 – 2.5 m/s typical
  • Payload: Up to tens of tonnes
  • Deployment era: 1950s to present
  • Map changes: Require physical modification

The classic AGV follows fixed infrastructure: wire embedded in the floor, magnetic tape, or laser reflectors mounted on walls. The routing graph is fixed at installation. This is a constraint, but also a guarantee: path conflicts reduce to traffic management on a known topology, which is far easier to reason about than free-ranging motion. Classic AGVs dominate heavy-industry settings (automotive plants, paper mills, hospitals) where routes are stable and payloads are large. The planning problem is well-defined because the map is fixed. What varies is the schedule and the traffic.

Decisions Across Four Time Horizons

AGV systems make decisions across four time horizons, plus an event-driven recovery layer that fires whenever something goes wrong. The layers are coupled, not independent. A decision made at one layer creates constraints that the layers below must absorb. Getting the diagnosis right means knowing which layer the root cause lives in, not just where the symptom appears. Higher layers can optimise more broadly but must commit before conditions are fully known. Lower layers react to current state but with limited reach.

Strategic (months – years)
  • Typical decisions: How many robots, chargers, lanes? Where to place infrastructure?
  • Information inputs: Business forecasts, sales history, SKU growth projections. Noisy: assumption-heavy, high latency.
  • Failure mode: Under-provisioning. The system hits a ceiling no algorithm can optimise around.

Tactical (days – weeks)
  • Typical decisions: Zone layout, traffic rule design, charger placement, shift structure
  • Information inputs: Historical flow rates, seasonal patterns, shift schedules. Known shape, uncertain timing.
  • Failure mode: Congestion. Rules designed for 10 robots fail when 50 are deployed.

Operational (minutes – hours)
  • Typical decisions: Assign task to robot, dispatch order, charge scheduling
  • Information inputs: Real-time Warehouse Management System (WMS) order pool. Orders are real, deadlines are fixed.
  • Failure mode: Sub-optimisation. Chasing one hot order while starving the rest of the fleet.

Real-time (seconds)
  • Typical decisions: Conflict-free paths, deadlock avoidance, local replanning
  • Information inputs: LiDAR, encoders, Inertial Measurement Units (IMUs), Vehicle-to-Vehicle (V2V) comms. Sensor ground truth: current state only.
  • Failure mode: Local optima. Avoiding a collision now can cause a deadlock 10 seconds later.

Recovery (on-event)
  • Typical decisions: Robot failure, blocked path, missed deadline, unexpected obstacle
  • Information inputs: Fault signals, sensor anomalies, timeout alerts. High certainty: something has already gone wrong.
  • Failure mode: Escalation failure. Automated recovery loops without resolution, compounding the original fault.

The layers interact through a simple but consequential structure: upstream layers define constraints, downstream layers absorb the consequences. A strategic miscalculation, say too few chargers, becomes a constraint the operational layer cannot optimise around. Robots queue for charging slots. The operational layer degrades, and that pressure propagates into real-time performance: longer waits generate downstream path conflicts. The failure surfaces as a routing problem. The root cause is a capacity planning decision made months earlier. Getting the architecture right at each layer is what separates a working deployment from a permanent incident. The rest of this series addresses each layer in turn.

Three Things Every AGV System Has

Every AGV deployment, from a single robot in a lab to a thousand-vehicle warehouse, is defined by the same three components. Understanding them precisely is the prerequisite for choosing the right solutions.

🚗

The Fleet

  • Number of vehicles and their types
  • Battery capacity and charge rate
  • Maximum speed and turning radius
  • Payload capacity and physical dimensions
  • Current position, orientation, and operational state
  • Sensor suite and localisation accuracy
🗺️

The Environment

  • Map topology: graph, grid, or continuous
  • Static obstacles: walls, racks, pillars
  • Dynamic obstacles: humans, forklift traffic
  • Charging station locations and count
  • Speed zones and one-way corridors
  • Communication dead zones
📋

The Mission Stream

  • Task type: pickup, delivery, inspection
  • Origin and destination locations
  • Priority and time windows
  • Arrival pattern: batch, online, or mixed
  • Order cancellation and modification rate
  • Dependencies between tasks

Six Problems Inside Every AGV System

Every AGV deployment is solving at least six distinct problems simultaneously. Each has its own difficulty structure, its own solutions, and its own failure modes.

🗺️
Path Planning
Single-robot · Static map
[Figure: a path from start S to goal G around a wall]

One robot, one destination, a static map: Dijkstra's algorithm finds the optimal path in polynomial time. This problem is fast to solve and well-understood.
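As a concrete reference point, here is a minimal sketch of Dijkstra's algorithm on a small waypoint graph. The graph, its node names, and the edge weights (read them as travel times in seconds) are made up for illustration; a real topology would come from the facility map.

```python
import heapq

def dijkstra(graph, start, goal):
    """Return (cost, path) for the cheapest route, or (inf, []) if unreachable."""
    dist = {start: 0.0}
    prev = {}
    frontier = [(0.0, start)]
    visited = set()
    while frontier:
        d, node = heapq.heappop(frontier)
        if node in visited:
            continue                      # stale queue entry
        visited.add(node)
        if node == goal:
            break
        for nxt, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = node
                heapq.heappush(frontier, (nd, nxt))
    if goal not in dist:
        return float("inf"), []
    path = [goal]
    while path[-1] != start:              # walk predecessors back to the start
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path

# Hypothetical topology: GRAPH[from][to] = travel time in seconds.
GRAPH = {
    "S": {"A": 4, "B": 2},
    "A": {"G": 5},
    "B": {"A": 1, "C": 7},
    "C": {"G": 1},
}
cost, path = dijkstra(GRAPH, "S", "G")
```

On this graph the detour through B beats the direct edge from S to A, which is the kind of result turn costs and one-way corridors produce on real maps.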

The challenge is not finding a path, but executing it while the world changes. Other robots move, humans cross corridors, loads shift. A path computed at t=0 may be invalid by the time execution begins.

Most real systems combine offline shortest-path planning on a static topology with online local replanning. The offline pass gives a route; the online layer handles everything that moved since.

This is still a single-robot problem. The difficulty begins when paths interact.

In production: the topology graph is often built manually. Getting it right (one-way corridors, turn costs, weight limits by route) takes a lot of time before a robot moves. And when paths fail in the field, the algorithm is rarely the cause. Most failures trace back to localisation: the robot knows where it wants to go, but not where it is.
🔀
Multi-Agent Path Finding
All robots · Conflict-free
[Figure: robots A and B in conflict at a shared cell. B goes first; A waits, then proceeds.]

Unlike single-robot planning, the problem is to assign a path to every robot such that no two robots occupy the same space at the same time.

This problem scales poorly with robot count and conflict density. Finding an optimal solution is NP-hard, so exact solvers become intractable as fleet size grows. In practice, they perform well for small fleets and low-conflict environments. As scale increases, a bounded-suboptimal solver trades a small, controlled loss in solution quality for a large gain in speed.

At warehouse scale, optimality is usually the wrong target. Task streams have enough variance that a fast, slightly suboptimal plan outperforms an optimal plan that arrives too late. Tune for latency, not solution quality.

In practice: most production systems never run a full MAPF solver. Priority rules (higher-priority robot has right of way) are simpler, faster, and sufficient for most warehouse densities. MAPF becomes relevant when density is high enough that priority rules cause too many robots to wait.
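One common way to turn a priority rule into paths is prioritised planning: plan robots one at a time in priority order, and let each committed path reserve its cells in space and time. The sketch below assumes a small character grid, unit-time moves, and a fixed planning horizon; the grid, the robot names, and the function names are illustrative and not taken from any particular fleet manager. It is not a full MAPF solver and can fail on instances an optimal solver would handle.

```python
from collections import deque

def plan(grid, start, goal, reserved_cells, reserved_moves, horizon=50):
    """Breadth-first search over (cell, time), skipping reserved space-time slots."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        cell, t, path = queue.popleft()
        # Accept the goal only if the robot can park there for the rest of the horizon.
        if cell == goal and all((goal, k) not in reserved_cells for k in range(t, horizon + 1)):
            return path
        if t >= horizon:
            continue
        r, c = cell
        for dr, dc in ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)):   # wait or move
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols) or grid[nxt[0]][nxt[1]] == "#":
                continue
            if (nxt, t + 1) in reserved_cells:       # vertex conflict
                continue
            if (nxt, cell, t) in reserved_moves:     # head-on swap conflict
                continue
            if (nxt, t + 1) not in seen:
                seen.add((nxt, t + 1))
                queue.append((nxt, t + 1, path + [nxt]))
    return None

def prioritized_plan(grid, robots, horizon=50):
    """Plan robots in the order given; earlier robots have right of way."""
    reserved_cells, reserved_moves, plans = set(), set(), {}
    for name, start, goal in robots:
        path = plan(grid, start, goal, reserved_cells, reserved_moves, horizon)
        if path is None:
            raise RuntimeError(f"no conflict-free path for {name}")
        plans[name] = path
        for t, cell in enumerate(path):
            reserved_cells.add((cell, t))
            if t > 0:
                reserved_moves.add((path[t - 1], cell, t - 1))
        for t in range(len(path), horizon + 1):      # the robot parks at its goal
            reserved_cells.add((path[-1], t))
    return plans

GRID = [
    "###.#",   # one passing bay above the corridor
    ".....",   # single-lane corridor
]
ROBOTS = [("A", (1, 0), (1, 4)), ("B", (1, 4), (1, 0))]   # listed in priority order
plans = prioritized_plan(GRID, ROBOTS)
```

A drives straight through. B has to duck into the bay and wait, so its path is longer than the corridor alone would need. That waiting is the cost the paragraph above describes: acceptable at low density, expensive when many robots queue behind each other.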
📌
Task Assignment
Robots × Tasks · Online
[Figure: robots R1 to R3 matched to tasks T1 to T3, with travel distances d=4, d=3, d=5]

n tasks, k robots: minimise total travel cost. The offline batch case can be solved efficiently. In practice, the problem is online: tasks arrive continuously, robots become available unpredictably, and some tasks have time windows. Every assignment commits capacity that future tasks may need, and no exact solution stays optimal once the system starts changing.

Planning two or three steps ahead is possible but brittle. Uncertainty in arrivals, availability, and travel times quickly invalidates any lookahead plan. Most mature systems therefore assign the next task and re-optimise continuously as tasks complete and new ones arrive: not elegant, but robust to the variability of real task streams.

In practice: the hardest part of task assignment is the integration layer. Tasks come from a Warehouse Management System (WMS) or Enterprise Resource Planning (ERP) system with its own priorities and cancellations. The assignment logic is rarely the bottleneck.
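To make the gap between batch-optimal and assign-as-you-go concrete, the sketch below compares an exhaustive optimal matching with a greedy rule on a hypothetical 3 × 3 cost matrix. The exhaustive search is only viable for toy sizes (it is factorial in the number of robots); a real system would use a polynomial-time assignment solver. The cost values are invented for illustration.

```python
from itertools import permutations

def optimal_assignment(cost):
    """Try every robot-to-task matching and keep the cheapest (toy sizes only)."""
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda p: sum(cost[r][p[r]] for r in range(n)))
    return list(best), sum(cost[r][best[r]] for r in range(n))

def greedy_assignment(cost):
    """Repeatedly commit the cheapest remaining robot-task pair."""
    n = len(cost)
    pairs = sorted((cost[r][t], r, t) for r in range(n) for t in range(n))
    assigned, used_tasks = {}, set()
    for _, r, t in pairs:
        if r not in assigned and t not in used_tasks:
            assigned[r] = t
            used_tasks.add(t)
    return [assigned[r] for r in range(n)], sum(cost[r][assigned[r]] for r in range(n))

# Hypothetical travel costs: COST[robot][task]
COST = [
    [4, 1, 3],
    [2, 0, 5],
    [3, 2, 2],
]
best_plan, best_cost = optimal_assignment(COST)        # ([1, 0, 2], 5)
greedy_plan, greedy_cost = greedy_assignment(COST)     # ([0, 1, 2], 6)
```

Greedy grabs the zero-cost pair first and pays for it later: the early commitment uses up a task that another robot needed. The same effect, spread over time, is what makes the online problem hard.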
🚦
Traffic Management
Deadlock · Contention · Flow
[Figure: four-way deadlock at an intersection, with robots arriving from N, S, E, and W]

Deadlock is the silent killer of AGV systems. It occurs when a set of robots form a cycle of mutual waiting: four robots, each waiting for the one ahead to clear, forming a closed loop. Nobody moves. The system halts.

Deadlock is a resource allocation failure.

Detecting deadlock is easy: find cycles in the wait-for graph. Recovering from it requires intervention: automated recovery or, in some cases, manual robot relocation. Preventing deadlock by design is much harder. It requires routing decisions that prevent cycles from forming.
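The detection half can be sketched in a few lines: a depth-first search over the wait-for graph that reports the first cycle it finds. The dictionary format (robot mapped to the robots it is waiting on) and the robot names are assumptions for illustration.

```python
def find_deadlock(waits_for):
    """Return one cycle in the wait-for graph as a list of robots, or None."""
    WHITE, GREY, BLACK = 0, 1, 2            # unvisited, on current DFS path, finished
    colour = {robot: WHITE for robot in waits_for}
    stack = []

    def visit(robot):
        colour[robot] = GREY
        stack.append(robot)
        for other in waits_for.get(robot, ()):
            state = colour.get(other, WHITE)
            if state == GREY:               # back edge: we looped onto our own path
                return stack[stack.index(other):]
            if state == WHITE:
                cycle = visit(other)
                if cycle:
                    return cycle
        stack.pop()
        colour[robot] = BLACK
        return None

    for robot in list(waits_for):
        if colour[robot] == WHITE:
            cycle = visit(robot)
            if cycle:
                return cycle
    return None

# Four robots at an intersection, each waiting on the next; X queues behind N.
WAITS = {"X": ["N"], "N": ["E"], "E": ["S"], "S": ["W"], "W": ["N"]}
cycle = find_deadlock(WAITS)                # ["N", "E", "S", "W"]
```

Note that X is stuck too, but it is not part of the cycle. Recovery only has to break the loop; everything queued behind it then drains on its own.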

Most real deployments use zone-based control (one robot per zone at a time), sacrificing throughput for the guarantee of no gridlock. The alternative, deadlock-free routing, requires careful design of the network topology.

The VDA 5050 standard (Verband der Automobilindustrie 5050) is the emerging protocol for AGV communication. It standardises how fleet management software sends routing and zone-reservation commands, a critical piece of real deployments.
Charging Strategy
Energy · Scheduling · Prediction
[Figure: battery level (100%, 50%, 20%) over an 8-hour shift (0h, 4h, 8h), with the charging stampede marked]

A robot that dies mid-aisle does not just stop. It blocks every robot behind it. Charging is therefore a constraint, not an afterthought. Threshold-based strategies (charge when battery drops below 20%) are simple and predictable, but produce a charging stampede at shift boundaries when all robots hit the threshold simultaneously. Predictive strategies estimate future workload to schedule charging proactively, but require a model of the task stream that typically takes months of operational data to calibrate. The hybrid approach (threshold with workload-aware priority override) is where most mature systems land.

The non-obvious constraint: charger count is often the true system bottleneck, not robot count. Adding robots without adding chargers creates contention at the charging stations that eliminates the throughput gain.
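The stampede is easy to reproduce with a deliberately crude model. The sketch below assumes every robot starts the shift at 100%, drains at the same constant rate, and queues first-come-first-served for a fixed-duration charge. It compares a single shared threshold with staggered per-robot thresholds. The staggering is a simplified stand-in chosen for illustration, not the hybrid strategy described above, and all numbers are hypothetical.

```python
import heapq

def threshold_hit_times(thresholds, drain_per_hour=10.0):
    """Hours into the shift at which each robot drops from 100% to its threshold."""
    return [(100.0 - th) / drain_per_hour for th in thresholds]

def charger_waits(request_times, chargers, charge_hours):
    """FIFO queue on identical chargers; returns each robot's wait in hours."""
    free_at = [0.0] * chargers              # when each charger next becomes free
    heapq.heapify(free_at)
    waits = []
    for t in sorted(request_times):
        start = max(t, heapq.heappop(free_at))
        waits.append(start - t)
        heapq.heappush(free_at, start + charge_hours)
    return waits

ROBOTS, CHARGERS, CHARGE_HOURS = 12, 3, 1.0
uniform = [20.0] * ROBOTS                               # everyone charges at 20%
staggered = [20.0 + 2.5 * i for i in range(ROBOTS)]     # hypothetical: 20% up to 47.5%

worst_uniform = max(charger_waits(threshold_hit_times(uniform), CHARGERS, CHARGE_HOURS))
worst_staggered = max(charger_waits(threshold_hit_times(staggered), CHARGERS, CHARGE_HOURS))
```

With one shared threshold, all twelve robots arrive at three chargers in the same instant and the last group waits three hours. Staggering cuts the worst wait to 45 minutes in this toy model, at the price of pulling some robots off the floor earlier than strictly necessary.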
📊
Fleet Sizing
Capacity · Queuing · Variance
Simulation-based
[Figure: wait time versus utilisation (0% to 100%); curves and markers labelled target, actual, naive, and M/M/c queueing]

Fleet sizing starts with a simple formula. Take average task duration, multiply by task rate, then divide by target utilisation. This underestimates by 30 to 50% in almost every real deployment. The missing term is variance. Queueing theory, specifically the M/M/c model, shows that as utilisation approaches capacity, wait times grow non-linearly. A system running at 85% utilisation does not have 1.7x the wait time of one running at 50%. It may have 5x or 10x, because variance in arrival and service times creates queues that clear slowly. The honest answer to "how many robots do I need?" is to run a discrete-event simulation with realistic variance before committing.

Rule of thumb: size for 70–75% utilisation target, not 90%. The throughput difference is small. The stability difference is enormous, especially when task arrival variance spikes during peak demand.
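Both the naive formula and the M/M/c correction fit in a few lines. The sketch below uses the standard Erlang C formula for the mean queueing delay, treating robots as servers and tasks as customers, with exponential inter-arrival and task times. That is a strong simplification: it ignores travel, charging, and congestion, so read the numbers as a shape, not a prediction. The task rate and duration in the example are invented.

```python
import math

def naive_fleet_size(task_rate, task_duration, target_utilisation):
    """Average-based sizing: offered load divided by target utilisation."""
    return math.ceil(task_rate * task_duration / target_utilisation)

def erlang_c_wait(task_rate, task_duration, robots):
    """Mean time a task waits for a free robot in an M/M/c queue."""
    load = task_rate * task_duration            # offered load, in robots' worth of work
    if load >= robots:
        return math.inf                         # unstable: the queue grows without bound
    rho = load / robots                         # utilisation per robot
    tail = load ** robots / math.factorial(robots) / (1.0 - rho)
    head = sum(load ** k / math.factorial(k) for k in range(robots))
    p_wait = tail / (head + tail)               # Erlang C: probability a task must queue
    return p_wait * task_duration / (robots - load)

# Hypothetical demand: 40 tasks per hour, 15 minutes (0.25 h) per task.
RATE, DURATION = 40, 0.25
sizes = {u: naive_fleet_size(RATE, DURATION, u) for u in (0.90, 0.85, 0.70)}
waits_h = {c: erlang_c_wait(RATE, DURATION, c) for c in range(10, 16)}
```

For a single server the formula reduces to utilisation divided by spare capacity, so moving from 50% to 85% utilisation multiplies the mean wait by roughly 5.7, not 1.7. The `waits_h` table shows the same non-linearity for a fleet: each robot added near saturation removes far more waiting than one added when there is already slack.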

Seven Lenses, One System

The model you choose constrains every solution you can reach. Experienced practitioners layer models, using different representations for different parts of the problem. None of these is "the right model." Each makes different trade-offs.

Graph Models
Directed weighted graph · Zone-based · Time-expanded

The graph model is the workhorse of industrial AGV systems. The environment is abstracted as a directed graph: nodes are locations (intersections, pick stations, chargers), edges are traversable segments with travel time and capacity attributes. Path planning is shortest-path search. Traffic management is resource reservation on edges and nodes. The key variant is the time-expanded graph, where each node is replicated for every timestep. A robot's path then becomes a path through this space × time structure, and a conflict becomes something you can read off the model: two paths that share a space-time node, or that cross the same edge in opposite directions during the same timestep. Once each committed path reserves its space-time nodes and edges, any later path planned through the remaining free ones is conflict-free by construction. This makes Conflict-Based Search (CBS)-style conflict resolution explicit in the model itself.

Strengths
  • Matches how real facilities are laid out
  • Path planning is fast: Dijkstra on known topology
  • Easy to reason about zones, one-way corridors
  • Well supported by fleet management systems such as Open Robotics Middleware Framework (OpenRMF)
Limitations
  • Graph must be manually constructed and maintained
  • Doesn't handle free-ranging AMRs naturally
  • Time-expanded variant grows large quickly (space × time)
  • Continuous motion (turning radius) requires approximation
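In a time-expanded view, a path is just a list of nodes indexed by timestep, and checking two paths against each other is mechanical. The sketch below finds the earliest vertex conflict (same node, same timestep) or swap conflict (same edge, opposite directions, same timestep). This is the check a CBS-style solver runs before deciding which robot to constrain. The node names and the convention that a robot stays at its last node are assumptions for illustration.

```python
def first_conflict(paths):
    """Return the earliest vertex or swap conflict between any two paths, or None."""
    def at(path, t):
        return path[min(t, len(path) - 1)]      # a robot stays put after arriving

    names = sorted(paths)
    horizon = max(len(p) for p in paths.values())
    for t in range(horizon):
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                pa, pb = paths[a], paths[b]
                if at(pa, t) == at(pb, t):
                    return ("vertex", a, b, at(pa, t), t)
                if (t + 1 < horizon
                        and at(pa, t) == at(pb, t + 1)
                        and at(pa, t + 1) == at(pb, t)):
                    return ("swap", a, b, (at(pa, t), at(pa, t + 1)), t)
    return None

# Two robots driving a three-node corridor in opposite directions.
HEAD_ON = {"A": ["n1", "n2", "n3"], "B": ["n3", "n2", "n1"]}
conflict = first_conflict(HEAD_ON)              # ("vertex", "A", "B", "n2", 1)
```

The cost of this clarity is the size noted in the limitations: every extra timestep in the horizon adds another full copy of the graph to search and to check.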

The Objective Is a Choice

Every AGV system is optimising something. The hard part is that the obvious objectives conflict, and which one you sacrifice is a product decision, not an algorithm one.

Throughput & Efficiency
  • Tasks per hour: the primary throughput metric for most warehouses
  • Makespan: time to complete a fixed batch of tasks. Dominant in manufacturing
  • Total travel distance: proxy for time wasted on movement rather than work
  • Utilisation rate: fraction of time each robot is doing productive work
Cost & Resources
  • Fleet size: capital cost. The output of fleet sizing is a lower bound, not a target
  • Energy consumption: battery cost, charger infrastructure, grid load
  • Infrastructure footprint: number of chargers, zone density, lane layout
  • Charging interruptions: how often robots are pulled from tasks to charge
Resilience & Robustness
  • Deadlock rate: how often the system halts and requires intervention
  • Replanning latency: how quickly the system recovers when a robot fails or a path is blocked
  • System availability: uptime fraction. Often the contractual metric with the customer
  • Graceful degradation: does losing one robot reduce throughput by 1/n, or cause a cascade?

These clusters pull against each other. Pushing utilisation toward capacity is efficient on paper, but queueing theory shows wait times blow up non-linearly as you approach the limit. Throughput collapses before the robots run out of work. Minimising fleet size raises utilisation, which raises wait times, which kills throughput. Optimising energy may route robots in ways that increase conflict density. There is no single objective that dominates. Every deployment makes a choice about what to sacrifice, and that choice shapes every algorithm decision downstream.

What You Can Actually Use

A survey of the open source stack. For each tool: what it is genuinely good at, where it runs out of road, and what you will have to build yourself.

This is not an exhaustive list. If a tool you use or know about is missing, I'd like to hear about it.

OpenRMF
Open Robotics Middleware Framework: fleet management layer
Free · ROS 2

The most complete open source fleet management system available. OpenRMF handles multi-robot task allocation, traffic management, lift and door integration, and provides a REST and WebSocket API for integration with warehouse management systems. Built on ROS 2 (Robot Operating System 2). It models environments as annotated graphs (the "navigation graph") and handles reservation-based conflict avoidance. It is not plug-and-play: expect a significant configuration and integration effort. But the architecture is sound and the community is active.

Best for: Multi-vendor fleets · Fleet coordination · VDA 5050 integration

ROS 2 / Nav2
Robot Operating System 2 + Navigation stack
Free · C++ / Python

Nav2 is the navigation stack built on ROS 2. It handles everything a single Autonomous Mobile Robot (AMR) needs to move autonomously: map loading, localisation, global path planning (A*, Dijkstra, Smac Planner), local obstacle avoidance (DWB, MPPI), and recovery behaviours. Nav2 is the de facto standard for AMR navigation. Multi-robot coordination requires OpenRMF on top. The Python API (nav2_simple_commander) makes it accessible for prototyping. Production deployments typically use C++ nodes.

Best for: Single-robot navigation · AMR prototyping · Localisation (SLAM)

OR-Tools
Google Operations Research tools
Free · Python

Google's optimisation library, and the best open source option for task assignment, routing, and scheduling problems in AGV systems. The linear assignment solver handles the classic one-to-one assignment problem (the problem the Hungarian algorithm solves). The Vehicle Routing Problem (VRP) solver handles multi-robot, multi-task routing with time windows and capacity constraints. The CP-SAT constraint programming solver handles complex scheduling with arbitrary constraints. All three have first-class Python APIs. OR-Tools is where you turn when you have a well-defined optimisation problem and need an exact or near-exact solution at offline planning time.

Best for: Task assignment · Route optimisation · Offline scheduling

SimPy
Discrete-event simulation framework
Free · Python

SimPy is a Python discrete-event simulation framework and the right tool for answering "what would happen if..." before committing to a deployment. Model your AGV system as a SimPy environment: robots as processes, chargers as resources, tasks as events. Run thousands of simulations with varying parameters and measure throughput, wait times, and battery behaviour. SimPy simulates rather than optimises, but simulation is often the more honest answer to fleet sizing, strategy comparison, and bottleneck identification. If you are not simulating before deploying, you are guessing.
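To show the kind of experiment this enables without tying the example to SimPy's API, here is a plain-Python stand-in: a first-come-first-served queue of tasks served by a pool of robots, with random arrivals and task durations. It is a sketch under heavy assumptions (exponential timings, no travel, no charging, no failures), and the function name and parameters are made up for illustration. A SimPy model would express the same thing with processes and resources and then grow to cover chargers and battery state.

```python
import heapq
import random

def simulate_mean_wait(robots, arrival_rate, mean_task_time, tasks=20000, seed=1):
    """Mean time a task waits for a free robot under FIFO dispatch."""
    rng = random.Random(seed)
    free_at = [0.0] * robots                    # when each robot next becomes idle
    heapq.heapify(free_at)
    clock, total_wait = 0.0, 0.0
    for _ in range(tasks):
        clock += rng.expovariate(arrival_rate)              # next task arrives
        start = max(clock, heapq.heappop(free_at))          # earliest available robot
        total_wait += start - clock
        service = rng.expovariate(1.0 / mean_task_time)     # random task duration
        heapq.heappush(free_at, start + service)
    return total_wait / tasks

# Hypothetical demand: 40 tasks per hour, 15-minute tasks on average.
results = {n: simulate_mean_wait(n, 40, 0.25) for n in (11, 12, 13, 15)}
```

Because every run uses the same seed, the fleets face an identical task stream and the comparison isolates the effect of fleet size. Vary the seed as well and you get the spread, which is usually the number that decides the purchase.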

Best for: Fleet sizing · Strategy comparison · Bottleneck analysis

libMultiRobotPlanning
MAPF algorithms: CBS, ECBS, SIPP
Free · C++

A clean C++ implementation of the core Multi-Agent Path Finding (MAPF) algorithms: Conflict-Based Search (CBS), Enhanced CBS (ECBS), and Safe Interval Path Planning (SIPP). Primarily a research library, well-structured, well-documented, and a good starting point for understanding how these algorithms behave on real maps. Not production-hardened, but invaluable for experimenting with MAPF and benchmarking your own implementations against published baselines.

Best for: MAPF research · Algorithm benchmarking · CBS prototyping

Gazebo / Isaac Sim
Physics-based robot simulation
Free / Commercial · ROS 2

Gazebo (open source, ROS 2 native) and NVIDIA Isaac Sim (commercial, GPU-accelerated) provide physics-based simulation of robots and environments. Unlike SimPy, these simulators model the physical robot: sensors, actuators, and collision dynamics. Use them to test navigation stacks before putting a real robot in the warehouse. Isaac Sim adds photo-realistic rendering and domain randomisation for training perception models. Both integrate with Nav2 and OpenRMF. Gazebo is the default starting point; Isaac Sim is worth the complexity if you are training deep learning models on synthetic data.

Best for: Navigation testing · Sensor simulation · Pre-deployment validation

What does not exist yet

Scheduling, timetabling, and vehicle routing all have dedicated open source libraries. AGV coordination does not: no library covers task assignment, routing, charging, and the decisions that connect them in a way that lets you design and tune the system as you see fit. That layer does not exist as a reusable tool. If you know of one, I'd like to hear about it. It seems every serious deployment builds it from scratch, using bespoke algorithms or generic optimisation solvers like Gurobi or OR-Tools.

Real-World Deployments

AGV systems operate at scale across very different industries. The algorithms are similar. The dominant constraint in each domain is not. Knowing which problem type your deployment most resembles tells you where to invest your effort.

📦

E-commerce Fulfilment

High-density, high-throughput, constantly changing inventory locations. The canonical AMR use case, pioneered by Kiva Systems (now Amazon Robotics). Robots carry entire shelving pods to a stationary human picker, flipping the traditional model of the human walking to the goods. The dominant challenge is coordinating hundreds of robots in a shared space where every route intersects.

🏭

Automotive Manufacturing

Heavy-payload AGVs moving partially assembled vehicles between production stations. Routes are fixed. Sequences are not. The dominant challenge is synchronisation: a robot that arrives late at a station holds up the entire line. Failure handling is the other critical concern, because there is no slack in a just-in-time production schedule to absorb an AGV that stops mid-floor.

🏥

Hospitals

Medicine, linen, meals, and sterile supplies, all moving through the same corridors as patients and staff. Key constraints: priority overrides for emergency supplies, human-robot interaction in uncontrolled environments, and elevator scheduling across floors.

✈️

Airports and Baggage

Automated baggage handling between check-in, sorting, and aircraft loading. Schiphol, Frankfurt, and Changi operate some of the largest and oldest AGV installations in continuous service. The defining constraint is the hard time window: the plane leaves whether or not the baggage system kept up. Outdoor apron environments add weather variability that indoor deployments never face.

🚢

Container Ports

AGVs move shipping containers between ship-to-shore cranes and the container yard. The Port of Rotterdam's Maasvlakte II terminal has operated over 300 AGVs continuously since 2014, one of the largest fleets in the world. The defining challenge is sequencing: hundreds of vehicles sharing a constrained quay, where a single deadlock can delay a vessel and cascade across the port schedule.

💊

Pharmaceutical Warehousing

Temperature-controlled storage, high-value inventory, and strict batch traceability requirements set by regulators. Throughput demands are modest, but error tolerance is effectively zero. Every move must be logged against a batch record. AGVs must integrate with Enterprise Resource Planning (ERP) systems and the quality management layer. The integration and audit trail challenge is harder than anything on the routing side.

What Deployment in Production Teaches You

Academic models assume robots do what they're told, maps don't change, and networks don't drop packets. None of that is true. These challenges rarely make it into textbooks, but they are what actually determine whether a deployment succeeds or fails.

Localisation drift and map staleness
Robustness

A robot with a wrong estimate of its own position is more dangerous than a stopped robot. Simultaneous Localisation and Mapping (SLAM) accumulates drift over time: the robot's estimate of its position diverges from reality, slowly and continuously. In a warehouse where shelves are moved seasonally or racks are shifted during the night shift, the stored map can be wrong the moment you load it. You need a strategy for map maintenance (re-mapping) and localisation correction (fiducial markers, infrastructure-based anchors) before you need a fancy path planner. The best algorithm in the world cannot navigate a robot to the right location if the robot does not know where it is.

What is a fiducial marker? A printed pattern (typically an ArUco marker) mounted at known positions on walls or shelves. When the robot's camera recognises one, it knows exactly where that landmark sits in the world and uses it to reset its accumulated drift. Infrastructure-based anchors do the same thing with radio (Ultra-Wideband) or LiDAR reflectors instead of cameras. Think of them as indoor GPS checkpoints: the denser the coverage, the tighter the localisation error stays.

Failure modes and recovery
Reliability

Robots fail. The question is not whether but when, and what happens when they do. A robot stopped mid-aisle blocks every robot behind it. A robot that fails mid-load cannot be safely moved without human intervention. A robot that loses network connection mid-task must either continue autonomously (risky) or stop and wait (blocks everything). Most fleet management systems have inadequate answers to these scenarios because failure modes are hard to test systematically. Design for failure first. The most important architectural question is not "how do we assign tasks?" but "what happens when the system is in a degraded state and we need to keep moving?"

The re-optimisation trigger problem
Planning

When the world changes (a robot fails, a task is cancelled, a new high-priority task arrives) you face a choice: replan from scratch, or patch the current plan. Replanning from scratch gives you the best plan for the current state, but robots in motion have to be halted or rerouted mid-journey, which has a cost. Patching preserves ongoing commitments but leaves you with a suboptimal plan that accumulates drift over a shift. Too frequent replanning causes thrashing: robots constantly change course and make little forward progress. Too infrequent means you're executing a plan built for a world that no longer exists. The right replanning frequency is a function of task volatility, and it is almost never the same between deployments.
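One simple way to sit between thrashing and staleness is a trigger with a cool-down and a minimum expected gain. The sketch below is a hypothetical policy, not one taken from any fleet manager: the class name, the 30-second cool-down, and the 10% gain threshold are placeholders you would tune per deployment.

```python
from dataclasses import dataclass

@dataclass
class ReplanTrigger:
    """Replan only when the expected gain is large and enough time has passed."""
    min_interval_s: float = 30.0            # hypothetical cool-down between replans
    min_gain: float = 0.10                  # hypothetical: require a 10% cost improvement
    last_replan_s: float = float("-inf")

    def should_replan(self, now_s, current_cost, candidate_cost, urgent=False):
        if urgent:                          # e.g. a robot failure: always replan
            self.last_replan_s = now_s
            return True
        if now_s - self.last_replan_s < self.min_interval_s:
            return False                    # cool-down guards against thrashing
        if current_cost <= 0:
            return False                    # nothing left to improve
        gain = (current_cost - candidate_cost) / current_cost
        if gain >= self.min_gain:
            self.last_replan_s = now_s
            return True
        return False

trigger = ReplanTrigger()
decisions = [
    trigger.should_replan(0, 100, 95),                  # gain too small
    trigger.should_replan(10, 100, 80),                 # worthwhile: replan
    trigger.should_replan(20, 100, 50),                 # still cooling down
    trigger.should_replan(25, 100, 100, urgent=True),   # failure overrides everything
    trigger.should_replan(60, 100, 85),                 # cool-down over, gain is enough
]
```

The two thresholds are where task volatility enters: a volatile site wants a short cool-down and a high gain bar, a stable one can afford the opposite.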

Warehouse Management System integration
Integration

In practice, the dominant pain in an AGV deployment is often not routing or assignment. It is integration with the Warehouse Management System (WMS). The WMS owns the task stream. It decides what needs to move, from where, to where, and by when. The AGV fleet management system needs to consume that task stream in real time, report completion, handle exceptions, and reconcile inventory state. Every WMS speaks a different dialect. Every integration is custom. The VDA 5050 standard helps for the robot communication layer, but the WMS interface remains bespoke in almost every deployment. Budget at least as much time for integration as for the robotics.

Human-robot interaction in shared spaces
Safety

In the lab, robots operate alone. In the warehouse, they share aisles with forklifts, pickers on foot, and visiting maintenance crews who might not have read the safety briefing. Robots must detect, classify, and respond to humans by stopping, waiting, or taking an alternative path. This is harder than it sounds. A robot that stops conservatively for every uncertain sensor reading will have terrible throughput. A robot that is too aggressive is a safety liability. The right policy depends on the space: a pedestrian-only aisle has different requirements than an aisle shared with counterbalance forklifts. Regulatory requirements (ISO 3691-4 for driverless industrial trucks) constrain the design space further. Get a safety engineer involved before you have a working robot, not after.

KPI definition and measurement
Operations

Every AGV project reports "robot utilisation" as a key metric. It is almost always the wrong metric. A robot that is 90% "utilised" because it is waiting at a charger is not contributing to throughput. The metric you actually care about is task throughput per unit time: picks per hour, deliveries per shift, pallets moved. Secondary metrics: task wait time (how long between task creation and robot assignment), travel efficiency (distance travelled with load versus without), and system availability (fraction of time the fleet is operational). Define these metrics before deployment. The instrumentation required to measure them is non-trivial, and retrofitting observability to a live system is painful.
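Writing the definitions down as code forces the ambiguities into the open. The sketch below computes the four metrics from a hypothetical task log. The field names are invented, travel efficiency is taken to mean loaded distance as a share of total distance, and availability is taken to mean robot-hours operational over robot-hours scheduled. Your definitions may reasonably differ; the point is to fix them before go-live.

```python
def fleet_kpis(tasks, shift_hours, fleet_up_hours, robots):
    """Compute throughput and the secondary metrics from a list of task records."""
    done = [t for t in tasks if t["completed_h"] is not None]
    if not done:
        raise ValueError("no completed tasks in the log")
    loaded = sum(t["loaded_m"] for t in done)
    empty = sum(t["empty_m"] for t in done)
    return {
        "throughput_per_hour": len(done) / shift_hours,
        "mean_assignment_wait_h": sum(t["assigned_h"] - t["created_h"] for t in done) / len(done),
        "travel_efficiency": loaded / (loaded + empty),
        "availability": fleet_up_hours / (robots * shift_hours),
    }

# Hypothetical log for a 2-hour window with 2 robots; times in hours, distances in metres.
TASKS = [
    {"created_h": 0.0, "assigned_h": 0.1, "completed_h": 0.5, "loaded_m": 60.0, "empty_m": 20.0},
    {"created_h": 0.2, "assigned_h": 0.5, "completed_h": 1.0, "loaded_m": 40.0, "empty_m": 40.0},
    {"created_h": 1.0, "assigned_h": 1.2, "completed_h": None, "loaded_m": 0.0, "empty_m": 10.0},
]
kpis = fleet_kpis(TASKS, shift_hours=2.0, fleet_up_hours=3.6, robots=2)
```

Notice what the log has to contain for this to work: creation, assignment, and completion timestamps per task, plus loaded and empty distance per trip. If the fleet manager does not record those from day one, none of these numbers can be reconstructed later.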