From warehouse floors to airport tarmacs: how mathematics moves fleets of robots through constrained, contested space without stopping, colliding, or deadlocking.
Before any algorithm, you need to understand what AGV systems are actually up against.
One robot. One destination. An empty map. Dijkstra's algorithm finds the shortest path in milliseconds. This problem is solved. It has been solved since 1959. Every undergraduate computer science student can implement it in an afternoon. Every robotics framework ships it out of the box.
Fifty robots. Hundreds of tasks. Shared corridors. Shared chargers. Battery levels that decline, tasks that arrive unpredictably, and robots that occasionally fail. Now the paths interact, the assignments compete, and the system can deadlock. This problem is not solved.
The term "AGV" is often used loosely. The distinctions matter, because the vehicle type determines the constraints, and the constraints determine which algorithms apply.
The classic AGV follows fixed infrastructure: wire embedded in the floor, magnetic tape, or laser reflectors mounted on walls. The routing graph is fixed at installation. This is a constraint, but also a guarantee: path conflicts reduce to traffic management on a known topology, which is far easier to reason about than free-ranging motion. Classic AGVs dominate heavy-industry settings (automotive plants, paper mills, hospitals) where routes are stable and payloads are large. The planning problem is well-defined because the map is fixed. What varies is the schedule and the traffic.
AGV systems make decisions across four time horizons. The layers are coupled, not independent. A decision made at one layer creates constraints that the layers below must absorb. Getting the diagnosis right means knowing which layer the root cause lives in, not just where the symptom appears. Higher layers can optimise more broadly but must commit before conditions are fully known. Lower layers react to current state but with limited reach.
| Layer | Typical decisions | Information inputs | Failure mode |
|---|---|---|---|
| Strategic Months – years |
How many robots, chargers, lanes? Where to place infrastructure? | Business forecasts, sales history, SKU growth projections Noisy: assumption-heavy, high latency |
Under-provisioning. The system hits a ceiling no algorithm can optimise around. |
| Tactical Days – weeks |
Zone layout, traffic rule design, charger placement, shift structure | Historical flow rates, seasonal patterns, shift schedules Known shape, uncertain timing |
Congestion. Rules designed for 10 robots fail when 50 are deployed. |
| Operational Minutes – hours |
Assign task to robot, dispatch order, charge scheduling | Real-time Warehouse Management System (WMS) order pool Orders are real, deadlines are fixed |
Sub-optimisation. Chasing one hot order while starving the rest of the fleet. |
| Real-time Seconds |
Conflict-free paths, deadlock avoidance, local replanning | LiDAR, encoders, Inertial Measurement Units (IMUs), Vehicle-to-Vehicle (V2V) comms Sensor ground truth: current state only |
Local optima. Avoiding a collision now can cause a deadlock 10 seconds later. |
| Recovery On-event |
Robot failure, blocked path, missed deadline, unexpected obstacle | Fault signals, sensor anomalies, timeout alerts High certainty: something has already gone wrong |
Escalation failure. Automated recovery loops without resolution, compounding the original fault. |
The layers interact through a simple but consequential structure: upstream layers define constraints, downstream layers absorb the consequences. A strategic miscalculation, say too few chargers, becomes a constraint the operational layer cannot optimise around. Robots queue for charging slots. The operational layer degrades, and that pressure propagates into real-time performance: longer waits generate downstream path conflicts. The failure surfaces as a routing problem. The root cause is a capacity planning decision made months earlier. Getting the architecture right at each layer is what separates a working deployment from a permanent incident. The rest of this series addresses each layer in turn.
Every AGV deployment, from a single robot in a lab to a thousand-vehicle warehouse, is defined by the same three components. Understanding them precisely is the prerequisite for choosing the right solutions.
Every AGV deployment is solving at least six distinct problems simultaneously. Each has its own difficulty structure, its own solutions, and its own failure modes.
One robot, one destination, a static map: Dijkstra's algorithm finds the optimal path in polynomial time. This problem is fast to solve and well-understood.
The challenge is not finding a path, but executing it while the world changes. Other robots move, humans cross corridors, loads shift. A path computed at t=0 may be invalid by the time execution begins.
Most real systems combine offline shortest-path planning on a static topology with online local replanning. The offline pass gives a route; the online layer handles everything that moved since.
This is still a single-robot problem. The difficulty begins when paths interact.
Unlike single-robot planning, the problem is to assign a path to every robot such that no two robots occupy the same space at the same time.
This problem scales poorly with robot count and conflict density. In worst-case formulations, exact solutions become intractable as fleet size grows. In practice, solvers perform well for small fleets and low-conflict environments. As scale increases, one can trade off a small loss in solution quality for speed using a suboptimal approach.
At warehouse scale, optimality is usually the wrong target. Task streams have enough variance that a fast, slightly suboptimal plan outperforms an optimal plan that arrives too late. Tune for latency, not solution quality.
n tasks, k robots: minimise total travel cost. The offline batch case can be solved efficiently. In practice, the problem is online: tasks arrive continuously, robots become available unpredictably, and some tasks have time windows. Every assignment commits capacity that future tasks may need, and no exact solution stays optimal once the system starts changing.
Planning two or three steps ahead is possible but brittle. Uncertainty in arrivals, availability, and travel times quickly invalidates any lookahead plan. Most mature systems therefore assign the next task and re-optimise continuously as tasks complete and new ones arrive: not elegant, but robust to the variability of real task streams.
Deadlock is the silent killer of AGV systems. It occurs when a set of robots form a cycle of mutual waiting: four robots, each waiting for the one ahead to clear, forming a closed loop. Nobody moves. The system halts.
Deadlock is a resource allocation failure.
Detecting deadlock is easy: find cycles in the wait-for graph. Recovering from it requires intervention: automated recovery or, in some cases, manual robot relocation. Preventing deadlock by design is much harder. It requires routing decisions that prevent cycles from forming.
Most real deployments use zone-based control (one robot per zone at a time), sacrificing throughput for the guarantee of no gridlock. The alternative, deadlock-free routing, requires careful design of the network topology.
A robot that dies mid-aisle does not just stop. It blocks every robot behind it. Charging is therefore a constraint, not an afterthought. Threshold-based strategies (charge when battery drops below 20%) are simple and predictable, but produce a charging stampede at shift boundaries when all robots hit the threshold simultaneously. Predictive strategies estimate future workload to schedule charging proactively, but require a model of the task stream that typically takes months of operational data to calibrate. The hybrid approach (threshold with workload-aware priority override) is where most mature systems land.
Fleet sizing starts with a simple formula. Take average task duration, multiply by task rate, then divide by target utilisation. This underestimates by 30 to 50% in almost every real deployment. The missing term is variance. Queueing theory, specifically the M/M/c model, shows that as utilisation approaches capacity, wait times grow non-linearly. A system running at 85% utilisation does not have 1.7x the wait time of one running at 50%. It may have 5x or 10x, because variance in arrival and service times creates queues that clear slowly. The honest answer to "how many robots do I need?" is to run a discrete-event simulation with realistic variance before committing.
The model you choose constrains every solution you can reach. Experienced practitioners layer models, using different representations for different parts of the problem. None of these is "the right model." Each makes different trade-offs.
The graph model is the workhorse of industrial AGV systems. The environment is abstracted as a directed graph: nodes are locations (intersections, pick stations, chargers), edges are traversable segments with travel time and capacity attributes. Path planning is shortest-path search. Traffic management is resource reservation on edges and nodes. The key variant is the time-expanded graph, where each node is replicated for every timestep. A robot's path then becomes a path through this 3D structure (space × time), and conflicts become impossible by construction. Time-expanded graphs make Conflict-Based Search (CBS)-style conflict resolution explicit in the model itself.
Every AGV system is optimising something. The hard part is that the obvious objectives conflict, and which one you sacrifice is a product decision, not an algorithm one.
These clusters pull against each other. Pushing utilisation toward capacity is efficient on paper, but queueing theory shows wait times blow up non-linearly as you approach the limit. Throughput collapses before the robots run out of work. Minimising fleet size raises utilisation, which raises wait times, which kills throughput. Optimising energy may route robots in ways that increase conflict density. There is no single objective that dominates. Every deployment makes a choice about what to sacrifice, and that choice shapes every algorithm decision downstream.
A survey of the open source stack. For each tool: what it is genuinely good at, where it runs out of road, and what you will have to build yourself.
This is not an exhaustive list. If a tool you use or know about is missing, I'd like to hear about it.
The most complete open source fleet management system available. OpenRMF handles multi-robot task allocation, traffic management, lift and door integration, and provides a REST and WebSocket API for integration with warehouse management systems. Built on ROS 2 (Robot Operating System 2). It models environments as annotated graphs (the "navigation graph") and handles reservation-based conflict avoidance. Not plug-and-play, expect a significant configuration and integration effort, but the architecture is sound and the community is active.
Nav2 is the navigation stack built on ROS 2. It handles everything a single Autonomous Mobile Robot (AMR) needs to move autonomously: map loading, localisation, global path planning (A*, Dijkstra, Smac Planner), local obstacle avoidance (DWB, MPPI), and recovery behaviours. Nav2 is the de facto standard for AMR navigation. Multi-robot coordination requires OpenRMF on top. The Python API (nav2_simple_commander) makes it accessible for prototyping. Production deployments typically use C++ nodes.
Google's optimisation library, and the best open source option for task assignment, routing, and scheduling problems in AGV systems. The linear assignment solver handles the Hungarian algorithm. The Vehicle Routing Problem (VRP) solver handles multi-robot, multi-task routing with time windows and capacity constraints. The CP-SAT constraint programming solver handles complex scheduling with arbitrary constraints. All three have first-class Python APIs. OR-Tools is where you turn when you have a well-defined optimisation problem and need an exact or near-exact solution at offline planning time.
SimPy is a Python discrete-event simulation framework and the right tool for answering "what would happen if..." before committing to a deployment. Model your AGV system as a SimPy environment: robots as processes, chargers as resources, tasks as events. Run thousands of simulations with varying parameters and measure throughput, wait times, and battery behaviour. SimPy simulates rather than optimises, but simulation is often the more honest answer to fleet sizing, strategy comparison, and bottleneck identification. If you are not simulating before deploying, you are guessing.
A clean C++ implementation of the core Multi-Agent Path Finding (MAPF) algorithms: Conflict-Based Search (CBS), Enhanced CBS (ECBS), and Safe Interval Path Planning (SIPP). Primarily a research library, well-structured, well-documented, and a good starting point for understanding how these algorithms behave on real maps. Not production-hardened, but invaluable for experimenting with MAPF and benchmarking your own implementations against published baselines.
Gazebo (open source, ROS 2 native) and NVIDIA Isaac Sim (commercial, GPU-accelerated) provide physics-based simulation of robots and environments. Unlike SimPy, these simulators model the physical robot, sensors, actuators, collision dynamics. Use them to test navigation stacks before putting a real robot in the warehouse. Isaac Sim adds photo-realistic rendering and domain randomisation for training perception models. Both integrate with Nav2 and OpenRMF. Gazebo is the default starting point; Isaac Sim is worth the complexity if you are training deep learning models on synthetic data.
Unlike scheduling problems, timetabling, or vehicle routing, there is no dedicated open source library that addresses all aspects of AGV coordination: task assignment, routing, charging, and the decisions that connect them, in a way that lets you design and tune the system as you see fit. That layer does not exist as a reusable tool. If you know of one, I'd like to hear about it. It seems every serious deployment builds it from scratch, using bespoke algorithms or generic optimisation solvers like Gurobi or OR-Tools.
AGV systems operate at scale across very different industries. The algorithms are similar. The dominant constraint in each domain is not. Knowing which problem type your deployment most resembles tells you where to invest your effort.
High-density, high-throughput, constantly changing inventory locations. The canonical AMR use case, pioneered by Kiva Systems (now Amazon Robotics). Robots carry entire shelving pods to a stationary human picker, flipping the traditional model of the human walking to the goods. The dominant challenge is coordinating hundreds of robots in a shared space where every route intersects.
Heavy-payload AGVs moving partially assembled vehicles between production stations. Routes are fixed. Sequences are not. The dominant challenge is synchronisation: a robot that arrives late at a station holds up the entire line. Failure handling is the other critical concern, because there is no slack in a just-in-time production schedule to absorb an AGV that stops mid-floor.
Medicine, linen, meals, and sterile supplies, all moving through the same corridors as patients and staff. Key constraints: priority overrides for emergency supplies, human-robot interaction in uncontrolled environments, and elevator scheduling across floors.
Automated baggage handling between check-in, sorting, and aircraft loading. Schiphol, Frankfurt, and Changi operate some of the largest and oldest AGV installations in continuous service. The defining constraint is the hard time window: the plane leaves whether or not the baggage system kept up. Outdoor apron environments add weather variability that indoor deployments never face.
AGVs move shipping containers between ship-to-shore cranes and the container yard. The Port of Rotterdam's Maasvlakte II terminal has operated over 300 AGVs continuously since 2014, one of the largest fleets in the world. The defining challenge is sequencing: hundreds of vehicles sharing a constrained quay, where a single deadlock can delay a vessel and cascade across the port schedule.
Temperature-controlled storage, high-value inventory, and strict batch traceability requirements set by regulators. Throughput demands are modest, but error tolerance is effectively zero. Every move must be logged against a batch record. AGVs must integrate with Enterprise Resource Planning (ERP) systems and the quality management layer. The integration and audit trail challenge is harder than anything on the routing side.
Academic models assume robots do what they're told, maps don't change, and networks don't drop packets. None of that is true. These challenges rarely make it into textbooks, but they are what actually determine whether a deployment succeeds or fails.