A Dispatch Console That Handles 5,000 Stops a Day

Build a last-mile dispatch console that handles 5,000 stops/day with two endpoints: routing and places. REST examples, latency budget, retry strategy.

May 28, 2026

Last-mile dispatch becomes a different problem at the 5,000-stops-per-day mark. Up to a few hundred stops, a dispatcher can solve the truck-loading puzzle with a spreadsheet and a route planner that runs once at 06:00. Past a few thousand, the puzzle is no longer about the optimal route. It is about responding to reality: thirty stops were added at 09:30, four trucks broke down at 11:00, the customer at stop 1417 just called to reschedule, and the dispatcher needs a screen that re-plans in seconds, not in overnight batches.

The architecture that handles this is unglamorous. Two REST endpoints, one queue, one in-memory state machine per truck, and a discipline about which work happens at what latency tier. This post walks through the pieces, the numbers, the failure modes, and the API call patterns that hold up at 5,000 stops a day on a single dispatcher's screen.

What "5,000 stops a day" actually means

The headline number hides three different workloads.

A few hundred *initial* stops at 04:00 — the overnight batch that turns yesterday's orders into the first cut of routes. This runs once, can take five minutes, and never needs to be sub-second. The constraint is correctness, not speed.

A few thousand *updates* per day — additions, cancellations, time-window changes, address corrections — landing throughout the working day. Each update may invalidate one truck's route or every truck's route. The dispatcher needs the re-plan in under ten seconds or the operational picture goes stale. The constraint here is interactive latency.

A few hundred *exceptions* per hour — truck breaks down, driver runs late, customer reschedules at the doorstep. Each exception needs the dispatcher to see the impact on every other stop on that truck plus a candidate handoff to a neighbouring truck. The constraint is responsiveness measured in seconds, not minutes.

These three workloads share endpoints but have very different latency budgets. A common mistake is sizing the system for the headline 5,000-stops-per-day number and assuming the rest follows. It does not — the system is sized for the *update* workload, which dominates the cost and the user-experience risk.

The two endpoints that do the heavy lifting

`POST /api/v1/routing` — given a list of stops with coordinates and time windows, returns an ordered sequence per vehicle, with estimated drive times between each pair. Supports up to several hundred stops per request. Use this for batch route construction and for re-route-this-one-truck operations.

`GET /api/v1/places/nearby` — given a coordinate and a category (gas station, parking, depot, charger, food), returns the nearest matches with travel-distance estimates. Use this for "the truck broke down, find the nearest tow yard" and "we need to add a charging stop on this route" workflows.

Pricing context: with paid plans starting from $54/month for 100,000 calls, a dispatch console doing 5,000 stops per day plus 50,000 routing-related re-plans and exception lookups per month lands well under one cent per stop. The full pricing table is at csv2geo.com/pricing/api.

A worked walk-through of the routing endpoint with deeper detail on the optimisation behaviour lives in Concurrency Tuning for Geocoding — many of the same retry and budget patterns apply.

The latency budget

Three tiers. Design every code path against the right one.

| Tier | Budget | What lives here |
|---|---|---|
| Overnight batch | 5 min | Initial 5,000-stop route construction |
| Interactive re-plan | 8-10 s | Single-truck re-route after schedule change |
| Exception lookup | 1-2 s | "Find nearest depot" / "nearest charger" / single-stop verification |

The 8-10 s interactive budget is the killer one. If you blow it, the dispatcher reverts to a manual workflow and the entire console becomes a viewer instead of a tool. Three things have to fit inside it: the round-trip to your backend, the routing API call, and the front-end re-render. A reasonable allocation is 200 ms client overhead, 6-8 s routing API, 1-2 s state propagation and render.
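The allocation is easy to sanity-check in code. This is a minimal sketch that only restates the figures from the paragraph above; the names `ALLOCATION_S` and `total_range_s` are illustrative and not part of any API.

```python
# Budget components in seconds as (best case, worst case), taken from the text.
ALLOCATION_S = {
    "client_overhead": (0.2, 0.2),
    "routing_api": (6.0, 8.0),
    "propagate_and_render": (1.0, 2.0),
}

def total_range_s(allocation):
    """Sum the best-case and worst-case ends of every component."""
    low = sum(best for best, _ in allocation.values())
    high = sum(worst for _, worst in allocation.values())
    return low, high

low, high = total_range_s(ALLOCATION_S)
print(f"end-to-end: {low:.1f}-{high:.1f} s")
```

The best case sums to 7.2 s and the worst case to 10.2 s, so when every component runs slow at once the 10 s ceiling is already gone. The routing call is the only component large enough to buy that time back.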

That budget is achievable for a single truck's worth of stops (commonly 50-80 stops). It is not achievable if you naively re-route every truck on every change. The discipline that holds the budget is *only re-route the trucks whose stop list actually changed*. A schedule change to stop 1417 affects truck 3; it does not affect trucks 1, 2, 4, 5. Re-route only truck 3.

The architecture

```text
┌──────────────────┐    ┌─────────────────────┐    ┌──────────────────┐
│  dispatcher UI   │ ←→ │  console backend    │ ←→ │  CSV2GEO API     │
│  (React/Vue)     │    │  (Node/Go/Python)   │    │  routing + places│
└──────────────────┘    │  + in-memory state  │    └──────────────────┘
                        │  + Redis queue      │
                        └─────────────────────┘
                                  ↓
                        ┌─────────────────────┐
                        │  Postgres           │
                        │  stops, trucks,     │
                        │  audit log          │
                        └─────────────────────┘
```

In-memory state per truck is the unglamorous part that makes the latency budget work. Each truck has a current ordered list of stops, a current location estimate (from the last GPS ping), and a dirty flag. When a stop changes, the dirty flag flips on the affected truck only, the routing endpoint is called for just that truck, and the new ordered list replaces the old one. The dispatcher's screen updates from a WebSocket push.
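One way to picture the per-truck state is the sketch below. It assumes stops are tracked by ID and that a single process owns the state; `TruckState`, `Fleet`, and their methods are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TruckState:
    truck_id: int
    stops: list = field(default_factory=list)  # ordered stop IDs
    last_position: Optional[tuple] = None       # (lat, lng) from the last GPS ping
    dirty: bool = False                         # True when the route needs a re-plan

class Fleet:
    def __init__(self, trucks):
        self.trucks = {t.truck_id: t for t in trucks}

    def remove_stop(self, stop_id):
        """Drop a stop and mark only the truck that carried it as dirty."""
        for truck in self.trucks.values():
            if stop_id in truck.stops:
                truck.stops.remove(stop_id)
                truck.dirty = True
                return truck.truck_id
        return None

    def dirty_trucks(self):
        """Truck IDs that need a routing call."""
        return [t.truck_id for t in self.trucks.values() if t.dirty]

    def apply_route(self, truck_id, ordered_stops):
        """Replace the stop list with the routing result and clear the flag."""
        truck = self.trucks[truck_id]
        truck.stops = list(ordered_stops)
        truck.dirty = False

fleet = Fleet([TruckState(2, [1415]), TruckState(3, [1416, 1417]), TruckState(4, [1418])])
fleet.remove_stop(1417)  # only truck 3 becomes dirty
```

After the cancellation, `fleet.dirty_trucks()` contains truck 3 only, so exactly one routing call goes out.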

Postgres is the source of truth and the audit log. It is not in the latency-critical path for interactive re-plans — the backend reads stops from Postgres at startup, keeps them in memory, and writes updates back asynchronously. The audit log keeps every state transition so the morning-after operations review can answer "why did we re-route truck 3 at 09:47?" without guessing.

The API call patterns

Step 1 — overnight batch route construction

Once a day, build the first cut of routes for tomorrow's deliveries. A single call per truck, parallel across trucks. The endpoint accepts up to several hundred stops per request, so a single truck's worth of stops fits in one call.

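```python
import concurrent.futures
import os

import requests

API = "https://csv2geo.com/api/v1"
KEY = os.environ["CSV2GEO_KEY"]

def route_one_truck(truck_id, stops, depot):
    """Build one truck's route with a single routing call."""
    payload = {
        "vehicle": {"start": depot, "end": depot},
        "stops": [
            {"id": s["id"], "lat": s["lat"], "lng": s["lng"],
             "time_window": [s["earliest"], s["latest"]],
             "service_time_s": s.get("service_time_s", 180)}
            for s in stops
        ],
    }
    r = requests.post(
        f"{API}/routing",
        params={"api_key": KEY},
        json=payload,
        timeout=60,  # overnight tier: correctness matters more than speed
    )
    r.raise_for_status()
    return truck_id, r.json()

def route_all_trucks(trucks, max_workers=16):
    """Route all trucks in parallel; collect failures instead of aborting the batch."""
    results, failures = {}, {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futs = {ex.submit(route_one_truck, t["id"], t["stops"], t["depot"]): t["id"]
                for t in trucks}
        for fut in concurrent.futures.as_completed(futs):
            try:
                tid, data = fut.result()
                results[tid] = data
            except requests.RequestException as exc:
                failures[futs[fut]] = exc  # retry or escalate these trucks
    return results, failures
```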

16 worker threads is a starting point. The right number depends on how many trucks you have and your API rate-limit headroom — see Concurrency Tuning for Geocoding for the rationale.

Step 2 — interactive re-plan after a change

The hot path. Triggered from the dispatcher's UI when a stop is added, cancelled, or moved between trucks. Single truck, single call:

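```javascript
// Assumed to run on the console backend, so the API key stays out of the browser.
const API = 'https://csv2geo.com/api/v1';
const KEY = process.env.CSV2GEO_KEY;

async function replanTruck(truckId, depot, stops, timeoutMs = 8000) {
  const r = await fetch(`${API}/routing?api_key=${encodeURIComponent(KEY)}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    // Abort the request once the routing share of the budget is spent.
    signal: AbortSignal.timeout(timeoutMs),
    body: JSON.stringify({
      vehicle: { start: depot, end: depot },
      stops: stops.map(s => ({
        id: s.id, lat: s.lat, lng: s.lng,
        time_window: [s.earliest, s.latest],
        service_time_s: s.serviceTimeS ?? 180,
      })),
    }),
  });
  if (!r.ok) throw new Error(`routing http ${r.status} for truck ${truckId}`);
  return r.json();
}
```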

This is the call that has to come back in 6-8 s for the latency budget to hold. Two failure modes to design around:

Timeout or 5xx response. Retry with exponential backoff bounded at 12 s total — past that, surface the error to the dispatcher with a "re-plan failed, keeping previous route" message rather than spinning indefinitely. The retry policy is the same one in Exponential Backoff — When to Retry, When to Stop.
Infeasible time windows. When the routing engine cannot fit all stops within their windows it returns the best partial route plus a list of unscheduled stops. Surface those to the dispatcher as a separate queue — "these 3 stops need manual rescheduling" — rather than treating it as an error.
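The first failure mode can be sketched as a retry loop with a hard total budget. This is an illustration under assumptions, not the linked post's exact policy: `RetryableError`, `call_with_budget`, the 0.5 s base delay, and the jitter are all choices made for this example. The wrapped function receives the remaining budget so it can use it as its own request timeout.

```python
import random
import time

class RetryableError(Exception):
    """Raised by the wrapped function for timeouts and 5xx responses."""

def call_with_budget(fn, total_budget_s=12.0, base_delay_s=0.5,
                     clock=time.monotonic, sleep=time.sleep):
    """Call fn(remaining_s) until it succeeds or the total budget is spent."""
    deadline = clock() + total_budget_s
    attempt = 0
    while True:
        remaining = deadline - clock()
        try:
            # fn should pass `remaining` to its HTTP client as the timeout.
            return fn(remaining)
        except RetryableError:
            # Exponential backoff with jitter (the jitter is an assumption).
            delay = base_delay_s * (2 ** attempt) * random.uniform(0.5, 1.0)
            if clock() + delay >= deadline:
                raise  # budget exhausted: caller keeps the previous route
            sleep(delay)
            attempt += 1
```

When the final `RetryableError` propagates, the caller shows the "re-plan failed, keeping previous route" message and leaves the truck's in-memory state untouched.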

Step 3 — exception lookup with the places endpoint

Driver radios in: "I'm at the warehouse but the back gate is locked, where's the closest open one?" Or: "I have 40 minutes till my next window, where's a charger?" The places-nearby endpoint answers these in one round-trip:

```shell
curl -s "https://csv2geo.com/api/v1/places/nearby?lat=$LAT&lng=$LNG&category=charging_station&radius_m=5000&limit=5&api_key=$KEY" \
  | jq '.results[] | {name, address, distance_m}'
```

This is the 1-2 second tier. Keep the request shape stable across categories — only the category parameter changes between "charging station", "fuel", "loading dock", and "rest area" — so the front-end can render the result set with the same component regardless of what was asked.
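A stable request shape is simple to enforce with one builder function. The sketch below reuses the query parameters from the curl call above; the function name `places_nearby_url` is hypothetical, and any category value other than `charging_station` is an assumption to check against the API reference.

```python
from urllib.parse import urlencode

API = "https://csv2geo.com/api/v1"

def places_nearby_url(lat, lng, category, api_key, radius_m=5000, limit=5):
    """Build the places/nearby URL; only `category` varies between lookups."""
    query = urlencode({
        "lat": lat,
        "lng": lng,
        "category": category,
        "radius_m": radius_m,
        "limit": limit,
        "api_key": api_key,
    })
    return f"{API}/places/nearby?{query}"
```

Because every lookup goes through the same builder, the front-end component that renders results never has to branch on what was asked.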

Step 4 — propagate the new state

After a successful re-plan, three things happen in order:

1. Write the new ordered stop list to Postgres (truck_id, sequence, ETA per stop). This is the audit trail.
2. Update the in-memory truck state. This is what the next re-plan reads from.
3. Push a WebSocket message to every dispatcher subscribed to this truck. The dispatcher's screen re-renders with the new route on the map and the updated stop-list panel.

Order matters. If you push to the WebSocket before writing to Postgres, a server restart between the two can leave the dispatcher's screen out of sync with the durable state.
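The ordering can be made explicit by routing every successful re-plan through one function. This is a sketch with the three side effects injected as callables; `propagate`, `persist`, `update_memory`, and `push` are illustrative names, and the real implementations (Postgres write, state update, WebSocket broadcast) are assumed to exist elsewhere.

```python
def propagate(truck_id, route, persist, update_memory, push):
    """Apply a successful re-plan: durable write, then memory, then push."""
    persist(truck_id, route)        # 1. durable write; raises on failure
    update_memory(truck_id, route)  # 2. state the next re-plan reads from
    push(truck_id, route)           # 3. notify subscribed dispatchers
```

If `persist` raises, neither the in-memory state nor any dispatcher's screen moves ahead of the durable state.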

Step 5 — handle the exception cascade

The hardest pattern in dispatch software: one change cascades into multiple re-plans. Truck 3 breaks down → stops on truck 3 need to be redistributed to trucks 1, 2, and 4 → trucks 1, 2, 4 each need their own re-plan → the original truck 3 is removed from the active fleet. That is three routing calls at roughly eight seconds each: about 24 seconds of wait if you run them one after another, about eight seconds if you parallelise (and you must).

The wrong pattern is to wait for all three re-plans before showing anything to the dispatcher. The right pattern is to optimistically show "redistribution in progress" with the candidate stop assignments immediately and update each truck's route on its panel as the re-plan completes. The dispatcher's mental model stays current; the screen progressively fills in. Eight seconds of staring at a spinner becomes eight seconds of watching trucks update one by one, which feels noticeably shorter.
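Progressive updates fall out naturally from `concurrent.futures.as_completed`. The sketch below assumes a `replan(truck_id)` callable that performs the routing call and an `on_update(truck_id, route)` callable that pushes one truck's panel; both names, and `replan_progressively` itself, are hypothetical.

```python
import concurrent.futures

def replan_progressively(truck_ids, replan, on_update, max_workers=8):
    """Run one re-plan per truck in parallel and report each as it finishes."""
    failed = []
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        futures = {ex.submit(replan, tid): tid for tid in truck_ids}
        for fut in concurrent.futures.as_completed(futures):
            tid = futures[fut]
            try:
                on_update(tid, fut.result())  # e.g. push this truck's panel now
            except Exception:
                failed.append(tid)            # keep the previous route for this truck
    return failed
```

Each panel updates as soon as its own call returns, and a failure on one truck does not hold back the others.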

What breaks at 5,000 stops a day

Three concrete failure modes that hit teams ramping past a few thousand stops.

Rate-limit collisions during the morning re-plan storm. Between 08:00 and 09:30 every weekday the dispatcher fires dozens of re-plans as drivers report in late, customers reschedule, and the first traffic delays propagate. If your console's worker pool fires routing calls in parallel without budgeting against the API rate limit, you collide with yourself and the limit kicks in mid-storm. The fix is a single token-bucket rate limiter shared across all worker threads, sized to 70-80% of your plan's per-second limit so you have headroom for exception lookups during the storm.
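A shared token bucket is a small amount of code. The version below is a sketch, assuming a non-blocking `try_acquire` that callers poll or queue behind; `PLAN_LIMIT_PER_S` is a made-up figure, so substitute your plan's real per-second limit.

```python
import threading
import time

class TokenBucket:
    """Thread-safe token bucket; one instance is shared by all workers."""

    def __init__(self, rate_per_s, capacity, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return False instead of blocking."""
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

PLAN_LIMIT_PER_S = 20  # hypothetical plan limit, not a documented value
limiter = TokenBucket(rate_per_s=0.75 * PLAN_LIMIT_PER_S, capacity=15)
```

Routing workers call `limiter.try_acquire()` before each request and wait briefly when it returns `False`. The 25% left unclaimed is the headroom for exception lookups.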

The cold-start re-plan that blows the budget. When the backend restarts, the in-memory truck state is empty and the first re-plan request has to re-load the truck's stops from Postgres before calling the routing endpoint. This adds 1-2 seconds to the first call after each restart, which is enough to blow the 8-10 s budget when combined with a slightly slow API response. Mitigation: warm the in-memory cache at startup by reading all active trucks in parallel before accepting requests. That costs a 3-4 second startup penalty in exchange for in-budget interactive latency from then on.
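The warm-up step might look like the sketch below. It assumes a `load_stops(truck_id)` callable that reads one truck's stops from Postgres; that callable and the name `warm_cache` are hypothetical.

```python
import concurrent.futures

def warm_cache(active_truck_ids, load_stops, max_workers=16):
    """Load every active truck's stops in parallel before serving requests."""
    ids = list(active_truck_ids)
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as ex:
        # map() preserves input order, so results line up with ids.
        stop_lists = list(ex.map(load_stops, ids))
    return dict(zip(ids, stop_lists))
```

Call it once during startup and only begin accepting re-plan requests after it returns.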

The "phantom truck" race condition. If the dispatcher cancels a re-plan in flight (closes the panel, moves a stop back), the response may still arrive 5 seconds later and overwrite the new state. Mitigation: every re-plan request carries a monotonically increasing request ID per truck, and the backend ignores responses whose request ID is older than the truck's current ID. It is three lines of code, and it prevents a class of bug that is hell to diagnose under load.
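The guard can be sketched as follows. `ReplanGuard` is an illustrative name, and the sketch uses a single global counter, which is still monotonically increasing for each individual truck.

```python
import itertools

class ReplanGuard:
    """Tracks the latest re-plan request per truck so stale responses are dropped."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._latest = {}

    def start(self, truck_id):
        """Issue a new request ID and make it the truck's current one."""
        request_id = next(self._counter)
        self._latest[truck_id] = request_id
        return request_id

    def is_current(self, truck_id, request_id):
        """True only for the response to the most recent request."""
        return self._latest.get(truck_id) == request_id
```

Call `start()` before each routing request, carry the ID alongside the request, and apply the response only if `is_current()` still holds when it arrives.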

Caching, but only for the right reads

Geocoding the same address twice in one day is wasted work. Wrap the geocode endpoint behind a process-local LRU keyed by the normalised address string, and the morning batch's "what's the lat/lng for this new stop" calls drop by 60-80% after the first hour. The exception-lookup places calls do not cache as well — "nearest charger to my current GPS" is by definition unique per query — and the routing calls do not cache at all because the inputs include the live truck position. Cache the stable inputs, not the dynamic ones. See Caching Geocoding Results — 90% Cost Reduction for the full pattern.
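A process-local LRU takes only a few lines with `functools.lru_cache`. The sketch below wraps any `geocode(address)` callable; the normalisation rule (lowercase, collapsed whitespace) and the cache size are assumptions for illustration, and real address normalisation is usually stricter.

```python
from functools import lru_cache

def normalise(address):
    """Collapse case and whitespace so trivially different strings share a key."""
    return " ".join(address.lower().split())

def make_cached_geocoder(geocode, maxsize=50_000):
    """Wrap a geocode(address) callable in a process-local LRU cache."""
    @lru_cache(maxsize=maxsize)
    def cached(key):
        return geocode(key)  # only reached on a cache miss

    return lambda address: cached(normalise(address))
```

Two spellings of the same address that differ only in case or spacing now cost one API call instead of two.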

What honest p99 looks like

The temptation is to report a single average latency number for "routing API". The honest number is a p99, broken down by stop count, because 50-stop routes return in 2-3 seconds and 200-stop routes return in 8-12 seconds and reporting their average makes the 200-stop case look fine when it is on the latency cliff. Plot p99 against stop count and you get a curve you can defend to operations. See p99 Latency — When the Average Lies for why the p50/p99 distinction matters in particular for routing.
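Computing that curve from logged samples is straightforward. The sketch below assumes you record `(stop_count, latency_s)` per routing call and uses the nearest-rank definition of p99; the 50-stop bucket width and the function names are illustrative choices.

```python
from collections import defaultdict

def p99(samples):
    """Nearest-rank 99th percentile of a non-empty list."""
    ordered = sorted(samples)
    rank = (99 * len(ordered) + 99) // 100  # integer ceil(0.99 * n)
    return ordered[rank - 1]

def p99_by_stop_count(observations, bucket_size=50):
    """Group (stop_count, latency_s) pairs into buckets and return p99 per bucket."""
    buckets = defaultdict(list)
    for stop_count, latency_s in observations:
        bucket_start = (stop_count // bucket_size) * bucket_size
        buckets[bucket_start].append(latency_s)
    return {start: p99(values) for start, values in sorted(buckets.items())}
```

The returned dict maps each bucket's lower bound to its p99, which is the series to plot against stop count.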

FAQ

How many stops per truck before the latency budget breaks? At 50-80 stops the 6-8 s routing response holds with margin. At 150-200 stops you are on the cliff — start splitting a truck's stops into two virtual route segments instead of one routing call.

Can the routing endpoint handle multi-day routes? Yes — time windows are absolute timestamps, not offsets from "now". A two-day route with overnight breaks is one request; the engine respects the windows.

What happens when a stop has no time window? The engine fills with reasonable defaults (typically business hours of the destination's time zone). For production you should always pass an explicit window — implicit defaults will surprise you.

How do I handle stops that fail to geocode? Surface them to the dispatcher as "address could not be located" and require manual coordinate entry before adding to a truck. Auto-snapping to the nearest address is tempting and usually wrong — see Reverse-Geocoding Accuracy and the Distance Meters.

Is there an SDK? The official SDKs wrap these same REST endpoints. For dispatch software the recommendation is to stay on raw REST — the routing payload shape changes rarely, and the SDK version-pinning friction is real when you ship hot fixes during a Monday-morning incident.

What's the cost at 5,000 stops/day with 50,000 re-plans/month? At paid pricing tiers starting from $54/month for 100,000 calls, a single-warehouse dispatch operation lands well under one cent per stop. The full pricing table is at csv2geo.com/pricing/api.

Can the same console handle multi-warehouse fleets? Yes, but the optimisation gets harder — cross-warehouse stop transfers introduce a second-order problem the routing endpoint does not solve directly. For that case use the per-truck routing pattern, then layer a separate transfer-decision engine on top.

---

*I.A. / CSV2GEO Creator*