Batch vs Realtime Geocoding: Picking the Right Default

When to geocode in batch vs realtime: cost math, latency requirements, freshness needs, and an architecture that lets you switch between them.

May 17, 2026


The two patterns for getting addresses geocoded look completely different and someone always asks "which one do we need?" The honest answer is "you probably need both, but only one matters for the immediate decision." This post is the practical version: when batch makes sense, when realtime makes sense, the cost math, and the architecture that lets a single pipeline serve both modes without a rewrite.

By the end you should be able to look at any geocoding requirement and answer in 30 seconds: batch, realtime, or hybrid.

The two patterns defined

Realtime geocoding: A user (or a service) needs a coordinate *now*. They submit a single address; you return a single result; latency matters; cost-per-call is secondary.

Examples:

A logistics app where the dispatcher types in a delivery address and sees it pinned on the map immediately.
An autocomplete dropdown that fires as the user types.
A signup form that validates the user's address before submission.
A map tool that lets a sales rep look up where a prospect is.

Batch geocoding: A pile of addresses needs to be geocoded; the result is consumed later (minutes, hours, days). Latency-per-row is irrelevant; total cost matters; throughput matters.

Examples:

A CSV upload of 50,000 customer addresses for a mailing campaign.
A nightly job that re-geocodes any addresses that changed in the CRM.
A historical backfill of 5M records when a new product launches.
A monthly report that needs to count how many customers are within 25 miles of each store location.

The two have fundamentally different cost curves and different correctness budgets.

Cost math

Realtime cost-per-call is dominated by the geocoder's per-request rate, but the total volume is small (per-user, per-action). A pipeline doing 10K realtime lookups/day at $0.0005 each = $5/day = $150/month. Negligible.

Batch cost-per-call is the same per unit, but volumes are 10–1000× larger. A pipeline doing 1M batch lookups/day = $500/day = $15K/month. Now it matters.

The two modes also have different opportunities for cost reduction:

| Cost lever | Realtime | Batch |
|---|---|---|
| Caching | High value (repeat queries common in apps) | Very high value (lots of duplicate addresses in input data) |
| Deduplication | Low value (one-off queries) | Very high value (~30–60% dedup rate on dirty data) |
| Rate limiting | Low value (low volume) | Critical (prevents 429s on bursty workloads) |
| Async processing | Hurts UX (user is waiting) | Free (no one is watching) |

A realtime pipeline gets ~30% cheaper with caching. A batch pipeline gets 90%+ cheaper with caching + dedup.
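To make the arithmetic concrete, here is a minimal sketch of the cost model used in this post. The function names are illustrative, and the $0.0005 price is the example rate from the text, not a quoted price.

```python
def billable_calls(addresses: int, dedup_rate: float = 0.0,
                   cache_hit_rate: float = 0.0) -> int:
    """Addresses that still reach the paid API after dedup and cache hits."""
    unique = addresses * (1 - dedup_rate)
    return round(unique * (1 - cache_hit_rate))


def cost_usd(addresses: int, price_per_call: float = 0.0005,
             dedup_rate: float = 0.0, cache_hit_rate: float = 0.0) -> float:
    """Total API cost in dollars, rounded to cents."""
    calls = billable_calls(addresses, dedup_rate, cache_hit_rate)
    return round(calls * price_per_call, 2)


# Realtime example from the text: 10K lookups/day over a 30-day month.
realtime_monthly = cost_usd(10_000) * 30
# Batch example from the text: 1M lookups/day over a 30-day month.
batch_monthly = cost_usd(1_000_000) * 30
```

Plugging in a dedup rate and a cache hit rate shows why the batch bill is the one worth optimizing: the same levers applied to the realtime volume save only a few dollars a month.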

Latency requirements

Realtime: p99 < 200ms is the magic threshold for "feels instant" in a web UI. Above 500ms, users notice. Above 1s, they think your app is broken.

Batch: doesn't matter per-call. What matters is *total time to process N rows*. A 50K-row batch that takes 12 minutes is fine; the same batch taking 2 hours is annoying.

The total-time math: with 20-worker concurrency at p50 50ms per call, each worker completes about 20 calls/sec:

50K rows ÷ 20 workers ÷ 20 calls/sec/worker = 125 seconds ≈ 2 minutes (theoretical)
Real-world with cache misses, retries, occasional 429s: 5–15 minutes
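The theoretical figure can be reproduced with a small estimator. This is a sketch that assumes every call takes exactly the given latency and workers never idle, so it gives the lower bound rather than the real-world number.

```python
import math


def ideal_batch_seconds(rows: int, workers: int, latency_ms: int) -> float:
    """Lower-bound wall time: rows split evenly across workers, one call per row."""
    calls_per_worker = math.ceil(rows / workers)
    return calls_per_worker * latency_ms / 1000


# 50K rows, 20 workers, 50ms per call
estimate = ideal_batch_seconds(50_000, 20, 50)
```

The gap between this estimate and the 5–15 minutes seen in practice is the cost of retries, 429 backoff, and slow tail calls.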

Per-call, realtime needs 1 fast call. Batch needs many parallel calls but each can be slower.

Freshness budget

Different correctness requirements:

Realtime: Result must reflect *current state*. If the location data for an address changed last week (rare but possible), the user should see the new coordinates.
Batch: Result reflects state at processing time. A monthly report can use coords that were correct when the batch ran.

This affects caching strategy. Realtime caches use short TTLs (1 hour) to bound how stale a result can be while still serving repeat queries quickly. Batch can cache for months because the result is consumed once and the data freshness is "as of run date."
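One way to express the two freshness budgets is to store the write time with each entry and let the reader decide how old is too old. The sketch below is an in-memory stand-in, not a real cache client; the class name and the 90-day batch TTL are assumptions chosen for illustration.

```python
import time

REALTIME_TTL_S = 60 * 60             # 1 hour, as in the text
BATCH_TTL_S = 90 * 24 * 60 * 60      # ~3 months; an assumed value for "months"


class TTLCache:
    """Keeps (stored_at, value) so each caller applies its own max age."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._data = {}

    def set(self, key, value):
        self._data[key] = (self._clock(), value)

    def get(self, key, max_age_s):
        entry = self._data.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > max_age_s:
            return None  # too old for this caller; treat as a miss
        return value
```

A realtime handler would call `cache.get(key, REALTIME_TTL_S)` and a batch worker `cache.get(key, BATCH_TTL_S)`, so the same stored entry can be a miss for one and a hit for the other.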

When to use which

Decision tree:

Is a human waiting for the result?
├── Yes → Realtime
│         (autocomplete, form validation, map UI, single-address lookup)
└── No  → How many addresses?
          ├── 1 to ~100      → Realtime (overhead of batching not worth it)
          ├── 100 to ~10K    → Either; batch is cheaper, realtime is simpler
          └── >10K           → Batch (cost difference is real)

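The interesting middle ground (100–10K addresses) is where most teams pick wrong. The temptation is to "just call the API in a loop." That works but leaves time on the table: the batch endpoint amortizes HTTP overhead across many addresses and finishes faster wall-clock because the geocoder can parallelize internally. At the same per-call price the API bill is similar, so the gain is mostly wall-clock time unless volume pricing applies.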
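The decision tree translates directly into a function. This sketch uses the thresholds from the tree as hard cutoffs, although the text treats them as approximate; the function name and return values are illustrative.

```python
def choose_mode(human_waiting: bool, address_count: int) -> str:
    """Return 'realtime', 'either', or 'batch' following the decision tree."""
    if human_waiting:
        return "realtime"      # autocomplete, form validation, map UI
    if address_count <= 100:
        return "realtime"      # overhead of batching not worth it
    if address_count <= 10_000:
        return "either"        # batch has less overhead, realtime is simpler
    return "batch"
```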

The realtime architecture

Optimized for low p99 latency on individual requests:

[Client] ── HTTPS ──▶ [Edge cache (CDN)] ── miss ──▶ [API server] ──▶ [Geocoder]
                              │                            │
                              └── hit (cache TTL ~1h) ◀────┘

Components:

Edge cache (Cloudflare, Fastly): caches popular addresses at the CDN. Trims latency for repeat queries to ~10ms.
API server: thin layer. Takes the request, does cache lookup, calls geocoder, returns. No queueing.
Geocoder: csv2geo or similar. Single-call latency budget: ~50ms p50.

Total p99 budget allocation:

DNS + TLS: 30ms
Edge cache lookup or origin connect: 30ms
Cache miss → geocoder call: 80ms
Response serialization: 10ms
Total: 150ms p99 — well under the 200ms threshold

Code shape (Express.js example):

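import { LRUCache } from 'lru-cache';
import express from 'express';

// In-process cache: up to 50,000 entries, each kept for 1 hour.
const cache = new LRUCache({ max: 50_000, ttl: 1000 * 60 * 60 });
const app = express();

// Assumed normalization: trim, lowercase, collapse whitespace, so trivially
// different spellings of one address share a cache entry.
const stableKey = (addr) => addr.trim().toLowerCase().replace(/\s+/g, ' ');

app.get('/geocode', async (req, res) => {
  const addr = req.query.q;
  // Reject a missing q and repeated q params (Express parses those as an array).
  if (typeof addr !== 'string' || !addr.trim()) {
    return res.status(400).json({ error: 'q required' });
  }

  const key = stableKey(addr);
  const cached = cache.get(key);
  if (cached !== undefined) return res.json(cached);

  try {
    const r = await fetch(`https://api.csv2geo.com/v1/geocode?q=${encodeURIComponent(addr)}`, {
      headers: { 'X-API-Key': process.env.API_KEY },
      signal: AbortSignal.timeout(5000), // fail fast on a slow geocoder
    });
    if (!r.ok) return res.status(502).json({ error: 'upstream' });
    const result = (await r.json()).results?.[0];
    if (!result) return res.status(404).json({ error: 'no match' });
    cache.set(key, result);
    res.json(result);
  } catch (e) {
    // AbortSignal.timeout rejects with a TimeoutError; anything else is an upstream failure.
    if (e.name === 'TimeoutError') return res.status(504).json({ error: 'timeout' });
    res.status(502).json({ error: 'upstream' });
  }
});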

Three things to notice:

In-process LRU cache for the hot path. Most of the lookups in a 1-hour window are repeats.
`AbortSignal.timeout(5000)` to fail fast on a slow geocoder. Better to return 504 to the user than to hold the connection for 30 seconds.
No queue. Realtime requests must be served immediately or fail; queuing would just add latency.

The batch architecture

Optimized for cost and total throughput on large workloads:

[CSV upload] ──▶ [Producer] ──▶ [Queue] ──▶ [Worker pool] ──▶ [Result store]
                                              │
                                              └──▶ [csv2geo /v1/geocode (POST batch)]
                                                          │
                                                          └──▶ [Cache + DB]

Components:

Producer: parses input, dedupes, enqueues per-row (or per-chunk) jobs.
Queue: SQS, BullMQ, or in-process pool depending on scale.
Worker pool: pulls jobs, calls the batch geocoder endpoint with chunks of 100–1000 addresses, writes results.
Result store: Postgres or S3, with the input row index preserved so you can join back.

Batch endpoint advantage: csv2geo's POST /v1/geocode accepts up to 10,000 addresses in one HTTP call, so the per-request network overhead is paid once per batch instead of once per address. Latency-per-batch is ~2–5 seconds; per-row processing inside the batch is ~50ms, with rows handled in parallel on the provider side.

Code shape (Python worker):

import os

import requests

API_KEY = os.environ['API_KEY']

def process_chunk(addresses):
    """Geocode a chunk of up to 10,000 addresses in one call."""
    r = requests.post(
        'https://api.csv2geo.com/v1/geocode',
        json={'addresses': addresses},
        headers={'X-API-Key': API_KEY},
        timeout=60,   # batches take longer
    )
    r.raise_for_status()
    return r.json()['results']

# Worker pulls 1000 addresses at a time.
# `queue` and `db` stand in for your queue client and result store.
def worker(queue, db):
    while True:
        chunk = queue.get_chunk(size=1000)
        if not chunk:
            break
        results = process_chunk([row['addr'] for row in chunk])
        # Assumes results come back in input order, one per address.
        for row, result in zip(chunk, results):
            db.save(row['id'], result)

Three things to notice:

Batch endpoint, not single-call loop. 10–100× cheaper in network overhead.
Chunk size 1000. Sweet spot — bigger chunks have higher abandon cost on errors; smaller chunks lose batching benefit.
Persist per-row. Even though we batch, each row's result is saved separately so partial failures don't lose work.
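The "persist per-row" point is easier to see with failure handling added. The sketch below is one possible shape, not the csv2geo client: `geocode_chunk` and `save` are injected callables (hypothetical names), and a failed chunk is recorded for retry instead of aborting the run.

```python
from typing import Callable, Iterator


def chunked(rows: list, size: int) -> Iterator[list]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def run_batch(rows: list, geocode_chunk: Callable, save: Callable,
              chunk_size: int = 1000) -> list:
    """Geocode rows chunk by chunk; return ids of rows whose chunk failed."""
    failed_ids = []
    for chunk in chunked(rows, chunk_size):
        try:
            results = geocode_chunk([row["addr"] for row in chunk])
        except Exception:
            # A real worker would catch narrower errors (HTTP, timeout).
            failed_ids.extend(row["id"] for row in chunk)
            continue
        # Save each row separately so earlier chunks survive later failures.
        for row, result in zip(chunk, results):
            save(row["id"], result)
    return failed_ids
```

With a chunk size of 1000, one bad request costs at most 1000 rows of rework, which is the "abandon cost" trade-off described above.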

The hybrid: same pipeline, both modes

Many real systems need both. A SaaS app might:

Geocode user input in real-time when they type (realtime).
Re-geocode the entire customer list nightly to catch address changes (batch).

The trick is to share the cache and the geocoder between the two modes. Architecture:

              ┌─── [Realtime API server] ──┐
              │                            │
[Shared cache (Redis)]               [csv2geo API]
              │                            │
              └─── [Batch worker pool]  ───┘

Both modes write to the same Redis cache. A realtime query that misses the cache, calls the API, and writes the result — that result is then available to the next batch run for free. A batch run that warms the cache with 50K results — those become realtime cache hits for the next user.

The implementation is the same geocode_with_cache(addr) function called from two different code paths. No "batch vs realtime" code split below the cache layer.
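A minimal sketch of that shared function follows. The cache is any dict-like mapping (a plain dict here, standing in for Redis) and `geocode` is whatever callable hits the API; both are passed in as parameters, which is an assumption of this sketch and not a prescribed interface.

```python
from typing import Callable, MutableMapping, Optional


def geocode_with_cache(addr: str, cache: MutableMapping,
                       geocode: Callable[[str], Optional[dict]]) -> Optional[dict]:
    """Return a cached result if present; otherwise call the geocoder and store it."""
    # Light normalization so case and spacing variants share one entry.
    key = " ".join(addr.lower().split())
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = geocode(addr)
    if result is not None:
        cache[key] = result  # failed lookups are not cached
    return result
```

The realtime handler calls it once per request; the batch worker calls it per row (or checks the cache first and sends only the misses to the batch endpoint). Either way, a result written by one mode is a hit for the other.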

When to switch modes

A pipeline that started realtime grows. At some point batch becomes the right move:

| Symptom | Likely fix |
|---|---|
| Single requests >5K/day, mostly during business hours | Add a worker pool; realtime stays fast, batch absorbs the volume |
| Recurring monthly reports re-geocoding the whole customer list | Move to async batch with a status email |
| User-uploaded CSVs lock up the web request for 30+ seconds | Move uploads to a queue, return job ID, poll for completion |
| API bill growing 2× per quarter | Add caching + dedup; usually drops bill by 50–80% |
| Geocoder p99 spikes when a customer uploads a large file | Throttle batch via rate limiting; reserve burst capacity for realtime |

The opposite move (batch → realtime) is rare but happens: a nightly batch report that becomes a self-serve dashboard. Then the architecture inverts — pre-compute results in batch, serve them realtime from cache.
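For the last row of the table (throttle batch, reserve burst capacity for realtime), one possible approach is a token bucket where batch callers must leave a reserve untouched. This is a single-process sketch with hypothetical names; a multi-worker deployment would need a shared limiter.

```python
import time


class TokenBucket:
    """Token bucket; callers may be required to leave `reserve` tokens behind."""

    def __init__(self, rate_per_s: float, capacity: int, clock=time.monotonic):
        self.rate = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        gained = (now - self._last) * self.rate
        self._tokens = min(self.capacity, self._tokens + gained)
        self._last = now

    def try_acquire(self, n: int = 1, reserve: int = 0) -> bool:
        """Take n tokens only if at least `reserve` tokens would remain."""
        self._refill()
        if self._tokens - n >= reserve:
            self._tokens -= n
            return True
        return False
```

Batch workers would call `try_acquire(reserve=...)` with a non-zero reserve and back off on `False`, while the realtime path calls `try_acquire()` with no reserve, so a large upload cannot drain the capacity that interactive requests depend on.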

What I'd not do

Use the batch endpoint for single addresses just because it's cheaper per call. The overhead of constructing a 1-element batch and parsing the wrapper response makes it slower than GET /v1/geocode?q=... for individual lookups.
Try to make realtime geocoding work for 100K-row CSV uploads. It's fine to call the API in a loop for the first 50 rows to feel the pain, then queue the rest.
Cache realtime and batch results separately. Same data, same cache. Keep one source of truth.
Skip dedup for "small" batches. Even 1,000 rows often have 200 duplicates. The dedup step is 30 lines of code and saves real money. Always do it.
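The dedup step can be sketched in a few lines: normalize each address to a key, geocode only the first occurrence, then fan results back out to every input row. The normalization here (lowercase, commas and extra spaces removed) is a deliberately simple assumption; real data usually needs more.

```python
def normalize(addr: str) -> str:
    """Lowercase, drop commas, collapse whitespace."""
    return " ".join(addr.lower().replace(",", " ").split())


def dedupe(addresses: list) -> tuple:
    """Return (unique addresses, index into unique for each input row)."""
    unique = []
    index_of = {}
    row_to_unique = []
    for addr in addresses:
        key = normalize(addr)
        if key not in index_of:
            index_of[key] = len(unique)
            unique.append(addr)
        row_to_unique.append(index_of[key])
    return unique, row_to_unique


def fan_out(unique_results: list, row_to_unique: list) -> list:
    """Expand per-unique results back to one result per original row."""
    return [unique_results[i] for i in row_to_unique]
```

Only `unique` is sent to the geocoder; `fan_out` restores the original row order, so the join back to the input by row index still works.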

Cost comparison: realtime-loop vs batch

Same workload — 100,000 addresses to geocode, with ~30% dedup opportunity:

| Approach | API calls | Wall time | Cost @ $0.0005 |
|---|---|---|---|
| Realtime loop, no dedup | 100,000 | ~3 hours (single thread) | $50 |
| Realtime loop, in-process dedup | 70,000 | ~2 hours | $35 |
| Realtime parallel (20 workers), dedup | 70,000 | ~6 minutes | $35 |
| Batch endpoint, dedup | 70,000 | ~3 minutes | $35 |
| Batch endpoint, dedup, 90% cache hits (warm) | 7,000 | ~30 seconds | $3.50 |

The cache-warm scenario is the goal: subsequent runs of overlapping data become almost free. This is why caching matters more for batch than realtime — repeat workloads are common at batch scale.

Frequently Asked Questions

When does batch beat realtime for geocoding?

At >10K addresses always, and at 100–10K when async is acceptable. Batch endpoints amortize HTTP overhead across many addresses (one round-trip for 1,000 addresses instead of 1,000 round-trips), parallelize on the provider side, and let you negotiate volume pricing. Below 100 addresses, realtime is simpler and the volume is not enough to matter.

What is the cost difference between realtime-loop and batch for 100K addresses?

Same call volume after dedup (~70K), but batch finishes in ~3 minutes versus ~2 hours single-threaded or ~6 minutes with 20 parallel realtime workers. Total cost is identical (~$35) because the per-call price is the same; the savings show up as wall-clock time. With 90% cache hits on a warm subsequent run, batch drops to $3.50 and ~30 seconds.

Can the same pipeline serve both batch and realtime modes?

Yes — and it should. Share the cache, the dedup layer, the rate limiter, and the result store. Keep separate code paths only above the cache: realtime calls one address at a time, batch enqueues a payload. The reusable components dwarf the entry-point-specific code.

What latency budget pushes you to realtime?

Sub-second. If a user is waiting for the result (form submission, autocomplete, route preview), you need realtime — typically a sub-200ms p99 from a warm cache or sub-2s from the geocoder cold. Anything where a "we will email you when done" response is acceptable can be batch.

Why does caching matter more for batch than realtime?

Because batch workloads are typically repeat workloads — the same customer's address list, re-uploaded weekly with 10% new rows. After one warm run the cache hit rate climbs above 90% and per-batch cost drops to near zero. Realtime traffic is more varied (one-off user queries), so cache hit rates plateau lower.

Summary

| You need | Default to |
|---|---|
| Single address, user waiting | Realtime |
| <100 addresses, async OK | Realtime (it's simpler) |
| 100–10K addresses, async OK | Batch (less overhead per address, faster wall-clock) |
| >10K addresses | Batch always |
| Both, same product | Hybrid: shared cache, separate code paths above the cache |

The architecture decision is mostly about latency budget and total volume. Get those two right and the rest follows. The cache, the queue, the dedup, the rate limiter — all get reused regardless of which mode is the entry point.