Embedding a geocoding API in your SaaS without becoming an SRE

Embed geocoding in your B2B SaaS product without an ops headache. Rate limits, retry logic, caching, and API design patterns that hold under real load.

June 07, 2026


Most B2B SaaS products reach a point where they need geocoding. A CRM wants to normalise customer addresses. A field-service scheduler needs to turn a postcode into coordinates. A compliance tool needs to attach a jurisdiction to every account record. The feature is small enough that a single engineer owns the integration — and that engineer typically has no SRE support, no dedicated infra rotation, and a backlog full of other things.

The wrong version of this integration ships fast and pages you at 2 a.m. three weeks later when a rate-limit spike cascades into a request queue backup that cascades into a user-visible timeout. The right version takes a little more thought upfront and then does not generate incidents.

This post is the upfront thought. It covers the patterns that make a geocoding API integration production-ready without requiring a dedicated ops engineer: how the rate limits actually work, how to size your client's concurrency, how to retry without thundering-herding yourself, how to cache aggressively enough that your live-traffic API spend is a small fraction of your raw address volume, and how to instrument the client so you catch degradation before your users do. Every code example is plain HTTP — curl, Python requests, Node fetch. SDKs exist; the REST patterns hold regardless of whether you use them.

The failure modes that catch teams by surprise

Before architecture, a taxonomy of what goes wrong. Recognising the failure mode is most of the fix.

Synchronous geocoding on the user's request path. A user submits a new account with a billing address. Your API handler geocodes it inline before returning the 201. When the geocoding API slows — because your upstream is having a slow minute, because you are hitting a rate limit, because a deployment briefly disrupted routing — your API handler stalls. Your users see timeouts. The fix is to accept the address, return immediately, and geocode asynchronously. The geocoded field is nullable in your schema; it gets populated within seconds; no user ever waits for a third-party API to respond.

Batch jobs that burst into the API. A nightly job processes 20,000 records. It fires all 20,000 geocoding calls in a tight async loop at the start of the run. The API rate-limits you. Your error handling retries all 20,000 simultaneously. The API rate-limits you harder. You wake up to 20,000 failed records and no useful logs. The fix is a concurrency cap — no more than N in-flight requests at once — combined with exponential backoff per failed request.

No cache, so every lookup is a paid call. An address search widget geocodes the user's query on every keystroke. The same address, "123 Main St, Austin TX", is geocoded 40 times a day across different users and different sessions. You are spending 40 credits on something that could cost 1. The fix is a read-through cache keyed on the normalised address string, with a long TTL, because addresses do not move.

Silent 4xx degradation. The geocoding API returns a 422 for a malformed address. Your client logs the error and moves on. The record sits in your database with a null geocode and no flag. Three months later a report surfaces that 8% of your enterprise customer's account records are un-geocoded and no one can explain why. The fix is a structured error taxonomy — 4xx errors that represent bad input (and will never succeed on retry) need a separate code path from 5xx errors that represent transient failures (and should retry).

Rate-limit arithmetic that does not account for concurrency. You have a 100,000 calls/day plan. Your batch job runs at midnight. You think "100,000 / 86,400 seconds = 1.2 calls/second, I am fine." But your batch job keeps 50 requests in flight at once, and at a 200 ms round trip that is roughly 250 calls per second, enough to burn the whole day's budget in under seven minutes. The API's per-second rate limit kicks in long before the daily limit becomes relevant. The fix is to understand both the per-second bucket and the per-day bucket, and to size your concurrency against the per-second limit, not the daily average.

Now the architecture that avoids all five.

How CSV2GEO rate limits work

CSV2GEO operates two independent limits that your client must respect simultaneously.

Daily credit budget. The free tier is 3,000 calls per day. The entry paid tier ($54/month) is 100,000 calls per month — roughly 3,300 per day on a 30-day month. Higher tiers increase the monthly budget; see csv2geo.com/pricing/api for the current brackets. Daily credits reset at UTC midnight.

Per-second (burst) rate limit. This is the limit that catches batch jobs. Even if you have 100,000 credits left in your budget, you cannot burn all of them in one second. The API enforces a per-second call rate. When you exceed it you get a 429 Too Many Requests response with a Retry-After header. That header is your signal: not just "slow down," but "slow down for exactly this many seconds."

The client design that handles both:

import os
import time
from collections import deque

import requests

KEY = os.environ["CSV2GEO_API_KEY"]
API = "https://csv2geo.com/api/v1/geocode"

# Simple sliding-window throttle: at most N calls in any 1-second window,
# enforced client-side. Adjust N to stay comfortably under the server-side limit.
CALLS_PER_SECOND = 5
MAX_429_RETRIES = 3  # stop retrying a rate-limited request after this many attempts
_call_times = deque()

def _throttle():
    now = time.monotonic()
    # Prune calls older than 1 second
    while _call_times and now - _call_times[0] > 1.0:
        _call_times.popleft()
    if len(_call_times) >= CALLS_PER_SECOND:
        # Window is full: wait until the oldest call falls out of it
        sleep_for = 1.0 - (now - _call_times[0])
        if sleep_for > 0:
            time.sleep(sleep_for)
    _call_times.append(time.monotonic())

def geocode(address: str, _attempt: int = 0) -> dict | None:
    _throttle()
    try:
        r = requests.get(
            API,
            params={"q": address, "api_key": KEY},
            timeout=10,
        )
        if r.status_code == 429:
            if _attempt >= MAX_429_RETRIES:
                return None  # still rate-limited; caller re-queues
            # Retry-After may also be an HTTP date; fall back to 2 s if it is not an integer
            header = r.headers.get("Retry-After", "2")
            time.sleep(int(header) if header.isdigit() else 2)
            return geocode(address, _attempt + 1)  # bounded recursive retry
        r.raise_for_status()
        results = r.json().get("results", [])
        return results[0] if results else None
    except requests.Timeout:
        return None  # caller handles None as "geocode pending"

This is a thread-unsafe single-process throttle — fine for a synchronous batch script. For a multi-worker setup (Celery, concurrent.futures, a Node cluster), put the rate-limit state in Redis and enforce it there. The token bucket vs leaky bucket comparison covers the tradeoffs in detail.
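Between the single-threaded script and a Redis-backed limiter there is a middle case: several threads in one process. A minimal sketch of a lock-guarded version of the same sliding-window idea follows. The class name `SlidingWindowThrottle` and the injectable `clock` and `sleep` parameters are assumptions made for illustration and testability, not part of any CSV2GEO client.

```python
import threading
import time
from collections import deque


class SlidingWindowThrottle:
    """Allow at most `max_calls` acquisitions in any `period`-second window."""

    def __init__(self, max_calls: int, period: float = 1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._calls = deque()

    def acquire(self) -> None:
        """Block until a call slot is free, then record the call."""
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have left the window
                while self._calls and now - self._calls[0] >= self.period:
                    self._calls.popleft()
                if len(self._calls) < self.max_calls:
                    self._calls.append(now)
                    return
                wait = self.period - (now - self._calls[0])
            # Sleep outside the lock so other threads are not blocked on it
            self._sleep(wait)
```

Calling `throttle.acquire()` at the top of the request function takes the place of `_throttle()`. This only coordinates threads inside one process; separate worker processes still need shared state such as Redis.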

Async geocoding — the pattern that removes the user from the blast radius

Never geocode on the user's critical path for write operations. This is the single highest-leverage change most SaaS integrations can make.

The pattern:

1. Accept the address. Store it. Return your API response immediately with geocode_status: "pending".
2. Enqueue a background job with the record ID and the raw address string.
3. The worker calls the geocoding API, writes the result back to the record, and updates geocode_status to "complete" or to one of the failure statuses ("no_result", "invalid_address") described in the error taxonomy further down.
4. Your frontend polls or subscribes to the record update.

The user sees the address appear instantly. The geocode appears a second or two later — fast enough that on a typical detail page load, it is already there before the user scrolls to it.

In Node with a simple job queue:

// In your account-creation handler:
const account = await db.accounts.create({ billingAddress: req.body.address });
await geocodeQueue.add('geocode-account', { accountId: account.id, address: req.body.address });
res.status(201).json({ id: account.id, geocodeStatus: 'pending' });

// In your worker:
geocodeQueue.process('geocode-account', async (job) => {
  const { accountId, address } = job.data;
  const encoded = encodeURIComponent(address);
  const r = await fetch(
    `https://csv2geo.com/api/v1/geocode?q=${encoded}&api_key=${process.env.CSV2GEO_API_KEY}`
  );
  if (r.status === 422) {
    // Permanent input error: record it and return without throwing, so the queue does not retry
    await db.accounts.update(accountId, { geocodeStatus: 'invalid_address' });
    return;
  }
  if (!r.ok) throw new Error(`geocode failed: ${r.status}`); // queue retries on throw
  const data = await r.json();
  const result = data.results?.[0] ?? null;
  await db.accounts.update(accountId, {
    lat: result?.lat ?? null,
    lng: result?.lng ?? null,
    geocodeStatus: result ? 'complete' : 'no_result',
  });
});

The job queue handles retries automatically on a thrown error. Your worker does not need to implement its own retry loop for the common cases — the queue's retry policy (exponential backoff with a max attempt count) is the right home for that logic. See exponential backoff — when to retry, when to stop for the parameters worth tuning.
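If your queue does not provide a backoff policy, the calculation itself is small. The following is a minimal sketch of exponential backoff with full jitter. The function names and the default values (1 second base, 60 second cap, 5 attempts) are illustrative assumptions, not values recommended by CSV2GEO.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0,
                  rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter.

    The ceiling doubles on every attempt up to `cap`; the actual delay is a
    random point between 0 and that ceiling so that many failed jobs do not
    all retry at the same instant.
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def should_retry(attempt: int, max_attempts: int = 5) -> bool:
    """True while the job is still within its attempt budget."""
    return attempt < max_attempts
```

The jitter is what prevents the retry storm described in the batch-job failure mode: without it, every job that failed in the same second would retry in the same second.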

The cache layer — the single biggest cost lever

Addresses do not move. A geocoded result for "350 Fifth Avenue, New York NY" is valid for years. If your application geocodes the same address string more than once — across users, across sessions, across re-imports of the same customer data — you are paying for the same answer repeatedly.

A read-through cache in Redis, keyed on the normalised address string, with a 90-day TTL, will drop the fraction of addresses that hit the live API to the percentage that are genuinely new. For most B2B SaaS applications — where customers update their own address records infrequently — cache hit rates above 80% are common within two weeks of launch.

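The normalisation step matters: trivially different spellings of one address should resolve to the same cache key. The function below handles case, whitespace, and stray punctuation. Making "350 5th Ave New York NY" and "350 Fifth Avenue, New York, NY 10118" collide as well takes extra rules (abbreviation expansion, comma and postcode handling) layered on top. Normalise before lookup: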

import hashlib
import json
import os
import re

import redis

def normalise_address(raw: str) -> str:
    # lowercase, collapse whitespace, strip punctuation except commas and hyphens
    s = raw.lower().strip()
    s = re.sub(r"[^\w\s,\-]", "", s)
    s = re.sub(r"\s+", " ", s)
    return s

def cache_key(raw: str) -> str:
    return "geo:" + hashlib.sha256(normalise_address(raw).encode()).hexdigest()[:16]

cache = redis.Redis.from_url(os.environ["REDIS_URL"])
TTL = 60 * 60 * 24 * 90  # 90 days

def geocode_cached(address: str) -> dict | None:
    key = cache_key(address)
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    result = geocode(address)  # the throttled function from earlier
    if result is not None:
        cache.set(key, json.dumps(result), ex=TTL)
    return result
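To make the two spellings of the Fifth Avenue address share a key, the normaliser needs a few more rules. The sketch below is one possible extension, assuming US-style addresses. The function name `canonical_address` and the lookup tables are hypothetical and deliberately tiny; a real deployment would need a fuller abbreviation list and should decide whether dropping the postcode is safe for its data.

```python
import re

# Hypothetical, deliberately small lookup tables for illustration
ABBREVIATIONS = {"ave": "avenue", "st": "street", "rd": "road", "blvd": "boulevard"}
ORDINALS = {"first": "1st", "second": "2nd", "third": "3rd",
            "fourth": "4th", "fifth": "5th"}
TRAILING_ZIP = re.compile(r"\b\d{5}(?:-\d{4})?$")


def canonical_address(raw: str) -> str:
    s = raw.lower().strip()
    # Replace punctuation (including commas) with spaces; keep hyphens
    s = re.sub(r"[^\w\s\-]", " ", s).strip()
    # Drop a trailing US ZIP or ZIP+4 so its presence does not change the key
    s = TRAILING_ZIP.sub("", s)
    # Expand street-type abbreviations and map spelled-out ordinals to digits
    tokens = [ORDINALS.get(t, ABBREVIATIONS.get(t, t)) for t in s.split()]
    return " ".join(tokens)
```

With these rules both example strings reduce to the same token sequence, so hashing the output of `canonical_address` in place of `normalise_address` would give them one cache entry. The trade-off is that aggressive rules can merge addresses that are genuinely different ("st" is also "saint"), so it is worth adding rules one at a time and checking for collisions.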

The broader caching strategy — what to cache, how long, how to handle cache invalidation — is covered end-to-end in caching geocoding results — 90% cost reduction. The headline number in that post is real: teams with high address-reuse rates routinely cut API spend by 80–90% after adding a proper cache layer.

Concurrency tuning for batch jobs

The question every batch job eventually forces: how many in-flight requests is the right number?

Too few and the job takes hours. Too many and you hit the per-second rate limit, get 429s, waste time on retries, and may still take hours because the retries eat the budget.

The sweet spot is a function of two things: the API's per-second rate limit and your own network round-trip time. A rough formula: concurrency = rate_limit_per_second × mean_latency_seconds. If the API allows 10 calls per second and a round-trip takes 200 ms, keeping 2 requests in flight at a time saturates the rate limit without going over. In practice, start conservatively — 3 to 5 concurrent — and increase until you first see 429s, then back off by 20%.
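The rough formula above is Little's law applied to an HTTP client, and it runs in both directions. A small sketch, with function names chosen here for illustration:

```python
import math


def target_concurrency(rate_limit_per_second: float,
                       mean_latency_seconds: float) -> int:
    """In-flight requests needed to sustain the rate limit without exceeding it."""
    return max(1, math.floor(rate_limit_per_second * mean_latency_seconds))


def implied_call_rate(concurrency: int, mean_latency_seconds: float) -> float:
    """Calls per second that a given number of in-flight requests generates."""
    return concurrency / mean_latency_seconds
```

`target_concurrency(10, 0.2)` reproduces the figure of 2 from the text. The second function is the sanity check for an existing job: plug in its worker count and measured latency, and compare the result with the per-second limit. Latency is an average, so treat the output as a starting point and let observed 429s drive the final setting.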

In Python with concurrent.futures:

from concurrent.futures import ThreadPoolExecutor, as_completed

def enrich_batch(addresses: list[str], max_workers: int = 4) -> list[dict | None]:
    results = [None] * len(addresses)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        future_to_idx = {
            pool.submit(geocode_cached, addr): i
            for i, addr in enumerate(addresses)
        }
        for future in as_completed(future_to_idx):
            idx = future_to_idx[future]
            try:
                results[idx] = future.result()
            except Exception as exc:
                # Log; leave results[idx] as None for retry pass
                print(f"address {idx} failed: {exc}")
    return results

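Keep max_workers as a config value, not a hardcoded constant. You will want to adjust it when you move between tiers or when you observe 429 rates creeping up. One caveat: each worker thread ends up in the throttle from the earlier client, which is not thread-safe as written, so guard it with a lock (or move the rate-limit state to Redis) before running this pool. The deeper treatment of concurrency mechanics lives in concurrency tuning for geocoding — finding the sweet spot.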

Error taxonomy — 4xx is not the same as 5xx

This is the distinction that matters most for a robust client. The two error classes require completely different handling.

Transient errors (5xx, network timeout, connection reset): the server had a bad moment. Retry with backoff. These should succeed on the second or third attempt the vast majority of the time. If they do not, escalate to your alerting — you have a sustained incident, not a transient blip.

Permanent errors (4xx, excluding 429 and the 402 daily-limit response covered in the FAQ): the request itself is the problem. A 400 Bad Request means your address string is malformed. A 401 Unauthorized means your API key is wrong. A 422 Unprocessable Entity means the address parsed but could not be geocoded: perhaps it does not exist in the 461M+ address dataset, perhaps it is outside the 39 supported countries. Retrying a 422 is wasted credits; it will return 422 every time until the address input changes.

The geocode_status column in your database should distinguish these:

| API outcome | Status to write | Action |
|---|---|---|
| 200, result returned | complete | done |
| 200, empty results | no_result | surface to user for address correction |
| 422 | invalid_address | surface to user for address correction |
| 429 | pending | re-queue with delay |
| 5xx / timeout | pending | re-queue with exponential backoff |

The no_result and invalid_address statuses are the ones most integrations forget to handle. They show up as null geocodes with no explanation, confusing support tickets, and eventual data-quality reports that are hard to trace. Build the status taxonomy before you ship.

Observability — the two metrics that matter most

You do not need a full APM setup to catch geocoding degradation before your users do. Two metrics, logged from your client, are enough.

Cache hit rate. Every call to your geocode_cached function increments either a geocode.cache.hit or geocode.cache.miss counter. Plot it as a rolling percentage. Sudden drops in hit rate — below 50% on a mature installation — mean you are seeing a new pattern of address input: a bulk import, a new customer segment, a geography you had not seen before. That is a useful signal both for cost forecasting and for QA (new geographies are where geocoding accuracy tends to be lower).

API error rate by status class. A counter per response class: geocode.api.2xx, geocode.api.4xx_permanent, geocode.api.4xx_429, geocode.api.5xx. A rising 4xx_permanent rate means your input data quality is degrading — customers are entering worse addresses, or an upstream import is feeding you malformed strings. A rising 5xx rate means the API itself is having a bad period — page if it sustains for more than two minutes. A rising 4xx_429 rate means your concurrency is too high — back off immediately.

Both counters are three lines of code each. If your stack has a StatsD client or a Prometheus client, emit them there. If not, write them to your application log in a structured format and build a simple log-alert query. The full observability treatment is in observability for geocoding pipelines — the metrics that matter.

How to ship this on Monday

A realistic five-step plan for a two-person team adding geocoding to an existing SaaS product.

Step 1: Audit your address volume and uniqueness ratio

Before writing a line of client code, answer two questions. How many unique address strings does your application process per day? What fraction of them are repeats — the same customer address seen in multiple records, the same billing address across multiple invoices? If your repeat fraction is above 30%, the cache is your first engineering priority, not the API client itself. If your address volume is under 3,000 unique strings per day, the free tier covers you entirely while you pilot.

Step 2: Add the geocode_status column and the async job first

Before you call the API at all, add lat FLOAT, lng FLOAT, and geocode_status VARCHAR(20) DEFAULT 'pending' to your relevant table. Wire up the job queue. Deploy the worker with a stub that writes geocode_status = 'complete' without actually calling the API. Verify that the async flow works end-to-end — record created, job enqueued, record updated — before you add the real HTTP call. This order matters because it forces you to design the failure path (what does the UI show for geocode_status = 'pending'?) before you have a live API dependency.
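The step can be rehearsed end to end before any queue or API exists. The sketch below uses an in-memory SQLite database as a stand-in for your real one. The table name `accounts` and the column `billing_address` are assumptions, and a direct function call stands in for the job queue.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, billing_address TEXT)")

# The three columns from this step, added to the existing table
conn.execute("ALTER TABLE accounts ADD COLUMN lat FLOAT")
conn.execute("ALTER TABLE accounts ADD COLUMN lng FLOAT")
conn.execute(
    "ALTER TABLE accounts ADD COLUMN geocode_status VARCHAR(20) DEFAULT 'pending'"
)


def create_account(address: str) -> int:
    """Insert a record; geocode_status falls back to its 'pending' default."""
    cur = conn.execute(
        "INSERT INTO accounts (billing_address) VALUES (?)", (address,)
    )
    conn.commit()
    return cur.lastrowid


def stub_worker(account_id: int) -> None:
    """Stand-in for the real worker: marks the record complete, calls no API."""
    conn.execute(
        "UPDATE accounts SET geocode_status = 'complete' WHERE id = ?", (account_id,)
    )
    conn.commit()


def status_of(account_id: int) -> str:
    row = conn.execute(
        "SELECT geocode_status FROM accounts WHERE id = ?", (account_id,)
    ).fetchone()
    return row[0]
```

The useful property is the window between `create_account` and `stub_worker`: during it the record reads 'pending', which is the state your UI has to render.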

Step 3: Implement the cache layer before the live API calls

Wire up Redis (or your existing caching layer) and the geocode_cached wrapper. Seed it with any addresses you already have normalised coordinates for — many SaaS products have a year of historical address data sitting in a legacy column. A one-time script that reads (address, lat, lng) from existing records and writes them into the cache means your first week of live traffic has a meaningful cache hit rate from day one.
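A possible shape for that one-time seeding script is sketched below. It repeats the key function from the cache section so that it runs on its own. The function name `seed_cache` is hypothetical, and the cached value shape (`lat` and `lng` only) is an assumption: it should match whatever your geocode function returns for a live result, or readers of the cache will see two different shapes.

```python
import hashlib
import json
import re

TTL = 60 * 60 * 24 * 90  # 90 days, as in the cache layer


def normalise_address(raw: str) -> str:
    s = raw.lower().strip()
    s = re.sub(r"[^\w\s,\-]", "", s)
    return re.sub(r"\s+", " ", s)


def cache_key(raw: str) -> str:
    return "geo:" + hashlib.sha256(normalise_address(raw).encode()).hexdigest()[:16]


def seed_cache(cache, rows, ttl: int = TTL) -> int:
    """Write historical (address, lat, lng) rows into the cache.

    `cache` is any object with a redis-py style set(key, value, ex=seconds).
    Returns the number of rows written; incomplete rows are skipped.
    """
    seeded = 0
    for address, lat, lng in rows:
        if not address or lat is None or lng is None:
            continue
        cache.set(cache_key(address), json.dumps({"lat": lat, "lng": lng}), ex=ttl)
        seeded += 1
    return seeded
```

Passing the cache in as a parameter keeps the script testable with an in-memory stand-in before it is pointed at production Redis.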

Step 4: Implement the throttled client with the error taxonomy

Now add the actual HTTP call, the client-side throttle, and the status taxonomy. Write a small test suite that covers the four outcomes: successful geocode, empty result, permanent error, transient error. Use responses (Python) or nock (Node) to mock the API responses so the tests run without network access. The test suite is also your documentation: the next engineer to touch this code will read the tests before they read the source.

Step 5: Add the two observability counters and set a 429-rate alert

The last thing to ship before you call it production-ready: the geocode.cache.hit, geocode.cache.miss, and geocode.api.* counters, and a single alert rule — "if geocode.api.4xx_429 rate exceeds X per minute for 2 consecutive minutes, page the on-call." That alert is the early-warning system that tells you your batch job's concurrency setting is wrong before the 429 cascade reaches the daily credit limit.
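If your alerting tool cannot express "N consecutive minutes" directly, the rule is easy to evaluate yourself over per-minute counts. A minimal sketch; the function name is hypothetical, and the threshold is left as a parameter because the right X depends on your tier and traffic.

```python
def should_page(per_minute_429s: list[int], threshold: int,
                consecutive: int = 2) -> bool:
    """True if each of the last `consecutive` one-minute buckets exceeds `threshold`."""
    if len(per_minute_429s) < consecutive:
        return False
    return all(count > threshold for count in per_minute_429s[-consecutive:])
```

Requiring consecutive buckets is what keeps a single noisy minute from paging anyone.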

The cost picture at each tier

A worked example so you can forecast honestly before choosing a plan.

Assume a B2B SaaS with 500 new account sign-ups per day, each with one billing address. Assume 60% of those addresses are genuinely new (the other 40% are existing customers updating records or submitting duplicate imports).

New addresses per day: 300
Cache covers: 200 per day
Live API calls per day: 300
Monthly live API calls: ~9,000

The free tier (3,000 calls/day) covers this comfortably. When you add a feature that geocodes all existing account records retroactively, say 80,000 accounts, you either run it as a one-time batch on the free tier's daily budget, which takes about a month at 3,000 calls per day minus your 300 live calls, or you take the $54/month tier for a month, run the batch, then re-evaluate.

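The point is that the cache changes the economics. Without a cache, 500 sign-ups/day is 15,000 API calls/month; with one, it is 9,000, a 40% cut that scales with volume. At this size both figures fit inside the free tier's 3,000 calls/day, but the gap decides when you outgrow it: an uncached integration reaches the free-tier ceiling at 3,000 sign-ups/day, while a cached one at the same reuse rate lasts until 5,000. Model the cache from the start; it changes which tier you need at every volume level.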
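The same forecast is worth keeping as a function so you can rerun it with your own numbers. A small sketch, with a hypothetical function name and a 30-day month assumed:

```python
def monthly_calls(signups_per_day: int, new_fraction: float,
                  days: int = 30) -> tuple[int, int]:
    """Return (uncached, cached) live API calls per month.

    Uncached, every sign-up costs one call. Cached, only the fraction of
    addresses that are genuinely new reaches the live API.
    """
    uncached = signups_per_day * days
    cached = round(signups_per_day * new_fraction) * days
    return uncached, cached
```

For the worked example, `monthly_calls(500, 0.6)` gives the 15,000 and 9,000 figures used in this section. The input that deserves the most scrutiny is `new_fraction`, which is exactly what the Step 1 audit measures.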

What you should not do to save money

Two anti-patterns that look like cost savings and create problems.

Geocoding only the country and ignoring the full address. Some teams decide they only need country-level or city-level resolution and pass a truncated address string to save processing. The API will accept it, but the result is less accurate than a full-address geocode — and the accuracy cost is invisible until you try to do something meaningful with the coordinates. Pass the full address; let the API do the normalisation.

Setting a very long retry loop on 4xx permanent errors. A 422 on a badly formed address is not going to succeed on the 20th retry any more than it did on the first. Teams that build a generic "retry everything up to 20 times" loop end up burning their daily credit budget on addresses that were never going to geocode. The status taxonomy in the table above prevents this: invalid_address records go into a user-correction queue, not a retry queue.

FAQ

Can I embed CSV2GEO's geocoding in a SaaS product that I sell to other businesses? Yes. There is no resale restriction on the API. You call the API under your own key, you mark up the cost or absorb it as part of your product's unit economics, and your customers never see the underlying call. The API agreement covers your usage under your account.

What happens when I hit the daily credit limit? Calls beyond the daily limit return 402 Payment Required. Your client should treat this as a pending outcome — not a failed outcome — and re-queue the job for the following UTC day. The credits reset at midnight UTC, not at a rolling 24-hour window.

Is there a way to test the integration without spending real credits? The free tier (3,000 calls/day, no credit card required) is the right sandbox environment. It is the same API, the same response format, the same rate-limit behaviour. Mocked responses are fine for unit-testing your own branching logic, but do not rely on a mock server alone for integration testing: you will miss the real response shape differences that only show up against the live API.

How do I handle addresses in countries outside the 39 supported? The API returns an empty results array for addresses it cannot resolve. Write geocode_status = 'no_result' and surface a message to the user that the address could not be located — do not suggest they reformat it, because the problem may be coverage rather than format. Coverage is expanding; check the pricing page for the current country list.

Should I use the SDK or raw REST? Both work. The REST API is simple enough that many production integrations never use the SDK — a 30-line wrapper like the one in this post is easier to maintain than a dependency with its own upgrade cycle. If your team is already comfortable with requests or fetch, stay there. If you want type-safe request/response models and do not want to write the wrapper yourself, the SDK is a reasonable choice.

How do I know if my geocoding accuracy is good enough for my use case? Run a sanity-check pass on a sample of results. For US addresses, a geocoded point that is more than 200 metres from the parcel centroid is a signal to investigate. The reverse geocoding accuracy and distance meters post covers how to think about accuracy thresholds for different use cases. For most B2B SaaS applications — routing, jurisdiction assignment, address standardisation — the default confidence threshold (results with confidence >= 0.7) is sufficient.
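The 200-metre check needs a distance between two coordinate pairs. The sketch below uses the haversine formula on a spherical Earth, which is accurate enough at this scale. The function names are hypothetical, and the reference point (a parcel centroid or any other trusted coordinate) is something you have to supply from your own data.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def needs_review(geocoded: tuple[float, float], reference: tuple[float, float],
                 limit_m: float = 200.0) -> bool:
    """True if the geocoded point is further than `limit_m` from the reference."""
    return haversine_m(*geocoded, *reference) > limit_m
```

Running `needs_review` over a few hundred sampled records and counting the flagged share gives a first accuracy figure to compare against what your use case can tolerate.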

What is the right TTL for cached geocoding results? 90 days is a conservative default. Addresses are stable on timescales of years — buildings do not move, streets do not get renumbered often. The main edge case is a customer who genuinely changes their address, which you handle by invalidating the cache key on any update to the address field rather than by shortening the TTL globally.

Caching geocoding results — 90% cost reduction — the full caching strategy that drops your API spend to a fraction of your address volume
Exponential backoff — when to retry, when to stop — retry parameters worth tuning for a geocoding worker
Rate limiting — token bucket vs leaky bucket — the two client-side rate-limit strategies and when each fits
Observability for geocoding pipelines — the metrics that matter — what to instrument and how to alert on it
Concurrency tuning for geocoding — finding the sweet spot — how to find the right number of in-flight requests for your tier and network

---

*I.A. / CSV2GEO Creator*

