Validating shipping addresses at checkout with geocoding
Use geocoding confidence scores to accept, confirm, or reject shipping addresses at checkout. Patterns, code, and failure modes for engineering teams.
A customer types their shipping address. You display an order summary. They click "Place Order." Somewhere between that click and the driver scanning the parcel at the doorstep, the address turns out to be wrong — the flat number is missing, the postcode belongs to the next town, the street name is one letter off from the real street. The parcel comes back. You reship. The customer opens a support ticket. If the reship also fails, you refund. The gross margin on that order is gone.
Returns and re-ships driven by bad addresses are not a small-scale nuisance. For e-commerce businesses shipping more than a few hundred orders per day, even a 1% bad-address rate produces a noise floor of failed deliveries that costs more than a mid-tier engineering hire to absorb. The common instinct is to buy a dedicated address-verification product — a black-box boolean API that returns valid: true or valid: false with no explanation and no tuning surface. That instinct is understandable and usually wrong.
What actually works is a geocode call that returns a confidence score, followed by a three-branch decision gate: accept silently, ask the customer to confirm, or reject and prompt for correction. The decision gate is yours — you tune the thresholds against your own returns data. The geocoding API provides the signal; your code makes the call. This post shows you how to build it.
Why confidence scores beat boolean validation
A boolean "valid" flag on an address is misleading in both directions. An address can be lexically valid — it parses, the postcode format is correct, the street exists — and still be undeliverable because the house number is not in the carrier's delivery records, or because the building was renumbered, or because the customer typed a real address that is not theirs. Conversely, an address that fails a format check might be perfectly deliverable: an older property with an unusual hyphenated lot number, a rural route address, a military APO. A boolean validator optimised for format correctness will fail real addresses and pass fictitious ones with some regularity.
A geocoding API that returns a confidence score is doing something more useful: it is telling you how precisely it was able to locate the address you gave it. A score near 1.0 means the geocoder found an exact match against its address dataset — the street, number, unit, and postcode all resolved. A score near 0.5 means it matched to street level but could not resolve the number, which might mean the number is wrong, or that the property is simply absent from the dataset. A score near 0.2 means it only matched to city or postcode level — the address string is ambiguous or largely unrecognisable.
The confidence score does not tell you whether the address will be delivered. It tells you how much of the address the geocoder was able to resolve, which is the proxy signal you need for deciding whether to trust the customer's input. When you pair the score with the returned components — the normalised street name, the matched postcode, the resolved city — you can also surface the standardised form of the address back to the customer, which reduces carrier mis-sorts even when the original input was approximately correct.
CSV2GEO's forward geocoding endpoint returns a confidence score per result, alongside the full set of resolved address components. There is no separate "address verification" boolean — you geocode, inspect the confidence, inspect the components, and decide. That design is intentional: it gives you the tuning surface that a boolean endpoint strips away.
The three-branch decision gate
Before writing a line of code, define the gate. You need three outcomes:
Accept. The geocoder matched confidently. The returned address components are consistent with what the customer typed. Store the normalised form, proceed to fulfilment. No friction added to the checkout flow.
Confirm. The geocoder matched but with moderate confidence, or the normalised address differs meaningfully from the customer's input (different postcode, corrected street name, missing unit inferred from context). Show the customer the normalised address and ask: "Did you mean this?" One click to confirm, one click to edit. This is the "soft interrupt" — it catches the majority of fixable errors without blocking legitimate orders.
Reject. The geocoder returned low confidence or no match. The address is unresolvable — wrong city/postcode combination, street does not exist, completely unparseable. Block the order from proceeding with an explicit error and an invitation to correct the address. Do not soft-pedal this: "We could not locate this address. Please check the street name and postcode before placing your order." Hard but honest.
The threshold values that separate these three branches are the thing you have to tune against your own data. A reasonable starting point for a UK or US e-commerce checkout:
| Confidence score | Branch |
|---|---|
| ≥ 0.85 | Accept |
| ≥ 0.55 and < 0.85 | Confirm |
| < 0.55 | Reject |
These numbers are a starting point, not a prescription. If your carrier data shows that a specific score band correlates with a re-ship rate above your acceptable threshold, shift the boundary. If you operate in a country where address geocoding coverage is thinner, a score of 0.75 might not mean the same thing as it does for a major US metro. The confidence scores explained post goes into the mechanics in detail; read it before you finalise your thresholds.
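One way to keep the boundaries tunable per market is a small lookup table. The sketch below is illustrative only: the numbers are placeholders rather than recommendations, the `xx` entry stands for a hypothetical market with thinner coverage, and `thresholds_for` and `branch_for` are helper names invented for this example.

```python
# Placeholder thresholds per country code. The numbers are illustrative
# assumptions; replace them with values derived from your own returns data.
DEFAULT_THRESHOLDS = (0.85, 0.55)  # (accept, confirm)
COUNTRY_THRESHOLDS = {
    "gb": (0.85, 0.55),
    "us": (0.85, 0.55),
    "xx": (0.75, 0.45),  # hypothetical market with thinner geocoding coverage
}


def thresholds_for(country_code):
    """Return (accept, confirm) thresholds for a country, else the default."""
    return COUNTRY_THRESHOLDS.get((country_code or "").lower(), DEFAULT_THRESHOLDS)


def branch_for(score, country_code=None):
    """Map a confidence score to 'accept', 'confirm', or 'reject'."""
    accept, confirm = thresholds_for(country_code)
    if score >= accept:
        return "accept"
    if score >= confirm:
        return "confirm"
    return "reject"
```

Keeping the thresholds in data rather than in branching code means a calibration pass only has to change the table.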
What the API call looks like
The CSV2GEO forward geocoding endpoint is a single GET request. Here it is bare:
    curl -G "https://csv2geo.com/api/v1/geocode" \
      --data-urlencode "q=42 Whitfield Street, London W1T 2RH" \
      --data-urlencode "api_key=$CSV2GEO_API_KEY"

A successful response looks like this (abbreviated to the fields your gate logic cares about):

    {
      "results": [
        {
          "confidence": 0.91,
          "formatted_address": "42 Whitfield Street, Fitzrovia, London, W1T 2RH, United Kingdom",
          "components": {
            "house_number": "42",
            "road": "Whitfield Street",
            "suburb": "Fitzrovia",
            "city": "London",
            "postcode": "W1T 2RH",
            "country_code": "gb"
          },
          "lat": 51.5194,
          "lng": -0.1352
        }
      ]
    }

The confidence field is what gates the branch. The components object is what you use to surface the normalised address back to the customer, and to detect cases where the geocoder silently corrected something the customer typed wrong (a different postcode, a slightly different street name). Those cases deserve a confirmation prompt even when confidence is high.
Building the gate in Python
A self-contained function that returns the branch outcome plus the normalised address string. This is what you call from your checkout backend — not from the browser, because the API key must stay server-side.
    import os
    import requests

    API = "https://csv2geo.com/api/v1/geocode"
    KEY = os.environ["CSV2GEO_API_KEY"]
    ACCEPT_THRESHOLD = 0.85
    CONFIRM_THRESHOLD = 0.55


    def _outcome(branch, confidence=None, normalised=None, components=None):
        """Build the result dict returned by validate_shipping_address."""
        return {"branch": branch, "confidence": confidence,
                "normalised": normalised, "components": components}


    def validate_shipping_address(raw_address: str) -> dict:
        """
        Returns:
            {
                "branch": "accept" | "confirm" | "reject",
                "confidence": float | None,
                "normalised": str | None,   # geocoder's formatted_address
                "components": dict | None,
            }
        """
        try:
            r = requests.get(
                API,
                params={"q": raw_address, "api_key": KEY},
                timeout=5,
            )
            r.raise_for_status()
            data = r.json()
        except requests.HTTPError as exc:
            status = exc.response.status_code if exc.response is not None else 500
            if 400 <= status < 500 and status != 429:
                # 4xx other than 429: bad request, treat as unresolvable.
                return _outcome("reject")
            # 5xx or 429 (rate limited): transient, fail open.
            return _outcome("accept")
        except (requests.RequestException, ValueError):
            # Timeout, connection error, or unparseable body:
            # fail open, do not block checkout.
            return _outcome("accept")

        results = data.get("results") or []
        if not results:
            return _outcome("reject", confidence=0.0)

        top = results[0]
        score = top.get("confidence") or 0.0
        if score >= ACCEPT_THRESHOLD:
            branch = "accept"
        elif score >= CONFIRM_THRESHOLD:
            branch = "confirm"
        else:
            branch = "reject"
        return _outcome(branch, score, top.get("formatted_address"),
                        top.get("components", {}))

Notice the timeout is 5 seconds and the error handling is explicit about fail-open versus fail-closed. A timeout or a dropped connection on a geocoding API call is not a reason to block a paying customer from completing an order; it is a reason to log the miss and proceed. The 5xx fail-open / 4xx fail-closed split is the right default: a server error on the API side is transient, while a 4xx almost always means the request was malformed. The one 4xx worth special-casing is 429 Too Many Requests, which signals a rate limit rather than a bad address and should fail open.
Building the gate in Node
The same logic in Node using the native fetch API, suitable for a Next.js API route or an Express endpoint:
    const API = 'https://csv2geo.com/api/v1/geocode';
    const KEY = process.env.CSV2GEO_API_KEY;
    const ACCEPT_THRESHOLD = 0.85;
    const CONFIRM_THRESHOLD = 0.55;

    // Build the result object returned by validateShippingAddress.
    const outcome = (branch, confidence = null, normalised = null, components = null) =>
      ({ branch, confidence, normalised, components });

    export async function validateShippingAddress(rawAddress) {
      // URLSearchParams encodes both the address and the key.
      const url = `${API}?${new URLSearchParams({ q: rawAddress, api_key: KEY })}`;
      const controller = new AbortController();
      const timer = setTimeout(() => controller.abort(), 5000);
      let data;
      try {
        const r = await fetch(url, { signal: controller.signal });
        if (r.status >= 400 && r.status < 500 && r.status !== 429) {
          // 4xx other than 429: bad request, treat as unresolvable.
          return outcome('reject');
        }
        if (!r.ok) {
          // 5xx or 429 (rate limited): fail open.
          return outcome('accept');
        }
        data = await r.json();
      } catch (err) {
        // Timeout, network error, or unparseable body: fail open.
        return outcome('accept');
      } finally {
        clearTimeout(timer);
      }

      const results = data?.results ?? [];
      if (results.length === 0) {
        return outcome('reject', 0);
      }
      const top = results[0];
      const score = top.confidence ?? 0;
      let branch;
      if (score >= ACCEPT_THRESHOLD) branch = 'accept';
      else if (score >= CONFIRM_THRESHOLD) branch = 'confirm';
      else branch = 'reject';
      return outcome(branch, score, top.formatted_address ?? null, top.components ?? {});
    }

Call this from your API route handler, never from the client. The API key stays in the server environment. The response shape is the same as the Python version: one call, four fields, and the branch drives the UX.
How to use the components to detect silent corrections
Confidence alone does not catch every case worth surfacing to the customer. Consider an input like "42 Whitfeld Street, London W2T 2RH" — the street name has a typo and the postcode is one character off. The geocoder may return confidence 0.88 (high enough to accept) because it was able to resolve the intended address, but the components.road will say "Whitfield Street" and components.postcode will say "W1T 2RH" — both different from the customer's input. A pure confidence-gate would silently accept and ship to the corrected address without telling the customer. That is usually fine, but not always — the customer might have intentionally typed the second address, and the geocoder chose the first.
The safest pattern: whenever the normalised address differs from the customer's input in a field that materially affects delivery (street name, house number, postcode), escalate to confirm regardless of confidence score. Here is a minimal comparison helper:
    import re


    def address_was_corrected(raw: str, components: dict) -> bool:
        """
        Rough check: did the geocoder change the house number or postcode?
        Extend with city, road, etc. as your delivery geography demands.
        """
        raw_lower = raw.lower()
        house = (components.get("house_number") or "").lower()
        postcode = (components.get("postcode") or "").replace(" ", "").lower()

        # If either the house number or postcode from the geocoder
        # doesn't appear in the raw string, treat it as a correction.
        # The house number must match as a whole token, so that "4" is
        # not considered present just because the customer typed "42".
        if house and not re.search(rf"(?<!\w){re.escape(house)}(?!\w)", raw_lower):
            return True
        if postcode and postcode not in raw_lower.replace(" ", ""):
            return True
        return False

Combine it with the branch logic:

    result = validate_shipping_address(raw_address)
    if result["branch"] == "accept" and result["components"]:
        if address_was_corrected(raw_address, result["components"]):
            result["branch"] = "confirm"

This adds about ten lines and catches silent corrections to the house number or postcode. The customer sees the geocoder's version of their address and confirms it. Friction is minimal; the catch is real.
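The same idea extends to the street name. The sketch below is one possible extension, not part of the helper above: `road_was_corrected` is a hypothetical name, and it assumes the geocoder's `road` component uses the same spelling conventions as the customer's input. Abbreviations such as "St" for "Street" will register as a correction, which pushes the order to the confirm branch rather than blocking it.

```python
import re


def _norm(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(text.split())


def road_was_corrected(raw, components):
    """True when the geocoder's road name does not appear in the raw input."""
    road = _norm(components.get("road") or "")
    if not road:
        # Nothing to compare against; let the confidence score decide.
        return False
    return road not in _norm(raw)
```

With the example input from earlier in this section, "42 Whitfeld Street" does not contain "whitfield street", so the typo would be escalated to confirm.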
A complete HowTo: shipping to production
Step 1: Obtain and scope your API key
Log into your CSV2GEO account, navigate to /api-keys, and create a key scoped to your checkout service. Use a separate key for each service that calls the API — this makes per-service metering readable on your dashboard and means a leaked key can be rotated without affecting other services. Set the key as an environment variable (CSV2GEO_API_KEY) in your application runtime. Never hardcode it.
Step 2: Instrument the validation call
Wire the validate_shipping_address function (or the Node equivalent) into your order-creation endpoint — the one that fires when the customer submits the checkout form. Call it synchronously, before the order is written to your database. The branch result controls whether you proceed, prompt, or block:
    def submit_order(request):
        # Illustrative handler; adapt the request and response types
        # to your web framework. create_order is your own function.
        result = validate_shipping_address(request.data["shipping_address"])
        if result["branch"] == "accept":
            create_order(request.data)
            return {"status": "ok"}
        if result["branch"] == "confirm":
            return {
                "status": "confirm",
                "message": "We found a possible match. Please confirm your address.",
                "suggested": result["normalised"],
            }
        # reject
        return {
            "status": "error",
            "message": "We could not locate this address. "
                       "Please check the street name and postcode.",
        }

The confirm branch returns the normalised address to the front end. Your checkout UX shows it as a suggestion with a two-button prompt ("Use this address" / "Edit address"). If the customer confirms, re-submit with a skip_validation: true flag and proceed directly to order creation. Do not geocode again on the confirmed submission.
Step 3: Define and log your thresholds
Before you go live, log every geocoding result — confidence score, branch, raw input, normalised output — to your analytics pipeline. Do not log PII to a third-party analytics service; if your analytics stack is not under your control, log only the score and the branch, not the address itself. After two weeks of real traffic, plot the distribution of confidence scores against your returns data. This is the calibration pass. If the "accept" band is producing re-ships at a rate above your target, raise the accept threshold. If the "reject" band is blocking legitimate orders from rural addresses, lower the reject threshold or add a country/region rule.
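The calibration pass can start as a simple tally of re-ship rate per confidence band. The sketch below assumes you can export `(confidence, was_reshipped)` pairs from your own orders and returns data; the function name and the 5-point band width are illustrative choices, not something the API prescribes.

```python
from collections import defaultdict


def reship_rate_by_band(orders, band_pct=5):
    """orders: iterable of (confidence, was_reshipped) pairs.

    Returns {band_start_pct: {"orders": n, "reships": k, "rate": k / n}},
    where band_start_pct=85 covers scores from 0.85 up to (not including) 0.90.
    A score of exactly 1.0 is folded into the top band.
    """
    tally = defaultdict(lambda: [0, 0])
    top_band = (99 // band_pct) * band_pct  # 95 when band_pct is 5
    for confidence, was_reshipped in orders:
        if confidence is None:  # fail-open orders carry no score
            continue
        pct = int(confidence * 100 + 1e-6)  # epsilon guards float rounding
        band = min((pct // band_pct) * band_pct, top_band)
        tally[band][0] += 1
        tally[band][1] += 1 if was_reshipped else 0
    return {
        band: {"orders": n, "reships": k, "rate": k / n}
        for band, (n, k) in sorted(tally.items())
    }
```

If the bands just above the accept threshold show a re-ship rate above your target, that is the signal to raise the threshold; bands with very few orders are too noisy to act on.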
Step 4: Handle the timeout and retry paths
Production checkout flows run under a latency budget. The geocoding call sits in the critical path, so a 5-second hang keeps the customer waiting for the full 5 seconds. Use the fail-open timeout pattern shown in the code above. For retries, use exponential backoff with a maximum of two retries, and keep the combined time of all attempts inside your latency budget: three attempts at a 5-second timeout each would hold the customer for 15 seconds or more. A third failure is a signal to fail open and log the miss, not to keep the customer waiting. The exponential backoff post covers the implementation in detail.
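A minimal sketch of that retry policy, assuming the geocoding call is wrapped in a zero-argument function that raises on failure. `call_with_retries`, its parameters, and the default delays are hypothetical; in real code you would narrow `retry_on` to transient errors such as timeouts rather than retrying on every exception.

```python
import time


def call_with_retries(fn, max_retries=2, base_delay=0.2, deadline_s=3.0,
                      retry_on=(Exception,), sleep=time.sleep,
                      clock=time.monotonic):
    """Call fn() up to 1 + max_retries times with exponential backoff.

    Returns fn's result, or None when every attempt failed or the next
    backoff would exceed the overall deadline. The caller treats None
    as "fail open and log the miss".
    """
    start = clock()
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                break  # out of retries
            delay = base_delay * (2 ** attempt)  # 0.2 s, 0.4 s, ...
            if clock() - start + delay > deadline_s:
                break  # would blow the latency budget
            sleep(delay)
    return None
```

The deadline only bounds the waiting between attempts, so the per-attempt HTTP timeout still has to be short enough that all attempts together fit the checkout's latency budget.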
Step 5: Cache aggressively on the server side
Addresses do not move. If the same address is submitted in checkout twice — a customer retrying after a payment failure, a repeat customer, a B2B buyer with a fixed ship-to — you do not need to geocode it again. Cache the geocoding result in Redis keyed on a normalised form of the raw address string with a TTL of 30 days. A one-line normalisation (lowercase, strip punctuation, collapse whitespace) before the cache key computation is enough to collapse most variants of the same address into one entry. The caching post shows the pattern in full; for a checkout service with moderate repeat traffic, a 30-day server-side cache typically reduces geocoding API calls by 60–80% on the address validation path.
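The normalisation and key computation might look like the sketch below. The function name and the `geocode:v1:` prefix are assumptions for illustration; hashing the normalised string is an extra step not described above, added here so that raw addresses do not appear in cache keys.

```python
import hashlib
import re


def address_cache_key(raw_address, prefix="geocode:v1:"):
    """Derive a cache key from a raw address string.

    Lowercases, replaces punctuation with spaces, and collapses
    whitespace, so trivially different spellings share one entry.
    """
    normalised = re.sub(r"[^\w\s]", " ", raw_address.lower())
    normalised = " ".join(normalised.split())
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return prefix + digest
```

Store the serialised geocoding result under this key with a 30-day expiry (2,592,000 seconds), and bump the prefix version whenever you change the normalisation so that stale entries are not reused.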
Failure modes to design for explicitly
The timeout that becomes a block. If your geocoding client has no timeout and the API is slow, your checkout endpoint hangs for thirty seconds and the customer abandons. Always set a timeout. Always fail open on timeout. Log the timeout event — if it is happening more than once per hour in production, you have a latency-budget problem to debug, not a geocoding problem.
The confident wrong match. The geocoder returns confidence 0.92 for an address that is the wrong street entirely — a street with a similar name two towns away. This happens at low frequency and is hard to prevent purely with confidence scores. The component-comparison check above catches a subset of these cases. The remainder show up as re-ships. Track your re-ship rate by confidence band; if the 0.85+ band is producing unexpected failures, add a city or region consistency check against the customer's billing address.
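A billing consistency check can be as small as the sketch below. It assumes a `billing` dict with `city` and `country_code` keys, which is a hypothetical shape for this example, and it should downgrade an order to the confirm branch rather than reject it, because shipping to a different city than the billing address is often legitimate (gifts, office deliveries).

```python
def _clean(value):
    """Lowercase and collapse whitespace; None becomes an empty string."""
    return " ".join((value or "").lower().split())


def differs_from_billing_region(components, billing):
    """True when the geocoded country or city disagrees with billing.

    Missing values on either side are treated as "no evidence", not
    as a mismatch.
    """
    geo_cc = _clean(components.get("country_code"))
    bill_cc = _clean(billing.get("country_code"))
    if geo_cc and bill_cc and geo_cc != bill_cc:
        return True
    geo_city = _clean(components.get("city"))
    bill_city = _clean(billing.get("city"))
    if geo_city and bill_city and geo_city != bill_city:
        return True
    return False
```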
The rural address that geocodes poorly everywhere. Rural addresses in low-density regions often match to street level or town level, not to house number. A threshold of 0.55 for the confirm branch will send most of these to the "confirm" branch, which is the right outcome — you show the customer the best match you have and let them confirm or correct. Do not automatically reject these; you will block real orders.
The international address outside your coverage. CSV2GEO covers 39 countries. If your checkout serves customers in a country not in that set, the geocoder will return low confidence or no result. Build a country-code check: if the country code from the customer's address is outside your covered set, skip the validation entirely and create the order with a flag that marks the address as unvalidated. Do not reject orders from uncovered countries; route the flagged orders to a manual review queue.
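The country-code check itself is a set lookup. In the sketch below, `COVERED_COUNTRIES` is a placeholder containing only the two markets mentioned in this post; fill it from the provider's published coverage list. The function name and the route labels are likewise hypothetical.

```python
# Placeholder subset, NOT the real coverage list: replace with the
# country codes your geocoding provider actually documents.
COVERED_COUNTRIES = {"gb", "us"}


def route_by_coverage(country_code):
    """Decide whether an address goes through the geocoding gate.

    Returns "validate" for covered countries and "manual_review" for
    everything else, including a missing country code.
    """
    code = (country_code or "").strip().lower()
    if code in COVERED_COUNTRIES:
        return "validate"
    return "manual_review"
```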
The cost arithmetic
Every address validated at checkout is one geocoding API call, one credit. The free tier provides 3,000 calls per day, enough to validate every order on a small-to-medium checkout without spending anything during pilot. Paid plans start at $54 per month for 100,000 calls. At that rate, each checkout validation costs $0.00054, roughly a twentieth of a cent. A re-ship typically costs between $8 and $25 depending on carrier and package weight. The break-even is one re-ship prevented for every 15,000 to 46,000 validated orders. Most e-commerce operations with a 1% bad-address rate break even in the first week of production traffic.
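The arithmetic is easy to re-run with your own numbers. The sketch below uses only the figures quoted above (plan price, call allowance, re-ship cost range, 1% bad-address rate); the function names are illustrative.

```python
PLAN_PRICE_USD = 54.0   # entry paid plan, per month
PLAN_CALLS = 100_000    # calls included in that plan

cost_per_call = PLAN_PRICE_USD / PLAN_CALLS  # about $0.00054


def break_even_orders(reship_cost_usd):
    """Validated orders whose API cost equals one prevented re-ship."""
    return reship_cost_usd / cost_per_call


def bad_addresses_in(orders, bad_address_rate=0.01):
    """Expected number of bad addresses among a batch of orders."""
    return orders * bad_address_rate


low = break_even_orders(8)    # cheapest re-ship in the quoted range
high = break_even_orders(25)  # most expensive re-ship in the quoted range
```

At a 1% bad-address rate, a batch of 15,000 orders is expected to contain about 150 bad addresses, so the gate only has to catch one of them to pay for validating the whole batch at the low end of the range.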
See the live pricing at csv2geo.com/pricing/api.
What this does not replace
Carrier delivery confirmation. Geocoding validation catches addresses the geocoder cannot resolve. It does not catch addresses that resolve cleanly but are rejected by the carrier at the point of delivery — a real street and number that is a vacant lot, a business address where nobody signs for parcels. Carrier delivery data feeds a different problem and requires a different integration.
Fraud signals. A confidence score says nothing about whether the address is being used fraudulently — a real address used by a bad actor is still a geocodable address. Fraud detection is a separate concern; the geocoding gate should not be stretched to carry that weight.
Unit-level disambiguation. In a multi-unit building, the geocoder may resolve to the building confidently without being able to confirm that flat 4B exists. This is a known limitation of address geocoding generally. The confirm branch catches the cases where confidence drops because of the missing unit number; it does not guarantee that the unit the customer typed is real.
Frequently Asked Questions
Why geocode at checkout rather than at address entry?
Both are defensible, but validating at the point of order submission is the more reliable pattern. Address entry can happen before the customer has completed the full address, which produces false low-confidence results. At submission you have the complete address string. You can also cache the validation result on the server and reuse it on retry without a second API call.
Should I validate billing addresses as well as shipping addresses?
The geocoding-confidence gate is designed for deliverability, which is a shipping concern. Billing address validation is a payment fraud concern with a different toolchain — AVS checks from your payment processor are the standard instrument there. Geocoding the billing address is useful for segmentation analytics but is not a fraud signal by itself.
What do I do with the confidence score when I store the order?
Store it. It is a useful feature for post-hoc analysis: plot re-ship rate against the confidence score band that the address scored at validation time, and you have a calibration dataset for tuning your thresholds. A column called geocode_confidence FLOAT NULL on your orders table costs nothing and pays dividends in the first returns-analysis sprint.
How do I handle the `confirm` branch without adding too much friction?
The confirm prompt should be a single step: show the geocoder's normalised address alongside the customer's input, with two buttons — "Use suggested address" and "Keep my address". The customer makes one click. Do not redirect to a new page; render the prompt inline in the checkout flow. Conversion drop on a well-designed inline confirm prompt is under 1% in practice.
Is it safe to call the API from a serverless function?
Yes. The endpoint is a standard HTTPS GET; any runtime that can make an outbound HTTPS request works. Set the timeout to 5 seconds and handle the fail-open case explicitly — cold-start latency on serverless functions can eat into your budget before the geocoding call even fires. If cold-start variance is a problem, warm the function with a synthetic geocoding call on startup, or use the server-side cache pattern to absorb repeat addresses without a round-trip.
Do I need to tell the customer their address is being geocoded?
This is a legal question, not an engineering one, and the answer varies by jurisdiction. In the UK and EU, geocoding a customer-provided address as part of order fulfilment generally falls under the legitimate-interests basis for processing without explicit consent, provided you are not retaining the coordinates as a separate data asset. Review with your data-protection counsel; do not take engineering blog posts as legal advice.
What if my order volumes exceed the free tier during a sale event?
The free tier provides 3,000 calls per day. A sale event that pushes you over that limit will start returning 429 Too Many Requests errors from the API. Make sure your gate fails open on 429 specifically rather than treating it like other 4xx responses, so that those errors translate to accepted orders with unvalidated addresses, not to blocked checkouts. Upgrade to a paid plan before the event; the entry tier at $54/month is 100,000 calls, which covers most mid-market sale spikes. See csv2geo.com/pricing/api.
Related Articles
- Geocoding confidence scores explained — how the score is computed and what it does and does not tell you
- Caching geocoding results — 90% cost reduction — the server-side cache pattern that cuts your API spend on repeat addresses
- Exponential backoff — when to retry, when to stop — the retry policy that keeps checkout live when the geocoding API is slow
- Benchmarking geocoding APIs — honest numbers — what to measure when evaluating geocoding accuracy for address validation
- Idempotent geocoding — safe to retry — why geocoding calls are safe to retry and how to design the retry path correctly
---
*I.A. / CSV2GEO Creator*