Mapping your insurance book exposure by zone

Batch geocode your policy book and assign every address to a risk zone. REST patterns, Python + Node code, latency budgets included.

June 17, 2026

Most carriers can tell you their aggregate exposure by state in thirty seconds. Fewer can tell you what percentage of their book sits inside a named wind corridor, within two kilometres of a mapped wildfire buffer, or below 5 m elevation in a coastal county. The reason is almost never a shortage of risk models — it is that translating a million policy addresses into zone assignments is harder than it looks when you are starting from a raw CSV.

The data problem is not new. What has changed is that the tooling for solving it no longer requires a six-month GIS project, a specialist data engineer, and a licensed spatial database. A batch geocoding endpoint plus a boundary lookup endpoint, both over plain REST, is now a credible production architecture for the zone-assignment pipeline a carrier's catastrophe team actually needs.

This post is the blueprint. By the end you will have working code that takes a book of policy addresses from raw strings to zone-enriched rows — with confidence scores, a retry strategy, an honest cost model, and the failure modes you will hit in production labelled clearly so you can design around them.

Why address-to-zone is harder than it looks

The naive version of this problem is: geocode the address, do a point-in-polygon test, write the zone label. That is correct, and it is also incomplete in three ways that bite every team eventually.

Geocoding confidence is not binary. A geocoder that cannot resolve an address to rooftop level will fall back to street level, then to centroid of the postal code, then to centroid of the city. Each fallback level increases the spatial error. A postal-code centroid might be 3 km from the actual property — which is enough to place an address in the wrong flood zone, the wrong wind tier, or on the wrong side of a county boundary that splits a rate territory. If you do not track confidence and apply a minimum threshold before calling the result "good", you are silently mis-rating some fraction of your book, and you will not find out until the loss ratio surprises you after a cat event.

Zone polygons are irregular and overlapping. A single address can sit inside a Special Flood Hazard Area, a state-defined wind pool zone, a wildfire urban-interface buffer, and a named cat-model tier simultaneously. None of these are mutually exclusive, and none of them align to ZIP codes or state boundaries. The right data model for the output is not zone = X but a set of zone memberships per address, each with its own source and its own confidence.
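One way to picture that data model is a small record type holding a list of memberships per address. This is only a sketch: the class names, field names, and `source` values are hypothetical, and the layer names simply mirror the ones used later in this post.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ZoneMembership:
    layer: str         # e.g. "flood", "wind", "fire"
    label: str         # raw zone label, e.g. "AE"
    source: str        # who published the polygon (hypothetical values below)
    confidence: float  # confidence carried over from the geocode


@dataclass
class AddressZones:
    policy_id: str
    memberships: list[ZoneMembership] = field(default_factory=list)

    def labels_for(self, layer: str) -> list[str]:
        """Return every label this address holds in one layer (may be several)."""
        return [m.label for m in self.memberships if m.layer == layer]


example = AddressZones("POL-001", [
    ZoneMembership("flood", "AE", "hypothetical-flood-map", 0.94),
    ZoneMembership("fire", "WUI-High", "hypothetical-fire-agency", 0.94),
])
```

Keeping memberships as a list rather than one column per zone means a new layer does not force a schema change.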

Address quality in a book of business is variable. Policies written 15 years ago may have addresses that were valid then and are now unmatchable — rural routes converted to 911-formatted addresses, businesses that moved without a policy update, seasonal properties with mailing addresses that differ from the insured location. A production pipeline must separate "geocoded confidently", "geocoded with low confidence — needs review", and "could not geocode" into three distinct buckets, handle all three, and make the split visible to the team running the job.

Build for all three cases before you ship.

The two endpoints that power the pipeline

`POST /api/v1/geocode/batch` — takes up to 500 addresses per request as a JSON array and returns a geocoded result per address, with coordinates, a confidence score, and a match type (rooftop, street, postal, city). This is the right endpoint for processing a book of business: one HTTP call per 500 rows, ordered response, null result for addresses that could not be resolved.

`GET /api/v1/boundaries` — takes a latitude and longitude and returns the set of administrative and regulatory boundaries that contain that point. At minimum you get country, region, county, and postal code. With the right layers parameter you can request specific boundary types — flood zone, census tract, fire district — and the endpoint returns a membership array with the polygon name, the source, and the coverage type. One credit per point.

Both endpoints share the same API key, the same authentication header, and the same error shape. A pipeline that chains them makes one batch call per 500 addresses, then one boundary call for each successfully geocoded address in that batch, so up to 501 REST calls per 500 rows.

The free tier covers 3,000 calls per day. A 500-address batch is one call, so geocoding alone can cover up to 1.5 million address rows per day on the free tier. Boundary lookups cost one call per point, though, so the full geocode-plus-boundaries pipeline gets through roughly 3,000 addresses per day before you reach for a paid plan. For an initial book-enrichment run on a large portfolio, the entry paid tier at $54/month for 100,000 calls is the starting point. See the live pricing at csv2geo.com/pricing/api.

What the batch geocode response looks like

Before writing any pipeline code, understand the shape of the response. A two-address batch:

```bash
curl -s -X POST "https://csv2geo.com/api/v1/geocode/batch" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $CSV2GEO_KEY" \
  -d '{
    "addresses": [
      {"id": "POL-001", "q": "1234 Bayshore Dr, Tampa, FL 33611"},
      {"id": "POL-002", "q": "9999 Nonexistent Ln, Nowhere, ZZ 00000"}
    ]
  }'
```

Returns something like:

```json
{
  "meta": {"count": 2, "matched": 1, "failed": 1},
  "results": [
    {
      "id": "POL-001",
      "lat": 27.9005,
      "lng": -82.4892,
      "confidence": 0.94,
      "match_type": "rooftop",
      "formatted_address": "1234 Bayshore Dr, Tampa, FL 33611, USA"
    },
    {
      "id": "POL-002",
      "lat": null,
      "lng": null,
      "confidence": null,
      "match_type": null,
      "formatted_address": null,
      "error": "no_match"
    }
  ]
}
```

Three things to note. First, the response is ordered and the same length as the input — results[i] always corresponds to addresses[i]. Second, the id field you supply is reflected back, so you can join the response to your policy rows without positional arithmetic. Third, a failed geocode returns null for spatial fields plus an error field — it does not throw an HTTP error, which means a batch with some failures still returns 200 OK. Branch on the error field per row, not on the HTTP status code.
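Those three notes translate into a small helper. The sketch below assumes the response shape shown above and nothing more; `split_results` is a hypothetical name, and the sample data is the two-address example rewritten as Python literals.

```python
def split_results(results):
    """Partition a batch response into matched and failed rows, keyed by id."""
    matched, failed = {}, {}
    for item in results:
        # Branch on the per-row error field, never on the HTTP status code.
        if item.get("error") or item.get("lat") is None:
            failed[item["id"]] = item.get("error") or "no_match"
        else:
            matched[item["id"]] = (item["lat"], item["lng"])
    return matched, failed


sample = [
    {"id": "POL-001", "lat": 27.9005, "lng": -82.4892,
     "confidence": 0.94, "match_type": "rooftop"},
    {"id": "POL-002", "lat": None, "lng": None,
     "confidence": None, "match_type": None, "error": "no_match"},
]
matched, failed = split_results(sample)
```

Keying by `id` rather than by position keeps the join to your policy rows correct even if a later refactor reorders or filters the batch.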

A production Python pipeline

The full pipeline in 80 lines. It reads a policy CSV, geocodes in batches of 500, point-tests each result against boundaries, and writes three output files: enriched rows, low-confidence rows for manual review, and failed rows for address correction.

```python
import csv
import os
import time

import requests

API        = "https://csv2geo.com/api/v1"
KEY        = os.environ["CSV2GEO_KEY"]
BATCH_SIZE = 500
MIN_CONF   = 0.75   # anything below this goes to the review bucket

HEADERS = {"X-API-Key": KEY, "Content-Type": "application/json"}

def geocode_batch(rows):
    """Geocode up to BATCH_SIZE rows in one call; one result per input row."""
    payload = {"addresses": [{"id": r["policy_id"], "q": r["address"]} for r in rows]}
    r = requests.post(f"{API}/geocode/batch", json=payload,
                      headers=HEADERS, timeout=60)
    r.raise_for_status()
    return r.json()["results"]

def get_boundaries(lat, lng):
    """Return the boundary memberships for one point, or {} on a 404."""
    r = requests.get(f"{API}/boundaries",
                     params={"lat": lat, "lng": lng,
                             "layers": "flood,wind,fire,admin"},
                     headers={"X-API-Key": KEY},  # same auth header as the batch call
                     timeout=30)
    if r.status_code == 404:
        return {}
    r.raise_for_status()
    return r.json().get("boundaries", {})

def chunks(seq, n):
    """Yield successive n-sized slices of seq."""
    for i in range(0, len(seq), n):
        yield seq[i : i + n]

with (open("policy_book.csv", newline="") as fin,
      open("enriched.csv",     "w", newline="") as fout,
      open("low_confidence.csv","w", newline="") as flow,
      open("failed.csv",       "w", newline="") as ffail):

    reader   = csv.DictReader(fin)
    base_fields = reader.fieldnames
    out_fields  = base_fields + ["lat","lng","confidence","match_type",
                                  "flood_zone","wind_zone","fire_zone"]

    # One writer per output bucket: confident, needs review, could not geocode.
    writers = {
        "ok":   csv.DictWriter(fout,  fieldnames=out_fields),
        "low":  csv.DictWriter(flow,  fieldnames=out_fields),
        "fail": csv.DictWriter(ffail, fieldnames=base_fields + ["error"]),
    }
    for w in writers.values():
        w.writeheader()

    rows = list(reader)
    # Join key for the response; assumes policy_id is unique within the book.
    row_map = {r["policy_id"]: r for r in rows}

    for batch in chunks(rows, BATCH_SIZE):
        geo_results = geocode_batch(batch)

        for geo in geo_results:
            original = row_map[geo["id"]]

            # Failed geocodes arrive as 200 OK with an error field per row.
            if geo.get("error") or geo["lat"] is None:
                original["error"] = geo.get("error", "no_match")
                writers["fail"].writerow(original)
                continue

            lat, lng, conf = geo["lat"], geo["lng"], geo["confidence"]
            bounds = get_boundaries(lat, lng)

            row = {**original,
                   "lat": lat, "lng": lng,
                   "confidence": conf,
                   "match_type": geo["match_type"],
                   "flood_zone": bounds.get("flood", {}).get("name", ""),
                   "wind_zone":  bounds.get("wind",  {}).get("name", ""),
                   "fire_zone":  bounds.get("fire",  {}).get("name", "")}

            bucket = "ok" if conf >= MIN_CONF else "low"
            writers[bucket].writerow(row)

        time.sleep(0.1)   # light pacing; see backoff note below
```

The time.sleep(0.1) between batches is a courtesy, not a requirement. On a paid plan you have headroom for concurrent requests — remove the sleep and run the boundary calls in a thread pool if wall-clock time matters. The exponential backoff strategy for 429 and 503 responses is covered in depth in Exponential Backoff — When to Retry, When to Stop; the short version is: back off on 429, retry up to three times on 503, dead-letter anything that fails all three retries.
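As a rough illustration of that short version, here is a minimal retry wrapper. It is a sketch under assumptions: `RetryableError` and `call_with_backoff` are hypothetical names, your request function is expected to raise `RetryableError` on a 429 or 503, and for simplicity both statuses share one retry budget and one doubling delay.

```python
import time


class RetryableError(Exception):
    """Raised by the caller's request function for 429 or 503 responses."""

    def __init__(self, status):
        super().__init__(f"retryable status {status}")
        self.status = status


def call_with_backoff(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on RetryableError wait base_delay * 2**attempt, then retry.

    Returns (result, None) on success, or (None, error) once the retries
    are exhausted so the caller can write the input to a dead-letter file.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn(), None
        except RetryableError as err:
            if attempt == max_retries:
                return None, err
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; in production you would leave the default.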

The same pipeline in Node

For teams running the enrichment job inside a Node service or a Lambda function:

```javascript
import { createReadStream, createWriteStream } from 'node:fs';
import { parse } from 'csv-parse/sync';
import { stringify } from 'csv-stringify/sync';

const API  = 'https://csv2geo.com/api/v1';
const KEY  = process.env.CSV2GEO_KEY;
const BATCH = 500;
const MIN_CONF = 0.75;

// Geocode up to BATCH rows in a single request.
async function geocodeBatch(rows) {
  const body = {
    addresses: rows.map(r => ({ id: r.policy_id, q: r.address }))
  };
  const res = await fetch(`${API}/geocode/batch`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-API-Key': KEY },
    body: JSON.stringify(body),
  });
  if (!res.ok) throw new Error(`geocode batch ${res.status}`);
  return (await res.json()).results;
}

// Look up the boundaries containing one point; a 404 is treated as "none".
async function getBoundaries(lat, lng) {
  const url = `${API}/boundaries?lat=${lat}&lng=${lng}` +
              `&layers=flood,wind,fire,admin`;
  // Same auth header as the batch call, which keeps the key out of the URL.
  const res = await fetch(url, { headers: { 'X-API-Key': KEY } });
  if (res.status === 404) return {};
  if (!res.ok) throw new Error(`boundaries ${res.status}`);
  return (await res.json()).boundaries ?? {};
}

// Caller omitted for brevity; the pattern is identical to the Python version:
// chunk rows → geocodeBatch → per-row getBoundaries → write three output files
```

The csv-parse and csv-stringify packages are the only dependencies beyond the standard library. For a Lambda you can inline the parsing and use the Node 18+ built-in fetch.

Designing the zone-assignment schema

The output of the pipeline is only as useful as the schema you attach it to. Three design decisions that matter.

Store raw zone labels, not numeric tiers. The boundary endpoint returns the official zone label as a string — AE, X500, WUI-High, and so on. Your actuarial team and your catastrophe model both expect the raw label. Translating to a numeric tier (flood_tier = 3) at pipeline time means you own the translation table forever, and it breaks whenever the zone definitions change. Store the raw label; compute the tier at query time from a mapping table that your actuarial team owns.

Separate match_type from confidence. A rooftop match at 0.94 confidence and a postal match at 0.94 confidence are not equivalent. The rooftop match placed the address within metres of the property. The postal match placed it at the centroid of a ZIP code that might be 3 km wide. Both have the same confidence score — confidence reflects the geocoder's certainty about *what it resolved to*, not whether the result is spatially precise enough for your use case. Add a secondary filter: match_type IN ('rooftop', 'street') for the confident bucket; route postal and city matches to the review bucket regardless of confidence.
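A bucketing rule that applies both filters might look like the sketch below. The function name and the `"fail"` return value are illustrative choices; the threshold and the match types are the ones named in this post.

```python
MIN_CONF = 0.75
PRECISE_MATCH_TYPES = {"rooftop", "street"}


def choose_bucket(confidence, match_type, min_conf=MIN_CONF):
    """Return "ok" only when both the score and the match type qualify."""
    if confidence is None or match_type is None:
        return "fail"   # nothing was resolved at all
    if confidence >= min_conf and match_type in PRECISE_MATCH_TYPES:
        return "ok"
    return "low"        # postal/city matches land here regardless of score
```

With this rule, a postal match at 0.94 goes to review while a rooftop match at 0.94 passes straight through.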

Version the enrichment run. Add an enriched_at timestamp and a pipeline_version field to every enriched row. When you re-run the pipeline after a zone-definition update or a geocoder improvement, you can identify which rows have stale zone assignments and re-process only those. A book that gets re-enriched annually without versioning is a book where you cannot answer "which policies were rated on the old wind-zone map" after a cat event.
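Stamping rows is a few lines of code. In this sketch the version string is a made-up example and `stamp` / `is_stale` are hypothetical helper names; only the two field names come from the text above.

```python
from datetime import datetime, timezone

PIPELINE_VERSION = "2026.06-example"  # hypothetical version string


def stamp(row, version=PIPELINE_VERSION, now=None):
    """Return a copy of row with enriched_at (UTC, ISO 8601) and pipeline_version."""
    now = now or datetime.now(timezone.utc)
    return {**row, "enriched_at": now.isoformat(), "pipeline_version": version}


def is_stale(row, current_version=PIPELINE_VERSION):
    """A row is stale when it was enriched by a different pipeline version."""
    return row.get("pipeline_version") != current_version
```

Filtering on `is_stale` before a re-run is what lets you re-process only the rows that need it.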

How to handle the three output buckets

Every book enrichment run produces three populations. Each needs a different workflow.

Confident matches go straight into the rating engine or the cat-model input file. No human touch needed. The pipeline can write these directly to the downstream table.

Low-confidence matches need a human eye or an address-correction step. Build a lightweight review queue — a simple web form is enough — that shows the underwriter the original address, the geocoded point on a map, and the zone assignments. The underwriter either confirms (moves to confident) or corrects the address (re-geocodes, then confirms). Aim to clear the low-confidence bucket within one business week of a book run.

Failed geocodes represent addresses the geocoder could not resolve at all. Common causes: PO Box addresses used as insured location (common in rural books), old rural route formats, addresses in territories or commonwealths with thin coverage, simple typos from policy entry. Route these to a data-hygiene team. The fix is usually a phone call or a satellite-image cross-reference against a parcel database — not something you automate. Expect 1-3% of a mature book to land here.

Track the ratio of each bucket over time. A confident-match rate that drops between two quarterly runs is a signal that your address data quality is deteriorating — new business being written with sloppy address entry, or a system migration that mangled address formatting.

Latency budget for a million-policy book

No invented numbers here — the honest answer is "measure it for your network and plan size." What you can reason about:

  • At BATCH_SIZE = 500, a million policies require 2,000 geocode batch calls.
  • Each boundary call is one HTTP round-trip per successfully geocoded address. With a thread pool of 20 concurrent workers, wall-clock time for a million boundary calls is roughly (1,000,000 / 20) × mean_round_trip. On a cloud instance in the same region as the API, mean round-trip is typically under 100 ms, which puts the upper end of that estimate at 50,000 × 0.1 s = 5,000 s, or about 83 minutes.
  • The dominant variable is your own network and the degree of parallelism you run. Start with 10 concurrent boundary workers on a pilot of 10,000 policies, measure actual wall-clock time, then extrapolate to your full book size.
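The reasoning in the bullets above fits in a small estimator you can rerun once you have measured numbers from a pilot. Treat every default as a placeholder: the 0.1 s round trip is the upper figure quoted above, not a measurement, and the function name is hypothetical.

```python
import math


def estimate_run(policies, batch_size=500, workers=20, mean_round_trip_s=0.1,
                 geocoded_share=1.0):
    """Back-of-envelope call counts and boundary wall-clock time in seconds."""
    batch_calls = math.ceil(policies / batch_size)
    boundary_calls = round(policies * geocoded_share)
    # Ideal parallelism: each worker handles an equal share, one call at a time.
    boundary_seconds = boundary_calls / workers * mean_round_trip_s
    return batch_calls, boundary_calls, boundary_seconds


batch_calls, boundary_calls, seconds = estimate_run(1_000_000)
print(batch_calls, boundary_calls, f"{seconds / 60:.0f} minutes")
```

The model ignores rate limits, retries, and the batch calls themselves, so read the result as a floor rather than a forecast.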

The approach that does not scale is a sequential for-loop with no concurrency. For book sizes above 50,000, parallelise the boundary calls. For book sizes above 500,000, run the pipeline on a cloud VM co-located with the API rather than from a developer laptop.

For a deeper treatment of concurrency tuning, see Concurrency Tuning for Geocoding Pipelines.

Step-by-step production checklist

Step 1: Audit your address data before you geocode

Pull a 1,000-row random sample from the book. Run it through the geocoder manually. Record the distribution of match_type and confidence. If more than 10% land below MIN_CONF or in postal/city match types, your address quality is the bottleneck — fix the source data first. Common quick wins: strip PO Box prefixes, expand state abbreviations, append ZIP codes where missing.
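To record that distribution, a summary function along these lines is enough. It is a sketch that assumes the result shape from the batch response shown earlier; `audit` is a hypothetical name.

```python
from collections import Counter


def audit(results, min_conf=0.75, precise=("rooftop", "street")):
    """Summarise match_type counts and the share of rows needing attention."""
    match_types = Counter(r.get("match_type") or "no_match" for r in results)
    weak = sum(
        1 for r in results
        if r.get("confidence") is None
        or r["confidence"] < min_conf
        or r.get("match_type") not in precise
    )
    weak_share = weak / len(results) if results else 0.0
    return match_types, weak_share
```

If `weak_share` comes back above 0.10 on the sample, that is the signal to fix source data before running the full book.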

Step 2: Set your confidence threshold and match-type rules

Decide what counts as "good enough for automated zone assignment" for your specific use case. A flood-zone assignment for a coastal property needs higher precision than a wind-tier assignment for an inland mountain property. A reasonable default: confidence >= 0.75 AND match_type IN ('rooftop', 'street'). Adjust the threshold up (to 0.85) if your book is concentrated in complex urban markets; adjust down cautiously only if you have a strong manual review step.

Step 3: Run a pilot on a single state or county

Before committing to a full book run, enrich one geographic slice end-to-end. Validate the zone assignments against any ground truth you have — prior cat-model runs, official flood-map lookups, anything that gives you a cross-check. Fix pipeline bugs before you scale. The investment is one afternoon; the payoff is not discovering a systematic match_type bug 800,000 rows into a book run.

Step 4: Run the full book with parallelism and dead-letter logging

Scale to the full book with the thread pool sized to your network's sweet spot (see above). Write every failed call — HTTP errors, timeouts, geocode failures — to a dead-letter file with the full error and the input row. Do not swallow errors silently. At the end of the run, the dead-letter file is your action list.
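A thread pool with dead-letter capture can be sketched with the standard library alone. Here `lookup` stands in for whatever function performs one boundary call (for example the `get_boundaries` function from the Python pipeline), and the dead-letter record layout is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_boundary_lookups(points, lookup, workers=10):
    """Run lookup(lat, lng) for each (policy_id, lat, lng) concurrently.

    Returns (results, dead_letters). No error is swallowed: every exception
    is recorded together with the input that caused it.
    """
    results, dead_letters = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(lookup, lat, lng): (pid, lat, lng)
                   for pid, lat, lng in points}
        for fut in as_completed(futures):
            pid, lat, lng = futures[fut]
            try:
                results[pid] = fut.result()
            except Exception as exc:  # keep the failure and its input row
                dead_letters.append({"policy_id": pid, "lat": lat, "lng": lng,
                                     "error": repr(exc)})
    return results, dead_letters
```

Write `dead_letters` to a file at the end of the run; that file is the action list the step describes.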

Step 5: Load the enriched output and run a sanity-check query

Before handing the enriched CSV to the cat modelling team, run five checks: (1) row count equals input count; (2) confident-match rate is within 3% of your pilot sample; (3) zone-label distribution makes geographic sense (coastal book has high flood-zone count; inland book does not); (4) no null lat/lng rows in the confident bucket; (5) enriched_at and pipeline_version fields are populated on every row. If all five pass, ship it. If any fail, investigate before the output reaches a downstream model.
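Four of the five checks are mechanical and can be scripted; the geographic-sense check still needs a person looking at the distribution. The sketch below assumes rows are dicts with the field names used in this post and reads "within 3%" as three percentage points, which is an interpretation.

```python
def sanity_check(input_count, ok_rows, low_rows, failed_rows, pilot_ok_rate,
                 tolerance=0.03):
    """Return the names of failed checks; an empty list means ship it."""
    problems = []
    total = len(ok_rows) + len(low_rows) + len(failed_rows)
    if total != input_count:
        problems.append("row_count")
    ok_rate = len(ok_rows) / total if total else 0.0
    if abs(ok_rate - pilot_ok_rate) > tolerance:
        problems.append("confident_rate")
    if any(r.get("lat") is None or r.get("lng") is None for r in ok_rows):
        problems.append("null_coordinates")
    if any(not r.get("enriched_at") or not r.get("pipeline_version")
           for r in ok_rows + low_rows):
        problems.append("missing_version_fields")
    return problems
```

Returning the list of failures rather than a boolean tells whoever runs the job where to start investigating.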

Step 6: Schedule quarterly re-enrichment runs

Zone definitions change. The flood map gets revised after a major event. A state expands its wind-pool boundary. A new wildfire-interface buffer is drawn after a season's fires. Enrichment is not a one-time job — it is a quarterly maintenance task. Version the schema so each run is traceable, and build an alerting rule that flags policies whose zone assignment changed between runs for an underwriter review.
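The alerting rule reduces to a diff between two runs. This sketch assumes each run is loaded as a dict of policy_id to enriched row, with the zone column names used in the Python pipeline; `zone_changes` is a hypothetical name.

```python
ZONE_FIELDS = ("flood_zone", "wind_zone", "fire_zone")


def zone_changes(previous, current, fields=ZONE_FIELDS):
    """Yield (policy_id, field, old, new) for every zone label that changed.

    previous and current each map policy_id -> enriched row (a dict).
    """
    for policy_id, new_row in current.items():
        old_row = previous.get(policy_id)
        if old_row is None:
            continue  # new business: nothing to compare against
        for name in fields:
            old, new = old_row.get(name, ""), new_row.get(name, "")
            if old != new:
                yield policy_id, name, old, new
```

Each yielded tuple is one line on the underwriter review list.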

Cost model for a real book

A 500,000-policy book, quarterly enrichment:

| Operation | Count per run | Credits |
|---|---|---|
| Geocode batch (500/call) | 1,000 calls | 1,000 credits |
| Boundary lookup (1/address, ~97% successfully geocoded) | ~485,000 calls | 485,000 credits |
| Total per quarter | | ~486,000 credits |

On the entry paid tier ($54/month for 100,000 calls), a run of this size needs roughly 5× the monthly allowance, so the right bracket is one tier up; splitting the run across billing months on the entry tier would take about five months, longer than the quarterly cadence. The initial geocode run is the most expensive because there is no cache to hit. Subsequent quarterly runs benefit from caching on addresses that have not changed; see Caching Geocoding Results — 90% Cost Reduction for the exact pattern. In practice, 80-90% of a stable book's addresses are unchanged quarter-over-quarter, which reduces the effective quarterly credit spend substantially.
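The arithmetic behind the table can be reproduced in a few lines. This is an illustration only: it prices every call at the entry-tier rate quoted above, which will not match the per-call rate on other tiers, and the function name is hypothetical.

```python
import math


def run_credits(policies, batch_size=500, geocoded_share=0.97):
    """Credits for one run: one per batch call plus one per boundary lookup."""
    batch_calls = math.ceil(policies / batch_size)
    boundary_calls = round(policies * geocoded_share)
    return batch_calls, boundary_calls, batch_calls + boundary_calls


batch, boundary, total = run_credits(500_000)
allowance_multiple = total / 100_000           # entry tier: 100,000 calls/month
cost_per_policy = total / 500_000 * 54 / 100_000  # dollars, at the entry-tier rate
```

Swap in your own book size and geocode success rate from the pilot before using the result for plan selection.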

The marginal cost per policy enriched is well under a cent. The cost of mis-rating a coastal policy in the wrong flood zone after a major hurricane is orders of magnitude larger.

Frequently Asked Questions

What confidence threshold should we use for automated zone assignment? Start at 0.75 combined with match_type IN ('rooftop', 'street'). This is a pragmatic default that works well for most US residential books. Raise it to 0.85 if your portfolio is concentrated in high-density urban markets where address ambiguity is higher. Never lower it below 0.70 for automated assignment — below that threshold, the spatial error is large enough to produce material zone mis-assignments in complex boundary areas.

Can we use the batch geocode endpoint for initial address validation at policy-binding time? Yes, and it is a good idea. A real-time geocode call at point of sale catches obvious bad addresses — mistyped ZIP codes, streets that do not exist in the given city — before they enter the book. Use the single-address GET /api/v1/geocode endpoint for real-time validation; use the batch endpoint for bulk book enrichment. Both are covered by the same plan.

How do we handle policies where the mailing address differs from the insured location? Store both addresses. Geocode the insured location address for zone assignment and risk rating. Geocode the mailing address for communications. If the insured location is not available and the mailing address is a PO Box, flag the policy for manual address collection. Do not use a PO Box for zone assignment — a PO Box centroid can be miles from the insured property.

What does the boundaries endpoint return for a point on a zone boundary line? For points within a few metres of a zone boundary, the endpoint resolves to one side of the line based on the polygon data. This is a known limitation of point-in-polygon testing. For policies where the geocoded point lands within 50 m of a zone boundary, add a boundary_proximity flag and route them to the underwriter queue — the spatial uncertainty of the geocode is large enough that the policy could genuinely be on either side.

Is there a way to enrich the book without storing coordinates ourselves? Technically yes — you can pass free-text addresses directly to the boundaries endpoint via the q parameter, which geocodes internally and returns the boundary result. But storing coordinates is strongly recommended for a book of business: you need them for every downstream spatial join, for cat-model inputs, for distance-to-coast calculations, and for any future enrichment run. Pay the one-time geocoding cost and keep the coordinates.

What happens when a zone definition changes after we have already enriched the book? The boundary endpoint always returns the current zone definitions. If a map revision changes the flood zone for 3,000 addresses in your book, a re-run of the boundary lookup for those addresses will return the new zone label. The reason to version your enrichment runs is precisely to make this detectable: a quarterly re-run that finds changed zone labels generates an automatic review list for the underwriting team.

How does this relate to the elevation enrichment we already do per policy? Elevation and zone boundaries are complementary signals. Elevation (from /api/v1/elevation) gives you the raw terrain height — a continuous number that feeds flood narrative, wind risk, and snow-load considerations. Zone boundaries give you the regulatory and cat-model classification — a label that determines the rate filed with your state regulator. Both live on the same enriched row, sourced from the same API key. See Adding Elevation to Property Data for the elevation pipeline in detail.

---

*I.A. / CSV2GEO Creator*
