Bank branch catchment and CRA reporting with batch geocoding

Geocode loan applications, map branch catchment areas, and produce CRA assessment area data using batch REST calls. Patterns for BFSI engineering teams.

June 20, 2026


Community Reinvestment Act reporting demands a precise answer to a deceptively simple question: which census tract does each loan application belong to? The examiner will spot-check your HMDA LAR. The geocoding quality on file needs to hold up to scrutiny. Get the county wrong on a borderline address and you have a tract assignment error; get the tract wrong and you have a reporting error. Do that at scale and the examination conversation gets expensive.

The same coordinate-to-political-boundary lookup that powers CRA tract assignment also powers branch catchment analysis — which is the business problem that sits upstream of every CRA conversation. Where does the branch's loan origination actually come from geographically? Which geographies are underserved relative to the branch's assessment area? What does the deposit-taking geography look like compared to the lending geography?

This post walks through both problems end to end. The tools are the CSV2GEO Batch Geocoding endpoint and the Divisions endpoint, which returns the full stack of administrative subdivisions — county, census tract, MSA, state — for any coordinate. No GIS team needed. No shapefile management. No annual TIGER file import. REST calls from Python or Node, billed per call, with 461 million US addresses in the index and 39 countries covered.

The two problems and why they share an answer

CRA tract assignment. Every application you report must carry a census tract code on the HMDA LAR filing. The regulatory answer to "how do I get the tract?" is: geocode the property address to a coordinate, then look up which census tract polygon contains that coordinate. The first step is the Geocoding or Batch Geocoding endpoint. The second step is the Divisions endpoint. Together they form a two-step pipeline for each application.

Branch catchment analysis. A branch's assessment area is defined partly by its lending activity: typically the counties in which it originated or purchased a majority of its loans during the prior year. Before you can compute that, you need to know the county and tract for each loan — the same Divisions lookup. Once you have tracts and counties per loan, you can draw catchment rings, compare branch service areas, and identify geographies where lending activity is thin relative to population density.

Both problems reduce to the same pattern: geocode an address, get a coordinate, resolve the coordinate to its administrative divisions. The difference is what you do with the output — regulatory filing versus internal analytics — but the pipeline is the same.

What the Divisions endpoint returns

curl -G "https://csv2geo.com/api/v1/divisions" \
  --data-urlencode "lat=38.9072" \
  --data-urlencode "lng=-77.0369" \
  --data-urlencode "api_key=$CSV2GEO_API_KEY"

Response (abbreviated):

{
  "lat": 38.9072,
  "lng": -77.0369,
  "divisions": {
    "country": { "name": "United States", "code": "US" },
    "state": { "name": "District of Columbia", "fips": "11" },
    "county": { "name": "District of Columbia", "fips": "11001" },
    "census_tract": { "geoid": "11001006202", "name": "062.02" },
    "msa": { "name": "Washington-Arlington-Alexandria, DC-VA-MD-WV", "code": "47900" },
    "place": { "name": "Washington" }
  }
}

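The geoid field on census_tract is the 11-digit tract GEOID (2-digit state FIPS, 3-digit county FIPS, and 6-digit tract code) that belongs in the Census Tract field of your HMDA LAR; confirm the field position against the Filing Instructions Guide for your filing year. You do not need to parse a shapefile; you do not need to maintain a local PostGIS topology. The API resolves the point-in-polygon lookup server-side and gives you a string you can write directly to your filing.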

The endpoint also returns the MSA code, which matters for large-bank CRA examiners who assess community development performance within the full metropolitan area. County FIPS and state FIPS both come back in the same call, so the single Divisions call produces all the geographic keys your HMDA LAR needs.
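Before the GEOID is written to a filing, its structure can be cross-checked locally. A tract GEOID is the state FIPS, county FIPS, and tract code concatenated, so it should agree with the county and state values in the same response. A minimal sketch, assuming the response shape shown above; the helper names are hypothetical and not part of the API:

```python
def split_tract_geoid(geoid: str) -> dict:
    """Split an 11-digit census tract GEOID into its FIPS components."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected 11 digits, got {geoid!r}")
    return {
        "state_fips": geoid[:2],    # 2-digit state
        "county_fips": geoid[:5],   # state + 3-digit county
        "tract_code": geoid[5:],    # 6-digit tract
    }


def check_divisions(divisions: dict) -> dict:
    """Cross-check the tract GEOID against county and state FIPS from the same response."""
    parts = split_tract_geoid(divisions["census_tract"]["geoid"])
    if parts["county_fips"] != divisions["county"]["fips"]:
        raise ValueError("tract GEOID and county FIPS disagree")
    if parts["state_fips"] != divisions["state"]["fips"]:
        raise ValueError("tract GEOID and state FIPS disagree")
    return parts


# The sample response from this post, reduced to the fields the check reads.
sample = {
    "state": {"name": "District of Columbia", "fips": "11"},
    "county": {"name": "District of Columbia", "fips": "11001"},
    "census_tract": {"geoid": "11001006202", "name": "062.02"},
}
parts = check_divisions(sample)
```

A failed check is a reason to route the row to review, not to repair the GEOID by hand.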

Batch geocoding: processing a HMDA LAR in one overnight run

A mid-size community bank might originate 4,000 to 20,000 HMDA-reportable applications per year. A large regional institution might file 200,000. Neither number justifies a GIS team or a spatial database — they justify a good batch-geocoding call.

The Batch Geocoding endpoint accepts up to 100 addresses per request. At 100 addresses per call, a 20,000-application book is 200 HTTP calls. On a residential broadband connection with modest concurrency, that completes in a few minutes. On a production server with proper concurrency, it is seconds. See Concurrency Tuning — Geocoding Sweet Spot for the right concurrency number to use without tripping the rate limiter.

A minimal Python pipeline — geocode, then Divisions-lookup, then write enriched output:

import csv
import os
import time
import requests

API   = "https://csv2geo.com/api/v1"
KEY   = os.environ["CSV2GEO_API_KEY"]
BATCH = 100  # batch geocode accepts up to 100 per call

def chunks(seq, n):
    for i in range(0, len(seq), n):
        yield seq[i : i + n]

def geocode_batch(addresses):
    """addresses: list of raw address strings. Returns list of {lat, lng, confidence}."""
    payload = {"addresses": addresses, "api_key": KEY}
    r = requests.post(f"{API}/geocode/batch", json=payload, timeout=60)
    r.raise_for_status()
    return r.json()["results"]

def get_divisions(lat, lng):
    r = requests.get(
        f"{API}/divisions",
        params={"lat": lat, "lng": lng, "api_key": KEY},
        timeout=15,
    )
    r.raise_for_status()
    return r.json()["divisions"]

in_fields  = ["application_id", "property_address", "city", "state", "zip"]
out_fields = in_fields + [
    "lat", "lng", "confidence",
    "county_fips", "census_tract_geoid", "msa_code",
    "geo_source"
]

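# newline="" is what the csv module expects for both reading and writing.
with open("applications.csv", newline="") as fin, \
     open("applications_enriched.csv", "w", newline="") as fout:

    reader = csv.DictReader(fin)
    # extrasaction="ignore" drops input columns that are not in out_fields
    # instead of raising ValueError on the first row that has one.
    writer = csv.DictWriter(fout, fieldnames=out_fields, extrasaction="ignore")
    writer.writeheader()
    rows = list(reader)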

    for batch in chunks(rows, BATCH):
        # Build a single address string per row for the geocoder.
        addrs = [
            f"{r['property_address']}, {r['city']}, {r['state']} {r['zip']}"
            for r in batch
        ]
        geo_results = geocode_batch(addrs)

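        # Assumption: the API returns one result per input address, in order.
        # zip() would silently misalign rows otherwise, so fail loudly.
        if len(geo_results) != len(batch):
            raise RuntimeError("batch response length does not match request length")

        for row, geo in zip(batch, geo_results):
            lat = geo.get("lat")
            lng = geo.get("lng")
            conf = geo.get("confidence") or 0  # treat a missing or null score as 0

            divisions = {}
            if lat is not None and lng is not None and conf >= 0.7:
                try:
                    divisions = get_divisions(lat, lng)
                except requests.HTTPError as exc:
                    # Division fields stay empty, so the row lands in human review.
                    print(f"divisions lookup failed for {row['application_id']}: {exc}")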

            row["lat"]               = lat
            row["lng"]               = lng
            row["confidence"]        = conf
            row["county_fips"]       = (divisions.get("county") or {}).get("fips")
            row["census_tract_geoid"]= (divisions.get("census_tract") or {}).get("geoid")
            row["msa_code"]          = (divisions.get("msa") or {}).get("code")
            row["geo_source"]        = "csv2geo_v1"
            writer.writerow(row)

        time.sleep(0.1)  # polite pacing between batches

The Node equivalent of the two API helpers, for teams that already have a Node pipeline:

import { createReadStream, createWriteStream } from 'node:fs';

const API = 'https://csv2geo.com/api/v1';
const KEY = process.env.CSV2GEO_API_KEY;

async function geocodeBatch(addresses) {
  const r = await fetch(`${API}/geocode/batch`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ addresses, api_key: KEY }),
  });
  if (!r.ok) throw new Error(`geocode batch http ${r.status}`);
  return (await r.json()).results;
}

async function getDivisions(lat, lng) {
  // URLSearchParams percent-encodes the values, including the API key.
  const url = `${API}/divisions?${new URLSearchParams({ lat, lng, api_key: KEY })}`;
  const r = await fetch(url);
  if (!r.ok) throw new Error(`divisions http ${r.status}`);
  return (await r.json()).divisions;
}

Two details that matter in production:

Confidence threshold. Any geocode result below 0.7 confidence should not flow straight to the LAR. Route it to a human-review queue. The confidence score is explained in depth in Geocoding Confidence Scores Explained. For HMDA purposes, "geocoded by street address" is a valid method code; "geocoded by ZIP centroid" is not equivalent and should be flagged differently in your filing.

Rate limiting. The time.sleep(0.1) in the Python script is polite but not always sufficient at high concurrency. Read Rate Limiting — Token Bucket vs Leaky Bucket before scaling to multiple workers. The right pattern is a shared token bucket across your worker pool, not a per-worker sleep.
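A shared token bucket can be sketched in a few lines. This version is thread-safe and meant to be shared by every worker thread in one process; the rate and capacity values are placeholders, since this post does not state the actual limit, and a multi-process pool would need the same logic backed by something like Redis. The class and parameter names are hypothetical:

```python
import threading
import time


class TokenBucket:
    """Refills at `rate` tokens per second, up to `capacity` tokens."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock            # injectable so the bucket can be tested
        self.tokens = float(capacity)
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self):
        """Take one token if available; return whether it was taken."""
        with self.lock:
            now = self.clock()
            elapsed = now - self.updated
            self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

    def acquire(self, poll=0.01):
        """Block until a token is available."""
        while not self.try_acquire():
            time.sleep(poll)
```

Each worker calls `bucket.acquire()` immediately before its HTTP request, so the pool as a whole never exceeds the configured rate regardless of how many workers are running.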

Resolving the Divisions call at scale

The pipeline above makes one Divisions call per application after geocoding. That is correct but potentially slow if you are processing 200,000 rows sequentially. Two optimisations that do not require changing the API contract:

Cache by coordinate, not by application. Multiple applications at the same or neighbouring addresses often resolve to the same census tract. A local in-process dict (or Redis if you have multiple workers) keyed by (round(lat, 4), round(lng, 4)), a grid of roughly 11 metres in latitude, collapses many Divisions calls for applications in the same area. In practice, a 200-application book in a single ZIP code still geocodes 200 addresses (two batch calls at 100 per call) but might need only 15 Divisions calls because the coordinate space is small. See Caching Geocoding Results — 90% Cost Reduction for the broader pattern.
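The in-process variant of that cache might look like the sketch below. The `fetch` argument stands in for whatever function performs the real Divisions call (for example the `get_divisions` helper earlier in this post); the class name and counters are hypothetical:

```python
class DivisionsCache:
    """Memoise Divisions lookups on coordinates rounded to `precision` decimals."""

    def __init__(self, fetch, precision=4):
        self.fetch = fetch          # callable (lat, lng) -> divisions dict
        self.precision = precision
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, lat, lng):
        key = (round(lat, self.precision), round(lng, self.precision))
        if key in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[key] = self.fetch(lat, lng)
        return self.store[key]

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

One caveat with any rounded key: two points in the same grid cell can sit on opposite sides of a tract boundary, so rows already flagged for boundary review are better served by an uncached lookup.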

Parallelise with bounded concurrency. The Divisions call is fast, but sequential calls on 20,000 rows waste wall-clock time. Use asyncio.gather in Python or Promise.all in Node with a semaphore that limits to 10–20 concurrent requests. Do not fan out all 20,000 simultaneously — that will hit the rate limiter and force retries. The sweet spot is modest concurrency with exponential backoff on 429s: see Exponential Backoff — When to Retry, When to Stop.
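The bounded-concurrency pattern can be sketched with asyncio alone. The `lookup` coroutine is supplied by the caller and is assumed to raise `RateLimited` when the API answers 429; that exception, the function names, and the default delays are all illustrative choices rather than anything the API prescribes:

```python
import asyncio
import random


class RateLimited(Exception):
    """Raised by the caller-supplied lookup when the API answers HTTP 429."""


async def with_backoff(call, *, retries=5, base_delay=0.5, sleep=asyncio.sleep):
    """Await call(); on RateLimited wait base_delay * 2**attempt plus jitter, then retry."""
    for attempt in range(retries + 1):
        try:
            return await call()
        except RateLimited:
            if attempt == retries:
                raise  # give up after the final retry
            delay = base_delay * (2 ** attempt)
            await sleep(delay + random.uniform(0, delay / 2))


async def enrich_all(coords, lookup, *, limit=10, **backoff):
    """Run lookup(lat, lng) for every coordinate with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def one(lat, lng):
        async with sem:
            return await with_backoff(lambda: lookup(lat, lng), **backoff)

    # gather() returns results in the same order as the input coordinates.
    return await asyncio.gather(*(one(lat, lng) for lat, lng in coords))
```

The semaphore stays held while a request is backing off, which deliberately slows the whole pool down when the limiter starts pushing back.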

Branch catchment analysis: from loan coordinates to service area maps

Once you have a lat/lng and a county FIPS for every loan in your book, branch catchment analysis is an aggregation exercise. The shape of the pipeline is:

1. Group loans by originating branch.
2. For each branch, count loans per county FIPS and per census tract GEOID.
3. Compute the 80th-percentile radius — the distance from branch to applicant coordinate that covers 80% of originations.
4. Flag any county that contains more than 5 originations but sits outside the branch's currently-declared assessment area.

Step 3 needs nothing more than the Haversine formula and the lat/lng you already geocoded. Step 4 is a join between your county-aggregated origination table and the assessment area table you maintain in your loan origination system. Both are pure SQL once the geocoding and Divisions enrichment are done.
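For teams that prefer to compute the radius outside the database, the Haversine step is short. A minimal sketch, assuming a mean Earth radius of 6371.0088 km and a nearest-rank percentile; the function names are hypothetical:

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two lat/lng points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def catchment_radius_km(branch, loans, coverage=0.8):
    """Smallest branch-to-loan distance that covers at least `coverage` of the loans."""
    if not loans:
        raise ValueError("no loans for this branch")
    distances = sorted(haversine_km(branch[0], branch[1], lat, lng) for lat, lng in loans)
    # Nearest-rank percentile; the small epsilon guards against float noise in ceil().
    rank = max(1, math.ceil(coverage * len(distances) - 1e-9))
    return distances[rank - 1]
```

Calling `catchment_radius_km((branch_lat, branch_lng), loan_coords)` per branch gives the ring to draw; other percentile definitions (interpolated rather than nearest-rank) would shift the result slightly on small books.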

A worked aggregation query (PostgreSQL dialect, after enrichment is in the loans table):

-- Branch catchment: origination density per county, per branch
SELECT
  originating_branch_id,
  county_fips,
  COUNT(*) AS origination_count,
  AVG(loan_amount) AS avg_loan_amount,
  -- flag counties not in the declared assessment area
  CASE
    WHEN county_fips NOT IN (
      SELECT county_fips FROM assessment_areas
      WHERE branch_id = loans.originating_branch_id
    ) THEN true
    ELSE false
  END AS outside_assessment_area
FROM loans
WHERE application_year = EXTRACT(YEAR FROM CURRENT_DATE) - 1
  AND geo_confidence >= 0.7
GROUP BY originating_branch_id, county_fips
ORDER BY originating_branch_id, origination_count DESC;

The geo_confidence >= 0.7 filter is important: you do not want ZIP-centroid geocodes inflating county counts for the wrong county when the centroid lands near a county boundary.

Handling address quality problems in a lending book

Loan applications are not clean. Borrowers transpose street numbers. Loan officers abbreviate. Addresses sometimes belong to a parcel that was split or merged since the last database update. Three failure modes and how to handle each:

No match returned. The geocoder returns an empty results array. Do not silently drop the row. Route it to a human-review queue with the raw address and a flag of geocode_failure. On the HMDA LAR, this row should carry a geocoding method code that reflects the manual lookup that resolved it.

Low-confidence match. The geocoder returns a coordinate but with confidence < 0.7. This often indicates the address matched at the street level or the ZIP centroid, not to a specific parcel. Flag for human review. In a typical bank lending book, expect 2–5% of addresses to fall into this bucket — mostly rural routes, new construction, and condominium unit addresses with inconsistent unit formatting.

County boundary ambiguity. A coordinate within ~50 metres of a county boundary may resolve to either county depending on projection rounding. The Divisions endpoint uses server-side point-in-polygon with the authoritative TIGER boundary set, so this is handled correctly — but log a warning when the returned county differs from the county the loan officer recorded in the LOS. That discrepancy is worth a human glance before filing.
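The three failure modes map naturally onto a single triage function. In this sketch the `geocode_failure` and `needs_manual_review` flags follow the wording in this post, while `county_mismatch`, `ok`, and the function signature are hypothetical:

```python
def triage(geo, divisions, los_county_fips, threshold=0.7):
    """Classify one application's geocoding outcome into a review bucket."""
    # No match: nothing usable came back from the geocoder.
    if not geo or geo.get("lat") is None or geo.get("lng") is None:
        return "geocode_failure"
    # Low-confidence match: a coordinate exists but should not reach the LAR unreviewed.
    if (geo.get("confidence") or 0) < threshold:
        return "needs_manual_review"
    # County disagreement between the Divisions lookup and the LOS record.
    county = ((divisions or {}).get("county") or {}).get("fips")
    if county and los_county_fips and county != los_county_fips:
        return "county_mismatch"
    return "ok"
```

Only rows that come back `ok` would flow straight to the filing; everything else carries its flag into the review queue along with the raw address.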

For a deeper treatment of geocoding accuracy and what "match type" means in practice, see Reverse-Geocoding Accuracy and the Distance Meters.

How to do it: a production pipeline in five steps

Step 1: Extract and normalise addresses from your LOS

Pull the prior year's HMDA-reportable applications from your loan origination system. Normalise the address fields into a single string: {house_number} {street}, {city}, {state} {zip}. Do not include unit numbers in the main address string unless the geocoder is clearly unit-aware — unit suffixes confuse most geocoders for multi-family properties.
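A normalisation helper along these lines keeps the address string consistent across loan officers. The unit-suffix pattern is a deliberately conservative assumption (it only strips a trailing designator followed by a number or a single letter), and the function name is hypothetical:

```python
import re

# Trailing unit designators such as "Apt 4B", "Unit 12", "Suite A12", "#3".
UNIT_RE = re.compile(
    r"(?:,\s*|\s+)(?:apartment|apt|unit|suite|ste|#)\s*\.?\s*"
    r"(?:[A-Za-z]?\d[\w-]*|[A-Za-z])\s*$",
    re.IGNORECASE,
)


def normalise_address(street, city, state, zip_code):
    """Build '{house_number} {street}, {city}, {state} {zip}' without a unit suffix."""
    street = " ".join(street.split())   # collapse runs of whitespace
    street = UNIT_RE.sub("", street)    # drop a trailing unit designator
    city = " ".join(city.split())
    return f"{street}, {city}, {state.strip().upper()} {zip_code.strip()}"
```

Keeping the stripped unit in a separate staging column is worth considering, since the compliance team may need it when resolving a multi-family property by hand.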

Write the normalised addresses to a staging CSV with columns: application_id, address_string, los_county_fips, los_census_tract (the values your loan officers manually entered in the LOS, which you will compare against the geocoded values for QA).

Step 2: Run the batch geocoder

Feed the staging CSV into the batch geocoding script shown earlier in this post. Run it in chunks of 100. For a 20,000-row book, budget 200 API calls for geocoding. Log every response to a local file; do not rely only on the enriched CSV. If the job fails at row 15,000, you want to resume from the checkpoint, not re-geocode the first 15,000.
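One way to get both the raw-response log and the resume behaviour is to write one JSON file per batch and skip any batch whose file already exists. This is a sketch under that assumption; the file naming and the `geocode` callable (for example `geocode_batch` from earlier in this post) are illustrative:

```python
import json
import os


def run_with_checkpoint(batches, geocode, log_dir):
    """Geocode each batch once, persisting every raw response before moving on."""
    os.makedirs(log_dir, exist_ok=True)
    results = []
    for i, batch in enumerate(batches):
        path = os.path.join(log_dir, f"batch_{i:05d}.json")
        if os.path.exists(path):
            # Already done in an earlier run: reuse the stored response.
            with open(path, encoding="utf-8") as f:
                results.append(json.load(f)["response"])
            continue
        response = geocode(batch)
        record = {"batch_index": i, "request": batch, "response": response}
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(record, f)
        os.replace(tmp, path)  # atomic rename, so a crash never leaves a half-written file
        results.append(response)
    return results
```

Re-running the same command after a failure picks up at the first batch without a file. The checkpoint is only valid if the input rows and batch size are unchanged between runs.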

At the free tier (3,000 calls/day), a 200-call geocode run completes in a single day with headroom. Paid tiers start at $54/month for 100,000 calls, which covers a 20,000-application book with room for the Divisions calls, reruns, and QA spot-checks.

Step 3: Run the Divisions enrichment

For every row where confidence >= 0.7, call the Divisions endpoint to get county_fips, census_tract_geoid, and msa_code. Use the coordinate-based cache described earlier to collapse redundant calls. Log the raw Divisions response per application; you will need it for exam documentation.

For rows with confidence < 0.7, flag them in the staging table as needs_manual_review. Do not populate the census_tract_geoid from the geocoder — populate it from the manual lookup that your compliance team performs.

Step 4: QA — compare geocoded values against LOS-entered values

Join the enriched table against your staging CSV on application_id. Compare county_fips (geocoded) against los_county_fips (entered by the loan officer). A mismatch rate above 3% is a signal that either your loan officer training on address entry needs attention, or your geocoder is consistently wrong for a specific geography. Investigate both.

A useful diagnostic: group mismatches by state and ZIP. If mismatches cluster in one state, the problem is usually a LOS data-entry convention that differs from the geocoder's expected format.
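The comparison and the grouping diagnostic fit in one small function. The sketch assumes each joined row is a dict carrying `county_fips`, `los_county_fips`, `state`, and `zip`; the function name and return shape are hypothetical:

```python
from collections import Counter


def county_mismatch_report(rows, alert_rate=0.03):
    """Compare geocoded and LOS-entered county FIPS and group the mismatches."""
    # Only rows with both values present can be compared.
    compared = [r for r in rows if r.get("county_fips") and r.get("los_county_fips")]
    mismatches = [r for r in compared if r["county_fips"] != r["los_county_fips"]]
    rate = len(mismatches) / len(compared) if compared else 0.0
    return {
        "compared": len(compared),
        "mismatches": len(mismatches),
        "rate": rate,
        "alert": rate > alert_rate,  # the 3% signal described above
        "by_state": Counter(r["state"] for r in mismatches).most_common(),
        "by_zip": Counter(r["zip"] for r in mismatches).most_common(),
    }
```

Reading `by_state` and `by_zip` from the top shows quickly whether the mismatches are spread evenly or concentrated in one geography.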

Step 5: Produce the LAR-ready output and archive the evidence

Write the final output: application_id, census_tract_geoid, county_fips, msa_code, geocode_confidence, geocode_method (set to "api_street_level" for confidence ≥ 0.7, "manual" for manually resolved rows). This file becomes the geocoding evidence record for your CRA exam file.

Archive the raw geocoder responses — both the batch geocoding responses and the Divisions responses — in your document management system keyed by (application_id, filing_year). Examiners occasionally ask for the underlying geocoding data. Having the raw API responses on file is far cleaner than regenerating them 18 months later when the examiner calls.

Observability: what to instrument

A geocoding pipeline that runs once a year is easy to ignore until it fails silently. Four metrics worth wiring into your APM before the annual run:

Match rate. geocoded_successfully / total_applications. Should be above 95% for a clean US loan book. Below 90% is a problem.

High-confidence rate. count(confidence >= 0.7) / total_applications. This is the number that determines how much of your LAR has machine-geocoded tract assignments versus manual. Regulators expect the machine rate to be high for a well-run institution.

County mismatch rate. count(geocoded_county != los_county) / geocoded_successfully. Target below 3%.

Divisions call cache hit rate. If you implement the coordinate cache described earlier, track the hit rate. Above 30% is typical for urban books; above 60% is common for suburban books with geographic clustering.
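All four numbers can be computed from the enriched rows plus the cache counters. A sketch, assuming each row is a dict with `lat`, `confidence`, `county_fips`, and `los_county_fips`; the function and key names are hypothetical, and the result would be handed to whatever APM client you already use:

```python
def pipeline_metrics(rows, cache_hits, cache_misses, threshold=0.7):
    """Compute the four health metrics for one geocoding run."""

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    matched = [r for r in rows if r.get("lat") is not None]
    high_conf = [r for r in matched if (r.get("confidence") or 0) >= threshold]
    mismatched = [
        r for r in matched
        if r.get("county_fips") and r.get("los_county_fips")
        and r["county_fips"] != r["los_county_fips"]
    ]
    return {
        "match_rate": ratio(len(matched), len(rows)),
        "high_confidence_rate": ratio(len(high_conf), len(rows)),
        "county_mismatch_rate": ratio(len(mismatched), len(matched)),
        "cache_hit_rate": ratio(cache_hits, cache_hits + cache_misses),
    }
```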

For a full treatment of geocoding pipeline observability, see Observability for Geocoding Pipelines.

Cost modelling for a real BFSI workload

A community bank with 8,000 HMDA applications per year:

| Operation | Calls | Notes |
|---|---|---|
| Batch geocoding (100/call) | 80 | 8,000 addresses ÷ 100 |
| Divisions lookups | ~8,000 | 1 per geocoded address |
| Re-runs and QA spot-checks | ~500 | estimated 6% overhead |
| Total | ~8,580 | |

At the paid entry tier ($54/month for 100,000 calls), a single month covers the entire annual HMDA geocoding run with 91% of the monthly allowance left over for other workloads. The free tier (3,000 calls/day) covers a run of this size in three days with no credit card.

A regional bank with 80,000 applications: ~85,000 total calls, which still fits within a single month of the 100,000-call entry tier. The per-application cost for geocoding and tract lookup at this scale is well under $0.001.
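The same arithmetic can be parameterised for other book sizes. This sketch uses the figures quoted in this post (100 addresses per batch call, 6% overhead, 3,000 free calls per day, $54 for 100,000 calls) and assumes one Divisions call per application with no caching and one month's fee spread across the whole book; the function names are hypothetical:

```python
import math


def annual_calls(applications, batch_size=100, overhead=0.06):
    """Estimate API calls for one annual run, before any Divisions caching."""
    geocode = math.ceil(applications / batch_size)
    divisions = applications
    reruns = round((geocode + divisions) * overhead)
    return {
        "geocode_calls": geocode,
        "divisions_calls": divisions,
        "rerun_calls": reruns,
        "total": geocode + divisions + reruns,
    }


def free_tier_days(total_calls, calls_per_day=3_000):
    """Days needed to complete the run on the free tier."""
    return math.ceil(total_calls / calls_per_day)


def cost_per_application(applications, monthly_fee=54.0, included_calls=100_000):
    """One month's fee spread over the book; only valid if the run fits the allowance."""
    total = annual_calls(applications)["total"]
    if total > included_calls:
        raise ValueError("run exceeds the plan's monthly allowance")
    return monthly_fee / applications
```

For the 8,000-application bank the exact 6% figure comes out slightly below the rounded ~500 in the table, which does not change the conclusion.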

Pricing details are at csv2geo.com/pricing/api.

What the pipeline does not replace

Two things that sit outside the scope of a geocoding API and belong in your GIS or compliance tooling:

Official assessment area definition. CRA assessment areas are defined by the institution under regulatory guidelines, not by a geocoding API. The geocoder tells you where each loan is; your compliance team decides which counties constitute the assessment area. Do not conflate the two.

HMDA edits and validation. The CFPB's HMDA Platform runs edit checks on your LAR. Passing a census tract GEOID through the API does not guarantee it will pass HMDA edits: a tract whose boundary was redrawn between your data vintage and the filing period may return an error. Always run the CFPB's edit-check tool against your final LAR before submission.

Frequently Asked Questions

What geocoding method code should I report in my HMDA LAR for API-geocoded addresses?

HMDA method codes vary by filing year and CFPB guidance — check the current HMDA Filing Instructions Guide. Typically, an address geocoded to a point match by a commercial API qualifies as "geocoded by street address." ZIP-centroid matches (low-confidence results) qualify for a different method code. Structure your confidence threshold so you can report the method code accurately per row.

Does the Divisions endpoint return the 2020 census tract vintage?

The tract vintage in the response reflects the current TIGER boundary set. HMDA filings require the tract vintage specified in the current FFIEC census product — confirm the vintage year in the HMDA Filing Instructions Guide for your filing year and compare it against the tract GEOIDs returned by the API. For most filing years the vintages align; for years following a decennial census, confirm before filing.

What happens when an address is in a new development that is not yet in the geocoder's index?

The geocoder returns a low-confidence match or no match. Route it to manual review. For new-construction loan originations, the lot address is often geocodable even when the building's unit addresses are not — try the lot/parcel address if the unit address fails.

Can I use the API for fair-lending analysis, not just CRA?

The geocoded coordinates and census tract GEOIDs are the same fields used in HMDA fair-lending analysis. Once you have tract assignments for every application, you can join against FFIEC census demographic data (income, minority population) to run the disparity analyses your fair-lending counsel requires. The geocoding API is agnostic to the downstream use.

How does batch geocoding handle duplicate addresses in the input?

The API processes all rows you send, including duplicates. If you have multiple applications at the same address (a multi-unit building), each row gets its own geocode response. Apply your coordinate-cache on the Divisions calls downstream to avoid billing for the same point-in-polygon lookup multiple times.

Is there an audit trail for regulatory purposes?

The API itself does not maintain a per-call audit log on your behalf. Your pipeline should archive the raw JSON responses — geocode batch responses and Divisions responses — keyed by application ID and filing year in your own document management system. That is the evidence record for an examiner. The API key access log in your dashboard shows call counts and timestamps, which is useful for demonstrating that geocoding was performed in a timely, systematic manner.

Can the free tier handle a pilot run before we commit to a paid plan?

Yes. The free tier allows 3,000 calls per day without a credit card. A pilot of 500 HMDA applications — geocoding plus Divisions lookups — runs in a single day within the free tier. That is enough to validate match rates, inspect confidence distributions, and confirm that county FIPS and tract GEOIDs look correct against your LOS data before committing.

Benchmarking geocoding APIs — honest numbers — how to evaluate match rate, confidence, and boundary accuracy before you rely on a geocoder for regulatory filings
Caching geocoding results — 90% cost reduction — coordinate-based caching patterns that collapse Divisions call volume on geographically clustered loan books
Exponential backoff — when to retry, when to stop — the retry policy for batch runs that keeps you below the rate limit without losing progress
Observability for geocoding pipelines — the four metrics that tell you whether your HMDA geocoding run is healthy before you file
Geocoding confidence scores explained — what the confidence field actually measures and how to set the threshold that separates machine-geocoded from manual-review rows

---

*I.A. / CSV2GEO Creator*
