HIPAA-safe patient address pipeline with no_record and Divisions

Geocode patient addresses under HIPAA without logging PHI. Learn no_record mode, Divisions for care-area mapping, and BAA-ready REST patterns.

May 30, 2026

Geocoding patient addresses sounds like a one-afternoon integration. You have addresses. The API takes addresses. It returns coordinates. Ship it.

It is not a one-afternoon integration. HIPAA's minimum-necessary standard applies to most uses and disclosures of protected health information, including what you send to a geocoding vendor. The moment a patient's home address transits your geocoding provider's infrastructure, that provider is acting as a business associate under the Privacy Rule, and unless you have a signed Business Associate Agreement (BAA) in place, you are out of compliance before the first API call returns.

Even with a BAA signed, the pipeline has three problems most teams do not think about until an internal audit finds them.

Problem one: server-side request logs. Most geocoding APIs log every inbound request, including the query string that contains the address. An address alone is not always PHI, but a patient's address in the context of a healthcare application almost certainly is. A server-side log that retains q=123+Maple+St+Springfield+IL&patient_id=89231 for thirty days at a vendor you do not control is exactly the kind of third-party data store that gets surfaced in a breach investigation.

Problem two: geocoding accuracy that cannot be audited. When the geocoder confidently places a patient at the wrong building — wrong wing of a housing complex, wrong building in a shared postcode — downstream care-area logic routes them to the wrong facility or flags them as an outlier in a population-health cohort. Confidence scores matter more in healthcare than in most industries because the consequences of a wrong coordinate are clinical, not just a misrouted delivery.

Problem three: administrative-boundary enrichment for care-area routing. A latitude/longitude pair is not enough for most healthcare operations. You need to know which health service area, which county, which census tract the patient falls in — and you need that as a structured field you can join against your facility coverage tables, not as a free-text address.

This post solves all three. We cover the no_record parameter that suppresses server-side logging of PHI, the divisions response object that returns structured administrative boundaries alongside coordinates, and the REST patterns that your security team, legal counsel, and InfoSec auditors will approve before the BAA conversation even starts.

The compliance surface you are actually managing

Before code, one clear-eyed look at what HIPAA requires of a geocoding integration. This is not legal advice — run your BAA by counsel — but these are the engineering facts your legal team will ask about.

Business Associate Agreement. If your organisation is a Covered Entity or a Business Associate of one, and patient addresses transit a geocoding vendor's infrastructure, that vendor is a downstream Business Associate. You need a signed BAA before the first production call. CSV2GEO offers a BAA; the process is documented at csv2geo.com/pricing/api. Your security team should confirm that the BAA covers the specific processing described: geocoding of home addresses for healthcare purposes.

Minimum necessary. The minimum-necessary standard requires that PHI shared with a business associate is limited to what is needed for the service. A geocoding call for patient routing needs the address. It does not need the patient's date of birth, diagnosis code, or MRN. Structure your calls so the geocoding payload is only the address string. Do not append identifiers to the query string.
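A minimal sketch of enforcing this in code, assuming your patient record stores the address in separate fields. The field names (`street`, `city`, `state`, `postal_code`) and the `address_for_geocoding` helper are hypothetical; the point is that the geocoding query is built from an explicit allowlist, so a new column on the patient record can never leak into the query string by accident.

```python
# Hypothetical field names: map these to your own demographic schema.
ADDRESS_FIELDS = ("street", "city", "state", "postal_code")


def address_for_geocoding(patient: dict) -> str:
    """Build the geocoding query string from allowlisted address fields only.

    Identifiers such as the MRN, date of birth, or diagnosis codes are never
    read here, so they cannot end up in the request.
    """
    values = {f: str(patient.get(f) or "").strip() for f in ADDRESS_FIELDS}
    # "IL 62704" style: state and postal code share one comma-separated part
    region = " ".join(v for v in (values["state"], values["postal_code"]) if v)
    parts = [p for p in (values["street"], values["city"], region) if p]
    if not parts:
        raise ValueError("patient record has no address fields")
    return ", ".join(parts)
```

Any record shape works as long as only the allowlisted keys are read; extra keys such as an MRN are simply ignored.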

Retention limits. Server-side logs that retain request payloads beyond the BAA's permitted purpose and retention period are a risk. The no_record parameter in CSV2GEO's geocoding endpoint tells the server not to write the inbound address to persistent request logs. The coordinate result is returned to your application and retained nowhere on the vendor side beyond the in-flight request. Your application is then the system of record — and your retention controls apply.

Breach notification surface. If a geocoding vendor experiences a breach and your patients' addresses were in their logs, you have a breach notification obligation. Suppressing server-side logging via no_record removes your patients' addresses from that breach surface.

What no_record does and does not do

no_record=true is a query parameter on the CSV2GEO geocoding endpoint. When present, it instructs the API to:

  • Process the inbound address and return coordinates normally
  • Not write the address string to persistent server-side request logs
  • Not store the address in any request analytics pipeline

It does not:

  • Affect in-transit encryption (all endpoints are HTTPS; no_record does not change that)
  • Suppress your own application logs (if your application logs the raw address before the API call, that is your retention problem, not the vendor's)
  • Change billing or rate-limit accounting (calls are counted for quota purposes without retaining the payload)

The parameter is a straightforward implementation of what the minimum-necessary standard asks for: the vendor processes the address, returns the result, and does not accumulate a corpus of patient addresses in its infrastructure.
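Because `no_record` is opt-in per request, one way to make sure it is never forgotten is to build every geocoding URL in one place and check it before sending. A sketch using only the standard library: the endpoint and parameter names come from the examples in this post, while `build_geocode_url` and `require_no_record` are hypothetical helpers you might call from a request hook or a unit test.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

API = "https://csv2geo.com/api/v1/geocode"


def build_geocode_url(address: str, api_key: str) -> str:
    """Build a geocoding URL that always carries no_record=true."""
    params = {"q": address, "no_record": "true", "api_key": api_key}
    return f"{API}?{urlencode(params)}"


def require_no_record(url: str) -> None:
    """Raise if a geocoding URL would be sent without no_record=true."""
    params = parse_qs(urlsplit(url).query)
    if params.get("no_record") != ["true"]:
        raise ValueError("geocoding request is missing no_record=true")
```

Centralising the URL construction also keeps identifiers out of the query string: the builder accepts an address and a key, nothing else.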

What the divisions field gives you

divisions is a structured response object returned alongside the geocoded coordinate. It contains the administrative boundaries that a point falls within — country, state, county, and sub-county divisions where available.

For healthcare operations, this matters in three specific ways.

Care-area routing. Health systems define service areas by county, by health service area (HSA), or by custom zone. If your facility coverage table joins on county FIPS code, you need that code as a clean field — not a free-text county name you have to parse and normalise. The divisions object returns county_fips and state_fips as structured fields. One JOIN against your coverage table, no fuzzy string matching.

Population-health cohort assignment. Epidemiological work and value-based care programs segment populations by census tract, by metropolitan statistical area, or by a custom geography. divisions returns the census tract GEOID and the county subdivision alongside the coordinate. A patient's record goes from address string to county_fips=17031, tract=17031842010 in one call: a join-ready key into public health datasets published at census-tract level, as long as they use the same decennial tract boundaries.
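A census tract GEOID is 11 digits: the 2-digit state FIPS code, the 3-digit county code, and a 6-digit tract code. The tract therefore always embeds the county FIPS, which gives you a cheap consistency check before using tract as a cohort key. A minimal sketch, assuming `divisions` is shaped like the example response in Step 1; both helper names are hypothetical.

```python
def split_tract_geoid(geoid: str) -> tuple[str, str, str]:
    """Split an 11-digit census tract GEOID into (state, county, tract) codes."""
    if len(geoid) != 11 or not geoid.isdigit():
        raise ValueError(f"expected an 11-digit tract GEOID, got {geoid!r}")
    return geoid[:2], geoid[2:5], geoid[5:]


def tract_matches_county(divisions: dict) -> bool:
    """True when the tract GEOID's first five digits equal county_fips.

    Returns False if either field is missing; raises ValueError on a
    malformed tract GEOID.
    """
    tract, county_fips = divisions.get("tract"), divisions.get("county_fips")
    if not tract or not county_fips:
        return False
    state, county, _ = split_tract_geoid(tract)
    return state + county == county_fips
```

A mismatch usually means something upstream mixed records or vintages; route such rows to review rather than into a cohort.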

Regulatory reporting. CMS quality reporting and some state Medicaid programs require patient counts by county. divisions makes that count a GROUP BY on a field in your patient table, not an offline geocoding exercise every time a report is due.
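For example, once the geocoding fields are persisted per patient, the county count is a single query. A sketch using Python's built-in `sqlite3` as a stand-in for your clinical datastore; the `patients` table and its columns are hypothetical.

```python
import sqlite3


def patients_per_county(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Count geocoded patients by county FIPS code, largest counties first."""
    return conn.execute(
        """
        SELECT county_fips, COUNT(*) AS patients
        FROM patients
        WHERE county_fips IS NOT NULL
        GROUP BY county_fips
        ORDER BY patients DESC, county_fips
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE patients (patient_id TEXT PRIMARY KEY, county_fips CHAR(5))")
    conn.executemany(
        "INSERT INTO patients VALUES (?, ?)",
        [("p1", "17167"), ("p2", "17167"), ("p3", "17031"), ("p4", None)],
    )
    print(patients_per_county(conn))  # [('17167', 2), ('17031', 1)]
```

Patients without a county code are excluded from the count; report them separately so they are visible rather than silently missing.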

Building the pipeline: step by step

Step 1: Geocode with no_record=true and capture the full response

The minimal curl to establish the pattern:

curl -s -G "https://csv2geo.com/api/v1/geocode" \
  --data-urlencode "q=742 Evergreen Terrace, Springfield, IL 62704" \
  --data-urlencode "no_record=true" \
  --data-urlencode "api_key=$CSV2GEO_API_KEY"

The response shape:

{
  "meta": { "count": 1 },
  "results": [
    {
      "formatted": "742 Evergreen Terrace, Springfield, IL 62704",
      "lat": 39.7817,
      "lng": -89.6501,
      "confidence": 0.91,
      "divisions": {
        "country_code": "US",
        "state": "Illinois",
        "state_code": "IL",
        "county": "Sangamon County",
        "county_fips": "17167",
        "county_subdivision": "Springfield township",
        "tract": "17167842010",
        "postal_code": "62704"
      }
    }
  ]
}

The divisions object is the structured boundary data. The confidence score is what you use to decide whether this coordinate is good enough to route on — a score below 0.7 typically means the geocoder matched at postcode centroid level, not at address level, and routing on it is a clinical risk.

In Python with requests:

import os
import requests

API = "https://csv2geo.com/api/v1/geocode"
KEY = os.environ["CSV2GEO_API_KEY"]

def geocode_patient_address(address: str) -> dict | None:
    """
    Geocode a single patient address with PHI-suppressed server logs.
    Returns the first result dict or None if no match.
    """
    r = requests.get(
        API,
        params={
            "q": address,
            "no_record": "true",
            "api_key": KEY,
        },
        timeout=15,
    )
    r.raise_for_status()
    results = r.json().get("results", [])
    if not results:
        return None
    return results[0]

In Node with fetch:

const API = 'https://csv2geo.com/api/v1/geocode';
const KEY = process.env.CSV2GEO_API_KEY;

async function geocodePatientAddress(address) {
  const params = new URLSearchParams({
    q: address,
    no_record: 'true',
    api_key: KEY,
  });
  const r = await fetch(`${API}?${params}`);
  if (!r.ok) throw new Error(`http ${r.status}`);
  const body = await r.json();
  return body.results?.[0] ?? null;
}

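Neither example logs the address string, and that is intentional. In your application logs, record the patient identifier and the outcome (match or no match, confidence), not the raw address or the returned coordinate: a rooftop coordinate identifies the patient's home as precisely as the address does.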

Step 2: Apply a confidence gate before routing

A geocoding result with confidence 0.45 typically reflects a match at the postcode centroid, possibly a kilometre or more from the patient's actual home. For a non-healthcare application that is a nuisance. For a care-area routing decision it is a clinical risk: the patient might be assigned to a facility whose service-area boundary falls between the centroid and their actual home.

CONFIDENCE_THRESHOLD = 0.70  # minimum to route; tune to your clinical risk tolerance

def process_patient_geocode(patient_id: str, address: str, coverage_table: dict):
    result = geocode_patient_address(address)

    if result is None:
        return {"patient_id": patient_id, "status": "no_match", "action": "manual_review"}

    confidence = result.get("confidence", 0.0)
    if confidence < CONFIDENCE_THRESHOLD:
        return {
            "patient_id": patient_id,
            "status": "low_confidence",
            "confidence": confidence,
            "action": "manual_review",
        }

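    county_fips = (result.get("divisions") or {}).get("county_fips")
    if county_fips is None:
        # A confident match without a county code cannot be routed; send it to a human
        return {"patient_id": patient_id, "status": "no_county", "action": "manual_review"}

    facility = coverage_table.get(county_fips, "out_of_network")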

    return {
        "patient_id": patient_id,
        "status": "routed",
        "lat": result["lat"],
        "lng": result["lng"],
        "county_fips": county_fips,
        "facility": facility,
        "confidence": confidence,
    }

The function never logs the address. The output record contains the patient identifier and the routing decision — both of which belong in your system of record — but not the PHI input that you do not want accumulating in application logs or analytics pipelines.

Step 3: Batch processing for bulk patient cohorts

Population-health programmes and care-gap analysis typically run against a cohort of thousands or tens of thousands of patients. The geocoding endpoint is a per-address call — there is no address-batch endpoint — but you can pipeline concurrent requests safely.

import concurrent.futures

import requests

def batch_geocode_patients(patients: list[dict], max_workers: int = 8) -> list[dict]:
    """
    patients: list of {"patient_id": str, "address": str}
    Returns list of result dicts, same order as input.
    Re-raises requests.HTTPError on HTTP 429 so the caller can back off and retry.
    """
    results = [None] * len(patients)

    def worker(idx, patient):
        try:
            geo = geocode_patient_address(patient["address"])
            results[idx] = {"patient_id": patient["patient_id"], "result": geo}
        except requests.HTTPError as e:
            if e.response.status_code == 429:
                # Rate limited: propagate so the caller can back off and retry the batch
                raise
            # Keep only the status code: str(e) contains the full request URL,
            # including the patient address and API key from the query string
            results[idx] = {
                "patient_id": patient["patient_id"],
                "result": None,
                "error": f"http_{e.response.status_code}",
            }

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(worker, i, p) for i, p in enumerate(patients)]
        for future in concurrent.futures.as_completed(futures):
            try:
                future.result()  # re-raises any unhandled exception, including the 429
            except Exception:
                # Cancel queued work so one 429 does not trigger a burst of further calls
                for f in futures:
                    f.cancel()
                raise

    return results

max_workers=8 is a reasonable starting point for the free tier (3,000 calls/day) and for paid tiers where the per-second rate limit is generous. Tune downward if you see 429s. The full backoff strategy is covered in Exponential Backoff — When to Retry, When to Stop.
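If you want a starting point before reading that post, here is a minimal retry wrapper with capped exponential backoff and full jitter. It is a sketch: `with_backoff` and its parameters are hypothetical names, and deciding which exceptions are retryable is left to the caller so the wrapper has no dependency on `requests`.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable failures with capped, jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception as exc:
            # Give up on non-retryable errors or once the attempt budget is spent
            if not is_retryable(exc) or attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2**attempt, capped at max_delay
            sleep(random.uniform(0, min(max_delay, base_delay * 2 ** attempt)))
    raise RuntimeError("max_attempts must be at least 1")
```

Wrapping the batch would look like `with_backoff(lambda: batch_geocode_patients(patients), is_retryable=lambda e: isinstance(e, requests.HTTPError) and e.response.status_code == 429)`. Retrying the whole batch repeats calls that already succeeded, so on large cohorts you may prefer to wrap each `geocode_patient_address` call instead.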

Do not log the address in the worker. Log the patient identifier, the HTTP status, and the timing — that is enough for observability without creating a PHI audit surface. The metrics pattern for this is covered in detail in Observability for Geocoding Pipelines — Metrics That Matter.

Step 4: Join divisions data to your coverage table

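The county_fips field in divisions is a five-digit FIPS code (two-digit state code plus three-digit county code) that joins directly to census boundary datasets and to any CMS, state Medicaid, or internal coverage table keyed on county FIPS. Some CMS files key counties on SSA state/county codes instead; those need a FIPS-to-SSA crosswalk before the join. Either way, the join is a lookup, not a fuzzy match.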

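import csv

def load_coverage_table(path: str) -> dict:
    """
    CSV with columns: county_fips, facility_id, facility_name
    Returns dict keyed on county_fips.
    """
    table = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Restore leading zeros a spreadsheet may have stripped ("1001" -> "01001")
            fips = row["county_fips"].strip().zfill(5)
            table[fips] = {
                "facility_id": row["facility_id"],
                "facility_name": row["facility_name"],
            }
    return table

# Usage (synthetic test address, not patient data):
coverage = load_coverage_table("coverage.csv")
geocode_result = geocode_patient_address("742 Evergreen Terrace, Springfield, IL 62704")
fips = ((geocode_result or {}).get("divisions") or {}).get("county_fips")
facility = coverage.get(fips)  # None if out of network or no county match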

This is the entire care-area routing layer. The geocoder produced a structured key; your coverage table is indexed on that key; the JOIN is O(1). The alternative — parsing a free-text county name out of a formatted address string and fuzzy-matching it to your coverage table — is a maintenance liability and a source of silent routing errors.

If your coverage table uses census tract instead of county (common in value-based care programmes with narrower network definitions), the tract field in divisions gives you the full eleven-digit GEOID directly.

Step 5: Persist the right fields and nothing else

What belongs in your patient datastore after geocoding:

| Field | Type | Reason to keep |
|---|---|---|
| lat, lng | FLOAT | Required for map rendering, distance calculations, catchment-area analysis |
| county_fips | CHAR(5) | Care-area routing JOIN key |
| tract | CHAR(11) | Population-health cohort key |
| confidence | FLOAT | Audit trail for routing decisions; low-confidence rows trigger re-review |
| geocoded_at | TIMESTAMP | Staleness check; re-geocode if address changes or after 12 months |

What does not belong in your geocoding result record:

  • The raw address string (it is already in your patient demographic table; storing it again widens the PHI surface)
  • Any field from the geocoder that you do not join against or display
  • The full API response JSON as a blob (a schema-less blob that contains PHI is the worst of both worlds for audits)

Write a migration that adds exactly those six columns (lat and lng are separate columns) to your patient table. The geocoding pipeline writes exactly those six fields. Everything else in the API response is discarded after the routing decision is made.
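A sketch of that migration and the matching write, using Python's built-in `sqlite3` as a stand-in for your clinical database. The `patients` table, its `patient_id` column, and the `migrate` and `write_geocode` helpers are hypothetical, and `result` is assumed to be one element of `results` in the response shape from Step 1. This sketch stores `geocoded_at` as an ISO-8601 UTC string; convert it back to a timezone-aware `datetime` before comparing it with the current time.

```python
import sqlite3
from datetime import datetime, timezone

# The geocoding columns from the table above; lat and lng are separate columns.
GEOCODE_COLUMNS = (
    ("lat", "FLOAT"),
    ("lng", "FLOAT"),
    ("county_fips", "CHAR(5)"),  # text affinity, so leading zeros ("01001") survive
    ("tract", "CHAR(11)"),
    ("confidence", "FLOAT"),
    ("geocoded_at", "TIMESTAMP"),
)


def migrate(conn: sqlite3.Connection) -> None:
    """Add the geocoding columns to the hypothetical patients table."""
    for name, sql_type in GEOCODE_COLUMNS:
        conn.execute(f"ALTER TABLE patients ADD COLUMN {name} {sql_type}")


def write_geocode(conn: sqlite3.Connection, patient_id: str, result: dict) -> None:
    """Persist only the six geocoding fields; the address and full response are dropped."""
    divisions = result.get("divisions") or {}
    conn.execute(
        "UPDATE patients SET lat = ?, lng = ?, county_fips = ?, tract = ?, "
        "confidence = ?, geocoded_at = ? WHERE patient_id = ?",
        (
            result["lat"],
            result["lng"],
            divisions.get("county_fips"),
            divisions.get("tract"),
            result.get("confidence"),
            datetime.now(timezone.utc).isoformat(),
            patient_id,
        ),
    )
```

Because the UPDATE names its columns explicitly, new fields appearing in the API response can never be persisted without a code change.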

Caching patient geocodes safely

Patient addresses change — people move. But within a care episode or a population-health cohort run, the coordinate is stable. Caching the geocoding result in your application (not at the HTTP layer, since no_record calls should not be cached by a shared CDN that might log request URLs) saves both money and latency.

A safe pattern: cache the geocoding result in your patient table itself. When an address arrives, check whether the patient already has a non-null lat, a geocoded_at within the last twelve months, and no address change since that geocode. If so, skip the API call. If not, call the API and write the six fields.

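from datetime import datetime, timezone, timedelta

GEOCODE_TTL_DAYS = 365

def needs_geocoding(patient_row: dict) -> bool:
    if patient_row.get("lat") is None:
        return True
    geocoded_at = patient_row.get("geocoded_at")  # must be a timezone-aware datetime
    if geocoded_at is None:
        return True
    # Re-geocode after a move; address_updated_at is a hypothetical,
    # timezone-aware column maintained alongside the demographic address
    address_updated_at = patient_row.get("address_updated_at")
    if address_updated_at is not None and address_updated_at > geocoded_at:
        return True
    age = datetime.now(timezone.utc) - geocoded_at
    return age > timedelta(days=GEOCODE_TTL_DAYS)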

A cohort of 50,000 patients with 85% retention between runs requires roughly 7,500 live API calls, not 50,000. At the paid entry tier — $54/month for 100,000 calls — the monthly geocoding cost for a mid-sized population-health programme is well within the base subscription. See Caching Geocoding Results — 90% Cost Reduction for the broader argument.

What no_record does not fix

Honest scope. no_record removes your patients' addresses from vendor-side persistent logs. It does not solve every compliance problem in the pipeline.

Your own logs. If your application framework logs every inbound HTTP request body, and patient addresses arrive in a POST body or a request parameter, your logs contain PHI regardless of what the vendor does. Audit your logging configuration before the security review, not during it. At minimum, mask or exclude the address field in your request-logging middleware.
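As one example of that masking, here is a sketch of a standard-library `logging.Filter` that redacts sensitive values before a record is written. The key names are assumptions: `q` and `api_key` match the geocoding calls in this post, while `address` stands in for whatever your own form or query field is called. It only catches URL-encoded `key=value` pairs, so JSON bodies need their own masking.

```python
import logging
import re

# Query-string or form keys whose values must never reach the logs.
# "address" is a hypothetical field name; adjust to your own request schema.
SENSITIVE_KEYS = ("q", "address", "api_key")

_KEYS = "|".join(re.escape(k) for k in SENSITIVE_KEYS)
_PATTERN = re.compile(rf"(\b(?:{_KEYS})=)[^&\s]*")


class RedactPHIFilter(logging.Filter):
    """Replace the values of sensitive key=value pairs with [REDACTED]."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as %-style args are covered too
        record.msg = _PATTERN.sub(r"\1[REDACTED]", record.getMessage())
        record.args = None
        return True  # never drop the record, only rewrite it
```

Attach the filter to your handlers rather than to a single logger: a logger's own filters do not run for records propagated from its child loggers. Tracebacks attached with `exc_info` are formatted after filters run, so they are not redacted by this filter.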

Third-party analytics in your front end. If your application has a patient-facing address entry form, and a third-party analytics tag fires on form submission, the address may transit a third-party collector that has no HIPAA posture at all. This is out of scope for a geocoding integration but is the single most common finding in healthcare application security reviews.

Compliance is not binary. A BAA and no_record together significantly reduce the PHI exposure at the geocoding vendor. They are necessary but not sufficient. Your InfoSec team's threat model and your legal counsel's reading of the BAA scope are the authoritative sources — this post describes the engineering controls available, not a legal compliance framework.

A note on the Divisions field outside the US

CSV2GEO covers 461M+ addresses across 39 countries. The divisions object is populated for all covered countries, but the subfields vary by jurisdiction. In the United States you get county_fips and tract. In Canada you get province and census division codes. In European jurisdictions you get NUTS codes and local administrative unit identifiers where they are standardised.

For healthcare organisations operating in a single country, this is transparent — the fields you join against are always the same. For organisations with cross-border patient populations, check the response shape for your specific country coverage before building the JOIN logic. The API reference at csv2geo.com/api documents the per-country field availability.
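One way to make that check explicit is to configure the join key per country and refuse to route when it is missing, rather than letting a `None` key fall through to an out-of-network bucket. A sketch: only the US entry (`county_fips`) comes from this post; the `JOIN_KEYS` mapping, the `join_key` helper, and anything you add for other countries are assumptions to fill in from the per-country field list in the API reference.

```python
# Join key per country. Only "US": "county_fips" is documented in this post;
# add other countries only after checking the API reference for their fields.
JOIN_KEYS = {"US": "county_fips"}


def join_key(divisions: dict) -> str:
    """Return the coverage-table join key for a divisions object.

    Raises KeyError when the country is not configured or the field is absent,
    so the caller can send the patient to manual review instead of guessing.
    """
    country = divisions.get("country_code")
    field = JOIN_KEYS.get(country)
    if field is None:
        raise KeyError(f"no join key configured for country {country!r}")
    value = divisions.get(field)
    if not value:
        raise KeyError(f"{field} missing for country {country!r}")
    return value
```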

Frequently Asked Questions

Does CSV2GEO sign a BAA for healthcare customers? Yes. The BAA process is initiated through the pricing and account pages. Confirm with your legal counsel that the scope of the BAA covers geocoding of patient home addresses for the specific purpose you are implementing — routing, population health, or clinical operations.

Does `no_record=true` affect billing or rate-limit counting? No. Calls with no_record=true are counted against your quota and billed exactly as normal calls. The parameter suppresses persistent payload logging only — it has no effect on accounting.

What confidence threshold should we use for clinical routing? 0.70 is a defensible starting point. Below that, the geocoder is typically matching at postcode centroid level, not at address level, which introduces geographic uncertainty of 500 m or more — sufficient to cross a care-area boundary. Calibrate against your specific patient population's address quality. Rural populations with non-standard addressing may require a lower threshold plus a manual-review workflow for low-confidence results.

Can we cache `no_record` responses in a shared CDN layer? No — and you should not want to. A CDN cache keyed on request URL would log the patient's address in the CDN's access logs, defeating the purpose. Cache geocoding results in your application database, keyed on patient ID, with a staleness check on geocoded_at. That cache is under your HIPAA controls.

The `divisions` object is missing `tract` for some addresses. What does that mean? Census-tract assignment requires a precise coordinate match to a boundary polygon. If the geocoder matched at postcode centroid level (low confidence), the centroid may fall in a different tract than the patient's actual home, and the field may be omitted or flagged as approximate. This is another reason to gate tract-dependent logic on a confidence threshold above 0.70.

Is the free tier appropriate for a HIPAA pilot? The free tier (3,000 calls/day) is sufficient for a technical pilot with synthetic or de-identified test addresses. Before moving real patient data into the integration — even in a staging environment — have the BAA signed and confirmed. Your BAA applies to the account, not to the pricing tier, so upgrading from free to paid does not require a new agreement.

How do we handle patients whose addresses fail geocoding entirely? A null result or a confidence below threshold should route to a manual-review queue, not silently to an out-of-network bucket. Log the patient ID and the failure reason (no_match vs low_confidence) to your operational dashboard so the care coordination team can follow up. Never route a patient based on a zero-confidence fallback coordinate.


---

*I.A. / CSV2GEO Creator*
