HIPAA-Compliant Geocoding: no_record Patterns and BAA Considerations

Geocode patient addresses without storing PHI: no_record flag patterns, BAA scope, audit trail design.

May 03, 2026


A patient's home address, on its own, is just a string. Tied to a name, a diagnosis, a member ID, or any other identifier of an individual receiving care, it becomes Protected Health Information under HIPAA. Geocoding that string — turning it into a latitude and longitude — passes it through a third-party processor. That is the part most engineering teams underweight, and it is the part compliance teams notice first.

This post is a practical implementation guide for engineers shipping geocoding inside healthcare systems. It is not legal advice. It will not replace a conversation with your privacy officer. What it will do is show you the technical patterns that make that conversation short: the no_record flag, the boundary of what a Business Associate Agreement covers, the audit-trail shape your auditor wants to see, and the cache design that does not turn your Redis cluster into a PHI repository.

I will be honest about what CSV2GEO does and does not offer here. We support a no_record flag on the geocoding API. We do not currently publish a signed BAA. If a BAA is a hard requirement for your workflow today, the patterns below still apply — route through a vendor that signs one — and we will note when that branch is the right call.

What HIPAA actually says about geocoding

The HIPAA Privacy Rule's de-identification standard (45 CFR 164.514(b)(2)) lists eighteen identifiers that must be removed before health information can be considered de-identified under the Safe Harbor method. Two of them matter directly for geocoding: geographic subdivisions, which cover the address itself, and dates, which tend to travel with it.

| # | Identifier | Geocoding relevance |
|---|---|---|
| (B) | All geographic subdivisions smaller than a state | Street, city, county, ZIP code, and equivalent geocodes (lat/lng) all qualify |
| (C) | Dates (other than year) directly related to an individual | Often co-shipped with addresses in claims data |

The rule grants one exception for ZIP codes: the first three digits may be retained if the geographic unit formed by combining all ZIP codes with those three digits contains more than 20,000 people according to current Census data; otherwise the prefix must be replaced with 000. Full five-digit ZIPs, ZIP+4, and any sub-ZIP geometry, including the latitude/longitude pair you get back from a geocoder, are PHI when bound to an individual receiving care.

That last clause is the one that catches teams. The coordinates you write to your database after geocoding a patient's address are themselves PHI, because they identify a geographic subdivision smaller than a state with rooftop precision. Treating the input as PHI but the output as "just numbers" is a common mistake.
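The three-digit ZIP rule is mechanical enough to encode. A minimal Python sketch, assuming you maintain the set of restricted prefixes yourself from current Census data (the function name and any prefix in an example are placeholders, not a real list). It handles only the ZIP field; as the paragraph above says, the coordinates remain PHI and have to be dropped separately.

```python
from typing import AbstractSet


def safe_harbor_zip(zip_code: str, restricted_prefixes: AbstractSet[str]) -> str:
    """Reduce a US ZIP or ZIP+4 to the form Safe Harbor allows.

    restricted_prefixes is an assumption of this sketch: the three-digit
    prefixes whose combined population is 20,000 or fewer per current
    Census data. Building and refreshing that set is up to you.
    """
    digits = "".join(ch for ch in zip_code if ch.isdigit())
    if len(digits) not in (5, 9):
        raise ValueError("expected a 5-digit ZIP or a ZIP+4")
    prefix = digits[:3]
    # Low-population prefixes must become 000; all others keep three digits.
    return "000" if prefix in restricted_prefixes else prefix
```

For example, safe_harbor_zip("02118-1234", restricted) returns "021" unless "021" is in your restricted set, in which case it returns "000".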

The Privacy Rule also allows the Expert Determination method (45 CFR 164.514(b)(1)), where a qualified statistician certifies that re-identification risk is very small. Some research datasets use this. For most operational systems — a clinic routing patients, a payer mapping member density, a pharmacy network analysis — Safe Harbor is the working assumption, and Safe Harbor says the address and its geocode are PHI.

The no_record flag

CSV2GEO's API accepts a no_record=true parameter on both /v1/geocode and /v1/reverse. When present, the query string (the address, or the latitude/longitude pair) is not stored in the request access log. The geocoder still processes the request and returns the result; it just does not persist the input alongside the timing and status metadata it normally records for billing and observability.

The response, when no_record=true is set, includes a confirmation field: _hipaa_compliant: true. That is your auditable signal that the flag was honored. Capture it in your client-side log (alongside a hash of the address, never the address itself) and you have evidence that the flag was active for that request.

# Forward, single
curl -G "https://csv2geo.com/api/v1/geocode" \
  --data-urlencode "q=123 Main Street, Boston, MA 02118" \
  --data-urlencode "country=US" \
  --data-urlencode "no_record=true" \
  -H "Authorization: Bearer $CSV2GEO_KEY"

# Response (truncated)
{
  "query": "123 Main Street, Boston, MA 02118",
  "results": [{ "location": { "lat": 42.3357, "lng": -71.0723 }, "accuracy": "houseNumber", "accuracy_score": 0.97 }],
  "_hipaa_compliant": true
}

// Node.js
async function geocodePHI(address) {
  const url = new URL('https://csv2geo.com/api/v1/geocode');
  url.searchParams.set('q', address);
  url.searchParams.set('country', 'US');
  url.searchParams.set('no_record', 'true');

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.CSV2GEO_KEY}` },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();

  if (data._hipaa_compliant !== true) {
    throw new Error('no_record flag was not honored — refusing to return result');
  }
  return data.results[0]?.location ?? null;
}

# Python
import os
import requests

def geocode_phi(address: str) -> dict | None:
    r = requests.get(
        "https://csv2geo.com/api/v1/geocode",
        params={"q": address, "country": "US", "no_record": "true"},
        headers={"Authorization": f"Bearer {os.environ['CSV2GEO_KEY']}"},
        timeout=10,
    )
    r.raise_for_status()
    data = r.json()
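    if data.get("_hipaa_compliant") is not True:
        # Explicit check rather than assert, which is stripped when Python runs with -O
        raise RuntimeError("no_record flag was not honored; refusing to return result")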
    results = data.get("results") or []
    return results[0]["location"] if results else None

The flag is a technical control, not a legal one. It removes the input from the vendor's access log; it does not by itself make a vendor a Business Associate, and it does not remove PHI handling responsibilities from your side. Read the next section before you ship this.

What a BAA covers (and what it doesn't)

A Business Associate Agreement is a contract required under 45 CFR 164.504(e) when a covered entity (or another business associate) discloses PHI to a vendor that creates, receives, maintains, or transmits PHI on its behalf. The contract binds the vendor to the relevant safeguards of the HIPAA Security Rule (45 CFR 164.308 and 164.312), to breach reporting under HITECH (no later than 60 calendar days after discovery, and many BAAs set a shorter window), and to flow-down terms for any subcontractors that also touch the data.

A BAA is not a stamp that makes a vendor "HIPAA-compliant" in the abstract. There is no such status. A vendor either has a contract that allocates PHI-handling responsibilities or it does not. Within a signed BAA, both parties still have to actually implement the controls — encryption in transit, encryption at rest where applicable, access logging, breach response procedures, employee training. The contract is the legal instrument; the controls are the engineering.

A common misconception: "this vendor signs BAAs, therefore I can send anything to it." Wrong. The BAA scopes what data the vendor processes and how. If a BAA covers your transactional flow but you also pump PHI into a different vendor endpoint that the BAA does not list, that endpoint is out of scope and you have a disclosure problem.
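One way to keep a BAA's scope from eroding in code is an explicit allowlist of the endpoints the agreement names, checked before any PHI-bearing request goes out. A sketch in Python, assuming hypothetical hostnames and paths (replace them with whatever your signed BAA actually lists); urlsplit is the standard-library URL parser.

```python
from urllib.parse import urlsplit

# Hypothetical: the (host, path) pairs your signed BAA covers. Placeholders only.
BAA_SCOPED_ENDPOINTS = frozenset({
    ("geocoder.baa-vendor.example", "/v1/geocode"),
    ("geocoder.baa-vendor.example", "/v1/reverse"),
})


class OutOfBaaScope(Exception):
    """Raised when PHI would be sent to an endpoint no BAA covers."""


def assert_in_baa_scope(url: str) -> None:
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise OutOfBaaScope("refusing to send PHI over a non-TLS scheme")
    # Compare host and path only. The query string carries the address,
    # so it is deliberately kept out of the error message.
    if (parts.hostname, parts.path) not in BAA_SCOPED_ENDPOINTS:
        raise OutOfBaaScope(f"{parts.hostname}{parts.path} is not covered by a BAA")
```

Call it at the single choke point where outbound PHI requests are built, so a new endpoint cannot be wired in without someone updating the list, and the contract review that should accompany it.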

Honest disclosure: CSV2GEO does not currently publish a signed BAA. The no_record flag and HIPAA-aware response shape are technical building blocks, not contractual ones. If your privacy officer requires a BAA before any patient address leaves your network (and most do, for direct-care workflows), you have three options:

Route patient geocoding through a vendor that signs a BAA (Google Cloud Healthcare Geocoding, AWS Location Service via a BAA-covered AWS account, Esri ArcGIS with a healthcare add-on, Smarty for address validation with their HIPAA tier).
Run a self-hosted geocoder for PHI flows (Pelias, Nominatim, Photon over your own data). Slower, less coverage, but no vendor in the loop.
De-identify the address before sending it out: strip the unit number and house number, geocode to street centroid only, and decide whether the residual coordinate is acceptably non-identifying for your population. This is not Safe Harbor de-identification (a street name is still a geographic subdivision smaller than a state); it is a risk judgment closer to Expert Determination, and your privacy officer needs to sign off.
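A rough sketch of the third option's address surgery in Python. The regexes are assumptions tuned to simple US-style lines (house number first, unit designator trailing line 1, line 2 discarded) and will miss plenty of real-world formats. The output still names a street, so the privacy-officer sign-off still applies.

```python
import re

# Assumption: unit designators (Apt/Apartment/Unit/Suite/Ste/#) trail line 1.
# The leading [\s,]+ stops "ste" from matching inside words like "Westchester".
_UNIT_SUFFIX = re.compile(
    r"[\s,]+(?:(?:apt|apartment|unit|suite|ste)\b\.?|#)\s*\S+\s*$",
    re.IGNORECASE,
)
# Assumption: a leading house number such as "123", "12B", or "10-12".
_HOUSE_NUMBER = re.compile(r"^\s*\d+[A-Za-z]?(?:-\d+)?\s+")


def street_level_only(line1: str, city: str, state: str) -> str:
    """Drop unit and house number so only street, city, state are sent."""
    street = _UNIT_SUFFIX.sub("", line1)
    street = _HOUSE_NUMBER.sub("", street)
    parts = (street.strip(), city.strip(), state.strip())
    return ", ".join(p for p in parts if p)
```

For example, street_level_only("123 Main Street Apt 4B", "Boston", "MA") returns "Main Street, Boston, MA".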

CSV2GEO is appropriate for non-PHI healthcare workflows: aggregate population studies, facility-locator features for the public, market analysis on de-identified data, address validation for billing addresses where the address is not coupled with health information. It is not the right fit, today, for routing identified patients in a clinic system.

The PHI minimization principle

The Privacy Rule's "minimum necessary" standard (45 CFR 164.502(b)) requires you to disclose only the PHI needed for the task. A geocoder needs an address. It does not need a name, a date of birth, a member ID, a diagnosis, an account number, or a phone number. The cleanest implementation strips everything except the address before the request leaves your network.

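// scrubPHI.js
// Denylist kept for reference (e.g. a test asserting none of these keys reach an
// outbound request). scrubForGeocoding below is a whitelist and never reads it.
export const PHI_FIELDS = [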
  'name', 'first_name', 'last_name', 'middle_name', 'patient_name',
  'dob', 'date_of_birth', 'birthdate',
  'mrn', 'member_id', 'account_number', 'ssn', 'tax_id',
  'phone', 'email', 'fax',
  'diagnosis', 'icd10', 'cpt', 'note', 'comment',
];

export function scrubForGeocoding(record) {
  // Whitelist: only address parts are forwarded
  return {
    line1: record.address_line_1 ?? '',
    line2: record.address_line_2 ?? '',
    city: record.city ?? '',
    state: record.state ?? '',
    postal_code: record.postal_code ?? '',
    country: record.country ?? 'US',
  };
}

export function assemble(scrubbed) {
  return [scrubbed.line1, scrubbed.line2, scrubbed.city, scrubbed.state, scrubbed.postal_code]
    .filter(Boolean)
    .join(', ');
}

A whitelist is safer than a denylist. If a new field appears in your record schema next quarter — caregiver_phone, say — a denylist will leak it. A whitelist will not.
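The same whitelist in Python, for pipelines that are not on Node. A sketch assuming the record uses the same field names as scrubForGeocoding; the output keys mirror the JavaScript version.

```python
from typing import Any, Dict, Mapping

# Output key -> source record key. Nothing outside this map is ever copied.
_WHITELIST = {
    "line1": "address_line_1",
    "line2": "address_line_2",
    "city": "city",
    "state": "state",
    "postal_code": "postal_code",
}


def _text(value: Any, default: str = "") -> str:
    # Mirrors JavaScript's `??`: only a missing/None value falls back.
    return default if value is None else str(value)


def scrub_for_geocoding(record: Mapping[str, Any]) -> Dict[str, str]:
    scrubbed = {out: _text(record.get(src)) for out, src in _WHITELIST.items()}
    scrubbed["country"] = _text(record.get("country"), "US")
    return scrubbed
```

Passing a record that also carries name and caregiver_phone returns only line1, line2, city, state, postal_code, and country: the unanticipated field is dropped without anyone having to remember to list it.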

A safe Node.js client for HIPAA flows

Putting the scrubber, the no_record flag, hash-only logging, and error handling together:

// hipaa-geocode.mjs
import crypto from 'node:crypto';
import { scrubForGeocoding, assemble } from './scrubPHI.js';

const KEY = process.env.CSV2GEO_KEY;
if (!KEY) throw new Error('CSV2GEO_KEY missing');

// Hash addresses before they go anywhere near a logger
function addrHash(s) {
  const salt = process.env.GEOCODE_HASH_SALT;
  if (!salt) throw new Error('GEOCODE_HASH_SALT missing');
  return crypto.createHmac('sha256', salt).update(s).digest('hex').slice(0, 16);
}

export async function geocodePatientAddress(record) {
  const scrubbed = scrubForGeocoding(record);
  const address = assemble(scrubbed);
  const hash = addrHash(address);

  const url = new URL('https://csv2geo.com/api/v1/geocode');
  url.searchParams.set('q', address);
  url.searchParams.set('country', scrubbed.country);
  url.searchParams.set('no_record', 'true');

  const t0 = Date.now();
  let status = 0;
  try {
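    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${KEY}` },
      // Fail fast instead of hanging; AbortSignal.timeout is available in Node 18+
      signal: AbortSignal.timeout(10_000),
    });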
    status = res.status;
    if (!res.ok) {
      logGeocodeEvent({ addr_hash: hash, accuracy: null, status, ms: Date.now() - t0 });
      return null;
    }
    const data = await res.json();
    if (data._hipaa_compliant !== true) {
      throw new Error('no_record flag not honored');
    }
    const r = data.results?.[0] ?? null;
    logGeocodeEvent({
      addr_hash: hash,
      accuracy: r?.accuracy ?? 'no_match',
      score: r?.accuracy_score ?? 0,
      status,
      ms: Date.now() - t0,
    });
    // Return only what the caller needs — never the formatted_address
    return r ? { lat: r.location.lat, lng: r.location.lng, accuracy: r.accuracy } : null;
  } catch (err) {
    logGeocodeEvent({ addr_hash: hash, error: err.code ?? 'unknown', status, ms: Date.now() - t0 });
    throw err;
  }
}

function logGeocodeEvent(evt) {
  // No raw addresses, no formatted_address, no components
  console.log(JSON.stringify({ event: 'geocode', ...evt, ts: new Date().toISOString() }));
}

Five things that client does that an off-the-shelf example would not:

Strips PHI to an address whitelist before any network call.
Sets no_record=true unconditionally and refuses the result if _hipaa_compliant is not echoed back.
Hashes the address with a salted HMAC before any logging — your logs contain b3f9ac1d..., never 123 Main St.
Logs accuracy and timing, not address content. Auditable, not leaky.
Returns coordinates and accuracy only. The formatted_address (which sometimes echoes the input verbatim) and the parsed components are dropped.

Audit trail patterns

When a SOC 2 or HIPAA auditor asks "show me how you handle PHI in your geocoding flow," they want to see four things in your logs:

A non-reversible identifier for the request — a salted hash, not the raw address.
A timestamp in UTC, formatted as ISO 8601.
The system actor — service account or job ID, not a human user, since the geocoder is being called by automation.
The result class — accuracy bucket, status, latency. Enough to detect failures, not enough to reconstruct the input.

# log_geocode_event.py
import hashlib, hmac, json, os
from datetime import datetime, timezone

SALT = os.environ["GEOCODE_HASH_SALT"].encode()

def addr_hash(address: str) -> str:
    return hmac.new(SALT, address.encode(), hashlib.sha256).hexdigest()[:16]

def log_geocode_event(*, address: str, accuracy: str | None, score: float | None, status: int, latency_ms: int, actor: str):
    event = {
        "event": "geocode_request",
        "addr_hash": addr_hash(address),
        "accuracy": accuracy,
        "accuracy_score": round(score, 2) if score is not None else None,
        "status": status,
        "latency_ms": latency_ms,
        "actor": actor,
        "ts": datetime.now(timezone.utc).isoformat(),
    }
    # Placeholder: in production, ship this to your central log pipeline rather than local stdout
    print(json.dumps(event))

What is conspicuously absent: the raw address, the formatted_address, the components, the lat/lng. None of those should appear in an application log. The lat/lng pair belongs in your operational database, alongside the patient record it indexes — not in the audit log of the geocoder client.

If you are running observability dashboards (and you should — see observability for geocoding pipelines), aggregate by accuracy bucket, not by addr_hash. The hash is for incident forensics, not for grouping.
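A sketch of that aggregation over the JSON lines the two loggers above emit (the Node client writes event "geocode", the Python one "geocode_request"). Bucketing failed calls as error or http_<status> is a choice made for this example, not part of either logger's schema. In practice your log platform's group-by does this; the point is which field you group on.

```python
import json
from collections import Counter
from typing import Iterable


def accuracy_breakdown(log_lines: Iterable[str]) -> Counter:
    """Count geocode events per accuracy bucket. addr_hash is never read."""
    counts: Counter = Counter()
    for line in log_lines:
        try:
            evt = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines from other emitters
        if not isinstance(evt, dict) or evt.get("event") not in ("geocode", "geocode_request"):
            continue
        if evt.get("accuracy"):
            bucket = evt["accuracy"]
        elif "error" in evt:
            bucket = "error"  # exception path in the Node client
        else:
            bucket = f"http_{evt.get('status')}"  # non-2xx response, no accuracy
        counts[bucket] += 1
    return counts
```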

Caching PHI-derived geocodes

Caching is where most teams accidentally turn a transient geocoder call into a persistent PHI store. The right pattern is: cache the lat/lng output, key on a salted hash of the input, never store the address itself.

// hipaa-cache.mjs — Redis with HMAC-keyed entries
import { createClient } from 'redis';
import crypto from 'node:crypto';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

const SALT = process.env.GEOCODE_HASH_SALT;
if (!SALT) throw new Error('GEOCODE_HASH_SALT missing');

function key(address, country) {
  const h = crypto.createHmac('sha256', SALT)
    .update(`${country}|${address.trim().toLowerCase()}`)
    .digest('hex');
  return `geo:v1:${h}`;
}

export async function getCached(address, country) {
  const v = await redis.get(key(address, country));
  return v ? JSON.parse(v) : null;
}

export async function setCached(address, country, location, ttl = 60 * 60 * 24 * 30) {
  // Value: only lat/lng/accuracy. No address echo.
  await redis.set(key(address, country), JSON.stringify(location), { EX: ttl });
}

Two design choices:

Salted HMAC, not plain SHA-256. A plain hash is reversible by dictionary attack against a known address space (the US has around 160 million distinct mailing addresses, so hashing every one of them is cheap). HMAC keyed with a server-held secret (the "salt" in this post) is not, as long as that secret never leaves your infrastructure.
Value contains no address echo. The cached value is {lat, lng, accuracy}. If a logger or backup picks up the Redis dump, there is no PHI in it.
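To make the dictionary-attack point concrete: with a plain SHA-256 digest, anyone holding a list of candidate addresses recovers the input by hashing each candidate; with an HMAC keyed by a secret they do not have, the same enumeration finds nothing. A Python sketch, where the candidate list stands in for a national address file and the function names are illustrative.

```python
import hashlib
import hmac
from typing import Callable, Iterable, Optional


def plain_sha256(address: str) -> str:
    return hashlib.sha256(address.encode()).hexdigest()


def keyed_hmac(address: str, secret: bytes) -> str:
    return hmac.new(secret, address.encode(), hashlib.sha256).hexdigest()


def reverse_by_enumeration(
    target: str, candidates: Iterable[str], hash_fn: Callable[[str], str]
) -> Optional[str]:
    """Attacker's view: hash every candidate address and look for a match."""
    for candidate in candidates:
        if hash_fn(candidate) == target:
            return candidate
    return None
```

With candidates = [f"{n} Main Street, Boston, MA 02118" for n in range(1, 500)], enumerating plain_sha256 recovers "123 Main Street, Boston, MA 02118" from its digest, while the same search against keyed_hmac(address, secret) returns None. The flip side: an attacker who does obtain the secret can run the identical loop, which is why the secret belongs in your secret store and not next to the cache.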

Rotate the salt and you invalidate the cache (and the linkage between cache entries and original addresses). That property is useful for incident response.
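If you want rotation to be an operational lever rather than a one-off, one option is to embed a salt identifier in the cache key. A hypothetical layout, sketched in Python: after rotating from s1 to s2, new writes land under geo:v1:s2:*, and the old generation can be purged by prefix (for example with Redis SCAN using MATCH geo:v1:s1:* followed by DEL) instead of waiting for TTLs. The normalization follows the Node key() function.

```python
import hashlib
import hmac


def versioned_cache_key(address: str, country: str, salt_id: str, salt: bytes) -> str:
    """HMAC-keyed cache key that records which salt generation produced it.

    salt_id and the key layout are assumptions of this sketch.
    """
    # Same normalization as key() in hipaa-cache.mjs: country prefix,
    # trimmed and lower-cased address.
    normalized = f"{country}|{address.strip().lower()}"
    digest = hmac.new(salt, normalized.encode(), hashlib.sha256).hexdigest()
    return f"geo:v1:{salt_id}:{digest}"
```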

If your privacy officer's threat model includes "Redis dump leaks," consider also encrypting the cached values with a key from your KMS. A salted HMAC plus envelope-encrypted values is belt-and-suspenders, but in healthcare the second layer is often the difference between a notable event and a reportable breach.

What about reverse geocoding GPS data?

Reverse geocoding takes a coordinate pair and returns an address. In a healthcare context, GPS pings from a patient's wearable, a remote monitoring device, or a home-health worker's tablet are PHI when tied to the patient. The same no_record=true flag works for /v1/reverse:

async function reversePHI(lat, lng) {
  const url = new URL('https://csv2geo.com/api/v1/reverse');
  url.searchParams.set('lat', String(lat));
  url.searchParams.set('lng', String(lng));
  url.searchParams.set('no_record', 'true');

  const res = await fetch(url, {
    headers: { Authorization: `Bearer ${process.env.CSV2GEO_KEY}` },
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  const data = await res.json();
  if (data._hipaa_compliant !== true) throw new Error('no_record not honored');
  // Return only what is needed — distance_meters and accuracy, not the full address
  const r = data.results?.[0];
  return r ? { distance_m: r.distance_meters, accuracy: r.accuracy } : null;
}

If your application needs a coarse location ("which county?") rather than a street address, request only that bucket and discard the finer-grained components. For high-volume IoT contexts, see reverse geocoding for IoT fleets — the same throughput patterns apply, with the additional constraint that you cannot cache by raw lat/lng either; bucket coordinates into geohash prefixes if you need a cache key.
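Geohash is a public encoding that interleaves longitude and latitude bits into a base-32 string, so nearby points share a prefix and a shorter prefix means a coarser cell. A self-contained encoder sketch in Python, plus a hypothetical reverse-cache key built from it. Two caveats: every point in a cell maps to the same cached answer, so only do this when the result you keep (a county, say) is coarser than the cell; and a long geohash is itself a sub-state geocode, so give it the same HMAC treatment as an address if the key could ever be linked back to a patient.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard geohash alphabet


def geohash(lat: float, lng: float, precision: int = 6) -> str:
    """Encode a coordinate as a geohash of `precision` characters."""
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        raise ValueError("coordinate out of range")
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    chars = []
    bits, bit_count = 0, 0
    use_lng = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lng:
            mid = (lng_lo + lng_hi) / 2
            if lng >= mid:
                bits, lng_lo = (bits << 1) | 1, mid
            else:
                bits, lng_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lng = not use_lng
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def reverse_cache_key(lat: float, lng: float, precision: int = 6) -> str:
    # Hypothetical key layout; choose precision to match the granularity you keep.
    return f"rev:v1:{geohash(lat, lng, precision)}"
```

For instance, geohash(42.6, -5.6, 5) yields "ezs42"; dropping characters from the end widens the cell.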

When you actually need a BAA

The technical patterns above are necessary but not sufficient for these scenarios. In each, the geocoder is acting as a Business Associate of the covered entity, and a signed BAA is the right answer:

Direct patient routing inside a clinic information system. The address is a current patient's, the system is operated by a covered entity, the geocoder is a downstream subprocessor. BAA required.
Claims data with patient addresses. A payer sending claims through a geocoder for member geography analysis. The claim is PHI, the address is PHI, the geocoder is a Business Associate.
A geocoder embedded in a SaaS sold to covered entities, used for PHI workflows. Your customer's BAA flows down to your subprocessors. If your geocoder is one, it needs a BAA with you, and you need one with your customer.
Population health analytics where individuals can be re-identified. Even if names are stripped, granular geographic data plus dates plus diagnosis can re-identify in small cells. Treat the input as PHI and contract accordingly.

For everything else — public facility locators, marketing analyses on de-identified data, address validation for non-clinical contexts — the technical controls in this post are typically what your privacy officer is asking for.

For broader privacy regimes, the GDPR-and-geocoding companion post covers the European overlap, and the SOC 2 post covers the controls auditors expect for geocoding pipelines in general.

Frequently Asked Questions

Does setting `no_record=true` make my system HIPAA-compliant?

No. The flag is a technical control on the vendor side that removes query content from the vendor's access log. HIPAA compliance for your system depends on a portfolio of controls — administrative, physical, technical — across your entire flow. The flag is a useful brick, not the wall.

Do I still need a BAA if I use `no_record`?

If the geocoder is processing PHI on behalf of a covered entity, yes. The Privacy Rule's BAA requirement is contractual; technical no-logging does not satisfy it. CSV2GEO does not currently sign BAAs, so for direct-PHI flows you should route through a vendor that does, while keeping the technical patterns from this post.

Can I cache the geocoded coordinates?

Yes, and you should — it cuts cost and latency. Key the cache with a salted HMAC of the address (never the raw address), store only {lat, lng, accuracy} as the value, and consider envelope-encrypting the value if your threat model includes a Redis dump. Rotate the salt periodically.

Does HITECH change anything?

HITECH (2009) added breach notification requirements, expanded enforcement, and made business associates directly liable for HIPAA violations rather than only contractually liable to the covered entity. For your geocoding implementation, the practical effect is: if your geocoder vendor has a breach involving PHI, they have to notify you without unreasonable delay and no later than 60 calendar days after discovery, and you in turn have to notify affected individuals. This is one reason to keep the input footprint small (no_record, scrub before send): fewer fields exposed means a narrower notification.

How does this overlap with GDPR?

GDPR also requires data minimization (Article 5(1)(c)) and treats coordinates tied to identified persons as personal data. The no_record flag and PHI-scrubbing patterns cover much of the technical ground in both regimes. The big GDPR-specific addition is the right to erasure (Article 17), which means your cache needs a deletion path keyed by something traceable to the data subject. See the GDPR companion post for details.
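Because the cache holds only HMACs, an erasure path has to start from your system of record: look up the subject's address history there, recompute each key, and delete it. A sketch with a dict standing in for Redis (with redis-py you would pass the same keys to delete()); the function names are hypothetical. Keys are shared by everyone at the same address, so erasure may also evict a household member's entry, which for a cache only costs a re-geocode.

```python
import hashlib
import hmac
from typing import Iterable, MutableMapping


def geo_cache_key(address: str, country: str, salt: bytes) -> str:
    # Same construction as key() in hipaa-cache.mjs.
    normalized = f"{country}|{address.strip().lower()}"
    return "geo:v1:" + hmac.new(salt, normalized.encode(), hashlib.sha256).hexdigest()


def erase_subject(
    cache: MutableMapping[str, str], addresses: Iterable[str], country: str, salt: bytes
) -> int:
    """Delete cached geocodes for a subject's known addresses; return the count removed."""
    removed = 0
    for address in addresses:
        if cache.pop(geo_cache_key(address, country, salt), None) is not None:
            removed += 1
    return removed
```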

How do I prove that `no_record=true` was used during an audit?

Capture the response's _hipaa_compliant: true field in your client log alongside the salted hash of the request and the timestamp. That tuple (hash, timestamp, _hipaa_compliant: true) is the evidence. It holds up because an auditor can sample requests, recompute the hash from your test input, and verify the flag was active, and no raw address ever appears in the log.
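What the auditor's sampling step might look like, assuming your client log records the echoed flag under a field such as hipaa_compliant (a hypothetical name: neither logger shown earlier writes it, so you would add it). The hash must be recomputed over exactly the string the client hashed, which in hipaa-geocode.mjs is the assembled address with no further normalization.

```python
import hashlib
import hmac
import json


def addr_hash(address: str, salt: bytes) -> str:
    # Same construction as both loggers: HMAC-SHA256, first 16 hex characters.
    return hmac.new(salt, address.encode(), hashlib.sha256).hexdigest()[:16]


def verify_sampled_request(log_line: str, test_address: str, salt: bytes) -> bool:
    """True if the log line matches the test address and recorded the flag."""
    evt = json.loads(log_line)
    expected = addr_hash(test_address, salt)
    return (
        hmac.compare_digest(str(evt.get("addr_hash", "")), expected)
        and evt.get("hipaa_compliant") is True  # hypothetical field name
        and bool(evt.get("ts"))
    )
```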

What if the address has a unit number containing a name (e.g., "Apt: Smith Family")?

This happens. Apartment-line free text occasionally contains names or other identifiers. Two defenses: (1) regex-scrub apartment lines for likely-name patterns before sending; (2) cap apartment-line length and strip multi-word entries that include common given names. If your input source consistently produces dirty unit lines, parse them with ML-style address normalization and forward only the structured fields. When in doubt, drop the unit line — the geocoder rarely needs it for a houseNumber-level match.
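A variant of those defenses, sketched in Python: rather than hunting for names, keep the unit line only when it looks like a structured designator (Apt 4B, #12, Suite 300) whose identifier contains a digit or is a single letter, and drop everything else. The length cap and the designator list are assumptions to tune against your own data.

```python
import re
from typing import Optional

MAX_UNIT_LEN = 12  # hypothetical cap; tune against your own data

# Optional designator, then an identifier that either contains a digit
# (4B, 12, 300) or is a single letter (B). Bare words like "Smith" never match.
_STRUCTURED_UNIT = re.compile(
    r"^(?:(?:apt|apartment|unit|suite|ste|fl|floor|rm|room)\.?\s*|#\s*)?"
    r"(?:(?=[a-z0-9-]*\d)[a-z0-9-]{1,6}|[a-z])$",
    re.IGNORECASE,
)


def sanitize_unit_line(unit: Optional[str]) -> str:
    """Return a cleaned unit line, or "" when it does not look structured."""
    if not unit:
        return ""
    cleaned = " ".join(unit.split())  # collapse runs of whitespace
    if len(cleaned) > MAX_UNIT_LEN or not _STRUCTURED_UNIT.match(cleaned):
        return ""  # when in doubt, drop the unit line
    return cleaned
```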

I.A. / CSV2GEO Creator

GDPR and Geocoding: Data Minimization, Right to Erasure, and Logging
SOC 2 for Geocoding Pipelines: What Auditors Actually Look For
Geocoding for Healthcare: Patient Routing Without Leaking PHI
Observability for Geocoding Pipelines: Metrics That Actually Matter
Idempotent Geocoding: Why and How to Make Calls Safe to Retry

Ready to geocode your addresses?

Use our batch geocoding tool to convert thousands of addresses to coordinates in minutes. Start with 100 free addresses.
