Mass appraisal geocoding: placing every parcel correctly
Batch geocode a county parcel roll in hours, not weeks. REST patterns, division lookups, and failure-mode handling for property assessors.
A county assessor's office runs a mass appraisal cycle once a year — sometimes more often if the local tax authority is under political pressure to revalue quickly. The input is the parcel roll: hundreds of thousands of rows, each with a situs address, an APN, a legal description, and an assessed value. The output is that same roll with verified coordinates, administrative division labels (county, municipality, tax district, school district), and a confidence score that tells the appraisal analyst which addresses need a human look before the valuation model runs.
Getting the coordinates wrong has concrete consequences. A parcel placed two hundred metres east of where it actually sits might land in a different school district. A different school district means a different mill rate. A different mill rate means a different tax bill. When that discrepancy turns up at an appeal hearing — and it will — the assessor's office defends it. The defensibility of the geocoding is as important as the geocoding itself.
This post walks through the full pipeline: batching the parcel roll, calling the divisions endpoint, handling the failure modes that most assessor shops hit on their first pass, and building output that survives audit. It is a working engineering post, not a product tour. Code is curl, Python, and Node. Every number quoted is verifiable.
Why assessors need geocoding, not just address validation
The distinction matters. Address validation tells you whether a string is a plausible postal address. Geocoding tells you where on the earth's surface that address is. For mass appraisal you need both, in sequence, and the second step is the harder one.
Valuation models are spatial. Comparable sales are found by proximity. Neighbourhood boundaries are polygons. School-district mill rates are applied by which side of a line the parcel falls on. None of that works without a coordinate. An address string "1842 Elm St" is not enough; you need 39.74123, −104.98765 before any spatial join can run.
Administrative divisions are not in the address. A situs address tells you the street. It does not tell you whether the parcel is inside the city limits or the unincorporated county, which school district boundary it falls in, or which special assessment district applies. Those labels come from a reverse-division lookup against the coordinate — and they are the labels that determine which tax rate applies.
The parcel roll has dirty data. Address fields in county databases accumulate years of inconsistent data entry: abbreviated street types, missing unit designators, transposed numbers, directional prefixes dropped by one clerk and retained by another. A batch geocoder with a fuzzy matcher converts most of these to usable coordinates; a strict validator rejects them. You need the geocoder, then you flag the low-confidence rows for manual review rather than rejecting them outright.
The two endpoints you need
`POST /api/v1/geocode/batch` — takes a JSON array of address objects, returns geocoded results including coordinates, confidence score, and normalised address. The batch size limit is documented in the API reference; a common production pattern is 100 to 500 addresses per request depending on address complexity and your timeout budget. Confidence scores run from 0.0 to 1.0; below 0.7 is the threshold where manual review is warranted for a legal document like a tax roll.
`GET /api/v1/divisions` — takes a lat/lng and returns the administrative divisions that contain that point: country, state/province, county, municipality, and sub-municipal units where available. For US addresses this covers county, incorporated place, census-designated place, and in some jurisdictions the township. This is the call that answers "which tax district is this parcel in?" — the question that determines the mill rate.
Both calls use the same API key. Both are billed per address, not per batch. The batch call saves network round-trips and dramatically simplifies your pipeline concurrency model; it does not change per-address cost.
CSV2GEO covers 461 million addresses across 39 countries. For a US county assessor, coverage is deep: rural routes, numbered highways, and private roads with 911 addresses. Parcels that have no situs address at all, only a legal description, are the exception; Step 1 below routes them to your GIS instead of the address geocoder. The 3,000-call free tier is enough to run a pilot on a small township's parcel roll before you commit to a paid plan.
Structuring the pipeline
A mass appraisal geocoding pipeline has four stages. Get the stages right and the edge cases fall into natural handling paths.
Stage 1: Clean and normalise the situs address. Strip double spaces, expand common abbreviations (ST → Street, AVE → Avenue, N → North), and split combined address fields that some county systems store as a single text blob. You do not need perfect normalisation — the geocoder's fuzzy matcher handles most of the residual mess — but obviously broken rows (missing house numbers, all-null fields) should be flagged before they consume API credits.
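A minimal sketch of the Stage 1 cleanup in Python, assuming the input is the street line only (no city, state, or ZIP). The abbreviation table is illustrative, not a standard list, and token-level expansion is deliberately naive: it would turn "St Paul" into "Street Paul" and a unit letter "E" into "East", so tune the table to your county's data before relying on it.

```python
import re

# Illustrative abbreviation table; extend it with the forms your county's data actually uses.
ABBREVIATIONS = {
    "ST": "Street",
    "AVE": "Avenue",
    "RD": "Road",
    "N": "North",
    "S": "South",
    "E": "East",
    "W": "West",
}


def normalise_situs(street_line):
    """Light cleanup of a street line; returns None for rows that should not reach the API."""
    if street_line is None:
        return None
    # Collapse double spaces and tabs, then trim the ends.
    text = re.sub(r"\s+", " ", street_line).strip()
    # Expand whole-token abbreviations; "St." and "ST" both match.
    tokens = [ABBREVIATIONS.get(t.rstrip(".").upper(), t) for t in text.split(" ")]
    cleaned = " ".join(tokens)
    # Flag rows with no house number, or a house number of zero, before they cost a credit.
    match = re.match(r"(\d+)", cleaned)
    if match is None or int(match.group(1)) == 0:
        return None
    return cleaned
```

For example, `normalise_situs("  1842  N Elm   ST ")` returns `"1842 North Elm Street"`, while `"Elm Street"` and `"0 Elm St"` return `None` and go to the flagged pile.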
Stage 2: Batch geocode. Send the parcel roll to /api/v1/geocode/batch in chunks. Write the returned lat/lng and confidence score back to the parcel roll. Rows where the geocoder returns no result or a confidence below your threshold go into a manual review queue.
Stage 3: Division lookup. For every successfully geocoded parcel, call /api/v1/divisions with the returned coordinate. Write the administrative division labels back to the parcel row. This is the step that assigns each parcel to a tax district.
Stage 4: Output and audit trail. The enriched parcel roll — original fields plus coordinate, confidence, divisions, and a geocoding timestamp — goes into the appraisal system. The audit trail records the API call timestamp, the API version, and the confidence score for every row so that an appeal hearing has a documented basis.
REST patterns: geocoding a parcel batch
The following examples assume your parcel roll has been chunked into batches of 100 rows and written to a JSON array. Adjust batch size to match your latency budget; smaller batches have lower tail latency on individual requests, larger batches reduce total request count.
curl
```bash
curl -s -X POST "https://csv2geo.com/api/v1/geocode/batch" \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $CSV2GEO_API_KEY" \
  -d '{
    "addresses": [
      {"id": "APN-001", "q": "1842 Elm Street, Denver, CO 80202"},
      {"id": "APN-002", "q": "9314 Wagon Wheel Rd, Arvada, CO 80002"}
    ]
  }' | jq '.results[] | {id, lat, lng, confidence, formatted_address}'
```

The id field is your APN or parcel reference. CSV2GEO echoes it back in the response so you can join results to inputs by ID rather than by position. This matters in production because you want to handle partial failures cleanly without corrupting row alignment.
Python
```python
import csv
import os
import time

import requests

API = "https://csv2geo.com/api/v1/geocode/batch"
KEY = os.environ["CSV2GEO_API_KEY"]
BATCH = 100
RETRY_AFTER = 5  # base delay in seconds for exponential backoff on HTTP 429


def chunks(seq, n):
    """Yield successive n-sized slices of seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def geocode_batch(rows):
    """Geocode one batch; return results keyed by the echoed id (the APN)."""
    payload = {"addresses": [{"id": row["apn"], "q": row["situs_address"]} for row in rows]}
    for attempt in range(5):
        r = requests.post(API, json=payload, headers={"X-Api-Key": KEY}, timeout=60)
        if r.status_code == 429:
            # Back off 5 s, 10 s, 20 s, ... before resending the same batch.
            time.sleep(RETRY_AFTER * (2 ** attempt))
            continue
        r.raise_for_status()
        return {res["id"]: res for res in r.json()["results"]}
    raise RuntimeError("batch geocode failed after retries")


with open("parcel_roll.csv", newline="") as fin, \
        open("parcel_roll_geocoded.csv", "w", newline="") as fout:
    reader = csv.DictReader(fin)
    out_fields = reader.fieldnames + ["lat", "lng", "confidence", "formatted_address"]
    writer = csv.DictWriter(fout, fieldnames=out_fields)
    writer.writeheader()
    rows = list(reader)
    for batch in chunks(rows, BATCH):
        results = geocode_batch(batch)
        for row in batch:
            # Join on APN, never on array position; a missing id leaves the fields blank.
            geo = results.get(row["apn"], {})
            row["lat"] = geo.get("lat")
            row["lng"] = geo.get("lng")
            row["confidence"] = geo.get("confidence")
            row["formatted_address"] = geo.get("formatted_address")
            writer.writerow(row)
```

The exponential backoff on 429 is the critical production detail. A 50,000-parcel run at 100 addresses per batch is 500 requests. If you send all 500 in parallel you will hit rate limits; if you send them one at a time, as this script does, you are leaving throughput on the table. See the Concurrency tuning for geocoding sweet spot post for the exact concurrency model that maximises throughput without triggering rate limits.
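One way to sit between those two extremes is a bounded thread pool. The sketch below is an assumption, not the tuning post's method: it takes any geocode_batch-style callable (a list of rows in, results keyed by APN out, with its own 429 backoff) and caps the number of requests in flight with max_workers. The default of 8 matches the worker count used in the throughput budget later in this post.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(seq, n):
    """Yield successive n-sized slices of seq."""
    for i in range(0, len(seq), n):
        yield seq[i:i + n]


def geocode_roll(rows, geocode_fn, batch_size=100, workers=8):
    """Send batches through geocode_fn with at most `workers` requests in flight.

    geocode_fn takes a list of rows and returns {apn: result}, the same contract
    as geocode_batch, so its 429 backoff still applies inside each worker.
    """
    merged = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for results in pool.map(geocode_fn, chunks(rows, batch_size)):
            # Results are keyed by APN, so completion order never affects the join.
            merged.update(results)
    return merged
```

In production you would call `geocode_roll(rows, geocode_batch)` and then write each row's fields from the merged dict exactly as the sequential script does.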
Node
```javascript
// Run as an ES module (e.g. geocode.mjs) on Node 18+ for top-level await and global fetch.
import fs from 'node:fs';
import { parse } from 'csv-parse/sync';
import { stringify } from 'csv-stringify/sync';

const API = 'https://csv2geo.com/api/v1/geocode/batch';
const KEY = process.env.CSV2GEO_API_KEY;
const BATCH = 100;

// Split an array into n-sized slices.
function chunks(arr, n) {
  const out = [];
  for (let i = 0; i < arr.length; i += n) out.push(arr.slice(i, i + n));
  return out;
}

// Geocode one batch; return results keyed by the echoed id (the APN).
async function geocodeBatch(rows) {
  const payload = {
    addresses: rows.map(r => ({ id: r.apn, q: r.situs_address })),
  };
  const r = await fetch(API, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json', 'X-Api-Key': KEY },
    body: JSON.stringify(payload),
  });
  // No retry here: add 429 backoff as in the Python version before production use.
  if (!r.ok) throw new Error(`HTTP ${r.status}`);
  const data = await r.json();
  return Object.fromEntries(data.results.map(res => [res.id, res]));
}

const rows = parse(fs.readFileSync('parcel_roll.csv'), { columns: true });
const enriched = [];
for (const batch of chunks(rows, BATCH)) {
  const results = await geocodeBatch(batch);
  for (const row of batch) {
    // Join on APN, never on array position.
    const geo = results[row.apn] ?? {};
    enriched.push({
      ...row,
      lat: geo.lat ?? '',
      lng: geo.lng ?? '',
      confidence: geo.confidence ?? '',
      formatted_address: geo.formatted_address ?? '',
    });
  }
}

fs.writeFileSync('parcel_roll_geocoded.csv', stringify(enriched, { header: true }));
```

REST patterns: the divisions lookup
Once every parcel has a coordinate, the divisions call assigns it to its administrative hierarchy. One call per coordinate; no batching on this endpoint, so run it concurrently against your geocoded output.
curl
```bash
# -G sends the --data-urlencode pairs as a GET query string instead of a POST body.
curl -s -G "https://csv2geo.com/api/v1/divisions" \
  --data-urlencode "lat=39.7392" \
  --data-urlencode "lng=-104.9903" \
  --data-urlencode "api_key=$CSV2GEO_API_KEY" \
  | jq '{county, municipality, state, country}'
```

A parcel in downtown Denver returns something like:
```json
{
  "country": "US",
  "state": "Colorado",
  "county": "Denver County",
  "municipality": "Denver"
}
```

A parcel in unincorporated Jefferson County returns a county label and no municipality, and that difference is what determines whether city tax applies.
Python (concurrent division lookups)
```python
import concurrent.futures
import os

import requests

DIV_API = "https://csv2geo.com/api/v1/divisions"
KEY = os.environ["CSV2GEO_API_KEY"]


def fetch_divisions(row):
    if not row.get("lat") or not row.get("lng"):
        return row  # skip rows that failed geocoding
    r = requests.get(
        DIV_API,
        params={"lat": row["lat"], "lng": row["lng"], "api_key": KEY},
        timeout=20,
    )
    if r.status_code == 200:
        data = r.json()
        # The curl example reads these fields at the top level; also accept a
        # nested "divisions" object in case your API version wraps them.
        divs = data.get("divisions", data)
        row["county"] = divs.get("county", "")
        row["municipality"] = divs.get("municipality", "")
        row["state"] = divs.get("state", "")
    else:
        # Record the failure so the row is retried or reviewed, not silently passed.
        row["divisions_status"] = r.status_code
    return row


# geocoded_rows: list of dicts from Stage 2
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    enriched_rows = list(pool.map(fetch_divisions, geocoded_rows))
```

Eight workers is a conservative starting point for the divisions endpoint. Monitor your 429 rate; if you see none, step up to 16. If you start seeing retries, step back to four. The concurrency tuning post has the measurement methodology.
Step-by-step: mass appraisal geocoding for a county parcel roll
Step 1: Export and clean the parcel roll
Export the situs-address fields from your CAMA system to CSV. At minimum you need: APN, situs number, situs direction prefix, situs street name, situs street type, situs city, situs state, situs ZIP. Concatenate these into a single situs_address field. Strip leading and trailing whitespace from each component before concatenating — a trailing space in the street name is enough to confuse a fuzzy matcher into returning a low-confidence result.
Flag any rows with a null or zero house number before they reach the API. These are vacant land parcels or parcels described by section-township-range — they geocode poorly from address and need a centroid from the GIS system instead. Separate them into a by_legal_description queue and handle them with your internal GIS, not the address geocoder.
Sanity-check the count: a county with 200,000 parcels should have at least 180,000 valid situs addresses. If more than 10% are null or zero-house-number rows, the export query is probably wrong.
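The three checks in this step can be sketched in a few lines. The column names (situs_number, situs_dir, situs_street, situs_type, situs_city, situs_state, situs_zip) are hypothetical stand-ins for your CAMA export, and treating any leading digits as a house number (so "1842A" passes) is a judgement call you may want to tighten.

```python
import re

# Hypothetical CAMA export column names; rename to match your system.
STREET_PARTS = ["situs_number", "situs_dir", "situs_street", "situs_type"]


def clean(value):
    """Treat None as empty and strip stray whitespace from one component."""
    return (value or "").strip()


def build_situs(row):
    # Strip each component before joining so a trailing space never reaches the geocoder.
    street = " ".join(clean(row.get(c)) for c in STREET_PARTS if clean(row.get(c)))
    return (f"{street}, {clean(row.get('situs_city'))}, "
            f"{clean(row.get('situs_state'))} {clean(row.get('situs_zip'))}")


def has_house_number(row):
    # Leading digits, not zero: "1842" and "1842A" pass; "", "0" and "N/A" do not.
    match = re.match(r"\d+", clean(row.get("situs_number")))
    return match is not None and int(match.group()) > 0


def split_roll(rows, max_bad_share=0.10):
    """Split into rows for the API and rows for the by_legal_description GIS queue."""
    addressable, by_legal_description = [], []
    for row in rows:
        if has_house_number(row):
            row["situs_address"] = build_situs(row)
            addressable.append(row)
        else:
            by_legal_description.append(row)
    # More than 10% unaddressable rows usually means the export query is wrong.
    if rows and len(by_legal_description) / len(rows) > max_bad_share:
        raise ValueError(
            f"{len(by_legal_description)} of {len(rows)} rows lack a house number; "
            "check the export query"
        )
    return addressable, by_legal_description
```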
Step 2: Run the batch geocoding job
Send the cleaned address list through /api/v1/geocode/batch in batches of 100. Log the following per batch: batch start index, batch end index, HTTP status code, and the count of results with confidence ≥ 0.7. This log is your first QA signal — a batch with an unusually low high-confidence count probably caught a dirty data segment.
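A minimal sketch of that per-batch log line, assuming results arrive as the {apn: result} dict that geocode_batch returns. One JSON object per batch keeps the log easy to grep and to load into a spreadsheet during QA; the field names are illustrative.

```python
import json
import logging

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("geocode")


def log_batch(start, end, status, results, threshold=0.7):
    """Emit one structured QA line per batch; `results` is the {apn: result} dict."""
    high = sum(
        1 for res in results.values()
        if res.get("confidence") is not None and float(res["confidence"]) >= threshold
    )
    record = {
        "batch_start": start,
        "batch_end": end,
        "http_status": status,
        "returned": len(results),
        "high_confidence": high,
    }
    log.info(json.dumps(record))
    return record
```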
For a 200,000-parcel county, this is 2,000 API requests. At the concurrency level documented in the tuning post, a county-scale run completes in well under an hour on a standard compute instance. The per-address cost is what matters for the budget line: at paid pricing starting at $54/month for 100,000 calls, a 200,000-parcel run is in the second pricing bracket. Calculate the exact cost using the pricing page before you start.
Write the output — lat, lng, confidence, formatted address — back to the parcel roll. Keep the original situs address field; you will need it for the audit trail.
Step 3: Triage low-confidence results
Anything below 0.7 confidence goes into a manual review queue. The most common reasons:
- Typo in house number. "1842" typed as "1824". The geocoder places the result near the correct block but confidence drops because the exact number does not match.
- Missing directional. "Elm Street" when the official name is "West Elm Street". The geocoder usually finds it, but confidence drops because the match is partial.
- Rural route or highway address. "Route 2 Box 47" is not a geocodable address; you need the 911-assigned address or a GIS centroid.
- New subdivision not yet in the reference dataset. A subdivision platted six months ago may not yet appear in any geocoding reference. The coordinate from the developer's plat is the right source.
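Only some of these reasons can be detected mechanically. The sketch below is a hypothetical helper, not anything the API provides: it routes each geocoded row to "accept" or a review reason, catching rural-route patterns by regex, missing results, null confidence (treated as below threshold, as the failure-modes section recommends), and low confidence. Typos, missing directionals, and new subdivisions all surface as low_confidence and still need a human to tell them apart.

```python
import re

THRESHOLD = 0.7  # write the agreed threshold into your review policy before the cycle

# "Route 2 Box 47", "RR 3 Box 12", "Rte 9 Box 4": not geocodable as street addresses.
RURAL_ROUTE = re.compile(r"\b(?:route|rte|rr)\s*\d+\s+box\s*\d+", re.IGNORECASE)


def triage(row, threshold=THRESHOLD):
    """Return "accept" or a short review reason for one geocoded parcel row."""
    if RURAL_ROUTE.search(row.get("situs_address") or ""):
        return "rural_route"      # needs the 911 address or a GIS centroid
    if row.get("lat") in (None, "") or row.get("lng") in (None, ""):
        return "no_result"
    confidence = row.get("confidence")
    if confidence in (None, ""):
        return "null_confidence"  # treated as below threshold, never as a pass
    if float(confidence) < threshold:
        return "low_confidence"
    return "accept"
```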
The confidence scores explained post has a full taxonomy of what each confidence band means in practice. For a legal document, establish your threshold in writing before the appraisal cycle starts so the review policy is defensible at appeal.
Step 4: Run the divisions lookup
For every parcel with lat/lng, call /api/v1/divisions. Write county, municipality, and state back to the parcel row. Flag parcels where the returned county does not match the expected county for the parcel roll — these are coordinate outliers where the geocoder placed the parcel outside the county boundary, which is almost always a data-quality signal.
Cross-reference the municipality field against your tax district table. Parcels in unincorporated county (municipality field empty or null) get the county mill rate; parcels in an incorporated place get the city mill rate. This is the join that the appraisal system needs to compute the correct tax bill, and it is the join that manual geocoding processes get wrong most often when analysts eyeball addresses rather than computing spatial containment.
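Both checks in this step reduce to a lookup once the divisions fields are on the row. In this sketch, expected_county and the districts table are hypothetical inputs from your own tax district data, with the key None standing for unincorporated county to match the empty-municipality rule. It returns a district code rather than a mill rate, so the rates themselves stay in your appraisal system.

```python
def assign_tax_district(row, expected_county, districts):
    """Return (tax_district, flags) for one parcel after the divisions lookup.

    `districts` is a hypothetical lookup from your tax district table, keyed by
    municipality name, with the key None standing for unincorporated county.
    """
    flags = []
    county = (row.get("county") or "").strip()
    if county != expected_county:
        flags.append("county_mismatch")  # placed outside this roll's county
    municipality = (row.get("municipality") or "").strip() or None
    district = districts.get(municipality)
    if district is None:
        flags.append("unknown_municipality")
    return district, flags
```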
For a jurisdiction with special assessment districts — fire districts, water districts, hospital districts — the divisions endpoint returns the standard administrative hierarchy. Special assessment boundaries typically live in your GIS system as local polygon layers; the workflow is to spatial-join the enriched coordinate against those local layers after the API call, not to expect the API to know about your local special districts.
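For illustration, that local spatial join can be sketched as a point-in-polygon test. This uses the even-odd ray-casting rule on a single outer ring in planar lng/lat, which is adequate for small district polygons but ignores holes, multipolygons, and points exactly on a boundary; a production pipeline would do this join in your GIS or with a geometry library. The layer format (district name mapped to a vertex list) is an assumption.

```python
def point_in_polygon(lng, lat, ring):
    """Even-odd ray casting. `ring` is a list of (lng, lat) vertices, not closed."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses that line (y1 != y2 is guaranteed here).
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lng < x_cross:
                inside = not inside
    return inside


def special_districts(lng, lat, layer):
    """Names of every district polygon in `layer` that contains the point."""
    return [name for name, ring in layer.items() if point_in_polygon(lng, lat, ring)]
```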
Step 5: Write the audit-ready output and cache the results
The final enriched parcel roll should include at minimum: original APN, original situs address, normalised/formatted address from the geocoder, lat, lng, confidence score, API call timestamp, county from divisions, municipality from divisions, and a geocode_source flag (api_batch, gis_centroid, manual_review, etc.) for every row.
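A sketch of that row as a Python dataclass, assuming the field names used elsewhere in this post. The timestamp is taken at record time in UTC and written in ISO 8601, which keeps rows from different runs and servers directly comparable.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class AuditRow:
    apn: str
    situs_address: str                # exactly what was submitted to the geocoder
    formatted_address: Optional[str]  # normalised address returned by the geocoder
    lat: Optional[float]
    lng: Optional[float]
    confidence: Optional[float]
    county: Optional[str]
    municipality: Optional[str]
    geocode_source: str               # e.g. api_batch, gis_centroid, manual_review
    geocoded_at: str                  # ISO 8601 timestamp in UTC


def audit_row(row, source):
    """Build the audit record for one parcel from an enriched row dict."""
    return AuditRow(
        apn=row["apn"],
        situs_address=row["situs_address"],
        formatted_address=row.get("formatted_address"),
        lat=row.get("lat"),
        lng=row.get("lng"),
        confidence=row.get("confidence"),
        county=row.get("county"),
        municipality=row.get("municipality"),
        geocode_source=source,
        geocoded_at=datetime.now(timezone.utc).isoformat(timespec="seconds"),
    )
```

`asdict(audit_row(row, "api_batch"))` gives a plain dict ready for `csv.DictWriter`.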
Cache aggressively. Parcels do not move. An address that was geocoded in January is still at the same coordinate in November. Write the enriched output to a cache keyed by APN, and on your next appraisal cycle only re-geocode rows where the situs address changed. For a 200,000-parcel county, the annual churn rate is typically 2-5% of addresses (new subdivisions, address corrections, boundary annexations). Serving the other 95-98% from cache cuts your annual API spend by 90% or more. The 90% cost reduction caching post has the full caching strategy with TTL reasoning.
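A minimal sketch of the APN-keyed cache, assuming a single JSON file is enough for your roll size (a database table works the same way). A row goes back to the API only when its APN is new or its situs address string differs from the cached one, so normalise addresses the same way every cycle or whitespace changes alone will trigger re-geocoding.

```python
import json
from pathlib import Path


def load_cache(path):
    """Load the APN-keyed cache, or start empty on the first cycle."""
    p = Path(path)
    return json.loads(p.read_text(encoding="utf-8")) if p.exists() else {}


def rows_to_geocode(rows, cache):
    """Keep only rows whose APN is new or whose situs address changed since last cycle."""
    return [
        row for row in rows
        if row["apn"] not in cache
        or cache[row["apn"]].get("situs_address") != row["situs_address"]
    ]


def update_cache(cache, enriched_rows):
    """Overwrite cache entries with this cycle's enriched rows."""
    for row in enriched_rows:
        cache[row["apn"]] = row
    return cache


def save_cache(path, cache):
    Path(path).write_text(json.dumps(cache), encoding="utf-8")
```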
Failure modes that sink first attempts
Position-based response joining. Some teams join the geocoder's output to their input by row position rather than by ID. This works until a batch has a partial failure and the response array is shorter than the input array — at which point the join silently misaligns every subsequent row. Always use the echoed id field to join; never rely on array position.
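A sketch of the safe join: results are indexed by the echoed id, so a dropped or reordered result only affects its own row, and every APN with no result is returned for the review queue instead of silently shifting its neighbours. The function name and return shape are illustrative.

```python
def join_by_id(batch_rows, results):
    """Attach results to input rows by APN; also return the APNs that got no result."""
    by_id = {res["id"]: res for res in results}
    joined, missing = [], []
    for row in batch_rows:
        res = by_id.get(row["apn"])
        if res is None:
            missing.append(row["apn"])  # send to the manual review queue
            res = {}
        joined.append({
            **row,
            "lat": res.get("lat"),
            "lng": res.get("lng"),
            "confidence": res.get("confidence"),
        })
    return joined, missing
```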
Ignoring null confidence. Some rows come back geocoded but with a null confidence score. This happens for addresses that match the reference data but where the match quality computation encountered an edge case. Treat null confidence as below-threshold — send it to manual review rather than accepting it as high confidence.
Overwriting GIS centroids with low-confidence geocodes. If your GIS system already has a surveyed centroid for a parcel (typically the case for large commercial properties), do not overwrite it with a batch geocode result. The batch geocode is for parcels where you do not have a reliable existing coordinate, not for replacing surveyed positions.
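One way to enforce that precedence is a small helper that picks the coordinate and the geocode_source flag together. The gis_lat and gis_lng column names are hypothetical; the 0.7 default mirrors the threshold used earlier in this post.

```python
def choose_coordinate(row, threshold=0.7):
    """Return (lat, lng, geocode_source), never replacing a surveyed GIS centroid."""
    def present(key):
        return row.get(key) not in (None, "")

    if present("gis_lat") and present("gis_lng"):
        return float(row["gis_lat"]), float(row["gis_lng"]), "gis_centroid"
    if present("lat") and present("lng") and present("confidence") \
            and float(row["confidence"]) >= threshold:
        return float(row["lat"]), float(row["lng"]), "api_batch"
    return None, None, "manual_review"
```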
Flooding the API with division calls in a tight loop. The divisions endpoint is one call per point, not batched. Teams that loop over 200,000 rows with a synchronous requests.get() inside the loop wait on every round trip in turn: at 100 ms per call that is more than five hours. Teams that then switch to unbounded parallel requests trigger rate limiting instead. Concurrent execution with a bounded thread pool, as shown above, is the correct pattern. See the exponential backoff post (when to retry, when to stop) for the retry logic to wrap around the concurrent calls.
Not logging the geocoding timestamp. An appeal filed eighteen months after the appraisal cycle needs to know which version of the reference data was used when the parcel was geocoded. Log the timestamp on every row. It is one field; it saves you a painful conversation in a hearing room.
Throughput budget for a county-scale run
A worked example for a county with 180,000 addressable parcels (after removing the vacant-land and legal-description queue):
| Stage | Calls | Batch size | Request count | Notes |
|---|---|---|---|---|
| Batch geocoding | 180,000 | 100 | 1,800 | at 8 concurrent workers |
| Division lookup | ~170,000 | 1 | ~170,000 | only successfully geocoded rows |
| Manual review | ~10,000 | n/a | n/a | human queue, no API |
At 8 concurrent workers on the geocoding stage and 16 on the division stage, a county-scale run typically completes within a half-day on a t3.medium or equivalent. The bottleneck is almost never the API; it is the rate at which your local database can accept writes. Batch the database inserts (500 rows per insert) and the pipeline stays balanced.
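A sketch of the batched write, using Python's built-in sqlite3 module as a stand-in for your appraisal database (INSERT OR REPLACE is SQLite syntax; other databases spell the upsert differently). Each batch of 500 rows is one executemany call inside one transaction, and the table layout is a hypothetical minimum.

```python
import sqlite3

COLUMNS = ["apn", "lat", "lng", "confidence", "county", "municipality"]


def ensure_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS parcels ("
        "apn TEXT PRIMARY KEY, lat REAL, lng REAL, confidence REAL, "
        "county TEXT, municipality TEXT)"
    )


def write_in_batches(conn, rows, batch_size=500):
    """Insert enriched rows batch_size at a time, one transaction per batch."""
    sql = (f"INSERT OR REPLACE INTO parcels ({', '.join(COLUMNS)}) "
           f"VALUES ({', '.join(':' + c for c in COLUMNS)})")
    for i in range(0, len(rows), batch_size):
        # Fill absent columns with None so every named placeholder has a value.
        batch = [{c: row.get(c) for c in COLUMNS} for row in rows[i:i + batch_size]]
        with conn:  # commits on success, rolls back this batch on error
            conn.executemany(sql, batch)
```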
For the division stage, 170,000 calls at 16 concurrent workers is well within the published call budgets for the paid tiers. If you are running annual appraisal cycles and caching results, the effective per-cycle call count drops to 5-10% of the total parcel count by year two.
What to put in the appraisal system vs what to archive
Put in the appraisal system: lat, lng, confidence, county, municipality, formatted address, geocode source flag. These are the fields that appraisal models, GIS viewers, and tax billing systems need in real time.
Archive separately: the full API response JSON per batch, the API call timestamp, the API endpoint and parameter set, and the raw situs address that was submitted. This archive is your audit trail. It does not need to be queryable — a flat file per appraisal cycle in S3 or your document management system is sufficient. A hearing officer asking "what address did you submit to the geocoder and what did it return?" needs to see the archived JSON, not the summarised fields.
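A minimal sketch of that archive, assuming one JSON Lines file per appraisal cycle: each line holds the call timestamp, the endpoint, the request payload exactly as submitted, and the unmodified response. The file naming scheme is hypothetical; upload the file to S3 or your document management system when the cycle closes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_call(archive_dir, cycle, endpoint, request_payload, response_json):
    """Append one API call's request and raw response to this cycle's JSON Lines file."""
    path = Path(archive_dir) / f"geocode_archive_{cycle}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    record = {
        "called_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "endpoint": endpoint,
        "request": request_payload,   # situs addresses exactly as submitted
        "response": response_json,    # unmodified API response
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return path
```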
Frequently Asked Questions
What confidence threshold should an assessor's office use for accepting a geocode result without manual review?
0.7 is a reasonable default starting point, but calibrate it against your county's data quality. Run a pilot batch of 500 parcels for which you already have surveyed coordinates, geocode them with the API, and compute the median distance error at each confidence band. For most US counties with reasonably clean address data, confidence ≥ 0.7 produces results within 50 metres of the surveyed position — acceptable for tax-district assignment. Confidence below 0.5 is almost always a data-quality problem in the input address.
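The pilot calculation can be sketched as follows, assuming each pilot row carries the geocoded lat, lng, and confidence plus hypothetical surveyed_lat and surveyed_lng columns. Distance uses the haversine formula on a spherical Earth, comfortably precise enough to compare against a 50-metre tolerance; the band edges are illustrative.

```python
import math
from statistics import median


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres on a sphere of mean Earth radius."""
    r = 6371008.8
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


# Illustrative confidence bands, each [low, high).
BANDS = [(0.9, 1.01), (0.7, 0.9), (0.5, 0.7), (0.0, 0.5)]


def median_error_by_band(pilot):
    """Median geocode-to-survey distance per confidence band, skipping null confidence."""
    errors = {band: [] for band in BANDS}
    for p in pilot:
        if p.get("confidence") is None:
            continue
        for low, high in BANDS:
            if low <= p["confidence"] < high:
                errors[(low, high)].append(
                    haversine_m(p["lat"], p["lng"], p["surveyed_lat"], p["surveyed_lng"]))
                break
    return {band: median(errs) for band, errs in errors.items() if errs}
```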
How do I handle parcels that have multiple situs addresses — for example, a commercial building with a main address and a loading-dock address?
Geocode the primary situs address only for the coordinate used in valuation and division assignment. Geocode the secondary addresses if they are relevant to your fire-inspection or mailing workflows, but keep them in a separate table keyed by APN. The appraisal system should have one canonical coordinate per parcel for spatial joins.
The county I work in straddles two states. How does the divisions endpoint handle parcels near the state line?
The endpoint returns the administrative hierarchy for the point you provide. A parcel whose coordinate is 50 metres on the Colorado side of the Colorado-Nebraska state line will return Colorado state, the correct Colorado county, and the correct Colorado municipality (if any). Geocoding errors of more than 50 metres near a state line are a data-quality signal worth investigating; the correct response is to verify the coordinate against your GIS boundary layer, not to trust either the geocoder or the boundary layer blindly.
Can I use this pipeline for newly annexed parcels where the address has changed but the physical location has not?
Yes. Geocode the new address. If the new address returns a coordinate within a few hundred metres of the old coordinate, the parcel is in the same location and you can update the address fields while keeping the existing coordinate — or replace it with the freshly geocoded one and log the change. If the coordinates diverge by more than expected, it is a data-quality flag: the new address may be a transcription error.
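The "within a few hundred metres" check reduces to one distance calculation. In this sketch the 300-metre tolerance is an assumed value, not a rule from the API; pick one that fits your parcel sizes and record it in your review policy.

```python
import math


def distance_m(a, b):
    """Haversine distance in metres between (lat, lng) pairs a and b."""
    r = 6371008.8
    (lat1, lng1), (lat2, lng2) = a, b
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    h = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lng2 - lng1) / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(h))


def review_address_change(old_coord, new_coord, tolerance_m=300):
    """Classify a re-addressed parcel: same place, or a likely transcription error."""
    d = distance_m(old_coord, new_coord)
    return ("same_location" if d <= tolerance_m else "check_transcription"), round(d)
```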
Is there an SDK that handles batching and retry automatically?
Python and Node SDKs exist and are documented at csv2geo.com. That said, the REST API is simple enough that most production assessor pipelines wrap it in a small internal client — twenty to forty lines of code — rather than taking a dependency on an SDK version. The REST patterns in this post are what most teams ship to production.
How does this compare to running our own geocoding instance on-premises?
Self-hosting a geocoder requires maintaining a reference dataset (461M+ addresses across 39 countries in our case), a fuzzy matching engine, and update pipelines for new addresses. The infrastructure cost is non-trivial, and the data-currency problem is persistent — new subdivisions, address corrections, and annexations need to flow into your local dataset. For most county assessor offices the operational cost of self-hosting exceeds the API cost by a wide margin. The API also gives you the divisions endpoint, which a self-hosted geocoder typically does not include.
What happens if my rate limit is hit mid-batch during a large appraisal run?
The API returns HTTP 429 with a Retry-After header. Your retry logic should read that header and wait the indicated seconds before resending the batch — do not discard the batch, do not re-queue it at the back of the line. The exponential backoff post covers the exact retry implementation. A well-instrumented pipeline logs every 429 response with a timestamp so you can diagnose whether the rate limit was hit due to a concurrency misconfiguration or a genuinely undersized plan.
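A sketch of a retry wrapper that honours Retry-After when the header carries a number of seconds, and falls back to exponential backoff when it is missing or sent as an HTTP date. The post callable is injected (pass requests.post in production), so the same batch is resent in place rather than re-queued, and each 429 is logged.

```python
import logging
import time

log = logging.getLogger("geocode")


def post_with_retry_after(post, url, payload, headers, max_attempts=5,
                          fallback_wait=5.0, sleep=time.sleep):
    """Resend the same batch on HTTP 429, honouring Retry-After given in seconds.

    `post` is any callable with the requests.post call shape, e.g. requests.post.
    """
    for attempt in range(max_attempts):
        r = post(url, json=payload, headers=headers, timeout=60)
        if r.status_code != 429:
            r.raise_for_status()
            return r.json()
        header = r.headers.get("Retry-After")
        try:
            wait = max(0.0, float(header))
        except (TypeError, ValueError):
            # Missing header or an HTTP-date value: fall back to exponential backoff.
            wait = fallback_wait * (2 ** attempt)
        log.warning("429 on attempt %d; waiting %.1f s", attempt + 1, wait)
        sleep(wait)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```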
Related Articles
- Enriching property data with the elevation API — add ground elevation to every parcel coordinate for flood and terrain signals
- Benchmarking geocoding APIs — honest numbers — what to measure when evaluating a geocoder for a legal-document workflow
- Caching geocoding results — 90% cost reduction — parcels do not move; cache the enriched results and drop your annual API spend by 90%
- Geocoding confidence scores explained — full taxonomy of what each confidence band means in practice and where to set your review threshold
- Concurrency tuning for geocoding sweet spot — the exact worker-count methodology for maximising throughput on a county-scale batch run
---
*I.A. / CSV2GEO Creator*
Use our batch geocoding tool to convert thousands of addresses to coordinates in minutes. Start with 100 free addresses.