Turning GPS pings into proof-of-visit addresses for field teams

Convert raw lat/lng from field-worker GPS into verified, human-readable proof-of-visit addresses via reverse geocoding. Patterns, code, failure modes.

June 21, 2026

A field technician presses "job complete" at 10:43 am and your system records lat: 51.5074, lng: -0.1278. You now know where the technician's phone was at 10:43 am. You do not know whether the technician was standing at the customer's front door, sitting in the van two streets away, or walking out of a coffee shop. A coordinate is not proof. A verified street address, matched against the job's expected address and stored with a distance delta, is proof.

This post is about the engineering pattern that turns raw GPS pings from field-worker devices into auditable, human-readable proof-of-visit records. The core primitive is reverse geocoding: give the API a lat/lng, get back a structured address and a confidence score. Built around that primitive are the three pieces that make the pattern production-grade: distance validation, address matching, and the audit trail that satisfies operations managers, enterprise customers, and — in regulated industries — external auditors.

By the end you will have working code in Python and Node, a caching and cost model, the failure modes that catch teams by surprise, and a clear view of what "confidence" means in this context.

Why this problem is harder than it looks

The naive version is one line: reverse-geocode the GPS ping, display the result. Teams ship that in a weekend and then spend the next three months patching edge cases that the naive version cannot handle.

GPS accuracy is worse than you think in the field. A phone GPS chip in clear sky is accurate to 3–5 metres. The same chip next to a building, inside a vehicle, or in an urban canyon with tall reflective facades drifts to 15–50 metres. A drift of 40 metres in a dense city puts your reverse-geocoded address one or two doors away from the actual site, or — in a block with a rear lane — across the street. An address that is plausibly close but wrong is more dangerous than a null value, because it looks valid.

Address matching is an inexact string problem. The job record in your dispatch system was entered by a scheduler who typed "12 Victoria St, London" with no postcode. The reverse-geocoded result comes back as "12 Victoria Street, Westminster, London, SW1H 0NN." Exact string matching will fail. You need normalised comparison, and you need a fallback to distance-based validation for the cases where string normalisation still produces no match.

Field workers know they are being tracked. A proof-of-visit system that can be trivially gamed — check in from the van, pocket the phone — creates a compliance record that looks right and means nothing. The counter-measure is not surveillance, it is layering: distance-from-expected-location, timestamp plausibility, and dwell-time confirmation all independently constrain the space of valid check-ins. None of this requires confrontational UX; it just requires that the backend does more than store a coordinate.

The API call happens on a mobile device on a patchy 4G connection. It will fail. Your proof-of-visit record must either wait for the call to succeed or store the raw GPS ping and resolve it asynchronously, with a clear UI state that tells the dispatcher and the customer what the status is. Both architectures are valid; the choice depends on whether you need the address on the device in real time.

The two architectures

Before writing any code, pick the one that fits your operational model.

Synchronous — address is shown to the technician before the job closes. The mobile client reverse-geocodes the GPS ping, shows the result ("You are at 12 Victoria Street — is this correct?"), and the technician confirms or flags a discrepancy. The address is stored as part of the job-close event. The advantage is that the technician can self-correct — if they are genuinely at the wrong address (happens more than you think: two customers on the same street), they catch it before the data is written. The disadvantage is latency and connectivity dependency: if the API call fails, the job-close flow stalls.

Asynchronous — GPS ping is stored immediately, address resolves in the background. The mobile client stores the raw lat/lng with a UTC timestamp and marks the job closed. A backend worker picks up unresolved pings every 30 seconds, reverse-geocodes them, and writes the address back to the record. The advantage is that the job-close flow never stalls on an API call. The disadvantage is a window — typically under a minute — during which the job is closed but the proof-of-visit address is not yet available in the dashboard.

For most field-workforce deployments, the asynchronous model is more robust. The rest of this post assumes asynchronous processing, but the API calls are identical; only the orchestration changes.

What the reverse-geocode call returns

One call, one coordinate in, one structured address out. The CSV2GEO reverse-geocoding endpoint is GET /api/v1/reverse.

curl -G "https://csv2geo.com/api/v1/reverse" \
  --data-urlencode "lat=51.5074" \
  --data-urlencode "lng=-0.1278" \
  --data-urlencode "api_key=$CSV2GEO_API_KEY"

Returns:

{
  "result": {
    "address": "12 Victoria Street, Westminster, London, SW1H 0NN, United Kingdom",
    "street_number": "12",
    "street": "Victoria Street",
    "city": "London",
    "postcode": "SW1H 0NN",
    "country_code": "GB",
    "confidence": 0.91,
    "distance_m": 4
  }
}

Three fields matter for proof-of-visit.

`address` — the human-readable string that goes into the audit record. This is what the operations manager reads and what the enterprise customer sees in the visit report.

`confidence` — a 0–1 float that tells you how certain the reverse geocoder is about the address-to-coordinate mapping. A score above 0.8 is a reliable match. Below 0.6, the result should be flagged for manual review — it usually means the GPS ping landed in a gap between known address points and the geocoder has interpolated or snapped to the nearest known point, which may not be the site the technician actually visited.

`distance_m` — how far the GPS ping is from the matched address point in metres. This is your first-order proof signal: if the job's expected address is "12 Victoria Street" and the reverse-geocoded result says the pin is 4 metres from 12 Victoria Street, that is a strong visit confirmation. If distance_m is 120 metres, something is wrong — either GPS drift, a wrong address on the job record, or the technician was not at the site.

The distance the reverse geocoder returns is the ping-to-nearest-address-point distance, which is not the same as ping-to-expected-address distance. You need to compute that separately, and the next section shows how.

Step 1: Collect and store the raw ping

Before any API call, get the GPS ping off the device and into durable storage. Do not depend on the API call completing before the record is persisted. Schema:

CREATE TABLE visit_pings (
  id            UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  job_id        UUID NOT NULL REFERENCES jobs(id),
  technician_id UUID NOT NULL,
  lat           DECIMAL(9,6) NOT NULL,
  lng           DECIMAL(9,6) NOT NULL,
  accuracy_m    FLOAT,          -- GPS reported accuracy if available
  pinged_at     TIMESTAMPTZ NOT NULL,
  resolved_at   TIMESTAMPTZ,    -- null until reverse-geocode succeeds
  address       TEXT,
  confidence    FLOAT,
  distance_m    FLOAT,          -- computed ping-to-expected-site distance, not the API's distance_m
  status        TEXT DEFAULT 'pending'  -- pending | confirmed | flagged | failed
);

Store accuracy_m if the device reports it. Android and iOS both expose the GPS accuracy estimate in metres. You will use it later as a prior on whether to trust the ping at all — a ping with accuracy_m > 50 in a dense urban area is a candidate for rejection before it hits the API.
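
The pre-filter described above can be sketched as a small function that runs before the worker spends an API call. The function name, the labels it returns, and the use of 50 m as a default cutoff are illustrative assumptions rather than part of any API; tune the cutoff per region.

```python
import math
from typing import Optional

def prefilter_ping(lat: float, lng: float, accuracy_m: Optional[float],
                   max_accuracy_m: float = 50.0) -> str:
    """Return 'ok', 'invalid', or 'low_accuracy' (hypothetical labels)."""
    # Reject values the API could never resolve: NaN/inf or out-of-range degrees.
    if not (math.isfinite(lat) and math.isfinite(lng)):
        return "invalid"
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lng <= 180.0):
        return "invalid"
    # accuracy_m is optional; a missing estimate is passed through, not rejected.
    if accuracy_m is not None and accuracy_m > max_accuracy_m:
        return "low_accuracy"
    return "ok"
```

A ping labelled low_accuracy is probably better routed to manual review than dropped, since the raw coordinate is still evidence.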

Step 2: Reverse-geocode in the background worker

The worker runs on a 30-second poll or a queue trigger. In Python:

import os
import requests
from datetime import datetime, timezone

API = "https://csv2geo.com/api/v1/reverse"
KEY = os.environ["CSV2GEO_API_KEY"]

class RateLimitError(Exception):
    """Raised when the API responds with HTTP 429."""

def reverse_geocode(lat: float, lng: float) -> dict:
    r = requests.get(
        API,
        params={"lat": lat, "lng": lng, "api_key": KEY},
        timeout=10,
    )
    if r.status_code == 429:
        raise RateLimitError("rate limited — back off")
    r.raise_for_status()
    return r.json()["result"]

def resolve_pending_pings(db):
    pings = db.query(
        "SELECT id, job_id, lat, lng FROM visit_pings "
        "WHERE status = 'pending' AND pinged_at > NOW() - INTERVAL '24 hours' "
        "LIMIT 50"
    )
    for ping in pings:
        try:
            result = reverse_geocode(ping["lat"], ping["lng"])
            job = db.get_job(ping["job_id"])
            dist_to_expected = haversine(
                ping["lat"], ping["lng"],
                job["expected_lat"], job["expected_lng"],
            )
            status = classify_visit(result["confidence"], dist_to_expected)
            db.update_ping(
                ping["id"],
                address=result["address"],
                confidence=result["confidence"],
                distance_m=dist_to_expected,
                status=status,
                resolved_at=datetime.now(timezone.utc),
            )
        except RateLimitError:
            break  # stop the batch; retry next cycle
        except Exception as e:
            db.mark_ping_failed(ping["id"], str(e))

In Node:

const API = 'https://csv2geo.com/api/v1/reverse';
const KEY = process.env.CSV2GEO_API_KEY;

async function reverseGeocode(lat, lng) {
  // URLSearchParams percent-encodes every value, including the API key.
  const url = `${API}?${new URLSearchParams({ lat, lng, api_key: KEY })}`;
  const r = await fetch(url, { signal: AbortSignal.timeout(10_000) });
  if (r.status === 429) throw new Error('rate_limited');
  if (!r.ok) throw new Error(`http_${r.status}`);
  return (await r.json()).result;
}

async function resolvePendingPings(db) {
  const pings = await db.query(
    `SELECT id, job_id, lat, lng FROM visit_pings
     WHERE status = 'pending' AND pinged_at > NOW() - INTERVAL '24 hours'
     LIMIT 50`
  );
  for (const ping of pings) {
    try {
      const result = await reverseGeocode(ping.lat, ping.lng);
      const job = await db.getJob(ping.job_id);
      const distToExpected = haversine(ping.lat, ping.lng, job.expectedLat, job.expectedLng);
      const status = classifyVisit(result.confidence, distToExpected);
      await db.updatePing(ping.id, {
        address: result.address,
        confidence: result.confidence,
        distance_m: distToExpected,
        status,
        resolved_at: new Date().toISOString(),
      });
    } catch (e) {
      if (e.message === 'rate_limited') break;
      await db.markPingFailed(ping.id, e.message);
    }
  }
}

The haversine function computes the great-circle distance in metres between two lat/lng pairs. Do not use Euclidean distance on raw degrees: a one-degree delta in longitude covers far fewer metres at high latitudes than at the equator. Any well-tested haversine implementation is fine, whether it comes from a geospatial library or a dozen lines of your own; the implementation itself is not the interesting part here.
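
For completeness, here is a minimal haversine sketch in Python using only the standard library. It assumes a spherical Earth with a mean radius of 6,371 km, which is accurate enough for the sub-kilometre distances this pattern cares about.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; adequate for sub-kilometre checks

def haversine(lat1: float, lng1: float, lat2: float, lng2: float) -> float:
    """Great-circle distance in metres between two lat/lng points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))
```

If your database driver returns DECIMAL columns as decimal.Decimal, convert them with float() before calling this, because mixing Decimal and float in the subtraction raises a TypeError.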

Step 3: Classify the visit

The classification step is where the proof-of-visit system earns its value. Three checks feed two outcomes, confirmed or flagged:

def classify_visit(confidence: float, dist_to_expected_m: float) -> str:
    if confidence < 0.6:
        return "flagged"          # geocoder uncertain — manual review
    if dist_to_expected_m > 150:
        return "flagged"          # more than 150 m from expected site
    if dist_to_expected_m > 50:
        return "flagged"          # borderline — flag but do not auto-reject
    return "confirmed"

The thresholds are a starting point, not a universal truth. A 150 m threshold is appropriate for a suburban single-family area where addresses are unambiguous and 150 m is definitely the next block. In a large industrial estate where a single address covers a 200 m warehouse frontage, you might relax the inner threshold to 100 m. Tune against your actual job data: pull a week of confirmed visits where you trust the outcome, compute dist_to_expected_m for all of them, and set the threshold at the 95th percentile.
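
The tuning procedure above can be sketched with the standard library. The function name is hypothetical, and the input is assumed to be the list of dist_to_expected_m values from visits you already trust.

```python
import statistics
from typing import Sequence

def tune_threshold(trusted_distances_m: Sequence[float], percentile: int = 95) -> float:
    """Return the given percentile of distances from visits you already trust."""
    if len(trusted_distances_m) < 2:
        raise ValueError("need at least two trusted visits to estimate a percentile")
    if not 1 <= percentile <= 99:
        raise ValueError("percentile must be between 1 and 99")
    # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
    cuts = statistics.quantiles(trusted_distances_m, n=100, method="inclusive")
    return cuts[percentile - 1]
```

A week of data from one region is a small sample, so treat the result as a starting value and recompute it as more confirmed visits accumulate.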

Do not use the reverse geocoder's own distance_m field as your proof-of-visit distance. That field tells you how far the GPS ping is from the nearest known address point in the database — it is not the distance from the job's expected address. Those two numbers are correlated but not identical, especially when the expected address and the nearest database address point are different things (e.g. a rear-access service entrance vs. the front-door address on file).

Step 4: Validate against the job's expected address

Distance confirmation is necessary but not sufficient. A technician who parks 30 metres from the expected address in a city centre might be at the right building or the wrong building on the same block. The second validation layer is address-string comparison.

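Normalise both strings before comparing. Strip punctuation, lowercase, and expand common abbreviations (St → Street, Ave → Avenue, Blvd → Boulevard). Ideally also remove country name, postcode, and any administrative fields above city level, then compare the core street-number-plus-street-name component. The token-overlap score below skips that removal step and still tolerates those extra fields in the geocoder's output, because it only asks what fraction of the expected address's tokens appear in the resolved one.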

import re

ABBREV = {
    "st": "street", "ave": "avenue", "blvd": "boulevard",
    "rd": "road", "ln": "lane", "dr": "drive", "ct": "court",
}

def normalise_address(addr: str) -> str:
    addr = addr.lower()
    addr = re.sub(r"[^a-z0-9 ]", " ", addr)
    tokens = addr.split()
    tokens = [ABBREV.get(t, t) for t in tokens]
    return " ".join(tokens)

def address_match_score(expected: str, resolved: str) -> float:
    exp_tokens = set(normalise_address(expected).split())
    res_tokens = set(normalise_address(resolved).split())
    if not exp_tokens:
        return 0.0
    overlap = len(exp_tokens & res_tokens)
    return overlap / len(exp_tokens)

An overlap score above 0.75 means most of the expected address tokens appear in the resolved address — a reasonable match. Below 0.5, the addresses are materially different. Combine it with the distance check: a visit where dist_to_expected_m < 50 and address_match_score > 0.75 is a strong confirmed visit. A visit where the distance is fine but the address tokens do not match at all is worth a closer look — it might be a genuinely confusing site (industrial campus, shopping centre), or it might be a data quality problem on the job record.
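
One way to express that combination rule is a second-stage check that returns a status plus a reason. This is a sketch: the function name and the reason strings are assumptions and do not correspond to a column in the schema above. The 50 m, 0.75, and 0.5 cutoffs are the ones quoted in the text.

```python
from typing import Tuple

def combine_signals(dist_to_expected_m: float, match_score: float) -> Tuple[str, str]:
    """Merge distance and address-match evidence into (status, reason)."""
    if dist_to_expected_m >= 50:
        return "flagged", "distance"
    if match_score > 0.75:
        return "confirmed", "distance_and_address"    # both signals agree
    if match_score < 0.5:
        return "flagged", "address_mismatch"          # close by, but the text disagrees
    return "flagged", "partial_address_match"         # in between: worth a human look
```

Keeping the reason alongside the status lets a reviewer see at a glance whether a flag came from distance or from the address text.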

Step 5: Write the audit record

The proof-of-visit record that goes into the audit trail is not the raw ping. It is the enriched row:

{
  "job_id": "a3c1e7d0-...",
  "technician_id": "b2f0...",
  "pinged_at": "2026-06-21T10:43:17Z",
  "resolved_at": "2026-06-21T10:43:44Z",
  "raw_lat": 51.5074,
  "raw_lng": -0.1278,
  "resolved_address": "12 Victoria Street, Westminster, London, SW1H 0NN",
  "confidence": 0.91,
  "dist_to_expected_m": 18,
  "address_match_score": 0.83,
  "status": "confirmed"
}

Store this record immutably. Do not update the raw_lat/raw_lng fields once written. Do not delete flagged records — they are often the ones that matter most in a dispute. The audit trail is useful precisely because it includes the failures and the flags, not just the clean confirmations.
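
One possible way to make "immutable" checkable is to chain each audit record to the previous one with a hash. The sketch below is an illustration of that general technique, not something the geocoding API provides; the function names and the prev_hash and record_hash fields are hypothetical additions to the record shown above.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record in a chain

def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys, fixed separators) keeps the digest reproducible.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def seal_record(record: dict, prev_hash: str) -> dict:
    """Return a copy of record carrying prev_hash and its own SHA-256 digest."""
    body = dict(record, prev_hash=prev_hash)
    body["record_hash"] = _digest(body)
    return body

def verify_chain(records: list, genesis: str = GENESIS) -> bool:
    """Recompute every digest; an edited, removed, or reordered row breaks the chain."""
    prev = genesis
    for rec in records:
        body = {k: v for k, v in rec.items() if k != "record_hash"}
        if body.get("prev_hash") != prev or _digest(body) != rec.get("record_hash"):
            return False
        prev = rec["record_hash"]
    return True
```

A chain like this only detects tampering. It does not prevent it, so the latest hash still needs to be stored somewhere the writer of the table cannot silently change.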

For enterprise customers who pull visit reports via your API, expose status, resolved_address, dist_to_expected_m, and pinged_at. Do not expose raw_lat/raw_lng in customer-facing APIs unless the customer has a specific need — the coordinates are operational data, not the proof artefact.

Failure modes in production

Four categories that catch teams out.

GPS spoofing and the technician who is gaming the system. A smartphone with a mock-location app can produce any coordinate the user wants. The reverse-geocode call will return a perfectly valid address for a perfectly fictional location. The counters: timestamp plausibility (the GPS ping arrives at a time consistent with the route), device-reported accuracy (spoofed coordinates often report implausibly small accuracy values), and dwell-time confirmation (a 3-second visit to a 2-hour job site is suspicious regardless of the address). None of these are foolproof, but together they raise the cost of spoofing above what most field workers will bother with.
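
Two of those counters, timestamp plausibility and dwell time, can be sketched as simple checks. Everything here is an assumption for illustration: the function names, the 130 km/h speed ceiling, and the rule that a visit shorter than 10% of the scheduled job length is suspicious.

```python
from datetime import datetime
from typing import List

def implied_speed_kmh(dist_m: float, t_prev: datetime, t_curr: datetime) -> float:
    """Speed needed to cover dist_m between two pings; inf if no time elapsed."""
    seconds = (t_curr - t_prev).total_seconds()
    if seconds <= 0:
        return float("inf")
    return (dist_m / seconds) * 3.6

def plausibility_flags(dist_from_prev_m: float, t_prev: datetime, t_curr: datetime,
                       dwell_s: float, expected_job_s: float,
                       max_speed_kmh: float = 130.0,
                       min_dwell_ratio: float = 0.1) -> List[str]:
    """Return the list of plausibility checks this ping fails (empty if none)."""
    flags = []
    # Could the technician really have travelled this far since the last ping?
    if implied_speed_kmh(dist_from_prev_m, t_prev, t_curr) > max_speed_kmh:
        flags.append("implausible_travel")
    # Was the time on site a tiny fraction of the scheduled job length?
    if dwell_s < expected_job_s * min_dwell_ratio:
        flags.append("short_dwell")
    return flags
```

These flags are signals for a reviewer, not verdicts: a cancelled job legitimately produces a short dwell.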

Mass failure during network outage. If a regional 4G outage takes your workers offline for two hours, all their job-close pings queue on the device and arrive in a burst when connectivity returns. Your background worker needs to handle a sudden backlog gracefully. Cap your batch size, respect 429 responses with exponential backoff, and make sure your schema and alerting allow resolved_at to trail pinged_at by hours without triggering false-positive SLA alerts. See Exponential Backoff — When to Retry, When to Stop for the specific retry pattern.
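
As a rough sketch of the backoff part, the wrapper below retries a call with capped exponential delays and full jitter. The names and the 1 s base, 60 s cap, and six attempts are illustrative assumptions; RateLimitError mirrors the exception name used in the worker above and is redefined here so the block stands alone.

```python
import random
import time

class RateLimitError(Exception):
    """Raised by the wrapped function when the API answers 429."""

def backoff_delays(base_s: float = 1.0, cap_s: float = 60.0, attempts: int = 6):
    """Yield capped exponential delays with full jitter."""
    for attempt in range(attempts):
        yield random.uniform(0, min(cap_s, base_s * 2 ** attempt))

def call_with_backoff(fn, *args, sleep=time.sleep, **kwargs):
    """Call fn, retrying on RateLimitError; the last failure propagates."""
    for delay in backoff_delays():
        try:
            return fn(*args, **kwargs)
        except RateLimitError:
            sleep(delay)
    return fn(*args, **kwargs)  # final attempt; a 429 here is re-raised
```

Passing sleep as a parameter keeps the wrapper testable without real waiting. Jitter matters here because a whole fleet coming back online at once would otherwise retry in lockstep.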

Address database coverage at unusual sites. The reverse geocoder is backed by 461M+ addresses across 39 countries. For most urban and suburban sites, coverage is comprehensive. But field work happens at places the address database does not know about: new housing developments not yet registered, rural properties with informal addressing, construction sites, industrial facilities entered via an unmapped service road. For these, confidence will be low and distance_m from the nearest known address may be large. Your classification logic should produce "flagged — no nearby address point" rather than "confirmed at a nearby address that happens to be a farm 300 metres away." The flag sends the record to a human; the human notes "construction site, no formal address yet" and confirms the visit manually. Build that manual-confirm path before you ship.

Timezone handling in the audit record. Field workers cross timezone boundaries. A technician who starts a job in one timezone and finishes in another produces two timestamps that look inconsistent if you store local time without the offset. Store UTC throughout. Display local time in the UI with the IANA timezone derived from the GPS coordinates at the time of the ping; a timezone lookup library turns that into a one-liner. The raw storage must be UTC; everything else is a display concern.

Caching — when reverse geocoding is free

Addresses do not move. If a technician visits the same site twice — the morning call and the afternoon follow-up — the second reverse-geocode call will return exactly the same result as the first. Cache at the coordinate level, rounded to five decimal places (roughly 1-metre resolution), with a long TTL.

import hashlib
import json

def coord_cache_key(lat: float, lng: float) -> str:
    rounded = f"{lat:.5f},{lng:.5f}"
    return hashlib.sha1(rounded.encode()).hexdigest()

# In a Redis-backed cache:
def cached_reverse_geocode(lat, lng, cache, ttl_seconds=86400 * 7):
    key = coord_cache_key(lat, lng)
    hit = cache.get(key)
    if hit:
        return json.loads(hit)
    result = reverse_geocode(lat, lng)
    cache.setex(key, ttl_seconds, json.dumps(result))
    return result

At a field workforce scale of 5,000 stops per day — a figure covered in the dispatch-console post — a cache hit rate of 40–60% is realistic for a company that services recurring customers. That translates directly into a lower API bill and lower latency for the resolution step. See Caching Geocoding Results — 90% Cost Reduction for the full pattern.

Observability

A proof-of-visit pipeline has three metrics worth instrumenting:

Resolution lag — the difference between pinged_at and resolved_at in seconds. A healthy pipeline resolves pings within 60 seconds of receipt. A spike in resolution lag means the background worker is backed up, the API is slow, or connectivity is degraded. Alert at P95 > 120 seconds.

Flag rate — the proportion of pings classified as flagged vs confirmed. A healthy field-workforce operation should see a flag rate below 5% if the address data in the job system is clean. A sudden spike to 20% means either a data quality regression in job creation or a change in where your technicians are actually working (e.g. a new industrial estate contract where site addresses are non-standard). Do not silently absorb flag spikes; surface them to the operations team as an operational signal.

API error rate — 4xx and 5xx responses from the geocoding API, broken out by status code. 429s mean you need to adjust your batch pace. 4xxs other than 429 usually mean a malformed request (NaN coordinates, missing API key in the environment). 5xxs are transient and should self-heal with backoff. See Observability for Geocoding Pipelines for the full metrics taxonomy.
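
The first two metrics can be computed straight from the visit_pings rows. The sketch below assumes each row is a dict with pinged_at, resolved_at, and status keys, and uses a nearest-rank percentile; the function names are hypothetical.

```python
from typing import List, Optional

def p95(values: List[float]) -> Optional[float]:
    """Nearest-rank 95th percentile; None for an empty list."""
    if not values:
        return None
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceiling of 0.95 * n
    return ordered[rank - 1]

def pipeline_metrics(rows: List[dict]) -> dict:
    """rows: dicts with pinged_at, resolved_at (datetime or None) and status."""
    lags = [(r["resolved_at"] - r["pinged_at"]).total_seconds()
            for r in rows if r["resolved_at"] is not None]
    # Only resolved outcomes count towards the flag rate; pending/failed are excluded.
    decided = [r for r in rows if r["status"] in ("confirmed", "flagged")]
    flagged = sum(1 for r in decided if r["status"] == "flagged")
    return {
        "resolution_lag_p95_s": p95(lags),
        "flag_rate": flagged / len(decided) if decided else 0.0,
    }
```

In production you would more likely emit these to a metrics system than compute them in application code, but the definitions stay the same.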

Cost model

At the free tier (3,000 calls per day), a small operation of 30 technicians doing 100 jobs each per day (3,000 pings) sits exactly at the limit with no caching, so any cache hits at all keep it inside the free tier. Most operations have enough repeat-site visits to keep the effective call count comfortably below 3,000.

A mid-sized operation of 200 technicians doing 50 jobs each (10,000 pings per day) with a 40% cache hit rate generates ~6,000 net API calls per day, or ~180,000 per month. The entry paid tier at $54/month covers 100,000 calls; one tier up covers 500,000, which is the tier this workload needs. At those volumes the proof-of-visit system costs less than $0.001 per confirmed visit, an irrelevant line item against the labour cost of a field call. Full current pricing at csv2geo.com/pricing/api.
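
The arithmetic behind those figures is easy to rerun for your own fleet. This sketch assumes a 30-day month and one ping per job; the function name is hypothetical.

```python
def net_api_calls(technicians: int, jobs_per_day: int, cache_hit_rate: float,
                  days_per_month: int = 30) -> dict:
    """Estimate API calls after caching, per day and per month."""
    pings_per_day = technicians * jobs_per_day
    # Only cache misses reach the API.
    calls_per_day = round(pings_per_day * (1 - cache_hit_rate))
    return {
        "pings_per_day": pings_per_day,
        "calls_per_day": calls_per_day,
        "calls_per_month": calls_per_day * days_per_month,
    }
```

For the mid-sized example, net_api_calls(200, 50, 0.4) gives 10,000 pings, 6,000 calls per day, and 180,000 calls per month.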

FAQ

What confidence score threshold should we use for auto-confirmation? Start at 0.8 if you want auto-confirmation to be conservative, and flag anything below that for manual review. The Step 3 classifier uses the looser 0.6 floor, which only flags results the geocoder is clearly unsure about; pick one value and apply it consistently. Tune against your first two weeks of production data: look at the manual reviews that came back "actually fine" and raise or lower accordingly. Different geographies warrant different thresholds: dense urban areas with clean address registers tend to produce consistently high confidence; peri-urban and rural areas produce wider distributions.

Should we store the GPS coordinates or just the resolved address? Both. The raw coordinates are the evidence; the resolved address is the human-readable summary. In a dispute, the coordinates let you reconstruct the full picture. Delete neither. If you have GDPR concerns about storing precise coordinates, round to four decimal places (~11 metres) in the long-term archive and keep the full precision only in a short-term audit log that purges after the contractual dispute window.

What if the technician is at the right building but the GPS puts them 80 metres away? Flag the visit and let the technician add a note at job-close time. The note becomes part of the audit record. A pattern of 80 m drifts from a single device is a signal that the device GPS hardware is degraded and the technician needs a phone replacement, not a disciplinary conversation.

How do we handle multi-dwelling buildings where multiple addresses share a GPS footprint? The reverse geocoder will return the nearest matched address point, which in a multi-dwelling building might be the front-door address, a specific flat number, or the building's postal address depending on how the address database models the building. For high-rise visits, add a manual step in the mobile UI: "Which floor / unit?" The GPS confirms the building; the manual step confirms the unit.

Does this work internationally? Reverse geocoding works across all 39 countries in the CSV2GEO database. Coverage density varies — major cities globally have comprehensive address-level coverage; rural areas in developing markets may return street-level or locality-level results rather than door-level results. The confidence score reflects this: lower confidence in lower-density areas is the honest signal to act on.

Can we use this as evidence in a contractual dispute? The audit record — raw coordinates, resolved address, timestamp, confidence score, and distance to expected site — is the evidence package. Whether it is admissible depends on your jurisdiction and contract terms, not on the API. What the API gives you is a consistent, tamper-evident record that is more defensible than a screenshot of a map. Engage your legal team on admissibility; the engineering concern is that the record is complete, immutable, and timestamped.

What is the difference between the reverse geocoder's `distance_m` and our computed distance to the expected address? The geocoder's distance_m tells you how far the GPS ping is from the nearest matched address point in the database. Your computed distance tells you how far the GPS ping is from the job's expected coordinates. These are the same number only when the nearest database address point is exactly the expected address — which is not always the case. Use your computed distance for the proof-of-visit classification; use the geocoder's distance_m as a data-quality signal on the match itself.

---

*I.A. / CSV2GEO Creator*