Geocoding UK Addresses: Postcodes, UPRN, and Edge Cases
UK address geocoding: postcode districts, UPRN persistence, NI vs GB quirks, and Royal Mail PAF compatibility.
UK addresses look superficially similar to US addresses — house number, street, town, postcode — and that surface similarity is exactly what wrecks pipelines built by teams whose default mental model was forged in North America. The UK does not work like the US. The postcode is not a coarse zone; for a meaningful slice of the country it identifies a single building. People search by postcode alone and expect a usable point on the map. Property identifiers are not derived from the address string; they are issued centrally and persist across name changes, demolitions, and re-numbering. And once you cross the Irish Sea into Northern Ireland, half of the assumptions your geocoder has been making quietly fail.
This post is the working notes from running production UK geocoding workloads through the CSV2GEO API: how UK postcodes are actually structured, how to use them as a primary lookup key, how UPRN fits into a clean dataset, what happens at the GB/NI boundary, and the parsing edge cases (flats, building names, Welsh duplications, Edinburgh stair numbers) that quietly hand you bad coordinates if you don't watch for them.
The UK postcode anatomy
A UK postcode is two halves separated by a single space. The first half is the outward code, the second is the inward code.
SW1A 2AA
└┬┘ └┬┘
│ │
│ └─ inward code: sector digit + unit letters
└───── outward code: area letters + district digitsWalking through SW1A 2AA the official way:
| Component | Value | Name | Meaning | |---|---|---|---| | Area | SW | postcode area | 1 of 124 (London SW = south-west London) | | District | SW1A | postcode district | sub-area, sometimes alpha-suffixed | | Sector | SW1A 2 | postcode sector | first half + first digit of inward | | Unit | SW1A 2AA | postcode unit | full postcode, ~15 deliveries on average |
A few rules to bake into validation:
- The area is one or two letters:
B(Birmingham),M(Manchester),EC(East Central London),BT(Belfast). - The district is the area plus 1 or 2 digits, optionally with a trailing letter for some London codes (
W1A,EC1V,SW1A). - The inward code is always exactly one digit followed by two letters.
- The single space between halves is canonical. Royal Mail mandates it. Treat input without a space as malformed-but-recoverable, not as a different postcode.
A regex that catches the structural shape of a real UK postcode (without trying to validate every edge case the Office for National Statistics has ever defined):
const UK_POSTCODE = /^([A-Z]{1,2}\d[A-Z\d]?) ?(\d[A-Z]{2})$/i;
function normalizePostcode(raw) {
const m = String(raw).trim().toUpperCase().match(UK_POSTCODE);
if (!m) return null;
return `${m[1]} ${m[2]}`; // canonical form: outward + space + inward
}
normalizePostcode('sw1a2aa'); // "SW1A 2AA"
normalizePostcode('SW1A 2AA'); // "SW1A 2AA"
normalizePostcode(' bt12 5gh '); // "BT12 5GH"
normalizePostcode('garbage'); // nullThat single function — uppercase, single space between outward and inward — eliminates the largest source of noise in UK address pipelines. Cache keys collapse, dedup rates climb, match rates jump several points. Do this before anything else touches the row.
Why UK postcodes are different
In the United States, a 5-digit ZIP covers somewhere between several hundred and several tens of thousands of addresses. ZIP+4 narrows it but most pipelines never see ZIP+4 in the input. In Germany, a 5-digit PLZ behaves similarly — coarse, town-or-district-scale.
UK postcodes are an order of magnitude finer. Approximately 1.8 million live postcode units cover roughly 30 million delivery points. The arithmetic gives an average of around 15 properties per postcode. The distribution has a long tail: in dense urban Victorian terraces a postcode can cover 30+ houses, while large commercial buildings frequently have a dedicated postcode of their own. SW1A 1AA is Buckingham Palace. SW1A 2AA is 10 Downing Street. W1A 1AA is the BBC Broadcasting House.
The cultural effect is that UK people think of the postcode as the address. Forms ask for postcode and house number; everything else (street, town) is auto-filled by a postcode lookup service. Mobile autocomplete returns full addresses from a postcode prefix in two keystrokes. Estate agents quote properties by postcode. When someone gives you "TW1 3BB" with no street, that is not an incomplete address — it is the address.
For geocoding pipelines this means three things:
- A postcode-only input is a first-class citizen, not a degraded one. Treat it as the primary lookup path, not a fallback.
- Postcode-level confidence is genuinely useful. A
postcodeaccuracy match in the UK puts you typically within 100m of the true point, often within 20m. In the US, a 5-digit-ZIP centroid can be miles off. - Cross-checking street name against postcode is high-signal. If a row says
Acacia Road, BT12 5GHand the postcode covers a different street, the house number is probably correct and the street is wrong, not the other way around.
Single-postcode geocoding
This is the case nothing-but-a-postcode pipelines are built around. The query is just the postcode; the response gives you a usable point.
curl -X POST https://csv2geo.com/api/demo/geocode \
-H 'Content-Type: application/json' \
-d '{"q":"SW1A 2AA","country":"GB"}'Response (trimmed):
{
"results": [
{
"formatted_address": "SW1A 2AA, London, United Kingdom",
"location": { "lat": 51.5014, "lng": -0.1419 },
"accuracy": "postcode",
"accuracy_score": 0.95,
"components": {
"postal_code": "SW1A 2AA",
"city": "London",
"country": "United Kingdom"
}
}
]
}The postcode accuracy level is the right level for this query — there is no house number in the input, so claiming houseNumber precision would be a lie. The point is the postcode unit centroid, which on SW1A 2AA lands inside the property the postcode identifies.
For a full address with house number, the accuracy upgrades:
curl -X POST https://csv2geo.com/api/demo/geocode \
-H 'Content-Type: application/json' \
-d '{"q":"10 Downing Street, London","country":"GB"}'{
"results": [
{
"formatted_address": "10 Downing Street, London SW1A 2AA, United Kingdom",
"location": { "lat": 51.5034, "lng": -0.1276 },
"accuracy": "houseNumber",
"accuracy_score": 1,
"components": {
"house_number": "10",
"street": "Downing Street",
"city": "London",
"postal_code": "SW1A 2AA",
"country": "United Kingdom"
}
}
]
}A composite query with street and postcode resolves to street-level precision at the postcode centroid:
{
"results": [
{
"formatted_address": "Parliament Square, London SW1P 3JX, United Kingdom",
"location": { "lat": 51.5006, "lng": -0.1273 },
"accuracy": "street",
"accuracy_score": 0.92,
"components": {
"street": "Parliament Square",
"city": "London",
"postal_code": "SW1P 3JX",
"country": "United Kingdom"
}
}
]
}The general rule: accuracy_score ≥ 0.9 with accuracy of houseNumber or street is safe to use as-is. postcode matches at ≥ 0.9 are safe for any UK use case where the property granularity does not matter — territory assignment, drive-time isochrones, sales-region rollups, area demographic enrichment. They are not safe for last-mile delivery routing, because the geocoder cannot know which of the 15 properties on the postcode is yours.
UPRN: the stable identifier
The Unique Property Reference Number is a 12-digit identifier issued by GeoPlace (a joint venture between Ordnance Survey and the Local Government Association) for every addressable land or property unit in Great Britain. It exists because the address string is unstable: streets get renamed, properties get re-numbered after demolitions, postcodes get split. The same physical front door can have three different addresses across a decade. UPRNs do not change.
Some properties of UPRNs worth knowing:
- 12 digits, typically right-padded with zeros so most look like
100023336956. - Issued the moment a planning application creates an addressable unit, so they exist before the property does.
- Persist through name and number changes; the UPRN of
1 Acacia Avenuestays the same if it gets renumbered to1A Acacia Avenue. - Free to use under the Open UPRN dataset from Ordnance Survey, which lists every UPRN with a coordinate but no address.
- Not present in Northern Ireland — the equivalent there is the Pointer dataset, which uses
OBJECTIDas a primary key.
CSV2GEO does not currently return UPRN as a field on a geocode response. The workaround when you need UPRN in your output is a two-step:
# Step 1: geocode the address normally
geocoded = csv2geo_forward(address)
lat, lng = geocoded['location']['lat'], geocoded['location']['lng']
# Step 2: look up the UPRN by spatial join against OS Open UPRN
# (Open UPRN is a CSV: UPRN, X (easting), Y (northing) in OSGB36)
uprn = nearest_uprn(lat, lng, max_distance_meters=20)For the spatial join, the practical trick is to pre-load OS Open UPRN into a spatial index (R-tree, PostGIS, or even a sorted geohash array) once, and look up nearest-neighbour by coordinate at runtime. The match radius matters: anything beyond 20m and you risk attaching the UPRN of a neighbouring property, especially in dense terraces where front doors are 5m apart. If your geocoded result is at street accuracy (centroid of a long road), you should not attach a UPRN at all — there is no honest way to pick one of the 200 candidates.
Royal Mail PAF compatibility
The Postcode Address File (PAF) is the canonical Royal Mail database of every deliverable address in the UK. About 32 million records. Updated continuously. PAF is licensed — pricing depends on volume but it is not free, and the licence terms restrict redistribution.
For most pipelines you do not need PAF directly. You need outputs that look like PAF outputs — specifically the field structure (organisation, building name, building number, dependent thoroughfare, thoroughfare, dependent locality, locality, postcode) so downstream systems that expect PAF format are happy. Map your geocoded components to that schema and 95% of integrations work without paying the licence.
When you actually do need PAF (the address has to round-trip through a CASS-style validation, or you are submitting bulk mail to a carrier that requires PAF-clean input), the relevant free alternatives that get you most of the way there:
- OS Open Names — free dataset of place names, postcodes, and roads with coordinates.
- OS Open UPRN — UPRNs and coordinates only, no address strings.
- OS Open USRN — Unique Street Reference Numbers and coordinates.
Combine those three and you have everything except the building number / building name layer. Geocode through the API to fill that layer. The UK's open address coverage is genuinely good — better than most of the world — and a hybrid pipeline that uses OS Open files for stable identifiers and a real geocoder for street-and-number resolution is a solid pattern.
NI vs GB
Northern Ireland is administratively part of the UK but operationally a separate addressing universe. Postcodes start with BT (BT1 through BT94); the Royal Mail PAF covers it but the Ordnance Survey of Great Britain does not. NI's equivalent is Land & Property Services (LPS) and the Pointer dataset, which is the authoritative address database for NI.
Practical differences a pipeline needs to know:
| Feature | GB (England, Scotland, Wales) | NI | |---|---|---| | Postcode range | All except BT | BT1–BT94 | | Property identifier | UPRN (Open UPRN, free) | Pointer OBJECTID (licence required) | | Street identifier | USRN (Open USRN, free) | Pointer street ID | | Open dataset coverage | Excellent (OS Open) | Limited | | Geocoder match rate (typical) | High | Lower | | Postcode density | ~15 properties/postcode | similar |
The headline issue: open data for NI is patchy. If you need rooftop accuracy in Belfast or Derry, expect a noticeably lower match rate than Manchester or Glasgow, and budget for manual review of low-confidence rows. If you only need postcode-level precision, NI is fine — the postcode system itself is identical to GB.
Address parsing edge cases
These are the rows that quietly turn into bad coordinates if you just throw them at a geocoder. Most are solved with pre-geocoding parsing; a few need post-processing.
Flat numbers. Flat 3, 25 High Street, Reading RG1 7BA — the input has two numbers and the geocoder needs to know which is the unit and which is the property. Standard UK convention is Flat N, M Street, with M being the building number on the street. A defensive parse strips Flat \d+,? to a separate unit field before sending the rest as the geocode query, and re-attaches it to the result.
function extractFlat(address) {
const m = address.match(/^(?:Flat|Apartment|Apt|Unit)\s+([\w\d]+),?\s+(.*)$/i);
if (!m) return { unit: null, address };
return { unit: m[1], address: m[2] };
}Building names instead of numbers. Oak Cottage, Mill Lane, Tetbury GL8 8RR. Rural addresses in particular use building names. The geocoder will not find a house number, but the postcode is enough — match against postcode and accept the postcode-centroid coordinate.
Care-of addresses. c/o The White Hart, High Street, Bath BA1 2QP. Strip the c/o ..., prefix entirely. The recipient is at the address; the c/o is delivery instruction, not a location.
Welsh bilingual addresses. Heol y Brenin / King Street, Caerdydd / Cardiff CF10 3DD. Both languages are official in Wales and either is valid on Royal Mail. The geocoder should accept either form. The defensive normaliser splits on / and tries both halves; the first to match with high confidence wins.
Scottish floor/flat notation. Edinburgh and Glasgow tenements use 10/3 Leith Walk, Edinburgh EH6 8LN — that means flat 3 of building 10. The slash is significant and is not a typo. A US-trained parser sees 10/3 and either drops the /3 or treats the whole thing as a fraction. Handle this before geocoding by splitting the slash into building_number=10, flat=3 and sending only the building number plus street to the API.
Postcodes embedded in the road field. When an address parser gets confused, you can end up with the postcode parked inside street. UK postcodes are easy to spot post-parse — the regex above pulls them out reliably. Always re-validate after parsing.
For more on this class of work, the postcode handling guide across countries covers the equivalents in Germany, the Netherlands, Brazil, and elsewhere — many of the same ideas apply with different specifics.
A working batch script
Putting the moving parts together: a Node script that reads a CSV of UK addresses, normalises postcodes, splits flats off as a separate column, geocodes against CSV2GEO, and writes a CSV with lat, lng, accuracy, and error columns. About 80 lines.
// uk-geocode.mjs
import { createReadStream, createWriteStream } from 'node:fs';
import { parse } from 'csv-parse';
import { stringify } from 'csv-stringify';
import pLimit from 'p-limit';
const KEY = process.env.CSV2GEO_KEY;
const limit = pLimit(8);
const UK_POSTCODE = /\b([A-Z]{1,2}\d[A-Z\d]?) ?(\d[A-Z]{2})\b/i;
function normalizePostcode(raw) {
if (!raw) return null;
const m = String(raw).toUpperCase().match(UK_POSTCODE);
return m ? `${m[1]} ${m[2]}` : null;
}
function extractFlat(address) {
const m = address.match(/^(?:Flat|Apartment|Apt|Unit)\s+([\w\d]+),?\s+(.*)$/i);
return m ? { unit: m[1], rest: m[2] } : { unit: null, rest: address };
}
async function geocode(q) {
const url = new URL('https://csv2geo.com/api/v1/geocode');
url.searchParams.set('q', q);
url.searchParams.set('country', 'GB');
const res = await fetch(url, { headers: { Authorization: `Bearer ${KEY}` } });
if (!res.ok) return { error: `http_${res.status}` };
const data = await res.json();
const r = data.results?.[0];
if (!r) return { error: 'no_match' };
if (r.accuracy_score < 0.85) return { error: 'low_confidence' };
return {
lat: r.location.lat,
lng: r.location.lng,
accuracy: r.accuracy,
postcode: r.components?.postal_code ?? null,
error: '',
};
}
const out = createWriteStream('out.csv');
const stringer = stringify({
header: true,
columns: ['id', 'address', 'unit', 'postcode_normalized', 'lat', 'lng', 'accuracy', 'error'],
});
stringer.pipe(out);
const tasks = [];
const parser = createReadStream('in.csv').pipe(parse({ columns: true }));
for await (const row of parser) {
tasks.push(limit(async () => {
const { unit, rest } = extractFlat(row.address || '');
const pc = normalizePostcode(row.postcode || rest);
const query = pc ? `${rest.replace(UK_POSTCODE, '').trim()}, ${pc}` : rest;
const g = await geocode(query);
stringer.write({ id: row.id, address: row.address, unit, postcode_normalized: pc, ...g });
}));
}
await Promise.all(tasks);
stringer.end();
await new Promise(r => out.on('finish', r));Three things this script does that a naive port of a US pipeline does not:
- Normalises the postcode separately from the rest of the address. If the CSV has a
postcodecolumn at all, it wins over whatever the address string contains. - Splits flat numbers off before sending to the API. The flat goes into a separate output column; only the building part is sent for geocoding.
- Sets a conservative confidence threshold (0.85) because UK rejection criteria are stricter than US — a low-confidence postcode match is much more likely to be the wrong postcode rather than an approximate version of the right one.
For larger jobs you should switch to the batch endpoint and process in chunks of 1,000. The pattern from the Node.js tutorial translates one-for-one — same retry logic, same backoff, same rate-limit handling.
Frequently Asked Questions
Why does only-postcode work as a complete query in the UK?
Because of postcode density. With around 15 properties per postcode unit on average — and many large buildings owning a dedicated postcode — the centroid of a UK postcode is a more useful point than most countries' street centroids. Forms in the UK have asked for postcode + house number for thirty years; a postcode-only query is the same shape, just minus the building number. The geocoder is not guessing; it is returning the centroid of a unit that genuinely exists at that location.
Can I get UPRN directly from the CSV2GEO API?
Not currently. The API returns coordinates, address components, and accuracy fields but not UPRN. The workaround is to geocode normally and then spatial-join the result against OS Open UPRN (free, downloadable as CSV). Match within a 20m radius for houseNumber accuracy; do not attempt UPRN attachment for street or postcode accuracy results.
How should I handle Northern Ireland addresses?
Detect them by the BT postcode prefix and route them through the same geocoder — the API handles GB and NI uniformly. Expect a slightly lower match rate at house-number accuracy due to weaker open-data coverage in NI compared to GB. For postcode-level work, NI is indistinguishable from GB in practice.
Should I uppercase postcodes before sending?
Yes, always. Royal Mail's canonical form is uppercase. Geocoders are usually case-insensitive but cache keys are not — sw1a 2aa, SW1A 2AA, and Sw1a 2aa will produce three separate cache entries unless you normalise. Uppercase + single space between halves, every time.
Do the Crown Dependencies (Isle of Man, Jersey, Guernsey) use UK postcodes?
They use UK-format postcodes (IM, JE, GG prefixes respectively) but they are not part of the UK and their data is not in OS or PAF. Geocoder coverage tends to be thinner. Treat them as a separate country in your routing logic and expect higher rejection rates. The Channel Islands in particular have postcode units that map to entire parishes, so postcode-centroid precision is much coarser than mainland UK.
How accurate is geocoding for high-rise flats?
The coordinate is for the building, not the flat. A 200-flat tower geocoded by Flat 47, Tower Block, 25 Marsh Wall, London E14 9BD resolves to a single rooftop point that is correct for all 200 flats. If you need flat-level precision, you need UPRN attachment plus a building floor-plan dataset, which is well outside what any geocoder API provides. For most use cases — territory assignment, drive-time, demographic enrichment — building-level precision is fine.
How do I match my output against PAF format?
Map the geocoder's component fields to PAF's field names: house_number → building_number, street → thoroughfare, city → post_town, and so on. The match is approximate — PAF has fields like dependent_thoroughfare and double_dependent_locality that not every geocoder populates — but for downstream systems that just want to read PAF-shaped CSV, the mapping is good enough. If you need genuine PAF compliance (CASS-style certified output), license PAF directly; the maths-and-meet-spec exercise is what you're paying for.
I.A. / CSV2GEO Creator
Related Articles
- Postcode handling across countries
- Address parsing before geocoding: cleaning inputs
- 200 countries: address formats for geocoding
- Geocoding confidence scores explained
- Address standardization pipelines
Use our batch geocoding tool to convert thousands of addresses to coordinates in minutes. Start with 100 free addresses.
Try Batch Geocoding Free →