5+1 Tips for Quick and Easy Geocoding with CSV2GEO

6 actionable tips to improve geocoding accuracy. Separate columns, country codes, UTF-8 encoding, postal codes, and data cleaning.

By CSV2GEO Team | August 04, 2024

The difference between a 70% match rate and a 98% match rate usually has nothing to do with the geocoding engine — it’s the input data. After processing millions of address files through CSV2GEO, clear patterns emerge: the same mistakes cause the same failures, and a few simple preparation steps eliminate most of them. These five tips (plus one bonus) will help you get the most accurate results from every geocoding job.

Tip 1: Structure Your Data in Separate Columns

The single biggest improvement you can make is splitting your addresses into separate columns: street, city, state, postal code, and country. A single "full address" column forces the geocoder to parse the string and guess where each component begins and ends. Is "Springfield" a city or a street name? Is "200" a house number or part of a postal code? Separate columns remove that guesswork entirely.

Approach            Example                                       Match Rate  Why
Single column       123 Main St Springfield IL 62701              ~85%        Parser must guess component boundaries
Separate columns    123 Main St | Springfield | IL | 62701        ~97%        Each field is unambiguous
Separate + country  123 Main St | Springfield | IL | 62701 | US   ~99%        Country eliminates cross-border confusion

If your data is already in a single column, that’s fine — CSV2GEO handles combined addresses well. But if you have the option to split columns during export from your CRM, ERP, or database, always do it. The accuracy gain is measurable and consistent.
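If the export itself cannot be changed, a combined column can sometimes be split before upload. The following is a minimal sketch, assuming (hypothetically) that every address follows the US pattern "street, city, ST ZIP" with comma separators. The function name and field names are illustrative, and rows that do not fit the pattern are returned as None for manual review rather than guessed at.

```python
import re

# Assumed input pattern (hypothetical): "street, city, ST ZIP", e.g.
# "123 Main St, Springfield, IL 62701". Anything else is left for review.
US_PATTERN = re.compile(
    r"^\s*(?P<street>[^,]+),\s*(?P<city>[^,]+),\s*"
    r"(?P<state>[A-Z]{2})\s+(?P<postal_code>\d{5})\s*$"
)

def split_us_address(full_address, country="US"):
    """Return a dict of separate fields, or None if the pattern does not match."""
    match = US_PATTERN.match(full_address)
    if match is None:
        return None
    fields = {key: value.strip() for key, value in match.groupdict().items()}
    fields["country"] = country
    return fields

row = split_us_address("123 Main St, Springfield, IL 62701")
```

Addresses without commas are exactly the ambiguous case described above, so the sketch refuses to split them instead of guessing.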

Tip 2: Always Include the Country

Springfield exists in 34 US states. Portland appears in Maine and Oregon. Richmond is a city in Virginia, a borough in London, a suburb in Melbourne, and a town in South Africa. Without a country column, the geocoder must guess which one you mean — and it will sometimes guess wrong.

ISO Code  Country         ISO Code  Country  ISO Code  Country
US        United States   DE        Germany  AU        Australia
GB        United Kingdom  FR        France   BR        Brazil
CA        Canada          ES        Spain    JP        Japan
NL        Netherlands     IT        Italy    IN        India

Use the 2-letter ISO 3166-1 alpha-2 code (US, GB, DE, BR) rather than full country names. CSV2GEO accepts both, but ISO codes eliminate spelling variations and language differences — "Deutschland" vs "Germany" vs "Allemagne" all become simply "DE." CSV2GEO covers 200+ countries with 39 offering rooftop-level data.
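Normalizing country names to ISO codes can be done before upload with a small lookup. This is a sketch only: the mapping table below is a hypothetical starting point that you would extend with the spellings found in your own data, and two-letter inputs are upper-cased without being validated against the full ISO list.

```python
# Hypothetical lookup table: extend it with the spellings found in your data.
COUNTRY_TO_ISO = {
    "germany": "DE", "deutschland": "DE", "allemagne": "DE",
    "united states": "US", "usa": "US",
    "united kingdom": "GB", "uk": "GB",
    "france": "FR", "netherlands": "NL", "nederland": "NL",
}

def to_iso_code(value):
    """Map a country name or code to ISO 3166-1 alpha-2, or None if unknown."""
    cleaned = value.strip()
    known = COUNTRY_TO_ISO.get(cleaned.casefold())
    if known is not None:
        return known
    if len(cleaned) == 2 and cleaned.isalpha():
        return cleaned.upper()  # looks like a 2-letter code; not validated
    return None
```

The lookup runs before the two-letter check so that common non-ISO abbreviations such as "UK" are mapped to "GB" rather than passed through.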

Tip 3: Master File Encoding

Encoding problems are invisible until they destroy your results. The most common scenario: you open a UTF-8 CSV in Excel, edit it, and save it, and Excel silently converts it to Windows-1252. Accented characters like ü, ñ, ç, é, and ã turn into garbled sequences (mojibake). "München" becomes "MÃ¼nchen," and the geocoder cannot match it. This affects every language with non-ASCII characters: German, French, Spanish, Portuguese, Turkish, Polish, Czech, and many more.

  • Always save CSVs as UTF-8. In Excel: File > Save As > CSV UTF-8.
  • European CSVs often use semicolons (;) as delimiters instead of commas. CSV2GEO auto-detects this, but verify in the preview step.
  • If you see characters like Ã, Â, or � in the preview, your file has an encoding problem. Re-export from the source as UTF-8.
  • Google Sheets exports as UTF-8 by default — a safe choice when Excel causes issues.

A quick encoding test: open your file in a plain text editor (Notepad++, Sublime Text, VS Code) and check that accented characters display correctly. If they do there, they’ll work in CSV2GEO.
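The same check can be scripted. The sketch below tries UTF-8 first and falls back to Windows-1252, and it includes a helper that reverses one round of the "MÃ¼nchen" style of mojibake. Both function names are hypothetical, and the choice of Windows-1252 as the fallback is an assumption; files from other systems may use a different legacy encoding.

```python
def decode_csv_bytes(raw):
    """Decode CSV bytes as UTF-8, falling back to Windows-1252.

    The fallback encoding is an assumption; other legacy encodings exist.
    Returns (text, encoding_used).
    """
    try:
        # utf-8-sig also strips the byte-order mark that Excel may add
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("cp1252", errors="replace"), "cp1252"

def repair_mojibake(text):
    """Undo one round of UTF-8 text that was mis-read as Windows-1252."""
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of mojibake; leave unchanged
```

Once decoded, the text can be written back out with `encoding="utf-8"` so the uploaded file is unambiguous. Re-exporting from the source system remains the safer fix when it is available.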

Tip 4: Preview Before Processing

CSV2GEO shows you a preview of the first 10 results before processing the full file. This is your quality gate — use it. Check three things in the preview:

[Image: CSV2GEO column mapping preview screen showing address fields auto-detected before batch geocoding]
  1. Column mapping: Verify that CSV2GEO correctly identified which column is Street, City, State, Zip, and Country. The AI detection is accurate but not infallible, especially with unconventional column names.
  2. Relevance scores: Urban addresses should return 0.9–1.0. If preview rows show scores below 0.7, something is wrong — usually a column mapping issue or encoding problem.
  3. Spot-check coordinates: Click a result on the preview map. Does the pin land where you expect? A pin in the wrong city means the country or state mapping is off.

The preview is free and instant. Processing a 50,000-row file only to discover the country column was mapped as "City" wastes time and rows. One minute of preview checking saves hours of re-processing.

Tip 5: Include Postal Codes

Postal codes are the single most powerful disambiguation tool in geocoding. A street name narrows results to a city; a postal code narrows results to a neighborhood or even a specific street block. When your data includes valid postal codes, match rates jump dramatically.

Country         Format                Example    Leading Zero Risk
United States   5 digits (ZIP)        02101      Yes: Boston area starts with 0
United Kingdom  Alphanumeric          SW1A 1AA   No
Germany         5 digits (PLZ)        01067      Yes: eastern Germany starts with 0
Canada          A1A 1A1               K1A 0B1    No
Australia       4 digits              0800       Yes: NT starts with 0 (ACT only in its 02xx range)
Brazil          8 digits (CEP)        01310-100  Yes: São Paulo starts with 0
France          5 digits              01000      Yes: Ain department starts with 0
Netherlands     4 digits + 2 letters  1012 JS    No

The leading zero problem: Excel treats postal code columns as numbers and strips leading zeros. US ZIP 02101 (Boston) becomes 2101, German PLZ 01067 (Dresden) becomes 1067, and Australian postcode 0800 (Darwin) becomes 800. To prevent this, format postal code columns as "Text" in Excel before entering or pasting data, and save as .xlsx so the formatting is kept.

If you receive a file with already-stripped zeros, you may need to pad them back manually: a US ZIP of "2101" should be "02101," a German PLZ of "1067" should be "01067." A simple Excel formula like =TEXT(A1,"00000") handles this for 5-digit postal codes.
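Outside Excel, the same padding can be applied in a script. A minimal sketch, assuming the digit counts from the table in this tip and a hypothetical helper name. It only touches all-digit codes for countries it knows about, so alphanumeric formats such as UK or Canadian codes pass through unchanged.

```python
# Postal code lengths for a few all-digit formats listed in this tip.
POSTAL_CODE_DIGITS = {"US": 5, "DE": 5, "FR": 5, "AU": 4}

def pad_postal_code(code, country):
    """Restore leading zeros stripped by a spreadsheet, when the rule is known."""
    cleaned = str(code).strip()
    width = POSTAL_CODE_DIGITS.get(country.strip().upper())
    if width is None or not cleaned.isdigit():
        return cleaned  # unknown country or alphanumeric code: leave as-is
    return cleaned.zfill(width)
```

For example, `pad_postal_code("2101", "US")` gives "02101" and `pad_postal_code("800", "AU")` gives "0800".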

Bonus +1: Clean Out Non-Address Data

Real-world address data is messy. Fields that should contain only addresses often include delivery instructions, suite designations, attention lines, and notes that confuse geocoders. Common offenders:

  • "Attn: John Smith" prepended to the street line
  • "Leave at back door" appended to the address
  • "Suite 400, Floor 3" mixed into the street field
  • "c/o Company Name" in the address line
  • Phone numbers or email addresses in address fields

The street field should contain only the street address. Everything else — suite numbers, attention lines, delivery instructions — should go in a separate "address line 2" column or be removed. CSV2GEO ignores the second address line during geocoding, making it a safe place to store extra information without affecting accuracy.
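Some of this cleanup can be automated. The sketch below moves a few recognizable patterns (attention lines, email addresses, phone numbers, unit designations) out of the street field and into a second value. The patterns and the function name are illustrative assumptions, not a complete rule set, and free-text notes such as "Leave at back door" generally still need manual review.

```python
import re

# Illustrative patterns only; real data will need more (and more careful) rules.
ATTENTION = re.compile(r"^\s*(attn:?|c/o)\s+[^,]+,\s*", re.IGNORECASE)
EMAIL = re.compile(r"\S+@\S+\.\S+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
UNIT = re.compile(r",?\s*\b(suite|ste|floor|apt|unit)\.?\s+\w+", re.IGNORECASE)

def clean_street(raw):
    """Split a messy street field into (street, extras) for two columns."""
    extras = []
    street = raw
    for pattern in (ATTENTION, EMAIL, PHONE, UNIT):
        # Record what is removed so nothing is silently lost
        extras.extend(m.group(0).strip(" ,") for m in pattern.finditer(street))
        street = pattern.sub(" ", street)
    street = re.sub(r"\s+", " ", street).strip(" ,")
    return street, "; ".join(extras)
```

The second value can be written to an "address line 2" column, which keeps the information available without putting it in front of the geocoder.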

Why Data Quality Matters More Than You Think

A geocoding engine can only be as good as the data it receives. Even the most sophisticated matching algorithm cannot find a correct location for an address that contains typos, transposed digits, or invented street names. Common data quality issues include: misspelled street names ("Brodway" instead of "Broadway"), transposed postal code digits ("10012" instead of "10021"), and outdated addresses where streets have been renamed or buildings demolished.

For large datasets, running a simple validation pass before geocoding saves significant time. Check that postal codes have the correct number of digits for their country, that state codes are valid, and that no rows have obviously incomplete data (city without a street, or a postal code without a country). A 5-minute validation step can prevent hours of re-processing on a 100,000-row file.
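A validation pass of this kind might look like the following sketch. The column names (street, city, postal_code, country) and the small set of postal code patterns are assumptions for illustration. A real pass would cover the countries in your file and would likely add state-code checks.

```python
import re

# Illustrative postal code patterns for a few countries.
POSTAL_PATTERNS = {
    "US": re.compile(r"^\d{5}(-\d{4})?$"),
    "DE": re.compile(r"^\d{5}$"),
    "FR": re.compile(r"^\d{5}$"),
    "AU": re.compile(r"^\d{4}$"),
    "NL": re.compile(r"^\d{4} ?[A-Z]{2}$"),
}

def validate_row(row):
    """Return a list of problems found in one address row (a dict of strings)."""
    problems = []
    country = row.get("country", "").strip().upper()
    postal = row.get("postal_code", "").strip()
    if not country:
        problems.append("missing country")
    if row.get("city", "").strip() and not row.get("street", "").strip():
        problems.append("city without street")
    pattern = POSTAL_PATTERNS.get(country)
    if pattern and postal and not pattern.match(postal):
        problems.append("postal code format does not match country")
    return problems
```

Running this over every row of a file read with `csv.DictReader` and collecting the non-empty results gives a short list of rows to fix before any geocoding quota is spent.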

CSV2GEO’s relevance scores help you identify which rows had data quality issues after processing. But catching problems before processing is always more efficient than diagnosing them after. When match rates drop below 90%, the input data is almost always the cause — not the geocoding engine.

Advanced: Reading Relevance Scores

Every geocoded result includes a relevance score between 0 and 1. Understanding what these scores mean helps you identify which results need manual review.

Score Range  Meaning                                     Action
0.95–1.0     Exact match, rooftop level                  No action needed. Coordinates are precise.
0.85–0.94    Strong match, street level                  Usually correct. May be interpolated between known house numbers.
0.70–0.84    Partial match, postal code or city level    Review these. May indicate a missing or incorrect street name.
0.50–0.69    Weak match, region level                    Likely wrong or very imprecise. Check input data for errors.
Below 0.50   Poor match or no match                      Data quality issue. Address may be incomplete or invalid.

A useful post-processing strategy: filter your results by relevance score. Process the 0.95+ results immediately, manually review the 0.70–0.94 range, and flag anything below 0.70 for data cleaning and re-geocoding. For large files, this triage approach is far more efficient than reviewing every single result. Most files have 85–95% of rows scoring 0.95+, meaning only 5–15% need any attention at all.
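The triage described above is simple to script against an exported results file. A minimal sketch, assuming each result row is a dict and that the score sits in a field named "relevance" (the field name and bucket labels are hypothetical; check the column header in your own output).

```python
def triage(relevance):
    """Bucket a result by relevance score using the thresholds described above."""
    if relevance >= 0.95:
        return "accept"
    if relevance >= 0.70:
        return "review"
    return "clean_and_retry"

def triage_rows(rows, score_field="relevance"):
    """Group result rows (dicts) by bucket; the field name is an assumption."""
    buckets = {"accept": [], "review": [], "clean_and_retry": []}
    for row in rows:
        buckets[triage(float(row[score_field]))].append(row)
    return buckets
```

The "review" bucket is the one to inspect by hand, and the "clean_and_retry" rows go back through the preparation steps before being geocoded again.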

Real-World Example: Before and After

To illustrate how these tips work in practice, consider a real scenario: a logistics company with 25,000 customer addresses spread across 12 European countries. Their first geocoding attempt produced a 72% match rate. After applying these tips, the same dataset hit 96%.

Change Made                        Before                             After                             Match Rate Impact
Added country column (ISO codes)   No country field                   DE, FR, NL, ES, IT, etc.          +12% (72% → 84%)
Fixed encoding to UTF-8            Windows-1252 with mojibake         Clean UTF-8                       +4% (84% → 88%)
Separated combined address column  Single "address" field             Street, city, state, postal code  +5% (88% → 93%)
Padded postal code leading zeros   "1067" (Dresden)                   "01067"                           +2% (93% → 95%)
Removed delivery instructions      "Leave with neighbor" in address   Clean street only                 +1% (95% → 96%)

Each individual tip made a modest improvement. Combined, they transformed an unusable dataset into production-quality geocoded data. The total effort was about 2 hours of data cleaning — far less than the time that would have been spent manually correcting 7,000 failed matches.
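The 7,000 figure follows directly from the numbers in the example: 28% of 25,000 rows failed at a 72% match rate. The short sketch below reproduces that arithmetic with integer percentages to avoid floating-point rounding; the function name is illustrative.

```python
def failed_rows(total_rows, match_rate_percent):
    """Rows left unmatched at a given match rate (integer percent)."""
    return total_rows * (100 - match_rate_percent) // 100

before = failed_rows(25_000, 72)  # unmatched rows at the initial 72% match rate
after = failed_rows(25_000, 96)   # unmatched rows at the final 96% match rate
```

At 96%, the same calculation leaves 1,000 unmatched rows, so the cleanup removed 6,000 rows from the manual-correction pile.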

Putting It All Together

  1. Structure: Split addresses into separate columns for street, city, state, postal code, and country. This alone improves match rates by 10–15%.
  2. Country: Always include a country column with ISO 2-letter codes (US, GB, DE). Eliminates cross-border ambiguity entirely.
  3. Encoding: Save as UTF-8. Check for mojibake in preview. European files may use semicolon delimiters.
  4. Preview: Check column mapping, relevance scores, and pin locations before processing the full file.
  5. Postal Codes: Include them. Format as Text to preserve leading zeros. They dramatically narrow the search area.
  +1. Clean Data: Remove delivery instructions, attention lines, and non-address data from address fields.

Apply these tips to your next batch geocoding job. Upload your file, check the preview, and process. 100 rows per day free, no credit card. For larger volumes or programmatic access, the geocoding API provides 18 endpoints with 1,000 free requests per day.

Frequently Asked Questions

How many countries does CSV2GEO support?

CSV2GEO covers 200+ countries with 461M+ addresses in the database. 39 countries have rooftop-level data for the highest precision. The database includes both major markets and developing regions.

What is the best file format for geocoding?

CSV saved as UTF-8 or Excel .xlsx. Both preserve data accurately. Avoid .xls (legacy format) and tab-delimited files unless necessary. CSV2GEO accepts CSV, TSV, XLS, and XLSX.

How do I fix leading zeros in postal codes?

In Excel, select the postal code column, right-click > Format Cells > Text, then re-enter the values. For new files, format the column as Text before pasting data. Alternatively, save as .xlsx which preserves formatting.

What does a relevance score of 0.8 mean?

A score of 0.8 falls in the 0.70–0.84 band: a partial match, typically resolved to postal code or city level rather than to a specific street or rooftop. The coordinates may be close but are not precise. Review these rows, since a score in this range often points to a missing or incorrect street name.

Can I geocode addresses in any language?

Yes. CSV2GEO handles addresses in their local language. German addresses in German, Japanese in Japanese, Arabic in Arabic. UTF-8 encoding is essential for non-Latin scripts.

How many addresses can I geocode for free?

100 rows per day via file upload, 1,000 requests per day via the API. No credit card required.

Should I use a single address column or separate columns?

Separate columns are always better. Street, city, state, postal code, and country in their own columns give the geocoder unambiguous input. A single combined column works but expect 10–15% lower match rates.

How do I geocode addresses with special characters?

Save your file as UTF-8 encoding. CSV2GEO automatically normalizes accented characters (umlauts, tildes, cedillas) during matching. "München" and "Munchen" both resolve to Munich, Germany. The key is preventing mojibake (garbled characters) by using the correct encoding.

What is the difference between a relevance score and a confidence score?

In CSV2GEO, the relevance score (0–1.0) indicates how closely the geocoded result matches your input. A score of 1.0 means exact rooftop match. Lower scores indicate the geocoder used broader matching (street-level, postal code centroid, or city centroid). Think of it as a precision indicator for each individual result.

More Resources

For step-by-step file geocoding instructions, see the CSV geocoding guide or the Excel geocoding guide. To convert individual addresses, try address to lat long.

Need help? Visit our Help center or contact us.

I.A.

CSV2GEO Creator
