How to Geocode Large Files: Complete Guide for 10K to 1M+ Rows

How to geocode large files from 10K to 1M+ rows. Preparation, processing times, forward and reverse geocoding, pricing, and API batch endpoints.

By CSV2GEO Team | August 04, 2024

Most geocoding tutorials assume you have a few hundred addresses. But real-world datasets are rarely that small. Customer databases, mailing lists, property records, and logistics files routinely contain tens of thousands to millions of rows. Geocoding at this scale requires different preparation, different expectations for processing time, and awareness of issues that simply don’t arise with small files.

This guide covers everything you need to know about geocoding large files — from 10,000 to over 1,000,000 rows. It applies to both forward geocoding (addresses to coordinates) and reverse geocoding (coordinates to addresses). Whether you’re processing a corporate address database or a national property registry, the same principles apply.

File Size Categories

| Category | Row count | Typical use case | Estimated time |
|---|---|---|---|
| Small | Under 1,000 | Store locators, event venues, contact lists | Under 1 minute |
| Medium | 1,000–50,000 | Customer databases, mailing lists, CRM exports | 1–10 minutes |
| Large | 50,000–500,000 | Property records, insurance portfolios, logistics | 10–60 minutes |
| Enterprise | 500,000+ | National registries, census data, fleet management | 1–4 hours |

Processing time depends on data quality as much as row count. Per row, a clean 500,000-row file with separate address columns and country codes processes faster than a messy 100,000-row file with combined address strings and missing fields. Preparation is the highest-leverage step.

Preparation: Before You Upload

Large file geocoding is expensive in time and credits. Spending 15 minutes on preparation can save hours of re-processing. Follow this checklist before uploading any file over 10,000 rows.

Save as UTF-8. Encoding errors (mojibake) garble accented characters in international addresses. In Excel, save your CSV via File > Save As > CSV UTF-8. For .xlsx files, encoding is handled automatically.
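If a file has already been saved in a legacy encoding, it can also be converted outside Excel. Below is a minimal Python sketch. It assumes the source is Windows-1252 (`cp1252`, a common default for older Windows exports), so confirm what your system actually produced before converting. Bytes that already decode as UTF-8 pass through unchanged, so running it twice does no harm.

```python
from __future__ import annotations

from pathlib import Path


def to_utf8(raw: bytes, source_encoding: str = "cp1252") -> tuple[bytes, bool]:
    """Return (utf8_bytes, converted). Input that is already UTF-8 is returned unchanged."""
    try:
        raw.decode("utf-8")
        return raw, False
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the legacy encoding (cp1252 is a guess, not a certainty).
        return raw.decode(source_encoding).encode("utf-8"), True


def reencode_file(src: str, dst: str, source_encoding: str = "cp1252") -> bool:
    """Write a UTF-8 copy of src to dst; returns True if a conversion happened."""
    # Reading bytes (not text mode) keeps the original line endings untouched.
    data, converted = to_utf8(Path(src).read_bytes(), source_encoding)
    Path(dst).write_bytes(data)
    return converted
```

If the second decode raises `UnicodeDecodeError`, the cp1252 guess was wrong. A clean run does not prove the guess was right, so spot-check a few accented names afterwards.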

Use consistent columns. Every row should use the same columns in the same order. Mixed formats — some rows with separate street/city/state, others with a single combined column — cause mapping errors. Standardize before uploading.
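A quick way to catch rows that don't match the header is to compare each row's field count with the header's. The sketch below uses only Python's standard `csv` module and reports physical line numbers, so you can jump straight to the problem rows in a text editor. Blank lines are reported too, because they have zero fields.

```python
import csv
from typing import Iterable, List, Tuple


def find_ragged_rows(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Return (line_number, field_count) for every row whose width differs from the header."""
    reader = csv.reader(lines)
    header = next(reader)
    problems = []
    for row in reader:
        if len(row) != len(header):
            # line_num counts physical lines read so far, i.e. the line where this row ends.
            problems.append((reader.line_num, len(row)))
    return problems


# Usage (file name is a placeholder):
# with open("addresses.csv", newline="", encoding="utf-8") as f:
#     for line_no, width in find_ragged_rows(f):
#         print(f"line {line_no}: {width} fields")
```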

Include country codes. For files with addresses from multiple countries, a country column with ISO 2-letter codes (US, GB, DE, BR) is essential. Without it, "Springfield" could match a Springfield in any of 34 US states, or a place in another country. CSV2GEO covers 200+ countries.
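A rough audit of the country column can run before upload. This sketch assumes the column is named `country` (a placeholder). It only checks that each value is shaped like a two-letter code, so it catches full names such as "Germany" and three-letter codes such as "USA", but it will not catch a well-formed code that isn't assigned to any country.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

TWO_LETTER = re.compile(r"[A-Z]{2}")


def audit_country_column(rows: Iterable[Dict[str, str]],
                         column: str = "country") -> Tuple[int, Counter]:
    """Count blank country values and values not shaped like ISO 2-letter codes.

    Only the shape is checked: "XX" passes even though it is not an assigned code.
    """
    blank = 0
    suspicious: Counter = Counter()
    for row in rows:
        value = (row.get(column) or "").strip().upper()
        if not value:
            blank += 1
        elif not TWO_LETTER.fullmatch(value):
            suspicious[value] += 1  # e.g. "USA" or "GERMANY": map these to "US", "DE"
    return blank, suspicious


# Usage (file name is a placeholder):
# import csv
# with open("addresses.csv", newline="", encoding="utf-8") as f:
#     print(audit_country_column(csv.DictReader(f)))
```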

Remove blank rows. Large exports from databases often contain blank rows, header rows repeated mid-file, or summary rows at the bottom. These create failed geocoding attempts and waste processing time. Filter them out before uploading.
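Blank rows and repeated header rows can be dropped mechanically. This sketch streams rows, keeps the first header, and skips rows that are empty (including rows of bare commas) or that exactly repeat the header. Summary rows such as totals vary too much between exports to detect generically, so check the bottom of the file by eye.

```python
import csv
from typing import Iterable, Iterator, List


def clean_rows(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the header, then data rows, skipping blank rows and mid-file header repeats."""
    reader = csv.reader(lines)
    header = next(reader)
    normalized_header = [h.strip() for h in header]
    yield header
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # empty line or a row of bare separators such as ",,,"
        if cells == normalized_header:
            continue  # header repeated mid-file, common when exports are concatenated
        yield row


# Usage (file names are placeholders); rows are streamed, not held in memory:
# with open("export.csv", newline="", encoding="utf-8") as fin, \
#         open("export_clean.csv", "w", newline="", encoding="utf-8") as fout:
#     csv.writer(fout).writerows(clean_rows(fin))
```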

Run a pilot file. Extract the first 50–100 rows and upload them as a test. Check column mapping, relevance scores, and spot-check coordinates on the map. Fix any issues before processing the full file. This pilot is free (100 rows/day).
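Cutting the pilot file can be scripted. This is a minimal sketch that keeps the header plus the first `n` data rows (100 by default, matching the free daily allowance). The function names are illustrative.

```python
import csv
from itertools import islice
from typing import Iterable, List


def pilot_rows(lines: Iterable[str], n: int = 100) -> List[List[str]]:
    """Return the header plus the first n data rows."""
    reader = csv.reader(lines)
    header = next(reader)
    return [header] + list(islice(reader, n))


def write_pilot(src: str, dst: str, n: int = 100) -> None:
    # newline="" lets the csv module handle quoting and line endings itself.
    with open(src, newline="", encoding="utf-8") as fin, \
            open(dst, "w", newline="", encoding="utf-8") as fout:
        csv.writer(fout).writerows(pilot_rows(fin, n))
```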

Forward Geocoding: Addresses to Coordinates

Forward geocoding converts street addresses into latitude/longitude coordinates. This is the most common geocoding operation for large files.

  1. Prepare your file following the checklist above. Separate columns for street, city, state, postal code, and country produce the best results.
  2. Upload to csv2geo.com/batchgeocoding. CSV2GEO accepts CSV, TSV, XLS, and XLSX files.
  3. Verify the AI column mapping. For large files, this step is critical — a wrong mapping on 500,000 rows is a costly mistake.
  4. Preview the first 10 results. Check relevance scores and verify pin locations on the map.
  5. Process the full file. You can close the browser tab — processing continues on the server. Come back later to download results.

The output file includes all your original columns plus latitude, longitude, relevance score, and matched address. Download it as CSV or Excel.

Reverse Geocoding: Coordinates to Addresses

Reverse geocoding converts latitude/longitude coordinates into street addresses. This is common for GPS data, IoT device logs, and datasets that have coordinates but need human-readable addresses.

  1. Prepare a file with latitude and longitude columns. Column names like "lat," "latitude," "lng," "longitude," "lon" are auto-detected.
  2. Upload to csv2geo.com/reversegeocoding or use the batch geocoding tool and toggle to Reverse mode.
  3. Verify column mapping — ensure latitude and longitude are correctly identified.
  4. Preview and process. The output includes the nearest street address, city, state, postal code, and country for each coordinate pair.
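Before uploading coordinates, it's worth checking that every pair is usable. This sketch flags three kinds of row. The first is a row where latitude or longitude is missing or non-numeric. The second is a pair outside the valid ranges (latitude -90 to 90, longitude -180 to 180), which often means the columns were swapped. The third is a pair of exactly 0,0. The default column names `latitude` and `longitude` are placeholders for your own headers.

```python
from typing import Dict, Iterable, List, Tuple


def invalid_coordinates(rows: Iterable[Dict[str, str]], lat_col: str = "latitude",
                        lng_col: str = "longitude") -> List[Tuple[int, str]]:
    """Return (data_row_number, reason) for rows that cannot be reverse geocoded as-is."""
    problems = []
    for i, row in enumerate(rows, start=1):
        try:
            lat = float(row[lat_col])
            lng = float(row[lng_col])
        except (KeyError, TypeError, ValueError):
            problems.append((i, "missing or non-numeric"))
            continue
        if not (-90 <= lat <= 90 and -180 <= lng <= 180):
            # Latitude beyond +/-90 often means the two columns were swapped.
            problems.append((i, "out of range"))
        elif lat == 0 and lng == 0:
            problems.append((i, "0,0 placeholder"))
    return problems
```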

Reverse geocoding is particularly useful for fleet management (converting GPS tracks to addresses), mobile app data (where users checked in), and property data (converting parcel centroids to street addresses). Large reverse geocoding files are common in logistics: a fleet of 200 delivery trucks generating GPS pings every 30 seconds creates millions of coordinate pairs per month that need to be converted to street addresses for reporting and route analysis.
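To see how quickly "millions" arrives, here is the fleet arithmetic as a small Python function. The 8-hour shifts and 22 working days per month are assumptions for illustration; the example above only specifies 200 trucks and 30-second pings.

```python
def pings_per_month(trucks: int, interval_s: float, hours_per_day: float, days: int) -> int:
    """Coordinate pairs a fleet produces in a month if each truck reports every interval_s seconds."""
    pings_per_truck_per_day = hours_per_day * 3600 / interval_s
    return round(trucks * pings_per_truck_per_day * days)
```

Under those assumptions, each truck sends 960 pings a day and the fleet produces 4,224,000 coordinate pairs a month. Running around the clock for 30 days would produce 17,280,000.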

Processing Time Estimates

| Row count | Data quality | Estimated time | Notes |
|---|---|---|---|
| 10,000 | Clean, separate columns | 1–2 minutes | Typical CRM export |
| 10,000 | Combined address string | 2–4 minutes | Parser adds overhead |
| 50,000 | Clean, with country codes | 5–10 minutes | Standard commercial dataset |
| 100,000 | Mixed quality | 15–30 minutes | Some rows need fallback strategies |
| 250,000 | Clean | 30–45 minutes | Insurance portfolios, property lists |
| 500,000 | Clean | 45–90 minutes | National-scale datasets |
| 1,000,000 | Clean | 2–4 hours | Enterprise; consider splitting into batches |

These are estimates. Actual times depend on server load, address complexity, and how many rows require fallback geocoding strategies. Files with mostly US, UK, German, or Australian addresses process faster because these countries have the deepest databases (the US alone has over 150M addresses in CSV2GEO). Files with addresses from countries with sparser coverage may take longer per row as the system tries multiple matching strategies.
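As a rough rule of thumb, the clean-data rows in the table above work out to about 4,000 to 10,000 rows per minute. The helper below turns that into a time range. The rates are derived from these published estimates and are not a guarantee. For messier files (the table shows combined address strings taking about twice as long), use roughly half the rates.

```python
from typing import Tuple


def estimate_minutes(rows: int, fast_rate: float = 10_000,
                     slow_rate: float = 4_000) -> Tuple[float, float]:
    """Return a (best_case, worst_case) processing time in minutes.

    Rates are rows per minute, read off the clean-data estimates in the table;
    they are a rough planning aid, not a service guarantee.
    """
    return rows / fast_rate, rows / slow_rate
```

For example, `estimate_minutes(500_000)` gives 50 to 125 minutes, somewhat wider than the table's 45–90 because the rates are averaged across file sizes.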

You do not need to keep your browser open during processing. Upload the file, verify the preview, start processing, and close the tab. The job runs on the server and your results will be ready when you come back. You can check progress anytime from your dashboard.

Common Issues with Large Files

Large files surface problems that small test files hide. At 10,000+ rows, the statistical likelihood of encountering every type of data quality issue approaches certainty. Here are the most common problems specific to large-file geocoding:

Mixed address formats. A 200,000-row file often comes from merging multiple source databases. Rows 1–50,000 might have separate columns; rows 50,001–200,000 might have combined addresses. The column mapping that works for the first set fails for the second. Check for consistency before uploading.
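One way to spot a merged file before uploading is to profile which address columns are filled in each row. In this sketch the column names are placeholders. It counts each distinct combination of filled columns and records the first row where that combination appears, so a file that switches from separate columns to a single combined column shows up as two large patterns.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple

DEFAULT_COLUMNS = ("address", "street", "city", "postal_code")  # placeholder names


def format_profile(rows: Iterable[Dict[str, str]],
                   columns: Sequence[str] = DEFAULT_COLUMNS) -> Tuple[Counter, Dict[tuple, int]]:
    """Count each pattern of filled address columns and note the first row that uses it."""
    patterns: Counter = Counter()
    first_row: Dict[tuple, int] = {}
    for i, row in enumerate(rows, start=1):
        filled = tuple(c for c in columns if (row.get(c) or "").strip())
        patterns[filled] += 1
        first_row.setdefault(filled, i)
    return patterns, first_row
```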

Multiple countries without a country column. International datasets frequently omit the country field. The geocoder must guess the country from context, which works for distinctive addresses but fails for ambiguous ones. Adding a country column is the single highest-impact fix for international files.

Encoding corruption in the middle of the file. A file can start as clean UTF-8 but contain corrupted rows deep in the middle — often from copy-paste operations or database exports that mixed encodings. The preview (first 10 rows) looks fine, but row 87,432 has garbled characters. Scan the full file in a text editor if you suspect encoding issues.
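A text editor can be slow on a file with hundreds of thousands of rows. This Python sketch reads the file as bytes, line by line. It reports lines that are not valid UTF-8, and lines that contain U+FFFD, the replacement character some tools insert after a failed decode. It will not catch mojibake such as "Ã©" in place of "é", because those characters are themselves valid UTF-8.

```python
from typing import Iterable, List, Tuple


def find_bad_utf8_lines(lines: Iterable[bytes]) -> List[Tuple[int, str]]:
    """Report (line_number, problem) for lines that are not clean UTF-8.

    Pass a file opened in binary mode ("rb"); iterating it yields one line of bytes at a time.
    """
    problems = []
    for line_no, raw in enumerate(lines, start=1):
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((line_no, f"invalid byte at offset {exc.start}"))
            continue
        if "\ufffd" in text:
            # U+FFFD is what many tools substitute for bytes they could not decode.
            problems.append((line_no, "contains U+FFFD replacement character"))
    return problems
```

Typical use: `with open("addresses.csv", "rb") as f: print(find_bad_utf8_lines(f)[:20])`, with the file name as a placeholder.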

Duplicate rows. Large exports often contain duplicates. Geocoding 50,000 duplicates wastes processing time and credits. De-duplicate before uploading. A simple "Remove Duplicates" in Excel or a pandas drop_duplicates() in Python handles this.
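Exact-match tools miss duplicates that differ only in case or spacing ("123 Main St" vs "123  main st "). The sketch below normalizes values before comparing and keeps the first occurrence of each address. `key_columns` would be your address columns.

```python
from typing import Dict, Iterable, List, Sequence

Row = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse runs of whitespace so trivial variants compare equal."""
    return " ".join(value.split()).lower()


def dedupe_rows(rows: Iterable[Row], key_columns: Sequence[str]) -> List[Row]:
    """Keep the first row for each normalized address; later near-duplicates are dropped."""
    seen = set()
    unique: List[Row] = []
    for row in rows:
        key = tuple(normalize(row.get(c) or "") for c in key_columns)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique
```

Every original row may still need coordinates, for example when several customers share one address. In that case, one option is to geocode only the unique addresses and join the results back on the same normalized key.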

Output Verification Checklist

After processing a large file, do not assume all results are correct. Run these checks:

  • Relevance score distribution: How many rows scored above 0.9? Above 0.7? Below 0.5? A healthy large file should have 85–95% above 0.9.
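  • Coordinate sanity check: Are all latitudes and longitudes within expected bounds? Addresses in the contiguous US should have latitudes between roughly 24 and 50 and longitudes between -125 and -66. Alaska, Hawaii, and US territories fall outside this box, so check them separately.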
  • Country mismatches: Sort by country in the output. Do unexpected countries appear? This indicates parsing errors in the input.
  • Zero coordinates: Filter for lat=0 and lng=0. These indicate complete geocoding failures (the infamous "null island" off the coast of West Africa).
  • Duplicate coordinates: Many rows returning the exact same lat/lng may indicate they all matched to a city centroid rather than individual addresses.
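Most of this checklist can be automated. The sketch below assumes output columns named `latitude`, `longitude`, and `relevance`. These are placeholders, so match them to the headers in your download. It does four things:

- buckets relevance scores;
- counts rows with empty or 0,0 coordinates;
- optionally counts points outside a bounding box;
- lists coordinates shared by more than one row.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def verify_output(rows: Iterable[Dict[str, str]], lat_col: str = "latitude",
                  lng_col: str = "longitude", score_col: str = "relevance",
                  bbox: Optional[Tuple[float, float, float, float]] = None) -> dict:
    """Summarize a geocoded file. bbox is (min_lat, max_lat, min_lng, max_lng)."""
    total = failed = out_of_bounds = 0
    buckets: Counter = Counter()
    points: Counter = Counter()
    for row in rows:
        total += 1
        try:
            score = float(row.get(score_col) or 0)
        except ValueError:
            score = 0.0
        if score > 0.9:
            buckets["> 0.9"] += 1
        elif score > 0.7:
            buckets["0.7-0.9"] += 1
        elif score >= 0.5:
            buckets["0.5-0.7"] += 1
        else:
            buckets["< 0.5"] += 1
        try:
            lat = float(row.get(lat_col) or "")
            lng = float(row.get(lng_col) or "")
        except ValueError:
            failed += 1  # empty or non-numeric coordinates
            continue
        if lat == 0 and lng == 0:
            failed += 1  # "null island": a failure, not a real location
            continue
        if bbox is not None:
            min_lat, max_lat, min_lng, max_lng = bbox
            if not (min_lat <= lat <= max_lat and min_lng <= lng <= max_lng):
                out_of_bounds += 1
        # Rounding groups the same point even if it was written with different precision.
        points[(round(lat, 5), round(lng, 5))] += 1
    return {
        "total": total,
        "share_above_0.9": buckets["> 0.9"] / total if total else 0.0,
        "score_buckets": dict(buckets),
        "failed": failed,
        "out_of_bounds": out_of_bounds,
        # Many rows on one point can mean city-centroid matches rather than street addresses.
        "repeated_points": [(p, n) for p, n in points.most_common(10) if n > 1],
    }
```

For example, `verify_output(csv.DictReader(f), bbox=(24, 50, -125, -66))` applies the contiguous-US bounds, with `f` opened using `newline=""` and `encoding="utf-8"`.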

Forward vs Reverse: When to Use Each

Forward and reverse geocoding serve fundamentally different needs, and large files often benefit from both. Understanding when to use each saves time and produces better results.

Use forward geocoding when you have street addresses and need coordinates. This is the standard operation for customer databases, mailing lists, property records, and any dataset where humans entered address information. The output adds lat/lng columns to your existing data.

Use reverse geocoding when you have coordinates and need human-readable addresses. This is common for GPS tracking data (fleet vehicles, field workers, delivery drivers), IoT sensor logs, satellite imagery analysis, and datasets exported from GIS systems that store locations as coordinate pairs.

Some large projects need both. A national retailer might forward-geocode their store addresses for a store locator, then reverse-geocode customer GPS check-in data to understand which neighborhoods their customers come from. CSV2GEO handles both directions with the same file upload workflow — just toggle between Forward and Reverse mode.

Pricing for Large Volumes

| Tier | Rows | Price | Per row |
|---|---|---|---|
| Free | 100/day | $0 | $0.000 |
| Starter | 5,000 | $15 | $0.003 |
| Standard | 25,000 | $45 | $0.0018 |
| Professional | 100,000 | $95 | $0.00095 |
| Enterprise | 500,000 | $295 | $0.00059 |

Volume pricing decreases per-row cost significantly. Processing 500,000 rows costs less than $0.001 each. For volumes above 500,000 or recurring processing needs, contact us for custom pricing.
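For a single file, the tier and the effective per-row cost follow directly from the table. This sketch picks the smallest paid tier whose allowance covers the row count. It assumes one tier purchase per file. It also ignores the free 100 rows/day and whether tiers can be combined, which the pricing table doesn't address.

```python
from typing import Optional, Tuple

# (tier, rows included, price in USD), as listed in the pricing table
TIERS = [
    ("Starter", 5_000, 15),
    ("Standard", 25_000, 45),
    ("Professional", 100_000, 95),
    ("Enterprise", 500_000, 295),
]


def smallest_tier(rows: int) -> Optional[Tuple[str, int, float]]:
    """Return (tier, price, effective cost per row) for the smallest tier covering rows."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    for name, included, price in TIERS:
        if rows <= included:
            return name, price, price / rows
    return None  # above 500,000 rows: custom pricing, per the text above
```

An 80,000-row file, for instance, lands in the Professional tier: $95, or about $0.0012 per row actually geocoded.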

461M+ total addresses | 200+ countries | 1M+ max rows per file

Optimizing Large File Performance

Beyond data quality, several strategies can reduce processing time for large files and improve overall results.

  • Sort by country before uploading. Files with addresses grouped by country process more efficiently because the geocoder can batch queries to the same country database without switching context.
  • Remove known-bad rows. If you know certain rows have incomplete addresses (missing street names, PO boxes, "N/A" placeholders), filter them out before uploading. Geocoding them wastes time and credits.
  • Use the pilot file strategy. Extract 100 rows that represent the variety in your dataset (different countries, urban and rural, complete and sparse). Geocode this pilot first. If the results look good, process the full file with confidence.
  • Download results promptly. Large result files are available for download for a limited time after processing. Download them as soon as processing completes to avoid needing to re-process.
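The sorting and pilot-sampling tips can be combined. This is a sketch that assumes a `country` column (a placeholder name). It draws up to `n` rows round-robin across countries, so a handful of rows from a small market isn't crowded out by a large one, and it returns them sorted by country. A fixed seed keeps the pilot reproducible between runs.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List


def country_key(row: Dict[str, str], column: str = "country") -> str:
    """Normalized country value used for both grouping and sorting."""
    return (row.get(column) or "").strip().upper()


def stratified_pilot(rows: Iterable[Dict[str, str]], n: int = 100,
                     column: str = "country", seed: int = 42) -> List[Dict[str, str]]:
    """Draw up to n rows spread across countries, returned sorted by country."""
    by_country: Dict[str, List[Dict[str, str]]] = defaultdict(list)
    for row in rows:
        by_country[country_key(row, column)].append(row)
    rng = random.Random(seed)  # fixed seed makes the pilot reproducible
    # Shuffle each country's rows, then take one row per country per pass (round-robin).
    pools = {c: rng.sample(r, len(r)) for c, r in sorted(by_country.items())}
    pilot: List[Dict[str, str]] = []
    while len(pilot) < n and any(pools.values()):
        for pool in pools.values():
            if pool and len(pilot) < n:
                pilot.append(pool.pop())
    return sorted(pilot, key=lambda r: country_key(r, column))
```

For the full file, `sorted(all_rows, key=country_key)` applies the same ordering before upload, provided the rows fit in memory.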

For Developers: API Batch Endpoint

For programmatic processing of large datasets, the geocoding API provides 18 endpoints including batch processing. Send up to 100 addresses per API call for efficient bulk geocoding.

curl -X POST "https://csv2geo.com/api/v1/batch-geocode" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "addresses": [
      {"q": "123 Main St, New York, NY 10001", "country": "US"},
      {"q": "456 Oak Ave, Los Angeles, CA 90001", "country": "US"},
      {"q": "789 Pine Rd, Chicago, IL 60601", "country": "US"}
    ]
  }'

Python example for processing a large file in batches:

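import csv

from csv2geo import Client

BATCH_SIZE = 100  # the batch endpoint accepts up to 100 addresses per call

client = Client("YOUR_API_KEY")


def handle(results):
    # Placeholder: process results here (for example, write them to an output file).
    pass


# newline="" is what the csv module expects; utf-8 matches the preparation advice above
with open("addresses.csv", newline="", encoding="utf-8") as f:
    reader = csv.DictReader(f)
    batch = []
    for row in reader:
        batch.append({"q": row["address"], "country": row["country"]})
        if len(batch) == BATCH_SIZE:
            handle(client.batch_geocode(batch))
            batch = []
    if batch:  # send the final, partial batch
        handle(client.batch_geocode(batch))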

1,000 free API requests per day. Get your key at csv2geo.com/api-keys.
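On a multi-hour run, a single network hiccup shouldn't stop the whole job. A generic retry wrapper with exponential backoff, sketched below, can wrap each batch call, for example `with_retries(lambda: client.batch_geocode(batch))`. It catches every exception only because the client's specific error types aren't documented here. In practice, narrow it to transient errors such as timeouts and rate limits so that bad input fails fast.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 5, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(); after a failure wait base_delay * 2**attempt plus jitter and retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:  # placeholder: narrow to the client's transient errors
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise RuntimeError("unreachable")
```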

Frequently Asked Questions

How large a file can CSV2GEO handle?

CSV2GEO processes files with over 1,000,000 rows. For very large files (500K+), processing runs on the server and you can close the browser — come back later to download results. There is no hard row limit.

How long does it take to geocode 100,000 rows?

Typically 15–30 minutes for mixed-quality data. Clean data with separate columns and country codes tends to finish sooner, while combined address strings or missing fields add processing time. Preview a small batch first to estimate.

Can I geocode large files for free?

The free tier provides 100 rows per day via file upload and 1,000 API requests per day. For large files, paid tiers start at $15 for 5,000 rows.

Should I split my file into smaller batches?

For files under 500,000 rows, upload as a single file. The system handles queuing and processing efficiently. Above 500,000, splitting into 250K–500K batches can be safer in case of network issues during upload.
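Splitting can be scripted so that every part keeps the header row. This is a minimal sketch. The part size (250,000 rows by default, the lower end of the range above) and the `part_001.csv` naming are arbitrary choices. Only one part is held in memory at a time.

```python
import csv
from itertools import islice
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_parts(lines: Iterable[str], rows_per_part: int) -> Iterator[List[List[str]]]:
    """Yield successive parts, each a list of rows starting with the header."""
    reader = csv.reader(lines)
    header = next(reader)
    while True:
        chunk = list(islice(reader, rows_per_part))
        if not chunk:
            return
        yield [header] + chunk


def split_csv(src: str, rows_per_part: int = 250_000, out_dir: str = ".") -> List[Path]:
    """Write part_001.csv, part_002.csv, ... into out_dir; return their paths."""
    paths = []
    with open(src, newline="", encoding="utf-8") as fin:
        for i, part in enumerate(iter_parts(fin, rows_per_part), start=1):
            path = Path(out_dir) / f"part_{i:03d}.csv"
            with open(path, "w", newline="", encoding="utf-8") as fout:
                csv.writer(fout).writerows(part)
            paths.append(path)
    return paths
```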

Can I close the browser during processing?

Yes. Processing continues on the server. Come back to your dashboard to check progress and download results when complete.

How do I reverse geocode a large file?

Upload your file with lat/long columns to csv2geo.com/reversegeocoding or toggle Reverse mode in the batch tool. Same process as forward geocoding but with coordinate input.

What output format do I get?

CSV or Excel download with all your original columns plus latitude, longitude, relevance score, and the matched/resolved address. The file structure mirrors your input with geocoded columns appended.

What if some rows fail to geocode?

Failed rows appear in the output with empty coordinate fields and a low relevance score. Filter these out, fix the address data, and re-geocode just the failed subset. Common causes: incomplete addresses, non-existent streets, PO boxes.
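Separating the failed subset can be done with a short script. This sketch treats a row as failed if either coordinate is empty or its relevance score is below a threshold. The 0.5 cut-off mirrors the lowest bucket in the verification checklist, and the column names are placeholders. Write the `retry` rows out, fix their addresses, and re-upload only that file.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def split_failed(rows: Iterable[Row], lat_col: str = "latitude", lng_col: str = "longitude",
                 score_col: str = "relevance",
                 min_score: float = 0.5) -> Tuple[List[Row], List[Row]]:
    """Return (keep, retry): retry holds rows with empty coordinates or a low score."""
    keep: List[Row] = []
    retry: List[Row] = []
    for row in rows:
        has_coords = all((row.get(c) or "").strip() for c in (lat_col, lng_col))
        try:
            score = float(row.get(score_col) or 0)
        except ValueError:
            score = 0.0
        (keep if has_coords and score >= min_score else retry).append(row)
    return keep, retry
```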

Start Processing Your Large File

Ready to geocode? Upload your file and process up to 100 rows free every day. For large volumes, pricing starts at $15 for 5,000 rows.

For data preparation tips, see our geocoding tips guide. For CSV formatting help, see the CSV geocoding guide.

Need help? Visit our Help center or contact us.

I.A.

CSV2GEO Creator
