PHP Geocoding API Tutorial: Guzzle Pools and Production Patterns

PHP geocoding with Guzzle async pools, retry middleware, and Laravel queue integration. Production-ready code.

May 06, 2026

PHP runs more geocoding workloads than the conference talks suggest. Magento and Shopware stores normalising checkout addresses, Laravel listing aggregators ingesting MLS feeds, WordPress directories with a million POIs, marketing-automation tools that want a lat,lng next to every CRM record. The language is everywhere this work actually happens, and the right tool for the HTTP layer has not changed in years: Guzzle 7 with async pools. You get bounded concurrency without forking, without pcntl, without spinning up Swoole or ReactPHP for a job that just needs a few thousand parallel HTTP calls.

This is a working PHP tutorial. Every snippet was written, copied into a file, and executed against the live API on a clean PHP 8.2 install with Guzzle 7.8 before publishing. By the end you will have a Guzzle pool that geocodes 1,000 addresses at concurrency 16, a retry middleware that honours Retry-After, a Laravel queue job that survives a 100K-row CSV, and a streaming League\Csv reader that never holds the whole file in memory. About 200 lines of PHP total, none of it framework-coupled unless you want it to be.

The endpoint used throughout is csv2geo.com/api/v1. The free tier is 1,000 forward/reverse requests per day plus 100 batch rows per day, no credit card required. Sign in and grab a key from /api-keys if you want to follow along.

The endpoint

Two endpoints cover the work: forward (address to coordinates) and reverse (coordinates to address). Both accept GET for single requests; forward also accepts POST with an array for batch.

# Forward (single)
GET https://csv2geo.com/api/v1/geocode?q=ADDRESS&country=US

# Reverse (single)
GET https://csv2geo.com/api/v1/reverse?lat=LAT&lng=LNG

# Batch forward (Starter plan and up)
POST https://csv2geo.com/api/v1/geocode
Body: { "addresses": ["addr1", "addr2", ...] }

# Auth: Authorization: Bearer KEY

A quick sanity check from the shell before we write any PHP:

curl -s "https://csv2geo.com/api/v1/geocode?q=1+Apple+Park+Way+Cupertino+CA&country=US" \
  -H "Authorization: Bearer $CSV2GEO_KEY" | jq .
{
  "query": "1 Apple Park Way Cupertino CA",
  "results": [
    {
      "formatted_address": "1 Apple Park Way, Cupertino, CA 95014, United States",
      "location": { "lat": 37.33177, "lng": -122.03042 },
      "accuracy": "houseNumber",
      "accuracy_score": 1,
      "components": {
        "house_number": "1",
        "street": "Apple Park Way",
        "city": "Cupertino",
        "state": "California",
        "postal_code": "95014",
        "country": "USA"
      }
    }
  ],
  "meta": { "response_time_ms": 412, "source": "here" }
}

The two fields that matter for client code are results[0].location (your {lat, lng}) and results[0].accuracy plus accuracy_score. Use them to drop low-confidence rows before they pollute the dataset downstream.
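That filter can live in one small helper over the decoded JSON. A minimal sketch; the function name and the 0.7 cutoff are my choices, not part of the API:

```php
<?php
declare(strict_types=1);

/**
 * Return the top result's {lat, lng} from a decoded response,
 * or null for both a true no-match and a low-confidence hit.
 */
function bestLocation(array $decoded, float $minScore = 0.7): ?array
{
    $top = $decoded['results'][0] ?? null;
    if ($top === null || ($top['accuracy_score'] ?? 0.0) < $minScore) {
        return null;
    }
    return $top['location'] ?? null;
}
```

Dropping rows here, before they reach the database, is cheaper than auditing them later.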

First request

Install Guzzle and you are done with dependencies for the synchronous case.

composer require guzzlehttp/guzzle:^7.8

<?php
// geocode.php
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Exception\RequestException;

$apiKey = getenv('CSV2GEO_KEY') ?: throw new RuntimeException('CSV2GEO_KEY not set');

$client = new Client([
    'base_uri'    => 'https://csv2geo.com/api/v1/',
    'timeout'     => 10.0,
    'connect_timeout' => 3.0,
    'headers'     => ['Authorization' => "Bearer {$apiKey}"],
    'http_errors' => true,
]);

function geocode(Client $client, string $address, string $country = 'US'): ?array
{
    try {
        $response = $client->get('geocode', [
            'query' => ['q' => $address, 'country' => $country],
        ]);
    } catch (\GuzzleHttp\Exception\ConnectException $e) {
        // ConnectException does not extend RequestException, so catch it
        // separately or a connect timeout escapes this function entirely.
        throw new RuntimeException('geocode failed: connection error', 0, $e);
    } catch (RequestException $e) {
        $status = $e->getResponse()?->getStatusCode();
        throw new RuntimeException("geocode failed: HTTP {$status}", 0, $e);
    }

    $data    = json_decode((string) $response->getBody(), true, 512, JSON_THROW_ON_ERROR);
    $results = $data['results'] ?? [];
    return $results[0]['location'] ?? null;
}

print_r(geocode($client, '1 Apple Park Way, Cupertino, CA'));
// Array ( [lat] => 37.33177 [lng] => -122.03042 )

A few details worth pointing out. connect_timeout is set explicitly because Guzzle's default is 0 (no timeout), and a hung TCP handshake will block your worker forever. JSON_THROW_ON_ERROR is non-negotiable in 2026 PHP — the default json_decode returns null on malformed input with no exception, which is exactly the kind of silent corruption that makes you debug "lat = 0" tickets at 2am.
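To see the failure mode directly, feed both variants the same truncated payload:

```php
<?php
declare(strict_types=1);

$broken = '{"lat": 37.3'; // truncated JSON, e.g. a cut-off response body

var_dump(json_decode($broken, true)); // NULL, no warning, no exception

try {
    json_decode($broken, true, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    echo 'bad payload: ' . $e->getMessage() . PHP_EOL;
}
```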

Reverse geocoding

Same client, different endpoint. Pass lat and lng, get back an address with a distance_meters field that tells you how far the matched rooftop sits from the input coordinate.

function reverse(Client $client, float $lat, float $lng): ?array
{
    $response = $client->get('reverse', [
        'query' => ['lat' => $lat, 'lng' => $lng],
    ]);
    $data    = json_decode((string) $response->getBody(), true, 512, JSON_THROW_ON_ERROR);
    $results = $data['results'] ?? [];
    return $results[0] ?? null;
}

// Eiffel Tower
$hit = reverse($client, 48.8584, 2.2945);
echo $hit['formatted_address'] . PHP_EOL;
echo 'distance_meters: ' . ($hit['distance_meters'] ?? 'n/a') . PHP_EOL;

If you are reverse-geocoding GPS pings from a delivery app, anything over ~50m on distance_meters usually means the GPS fix was bad, not that the geocoder missed. Below ~10m is rooftop. Treat it as the only honest accuracy metric for reverse work — confidence scores on reverse responses are softer than on forward.
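Those thresholds are easy to encode. The bucket names and exact cutoffs below are mine, so tune them for your fleet:

```php
<?php
declare(strict_types=1);

function reverseQuality(?float $distanceMeters): string
{
    return match (true) {
        $distanceMeters === null => 'unknown',     // field absent from response
        $distanceMeters < 10.0   => 'rooftop',     // trust it
        $distanceMeters <= 50.0  => 'near',        // usable, flag for review
        default                  => 'suspect_gps', // blame the fix, not the geocoder
    };
}
```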

Async pools for parallel work

The wrong way to geocode 1,000 addresses in PHP is a foreach loop calling geocode() synchronously. At ~400ms per request that is 400 seconds of wall time, and you will burn your daily quota staring at a progress bar.

The other wrong way is collecting 1,000 promises and resolving them all at once with GuzzleHttp\Promise\Utils::all() (the Promise\all() helper in older Guzzle code). That fires every request at the API simultaneously, hits the rate limit on the second batch, and exhausts local sockets at the same time. Don't.

The right tool is GuzzleHttp\Pool, which is built for exactly this — bounded concurrency, lazy promise generation, and a per-fulfilled and per-rejected callback that fires as each request completes.

<?php
// pool_geocode.php
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\ResponseInterface;

$apiKey = getenv('CSV2GEO_KEY') ?: throw new RuntimeException('CSV2GEO_KEY not set');

$client = new Client([
    'base_uri'    => 'https://csv2geo.com/api/v1/',
    'timeout'     => 15.0,
    'connect_timeout' => 3.0,
    'headers'     => ['Authorization' => "Bearer {$apiKey}"],
    'http_errors' => false,
]);

$addresses = [
    '1600 Pennsylvania Ave NW, Washington, DC',
    '1 Apple Park Way, Cupertino, CA',
    '233 S Wacker Dr, Chicago, IL',
];

$results = [];

$requests = function (array $addresses) {
    foreach ($addresses as $i => $address) {
        $query = http_build_query(['q' => $address, 'country' => 'US']);
        yield $i => new Request('GET', "geocode?{$query}");
    }
};

$pool = new Pool($client, $requests($addresses), [
    'concurrency' => 16,
    'fulfilled'   => function (ResponseInterface $response, int $index) use (&$results): void {
        if ($response->getStatusCode() !== 200) {
            $results[$index] = ['error' => 'http_' . $response->getStatusCode()];
            return;
        }
        $data            = json_decode((string) $response->getBody(), true);
        $top             = $data['results'][0] ?? null;
        $results[$index] = $top['location'] ?? ['error' => 'no_match'];
    },
    'rejected'    => function (\Throwable $reason, int $index) use (&$results): void {
        $results[$index] = ['error' => 'transport:' . $reason::class];
    },
]);

$pool->promise()->wait();

ksort($results);
print_r($results);

Two details. The $requests closure is a generator — Guzzle pulls requests lazily, so 1,000,000 addresses do not allocate 1,000,000 PSR-7 objects up front. And concurrency is the only knob you should be tuning here, not connection-pool size: Guzzle handles socket reuse internally. For the math behind the right concurrency value, see concurrency tuning for geocoding.

A useful starting rule: take your plan's per-minute rate limit and divide by 60 for a concurrency floor, with roughly 1.5x that as the ceiling. Free is 100/min, so start low, around 2–4, and let the retry middleware absorb the occasional 429. Starter ($49, 1K/min): 16. Growth ($149, 5K/min): 64. Pro ($499, 10K/min): 128, and watch the headers.
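One way to encode that rule as a helper; the clamping floors here are my own defaults, not anything the API dictates:

```php
<?php
declare(strict_types=1);

/** [floor, ceiling] concurrency suggestion for a per-minute rate limit. */
function concurrencyRange(int $perMinuteLimit): array
{
    $floor   = max(1, intdiv($perMinuteLimit, 60));
    $ceiling = max(2, (int) ceil($floor * 1.5));
    return [$floor, $ceiling];
}
```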

Retry middleware and exponential backoff

A production pool needs three behaviours: retry on 429 and 5xx, honour Retry-After, and cap attempts so a permanently broken key does not loop forever. Guzzle's middleware stack composes cleanly for this.

<?php
// retry.php
declare(strict_types=1);

use GuzzleHttp\Client;
use GuzzleHttp\HandlerStack;
use GuzzleHttp\Middleware;
use Psr\Http\Message\RequestInterface;
use Psr\Http\Message\ResponseInterface;

function makeClient(string $apiKey, int $maxAttempts = 5): Client
{
    $stack = HandlerStack::create();

    $stack->push(Middleware::retry(
        function (
            int $retries,
            RequestInterface $request,
            ?ResponseInterface $response = null,
            ?\Throwable $exception = null
        ) use ($maxAttempts): bool {
            if ($retries >= $maxAttempts) {
                return false;
            }
            if ($exception instanceof \GuzzleHttp\Exception\ConnectException) {
                return true;
            }
            if ($response === null) {
                return false;
            }
            $status = $response->getStatusCode();
            return $status === 429 || $status >= 500;
        },
        function (
            int $retries,
            ?ResponseInterface $response = null
        ): int {
            if ($response !== null && $response->hasHeader('Retry-After')) {
                $after = (int) $response->getHeaderLine('Retry-After');
                if ($after > 0) {
                    return $after * 1000;
                }
            }
            $base = (2 ** $retries) * 1000;
            $jitter = random_int(0, 1000);
            return $base + $jitter;
        }
    ), 'retry');

    return new Client([
        'base_uri'    => 'https://csv2geo.com/api/v1/',
        'handler'     => $stack,
        'timeout'     => 15.0,
        'connect_timeout' => 3.0,
        'headers'     => ['Authorization' => "Bearer {$apiKey}"],
        'http_errors' => false,
    ]);
}

The first callable decides *whether* to retry; the second decides *how long to wait*. Splitting them like this is what makes the middleware composable — you can swap retry logic for circuit-breaker logic without touching the delay function. Jitter matters when many workers retry simultaneously: without it, every worker wakes up at the same instant and re-DDoSes the server. The deeper math on retry budgets and dead-letter queues is in exponential backoff: when to retry, when to stop.
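The delay logic pulled out as a standalone function, which also makes it unit-testable without a handler stack:

```php
<?php
declare(strict_types=1);

/**
 * Millisecond delay before the next attempt: an explicit Retry-After
 * wins; otherwise exponential backoff with up to 1s of jitter.
 */
function retryDelayMs(int $retries, ?int $retryAfterSeconds = null): int
{
    if ($retryAfterSeconds !== null && $retryAfterSeconds > 0) {
        return $retryAfterSeconds * 1000;
    }
    return (2 ** $retries) * 1000 + random_int(0, 1000);
}
```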

The headers worth watching on every successful response:

| Header | Meaning |
|---|---|
| X-RateLimit-Limit | Your plan's per-minute ceiling |
| X-RateLimit-Remaining | What you have left in the current window |
| X-RateLimit-Reset | Unix seconds until the window resets |
| Retry-After | Sent on 429s — seconds to wait before retrying |

If Remaining drops below 10% of Limit partway through a batch, slow down voluntarily — sleep, lower your pool's concurrency, or switch to the batch endpoint. Cheaper than thrashing on 429s and far cheaper than a sudden burst that takes the pipeline offline. The taxonomy of every error class a geocoder produces is in error handling for geocoding APIs.
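That voluntary slowdown is a two-line check against the parsed header values; the 10% default matches the paragraph above:

```php
<?php
declare(strict_types=1);

/** True when the remaining budget has dropped below the threshold. */
function shouldThrottle(int $limit, int $remaining, float $threshold = 0.10): bool
{
    return $limit > 0 && $remaining < $limit * $threshold;
}
```

Call it from the pool's fulfilled callback with the X-RateLimit-* values and, when it returns true, usleep() between chunks or cut the next chunk's concurrency.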

Laravel queue integration

A real geocoding workload is rarely a single CLI script — it is one of a hundred jobs the application enqueues per second. Laravel's queue worker pattern fits cleanly. Define a job that geocodes a single row, dispatch one per CSV line, let the queue handle concurrency and retries.

<?php
// app/Jobs/GeocodeAddress.php
declare(strict_types=1);

namespace App\Jobs;

use App\Models\Lead;
use App\Services\GeocodeClient;
use Illuminate\Bus\Queueable;
use Illuminate\Contracts\Queue\ShouldQueue;
use Illuminate\Foundation\Bus\Dispatchable;
use Illuminate\Queue\InteractsWithQueue;
use Illuminate\Queue\SerializesModels;

class GeocodeAddress implements ShouldQueue
{
    use Dispatchable, InteractsWithQueue, Queueable, SerializesModels;

    public int $tries = 5;
    public int $timeout = 30;
    public bool $deleteWhenMissingModels = true;

    public function __construct(public readonly int $leadId) {}

    public function handle(GeocodeClient $geocoder): void
    {
        $lead = Lead::findOrFail($this->leadId);
        $hit  = $geocoder->forward($lead->address, $lead->country ?? 'US');

        if ($hit === null || ($hit['accuracy_score'] ?? 0) < 0.7) {
            $lead->update(['geocode_status' => 'low_confidence']);
            return;
        }

        $lead->update([
            'lat'             => $hit['location']['lat'],
            'lng'             => $hit['location']['lng'],
            'geocode_status'  => 'ok',
            'geocode_source'  => $hit['source'] ?? 'csv2geo',
        ]);
    }

    public function backoff(): array
    {
        return [1, 4, 16, 60, 300];
    }

    public function failed(\Throwable $e): void
    {
        Lead::find($this->leadId)?->update([
            'geocode_status' => 'failed',
            'geocode_error'  => substr($e->getMessage(), 0, 500),
        ]);
    }
}

Three pieces matter. tries = 5 plus the explicit backoff() array gives Laravel its own retry layer — independent of the Guzzle middleware. They are not redundant: Guzzle's middleware retries inside one job execution (transient blips), Laravel's tries retries across worker invocations (worker crash, deploy, DB outage). failed() runs after the last attempt and writes the row state somewhere durable so an operator can decide what to do.
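The job type-hints App\Services\GeocodeClient, which never appears in this post. A minimal sketch of what handle() assumes it returns; the class name, method, and the merged source key are inferred from the job code above, not from any published SDK:

```php
<?php
// app/Services/GeocodeClient.php (hypothetical)
declare(strict_types=1);

namespace App\Services;

use GuzzleHttp\Client;

final class GeocodeClient
{
    // Bind a client built by makeClient() from retry.php in a service provider.
    public function __construct(private readonly Client $http) {}

    /** Top result (location, accuracy_score, ...) plus source, or null. */
    public function forward(string $address, string $country = 'US'): ?array
    {
        $resp = $this->http->get('geocode', [
            'query' => ['q' => $address, 'country' => $country],
        ]);
        if ($resp->getStatusCode() !== 200) {
            return null;
        }
        $data = json_decode((string) $resp->getBody(), true, 512, JSON_THROW_ON_ERROR);
        $top  = $data['results'][0] ?? null;
        if ($top !== null) {
            $top['source'] = $data['meta']['source'] ?? null;
        }
        return $top;
    }
}
```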

For batch design tradeoffs (one queue vs many, visibility timeouts, custom Postgres queues), see designing a batch geocoding queue.

Streaming CSV via League\Csv

If your input file is 100,000 rows, you do not want to file_get_contents it. You want an iterator that reads one record at a time and a Pool that drains chunks of those records concurrently.

composer require league/csv:^9.16

<?php
// stream_geocode.php
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Pool;
use GuzzleHttp\Psr7\Request;
use League\Csv\Reader;
use League\Csv\Writer;
use Psr\Http\Message\ResponseInterface;

[$inPath, $outPath] = [$argv[1], $argv[2]];
$apiKey = getenv('CSV2GEO_KEY') ?: throw new RuntimeException('CSV2GEO_KEY not set');

$client = new Client([
    'base_uri'    => 'https://csv2geo.com/api/v1/',
    'timeout'     => 15.0,
    'connect_timeout' => 3.0,
    'headers'     => ['Authorization' => "Bearer {$apiKey}"],
    'http_errors' => false,
]);

$reader = Reader::createFromPath($inPath, 'r');
$reader->setHeaderOffset(0);

$writer = Writer::createFromPath($outPath, 'w');
$writer->insertOne(['id', 'address', 'lat', 'lng', 'error']);

$chunkSize = 500;
$concurrency = 16;
$buffer = [];

$drain = function (array $rows) use ($client, $writer, $concurrency): void {
    $output = [];

    $requests = function () use ($rows) {
        foreach ($rows as $i => $row) {
            $query = http_build_query(['q' => $row['address'], 'country' => $row['country'] ?? 'US']);
            yield $i => new Request('GET', "geocode?{$query}");
        }
    };

    $pool = new Pool($client, $requests(), [
        'concurrency' => $concurrency,
        'fulfilled'   => function (ResponseInterface $resp, int $i) use (&$output, $rows): void {
            $row = $rows[$i];
            if ($resp->getStatusCode() !== 200) {
                $output[$i] = [$row['id'], $row['address'], '', '', 'http_' . $resp->getStatusCode()];
                return;
            }
            $data = json_decode((string) $resp->getBody(), true);
            $top  = $data['results'][0] ?? null;
            if ($top === null) {
                $output[$i] = [$row['id'], $row['address'], '', '', 'no_match'];
                return;
            }
            if (($top['accuracy_score'] ?? 0) < 0.7) {
                $output[$i] = [$row['id'], $row['address'], '', '', 'low_confidence'];
                return;
            }
            $output[$i] = [
                $row['id'], $row['address'],
                $top['location']['lat'], $top['location']['lng'], '',
            ];
        },
        'rejected'    => function (\Throwable $e, int $i) use (&$output, $rows): void {
            $row = $rows[$i];
            $output[$i] = [$row['id'], $row['address'], '', '', 'transport:' . $e::class];
        },
    ]);

    $pool->promise()->wait();
    ksort($output);
    $writer->insertAll($output);
};

foreach ($reader->getRecords() as $row) {
    $buffer[] = $row;
    if (count($buffer) >= $chunkSize) {
        $drain($buffer);
        $buffer = [];
    }
}
if ($buffer !== []) {
    $drain($buffer);
}

echo "done -> {$outPath}" . PHP_EOL;

League\Csv\Reader::getRecords() returns an Iterator, so the file is read one row at a time. The chunk-and-drain pattern bounds two things at once: peak memory (only chunkSize rows in scope), and peak concurrency (Guzzle's pool runs concurrency requests at a time).

Things that bite PHP developers

Default Guzzle has no request timeout. Both timeout and connect_timeout default to 0, which means infinite. A hung TCP handshake or a server that accepts the connection and never replies will block your worker until something external kills it. Always set both, and set them low — 3 seconds for connect, 10–15 for total.

`json_decode` returns null on bad input. No exception, no warning, just null. Always pass flags: JSON_THROW_ON_ERROR (PHP 7.3+) and catch JsonException at the call site.

`Pool` allocates everything up front if you hand it an array. The pool consumes its iterable lazily only if it is a generator; concurrency stays bounded either way, but with array_map(...) or any pre-materialised array, every PSR-7 request object exists in memory before the first one fires. Always yield from a generator function.

FPM workers vs CLI for batch jobs. PHP-FPM is tuned to die after one request. Long-running geocode jobs belong in php artisan queue:work or a plain php worker.php under supervisord. Running a 100K-row batch through an FPM endpoint and watching it 504 is a rite of passage that you can skip.
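A minimal supervisord program block for that worker; the paths, queue name, log file, and process count are placeholders, not recommendations:

```ini
; /etc/supervisor/conf.d/geocode-worker.conf
[program:geocode-worker]
command=php /var/www/app/artisan queue:work --tries=5 --timeout=30
numprocs=4
process_name=%(program_name)s_%(process_num)02d
autostart=true
autorestart=true
; give a running job time to finish before SIGKILL (> job timeout)
stopwaitsecs=35
stdout_logfile=/var/log/geocode-worker.log
redirect_stderr=true
```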

Frequently Asked Questions

Do I need an SDK?

No. The API is REST plus JSON, Guzzle 7 covers both sync and async, and the entire client surface fits in the geocode() and pool functions above plus a retry middleware. About 80 lines for a complete typed client. Maintaining your own thin wrapper is cheaper than tracking an SDK release cadence and dealing with its dependency conflicts inside a Laravel app.

Symfony or Laravel — does it matter?

Not for the HTTP layer. Both ship Guzzle as the de facto HTTP client, and the code in this post drops into either framework unchanged. The integration differs only at the queue boundary: Laravel uses jobs and php artisan queue:work, Symfony uses Messenger and messenger:consume. The retry knobs (tries, backoff()) on Laravel jobs map roughly to the RetryStrategy in Messenger.

Symfony HttpClient instead of Guzzle?

Workable. Symfony\Component\HttpClient has built-in concurrent stream support via HttpClientInterface::stream(), which fills the same role as Guzzle's Pool. The retry-with-backoff equivalent is RetryableHttpClient. The choice is mostly philosophical.

How do I tune concurrency?

Take your plan's per-minute rate limit, divide by 60 for a floor, and treat roughly 1.5x that as the ceiling. So Starter (1,000/min) lands at concurrency 16–24. The truth source is the X-RateLimit-Remaining header — log it on every successful response and watch for it falling faster than 1 per request. The full curve and the math behind it are in concurrency tuning for geocoding.

How do I tell a no-match from a low-confidence match?

results is an empty array on a true no-match. On a low-confidence match, results[0] exists but accuracy is "postcode" or "place" rather than "houseNumber" or "street", and accuracy_score sits well below 1.0. A reasonable default threshold is 0.7. Full picture in geocoding confidence scores explained.

Does this work for non-US addresses?

Yes. Pass the right ISO alpha-2 in the country parameter — DE for Germany, GB for the UK, BR for Brazil, JP for Japan. Coverage spans 39 countries today, including the full top 10 by address count.

Should I use single requests or the batch endpoint?

Batch when you have ≥100 addresses to do at once and your plan allows it (Starter and up). One POST is cheaper for both sides than 100 GETs. The plan caps:

| Plan | Included rows | Per-minute | Batch size |
|---|---|---|---|
| Free | 1K/day API, 100/day batch | 100 | 100 |
| Starter ($49) | 50,000/mo | 1,000 | 1,000 |
| Growth ($149) | 250,000/mo | 5,000 | 5,000 |
| Pro ($499) | 1,000,000/mo | 10,000 | 10,000 |

function batchGeocode(Client $client, array $addresses): array
{
    $response = $client->post('geocode', [
        'json'    => ['addresses' => $addresses],
        'timeout' => 60.0,
    ]);
    return json_decode((string) $response->getBody(), true, 512, JSON_THROW_ON_ERROR)['results'];
}

Batch responses preserve input order — position N of the response always corresponds to position N of the input.
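That ordering guarantee is what makes re-joining trivial. A hypothetical helper that pairs inputs with outputs by position:

```php
<?php
declare(strict_types=1);

/** Pair each input address with its result's location by position. */
function zipBatch(array $addresses, array $results): array
{
    $out = [];
    foreach ($addresses as $i => $address) {
        $out[$address] = $results[$i]['location'] ?? null;
    }
    return $out;
}
```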

Where to go from here

The full reference for the API is at csv2geo.com/api. If you would rather work in Python, the Python geocoding tutorial follows the same structure with httpx and asyncio. For the Node version, see the Node.js tutorial.

Deeper dives on the patterns this post touches: designing a batch geocoding queue, exponential backoff: when to retry, when to stop, error handling for geocoding APIs, and monorepo patterns for multi-language geocoding clients.

If you find the X-RateLimit headers undercounting in some edge case, or you hit a response shape this post does not cover, the contact form on the site reaches a person who reads it. Bug reports with curl (or Guzzle) reproductions get fixed quickly.

I.A. / CSV2GEO Creator

Ready to geocode your addresses?

Use our batch geocoding tool to convert thousands of addresses to coordinates in minutes. Start with 100 free addresses.

Try Batch Geocoding Free →