Solar site screening: 10,000 candidate roofs in an afternoon
Screen 10,000 rooftop solar candidates in hours using elevation profiles and places data. REST patterns, Python + Node code, cost math included.
A solar development team running a prospecting campaign has a familiar problem. The business development side hands over a spreadsheet of 10,000 addresses — residential parcels, commercial rooftops, municipal buildings — and asks which ones are worth sending a field surveyor to. The addresses came from county parcel records, utility interconnection queues, and a bought list of large flat-roof commercial properties. Some of these sites are excellent. Most are not. The team needs to cut the list to a workable subset of 400–600 high-confidence candidates before the end of the week, because the surveyors are booked and site visits are expensive.
The traditional GIS workflow for this takes days. Download a DEM. Clip it to the study area. Join the addresses. Run a slope analysis. Join again against a utility substations layer. Export results. Repeat when the address list changes. Most solar teams without a dedicated GIS analyst either skip the screening step entirely — and spend surveyor time on junk sites — or block on a contractor who charges project-rate pricing for what amounts to a bulk lookup.
The faster path uses two API endpoints: elevation (batched up to 500 points per call) and places (proximity to infrastructure). With modest concurrency, the full pipeline for 10,000 addresses runs in well under an hour on a laptop, costs less than a single site visit, and produces a ranked candidate list with enough signal to brief the surveyors without a GIS analyst in the room.
This post walks through the complete pipeline: why these two signals matter for solar, how to call the endpoints, how to combine the outputs into a site score, and where the approach has honest limits.
Why elevation and places, specifically
Solar site screening has a long list of variables — shading from nearby trees, roof age and material, structural load capacity, utility rate tariffs, interconnection queue position — and most of them require either a field visit or a licensed data contract to assess reliably. Two signals you can get in bulk for essentially nothing are elevation and infrastructure proximity. Both are stronger filters than they look.
Elevation as a slope proxy. A rooftop solar installation on a steeply sloping site has higher racking cost, higher labour cost, and lower array efficiency for a given panel count. Ground-mount projects on steep slopes have even worse economics: grading, erosion control, and structural foundations eat margin fast. Elevation alone does not tell you slope — for that you need two points — but the delta between an address and its immediate neighbours is a cheap and useful slope proxy, and you can compute it from a batch elevation call against the candidate point plus a small ring of surrounding points.
Elevation as a local climate signal. High-altitude sites see more solar irradiance on average — thinner atmosphere — but more wind, more hail, and harder logistics for installation crews. A site at 1,597 m (Denver) is in a different risk category from a site at 1 m (coastal Miami), even if both have identical rooftops. For a developer underwriting a large portfolio, elevation is the fastest available proxy for regional climate exposure: snow load at high altitude, salt-air corrosion near sea level, freeze-thaw cycling in mountainous terrain. None of these dominate the go/no-go decision, but all of them shift the pro forma.
Places proximity as infrastructure fitness. A rooftop solar project with a distribution substation 300 m away has a far cheaper interconnection path than a project that needs a new medium-voltage run to the nearest substation 4 km out. Similarly, a commercial-scale ground-mount near an industrial park has a different interconnection story from one in a purely residential neighbourhood. The Places API can search for utility infrastructure, industrial zones, and road access points within a configurable radius, giving you a coarse but useful "infrastructure fitness" score per candidate that works as a first-pass indicator of interconnection cost.
Together, these two signals let you eliminate the bottom third to half of a candidate list without spending a single surveyor day. That is the value of the pipeline.
What the API gives you
Two endpoints do the work here. Both are REST, both use the same API key, both return JSON.
`GET /api/v1/elevation` — accepts up to 500 lat,lng points per request as a |-separated string and returns an elevation in metres for each, in the same order. Global coverage at roughly 30 m horizontal resolution. Anchor probes to sanity-check the data: Mt Everest returns 8,731 m, Dead Sea shore returns −415 m, Death Valley's Badwater Basin returns −80 m, Mauna Kea's summit returns 4,198 m, Denver returns approximately 1,597 m, Miami approximately 1 m. Negative numbers are real — if you see 0 m for coastal-below-sea-level points, the provider is clamping, not measuring.
`GET /api/v1/places/nearby` — accepts a lat, lng, radius (metres), and a category filter, and returns a list of matching places within that radius. Adding ?include=elevation to this call returns the elevation of each matching place alongside its name, location, and category — no second round-trip. This is the endpoint that tells you "there is a utility substation 280 m from this candidate address" or "the nearest industrial road access is 1.2 km out."
Both are documented at csv2geo.com/api. Both are covered under the same free tier (3,000 calls/day, no credit card) and the same paid plans starting at $54/month for 100,000 calls.
Building the screening pipeline
The pipeline has five steps. Steps 1 through 3 gather data. Step 4 scores candidates. Step 5 exports for the surveyors.
Step 1: Geocode and validate the address list
The raw address list from county parcel records is never clean. Abbreviated street types, transposed house numbers, missing unit designators — geocoding the full list surfaces the problems before you spend elevation or places credits on junk rows.
```bash
curl -G "https://csv2geo.com/api/v1/geocode" \
  --data-urlencode "q=1234 Commerce Park Dr, Denver, CO 80239" \
  --data-urlencode "api_key=$CSV2GEO_KEY"
```

In Python, batching the geocodes:
```python
import csv
import os
import time

import requests

API = "https://csv2geo.com/api/v1"
KEY = os.environ["CSV2GEO_API_KEY"]


def geocode_batch(addresses):
    """Geocode a list of address strings. Returns list of result dicts."""
    results = []
    for addr in addresses:
        r = requests.get(
            f"{API}/geocode",
            params={"q": addr, "api_key": KEY},
            timeout=30,
        )
        r.raise_for_status()
        data = r.json()
        if data.get("results"):
            top = data["results"][0]
            results.append({
                "address": addr,
                "lat": top["lat"],
                "lng": top["lng"],
                "confidence": top.get("confidence", 0),
            })
        else:
            results.append({"address": addr, "lat": None, "lng": None, "confidence": 0})
        time.sleep(0.05)  # polite pacing; see rate-limiting post
    return results
```

Any result with `confidence < 0.7` goes into a manual-review pile. Do not spend elevation or places credits on an address you cannot geocode reliably: those credits will produce a number attached to the wrong parcel, which is worse than no number at all.
After this step you have a CSV of address, lat, lng, confidence for all candidates that geocoded cleanly. In practice, on a typical county parcel list, 5–10% of rows fail or score below 0.7. Flag them; do not silently drop them.
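The flag-rather-than-drop rule can be made explicit with a small partition step. This is a minimal sketch, assuming the result dicts produced by `geocode_batch` above; the function name `split_by_confidence` and the `review_reason` field are illustrative and not part of the API.

```python
REVIEW_THRESHOLD = 0.7  # same cut-off as in the text above


def split_by_confidence(results, threshold=REVIEW_THRESHOLD):
    """Partition geocode results into (clean, review) lists without dropping rows."""
    clean, review = [], []
    for row in results:
        if row["lat"] is None or row["lng"] is None:
            # No match at all: keep the row, but record why it was held back.
            review.append({**row, "review_reason": "no_match"})
        elif row["confidence"] < threshold:
            review.append({**row, "review_reason": "low_confidence_geocode"})
        else:
            clean.append(row)
    return clean, review
```

Every input row ends up in exactly one of the two lists, so the counts can be reconciled against the original spreadsheet.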
Step 2: Pull elevation for every candidate (and their neighbours)
A single elevation point per candidate tells you the site's altitude. To estimate slope — which is what actually matters for solar — you need the elevation of a ring of points around each candidate. A simple approach: for each candidate coordinate, generate four offset points (±200 m in lat and lng), batch all five together in the elevation call, and compute the max delta as a slope proxy.
Generating the offset ring:
```python
import math

OFFSET_M = 200
DEG_PER_M_LAT = 1 / 111_320


def offset_ring(lat, lng):
    """Return the centre + 4 cardinal offset points for slope estimation."""
    dlat = OFFSET_M * DEG_PER_M_LAT
    # A degree of longitude shrinks by cos(latitude); clamp to avoid
    # dividing by a value near zero close to the poles.
    dlng = dlat / max(math.cos(math.radians(lat)), 0.001)
    return [
        (lat, lng),          # centre
        (lat + dlat, lng),   # north
        (lat - dlat, lng),   # south
        (lat, lng + dlng),   # east
        (lat, lng - dlng),   # west
    ]
```

Batch elevation call for up to 100 candidates at once (100 × 5 points = 500, the per-call limit):
```python
def fetch_elevations(point_list):
    """
    point_list: list of (lat, lng) tuples, max 500.
    Returns list of elevation_m values in the same order.
    """
    pts = "|".join(f"{lat},{lng}" for lat, lng in point_list)
    r = requests.get(
        f"{API}/elevation",
        params={"points": pts, "api_key": KEY},
        timeout=30,
    )
    r.raise_for_status()
    return [item.get("elevation_m") for item in r.json()["results"]]


def slope_proxy(elevations):
    """Max elevation delta across a 5-point ring, in metres."""
    valid = [e for e in elevations if e is not None]
    if len(valid) < 2:
        return None
    return max(valid) - min(valid)
```

For 10,000 candidates at 5 points each (50,000 elevation lookups) you need 100 API calls at 500 points per call. At a conservative 1-second round-trip per call, that is under two minutes of wall-clock time.
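The glue between the ring generator and the batch call is chunking: 100 candidates per request, then slicing the flat response back into 5-point rings. The sketch below is one way to do it, under stated assumptions: `ring_fn` stands in for `offset_ring` and `fetch_fn` for `fetch_elevations`, passed in as parameters so the chunking logic can be tested without network access. The name `ring_spreads` is illustrative.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
POINTS_PER_RING = 5            # centre + 4 offsets, centre first
MAX_POINTS_PER_CALL = 500      # per-call limit stated above
RINGS_PER_CALL = MAX_POINTS_PER_CALL // POINTS_PER_RING  # 100 candidates


def ring_spreads(
    centres: Sequence[Point],
    ring_fn: Callable[[float, float], List[Point]],
    fetch_fn: Callable[[List[Point]], List[Optional[float]]],
) -> List[Tuple[Optional[float], Optional[float]]]:
    """Return (site_elevation_m, slope_proxy_m) per centre, in input order."""
    out = []
    for start in range(0, len(centres), RINGS_PER_CALL):
        chunk = centres[start:start + RINGS_PER_CALL]
        points = [p for lat, lng in chunk for p in ring_fn(lat, lng)]
        elevations = fetch_fn(points)  # one API call per chunk
        if len(elevations) != len(points):
            raise ValueError("elevation response length does not match request")
        for i in range(len(chunk)):
            ring = elevations[i * POINTS_PER_RING:(i + 1) * POINTS_PER_RING]
            valid = [e for e in ring if e is not None]
            spread = max(valid) - min(valid) if len(valid) >= 2 else None
            out.append((ring[0], spread))  # ring[0] is the centre point
    return out
```

The length check matters because the slicing relies on the response preserving request order and count; a short response would otherwise silently attach elevations to the wrong candidates.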
The elevation call in Node, for teams running the pipeline in a JS environment:
```javascript
const API = 'https://csv2geo.com/api/v1';
const KEY = process.env.CSV2GEO_API_KEY;

async function fetchElevations(points) {
  // points: [{lat, lng}, ...]
  const pts = points.map(p => `${p.lat},${p.lng}`).join('|');
  const url = `${API}/elevation?points=${encodeURIComponent(pts)}&api_key=${KEY}`;
  const r = await fetch(url);
  if (!r.ok) throw new Error(`elevation http ${r.status}`);
  const data = await r.json();
  return data.results.map(item => item.elevation_m ?? null);
}
```

After this step, each candidate row has `elevation_m` (the site's altitude) and `slope_proxy_m` (the elevation spread across the 400 m ring). A slope proxy above 25 m over 400 m, roughly a 6% average grade or about 3.6°, is a flag for elevated installation cost. Adjust the threshold for your geography: this is calibrated for the US Midwest; it needs to shift for the Rocky Mountain West or the Appalachians.
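To sanity-check a threshold, it helps to convert an elevation delta into a grade and an angle. The sketch below treats the full 400 m ring width as the horizontal run, which is a simplifying assumption: the highest and lowest ring points may sit closer together than that, in which case the true slope is steeper. The function name is illustrative.

```python
import math


def grade_and_angle(delta_m, run_m=400.0):
    """Convert an elevation delta over a horizontal run to (percent grade, degrees)."""
    grade = delta_m / run_m
    return grade * 100.0, math.degrees(math.atan(grade))


pct, deg = grade_and_angle(25)  # the 25 m threshold over the 400 m ring
print(f"{pct:.2f}% grade, {deg:.1f} degrees")
```

Percent grade and degrees diverge quickly, so it is worth stating which unit a threshold uses when sharing it with the field team.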
Step 3: Check infrastructure proximity with Places
For each candidate that passed the slope filter, query the Places nearby endpoint for utility infrastructure within a configurable search radius. A 500 m radius is a reasonable starting point for distribution-scale rooftop projects; a 2 km radius is more appropriate for ground-mount or community solar.
The call with elevation included:
```bash
curl -G "https://csv2geo.com/api/v1/places/nearby" \
  --data-urlencode "lat=39.7392" \
  --data-urlencode "lng=-104.9903" \
  --data-urlencode "radius=1000" \
  --data-urlencode "categories=utility,industrial,substation" \
  --data-urlencode "limit=5" \
  --data-urlencode "include=elevation" \
  --data-urlencode "api_key=$CSV2GEO_KEY"
```

In Python:
```python
def check_infrastructure(lat, lng, radius_m=1000):
    """
    Returns list of nearby infrastructure places with distance and elevation.
    """
    r = requests.get(
        f"{API}/places/nearby",
        params={
            "lat": lat,
            "lng": lng,
            "radius": radius_m,
            "categories": "utility,industrial,substation",
            "limit": 5,
            "include": "elevation",
            "api_key": KEY,
        },
        timeout=30,
    )
    if r.status_code == 404:
        return []  # no places found: not an error
    r.raise_for_status()
    return r.json().get("results", [])
```

Each result carries a `distance_m` field. The nearest infrastructure distance is the key signal: under 300 m is excellent, 300–800 m is workable, above 1 km starts to require project-specific interconnection analysis. The elevation of the infrastructure point relative to the candidate site tells you something about the routing terrain: a substation 10 m below the site on a hillside has different cable-run economics from one on the same flat plateau.
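Reducing the places list to that single nearest-distance signal is a short helper. This is a sketch that assumes each result is a dict with the `distance_m` field described above and a `name` field; the helper name `nearest_infrastructure` is illustrative.

```python
def nearest_infrastructure(places):
    """Return (distance_m, name) for the closest place, or (None, None) if none usable."""
    # Compare against None explicitly so a legitimate distance of 0 m is kept.
    usable = [p for p in places if p.get("distance_m") is not None]
    if not usable:
        return None, None
    best = min(usable, key=lambda p: p["distance_m"])
    return best["distance_m"], best.get("name")
```

Returning `None` rather than a large sentinel distance keeps "nothing found within the radius" distinguishable from "something found far away" in later scoring.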
For a 10,000-candidate list, you typically run the Places call only against candidates that passed the slope filter, perhaps 6,000–7,000 of the original list. That is 6,000–7,000 API calls, each returning up to 5 results. At the free tier ceiling of 3,000 calls/day the Places step alone takes two to three days; on a paid plan it completes in well under an hour with modest concurrency.
Step 4: Score and rank the candidates
With elevation, slope proxy, and infrastructure proximity in hand, a scoring function is straightforward. The exact weights belong to your business — the function below is a starting template, not a prescription.
```python
def site_score(elevation_m, slope_proxy_m, nearest_infra_m):
    """
    Returns a score from 0 (terrible) to 100 (excellent).
    Higher is better. Adjust weights to match your project economics.
    """
    score = 50  # baseline

    # Elevation band: high altitude = more irradiance but more risk
    if elevation_m is None:
        score -= 10
    elif elevation_m > 2500:
        score -= 5   # high-altitude logistics penalty
    elif 500 < elevation_m <= 2000:
        score += 5   # sweet spot for many US markets

    # Slope: lower delta is better
    if slope_proxy_m is None:
        score -= 5
    elif slope_proxy_m <= 10:
        score += 20  # essentially flat
    elif slope_proxy_m <= 25:
        score += 10  # gentle slope, manageable
    elif slope_proxy_m <= 50:
        score -= 10  # steep: cost uplift likely
    else:
        score -= 25  # very steep: likely uneconomic

    # Infrastructure proximity
    if nearest_infra_m is None:
        score -= 15  # no infrastructure found in radius
    elif nearest_infra_m <= 300:
        score += 25  # excellent interconnection position
    elif nearest_infra_m <= 800:
        score += 10  # workable
    elif nearest_infra_m <= 1500:
        score -= 5   # possible but project-specific
    else:
        score -= 20  # likely expensive interconnection

    return max(0, min(100, score))
```

Sort the full list descending by score. The top 400–600 candidates go to the surveyors. The bottom third (sites that score below 30) can be archived with a brief reason code attached (`slope_too_steep`, `no_infrastructure_within_1500m`, `low_confidence_geocode`) so the BD team understands what happened to sites they championed.
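Reason codes can be derived from the same inputs the score uses. The sketch below is one possible mapping, not a fixed rule: the thresholds (50 m slope delta, 1,500 m infrastructure distance, 0.7 geocode confidence) are borrowed from the bands used earlier in this post, and the fallback code `low_combined_score` is a hypothetical addition for sites that fail on no single factor.

```python
def archive_reasons(geocode_confidence, slope_proxy_m, nearest_infra_m):
    """Return a list of reason codes explaining why a low-scoring site was archived."""
    reasons = []
    if geocode_confidence is not None and geocode_confidence < 0.7:
        reasons.append("low_confidence_geocode")
    if slope_proxy_m is not None and slope_proxy_m > 50:
        reasons.append("slope_too_steep")
    if nearest_infra_m is None or nearest_infra_m > 1500:
        reasons.append("no_infrastructure_within_1500m")
    # Hypothetical catch-all when no single factor crosses its threshold.
    return reasons or ["low_combined_score"]
```

Joining the list with a separator such as `|` gives a single column that fits the export format used in the next step.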
Step 5: Export the ranked list for field teams
The surveyor-ready CSV has one row per candidate, sorted by score descending, with columns the field team can actually use:
```python
import csv

OUTPUT_COLS = [
    "rank", "address", "lat", "lng",
    "elevation_m", "slope_proxy_m", "nearest_infra_m", "infra_name",
    "site_score", "geocode_confidence", "flag",
]


def export_results(candidates, out_path="candidates_ranked.csv"):
    # Sorts the caller's list in place, best score first.
    candidates.sort(key=lambda c: c["site_score"], reverse=True)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=OUTPUT_COLS, extrasaction="ignore")
        writer.writeheader()
        for rank, c in enumerate(candidates, start=1):
            c["rank"] = rank
            c["flag"] = _flags(c)
            writer.writerow(c)


def _flags(c):
    flags = []
    if c.get("geocode_confidence", 1) < 0.8:
        flags.append("verify_address")
    if (c.get("slope_proxy_m") or 0) > 40:
        flags.append("slope_review")
    if (c.get("elevation_m") or 0) > 2000:
        flags.append("high_altitude")
    # Explicit None check: a real distance of 0 m must not be treated as missing.
    nearest = c.get("nearest_infra_m")
    if nearest is None or nearest > 1200:
        flags.append("interconnection_study")
    return "|".join(flags) if flags else "clean"
```

The `flag` column is the surveyor's pre-read. A site marked `clean` has no automated concerns; a site marked `slope_review|interconnection_study` needs the field engineer to bring a tape measure and a utility contact. Surveyors spend their limited time on the clean sites first, the flagged sites second, and never drive to a site that scored below 25.
Cost and time budget for 10,000 addresses
Here is the honest accounting for a full pipeline run.
| Step | API calls | Credits |
|---|---|---|
| Geocode 10,000 addresses | 10,000 | 10,000 |
| Elevation (5 pts × 10,000 candidates) | 100 | 50,000 |
| Places nearby (6,500 post-filter candidates) | 6,500 | 6,500 |
| Total | 16,600 | 66,500 |
At $54/month for 100,000 calls, the entire screening run fits within a single monthly quota with room to spare. The marginal cost per candidate is under $0.007. A single wasted surveyor day (travel, time on site, report) runs $500–$1,500. The pipeline pays for itself the moment it prevents a single bad site visit.
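The arithmetic behind those figures is easy to re-run for a different list size. The sketch below uses the numbers from the table and assumes the plan quota is consumed per credit (one credit per elevation point); if your plan meters by HTTP call instead, the elevation step is far cheaper than shown.

```python
PLAN_PRICE_USD = 54.0
PLAN_QUOTA = 100_000
CANDIDATES = 10_000

steps = {
    "geocode":   {"calls": 10_000, "credits": 10_000},
    "elevation": {"calls": 100,    "credits": 50_000},  # 500 points per call
    "places":    {"calls": 6_500,  "credits": 6_500},
}

total_calls = sum(s["calls"] for s in steps.values())
total_credits = sum(s["credits"] for s in steps.values())

# Upper bound: charge the whole monthly plan to this one run.
cost_whole_plan = PLAN_PRICE_USD / CANDIDATES
# Pro-rata: charge only the share of the quota the run consumed.
cost_pro_rata = PLAN_PRICE_USD * total_credits / PLAN_QUOTA / CANDIDATES

print(total_calls, total_credits)
print(f"${cost_whole_plan:.4f} to ${cost_pro_rata:.4f} per candidate")
```

Both ways of attributing the plan cost land below the per-candidate figure quoted above.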
Wall-clock time, running sequentially on a laptop with polite pacing: approximately 45–90 minutes for the full 10,000. With modest concurrency (10 parallel workers on the geocode and places steps), closer to 15–20 minutes. For a team that runs this pipeline weekly during an active prospecting campaign, wrapping it in a simple queue-based job and scheduling it overnight is straightforward.
The free tier (3,000 calls/day, no credit card required) comfortably supports a pilot run of 500–600 candidates before you commit to a paid plan.
Where the approach has real limits
The honest section. A pipeline that presents only strengths is a sales document. A pipeline that presents strengths and limits is engineering.
Elevation is terrain, not shading. The elevation API measures ground height above sea level. It does not measure tree canopy height, neighbouring building height, or roof pitch. A flat 0° rooftop at sea level with a six-storey building to its south is a poor solar site. The elevation and slope proxy will score it highly. Shading analysis requires a different dataset — typically a digital surface model or a LiDAR-derived canopy height model — and is outside the scope of what a REST elevation endpoint can deliver.
Places proximity is a coarse filter, not an interconnection study. Finding a substation within 500 m is a positive signal, not a connection offer. The actual interconnection process involves a utility queue, a system impact study, and a timeline that can stretch 18–36 months in congested areas regardless of physical proximity. Use the places signal to deprioritise sites with no infrastructure nearby, not to promise clients a fast interconnection.
The slope proxy is a 400 m average, not a roof measurement. The four-point ring approach estimates local terrain relief. A flat warehouse roof on a gently sloping hillside will score its terrain slope, which may be low — that is correct and useful. But a residential rooftop with a steeply pitched gable will not have that pitch captured by terrain elevation alone. Roof pitch measurement requires aerial imagery, which is a separate endpoint and covered in Per-Policy Roof and Terrain Snapshots Without Satellite Licenses.
Coverage is global for elevation, US-only for aerial imagery. If your prospecting campaign includes international addresses, the elevation pipeline works everywhere. The aerial-imagery endpoint — which you might want to add to the pipeline as a next step — covers the contiguous United States plus Alaska, Hawaii, and Puerto Rico only. Build the international fallback path from the start.
Extending the pipeline: what to add next
Once the base pipeline is running, three extensions add significant signal without much additional complexity.
Aerial imagery per top-tier candidate. For the top 100–150 sites by score, pull a top-down aerial image per parcel. This adds one credit per address and gives the surveyor a visual before they drive out — useful for confirming roof type and identifying obvious obstructions. The pattern is identical to the insurance underwriting workflow in Per-Policy Roof and Terrain Snapshots Without Satellite Licenses.
Caching the elevation results. Terrain does not change. If your prospecting campaigns revisit overlapping geographies — which they always do — cache elevation results to a local database keyed by rounded coordinate (four decimal places, ~11 m precision). The cache hit rate on a second campaign run through the same county is typically above 60%. See Caching Geocoding Results — 90% Cost Reduction for the database caching pattern that applies directly here.
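A coordinate-keyed cache of this kind can be sketched with the standard-library `sqlite3` module. The class name, table layout, and method names below are illustrative assumptions, not part of any API; the one design point worth copying is that a lookup returns a separate hit flag, so a cached "no data" result is not confused with a cache miss.

```python
import sqlite3


class ElevationCache:
    """Elevation cache keyed by coordinates rounded to four decimal places."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS elevation ("
            "lat_key TEXT, lng_key TEXT, elevation_m REAL, "
            "PRIMARY KEY (lat_key, lng_key))"
        )

    @staticmethod
    def key(lat, lng):
        # Adding 0.0 turns -0.0 into 0.0 so both format as "0.0000".
        return f"{round(lat, 4) + 0.0:.4f}", f"{round(lng, 4) + 0.0:.4f}"

    def get(self, lat, lng):
        """Return (hit, elevation_m). A cached None (no DEM data) is still a hit."""
        row = self.db.execute(
            "SELECT elevation_m FROM elevation WHERE lat_key = ? AND lng_key = ?",
            self.key(lat, lng),
        ).fetchone()
        return (True, row[0]) if row is not None else (False, None)

    def put(self, lat, lng, elevation_m):
        self.db.execute(
            "INSERT OR REPLACE INTO elevation VALUES (?, ?, ?)",
            (*self.key(lat, lng), elevation_m),
        )
        self.db.commit()
```

In use, the pipeline would call `get` for every ring point, send only the misses to the elevation endpoint, and `put` the responses back before computing slope proxies.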
Observability on the pipeline itself. A screening pipeline that runs weekly needs monitoring. How many candidates failed geocoding this week versus last? Is the average site score drifting (a sign that the input list quality changed)? Are elevation call failures increasing (a sign of a rate-limit or quota issue)? Instrument the pipeline with simple counters per step and alert on anomalies. Observability for Geocoding Pipelines covers the metric shapes and alerting thresholds that transfer cleanly to this use case.
Frequently Asked Questions
Can this pipeline handle commercial rooftops differently from residential?
Yes, and it should. Commercial flat rooftops at large warehouses score differently on slope (nearly zero slope proxy is expected and excellent) and on infrastructure proximity (industrial parks often have nearby distribution infrastructure). Pass a type column through from the input CSV and apply different scoring weights — commercial candidates might weight infrastructure proximity more heavily, residential candidates might weight slope proxy more heavily.
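One way to express type-specific weighting is a multiplier per site type applied to the slope and infrastructure components. The sketch below is illustrative only: the function name and the multiplier values are placeholders chosen to show the mechanism, not calibrated recommendations, and `slope_points` and `infra_points` stand for the per-factor adjustments from a scoring function like the one in Step 4.

```python
# Placeholder multipliers: tune against your own project economics.
TYPE_WEIGHTS = {
    "commercial":  {"slope": 0.8, "infra": 1.3},
    "residential": {"slope": 1.3, "infra": 0.8},
}
DEFAULT_WEIGHTS = {"slope": 1.0, "infra": 1.0}


def weighted_score(site_type, slope_points, infra_points, baseline=50):
    """Combine per-factor points using multipliers chosen by site type."""
    w = TYPE_WEIGHTS.get(site_type, DEFAULT_WEIGHTS)
    raw = baseline + w["slope"] * slope_points + w["infra"] * infra_points
    return max(0, min(100, round(raw)))
```

Unknown or missing types fall back to neutral weights, so a row with a blank type column is scored rather than rejected.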
What if a candidate address geocodes to the wrong parcel?
This is the most common data-quality failure on county parcel lists. The geocode confidence score is your first filter — anything below 0.7 should go to manual review. Beyond that, visual inspection of the top-tier candidates with aerial imagery (one extra credit per address) will catch most mis-geocodes before a surveyor wastes a day on the wrong building.
How should I handle elevation returning `null` for some candidates?
null is a real response for points where the DEM has no data — typically open water or certain edge tiles. Branch on is None (Python) or === null (JavaScript), never on a falsy check — because a real sea-level elevation of 0 m is encoded as the integer 0. Candidates returning null elevation get flagged as elevation_unavailable and dropped to the bottom of the ranked list; do not eliminate them silently.
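A short sketch of that rule in Python, assuming candidate dicts with `elevation_m` and `site_score` keys as used elsewhere in this post; the function names are illustrative.

```python
def elevation_flag(elevation_m):
    """Flag only a genuinely missing value; 0 m is valid sea-level data."""
    return "elevation_unavailable" if elevation_m is None else "ok"


def rank_candidates(candidates):
    """Sort by score descending, keeping missing-elevation rows but placing them last."""
    return sorted(
        candidates,
        key=lambda c: (c["elevation_m"] is None, -c["site_score"]),
    )
```

The sort key puts every row with a real elevation ahead of every row without one, regardless of score, and nothing is removed from the list.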
Is the free tier enough for a real screening campaign?
A 500-address pilot fits within a single day's free tier allocation (3,000 calls/day) with room to spare. A 10,000-address full run needs a paid plan — the maths are in the cost table above. Starting a pilot on the free tier, validating the pipeline output, and then upgrading for the full run is the standard pattern. No credit card required to start.
What concurrency level is safe for the places nearby calls?
Ten concurrent workers is a safe starting point — enough to cut wall-clock time significantly without triggering rate-limit responses. If you see HTTP 429 responses, back off to five workers and add jitter between retries. The detailed guidance is in Concurrency Tuning for Geocoding Pipelines.
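The worker pool and the back-off can be kept separate from the request code. The sketch below uses the standard-library `ThreadPoolExecutor` and an exponential back-off with full jitter; `RateLimited` is a hypothetical exception that your own request function would raise on an HTTP 429, and the delay values are assumptions to tune, not documented limits.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor


class RateLimited(Exception):
    """Hypothetical signal: raise this from your request function on HTTP 429."""


def call_with_backoff(fn, max_attempts=5, base_delay=0.5, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Call fn(); on RateLimited, wait a jittered, growing delay and retry."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(cap * rng())  # full jitter: uniform in [0, cap)


def map_with_workers(worker, items, workers=10):
    """Run worker(item) across a thread pool, returning results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(worker, items))
```

A typical combination wraps each lookup in `call_with_backoff` inside the worker passed to `map_with_workers`, so a rate-limited request slows down only its own thread.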
Should I re-run the pipeline on the same address list next quarter?
Only for new candidates or candidates whose status changed. Elevation and terrain do not change on quarterly timescales. Cache the elevation and slope results per coordinate and only re-query places proximity if the search radius or category list changed. Re-running geocodes is worth doing for any address that previously returned low confidence — sometimes a clean-up to the input address string yields a better result.
Can I use this pipeline for ground-mount solar prospecting, not just rooftops?
Yes, with adjusted parameters. Ground-mount sites need a larger places search radius (2 km rather than 500 m), a tighter slope filter (ground-mount economics break faster on terrain than rooftop racking does), and an additional check for land-use zoning — which the places category filter can approximate but cannot fully replace. The pipeline structure is identical; the weights and thresholds shift.
Related Articles
- Adding elevation to property data — one API call per address — the foundation for the elevation pipeline used throughout this post
- Per-policy roof and terrain snapshots without satellite licenses — adding aerial imagery to the top-tier candidates after scoring
- Benchmarking geocoding APIs — honest numbers — how to evaluate the geocoding step in a pipeline like this one
- Caching geocoding results — 90% cost reduction — why terrain data should be cached and how to structure the cache
- Dispatch console: 5,000 stops per day — the same batching and concurrency patterns applied to a logistics routing problem
---
*I.A. / CSV2GEO Creator*