How to Reverse Geocode Data from a Large GPS-IERS Data File?
Many people use geocoding in their everyday lives even if they don’t recognize it by name. For example, anyone using the Google Maps app to get driving directions is using geocoding. Today at least a quarter of all internet searches return some type of mapped address as a result. Geocoding has become the “bread and butter” of internet business, and latitude and longitude data returned with geocoded address results is considered quite normal. However, the term “geocoding” is not widely known and often confuses the general public. Things get more complicated when we mention “reverse batch geocoding” of files, and more complicated still when we discuss “reverse batch geocoding of exceptionally large files”.
Let’s step back for a moment and establish some basic terminology before we explain the specifics of how to reverse geocode an exceptionally large file. To begin, let’s define what reverse geocoding means to today’s users.
Recall that “geocoding” essentially means taking a “postal address” as input and returning its coordinates. Everyone living or working where mail is delivered has some form of address that postal services use every day to deliver mail. An address typically includes a street number, street name, city, state/province/district, postal “code”, and country. For example, the address for The White House of the United States is:
1600 Pennsylvania Avenue, Washington, District of Columbia, 20500, U.S.A.
Forward geocoding this address returns a latitude of 38.89768 and a longitude of -77.03655. Such a pair is commonly called the location’s GPS coordinates, after the Global Positioning System (GPS). In this article we also use the label GPS-IERS, after the International Earth Rotation and Reference Systems Service (IERS), which maintains the Earth reference systems that such coordinates are tied to.
So, what is “reverse geocoding”? Basically, it means inputting GPS coordinates (latitude and longitude) and getting back a legitimate postal address or location, or at least the closest postal address to the GPS point entered. Reverse geocoding is essential for georeferencing, in which a mapping system is related to actual features on the earth’s surface (such as roads, waterways, etc.) using GPS-IERS coordinates. It is important to note that a GPS data point is quite literally a single point on the surface of the earth, while actual homes and businesses typically occupy rather large patches of that surface. Apartment buildings and multi-story buildings compound the problem: their postal addresses have GPS coordinates that are literally stacked on top of each other. We will discuss this more later. For now, let’s focus on the surface of the earth alone, without an altitude (height) component.
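The idea of returning “the closest postal address” can be illustrated with a toy lookup. This is only a minimal sketch and not how CSV2GEO or any production map engine works: real engines rely on spatial indexes and interpolation over very large address databases. The great-circle (haversine) distance and the mean Earth radius are standard; the table `ADDRESSES`, the function names, and the two “hypothetical” entries are assumptions made for illustration. Only the White House coordinates come from this article.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Illustrative table of known (address, latitude, longitude) records.
# The second and third entries are hypothetical placeholders.
ADDRESSES = [
    ("1600 Pennsylvania Avenue, Washington, District of Columbia, 20500, U.S.A.",
     38.89768, -77.03655),
    ("Hypothetical address B", 38.90000, -77.04000),
    ("Hypothetical address C", 38.88000, -77.02000),
]


def nearest_address(lat, lon, table=ADDRESSES):
    """Return (address, distance_in_metres) for the record closest to the point."""
    candidates = (
        (address, haversine_m(lat, lon, a_lat, a_lon))
        for address, a_lat, a_lon in table
    )
    return min(candidates, key=lambda pair: pair[1])
```

Calling `nearest_address(38.8977, -77.0366)` returns the White House record, because that entry lies only a few metres from the query point while the other two are hundreds of metres away.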
So how do we accomplish reverse geocoding and, more interestingly, how can we reverse geocode data from an exceptionally large GPS-IERS data file?
Until recently, this type of data transformation required computing power that most people did not have. Today, however, the average business-class PC with a good internet connection is powerful enough to make GPS data transformation not just possible, but readily accessible and affordable to almost anyone. There are countless business scenarios in which reverse geocoding is handy. For example, delivery drivers making home or business deliveries discover that existing mapping apps are not always accurate. On occasion, a driver attempting to deliver a package to a postal address will find that the mapped target is thousands of feet from the actual delivery point. In this case, the driver can simply record the current GPS coordinates (latitude, longitude) and designate that specific spot as the “true target”. After a while, the company may collect many, many pairs of GPS coordinates. The goal is to build validated (GPS coordinates, postal address) pairs on the map that correlate GPS coordinates with all “true target” postal addresses. This correlation task is not easy when it comes to exceptionally large files.
In this article, we will demonstrate how the CSV2GEO app can handle an exceptionally large file as an input to reverse geocoding. Feel free to use this or any other available reverse geocoding tool. The method is likely similar.
We recommend following 5 steps in order to get safe and sound results:
Step 1.) Build a UTF-8 formatted input file*. Make sure you build your data input file with 8-bit Unicode Transformation Format (UTF-8) encoding. A step-by-step tutorial on how to prepare your UTF-8 file for geocoding is included in this link.
*Note: this is a critically important processing step. Many users assume ASCII format is fine, and it will deliver some results, but they don’t realize its limitations: ASCII covers only 128 characters, so accented letters and non-Latin scripts in place names cannot be represented. That is where UTF-8 is critical. UTF-8 format is perfect for batch geocoding and map engines love it.
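If your export is not already UTF-8, a short script can re-encode it before upload. The following is a minimal sketch, not the method from the linked tutorial. It assumes the source file uses Windows-1252 (`cp1252`), which is only a guess; replace `source_encoding` with whatever your exporting system actually writes. The function names are illustrative. Both functions stream line by line so that a very large file does not have to fit in memory.

```python
def is_valid_utf8(path):
    """Return True if the whole file decodes cleanly as UTF-8."""
    try:
        with open(path, "r", encoding="utf-8", newline="") as handle:
            for _ in handle:  # iterate to force decoding of every line
                pass
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src, dst, source_encoding="cp1252"):
    """Re-encode src (assumed to be in source_encoding) as UTF-8 into dst.

    newline="" keeps the original line endings untouched on both sides.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

A typical use is to call `is_valid_utf8` on the input first and run `convert_to_utf8` only when it returns False, writing to a new file name so the original export stays untouched.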
Step 2.) Identify the cost up-front. A lot depends on your budget for the job. Usually a few criteria determine your budget, such as the accuracy and relevance of the final product, volume, time of delivery, etc. CSV2GEO always aims for the highest-quality outcome at the most affordable price possible. We offer free introductory trials that are handy when testing products, and we also offer very flexible payment options:
a) Pay-as-you-go plan. This extremely popular plan is simple: each job is priced separately, so users can use the price table to estimate the cost of any reverse geocoding job. The price is also shown to users on the fly when they start a batch geocoding or reverse geocoding process. Registration and payment are handled in a few clicks during the workflow. CSV2GEO never stores credit or debit card information on its own servers or on a customer account. Users can instead choose to store their payment credentials with PayPal (for Visa, MasterCard, PayPal, Stripes, etc.). With this option, users can calculate the cost in advance and budget for the volume of data they need.
b) Monthly basic subscription plan. If a client anticipates long-term use of reverse geocoding, a monthly subscription plan may be a suitable option. Customers who don’t know the exact volume they need per month can benefit from the price estimate calculator. If for some reason the monthly credit runs out, the system will auto-renew by starting another month.
Monthly subscription plans offer these handy features:
1) Ability to have multiple users in an organization sharing it. Each user will have their own separate account and can use it concurrently and independently to process reverse batch geocode files. This feature is popular with medium to large organizations.
2) An unlimited number of map markers displayed on the interactive map that is produced automatically from every reverse geocoding job.
3) Monthly invoices sent automatically to one or more users specified within the organization.
c) Subscribe with the Application Programming Interface (API). With this option, instead of uploading large files, the user can pass a constant flow of single reverse geocoding rows via the API. This is handy when machines handle the processing and human involvement is minimal. The results of each completed batch are returned as collections of map markers displayed on an interactive map that offers extensive options for manipulation. If you currently use systems without this feature, Scale Campaign can help you by integrating your current systems with CSV2GEO.
d) Purchase CSV2GEO user credits. From time to time an organization may run a lead campaign for a few weeks that may or may not turn into a longer marketing or sales campaign. The volume for reverse geocoding may be large but limited in time. In this case, an appropriate method may be to buy credits based on volume. What’s cool about this option is that credits can be shared between multiple accounts.
Step 3.) Prepare a pilot file to test. Based on extensive experience with clients processing large files for reverse geocoding, we strongly recommend that users always prepare a small test batch file first. There is a simple way to do that. Copy your complete large .csv file, change the copy’s name, and delete everything except the header and the first and last 50 rows. That way the file will have a consistent header and format, and only 100 rows of “test batch” data. The test run will use a file structure identical to the complete large .csv file, so any issues identified during the test can be resolved more quickly and easily in a 100-row file than in a 1,000,000-row file. Use this tutorial on how to process a batch of addresses into latitude and longitude for free. After processing the test file, open the resulting output file and examine the following criteria:
a) Did each input row return data? Under normal circumstances, each latitude and longitude pair should return an address or a location. When a result is missing, the user needs to look very carefully at what is wrong with the input row. As a start, check whether the latitude and longitude were swapped in the input. Some systems list latitude as the first coordinate, while others list longitude first. We have also seen clients mistakenly label one coordinate as the other, either in the file itself or when selecting the column components. The other red flag to look for is values of a different type that somehow shifted and ended up inside the latitude and longitude columns. That usually happens when users export data from CRM systems that do not have their own geocoding module. At CSV2GEO we offer Service Level Agreements to build such modules internally if needed and make the whole process painless.
b) Check the accuracy of the output data. Accuracy/relevance is an important measure of the quality of the results. The higher the relevance, the better. The highest value is 1.00 and the lowest is 0. Using georeferencing behind the scenes and some interpolation, the map engine always tries to be helpful by doing its best to match the input to an existing address or location as an output. For example, if we use latitude = 38.89768 and longitude = -77.03655 we get back 1600 Pennsylvania Ave NW, Washington, DC 20500, United States with relevance 1.00.
c) Review the structure of the output file. Examine the output file to see how it looks. Does it follow the same structure as the input file? CSV2GEO does not manipulate the input file, but instead creates a duplicate and appends the results to it. The final version is the input plus the output in the same file. We do that mainly to preserve the authenticity of the input file. Keep in mind that CSV2GEO covers all countries worldwide with very few exceptions (like North and South Korea, Japan). The tool uses the World Geodetic System (WGS-84) as its standard.
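The pilot file from Step 3 and the first input check under a) can both be scripted. The following is a minimal sketch under stated assumptions: the input is a UTF-8 CSV with exactly one header row, and the function names are illustrative and not part of CSV2GEO. Note that the range check can only hint at swapped columns when a value in the latitude column is beyond ±90. A swapped pair such as (-77.03655, 38.89768) stays inside both valid ranges and has to be caught by looking at the returned address instead.

```python
import csv
import math
from collections import deque


def make_pilot_file(src, dst, n=50):
    """Write the header plus the first n and last n data rows of src to dst."""
    with open(src, newline="", encoding="utf-8") as fin:
        reader = csv.reader(fin)
        header = next(reader)       # assumes exactly one header row
        head = []                   # first n data rows
        tail = deque(maxlen=n)      # rolling window of the last n rows
        for row in reader:
            if len(head) < n:
                head.append(row)
            else:
                tail.append(row)    # only rows after the head, so no duplicates
    with open(dst, "w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout)
        writer.writerow(header)
        writer.writerows(head)
        writer.writerows(tail)


def coordinate_problem(lat_text, lon_text):
    """Return a short description of what looks wrong, or None if plausible."""
    try:
        lat, lon = float(lat_text), float(lon_text)
    except (TypeError, ValueError):
        return "non-numeric value"
    if not (math.isfinite(lat) and math.isfinite(lon)):
        return "non-numeric value"
    if abs(lat) > 90:
        # A "latitude" beyond 90 that would be a valid longitude, paired with
        # a "longitude" that would be a valid latitude, suggests a swap.
        if abs(lat) <= 180 and abs(lon) <= 90:
            return "latitude out of range; columns may be swapped"
        return "latitude out of range"
    if abs(lon) > 180:
        return "longitude out of range"
    return None
```

Running `make_pilot_file` on a copy of the full file produces the 100-row test batch, and `coordinate_problem` can be applied to each row’s latitude and longitude cells before upload to flag shifted columns, text values, or likely swaps.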
Step 4.) Once the test run has completed successfully, you are ready for the real run!
There is only one small difference between how reverse geocoding and normal geocoding are run inside CSV2GEO. When the data is loaded, make sure to select reverse geocoding from the radio button selection. Note: The DEFAULT setting is for “normal-forward” geocoding. You will want to click the radio button for “Reverse Geocoding” (as displayed below).
Do not hesitate to contact us if you have questions or are not sure how to perform some operations. We can provide direct training, or do the work for you at very affordable rates if you decide to delegate it to us. We would love to see our clients and partners gain confidence and perform 100% of the analytical work on their own using our platform, but we will always make time to assist clients in any way we can. We believe helping our clients helps us.
Step 5.) Let’s geocode that actual large file. When working with large files, remember that it takes time to load them into the system and time to process them. Please plan your schedule in advance. At CSV2GEO we work hard to evaluate and validate input files for our customers, clients, and partners. We also give the user the ability to self-review by posting the results for the first ten records from the file.
During each step of the process, the system indicates the progress of the workflow as a percentage. For a batch of 1,000,000 rows of latitude and longitude entries, reverse geocoding may take around 120 minutes.
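For scheduling, the figure above can be turned into a rough estimate for other file sizes. This is only a back-of-the-envelope sketch: it assumes processing time scales linearly with row count, which the article does not state, and it ignores upload time and server load. The constant and function names are illustrative.

```python
# Reference point quoted above: about 120 minutes for 1,000,000 rows.
REFERENCE_ROWS = 1_000_000
REFERENCE_MINUTES = 120


def estimated_minutes(rows):
    """Rough processing time in minutes, assuming linear scaling with rows."""
    return rows * REFERENCE_MINUTES / REFERENCE_ROWS


def reference_rows_per_second():
    """Throughput implied by the reference point, in rows per second."""
    return REFERENCE_ROWS / (REFERENCE_MINUTES * 60)
```

Under this assumption the reference point works out to roughly 139 rows per second, and a 250,000-row file would need about 30 minutes of processing time.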
Relevant articles that may help you when you try to reverse geocode a large file: