Feature 4 – Address Parsing

An address usually contains the street name and number, the city or place name, the state abbreviation, a ZIP or postal code, and the country. These elements (tokens) are normally separated by commas so that each one can be identified. When a large number of addresses is stored in CSV files, verifying that every address is typed and formatted correctly is a time-consuming and challenging task.

The good news is that address parsing in CSV2GEO is automatic, so the user does not need to worry when the commas that separate the tokens of an address are missing. The CSV2GEO tool can provide the correct coordinates even when one comma is missing from an address, or when the entire address is a long string of names and numbers with no commas at all.

To illustrate the advantage of the address parsing feature in the CSV2GEO tool, we will use the California Academy of Sciences in San Francisco as an example. The address for this location is “55 Music Concourse Dr, San Francisco, CA 94118, United States”. We first enter the address in the geocoding tool without any commas between the tokens and with the country abbreviated, so the address becomes “55 Music Concourse Dr San Francisco CA 94118 USA”. Even though no commas separate the address tokens, the CSV2GEO tool provides the correct geographic coordinates for this location. After the geocoding process is complete, the data can be downloaded as a CSV file and opened in GIS software. There the address appears twice: once in a single field, exactly as it was entered in the batch geocoding tool, and once as separate address tokens spread across additional fields, one field per token.
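To make the idea of splitting a comma-free address into tokens concrete, here is a minimal sketch in Python. It is not CSV2GEO's parser, whose internals are not described here. The pattern ADDRESS_RE, the function parse_unseparated, the field names, and the short list of street suffixes are all assumptions made for illustration, and the pattern only handles US-style addresses with a two-letter state and a five-digit ZIP code.

```python
import re

# Hypothetical pattern for illustration only; not CSV2GEO's actual parser.
# Assumes: house number, street name ending in a known suffix, city,
# two-letter state abbreviation, five-digit ZIP code, optional country.
ADDRESS_RE = re.compile(
    r"^(?P<number>\d+)\s+"
    r"(?P<street>.+?\b(?:Dr|St|Ave|Blvd|Rd|Ln|Way))\s+"
    r"(?P<city>.+?)\s+"
    r"(?P<state>[A-Z]{2})\s+"
    r"(?P<zip>\d{5})"
    r"(?:\s+(?P<country>.+))?$"
)


def parse_unseparated(address: str) -> dict:
    """Split a comma-free US address into named tokens; return {} if no match."""
    match = ADDRESS_RE.match(address.strip())
    return match.groupdict() if match else {}


tokens = parse_unseparated("55 Music Concourse Dr San Francisco CA 94118 USA")
for name, value in tokens.items():
    print(f"{name}: {value}")
```

A single regular expression like this breaks down quickly on real-world data (unusual suffixes, unit numbers, non-US formats), which is why an automatic parser is valuable in practice.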

For the same location, a second example places a comma after every address token and replaces the abbreviation for California with the full name of the state. The address becomes “55, Music Concourse Dr, San Francisco, California, 94118, USA”. The CSV2GEO tool recognizes each token as a separate element, so the table displayed before the geocoding step has 6 fields, as can be seen in the image below. When the data is downloaded as a CSV file and opened in GIS software, the address again appears both as separate fields, one per address token, and as a single field containing the entire address.
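The six-field layout described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the column names in FIELD_NAMES and the helpers split_tokens and to_row are hypothetical, and the actual column names in a CSV2GEO export may differ.

```python
# Hypothetical column names, chosen for illustration only.
FIELD_NAMES = ["number", "street", "city", "state", "zip", "country"]


def split_tokens(address: str) -> list:
    """Split an address on commas and trim the whitespace around each token."""
    return [part.strip() for part in address.split(",") if part.strip()]


def to_row(address: str) -> dict:
    """Keep the full address in one field and add one field per token."""
    tokens = split_tokens(address)
    row = {"full_address": address}
    # Only map tokens to named fields when the count matches the assumed layout.
    if len(tokens) == len(FIELD_NAMES):
        row.update(zip(FIELD_NAMES, tokens))
    return row


row = to_row("55, Music Concourse Dr, San Francisco, California, 94118, USA")
print(row)
```

The resulting row mirrors what the text describes: one field holding the entire address, plus one field for each of the six tokens.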

A third version of the address for this location could be “55 Music Concourse Dr, San Francisco CA 94118, United States”, in which only two commas separate the address tokens. Even in this case, the CSV2GEO tool provides the correct geographic coordinates for the California Academy of Sciences in San Francisco.
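One way to see why the number of commas need not matter is to normalize each variant before comparing them. The sketch below is an assumption about how such a comparison could be done, not a description of CSV2GEO's internals; the function name normalize is hypothetical.

```python
import re


def normalize(address: str) -> str:
    """Replace commas with spaces, collapse whitespace, and lowercase the result."""
    without_commas = address.replace(",", " ")
    return re.sub(r"\s+", " ", without_commas).strip().lower()


three_commas = "55 Music Concourse Dr, San Francisco, CA 94118, United States"
two_commas = "55 Music Concourse Dr, San Francisco CA 94118, United States"

# Both variants reduce to the same sequence of words once commas are ignored.
same = normalize(three_commas) == normalize(two_commas)
print(same)
```

Normalization of this kind only shows that the two variants carry the same words in the same order. Deciding which words form the street, the city, or the state is still the job of the parser.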

The important advantage of the address parsing feature is that it provides the coordinates for a location whether the address is written as several segments of address tokens or as a single line. The system can parse an address stored in a single column as well as one split into groups of tokens across several columns, and it then processes each token to provide the correct coordinates for the desired location.
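The difference between the two input layouts can be sketched with Python's standard csv module. The sample data, the column names, and the helper row_to_address are assumptions for illustration and do not reflect how CSV2GEO reads uploaded files. The point is only that both layouts can be reduced to one address string per row before parsing.

```python
import csv
import io


def row_to_address(row: dict) -> str:
    """Join all non-empty cells of a CSV row into one address string."""
    return ", ".join(
        value.strip() for value in row.values() if value and value.strip()
    )


# Layout 1: the whole address in a single column (hypothetical sample data).
single_column = "address\n55 Music Concourse Dr San Francisco CA 94118 USA\n"

# Layout 2: one column per address token (hypothetical column names).
multi_column = (
    "number,street,city,state,zip,country\n"
    "55,Music Concourse Dr,San Francisco,California,94118,USA\n"
)

single_addresses = [
    row_to_address(r) for r in csv.DictReader(io.StringIO(single_column))
]
multi_addresses = [
    row_to_address(r) for r in csv.DictReader(io.StringIO(multi_column))
]

print(single_addresses)
print(multi_addresses)
```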