Normalized Address Tokenization
When running reverse geocoding in CSV2GEO, users have several options for input data. A table with many records stored as a CSV file can be uploaded to the geocoder to obtain addresses. As an example, we use the U.S. Wind Turbine Database for Minnesota. To download this data, we use the Minnesota Geospatial Commons website (https://gisdata.mn.gov) and search for the dataset in CSV format, since CSV files can be uploaded directly to CSV2GEO. Once we find the CSV file listing the wind turbines in Minnesota, we download it. Opening the file shows that its last two columns hold the latitude and longitude of each point of interest.
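Pulling the coordinates out of the downloaded file can be sketched in Python. This is a minimal sketch, assuming the first row is a header and that latitude comes before longitude in the last two columns (swap the indices if the file stores them the other way). The function name `read_last_two_as_coords` and the sample column names are hypothetical and are not part of CSV2GEO.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_last_two_as_coords(lines: Iterable[str]) -> List[Tuple[float, float]]:
    """Return (latitude, longitude) pairs from the last two columns of each row.

    Assumes the first row is a header and that the last two columns hold
    latitude and longitude, in that order (an assumption; check your file).
    """
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row
    coords = []
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed rows
        lat, lon = float(row[-2]), float(row[-1])
        coords.append((lat, lon))
    return coords


# Hypothetical excerpt; the real file has many more columns and rows.
sample = "id,state,latitude,longitude\n1,MN,43.5,-96.2\n2,MN,44.1,-95.8\n"
print(read_last_two_as_coords(io.StringIO(sample)))
# [(43.5, -96.2), (44.1, -95.8)]
```

Indexing from the end of each row means the sketch does not depend on how many attribute columns precede the coordinates.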
Next, the CSV file is uploaded to the CSV2GEO tool. The records from the downloaded file now appear in a table within CSV2GEO, and the user must first select reverse geocoding as the process to run. The user then selects which columns of the uploaded file contain the latitude and the longitude. Clicking the “Process data” button displays all the locations on the map in CSV2GEO. Once reverse geocoding is complete, each pair of coordinates from the CSV file has been converted into a postal address, and the first few addresses are displayed as a preview. To save the new addresses in a separate file, the user exports the results as a new CSV file and downloads it.
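Before clicking “Process data”, it can help to confirm locally that the columns chosen as latitude and longitude actually contain valid coordinates, so that swapped or empty values are caught before they reach the geocoder. The helper `check_coordinate_columns` below is hypothetical: it is a local pre-check that assumes the file has a header row, and it does not call CSV2GEO in any way.

```python
import csv
import io
from typing import List, Tuple


def check_coordinate_columns(text: str, lat_col: str, lon_col: str) -> List[Tuple[int, str]]:
    """Return (row_number, problem) pairs for rows with invalid coordinates.

    lat_col and lon_col are the header names selected as latitude and
    longitude. Row numbers count data rows from 1, excluding the header.
    """
    reader = csv.DictReader(io.StringIO(text))
    missing = [c for c in (lat_col, lon_col) if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found in header: {missing}")
    problems = []
    for i, row in enumerate(reader, start=1):
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (TypeError, ValueError):  # empty, missing, or non-numeric cell
            problems.append((i, "non-numeric coordinate"))
            continue
        if not -90.0 <= lat <= 90.0:
            problems.append((i, "latitude out of range"))
        elif not -180.0 <= lon <= 180.0:
            problems.append((i, "longitude out of range"))
    return problems


# Hypothetical data: row 2 has latitude and longitude swapped, row 3 is empty.
sample = "id,latitude,longitude\n1,45.0,-94.5\n2,-94.5,45.0\n3,,-93.0\n"
print(check_coordinate_columns(sample, "latitude", "longitude"))
# [(2, 'latitude out of range'), (3, 'non-numeric coordinate')]
```

A latitude outside [-90, 90] is often the first sign that the two columns were selected in the wrong order.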
Opening the output CSV file reveals several new columns: one containing the complete address of each location, and separate columns for the street number, street name, city, state, country, and ZIP code. The output file also contains a column listing all the address components separated by commas, which exposes the individual address tokens for each pair of coordinates.
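Splitting that comma-separated column back into labeled, normalized components is the tokenization step this section is about. The following is a minimal sketch, assuming the components appear in the order listed above (street number, street name, city, state, country, ZIP code); the actual order in the CSV2GEO output may differ, so the `fields` parameter lets you change it. `tokenize_address` and `DEFAULT_FIELDS` are hypothetical names, and "normalized" here only means trimming and collapsing whitespace.

```python
import re
from typing import Dict, Optional, Sequence

# Assumed field order: follows the order the components are listed in the
# text, which may not match the real CSV2GEO output column.
DEFAULT_FIELDS = ("street_number", "street_name", "city", "state", "country", "zip_code")


def tokenize_address(joined: str, fields: Sequence[str] = DEFAULT_FIELDS) -> Dict[str, Optional[str]]:
    """Split a comma-separated address into normalized, labeled tokens.

    Each token is trimmed and runs of internal whitespace are collapsed to a
    single space; empty tokens become None.
    """
    tokens = [re.sub(r"\s+", " ", t).strip() for t in joined.split(",")]
    if len(tokens) != len(fields):
        raise ValueError(f"expected {len(fields)} tokens, got {len(tokens)}: {tokens!r}")
    return {name: (tok or None) for name, tok in zip(fields, tokens)}


# Hypothetical address, not taken from the wind turbine dataset.
print(tokenize_address("12,  Example   Road, Springfield,Minnesota, United States, 55000"))
# {'street_number': '12', 'street_name': 'Example Road', 'city': 'Springfield',
#  'state': 'Minnesota', 'country': 'United States', 'zip_code': '55000'}
```

Raising an error on a token-count mismatch, rather than silently padding, makes it easy to spot addresses that contain an extra comma or a missing component.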