How to Geocode Addresses from Brazil
Have you ever wondered how postal addresses are structured around the world? Are they all the same, or at least similar? Is there a stable set of “rules” that you can rely on to understand them for the purposes of your enterprise? Wherever you grew up, the local system probably seems straightforward, reliable, and predictable. Almost all of us take the structure of an address for granted when we mail something or enter it on a cell phone to locate a place and find a driving route. You may assume the same structure exists in other countries, even if you aren’t familiar with those places. Perhaps surprisingly, that is not always the case. Address formats vary widely between regions and countries, shaped by their history and by modern efforts to put a rational structure in place that aligns with digital technology. Let’s look at Brazil.
Brazil is the fifth largest country in the world by territorial area and the sixth by population. It is also the largest economy in South America, and in Latin America as a whole. Keeping an orderly mail system in a country of that size requires more than a traditional approach. It requires a fast and reliable postal service with an orderly structure that can be integrated into our digital age, using computers and IT systems to sort and deliver millions of letters and packages daily.
The postal service in Brazil today relies on a common address structure that we will examine and explain here. Understanding that structure is also tremendously useful to anyone who wants to geocode in large batches. If you need to batch geocode many addresses in a file to obtain their latitude and longitude coordinates, or if you want to reverse geocode from coordinates to actual, recognized postal addresses, knowing the structure will make your effort easier and more accurate.
First, let’s remember that the language used in Brazil is Portuguese. As in English and Spanish (and many other languages), spelling variations and abbreviations are common in addresses. When talking about a postal address in any country, we like to speak of “tokens”. A token is a discrete “chunk” of data that is quite specific. You are naturally aware of tokens, even if the term is unfamiliar in this context. For example, Street Name is a token, Street Number is a token, and City, State, Country, and Zip Code are all tokens in the USA’s postal system. In a similar way, the main address tokens for Brazil are listed below. If your input data file is arranged in alignment with Brazil’s structure, your geocoding request can be performed easily in the csv2geo application by manually identifying the address token in each data column.
Consider this example of the data structure for a known location in Brazil, the National Institute of Industrial Property (INPI) of the Ministry of Economy.
The full and formal address is listed as:
Rua Mayrink Veiga, 9, Centro, 22° andar
CEP 20090-910
Rio de Janeiro, RJ
Brazil
When processing an address from Brazil using csv2geo, it is important to stage the secondary address parts (in our example, the neighborhood “Centro” and the floor “22° andar”) in separate columns, so they don’t interfere with the formal address structure. Once they have been segregated into their own columns, they are ignored during geocoding and are simply carried along unchanged into the output data file. These parts may be important attributes for your later data filtering, but they are not useful to the geocoding algorithm and, worse, can confuse it. Once your data has been enriched with the geocoded latitude and longitude for each address, you can resume using those other parts as you please.
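The staging step described above can be sketched in Python. This is only an illustration under stated assumptions: the street line is assumed to be comma-separated in the order street name, number, then any secondary parts, and the column names (`street_name`, `street_number`, `secondary`, and so on) are hypothetical labels, not headers required by csv2geo.

```python
import csv
import io


def split_street_line(line):
    """Split 'street, number, extra, ...' into geocodable and secondary parts.

    Assumes the first comma-separated piece is the street name and the
    second is the street number; everything after that is treated as
    secondary information (neighborhood, floor, unit, ...).
    """
    parts = [p.strip() for p in line.split(",")]
    street = parts[0]
    number = parts[1] if len(parts) > 1 else ""
    secondary = "; ".join(parts[2:])
    return {"street_name": street, "street_number": number, "secondary": secondary}


# The example address from this article, staged into separate columns.
row = split_street_line("Rua Mayrink Veiga, 9, Centro, 22° andar")
row.update({"city": "Rio de Janeiro", "state": "RJ",
            "zip": "20090-910", "country": "Brazil"})

# Write one header line and one data line as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

Keeping the secondary parts in a single `secondary` column is just one possible design; separate columns for neighborhood and floor would work equally well, as long as they stay out of the columns mapped to address tokens.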
Brazilian Address “Tokens” |
CSV2GEO address tokens |
Example |
Number - The most common are |
Street Number |
9 |
Street – The street, or street name, is a bit complex and requires particular attention. It can be quite simple or a combination of several parts. The most common street types are Rua (Street), Alameda, Avenida (Ave/Avenue), Beco, Caixa Postal (P.O. Box), Vila, etc.; others do exist. The street field may also include the actual name of a place (especially a large commercial complex, a locally identified neighborhood, or a condominium/apartment complex), as well as secondary address numbers and letters that identify a building, a wing of a building, a floor of a building, an apartment “unit” in a building, etc. |
Street Name |
Rua Mayrink Veiga |
Neighborhood – not used |
n/a |
n/a |
City – Use either the short name or the full name of the city. In Brazil it is common to shorten city names for convenience, since some traditional city names are quite long. |
City |
Rio de Janeiro |
State – Like many countries, Brazil is divided into states: 26 states and 1 federal (capital) district.
AC - Acre
AL - Alagoas
AP - Amapá
AM - Amazonas
BA - Bahia
CE - Ceará
DF - Distrito Federal
ES - Espírito Santo
GO - Goiás
MA - Maranhão
MT - Mato Grosso
MS - Mato Grosso do Sul
MG - Minas Gerais
PR - Paraná
PB - Paraíba
PA - Pará
PE - Pernambuco
PI - Piauí
RN - Rio Grande do Norte
RS - Rio Grande do Sul
RJ - Rio de Janeiro
RO - Rondônia
RR - Roraima
SC - Santa Catarina
SE - Sergipe
SP - São Paulo
TO - Tocantins |
State |
RJ |
Postal Code – The postal code (CEP) in Brazil has 8 digits: a group of 5 digits, then a hyphen (dash), then 3 more digits.
Formal structure of the Brazilian postal code: XXXXX-XXX
The first digit is important because it identifies the region the address is in, for example:
0 - São Paulo (metro area only)
In our example, the address is in Rio de Janeiro, so the code starts with the number “2”. Also note that the digits 0090 that follow place our example well inside the “Central City” zone (20000-xxx to 20199-xxx).
Digits 2, 3, 4, and 5 further define the sub-zones of the city. In cities like Rio de Janeiro the number of sub-zones can be quite large, so four digits are provided:
Rio de Janeiro Central City: 20000-xxx to 20199-xxx
20000 series of Rio de Janeiro
20200-xxx - Rio Comprido
20397-xxx - Ilha de Paquetá
20500-xxx - Tijuca e Grajaú
20800-xxx - São Cristóvão
21000 series of Rio de Janeiro
21000-xxx - Rio de Janeiro (Méier)
21200-xxx - Rio de Janeiro (Irajá)
21300-xxx - Madureira
21700-xxx - Anchieta
21800-xxx - Realengo
21900-xxx - Ilha do Governador
22000 series of Rio de Janeiro
22000-xxx - Copacabana
22200-xxx - Flamengo
22400-xxx - Lagoa
22600-xxx - Barra da Tijuca
22700-xxx - Jacarepaguá
Digits 6, 7, and 8 pinpoint a very small area, or even an exact address, within the primary postal code. |
Zip |
20090-910 |
Country – Brazil, in this case |
Country |
Brazil |
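The postal code structure described in the table can be checked with a few lines of Python before a file is submitted for geocoding. This is a minimal sketch, not part of csv2geo: the function names and the returned field names are hypothetical, and the region lookup contains only the two first digits mentioned in this article (0 and 2); any other first digit is reported as "unknown".

```python
import re

# Five digits, an optional hyphen, then three digits (XXXXX-XXX).
CEP_PATTERN = re.compile(r"([0-9]{5})-?([0-9]{3})")

# Only the regions named in this article; other digits are deliberately omitted.
REGION_BY_FIRST_DIGIT = {
    "0": "São Paulo (metro area)",
    "2": "Rio de Janeiro",
}


def parse_cep(raw):
    """Validate a CEP and split it into the parts described in the table."""
    text = raw.strip()
    # Tolerate a leading "CEP" label, as in "CEP 20090-910".
    if text.upper().startswith("CEP"):
        text = text[3:].strip()
    match = CEP_PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid CEP: {raw!r}")
    prefix, suffix = match.groups()
    return {
        "formatted": f"{prefix}-{suffix}",
        "region": REGION_BY_FIRST_DIGIT.get(prefix[0], "unknown"),
        "sub_zone": prefix[1:],   # digits 2 to 5
        "suffix": suffix,         # digits 6 to 8
    }


def in_rio_central_city(raw):
    """True when the five-digit prefix lies in 20000 to 20199."""
    prefix = int(parse_cep(raw)["formatted"][:5])
    return 20000 <= prefix <= 20199


print(parse_cep("CEP 20090-910"))
print(in_rio_central_city("20090-910"))
```

Normalizing every code to the hyphenated form before upload keeps the Zip column consistent, which is useful when the input file mixes "20090910" and "20090-910" styles.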
Again, let’s revisit the example of Brazil’s National Institute of Industrial Property (INPI) of the Ministry of Economy:
Rua Mayrink Veiga, 9, Centro, 22° andar
CEP 20090-910
Rio de Janeiro, RJ
Brazil
Writing the address in this line order may be important for a postal label, but in a geocoding tool the order of the columns is not as important. If we place the whole address on one line and separate the address lines with commas, we get:
Rua Mayrink Veiga, 9, Centro, 22° andar, CEP 20090-910, Rio de Janeiro, RJ, Brazil
The map preview then shows the exact location (see the pin map view below).
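For a file with many addresses, collapsing the label lines into the single-line form can be automated. The following Python sketch assumes each address arrives as a list of label lines; the helper name `to_single_line` is hypothetical.

```python
def to_single_line(lines):
    """Join postal-label lines into one comma-separated address string."""
    pieces = []
    for line in lines:
        # Split on commas so stray spaces around them can be trimmed.
        pieces.extend(part.strip() for part in line.split(",") if part.strip())
    return ", ".join(pieces)


address_lines = [
    "Rua Mayrink Veiga, 9, Centro, 22° andar",
    "CEP 20090-910",
    "Rio de Janeiro, RJ",
    "Brazil",
]

single_line = to_single_line(address_lines)
print(single_line)
# Rua Mayrink Veiga, 9, Centro, 22° andar, CEP 20090-910, Rio de Janeiro, RJ, Brazil
```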
The csv2geo system loads the data, and it appears as shown below (in the viewing window of csv2geo):
(illustration 1 - orange)
Without making any selection, click Process data to first see the results in a preview screen. Up to ten addresses are previewed together so you can verify the format of the input data and see a preview of the expected output.
(illustration 2 - map)
Ref: Similar article for How to Geocode Addresses from Australia