Geocoding
Overview of Geocoders
Geocoding, i.e. converting addresses into coordinates or vice versa, is a very common GIS task. Luckily, Python has good libraries that make geocoding easy. One of them is geopy, which locates the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.
Geopy relies on third-party geocoders, i.e. services that do the actual geocoding, to locate addresses, and it works with many different service providers, such as:
- ESRI ArcGIS
- Baidu Maps
- Bing
- geocoder.us
- GeocodeFarm
- GeoNames
- Google Geocoding API (V3)
- IGN France
- Mapquest
- Mapzen Search
- NaviData
- OpenCage
- OpenMapQuest
- Open Street Map Nominatim
- SmartyStreets
- What3words
- Yandex
Thus, there are plenty of geocoders to choose from! However, to use these services you might need to request so-called API access keys from the service provider. For example, you can get access keys for the Google Geocoding API from the Google APIs console by creating a Project and enabling that API from the Library. Read a short introduction about using Google API Console from here.
Note
There are also other Python modules in addition to geopy that can do geocoding such as Geocoder.
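To make the idea of a third-party geocoder concrete, here is a minimal sketch of calling geopy directly. It assumes the OpenStreetMap Nominatim service, which needs no API key but expects a descriptive user agent and at most about one request per second on its public server. The function names make_nominatim_geocoder and addresses_to_coords, and the user agent string, are hypothetical names chosen for this illustration.

```python
from typing import Callable, Iterable, List, Optional, Tuple

Row = Tuple[str, Optional[float], Optional[float]]


def make_nominatim_geocoder(user_agent: str) -> Callable:
    """Build a throttled geocoding function backed by OSM Nominatim.

    Assumption: geopy is installed and network access is available.
    """
    # Imported lazily so the rest of the sketch works without geopy.
    from geopy.geocoders import Nominatim
    from geopy.extra.rate_limiter import RateLimiter

    geolocator = Nominatim(user_agent=user_agent)
    # Wait at least one second between consecutive requests.
    return RateLimiter(geolocator.geocode, min_delay_seconds=1)


def addresses_to_coords(addresses: Iterable[str], geocode_fn: Callable) -> List[Row]:
    """Return (address, latitude, longitude) tuples.

    Latitude and longitude are None when the geocoder finds no match.
    """
    rows: List[Row] = []
    for address in addresses:
        location = geocode_fn(address)
        if location is None:
            rows.append((address, None, None))
        else:
            rows.append((address, location.latitude, location.longitude))
    return rows


# Usage (requires geopy and a network connection):
# geocode_fn = make_nominatim_geocoder(user_agent="geocoding-tutorial-example")
# print(addresses_to_coords(["Kaivokatu 8, 00101 Helsinki, Finland"], geocode_fn))
```

Passing the geocoding function in as an argument keeps the loop independent of the provider, so swapping Nominatim for another geopy geocoder should only require changing make_nominatim_geocoder.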
Geocoding in Geopandas
It is possible to do geocoding in Geopandas through its built-in integration with geopy. Geopandas has a function called geocode() that can geocode a list of addresses (strings) and return a GeoDataFrame containing the resulting point objects in its geometry column. Nice, isn’t it! Let’s try this out.
Download a text file called addresses.txt that contains a few addresses around the Helsinki Region. The first rows of the data look like the following:
id;address
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermanstads strandsväg 1, 00580 Helsingfors, Finland
- Let’s first read the data into a Pandas DataFrame using the read_csv() function:
# Import necessary modules
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
# Filepath
fp = r"addresses.txt"
# Read the data
data = pd.read_csv(fp, sep=';')
# Let's take a look at the data
In [1]: data.head()
Out[1]:
     id                                            address
0  1000           Itämerenkatu 14, 00101 Helsinki, Finland
1  1001              Kampinkuja 1, 00100 Helsinki, Finland
2  1002               Kaivokatu 8, 00101 Helsinki, Finland
3  1003  Hermanstads strandsväg 1, 00580 Helsingfors, F...
4  1004                  Itäväylä, 00900 Helsinki, Finland
- Now we have our data in a Pandas DataFrame and we can geocode our addresses.
Note
Here we use my API key, which is limited to 2500 requests per hour. Because of this, only the computer instances of our course environment have access to the Google Geocoding API for a short period of time. Thus, the following key will NOT work from your own computer, only from our cloud computers. If you wish, you can create your own API key for the Google Geocoding API V3 from the Google APIs console. See the notes above.
# Import the geocoding tool
In [2]: from geopandas.tools import geocode
# Key for our Google Geocoding API
# Notice: only the cloud computers of our course can access and
# successfully execute the following
In [3]: key = 'AIzaSyAwNVHAtkbKlPs-EEs3OYqbnxzaYfDF2_8'
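# Geocode addresses. The provider is named explicitly because the default
# provider of geocode() differs between geopandas versions
In [4]: geo = geocode(data['address'], provider='googlev3', api_key=key)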
In [5]: geo.head(2)
Out[5]:
                                    address                       geometry
0  Itämerenkatu 14, 00180 Helsinki, Finland  POINT (24.9146767 60.1628658)
1     Kampinkuja 1, 00100 Helsinki, Finland  POINT (24.9301701 60.1683731)
And voilà! As a result we have a GeoDataFrame that contains our original address and a ‘geometry’ column containing Shapely Point objects that we can use, for example, to export the addresses to a Shapefile. However, the id column is not there. Thus, we need to join the information from data into our new GeoDataFrame geo, i.e. make a Table Join.