Geocoding in Geopandas¶
Geopandas can geocode addresses through its integration with geopy. The geocode() function in geopandas.tools takes a list of addresses (strings) and returns a GeoDataFrame containing the resulting point objects in the geometry column. Let’s try this out.
Download data¶
Download the data package for lesson three from here.
The package contains a text file called addresses.txt, which has a few addresses around the Helsinki Region. The first rows of the data look like the following:
id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermannin rantatie 1, 00580 Helsinki, Finland
We have an id for each row and an address in the addr column.
- Let’s first read the data into a Pandas DataFrame using the read_csv() function:
In [2]:
# Import necessary modules
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
# Filepath
fp = "L3_data/addresses.txt"
# Read the data
data = pd.read_csv(fp, sep=';')
In [3]:
# Let's take a look at the data
data.head()
Out[3]:
| | id | addr |
|---|---|---|
| 0 | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
| 1 | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
| 2 | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
| 3 | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
| 4 | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
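To see how the sep parameter handles this file format without needing the data package, the sketch below parses the four sample rows listed earlier from an in-memory string. The names sample and sample_data, and the use of io.StringIO in place of the real file, are assumptions made for illustration only.

```python
import io

import pandas as pd

# The sample rows from addresses.txt, held in memory instead of a file on disk
sample = """id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermannin rantatie 1, 00580 Helsinki, Finland
"""

# sep=';' splits on semicolons only, so the commas inside each address are kept
sample_data = pd.read_csv(io.StringIO(sample), sep=";")

print(sample_data.columns.tolist())
print(sample_data.shape)
```

Because the separator is a semicolon, each full address ends up in a single addr value, which is the form the geocoder expects.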
Now we have our data in a Pandas DataFrame and we can geocode our addresses.
In [8]:
# Import the geocoding tool (it uses geopy under the hood)
from geopandas.tools import geocode
# Geocode addresses with Nominatim backend
geo = geocode(data['addr'], provider='nominatim', user_agent='csc_user_ht')
geo.head(2)
Out[8]:
| | geometry | address |
|---|---|---|
| 0 | POINT (24.9155624 60.1632015) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... |
| 1 | POINT (24.9316914 60.1690222) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... |
And voilà! As a result we have a GeoDataFrame that contains the address as resolved by the geocoder and a geometry column containing Shapely Point objects, which we can use, for example, to export the addresses to a Shapefile. However, the id column is not there. We therefore need to bring the information from data into our new GeoDataFrame geo by making a table join.
Table join¶
Table joins are very common procedures in GIS analyses. As you might remember from our earlier lessons, combining data from different tables based on a common key attribute can be done easily in Pandas/Geopandas using the .merge() function.
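As a reminder of what a key-based join looks like, here is a minimal sketch using two small made-up tables. The visits table and its visitors column are hypothetical and not part of the lesson data.

```python
import pandas as pd

# Two small, made-up tables that share the key column "id"
addresses = pd.DataFrame(
    {
        "id": [1000, 1001, 1002],
        "addr": ["Itämerenkatu 14", "Kampinkuja 1", "Kaivokatu 8"],
    }
)
visits = pd.DataFrame({"id": [1002, 1000], "visitors": [35, 12]})

# Match rows by the values in "id", regardless of row order or index;
# how="left" keeps every address even when no matching visit exists
merged = addresses.merge(visits, on="id", how="left")
print(merged)
```

The row for id 1001 has no match in visits, so its visitors value is left missing rather than the row being dropped.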
However, sometimes it is useful to join two tables based on the index of those DataFrames. In such a case, we assume that both DataFrames have the same number of records and that the records are in the same order. This is the situation we have now: the geocoded addresses in the geo DataFrame are in the same order as the addresses in our original data DataFrame.
Hence, we can join the tables with the join() function, which merges two DataFrames based on their index by default.
In [10]:
join = geo.join(data)
join.head()
Out[10]:
| | geometry | address | id | addr |
|---|---|---|---|---|
| 0 | POINT (24.9155624 60.1632015) | Ruoholahti, 14, Itämerenkatu, Ruoholahti, Läns... | 1000 | Itämerenkatu 14, 00101 Helsinki, Finland |
| 1 | POINT (24.9316914 60.1690222) | Kamppi, 1, Kampinkuja, Kamppi, Eteläinen suurp... | 1001 | Kampinkuja 1, 00100 Helsinki, Finland |
| 2 | POINT (24.9416849 60.1699637) | Bangkok9, 8, Kaivokatu, Keskusta, Kluuvi, Etel... | 1002 | Kaivokatu 8, 00101 Helsinki, Finland |
| 3 | POINT (24.9655355 60.2008878) | 1, Hermannin rantatie, Hermanninmäki, Hermanni... | 1003 | Hermannin rantatie 1, 00580 Helsinki, Finland |
| 4 | POINT (24.9216003 60.1566475) | Hesburger, 9, Tyynenmerenkatu, Jätkäsaari, Län... | 1005 | Tyynenmerenkatu 9, 00220 Helsinki, Finland |
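The index-based join can be checked on a small scale with the sketch below. It uses plain Pandas DataFrames (data_demo and geo_demo, both made up) as stand-ins for data and geo, since the index logic is the same. It also shows that join() matches rows by index label rather than by position.

```python
import pandas as pd

# Made-up stand-ins for `data` and `geo`; plain DataFrames are enough here
data_demo = pd.DataFrame({"id": [1000, 1001, 1002], "addr": ["A", "B", "C"]})
geo_demo = pd.DataFrame({"address": ["resolved A", "resolved B", "resolved C"]})

# An index-based join only makes sense if both tables share the same index
assert geo_demo.index.equals(data_demo.index)

joined = geo_demo.join(data_demo)

# Reversing the row order of one table does not change the result,
# because join() pairs rows by index label, not by position
shuffled = data_demo.iloc[::-1]
joined_shuffled = geo_demo.join(shuffled)
print(joined_shuffled)
```

If the index of either table had been reset or filtered after geocoding, the labels would no longer line up, and a key-based merge would be the safer choice.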
- Let’s also check the data type of our new join table.
In [11]:
type(join)
Out[11]:
geopandas.geodataframe.GeoDataFrame
As a result we have a new GeoDataFrame called join, which has all the original columns plus the new geometry and address columns.
- Now it is easy to save our address points into a Shapefile:
In [12]:
# Output file path
outfp = "L3_data/addresses.shp"
# Save to Shapefile
join.to_file(outfp)
That’s it. Now we have successfully geocoded those addresses into Points and made a Shapefile out of them. Easy, isn’t it?
Notes about Nominatim¶
Nominatim works relatively well if you have well-defined and well-known addresses such as the ones we used in this tutorial. However, in some cases you might not have such well-defined addresses; you might, for example, only have the name of a museum. In such cases Nominatim might not give good results, and you might want to use another service such as the Google Geocoding API (V3). Take a look at last year’s materials, where we show how to use the Google Geocoding API in a similar manner to how we used Nominatim here.
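One way to spot addresses that the geocoder could not resolve is to look for missing values in the address column after the join. The sketch below assumes that a failed lookup leaves the address value empty (None) in the result; the joined_demo table and the "Some Museum" entry are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the joined table: the second lookup found no match
joined_demo = pd.DataFrame(
    {
        "address": ["Ruoholahti, 14, Itämerenkatu", None, "8, Kaivokatu"],
        "id": [1000, 1001, 1002],
        "addr": [
            "Itämerenkatu 14, 00101 Helsinki, Finland",
            "Some Museum",
            "Kaivokatu 8, 00101 Helsinki, Finland",
        ],
    }
)

# Rows with a missing resolved address are candidates for a second
# geocoding pass, for example with another provider
unresolved = joined_demo[joined_demo["address"].isna()]
print(unresolved[["id", "addr"]])
```

Because the id column is carried along by the join, the unresolved rows can be traced back to the original records and geocoded again separately.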