Geocoding
Overview of Geocoders
Geocoding, i.e. converting addresses into coordinates or vice versa, is a very common GIS task. Luckily, Python has good libraries that make geocoding easy. One of them is geopy, which can locate the coordinates of addresses, cities, countries, and landmarks across the globe using third-party geocoders and other data sources.
As noted, geopy relies on third-party geocoders, i.e. services that do the actual geocoding, to locate addresses. It works with many service providers, such as:
- ESRI ArcGIS
- Baidu Maps
- Bing
- geocoder.us
- GeocodeFarm
- GeoNames
- Google Geocoding API (V3)
- IGN France
- Mapquest
- Mapzen Search
- NaviData
- OpenCage
- OpenMapQuest
- Open Street Map Nominatim
- SmartyStreets
- What3words
- Yandex
Thus, there are plenty of geocoders to choose from! However, most of these services require you to request an API access key from the service provider before you can use the service.
Luckily, Nominatim, a geocoder based on OpenStreetMap data, does not require an API key for small-scale geocoding jobs; the service is rate-limited to 1 request per second (3600 per hour). As we are only making a small set of queries, we can do the geocoding with Nominatim.
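The one-request-per-second limit can be respected by spacing out calls on the client side. The block below is a minimal sketch of that idea using only the standard library; the `Throttle` class and its parameters are hypothetical names introduced here for illustration and are not part of geopy or Geopandas. geopy also provides its own `RateLimiter` helper in `geopy.extra.rate_limiter`, which is probably the more convenient choice in practice.

```python
import time


class Throttle:
    """Wrap a function so that consecutive calls are at least
    `min_interval` seconds apart (illustrative helper, not a geopy API)."""

    def __init__(self, func, min_interval=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.func = func
        self.min_interval = min_interval
        # clock and sleep are injectable so the behaviour can be checked
        # without actually waiting
        self._clock = clock
        self._sleep = sleep
        self._last_call = None

    def __call__(self, *args, **kwargs):
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            remaining = self.min_interval - elapsed
            if remaining > 0:
                # Too soon after the previous call: wait out the difference
                self._sleep(remaining)
        self._last_call = self._clock()
        return self.func(*args, **kwargs)
```

Wrapping a geocoding function, for example `Throttle(geolocator.geocode, min_interval=1.0)` where `geolocator` is an assumed geopy geocoder instance, would then pause as needed between consecutive queries.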
Note
- Note 1: If you need to do larger-scale geocoding jobs, request an API key for one of the geocoders listed above and use that service instead.
- Note 2: Besides geopy, there are other Python modules that can do geocoding, such as Geocoder.
Hint
You can get access keys for e.g. the Google Geocoding API from the Google APIs console by creating a Project and enabling that API from the Library. Read a short introduction about using Google API Console from here.
Geocoding in Geopandas
It is possible to do geocoding in Geopandas using its integrated geopy functionality. Geopandas has a function called geocode() that can geocode a list of addresses (strings) and return a GeoDataFrame containing the resulting point objects in the geometry column. Nice, isn’t it! Let’s try this out.
Download a text file called addresses.txt that contains a few addresses around the Helsinki Region. The first rows of the data look like the following:
id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermannin rantatie 1, 00580 Helsinki, Finland
We have an id for each row and an address in the addr column.
- Let’s first read the data into a Pandas DataFrame using the read_csv() function:
# Import necessary modules
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
# Filepath
fp = r"addresses.txt"
# Read the data
data = pd.read_csv(fp, sep=';')
# Let's take a look at the data
In [1]: data.head()
Out[1]:
id addr
0 1000 Itämerenkatu 14, 00101 Helsinki, Finland
1 1001 Kampinkuja 1, 00100 Helsinki, Finland
2 1002 Kaivokatu 8, 00101 Helsinki, Finland
3 1003 Hermannin rantatie 1, 00580 Helsinki, Finland
4 1005 Tyynenmerenkatu 9, 00220 Helsinki, Finland
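To try the same reading step without downloading the file, the four sample rows shown earlier can be parsed from an in-memory string. This is only a sketch for experimenting; the names `sample_text` and `sample` are made up here, and the real addresses.txt contains more rows than these.

```python
import io

import pandas as pd

# The four sample rows shown earlier in this section, as an in-memory "file"
sample_text = """id;addr
1000;Itämerenkatu 14, 00101 Helsinki, Finland
1001;Kampinkuja 1, 00100 Helsinki, Finland
1002;Kaivokatu 8, 00101 Helsinki, Finland
1003;Hermannin rantatie 1, 00580 Helsinki, Finland
"""

# sep=';' matters here: the addresses themselves contain commas,
# so the default comma separator would split them into several columns
sample = pd.read_csv(io.StringIO(sample_text), sep=";")

print(sample.shape)
print(sample["addr"].tolist())
```

Using the semicolon as the separator keeps each full address, commas included, in the single addr column that is later passed to the geocoder.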
Now we have our data in a Pandas DataFrame and we can geocode our addresses.
- Let’s geocode the addresses using the Nominatim geocoder:
# Import the geocoding tool
In [2]: from geopandas.tools import geocode
# Geocode addresses with Nominatim backend
In [3]: geo = geocode(data['addr'], provider='nominatim')
---------------------------------------------------------------------------
timeout Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1317 h.request(req.get_method(), req.selector, req.data, headers,
-> 1318 encode_chunked=req.has_header('Transfer-encoding'))
1319 except OSError as err: # timeout error
C:\ProgramData\Anaconda3\lib\http\client.py in request(self, method, url, body, headers, encode_chunked)
1238 """Send a complete request to the server."""
-> 1239 self._send_request(method, url, body, headers, encode_chunked)
1240
C:\ProgramData\Anaconda3\lib\http\client.py in _send_request(self, method, url, body, headers, encode_chunked)
1284 body = _encode(body, 'body')
-> 1285 self.endheaders(body, encode_chunked=encode_chunked)
1286
C:\ProgramData\Anaconda3\lib\http\client.py in endheaders(self, message_body, encode_chunked)
1233 raise CannotSendHeader()
-> 1234 self._send_output(message_body, encode_chunked=encode_chunked)
1235
C:\ProgramData\Anaconda3\lib\http\client.py in _send_output(self, message_body, encode_chunked)
1025 del self._buffer[:]
-> 1026 self.send(msg)
1027
C:\ProgramData\Anaconda3\lib\http\client.py in send(self, data)
963 if self.auto_open:
--> 964 self.connect()
965 else:
C:\ProgramData\Anaconda3\lib\http\client.py in connect(self)
1399 self.sock = self._context.wrap_socket(self.sock,
-> 1400 server_hostname=server_hostname)
1401 if not self._context.check_hostname and self._check_hostname:
C:\ProgramData\Anaconda3\lib\ssl.py in wrap_socket(self, sock, server_side, do_handshake_on_connect, suppress_ragged_eofs, server_hostname, session)
400 server_hostname=server_hostname,
--> 401 _context=self, _session=session)
402
C:\ProgramData\Anaconda3\lib\ssl.py in __init__(self, sock, keyfile, certfile, server_side, cert_reqs, ssl_version, ca_certs, do_handshake_on_connect, family, type, proto, fileno, suppress_ragged_eofs, npn_protocols, ciphers, server_hostname, _context, _session)
807 raise ValueError("do_handshake_on_connect should not be specified for non-blocking sockets")
--> 808 self.do_handshake()
809
C:\ProgramData\Anaconda3\lib\ssl.py in do_handshake(self, block)
1060 self.settimeout(None)
-> 1061 self._sslobj.do_handshake()
1062 finally:
C:\ProgramData\Anaconda3\lib\ssl.py in do_handshake(self)
682 """Start the SSL/TLS handshake."""
--> 683 self._sslobj.do_handshake()
684 if self.context.check_hostname:
timeout: _ssl.c:733: The handshake operation timed out
During handling of the above exception, another exception occurred:
URLError Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\geopy\geocoders\base.py in _call_geocoder(self, url, timeout, raw, requester, deserializer, **kwargs)
142 try:
--> 143 page = requester(req, timeout=(timeout or self.timeout), **kwargs)
144 except Exception as error: # pylint: disable=W0703
C:\ProgramData\Anaconda3\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
C:\ProgramData\Anaconda3\lib\urllib\request.py in open(self, fullurl, data, timeout)
525
--> 526 response = self._open(req, data)
527
C:\ProgramData\Anaconda3\lib\urllib\request.py in _open(self, req, data)
543 result = self._call_chain(self.handle_open, protocol, protocol +
--> 544 '_open', req)
545 if result:
C:\ProgramData\Anaconda3\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
503 func = getattr(handler, meth_name)
--> 504 result = func(*args)
505 if result is not None:
C:\ProgramData\Anaconda3\lib\urllib\request.py in https_open(self, req)
1360 return self.do_open(http.client.HTTPSConnection, req,
-> 1361 context=self._context, check_hostname=self._check_hostname)
1362
C:\ProgramData\Anaconda3\lib\urllib\request.py in do_open(self, http_class, req, **http_conn_args)
1319 except OSError as err: # timeout error
-> 1320 raise URLError(err)
1321 r = h.getresponse()
URLError: <urlopen error _ssl.c:733: The handshake operation timed out>
During handling of the above exception, another exception occurred:
GeocoderTimedOut Traceback (most recent call last)
<ipython-input-3-ba8493af24dd> in <module>()
----> 1 geo = geocode(data['addr'], provider='nominatim')
C:\ProgramData\Anaconda3\lib\site-packages\geopandas\tools\geocoding.py in geocode(strings, provider, **kwargs)
60
61 """
---> 62 return _query(strings, True, provider, **kwargs)
63
64
C:\ProgramData\Anaconda3\lib\site-packages\geopandas\tools\geocoding.py in _query(data, forward, provider, **kwargs)
136 try:
137 if forward:
--> 138 results[i] = coder.geocode(s)
139 else:
140 results[i] = coder.reverse((s.y, s.x), exactly_one=True)
C:\ProgramData\Anaconda3\lib\site-packages\geopy\geocoders\osm.py in geocode(self, query, exactly_one, timeout, addressdetails, language, geometry)
191 logger.debug("%s.geocode: %s", self.__class__.__name__, url)
192 return self._parse_json(
--> 193 self._call_geocoder(url, timeout=timeout), exactly_one
194 )
195
C:\ProgramData\Anaconda3\lib\site-packages\geopy\geocoders\base.py in _call_geocoder(self, url, timeout, raw, requester, deserializer, **kwargs)
161 elif isinstance(error, URLError):
162 if "timed out" in message:
--> 163 raise GeocoderTimedOut('Service timed out')
164 elif "unreachable" in message:
165 raise GeocoderUnavailable('Service not available')
GeocoderTimedOut: Service timed out
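The GeocoderTimedOut error above means that the Nominatim server did not respond within the timeout used by geopy, which can happen when the service is busy or the connection is slow. One common way to cope with such transient failures is to retry the call after a short pause. The block below is a minimal standard-library sketch of that pattern; `call_with_retries` and its parameters are hypothetical names introduced here and are not part of geopy or Geopandas.

```python
import time


def call_with_retries(func, *args, retries=3, wait=2.0,
                      retry_on=(Exception,), sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying if one of `retry_on` is raised.

    At most `retries` attempts are made. The pause grows linearly
    (wait, 2 * wait, ...) between attempts. Illustrative helper only.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(1, retries + 1):
        try:
            return func(*args, **kwargs)
        except retry_on:
            if attempt == retries:
                # Out of attempts: let the last error propagate
                raise
            sleep(wait * attempt)
```

With geopy installed, one could pass `retry_on=(GeocoderTimedOut,)` (imported from `geopy.exc`) and wrap the failing call as `call_with_retries(geocode, data['addr'], provider='nominatim', retry_on=(GeocoderTimedOut,))`. Note that this repeats the whole batch of addresses on each retry. As the `geocode(strings, provider, **kwargs)` signature in the traceback suggests, extra keyword arguments are passed on to the geocoder, so giving a longer `timeout` (e.g. `timeout=10`) may also help; check the documentation of your Geopandas and geopy versions to confirm.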
In [4]: geo.head(2)