Nearest Neighbour Analysis

One commonly used GIS task is to be able to find the nearest neighbour. For instance, you might have a single Point object representing your home location, and then another set of locations representing e.g. public transport stops. Then, quite typical question is “which of the stops is closest one to my home?” This is a typical nearest neighbour analysis, where the aim is to find the closest geometry to another geometry.

In Python this kind of analysis can be done with shapely function called nearest_points() that returns a tuple of the nearest points in the input geometrie.

Nearest point using Shapely

Let’s start by testing how we can find the nearest Point using the nearest_points() function of Shapely.

Let’s create an origin Point and a few destination Points and find out the closest destination.

In [1]: from shapely.geometry import Point, MultiPoint

In [2]: from shapely.ops import nearest_points

In [3]: orig = Point(1, 1.67)

In [4]: dest1, dest2, dest3 = Point(0, 1.45), Point(2, 2), Point(0, 2.5)

To be able to find out the closest destination point from the origin, we need to create a MultiPoint object from the destination points.

In [5]: destinations = MultiPoint([dest1, dest2, dest3])

In [6]: print(destinations)
MULTIPOINT (0 1.45, 2 2, 0 2.5)

Okey, now we can see that all the destination points are represented as a single MultiPoint object.

  • Now we can find out the nearest destination point by using nearest_points() function.
In [7]: nearest_geoms = nearest_points(orig, destinations)

In [8]: near_idx0 = nearest_geoms[0]

In [9]: near_idx1 = nearest_geoms[1]

In [10]: print(nearest_geoms)
(<shapely.geometry.point.Point object at 0x000001C7315DBE10>, <shapely.geometry.point.Point object at 0x000001C731917828>)

In [11]: print(near_idx0)
POINT (1 1.67)

In [12]: print(near_idx1)
POINT (0 1.45)

As you can see the nearest_points() function returns a tuple of geometries where the first item is the geometry of our origin point and the second item (at index 1) is the actual nearest geometry from the destination points. Hence, the closest destination point seems to be the one located at coordinates (0, 1.45).

This is the basic logic how we can find the nearest point from a set of points.

Nearest points using Geopandas

Of course, the previous example is not really useful yet. Hence, next I show, how it is possible to find nearest points from a set of origin points to a set of destination points using GeoDataFrames. If you don’t already have the addresses and PKS_suuralueet.kml datasets, you can find and download them from geocoding and Point in Polygon tutorials.

  • First we need to create a function that takes advantage of the previous function but is tailored to work with two GeoDataFrames.
def nearest(row, geom_union, df1, df2, geom1_col='geometry', geom2_col='geometry', src_column=None):
    """Find the nearest point and return the corresponding value from specified column."""
    # Find the geometry that is closest
    nearest = df2[geom2_col] == nearest_points(row[geom1_col], geom_union)[1]
    # Get the corresponding value from df2 (matching is based on the geometry)
    value = df2[nearest][src_column].get_values()[0]
    return value

Next we read the address data and the Helsinki districts data and find out the closest address to the centroid of each district.

In [7]: import geopandas as gpd

In [8]: fp1 = "/home/geo/PKS_suuralue.kml"
In [9]: fp2 = "/home/geo/addresses.shp"
In [10]: gpd.io.file.fiona.drvsupport.supported_drivers['KML'] = 'rw'

In [11]: df1 = gpd.read_file(fp1, driver='KML')
In [12]: df2 = gpd.read_file(fp2)

Create unary union from Points, which basically creates a MultiPoint object from the Point geometries.

In [13]: unary_union = df2.unary_union

In [14]: print(unary_union)
MULTIPOINT (24.8609335 60.22401389999999, 24.86186 60.2485471, 24.8718598 60.22243630000001, 24.877383 60.240163, 24.8840504 60.2306474, 24.893153 60.2177823, 24.9214046 60.159069, 24.9214846 60.1565781, 24.9250072 60.16500139999999, 24.9301701 60.1683731, 24.9334051 60.1986856, 24.9337569 60.1694809, 24.9418933 60.1698665, 24.94251 60.1711874, 24.9468514 60.1719108, 24.9483202 60.22163339999999, 24.9494874 60.1793862, 24.9609122 60.18789030000001, 24.9670533 60.2291135, 24.9774004 60.18735880000001, 24.9934979 60.2436961, 25.0068399 60.18837519999999, 25.0125655 60.25149829999999, 25.0291078 60.2633799, 25.0291263 60.19413939999999, 25.035855 60.2753891, 25.040583 60.24444239999999, 25.042239 60.2033879, 25.0756547 60.225599, 25.0778094 60.20966609999999, 25.0816923 60.23489060000001, 25.0919641 60.21448089999999, 25.1080054 60.2382054, 25.1204966 60.20548979999999, 25.1424936 60.20751019999999)

Calculate the centroids for each district area.

In [15]: df1['centroid'] = df1.centroid

In [16]: df1.head()
Out[16]: 
               Name Description  \
0  Suur-Espoonlahti               
1    Suur-Kauklahti               
2       Vanha-Espoo               
3     Pohjois-Espoo               
4    Suur-Matinkylä               

                                            geometry  \
0  POLYGON Z ((24.775059677807 60.1090604462157 0...   
1  POLYGON Z ((24.6157775254076 60.1725681273527 ...   
2  POLYGON Z ((24.6757633262026 60.2120070032819 ...   
3  POLYGON Z ((24.767921197401 60.2691954732391 0...   
4  POLYGON Z ((24.7536131356802 60.1663051341717 ...   

                                      centroid  
0   POINT (24.76754037242762 60.0440879200116)  
1  POINT (24.57415010885406 60.19764302289445)  
2  POINT (24.60400724339237 60.25253297356344)  
3  POINT (24.68682879841453 60.30649462398335)  
4  POINT (24.76063843560942 60.15018263640097)  

Okey now we are ready to use our function and find closest Points (taking the value from id column) from df2 to df1 centroids

In [17]: df1['nearest_id'] = df1.apply(nearest, geom_union=unary_union, df1=df1, df2=df2, geom1_col='centroid', src_column='id', axis=1)

In [18]: df1.head(20)
Out[18]: 
                Name Description  \
0   Suur-Espoonlahti               
1     Suur-Kauklahti               
2        Vanha-Espoo               
3      Pohjois-Espoo               
4     Suur-Matinkylä               
5         Kauniainen               
6    Suur-Leppävaara               
7       Suur-Tapiola               
8           Myyrmäki               
9            Kivistö               
10         Eteläinen               
11        Kaakkoinen               
12          Keskinen               
13          Läntinen               
14         Pohjoinen               
15         Koillinen               
16         Aviapolis               
17         Tikkurila               
18         Koivukylä               
19           Itäinen               

                                             geometry  \
0   POLYGON Z ((24.775059677807 60.1090604462157 0...   
1   POLYGON Z ((24.6157775254076 60.1725681273527 ...   
2   POLYGON Z ((24.6757633262026 60.2120070032819 ...   
3   POLYGON Z ((24.767921197401 60.2691954732391 0...   
4   POLYGON Z ((24.7536131356802 60.1663051341717 ...   
5   POLYGON Z ((24.6907528033566 60.2195779731868 ...   
6   POLYGON Z ((24.797472695835 60.2082651196077 0...   
7   POLYGON Z ((24.8443596422129 60.1659790707387 ...   
8   POLYGON Z ((24.8245867448802 60.2902531157585 ...   
9   POLYGON Z ((24.9430919106369 60.3384471629062 ...   
10  POLYGON Z ((24.7827651307035 60.09997268858 0,...   
11  POLYGON Z ((24.8480782099727 60.0275589731893 ...   
12  POLYGON Z ((24.9085548098731 60.2082029641503 ...   
13  POLYGON Z ((24.832174555671 60.2516121985945 0...   
14  POLYGON Z ((24.8992644865152 60.2689368800439 ...   
15  POLYGON Z ((24.9722813313308 60.2432476462193 ...   
16  POLYGON Z ((24.9430919106369 60.3384471629062 ...   
17  POLYGON Z ((24.9764047156358 60.2896890295612 ...   
18  POLYGON Z ((24.9942315864552 60.3329637072809 ...   
19  POLYGON Z ((25.0351655840904 60.23627484214 0,...   

                                       centroid  nearest_id  
0    POINT (24.76754037242762 60.0440879200116)        1005  
1   POINT (24.57415010885406 60.19764302289445)        1020  
2   POINT (24.60400724339237 60.25253297356344)        1017  
3   POINT (24.68682879841453 60.30649462398335)        1017  
4   POINT (24.76063843560942 60.15018263640097)        1020  
5   POINT (24.71357964516679 60.21457067576294)        1020  
6   POINT (24.77910492134015 60.22913609608545)        1020  
7   POINT (24.79937514852226 60.17816655223976)        1020  
8   POINT (24.81763652589348 60.27819504217397)        1017  
9   POINT (24.84180592296876 60.34358057021768)        1017  
10  POINT (24.90837930087519 60.10976339578206)        1005  
11  POINT (25.05325169482274 60.05155324331345)        1029  
12  POINT (24.95489633637751 60.20067297308771)        1027  
13  POINT (24.87614770011878 60.21754287289237)        1010  
14  POINT (24.94156264995636 60.24654213027523)        1014  
15  POINT (25.02148999795968 60.25026309396886)        1015  
16  POINT (24.93554952483983 60.30204064147746)        1018  
17  POINT (25.03931014564627 60.29804037805193)        1008  
18  POINT (25.05766837333244 60.32582581576258)        1008  
19  POINT (25.12590828372607 60.20923259104367)        1035  

That’s it! Now we found the closest point for each centroid and got the id value from our addresses into the df1 GeoDataFrame.