Advanced plotting with BokehΒΆ

In this part we see how it is possible to visualize any kind of geometries (normal geometries + Multi-geometries) in Bokeh and add a legend into the map which is one of the key elements of a good map.

  • Let’s import the modules and functions that we need
In [1]: from bokeh.palettes import YlOrRd6 as palette

In [2]: from bokeh.plotting import figure, save

In [3]: from bokeh.models import ColumnDataSource, HoverTool, LogColorMapper, GeoJSONDataSource

In [4]: from bokeh.palettes import RdYlGn10 as palette

In [5]: import geopandas as gpd

In [6]: import pysal as ps

In [7]: import numpy as np
  • Let’s use three different layers to produce the map. One of the files (road.shp) is containing MultiLineString

objects. Here we learn how to deal with such geometric objects when plotting bokeh maps.

# Filepaths
fp = r"/home/geo/data/TravelTimes_to_5975375_RailwayStation.shp"
roads_fp = r"/home/geo/data/roads.shp"
metro_fp = r"/home/geo/data/metro.shp"
  • Read the data with Geopandas.
In [8]: data = gpd.read_file(fp)

In [9]: roads = gpd.read_file(roads_fp)

In [10]: metro = gpd.read_file(metro_fp)
  • Ensure that the CRS is the same than in the all layers
In [11]: data['geometry'] = data['geometry'].to_crs(epsg=3067)

In [12]: roads['geometry'] = roads['geometry'].to_crs(epsg=3067)

In [13]: metro['geometry'] = metro['geometry'].to_crs(epsg=3067)

Next, let’s create a set of functions that are used for getting the x and y coordinates of the geometries. Shapefiles etc. can often have Multi-geometries (MultiLineStrings etc.), thus we need to handle those as well which makes things slightly more complicated.


Nowadays, converting GeoDataFrames to a format that Bokeh understands is much easier with dedicated GeoJSONDataSource object than shown in here. Take a look at the example_ at the end of this page.

  • It is always a good practice to slice your functions into small pieces which is what we have done here:
def getXYCoords(geometry, coord_type):
    """ Returns either x or y coordinates from  geometry coordinate sequence. Used with LineString and Polygon geometries."""
    if coord_type == 'x':
        return geometry.coords.xy[0]
    elif coord_type == 'y':
        return geometry.coords.xy[1]

def getPolyCoords(geometry, coord_type):
    """ Returns Coordinates of Polygon using the Exterior of the Polygon."""
    ext = geometry.exterior
    return getXYCoords(ext, coord_type)

def getLineCoords(geometry, coord_type):
    """ Returns Coordinates of Linestring object."""
    return getXYCoords(geometry, coord_type)

def getPointCoords(geometry, coord_type):
    """ Returns Coordinates of Point object."""
    if coord_type == 'x':
        return geometry.x
    elif coord_type == 'y':
        return geometry.y

def multiGeomHandler(multi_geometry, coord_type, geom_type):
    Function for handling multi-geometries. Can be MultiPoint, MultiLineString or MultiPolygon.
    Returns a list of coordinates where all parts of Multi-geometries are merged into a single list.
    Individual geometries are separated with np.nan which is how Bokeh wants them.
    # Bokeh documentation regarding the Multi-geometry issues can be found here (it is an open issue)

    for i, part in enumerate(multi_geometry):
        # On the first part of the Multi-geometry initialize the coord_array (np.array)
        if i == 0:
            if geom_type == "MultiPoint":
                coord_arrays = np.append(getPointCoords(part, coord_type), np.nan)
            elif geom_type == "MultiLineString":
                coord_arrays = np.append(getLineCoords(part, coord_type), np.nan)
            elif geom_type == "MultiPolygon":
                coord_arrays = np.append(getPolyCoords(part, coord_type), np.nan)
            if geom_type == "MultiPoint":
                coord_arrays = np.concatenate([coord_arrays, np.append(getPointCoords(part, coord_type), np.nan)])
            elif geom_type == "MultiLineString":
                coord_arrays = np.concatenate([coord_arrays, np.append(getLineCoords(part, coord_type), np.nan)])
            elif geom_type == "MultiPolygon":
                coord_arrays = np.concatenate([coord_arrays, np.append(getPolyCoords(part, coord_type), np.nan)])

    # Return the coordinates
    return coord_arrays

def getCoords(row, geom_col, coord_type):
    Returns coordinates ('x' or 'y') of a geometry (Point, LineString or Polygon) as a list (if geometry is LineString or Polygon).
    Can handle also MultiGeometries.
    # Get geometry
    geom = row[geom_col]

    # Check the geometry type
    gtype = geom.geom_type

    # "Normal" geometries
    # -------------------

    if gtype == "Point":
        return getPointCoords(geom, coord_type)
    elif gtype == "LineString":
        return list( getLineCoords(geom, coord_type) )
    elif gtype == "Polygon":
        return list( getPolyCoords(geom, coord_type) )

    # Multi geometries
    # ----------------

        return list( multiGeomHandler(geom, coord_type, gtype) )

Now we can apply our functions and calculate the x and y coordinates of any kind of geometry by using the same function, i.e. getCoords().

  • Calculate the x and y coordinates of the grid.
In [14]: data['x'] = data.apply(getCoords, geom_col="geometry", coord_type="x", axis=1)

In [15]: data['y'] = data.apply(getCoords, geom_col="geometry", coord_type="y", axis=1)
  • Calculate the x and y coordinates of the roads (these contain MultiLineStrings).
In [16]: roads['x'] = roads.apply(getCoords, geom_col="geometry", coord_type="x", axis=1)

In [17]: roads['y'] = roads.apply(getCoords, geom_col="geometry", coord_type="y", axis=1)
  • Calculate the x and y coordinates of metro.
In [18]: metro['x'] = metro.apply(getCoords, geom_col="geometry", coord_type="x", axis=1)

In [19]: metro['y'] = metro.apply(getCoords, geom_col="geometry", coord_type="y", axis=1)

Next, we need to classify the travel time values into 5 minute intervals using Pysal’s user defined classifier. We also create legend labels for the classes.

  • First, we need to replace No Data values (-1) with large number (999) so that those values end up in the last class.
In [20]: data = data.replace(-1, 999)

Next, we want to classify the travel times with 5 minute intervals until 200 minutes.

  • Let’s create a list of values where minumum value is 5, maximum value is 200 and step is 5.
In [21]: breaks = [x for x in range(5, 200, 5)]
  • Now we can create a pysal User_Defined classifier and classify our travel time values.
In [22]: classifier = ps.User_Defined.make(bins=breaks)

In [23]: pt_classif = data[['pt_r_tt']].apply(classifier)

In [24]: car_classif = data[['car_r_t']].apply(classifier)
  • Rename the columns of our classified columns.
In [25]: pt_classif.columns = ['pt_r_tt_ud']

In [26]: car_classif.columns = ['car_r_t_ud']
  • Join the classes back to the main data.
In [27]: data = data.join(pt_classif)

In [28]: data = data.join(car_classif)
  • Create names for the legend (until 60 minutes). The following will produce: ["0-5", "5-10", "10-15", ... , "60 <"].
In [29]: upper_limit = 60

In [30]: step = 5

In [31]: names = ["%s-%s " % (x-5, x) for x in range(step, upper_limit, step)]
  • Add legend label for over 60.
In [32]: names.append("%s <" % upper_limit)
  • Assign legend names for the classes.
In [33]: data['label_pt'] = None

In [34]: data['label_car'] = None
  • Update rows with the class-names.
In [35]: for i in range(len(names)):
   ....:     data.loc[data['pt_r_tt_ud'] == i, 'label_pt'] = names[i]
   ....:     data.loc[data['car_r_t_ud'] == i, 'label_car'] = names[i]
  • Update all cells that didn’t get any value with "60 <"
In [36]: data['label_pt'] = data['label_pt'].fillna("%s <" % upper_limit)

In [37]: data['label_car'] = data['label_car'].fillna("%s <" % upper_limit)
  • Finally, we can visualize our layers with Bokeh, add a legend for travel times and add HoverTools for Destination Point and the grid values (travel times).
# Select only necessary columns for our plotting to keep the amount of data minumum
df = data[['x', 'y', 'pt_r_tt_ud', 'pt_r_tt', 'car_r_t', 'from_id', 'label_pt']]
dfsource = ColumnDataSource(data=df)

# Include only coordinates from roads (exclude 'geometry' column)
rdf = roads[['x', 'y']]
rdfsource = ColumnDataSource(data=rdf)

# Include only coordinates from metro (exclude 'geometry' column)
mdf = metro[['x','y']]
mdfsource = ColumnDataSource(data=mdf)

# Specify the tools that we want to use
TOOLS = "pan,wheel_zoom,box_zoom,reset,save"

# Flip the colors in color palette
color_mapper = LogColorMapper(palette=palette)

p = figure(title="Travel times to Helsinki city center by public transportation", tools=TOOLS,
           plot_width=650, plot_height=500, active_scroll = "wheel_zoom" )

# Do not add grid line
p.grid.grid_line_color = None

# Add polygon grid and a legend for it
grid = p.patches('x', 'y', source=dfsource, name="grid",
         fill_color={'field': 'pt_r_tt_ud', 'transform': color_mapper},
         fill_alpha=1.0, line_color="black", line_width=0.03, legend="label_pt")

# Add roads
r = p.multi_line('x', 'y', source=rdfsource, color="grey")

# Add metro
m = p.multi_line('x', 'y', source=mdfsource, color="red")

# Modify legend location
p.legend.location = "top_right"
p.legend.orientation = "vertical"

# Insert a circle on top of the Central Railway Station (coords in EurefFIN-TM35FIN)
station_x = 385752.214
station_y =  6672143.803
circle =[station_x], y=[station_y], name="point", size=6, color="yellow")

# Add two separate hover tools for the data
phover = HoverTool(renderers=[circle])
phover.tooltips=[("Destination", "Railway Station")]

ghover = HoverTool(renderers=[grid])
ghover.tooltips=[("YKR-ID", "@from_id"),
                ("PT time", "@pt_r_tt"),
                ("Car time", "@car_r_t"),


# Output filepath to HTML
output_file = r"/home/geo/accessibility_map_Helsinki.html"

# Save the map
save(p, output_file);


Important update! Bokeh nowadays support GeoJSONDataSource that makes it much easier to work and plot data from GeoDataFrame. Take a look of the following .. _example:

import geopandas as gpd
from bokeh.plotting import save, figure
from bokeh.models import GeoJSONDataSource

addresses_fp = r'/home/geo/addresses.shp'
roads_fp = r'/home/geo/roads.shp'

# Read the data
addresses = gpd.read_file(addresses_fp)
roads = gpd.read_file(roads_fp)

# Reproject to the same projection
addresses = addresses.to_crs(crs=CRS)

# Convert GeoDataFrames into GeoJSONDataSource objects (similar to ColumnDataSource)
point_source = GeoJSONDataSource(geojson=addresses.to_json())
roads_source = GeoJSONDataSource(geojson=roads.to_json())

# Initialize our plot figure
p = figure(title="A test map")

# Add the lines to the map from our GeoJSONDataSource -object (it is important to specify the columns as 'xs' and 'ys')
p.multi_line('xs', 'ys', source=roads_source, color='gray', line_width=3)

# Add the lines to the map from our 'msource' ColumnDataSource -object'x', 'y', source=point_source, color='black', size=6)

# Output filepath
outfp = r"C:\HY-DATA\HENTENKA\KOODIT\Opetus\Automating-GIS-processes\Test\Martta_Ex5\test_map.html"

# Save the map
save(p, outfp)