Data reclassification¶
Reclassifying data based on specific criteria is a common task in GIS analysis. The purpose of this lesson is to see how we can reclassify values based on some criteria, which can be anything, such as:
1. if available space in a pub is less than the space in my wardrobe
AND
2. the temperature outside is warmer than my beer
------------------------------------------------------
IF TRUE: ==> I go and drink my beer outside
IF NOT TRUE: ==> I go and enjoy my beer inside at a table
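The rule above is simply two comparisons joined with a logical AND. As a minimal sketch in plain Python (the function name, parameter names, and input values below are made up for illustration):

```python
def drink_outside(pub_space_m2, wardrobe_space_m2, outside_temp_c, beer_temp_c):
    """Return True only when both conditions of the rule hold."""
    # Condition 1: available space in the pub is less than the space in my wardrobe
    pub_is_cramped = pub_space_m2 < wardrobe_space_m2
    # Condition 2: the temperature outside is warmer than my beer
    warmer_than_beer = outside_temp_c > beer_temp_c
    return pub_is_cramped and warmer_than_beer


# Hypothetical inputs: a cramped pub on a warm day
decision = "outside" if drink_outside(1.5, 2.0, 21.0, 6.0) else "inside at a table"
print(decision)
```

The classifications later in this lesson follow the same pattern: each criterion is a comparison, and the comparisons are combined into a single True/False decision per feature.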
Even though the above would be an interesting case study, we will use slightly more traditional cases to learn classification. We will use the Corine land cover layer from 2012 and Travel Time Matrix data from Helsinki, and classify some of their features using either our own self-made classifier or ready-made classifiers that are commonly used, e.g. when doing visualizations.
The target in this part of the lesson is to:
- classify the lakes into big and small lakes, where
  - a big lake is a lake that is larger than the average size of all lakes in our study region
  - a small lake is the opposite (smaller than the average)
- use travel times and distances to find
  - good locations to buy an apartment with good public transportation accessibility to the city center
  - but a bit further away from the city center, where the prices are lower (or at least we assume so)
- use ready-made classifiers from the pysal module to classify travel times into multiple classes.
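Before touching any spatial data, the big/small lake rule can be sketched in plain Python. The lake names and areas below are made-up values, not taken from the Corine data, and a lake exactly at the average is treated as small here, which is an assumption:

```python
from statistics import mean

# Hypothetical lake areas in square kilometres (illustrative values only)
lake_areas_km2 = {"Lake A": 0.4, "Lake B": 2.5, "Lake C": 0.1, "Lake D": 0.8}

# Threshold: the average size of all lakes in the (toy) study region
mean_area = mean(lake_areas_km2.values())

# A lake is "big" if it is larger than the average, otherwise "small"
lake_classes = {
    name: "big" if area > mean_area else "small"
    for name, area in lake_areas_km2.items()
}
print(lake_classes)
```

Note that a single very large lake pulls the average up, so most lakes can end up in the "small" class; that is a property of using the mean as the threshold.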
Download data¶
Download (and then extract) the dataset zip-package used during this lesson from this link.
You should have the following Shapefiles in the data folder:
$ cd /home/geo/L4/data
$ ls
Corine2012_Uusimaa.cpg Helsinki_borders.cpg TravelTimes_to_5975375_RailwayStation.dbf
Corine2012_Uusimaa.dbf Helsinki_borders.dbf TravelTimes_to_5975375_RailwayStation.prj
Corine2012_Uusimaa.prj Helsinki_borders.prj TravelTimes_to_5975375_RailwayStation.shp
Corine2012_Uusimaa.shp Helsinki_borders.shp TravelTimes_to_5975375_RailwayStation.shx
Corine2012_Uusimaa.shp.xml Helsinki_borders.shx
Corine2012_Uusimaa.shx TravelTimes_to_5975375_RailwayStation.cpg
Data preparation¶
Before doing any classification, we need to prepare our data a little bit.
Let’s read the data in, select only the English columns from it, and plot it so that we can see what it looks like on a map.
import geopandas as gpd
import matplotlib.pyplot as plt
# File path
fp = "/home/data/Corine2012_Uusimaa.shp"
data = gpd.read_file(fp)
Let’s see what we have.
In [1]: data.head(2)
Out[1]:
Level1 Level1Eng Level1Suo Level2 Level2Eng \
0 1 Artificial surfaces Rakennetut alueet 11 Urban fabric
1 1 Artificial surfaces Rakennetut alueet 11 Urban fabric
Level2Suo Level3 Level3Eng \
0 Asuinalueet 112 Discontinuous urban fabric
1 Asuinalueet 112 Discontinuous urban fabric
Level3Suo Luokka3 \
0 Väljästi rakennetut asuinalueet 112
1 Väljästi rakennetut asuinalueet 112
geometry
0 POLYGON ((279500 6640640, 279507.469 6640635.3...
1 POLYGON ((313620 6655820, 313639.8910000001 66...
Let’s select only the English columns.
# Select only English columns
In [2]: selected_cols = ['Level1', 'Level1Eng', 'Level2', 'Level2Eng', 'Level3', 'Level3Eng', 'Luokka3', 'geometry']
# Select data
In [3]: data = data[selected_cols]
# What are the columns now?
In [4]: data.columns
Out[4]:
Index(['Level1', 'Level1Eng', 'Level2', 'Level2Eng', 'Level3', 'Level3Eng',
'Luokka3', 'geometry'],
dtype='object')
Let’s plot the data and use column ‘Level3’ as our color.
In [5]: data.plot(column='Level3', linewidth=0.05)
Out[5]: <matplotlib.axes._subplots.AxesSubplot at 0x2ce1cba52e8>
# Use tight layout and remove empty whitespace around our map
In [6]: plt.tight_layout()
Let’s see what kind of values we have in the ‘Level3Eng’ column.
In [7]: list(data['Level3Eng'].unique())
Out[7]:
['Discontinuous urban fabric',
'Transitional woodland/shrub',
'Non-irrigated arable land',
'Fruit trees and berry plantations',
'Pastures',
'Land principally occupied by agriculture, with significant areas of natural vegetation',
'Bare rock',
'Inland marshes',
'Peatbogs',
'Salt marshes',
'Water courses',
'Water bodies',
'Sea and ocean',
'Industrial or commercial units',
'Road and rail networks and associated land',
'Port areas',
'Airports',
'Mineral extraction sites',
'Broad-leaved forest',
'Dump sites',
'Coniferous forest',
'Construction sites',
'Green urban areas',
'Sport and leisure facilities',
'Mixed forest']
Okay, we have plenty of different kinds of land cover in our data. Let’s select only lakes from our data. Selecting specific rows from a DataFrame based on some value(s) is easy to do in Pandas / Geopandas using the indexer .loc[] together with a boolean condition (the older .ix[] indexer has been removed from pandas); read more from here.
# Select lakes (i.e. 'Water bodies' in the data) and make a proper copy out of our data
In [8]: lakes = data.loc[data['Level3Eng'] == 'Water bodies'].copy()
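To see what the boolean condition does on its own, here is a small sketch on a toy pandas DataFrame. Only the column name `Level3Eng` mirrors the lesson's data; the rows and the `area_id` column are made up for illustration:

```python
import pandas as pd

# Toy table with made-up rows
df = pd.DataFrame({
    "Level3Eng": ["Water bodies", "Pastures", "Water bodies", "Airports"],
    "area_id": [1, 2, 3, 4],
})

# The comparison produces one True/False value per row
mask = df["Level3Eng"] == "Water bodies"

# .loc keeps the rows where the mask is True; .copy() makes the result independent
subset = df.loc[mask].copy()

# Modifying the copy leaves the original DataFrame untouched
subset["area_id"] = subset["area_id"] * 10

print(subset)
print(df)
```

The `lakes` selection above follows the same pattern, and taking a copy means later changes to `lakes` do not affect `data`.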
In [9]: lakes.head(2)