Reading raster files with Rasterio

Rasterio is a highly useful module for raster processing which you can use for reading and writing several different raster formats in Python. Rasterio is based on GDAL and Python automatically registers all known GDAL drivers for reading supported formats when importing the module. Most common file formats include for example TIFF and GeoTIFF, ASCII Grid and Erdas Imagine .img -files.

Landsat 8 bands are stored as separate GeoTIFF -files in the original package. Each band contains information of surface reflectance from different ranges of the electromagnetic spectrum.

Let’s start with inspecting one of the files we downloaded:

In [1]: import rasterio

In [2]: fp = r"C:\HY-DATA\HENTENKA\KOODIT\Opetus\Automating-GIS-processes\Data\Landsat\Helsinki_masked_p188r018_7t20020529_z34__LV-FIN.tif"

# Open the file:
In [3]: raster = rasterio.open(fp)

# Check type of the variable 'raster'
In [4]: type(raster)
Out[4]: rasterio._io.RasterReader

Okey so from here we can see that our raster variable is a rasterio._io.RasterReader type which means that we have opened the file for reading.

Read raster file properties

Let’s have a closer look at the properties of the file:

# Projection
In [5]: raster.crs
Out[5]: CRS({'proj': 'tmerc', 'lat_0': 0, 'lon_0': -183, 'k': 0.9996, 'x_0': 500000, 'y_0': 0, 'datum': 'WGS84', 'units': 'm', 'no_defs': True})

# Affine transform (how raster is scaled, rotated, skewed, and/or translated)
In [6]: raster.transform
Out[6]: [698592.0, 28.5, 0.0, 6697870.5, 0.0, -28.5]

# Dimensions
In [7]: raster.width
Out[7]: 1288

In [8]: raster.height
Out[8]: 1439

# Number of bands
In [9]: raster.count
Out[9]: 7

# Bounds of the file
In [10]: raster.bounds
Out[10]: BoundingBox(left=698592.0, bottom=6656859.0, right=735300.0, top=6697870.5)

# Driver (data format)
In [11]: raster.driver
Out[11]: 'GTiff'

# No data values for all channels
In [12]: raster.nodatavals
Out[12]: (None, None, None, None, None, None, None)

# All Metadata for the whole raster dataset
In [13]: raster.meta
Out[13]: 
{'affine': Affine(28.5, 0.0, 698592.0,
       0.0, -28.5, 6697870.5),
 'count': 7,
 'crs': CRS({'proj': 'tmerc', 'lat_0': 0, 'lon_0': -183, 'k': 0.9996, 'x_0': 500000, 'y_0': 0, 'datum': 'WGS84', 'units': 'm', 'no_defs': True}),
 'driver': 'GTiff',
 'dtype': 'uint8',
 'height': 1439,
 'nodata': None,
 'transform': (698592.0, 28.5, 0.0, 6697870.5, 0.0, -28.5),
 'width': 1288}

Get raster bands

Different bands of a satellite images are often stacked together in one raster dataset. In our case, all seven bands of the Landsat 8 scene are included in our GeoTIFF and the count is hence 7.

In order to have a closer look at the values stored in the band, we will take advantage of the GDAL Band API.

# Read the raster band as separate variable
In [14]: band1 = raster.read(1)

# Check type of the variable 'band'
In [15]: type(band1)
Out[15]: numpy.ndarray

# Data type of the values
In [16]: band1.dtype
Out[16]: dtype('uint8')

Now we have the values of the raster band stored in the variable band1.

Data type of the band can be interpreted with the help of GDAL documentation on Pixel data types. Unsigned integer is always equal or greater than zero and signed integer can store also negative values. For example, an unsigned 16-bit integer can store 2^16 (=65,536) values ranging from 0 to 65,535.

Band statistics

Next, let’s have a look at the values that are stored in the band. As the values of the bands are stored as numpy arrays, it is extremely easy to calculate basic statistics by using basic numpy functions.

# Read all bands
In [17]: array = raster.read()

# Calculate statistics for each band
In [18]: stats = []

In [19]: for band in array:
   ....:     stats.append({
   ....:         'min': band.min(),
   ....:         'mean': band.mean(),
   ....:         'median': np.median(band),
   ....:         'max': band.max()})
   ....: 

In [20]: print(stats)
[{'min': 0, 'mean': 59.631322325286277, 'median': 61.0, 'max': 255}, {'min': 0, 'mean': 43.133428148429509, 'median': 43.0, 'max': 255}, {'min': 0, 'mean': 36.294187755472009, 'median': 31.0, 'max': 255}, {'min': 0, 'mean': 35.094630393777599, 'median': 13.0, 'max': 255}, {'min': 0, 'mean': 37.632635025185706, 'median': 13.0, 'max': 255}, {'min': 0, 'mean': 105.82214777774421, 'median': 114.0, 'max': 175}, {'min': 0, 'mean': 26.283487605695811, 'median': 14.0, 'max': 255}]