Tuesday, February 10, 2015

Reading Raster Data with Python and gdal

I am trying to learn Python for Geoprocessing. Here are some very basic notes on "playing" with Python/gdal/ogr. The online documentation is, I must say, rather confusing, so I try it step by step at the command line as below.

I follow the very useful lecture notes at http://www.gis.usu.edu/~chrisg/python/2009/ and the tutorial at http://www.gdal.org/gdal_tutorial.html

Importing both gdal and gdalconst; just "import gdalconst" does not work (and I dont understand why right now...):

    >>> import gdal
    >>> from gdalconst import *


Defining the filename (assuming its located in the current working directory of python -- use os.getced and os.chdir):

    >>> filename = ERS1PRI_19920430o04133tr481fr1989_AppOrb_Calib_Spk_SarsimTC_LinDB.tif

Now the file can be opened, the driver only needs to be imported for write-access if I understand correctly:

    >>> dataset = gdal.Open(filename, GA_ReadOnly)

Typing "dataset" shows the pointer to the opened file:

    >>> dataset
    <osgeo.gdal.Dataset; proxy of <Swig Object of type GDALDatasetShadow * at 0x0000000003550EA0> >
>>>


Various information can be retrieved from the opened file:

    >>> cols = dataset.RasterXSize
    >>> rows = dataset.RasterYSize
    >>> bands = dataset.RasterCount
    >>> driver = dataset.GetDriver().LongName

    >>> cols
    7257
    >>> rows
    7226
    >>> bands
    1
    >>>driver

    GeoTIFF
    >>>

Geoinformation can be retrieved with GetGeoTransform().

>>> geotransform = dataset.GetGeoTransform()

The variable "geotransform" now contains a list with Geoinformation:

    >>> geotransform
    (368745.92379062285, 20.0, 0.0, 8828671.611738198, 0.0, -20.0) 


The answer to what these values mean are found in the documentation:

    adfGeoTransform[0] /* top left x */
    adfGeoTransform[1] /* w-e pixel resolution */
    adfGeoTransform[2] /* rotation, 0 if image is "north up" */
    adfGeoTransform[3] /* top left y */
    adfGeoTransform[4] /* rotation, 0 if image is "north up" */
    adfGeoTransform[5] /* n-s pixel resolution */


and one can retrieve a single value from this list for example with

    >>> originX = geotransform[0]
    >>> originY = geotransform[3]
    >>> pixelWidth = geotransform[1]
    >>> pixelHeight = geotransform[5]
    >>> originX
    368745.92379062285
    >>> originY
    8828671.611738198
    >>> pixelWidth
    20.0
    >>> pixelHeight
    -20.0 


But how to get the individual data values in the file?

Get the band and read the first line:

    >>> band = dataset.GetRasterBand(1)

    >>> bandtype = gdal.GetDataTypeName(band.DataType)
    >>> bandtype
    Float32
    >>> scanline = band.ReadRaster( 0, 0, band.XSize, 1,band.XSize, 1, band.DataType)




Since I was not sure what the "ReadRaster" parameters meant, google led me to this useful page:

The ReadRaster() call has the arguments: def ReadRaster(self, xoff, yoff, xsize, ysize, buf_xsize = None, buf_ysize = None, buf_type = None, band_list = None ): The xoff, yoff, xsize, ysize parameter define the rectangle on the raster file to read. The buf_xsize, buf_ysize values are the size of the resulting buffer. So you might say "0,0,512,512,100,100" to read a 512x512 block at the top left of the image into a 100x100 buffer (downsampling the image).
which I found here

Typing "scanline" gives me long lines of this:

    >>> scanline
    `Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`Bxa2x8d`.........


Note that the returned scanline is of type string, and contains xsize*4 bytes of raw binary floating point data. To convert this into readable values use struct.unpack and instead you get long lines of float numbers:

    >>> import struct
    >>> value = struct.unpack(f * band.XSize, scanline)
    >>> value
    (-1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, -1.0000000031710769e-30, ....


Now I can get individual values, but all are in "one line" and not in an array:

     >>> value[8]
    -1.0000000031710769e-30


I rather read the whole file into an array:

    >>>data = band.ReadAsArray(0, 0, cols, rows)
    >>> value = data[3500,4000]
    >>> value
    -8.30476 


Using the numpy library I can define the datatype in the array:
   >>> import numpy
   >>> data = band.ReadAsArray(0, 0, dataset.RasterXSize, dataset.RasterYSize).astype(numpy.float)
   >>> value = data[3500,4000]
   >>> value
   -8.3047599792480469
   >>>


One has  to be careful not confusing column and rows! Matrix is value = data[row, column], and it starts with 0, so the value -8.30476 is located at y=row=3501 and x=column=4001.

To be continued....

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.