Learning Geospatial Analysis with Python

Raster data

Raster data consists of rows and columns of cells or pixels, with each cell representing a single value. The easiest way to think of raster data is as images, which is how they are typically represented by software. But raster data sets are not necessarily stored as images. They can also be ASCII text files or Binary Large Objects (BLOBs) in databases.

Another difference between geospatial raster data and regular digital images is resolution. Digital images express resolution as dots per inch when printed at full size. Resolution can also be expressed as the total number of pixels in the image, measured in megapixels. Geospatial raster data, however, uses the ground distance each cell represents. For example, a raster data set with two-foot resolution means that a single cell represents two feet on the ground, which also means that only objects larger than two feet can be identified visually in the data set.
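As a rough illustration of ground resolution, the following sketch converts a raster's size in cells to the ground distance it covers. The function name and the sample dimensions are hypothetical and chosen only for illustration; the sketch assumes square cells and no rotation.

```python
def ground_extent(n_cols, n_rows, cell_size):
    """Return the (width, height) covered on the ground, in the same
    units as cell_size, assuming square cells and no rotation."""
    return n_cols * cell_size, n_rows * cell_size


# A hypothetical 1,000 x 800 cell raster at two-foot resolution.
width_ft, height_ft = ground_extent(1000, 800, 2.0)
print(width_ft, height_ft)
```

The same arithmetic works in reverse: dividing a known ground distance by the cell size gives the number of cells that span it.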

Raster data sets may contain multiple bands, meaning that different wavelengths of light can be collected at the same time over the same area. A data set often has 3 to 7 bands, but hyperspectral systems can collect several hundred. These bands are viewed individually or swapped in and out as the RGB bands of an image. They can also be recombined using mathematics into a derivative single-band image and then recolored using a set number of classes representing like values within the data set.
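The idea of recombining bands and then classifying the result can be sketched in plain Python. This example is only an illustration: it uses a normalized difference as the band math, and the band values, break points, and function names are all hypothetical rather than taken from a real sensor or library.

```python
def normalized_difference(band_a, band_b):
    """Combine two bands cell by cell into one derivative band.
    Cells where both inputs sum to zero are set to 0.0 to avoid
    dividing by zero."""
    result = []
    for row_a, row_b in zip(band_a, band_b):
        out_row = []
        for a, b in zip(row_a, row_b):
            total = a + b
            out_row.append((a - b) / total if total else 0.0)
        result.append(out_row)
    return result


def classify(band, breaks):
    """Give each cell a class index equal to the number of break
    values that the cell exceeds."""
    return [[sum(value > b for b in breaks) for value in row] for row in band]


# Two hypothetical 2 x 2 bands covering the same area.
band_a = [[50, 80], [20, 0]]
band_b = [[50, 20], [60, 0]]

derived = normalized_difference(band_a, band_b)
classes = classify(derived, breaks=[-0.2, 0.2])
print(classes)
```

Each class index could then be mapped to a display color, which is the recoloring step described above.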

Another common application of raster data is in the field of Scientific Computing, which shares many elements of geospatial remote sensing but adds some interesting twists. Scientific Computing often uses complex raster formats, including NetCDF and GRIB, which store entire data models. Formats like these are more like directories in a file system and can contain multiple data sets or multiple versions of the same data set. Oceanography and meteorology are the most common applications of this kind of analysis. An example of a Scientific Computing data set is the output of a weather model, where the cells in different bands of the raster data set may represent different variables output by the model in a time series.

Like vector data, raster data can come in a variety of formats. The open-source raster library called GDAL, which actually includes the OGR library mentioned earlier, lists over 130 supported raster formats (http://www.gdal.org/formats_list.html). The FME software package, mentioned earlier, supports that many as well. But just like shapefiles and CAD data, there are a few standout raster formats.

TIFF files

The Tagged Image File Format, or TIFF, is the most common geospatial raster format. The TIFF format's flexible tagging system allows it to store almost any type of data in a single file. TIFFs can contain overview images, multiple bands, integer elevation data, basic metadata, internal compression, and a variety of other data typically stored in additional supporting files by other formats. Anyone can extend the TIFF format unofficially by adding tagged data to the file structure. This extensibility has benefits and drawbacks, however. A TIFF file may work fine in one piece of software but fail when accessed in another, because the two software packages implemented the massive TIFF specification to different degrees. An old joke about TIFFs has a frustrating amount of truth to it: TIFF stands for "Thousands of Incompatible File Formats". The GeoTIFF extension defines how geospatial data is stored. Geospatial rasters stored as TIFF files may have any of the following file extensions: .tiff, .tif, or .gtif.

JPEG, GIF, BMP, and PNG

These formats are common image formats in general, but can be used for basic geospatial data storage as well. Typically, these formats rely on accompanying supporting text files for georeferencing information to make them compatible with the GIS software.

The JPEG format is also fairly common for geospatial data. JPEGs have a built-in metadata tagging system called EXIF, which is similar to the tagging system in TIFFs. JPEGs are commonly used for geo-tagged photographs in addition to raster GIS layers. Bitmap images (BMP) are used for desktop applications and document graphics. However, JPEG, GIF, and PNG are the formats used in web mapping applications.

Compressed formats

Because geospatial rasters tend to be very large, they are often stored using advanced compression techniques. The latest open standard is the JPEG2000 format, which updates the JPEG format to include wavelet compression and a few other features such as georeferencing data. MrSID (.sid) and ECW (.ecw) are two proprietary wavelet compression formats often seen in geospatial contexts. The TIFF format supports internal compression, including the LZW algorithm. Note that lossy compressed data is suitable as part of a base map but should not be used for remote sensing processing. Lossy compressed images are designed to look visually correct but often alter the original cell values. Lossless compression algorithms such as LZW preserve the source values, but it's generally considered a bad idea to attempt spectral analysis on data that has been through compression unless you know the compression was lossless. The JPEG format is designed to be a lossy format, which sacrifices data for a smaller file size. It is also commonly encountered, so it is important to remember this fact to avoid invalid results.
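The difference between lossless and lossy storage can be illustrated with the Python standard library. This is only a sketch under stated assumptions: zlib (DEFLATE) stands in for a lossless algorithm such as LZW, and simple quantization stands in for a lossy codec. Real lossy formats such as JPEG work very differently, but they also discard detail. The cell values are hypothetical.

```python
import zlib

# Hypothetical 8-bit cell values from a single raster row.
cells = bytes([12, 13, 13, 200, 201, 57, 58, 58])

# Lossless round trip: decompression reproduces every cell value exactly.
restored = zlib.decompress(zlib.compress(cells))

# Stand-in for lossy compression: round each value down to a multiple of 8.
quantized = bytes((value // 8) * 8 for value in cells)

# Count how many cells no longer hold their original value.
changed = sum(1 for a, b in zip(cells, quantized) if a != b)

print(restored == cells)
print(changed, "of", len(cells), "cells altered by quantization")
```

The quantized row may still look similar when rendered, but any analysis based on the cell values would be working with altered data.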

ASCII GRIDS

Another means of storing raster data, often elevation data, is in ASCII GRID files. This file format was created by ESRI but has become an unofficial standard supported by most software packages. An ASCII GRID is a simple text file containing cell values arranged in rows and columns. The spatial information for the raster is contained in a simple header. The format of the file is as follows:

<NCOLS xxx>
<NROWS xxx>
<XLLCENTER xxx | XLLCORNER xxx>
<YLLCENTER xxx | YLLCORNER xxx>
<CELLSIZE xxx>
{NODATA_VALUE xxx}
row 1
row 2
.
.
.
row n

While not the most efficient way to store data, ASCII GRID files are very popular because they don't require any special data libraries to create or access geospatial raster data. These files are often distributed as zip files. The header values in the preceding format contain the following information:

  • Number of columns
  • Number of rows
  • X coordinate of the center of the lower-left cell | X coordinate of the lower-left corner of the lower-left cell
  • Y coordinate of the center of the lower-left cell | Y coordinate of the lower-left corner of the lower-left cell
  • Cell size in mapping units
  • No-data value (typically -9999)
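Because the format is plain text, an ASCII GRID can be read with nothing but the Python standard library. The following is a minimal sketch rather than a complete reader: it assumes one header item per line and whitespace-separated cell values, and the function name and sample grid are hypothetical.

```python
def parse_ascii_grid(text):
    """Parse ASCII GRID text into (header dict, list of rows of floats).
    Sketch only: assumes one header item per line and
    whitespace-separated cell values."""
    header = {}
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        # Header lines start with a keyword; data lines start with a number.
        if parts[0][0].isalpha():
            header[parts[0].upper()] = float(parts[1])
        else:
            rows.append([float(value) for value in parts])
    return header, rows


# A hypothetical 3 x 2 grid with one no-data cell.
sample = """NCOLS 3
NROWS 2
XLLCORNER 100.0
YLLCORNER 200.0
CELLSIZE 10.0
NODATA_VALUE -9999
1 2 3
4 -9999 6
"""

header, rows = parse_ascii_grid(sample)
nodata = header.get("NODATA_VALUE")
# Keep only the cells that hold real data.
valid = [value for row in rows for value in row if value != nodata]
print(header["NCOLS"], header["NROWS"], valid)
```

The no-data value is filtered out before any statistics are computed; otherwise a value such as -9999 would badly skew results like the mean elevation.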

World files

World files are simple text files that provide georeferencing information externally for image formats that typically have no native support for spatial information, including JPEG, GIF, PNG, and BMP. Geospatial software recognizes a world file by its naming convention. The most common way to name a world file is to use the raster file name, then alter the extension by removing the middle letter and adding w to the end. Following this convention, a .jpg image is paired with a .jgw world file, .tif with .tfw, .png with .pgw, .gif with .gfw, and .bmp with .bpw.
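The naming convention can be sketched as a small helper function. The function name is hypothetical, and the handling of extensions that are not three letters long (simply appending w to the file name) is an assumption for this sketch; software packages differ in which alternative names they look for.

```python
import os


def world_file_name(raster_path):
    """Derive a world file name from a raster file name.
    Three-letter extensions keep their first and last letters and gain
    a 'w'. As an assumption, any other name just has 'w' appended."""
    base, ext = os.path.splitext(raster_path)
    ext = ext.lstrip(".")
    if len(ext) == 3:
        return f"{base}.{ext[0]}{ext[2]}w"
    return raster_path + "w"


for name in ("image.jpg", "image.tif", "image.png", "image.gif", "image.bmp"):
    print(name, "->", world_file_name(name))
```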

The structure of a world file is very simple. It is a six-line text file:

  • Line 1: Cell size along the x axis in ground units
  • Line 2: Rotation on the y axis
  • Line 3: Rotation on the x axis
  • Line 4: Cell size along the y axis in ground units (typically negative, because rows are counted downward from the upper-left cell)
  • Line 5: Center x coordinate of the upper-left cell
  • Line 6: Center y coordinate of the upper-left cell

The following is an example of world file values:

15.0
0.0
0.0
-15.0
-89.38
45.0

The (x,y) coordinates and the (x,y) cell size contained in lines 1, 4, 5, and 6 allow you to calculate the coordinate of any cell or the distance across a set of cells. The rotation values are important for geospatial software because remotely sensed images are often rotated due to the data collection platform. Rotating the images runs the risk of resampling the data, and therefore data loss, so the rotation values allow the software to account for the distortion instead. The surrounding pixels outside the image are typically assigned a "no data" value and represented as the color black. The following image demonstrates image rotation where the satellite collection path is oriented from southeast to northeast but the underlying base map is north up:

Image courtesy of the USGS
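The cell-coordinate calculation described above can be sketched in a few lines of Python. The function names are hypothetical, rows and columns are counted from zero starting at the upper-left cell, and the sample values follow the world file example with line 5 read as the decimal value -89.38.

```python
def parse_world_file(text):
    """Return the six world file values as floats, in file order."""
    values = [float(item) for item in text.split()]
    if len(values) != 6:
        raise ValueError("a world file must contain exactly six values")
    return values


def cell_center(values, col, row):
    """Return the (x, y) center of the cell at zero-based (col, row)."""
    # File order: x cell size, y rotation, x rotation, y cell size,
    # then the x and y center coordinates of the upper-left cell.
    x_size, y_rot, x_rot, y_size, x_origin, y_origin = values
    x = x_size * col + x_rot * row + x_origin
    y = y_rot * col + y_size * row + y_origin
    return x, y


world = parse_world_file("15.0\n0.0\n0.0\n-15.0\n-89.38\n45.0\n")

# The upper-left cell sits at the coordinates given in lines 5 and 6.
print(cell_center(world, 0, 0))

# Two columns to the right and three rows down from the upper-left cell.
print(cell_center(world, 2, 3))
```

With both rotation values at zero, as in this example, the distance across a set of cells is simply the number of cells multiplied by the cell size.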

World files are a great tool when working with raster data in Python. Most geospatial software and data libraries support world files, so they are usually a good choice for georeferencing.

Tip

You'll find that world files are very useful but you use them infrequently enough that you forget what the unlabeled contents represent. A handy quick reference for world files is available here:

http://kralidis.ca/gis/worldfile.htm