Learning Geospatial Analysis with Python

Common vector GIS concepts

This section discusses the types of GIS processes commonly used in geospatial analysis. The list is not exhaustive; however, it covers the essential operations on which all other operations are based. If you understand these operations, you can quickly understand much more complex processes, as they are either derivatives or combinations of these processes.

Data structures

GIS vector data uses coordinates consisting of, at a minimum, an x horizontal value and a y vertical value to represent a location on the earth. In many cases a point may also contain a z value. Other ancillary values are possible including measurements or timestamps.

These coordinates are used to form points, lines, and polygons to model real-world objects. Points can be a geometric feature in and of themselves, or they can connect line segments. Closed areas created by line segments are considered polygons. Polygons model objects such as buildings, terrain, or political boundaries.

A GIS feature can consist of a single point, line, or polygon or it can consist of more than one shape. For example, in a GIS polygon data set containing world country boundaries, the Philippines, which is made up of 7,107 islands, would be represented as a single country made up of thousands of polygons.
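As a minimal sketch of these structures, the coordinates can be held in plain Python tuples and lists. The layout loosely follows GeoJSON conventions, and every name and coordinate value below is made up for illustration rather than taken from a particular library.

```python
# A point is a single (x, y) coordinate; an optional z value can follow.
point = (-89.5, 30.3)
point_with_z = (-89.5, 30.3, 12.0)

# A line is an ordered sequence of points.
line = [(0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

# A polygon is a closed ring: the first and last points are identical.
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]

# One feature can hold several shapes, such as a country made of islands.
# The attribute names and island coordinates are illustrative only.
island_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
island_b = [(2.0, 2.0), (3.0, 2.0), (3.0, 3.0), (2.0, 3.0), (2.0, 2.0)]
feature = {
    "attributes": {"name": "Example Country"},
    "geometry": {"type": "MultiPolygon", "polygons": [island_a, island_b]},
}

print(len(feature["geometry"]["polygons"]))  # 2
```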

Vector data typically represents topographic features better than raster data. Vector data has better accuracy potential and is more precise. But vector data is also traditionally more costly to collect on a large scale than raster data.

Two other important terms related to vector data structures are bounding box and convex hull. The bounding box, or minimum bounding box, is the smallest possible rectangle, with sides parallel to the coordinate axes, which contains all of the points in a data set. The following image demonstrates a bounding box for a collection of points:
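In code, a bounding box comes down to the minimum and maximum x and y values of the data set. The function name and the (min_x, min_y, max_x, max_y) ordering in this sketch are assumptions chosen for illustration.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a sequence of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


points = [(2, 3), (5, 1), (4, 6), (1, 4)]
print(bounding_box(points))  # (1, 1, 5, 6)
```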

The convex hull of a data set is similar to the bounding box, but instead of a rectangle it is the smallest convex polygon which can contain the data set. The bounding box of a data set always contains its convex hull. The following image shows the same point data as the previous example with the convex hull polygon shown in red:
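One way to compute a convex hull is Andrew's monotone chain algorithm. The text does not name a specific method, so treat this as one possible sketch; the function names are illustrative.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counter-clockwise order (ring not closed)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    # Build the lower hull left to right, dropping points that turn clockwise.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    # Build the upper hull right to left in the same way.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


points = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3)]
print(convex_hull(points))  # [(0, 0), (4, 0), (4, 4), (0, 4)]
```

The two interior points are discarded because they do not lie on the outer boundary.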

Buffer

A buffer operation can be applied to spatial objects including points, lines, or polygons. This operation creates a polygon around the object at a specified distance. Buffer operations are used for proximity analysis, for example, establishing a safety zone around a dangerous area. In the following image, the black shapes represent the original geometry while the red outlines represent the larger buffer polygons generated from the original shape:
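The simplest case, a buffer around a single point, can be sketched by approximating a circle with a ring of vertices. Buffers around lines and polygons are considerably more involved and are normally left to a geometry library. The function name and the segments parameter are assumptions made for this example.

```python
import math


def buffer_point(center, distance, segments=32):
    """Approximate a circular buffer around a point as a closed ring."""
    cx, cy = center
    ring = []
    for i in range(segments):
        angle = 2 * math.pi * i / segments
        ring.append((cx + distance * math.cos(angle),
                     cy + distance * math.sin(angle)))
    ring.append(ring[0])  # close the ring by repeating the first vertex
    return ring


safety_zone = buffer_point((10.0, 20.0), 5.0)
print(len(safety_zone))  # 33 vertices: 32 segments plus the closing point
```

A higher segments value gives a smoother outline at the cost of more vertices.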

Dissolve

A dissolve operation creates a single polygon out of adjacent polygons. A common use for a dissolve operation is to merge two adjacent properties in a tax database that have been purchased by a single owner. Dissolves are also used to simplify data extracted from remote sensing.
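A simplified dissolve can be sketched by discarding the edges that two polygons share and chaining the remaining edges into one outline. This sketch rests on strong assumptions: every polygon lists its vertices counter-clockwise without a repeated closing point, neighbours share whole edges with identical vertices, and the result is a single ring with no holes or pinch points. Real data rarely behaves this well, so production work usually relies on a geometry library instead.

```python
def dissolve(polygons):
    """Merge edge-sharing polygons into one closed ring (see assumptions)."""
    edges = set()
    for poly in polygons:
        for i, start in enumerate(poly):
            end = poly[(i + 1) % len(poly)]
            if (end, start) in edges:
                # The neighbour traverses this edge in reverse: it is interior.
                edges.remove((end, start))
            else:
                edges.add((start, end))
    # Map each boundary vertex to the vertex that follows it.
    next_vertex = dict(edges)
    start = min(next_vertex)
    ring = [start]
    current = next_vertex[start]
    while current != start:
        ring.append(current)
        current = next_vertex[current]
    ring.append(start)  # close the ring
    return ring


lot_a = [(0, 0), (1, 0), (1, 1), (0, 1)]
lot_b = [(1, 0), (2, 0), (2, 1), (1, 1)]
print(dissolve([lot_a, lot_b]))
# [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (0, 1), (0, 0)]
```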

Generalize

Objects which have more points than necessary for the geospatial model can be generalized to reduce the number of points used to represent the shape. This operation usually requires a few attempts to find the smallest number of points that does not compromise the overall shape. It is a data optimization technique that simplifies data for computing efficiency or better visualization. This technique is useful in web-mapping applications. Computer screens have a limited resolution, traditionally quoted as 72 Dots Per Inch (dpi), although many modern displays are denser. Highly detailed point data, which would not be visible at that resolution, can be reduced so that less bandwidth is used to send a visually equivalent map to the user.
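A widely used generalization method is the Douglas-Peucker algorithm, which keeps only the points that deviate from a straight line by more than a tolerance. The text does not name an algorithm, so this is one possible approach, and the function names are illustrative.

```python
import math


def point_line_distance(point, start, end):
    """Perpendicular distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / length


def simplify(points, tolerance):
    """Recursively drop points closer than tolerance to the simplified line."""
    if len(points) < 3:
        return list(points)
    distances = [point_line_distance(p, points[0], points[-1])
                 for p in points[1:-1]]
    max_distance = max(distances)
    if max_distance <= tolerance:
        return [points[0], points[-1]]
    # Keep the farthest point and simplify the two halves around it.
    index = distances.index(max_distance) + 1
    left = simplify(points[:index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


line = [(0, 0), (1, 0.1), (2, 0), (2, 2)]
print(simplify(line, 0.5))  # [(0, 0), (2, 0), (2, 2)]
```

Trying several tolerance values, as the text suggests, is usually how a suitable level of detail is found.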

Intersection

An intersection operation is used to see if any part of a feature intersects with one or more other features. This operation is used for spatial queries in proximity analysis and is often a follow-on operation to a buffer analysis.
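At the lowest level, intersection tests between lines and polygons reduce to asking whether two line segments cross. The following sketch uses the common orientation test for that building block; it is not a full feature-level intersection, and the function names are illustrative.

```python
def orientation(a, b, c):
    """Return 1 for a counter-clockwise turn a->b->c, -1 for clockwise, 0 if collinear."""
    value = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (value > 0) - (value < 0)


def on_segment(a, b, c):
    """True if c, already known to be collinear with a-b, lies within a-b."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1, p2, q1, q2):
    """True if segment p1-p2 touches or crosses segment q1-q2."""
    o1 = orientation(p1, p2, q1)
    o2 = orientation(p1, p2, q2)
    o3 = orientation(q1, q2, p1)
    o4 = orientation(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    # Collinear cases: an endpoint of one segment lies on the other.
    return ((o1 == 0 and on_segment(p1, p2, q1))
            or (o2 == 0 and on_segment(p1, p2, q2))
            or (o3 == 0 and on_segment(q1, q2, p1))
            or (o4 == 0 and on_segment(q1, q2, p2)))


print(segments_intersect((0, 0), (4, 4), (0, 4), (4, 0)))  # True
print(segments_intersect((0, 0), (1, 0), (0, 1), (1, 1)))  # False
```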

Merge

A merge operation combines two or more non-overlapping shapes into a single multishape object. In a multishape object, the shapes maintain separate geometries but are treated by the GIS as a single feature with a single set of attributes.
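A merge can be sketched as collecting the individual rings into one feature that carries a single attribute record. The dictionary layout, key names, and sample values here are hypothetical and chosen only for illustration.

```python
def merge_features(features, attributes):
    """Combine single-polygon features into one multishape feature."""
    return {
        "attributes": dict(attributes),
        "geometry": {
            "type": "MultiPolygon",
            # Each ring is kept as its own geometry; nothing is dissolved.
            "polygons": [f["ring"] for f in features],
        },
    }


parcel_1 = {"ring": [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]}
parcel_2 = {"ring": [(3, 3), (4, 3), (4, 4), (3, 4), (3, 3)]}
merged = merge_features([parcel_1, parcel_2], {"owner": "Example Owner"})
print(len(merged["geometry"]["polygons"]))  # 2
```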

Point in polygon

A fundamental geospatial operation is checking to see if a point is inside a polygon. This one operation is the atomic building block of many different types of spatial queries. If the point is on the boundary of the polygon it is considered inside. Very few spatial queries exist that do not rely on this calculation in some way. It can be very slow on a large number of points, however.

The most common and efficient algorithm to detect if a point is inside a polygon is called the Ray Casting algorithm. First, a test is performed to see if the point is on the polygon boundary. Next, the algorithm draws a line from the point in question in a single direction. The program counts the number of times the line crosses the polygon boundary until it reaches the bounding box of the polygon, which is the smallest box that can be drawn around the entire polygon. If the number of boundary crossings is odd, the point is inside. If it is even, the point is outside.
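The steps above can be sketched as follows. This version casts a horizontal ray in the positive x direction and checks every edge rather than stopping at the bounding box, which gives the same count. The exact comparison in the boundary test suits integer or clean coordinates; real floating-point data may need a tolerance. The function name is illustrative.

```python
def point_in_polygon(point, ring):
    """Return True if point is inside or on the boundary of the ring."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Boundary test: point is collinear with the edge and within its extent.
        collinear = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) == 0
        if (collinear and min(x1, x2) <= x <= max(x1, x2)
                and min(y1, y2) <= y <= max(y1, y2)):
            return True
        # Does the edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside  # each crossing flips odd/even
    return inside


square = [(0, 0), (4, 0), (4, 4), (0, 4), (0, 0)]
print(point_in_polygon((2, 2), square))  # True
print(point_in_polygon((5, 2), square))  # False
print(point_in_polygon((4, 2), square))  # True, on the boundary
```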

Union

The union operation is less common but very useful for combining two or more overlapping polygons into a single shape. It is similar to the dissolve but in this case the polygons are overlapping as opposed to being adjacent. Usually this operation is used to clean up automatically generated feature data sets from remote sensing operations.

Join

A join or SQL join is a database operation used to combine two or more tables of information. Relational databases are designed to avoid storing redundant information for one-to-many relationships. For example, a US state may contain many cities. Rather than creating a table for each state containing all of its cities, a table of states with numeric IDs is created, while a table for all cities in every state is created with a state numeric ID. In an SQL join operation the state and cities tables could be linked by state ID.

In a GIS, you can also have spatial joins, which are part of the spatial extension software for a database. In a spatial join, you combine the attributes of two features the same way you do in an SQL join, but the relation is based on the spatial proximity of the two features. To follow the previous cities example, we could add the name of the county each city resides in using a spatial join. The cities layer could be loaded over a county polygon layer whose attributes contain the county name. The spatial join would determine which city is in which county and perform an SQL join to add the county name to each city's attribute row.
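The cities and counties example can be sketched in plain Python, with a compact ray casting test standing in for the spatial relation. A spatial database would normally do this work; the layer contents, key names, and function names below are invented for illustration, and the containment test skips boundary handling for brevity.

```python
def contains(ring, point):
    """Ray casting test on a closed ring; boundary cases are not handled."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def spatial_join(cities, counties):
    """Add the name of the containing county to each city's attributes."""
    joined = []
    for city in cities:
        county = next((c["name"] for c in counties
                       if contains(c["ring"], city["location"])), None)
        joined.append({**city, "county": county})
    return joined


counties = [
    {"name": "County A", "ring": [(0, 0), (5, 0), (5, 5), (0, 5), (0, 0)]},
    {"name": "County B", "ring": [(5, 0), (10, 0), (10, 5), (5, 5), (5, 0)]},
]
cities = [
    {"name": "City 1", "location": (2, 3)},
    {"name": "City 2", "location": (7, 1)},
    {"name": "City 3", "location": (12, 2)},
]
for row in spatial_join(cities, counties):
    print(row["name"], row["county"])
# City 1 County A
# City 2 County B
# City 3 None
```

A city that falls outside every county keeps an empty county value, much like an unmatched row in an SQL left join.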

Geospatial rules about polygons

In geospatial analysis, there are several general rules of thumb regarding polygons which are different from mathematical descriptions of polygons:

  • Polygons must have at least four points, counting the closing point that repeats the first (so a triangle is stored as four points)
  • A polygon boundary should not overlap itself
  • Polygons within a layer shouldn't overlap
  • A polygon within a layer inside another polygon is considered a hole in the underlying polygon

Different geospatial software packages and libraries handle exceptions to these rules differently, which can lead to confusing errors or software behavior. The safest route is to make sure your polygons obey these rules.

There is one more important piece of information about polygons. A polygon is by definition a closed shape, meaning the first and last vertices of the polygon are identical. Some geospatial software will throw an error if you haven't explicitly duplicated the first point as the last point in the polygon data set. Other software will automatically close the polygon without complaining. The data format you use to store your geospatial data may also dictate how polygons are defined. Because software disagrees on this point, it is a gray area and is not listed as a separate rule above, but knowing this quirk will come in handy someday when you run into an error you can't explain easily.
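A defensive habit is to close rings explicitly before handing them to other software. The helper below is a small sketch of that idea; its name is an assumption and it does not belong to any particular library.

```python
def close_ring(ring):
    """Return a copy of ring whose last point repeats the first."""
    ring = list(ring)
    if ring and ring[0] != ring[-1]:
        ring.append(ring[0])
    return ring


open_ring = [(0, 0), (4, 0), (4, 3), (0, 3)]
print(close_ring(open_ring))
# [(0, 0), (4, 0), (4, 3), (0, 3), (0, 0)]
```

Calling the helper on a ring that is already closed returns it unchanged, so it is safe to apply to data of unknown origin.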