Road extraction from digital images using linear programming and cost functions

Venkat Chalasani and Peter Beling
University of Virginia, ATTN: Systems Engineering, Olsson Hall, Charlottesville VA 22903 U.S.A.
E-mail: vsc4d@virginia.edu, beling@virginia.edu

Abstract

In this paper we propose a two-step process for extraction of roads from digital images, and demonstrate the results on an Airborne Visible Infra Red Imaging Spectrometer (AVIRIS) image of an area near Williamsburg, Virginia. The first step in our methodology involves treating the recorded intensity values for each wavelength band in the image as an attribute and using classification algorithms to identify potential road pixels. We develop and use a decision tree classifier in which each branching decision is constructed by solving a linear program. For comparison, we also test a nearest neighbor-based approach in which feature selection is first performed using linear programming. As a second step, certain properties of roads, such as their elongation and contrast to adjoining pixels, and the grayness measure, are used to create cost functions. The function has a lower value if the configuration of pixels is closer to what we would expect from a road, and is higher otherwise. We then use dynamic programming to trace a minimum cost path through the pixels of the image produced by classification. We show that this approach is effective in tracing roads in a road network.

1. Introduction

Remote sensing data of the Earth's surface is readily available in digital format. The data consists of a series of images of a particular part of the Earth's surface taken from a number of sensors working in different wavelength ranges. Some images are measurements of reflected solar radiation in the ultraviolet, visible and near to middle infrared range of wavelengths, while others measure infrared light emitted by the Earth itself. These images are typically referred to as bands and are often treated together as one image. Each band consists of a set of discrete picture elements or pixels, each of which represents a certain amount of ground area on the Earth's surface. The electro-magnetic radiation received from the area by the specific sensor is captured as the brightness value of the pixel of the specific band. Depending on the context we will refer to a single band or the complete set of bands as the image. These data can be used for identifying certain features of interest in the image such as coast-lines, specific crop areas, wetlands and highways.

To identify a feature of interest we not only have to classify individual pixels as belonging to a specific class such as road, vegetation or water but also identify a set of such pixels as a part of the feature. The problem of assigning an individual pixel a class membership will be referred to as classification, and the process of assigning a set of pixels to a feature will be referred to as feature extraction. During the process of feature extraction certain additional known properties of the feature can be used to reassign the class membership. There has been substantial research done in the past to develop algorithms which identify certain specific kinds of features. The geometrical, radiological or other properties of the specific feature allow the algorithm to be fine tuned to the task at hand.

Roads for example are geometrically elongated, have a maximum curvature, and radiologically have a homogeneous surface and a good contrast to adjacent areas (11). Many road tracing algorithms including edge detection algorithms and mask based approaches depend on these properties for tracking objects. The techniques which track objects by precise outline determination can be referred to as type II operators (7). Other examples are dynamic programming based segment linking systems and knowledge-based systems. These algorithms however do not focus on classification and (with the exception of knowledge-based systems) can only effectively use a single band from an image.

In this paper we propose a two-step process for extraction of roads and demonstrate the results on an Airborne Visible Infra Red Imaging Spectrometer (AVIRIS) image of an area near Williamsburg, Virginia. The first step in our methodology is a classification step in which the bands of the image are treated as attributes. We create a training set by manually identifying certain coordinates as belonging to a specific class, such as road or soil, and then list the attributes for the specific coordinates obtained from the bands of the image. We use the term coordinates to refer to a specific location on a band with reference to the left top corner. We use linear programming to create a discriminating surface between different classes. The linear program is formulated to find a surface that best separates classes. By separating classes one at a time, we can construct a decision tree that can be used to uniquely classify each pixel in the image. Thus our method works like a type I operator which involves classification of individual pixels but is not concerned with precise outline determination.

We use distance to discriminate surfaces for each pixel to assign it a probability measure of being a road, which is then converted to a gray scale value and is used to rebuild the image with only roads. We also show results of a nearest neighbor-based approach for gray scale image creation using linear programming for feature selection. These steps give us a single band image which can be used for road tracing. We demonstrate the use of one such tracing technique, which is based on cost functions and dynamic programming. In the tracing algorithm, certain properties of roads, such as their elongation, contrast to adjoining pixels, and gray scale value, are used to create a cost value. This cost value is low if the configuration of pixels is close to what we would expect from a road, and is high otherwise. We then use dynamic programming to trace a minimum-cost path through the pixels of the image produced by classification. We show that this approach is effective in tracing the roads from the AVIRIS image.

The remainder of the paper is organized as follows: In Section 2, we describe the creation of data sets. In Section 3, we describe our main classification method, which uses linear programming (LP) to build decision trees for classification. In Section 4, we explain how we can carry out feature selection using LP and show how we can use it in our decision tree. In Section 5, we report the results of a nearest neighbor algorithm using features suggested by the LP decision tree. In Section 6 we provide the results of a road tracing algorithm and finally in Section 7, we offer some conclusions and suggestions for further research.

2. Training and test data

Classification is a supervised learning process. Given a set of n measurement vectors in f-dimensional Euclidean space with class membership assigned to each vector from a set of classes, a classification algorithm partitions the Euclidean space so that a new measurement vector in the space without class label can be assigned a class label depending on its location in the Euclidean space. The coordinates of the measurement vector are commonly referred to as features. For example, in the case of heart attack data, x₁ could refer to a patient's blood pressure, x₂ to the patient's cholesterol level, and so on. In the case of remote sensing data coordinates correspond logically to the brightness values associated with each of the bands taken in a specific order. We will refer to the f-dimensional Euclidean space as the feature space and a measurement vector x as a data point. The set of n points with class label known is generally referred to as the data set and is frequently partitioned into a training data set and a test data set. In our case, however, the complete data set was treated as a training data set and testing was carried out by classifying the complete image. The results were evaluated by comparing it with an actual map of the area.

Our training data set was taken from an AVIRIS image of an area south of Williamsburg, Virginia. This image consists of pixels. For each pixel, image intensity is recorded to a precision of 16 bits for each of 224 bands of the electro-magnetic spectrum. The ground resolution of the image is 17m. To enhance the contrast between different features of the image as part of a preliminary visual inspection, we extracted bands 20, 50 and 70 from the AVIRIS image and used them as the red, green, and blue bands of the false-color image shown in Figure 1. Certain features are apparent from the false-color image and from the map shown in Figure 2. One of the major roads is the interstate highway I-64, which runs approximately vertically in the center of the image. The large body of water which is visible to the left is the James River. Other major features include areas of development, such as the portion of Williamsburg seen at the top center of the image, and the large tracks of vegetation or bare soil seen to the right in the image. It seems reasonable, then, to identify four potential classes for each pixel: road, water, soil, or developed area.

Our training set consisted of approximately 150 road points, 400 water points, 1200 soil points, and 200 points of developed area. We created the training data set by manually examining pixels on the false-color image. For each pixel, we recorded the pixel location, associated intensity data for each of the 224 bands in the AVIRIS image, and the most likely class membership for the pixel based on our visual inspection. This procedure is somewhat tedious in general, but often it is possible to identify blocks of pixels that clearly belong to the same class.

3. Trees and classifiers based on linear programming

Binary tree classifiers create a classification tree by repeatedly splitting, or partitioning, the feature space. Most classifier splits take the form of hyper-planes that separate the feature space into two regions, one on each side of the plane. One of the most widely used algorithms, Classification and Regression Trees (CART), employs linear univariate splits to partition the feature space. For every attribute, the CART technique considers a number of partitions between two consecutive data points and subsequently chooses the split that maximizes the increase in ``purity'' of the descendent subsets as compared to the parent subset. A subset is considered ``purer'' than another if it contains data points that are distributed among lesser number of classes than its parent. For example, two subsets each of which contains elements of only one class are ``purer'' than a subset which contains elements of more than one class. In general, CART uses either the ``gini'' criterion or the ``twoing'' criterion to assess the purity of the descendent subsets (2).

For binary tree structures we use the term open node or decision node to indicate a point in a tree where a split or branch is possible. In the case where there are no outgoing branches from a node, the point in the tree is referred to as a terminal node or closed node. The assignment of class to a new point is a simple matter of tracing the point through the tree until the process reaches a node that does not have descendant subsets (i.e., a terminal or closed node). The new point is assigned the class corresponding to the terminal node. For our application, one can define an additional measure that is assigned to each pixel classified as road. This measure, which is obtained by considering the fraction of road pixels to total pixels at each terminal node, represents a probabilistic assessment of the likelihood that the pixel is a road.

Because binary trees grown by CART use only univariate splits, they are relatively bushy (i.e., they contain many branches and terminal nodes). Further, the images created as a result of CART classification also show substantial noise (i.e., they contain many isolated pixels incorrectly classifies as roads). Work by Brown and Pittard (3) shows that classification using multivariate splits, on the other hand, decreases the total number of partitions required in the tree and enhances classification accuracy. One approach to creating multivariate splits is to employ linear programming. The earliest linear programming literature includes a convex hull approach by Kendell (8). Freed and Glover (5) formalized the convex hull procedure to solve a discrimination problem. Further, Glover (6) discusses the issue of a LP null solution, a solution in which the weights associated with the features are all 0, and Bennett (1) provides an elegant formulation which prevents the null solution from occurring. Most of the literature, however, focuses on a two-class separation problem and does not adequately address the issue of multi-class separation. In the remainder of this section we explain how we use a linear programming approach to create multivariate splits and we provide an algorithm to grow a decision tree.

To describe our methodology, let us assume that we have C classes labeled . Let n_k denote the number of data points associated with class k in the training set. If there are B bands in the image, the ith data point associated with the kth class in the training set may be described as the collection of intensity values X^k_ij for . Hence, the dimensionality of the feature space is B.

We formulate a linear program that attempts to separate a set of points belonging to one class (say class 1) from a set of points belonging to another class (say class 2).

The variables in this formulation are w_i, y_I, z_i, and . The variables w_i define a hyper-plane that separates (in feature space) the observations of class 1 from those of class 2. If the two classes cannot be cleanly separated some of the variables y_i, z_i may be forced to take on positive values. The objective is to minimize a weighted average of the errors. The particular choice of weights shown above has the interesting property that it prevents the null hyper-plane w_j = 0, , from being selected as the optimal solution (1).

Given the above linear programming technique for creating splits, there are various methodologies that can be used to grow a decision tree for classification. Park (9) also uses linear programming for creating splits. He proposes evaluating multiple splits at each step of the tree growing process, where each split attempts to separate one class from all other classes. This gives C possible splits, where C is the number of classes. The class whose split maximizes the ``gini'' criterion used by CART (2), is chosen to be removed. Thus we have two partitions, one for the split class and the other for remaining classes. Now the partition containing the remaining classes is further partitioned by considering removal of each remaining class one at a time. This procedure is repeated until only one class remains.

We propose a similar methodology except that we choose to split a class whose arithmetic average on the attributes is farthest, in a Euclidean sense, from the arithmetic average of all classes. There are two types of nodes in the tree: split nodes and terminal nodes. Each split node is associated with a linear programming split, as described above, and has two children in the tree, one a terminal node and the other a split node. Each terminal node is associated with a class label, and has no children. An example decision tree is shown in Figure 3. In growing a tree, we maintain a set of active classes (i.e., classes which are not yet associated with a terminal node). Initially, all classes are active. The root node of the tree is always a split node that corresponds to an attempt to separate a single active class from the remaining set of active classes. One child of the root is a terminal node with the class label of the class that we are attempting to separate from the other active classes. The other child of the root is a split node that may be viewed, recursively, as the root node for a subtree with one less active class. At any level of this process, the class that is chosen for separation is that active class whose average pixel intensities are the furthest from the average pixel intensity for all other active classes combined. In the linear programming formulation, the active class we are attempting to separate forms class 1, while class 2 is the set of remaining active classes.

Our tree growing methodology is described formally in Algorithm 1.

4. Gray scale construction

After the decision tree has been grown using the training data set, the tree is used to classify all the pixels of the AVIRIS image. For those points which are classified as road, we consider two strategies, combined separation and pairwise separation, to create a gray scale value. With both strategies, our primary motivation is to obtain a measure which is indicative of the probability of a particular pixel being of the class road. The measure is then converted to a gray scale value and used to create a gray scale image. The gray scale image can then be used as input to road tracing algorithms, such as the one considered in section 6.

We calculate the probability of a particular pixel being of the class road as follows. First we select the discriminating hyper-plane which was used in the decision tree to partition the road class. We calculate the distance from the pixel point to the discriminating hyper-plane. This distance is divided by the centroid distance (i.e., the arithmetic average of distance of all points of the road class from the hyper-plane boundary). If the resulting value is more than 1, we assume a value of 1, constraining the measure to values between 0 and 1. Finally, the gray sale value is obtained by multiplying the measure by 255.

Our first strategy, combined separation, assumes only two classes, road and non-road, to create a gray scale image. Water, soil and developed areas are combined into the non-road class. Results are shown in figure 4. The combined separation strategy is also applied on the image assuming that water is excluded from the non-road class (figure 5). In the second scenario which excludes water, the noise level in the image is decreased. To make this approach useful we can add an additional step in which only water is detected and removed from the image.

Our second strategy, pairwise separation, creates multiple discriminating hyper-planes between road and soil, road and water, and so on. We then add the distance of the pixel to each of the discriminating hyper-planes and divide it by the sum of distances of the centroid of the road class to each of hyper-planes and multiply by 255. In all these cases we use 70 bands and 4 classes to create the decision tree. First we examine the distance to all three classes (water, soil, and developed areas). These results are shown in figure 6. Next we consider distance to soil and developed areas (figure 8), and finally we consider distance to soil only (figure 9). Overall results show that the discriminating hyper-plane for soil is more effective than other discriminating hyper-planes in constructing images. In addition, the pairwise separation strategy is more effective than the combined separation strategy.

5. LP for feature selection and nearest-neighbor classification

When linear programming is used for creating a split between two classes we obtain a weight vector, , as an output. Each element in this weight vector corresponds to a feature, or intensity measurement. The size of a weight is in some respects an indication of the importance of the corresponding feature in the split. Features which do not contribute to the split, for example, have a value zero in the weight vector. We can therefore use linear programming as a feature selection algorithm.

We have used this idea in decreasing the computation of the Nearest-neighbor algorithm. We also used this idea to increase the effectiveness of our decision trees. This idea was incorporated into the decision tree by running each split twice. In the first exploratory split we use all 224 bands. We then examine the feature weights and select any features that may have a significantly higher weight than the others. Using this approach we selected two bands that were very effective in separating soil from road. The result of our approach are shown in 10.

The k-nearest neighbor classification algorithm proposed by Fix and Hodges (10) has the following simple form: Define a distance metric on the feature space, say the Euclidean norm, and choose an integer k greater than 0. We have chosen k to be 20. For every point x, find the k points in the training set that are nearest x. These points are called the nearest neighbors of x. Assign x to the class which has the largest representation among the k nearest neighbors of x. This rule can be modified to create the gray scale value that we need. In particular, for each pixel we set the gray scale value equal to 255 times the fraction of the pixel's k-nearest neighbors that belong to the road class.

The k-nearest neighbor algorithm is computationally intensive. In its basic form, this algorithm requires the calculation of the distances between all the points in the training set during each of the k main iterations of the algorithm. To reduce the computations required, we used LP-based feature selection. When we created the LP decision tree we found that 16 of the bands had a much higher coefficients in the discrimination function than the other bands. We created nearest neighbor images using these 16 bands (figure 11), and then compared it with a nearest neighbor image created using 100 bands. There were no substantial differences. Indeed, the results of k-nearest neighbor are similar to our LP-based approach.

6. Road tracing

Features, such as parking lots, share many of the same properties as roads. Therefore, it is difficult to isolate roads from these features. Moreover, obstacles on roads, tunnels, under-passes, shadows, and canopy cover may effectively obscure the road from overhead satellite imagery. There are many algorithms, such as those based on edge detection and masking, that can aid in the extraction of roads. We demonstrate the use of one such tracing technique, which is based on cost functions and dynamic programming. In the tracing algorithm, certain properties of roads, such as their elongation, contrast to adjoining pixels, and gray scale value, are used to create a cost value. This cost value is low if the configuration of pixels is close to what we would expect from a road, and is high otherwise. We then use dynamic programming to trace a minimum-cost path through the pixels of the image produced by classification.

We briefly explain the algorithm and provide results. The first step in the process of extracting roads is to characterize the points representing the roads by geometric and spectral considerations. This characterization is done using a costing procedure, that for each pixel, associates a cost such that, the lower the cost, the more likely the point is to be a road pixel. The costing procedure is carried out by examining each pixel's neighborhood to detect special configurations. Certain configurations have a higher probability of being road than other configurations. For example, if in the eight neighborhood of a pixel (eight pixels immediately adjacent to the current pixel) there are only 0 or 1 road pixels then the cost function associated with the current pixel is high, since isolated road pixels are rare. In addition, if we invert our gray scale image, the pixel intensity itself may be used as a cost function. The various cost functions are added together to define an overall cost function, which is used to build a matrix in which each element is a cost associated with an individual pixel.

To trace a road through the matrix, we choose a pixel which we know is a road pixel and choose the number of pixels which should be joined together. Typically the figure is 5 or 6. We can then use dynamic programming to find the minimum cost path through the matrix for any starting point. The results of road tracing on the gray scale image created by our LP decision tree are shown in figure 12.

7. Conclusions and suggestions for further research

Using linear programming decision trees seems to be of some effectiveness in creating a gray scale image for road extraction. In particular, pairwise separation seems to give results that are superior to combined separation and are comparable to nearest neighbor. Linear programming also seems to be useful for feature selection.

Future research will involve the modification of our decision tree by using linear programming to create quadratic splits, rather than linear splits. Another avenue of research will explore the modality of the data. If this research shows that the data is multi-modal, our decision tree will be extended to accommodate the data by clustering the data and partitioning each class into subclasses.