Efficient Generalisation and Abstraction of Network Data Using Perceptual Grouping

Robert C. Thomson¹ & Rupert Brooks²

¹Baskent University
Department of Computer Engineering
Baglica Kampusu
Eskisehir Yolu
Baglica 06530
Ankara
Turkey
Thomson@baskent.edu.tr

²National Atlas of Canada
Canada Centre for Remote Sensing
615 Booth Street
Ottawa, Ontario K1A 0E9
Canada
Brooks@nrcan.gc.ca
http://www.nrcan.gc.ca/~rbrooks/

Abstract

Spatial data generalisation or abstraction is a vital computational procedure in data presentation (notably in cartography) and in data integration. This paper considers the particular difficulties of the generalisation of road and river networks, outlining the approach developed at Canada Centre for Remote Sensing (CCRS) which exploits the principles of perceptual organisation . Perceptual organization (or grouping) principles present a powerful set of analysis tools for spatial data, with applications in various fields. They are key to the human understanding of images and hence can direct us to the crucial aspects of spatial analysis.

By means of the perceptual grouping principle of ‘good continuation’ the network can be analysed (i.e. broken down) into linear elements, which are non-branching chains of arcs. We term these linear elements ‘strokes’; this term is prompted by the idea of a curvilinear segment that can be drawn in one smooth movement and without a dramatic change in style. Unlike arcs, strokes may in theory contain any positive number of nodes, and may intersect each other or themselves. They are also computationally simple to derive, and can be applied in both road and river network analysis. Furthermore, the stroke-building algorithm is both robust enough to operate on incomplete or imperfect data, and can be easily constrained using additional information.

Further analysis allows the derived strokes to be ordered, to reflect their relative importance in the network. The deletion of the elements according to this sequence then provides a simple and effective method of generalising (attenuating) the network. In the case of hydrologic networks, the established method of generalisation on the basis of stream orders can be viewed as a special instance of this more general approach. We also note that when two sub-strokes are combined into a new stroke the attributes of the resulting stroke can be easily obtained from those of the component sub-strokes. This allows the analysis of a network to be derived from the analysis of component pieces - with concomitant advantages in efficiency and storage.

This perception-based approach to network generalisation has been implemented and evaluated at CCRS. The implementation is outlined, results presented, and the effectiveness of the techniques demonstrated on both road and river networks. The technique has been used to do the hydrology selection for a map covering Canada's three northern territories.

1 Introduction

This paper addresses the problem of the efficient and effective generalisation of geographic network data. The solution presented employs perceptual grouping principles, particularly that of ‘good continuation’. It will be shown that the good continuation principle can serve as the basis for partitioning a network into a set of linear elements or ‘strokes’, which are chains of network arcs. Various attributes, such as length or representative class, can be derived for these elements. On the basis of these measures the set of strokes can be ordered to reflect the strokes’ relative importance in the network, which can also be made to accommodate considerations of connectivity. The deletion of the strokes according to the derived sequence then provides a relatively simple and effective method of generalizing (i.e. attenuating) the network.

An implementation of this approach has been extensively tested at the CCRS. These tests have shown that the approach is applicable to both road and river networks. The technique has been used in practice with good results to create a selection of rivers from the 1:1M scale VMAP0 dataset to be shown on a forthcoming printed map of Canada’s Northern Territories to be published in the fall of 2000.

2 The generalisation of geographic networks

2.1 Generalisation

The term ‘generalisation’ has been used to label a range of procedures entailed in spatial database handling and the cartographic presentation of spatial data. The manifestation of generalisation considered here is a form of data abstraction that has been termed "distribution refinement" [DeLucia & Black 1987]. This may be viewed as controlled data reduction: attenuating a pattern – which may comprise lines, points and areas – while retaining those elements that are most important in the given context and, moreover, preserving the character of the pattern. The patterns for generalization considered here are geographical networks, primarily road and river networks; these are instances of "open irregular networks" whose generalisation has been considered relatively challenging [DeLucia & Black 1987].

Procedures for distribution refinement have obvious applications to data volume reduction, for saving storage space or improving data processing and transmission times. A further important reason for such a procedure is the homogenisation of different datasets in the process of data integration. A primary objective of database abstraction is the derivation of databases at multiple levels of accuracy or resolution, and this form of generalisation serves as a preprocessing operation for cartographic generalisation and the visual presentation of the data (map generation) [Weibel 1995].

Guidelines as to the amount of data reduction that is appropriate for a given coarsening of scale have been around for a long time: for example, the "radical law" of Töpfer and Pillewizer [1966]. The problem addressed here is how to decide the particular network elements to be removed – further entailing the issue of how exactly the network elements should be defined.

2.2 Generalising hydrologic and road networks

Ultimately, the process of database generalisation may be viewed as a sorting process. Elements in a set are identified and ranked in order of importance; for a reduced level of detail only a certain percentage of the elements – the more important – will be used. The problem of hydrologic network generalisation has thus been seen as that of calculating weighting values for each river segment (the portion between two junctions). By thresholding the data using these values the network can then be attenuated to the desired degree.

The stream orders developed by physical geographers and hydrologists – e.g. Strahler [1957] and Horton [1945] orders – when taken together with segment length give a relatively simple but effective ordering of the segments. The stream order calculations require, of course, knowledge of the stream directions. Complications are introduced into the ordering scheme by stream braiding and delta structures, and the method is complicated further when lakes within the network are also subject to removal by generalization. But these difficulties can be overcome and a reliable generalisation scheme for hydrologic networks can be established on this foundation [Richardson 1993].

Road networks have proved more difficult to generalise, however. This is because road networks, unlike hydrologic networks, contain abundant loop structures (hamstringing attempts to derive equivalents to stream orders, which assume an underlying tree structure), there is no equivalent to a flow direction, and road segments may have associated a range of semantic data, such as road class, width, surface type, and seasonality.

3. Perceptual grouping

Perceptual organization (or perceptual grouping) describes the phenomena whereby the human visual system spontaneously organizes elements of the visual field, even with no high level or semantic knowledge available. These phenomena were studied by the Gestalt psychologists; they observed how some arrangements of picture elements tend to be seen as ‘belonging together’, forming natural groups that often appear to stand out from the surrounding elements, i.e. as ‘figures’ against ‘grounds’.

Many perceptual grouping principles have been identified, such as proximity, similarity, symmetry, closure, parallelism, collinearity, co-termination and continuity. Some illustrative examples are given in figure 1.

Figure 1: Examples of perceptual grouping

Figures 1(a) to 1(c) show grouping according to different forms of similarity of the elements – similarity of shape, colour (or grayscale) and orientation. Thus, for example, figure 1(b) is perceived as two groups of white circles and two groups of black circles, even though the vertical separation of the circles is slightly less than the horizontal separation. Figure 1(d) shows the effect of collinearity: the three collinear elements are perceived as the figure, with the other elements as ground. Figure 1(e) shows the importance of good continuation: this figure is perceived as one smooth curve with two smaller curves incident on it, rather than as, for example, five connected linear elements. Figure 1(f) shows the effect of uniform density: the collection of dots is perceived as two groups, where the dot spacing within each group is relatively uniform.

The perceptual grouping principles are now widely recognized as the basis for parsing the visual world into surfaces and objects according to relatively simple visual characteristics, in a process that operates independently of the domain being represented [Palmer, 1983]. Consequently, perceptual organization principles play a vital role in the understanding of two-dimensional images of three-dimensional scenes [Witkin & Tenenbaum 1983; Biederman et al. 1991; Lowe 1985]. These principles are also crucial in the interpretation of other forms of images, such as sectional images [Thomson & Claridge 1989; Thomson 1999], and in the understanding of maps [Bertin 1983; MacEachren 1995].

The important role of these principles in map generalization has also long been recognized. Notably, DeLucia and Black [1987, p.175] stated that "… there are direct analogies between what is required for successful map generalization and … the principles of perceptual organization enunciated long ago by the Gestalt psychologists".

4. Network generalisation via "strokes"

The principle of perceptual organisation which is applied here to the generalisation of networks is the principle of good continuation, viz. "that elements that appear to follow in the same direction (as in a straight line or simple curve) tend to be grouped together" [Coren and Ward 1989]. Thus a series of relatively smoothly connected elements will be naturally "grouped" and perceived as a single object intersected by others. This single object, or stroke, takes the form of a linear, connected set of smaller objects. We use the term "strokes" for these linear elements, prompted by the idea of a curvilinear segment that can be drawn in one smooth movement and without a dramatic change in style.

The arc-node topological data model is commonly used in GIS systems. In this model, a dataset of linear features is maintained as a planar graph. Many of the graph theoretic properties are explicitly represented in the data structure. In a network in such a system, there are two types of object: the arc, or linear feature, and the nodes, which occur at the ends of the arcs.

In our data model, a stroke will comprise a set of arcs and nodes. This set represents a path through the network from one terminal node to another (possibly the same) which will utilize all the arcs in the set, without repetition – although the path may self-intersect and so nodes may repeat. This path is the stroke.

4.1 Stroke Building

The stroke building process assumes that a proper planar graph data structure has been created for the given dataset. At each node, a decision is made as to which arcs (if any) adjoining the node should be connected together. Any number of criteria may be applied at this point. The simplest criterion is the angle of deflection that the join would imply. Additional criteria which have proved useful in practice are the class (for road networks), direction of flow (for river networks), and the name of the feature (for roads and rivers). Once a decision has been made at each node the strokes are assembled by accumulating sets of arcs that connect to each other. When all arcs have been assigned to such a set, the process is complete.

Figure 2: Strokes constructed on the road network for downtown Ottawa

Figure 2 shows the strokes generated for the road network in part of downtown Ottawa. The strokes in this figure have been coloured so that no intersecting strokes have been assigned the same colour. In the upper part of the figure, the gridded street layout has caused a number of parallel, straight strokes to form. In contrast, near point A, a stroke has formed a complete loop.

In the implementation used at the National Atlas of Canada, stroke building is a local process, considering each node in turn and dependent completely on the properties of the network at that node neighbourhood. This implies that strokes can be locally adjusted, with the update requiring only small processing time. (One exception to this may occur in the hydrologic network where the stroke building may be made dependent on supporting information such as the largest upstream length or largest drained area. In this case, the stroke-building process is still local, but the generation of the supporting information may require processing the entire basin.)

It is interesting to compare this approach to network generalization with that described by Mackaness [1995] for urban road generalization. The latter scheme is similar in that ‘axial lines’ are derived for a network, quantitative indices are derived for these elements, and these then provide the basis for the generalization. The derivation of axial lines, however, is based on lines of sight, following a concept drawn from urban space pattern analysis [Penn, 1993]. Compared with axial lines, strokes are far simpler computationally to derive and, being based on more generally applicable principles, they can justifiably be used for a wider range of networks.

The stroke construction process is robust, in that there are no degenerate cases that cause the algorithm to fail. However, if faulty data is used as input, strokes will still be defined, although they may not accurately correspond to any meaningful real world objects. To achieve meaningful results it is essential that the dataset have correct connectivity, and in the case of hydrologic networks, correct flow direction. In both cases, better results can be achieved by using additional attribute information to guide the stroke building process. For example, in both road and river networks, constraining the process so that arcs with the same name attribute (e.g. Mackenzie River; Bank Street) are connected into a stroke yields good results. Where this information is incomplete, the strokes will be built according to geometric considerations.

Once the strokes are constructed the networks must still be generalised. First an order of importance must be assigned to every stroke in the network. Then some proportion of them must be removed to reduce the data density. Finally any cleanup processes, if necessary, should be applied to create the final selection.

4.2 Road Networks

To devise an ordering scheme for a road network, three simple principles were applied:

All other things being equal, a longer stroke is more important than a shorter one.
All other things being equal, a stroke made up of roads of higher quality is more important than a stroke made up of roads of lower quality. (For example, a superhighway or motorway would be a very high quality road, while a logging track would be a very low quality road)
Finally, at no point in the generalisation should a connected component of the original network be split into two disconnected parts.

It is relatively straightforward to define a salience value for each stroke based on length and weighted by a factor derived from the road quality. However, an ordering based on this salience alone is not enough, as simply removing the shortest and poorest strokes can easily cause the network to become disconnected. To avoid this, the ordered list based on the raw salience must be adjusted to avoid disconnecting the network.

Each stroke acts as a connector between all possible pairs of its neighbours. Therefore, connectivity is maintained so long as all pairs of the neighbours of a stroke still have a path connecting them after the stroke is removed. The following procedure creates an ordered list of strokes such that strokes can be removed in sequence without disconnecting the network.

Let there be a stack, S, of strokes ordered by salience so that the least salient stroke will be next off the stack. Let H be a heap, currently empty, and let L be an output list.

1. Pop a stroke, call it A, off S.
2. Examine the neighbours of S; for each possible pairing of neighbours, try to find a path between them that does not traverse A.
3. If a pair of neighbors is found that do not have such a path

3.1.1 Put A on the heap

Otherwise, it is safe to remove A

3.2.1 Put A on L
3.2.2 Check each stroke in H, in order of increasing salience, to see if it can be removed. If it can, then put it on L, and recheck the strokes in H.

4. Repeat until S and H are empty.

If the search in step 2 is complete, then this procedure will produce an ordering for all strokes in the network. In practice, however, it was found more efficient to only search for an alternate path to a depth of four branchings. This may result in some strokes appearing impossible to remove, but this did not adversely affect the results.

Figure 3: Raw road data for Ottawa and surrounding area

Figures 3 and 4 show the effect of this generalisation on the road network data for the Ottawa area. Figure 3 shows the complexity of the original dataset. The dark lines in Figure 4 show the result, after generalisation using the method described above. No postprocessing has been applied to this result so that the effect of the algorithm is clearly shown. In practice, some complementary processing would be applied.

Figure 4: Generalised road network for Ottawa and surrounding area

4.3 Hydrologic and other networks

For a hydrologic network, it has previously been shown that the Horton stream ordering method most closely approximates the generalisation decisions made by a human cartographer [Rusak Mazur and Castner, 1990]. Horton’s ordering method operates in two stages, as shown in Figure 5. In the first step, the unbranching streams are removed and numbered as one, these process is then repeated, incrementing the number each time, until the entire network has been ordered. This method is also known as the Strahler ordering. In the second stage, a main stream is selected, and assigned the highest order occurring on it throughout its length. Of the remaining tributaries, a main stream is selected in each, and assigned the maximum order which occurs on its length. This process continues until all components have been assigned an order in the second method [Goudie 1985].

Figure 5: Horton stream ordering method

By carefully selecting the stroke building rules, the strokes can be constructed so that they can be ordered using Horton’s method. The strokes are used to define the main path, needed for the second stage of processing. At each node, a choice of the main path is made which is then connected to form a stroke. The maximum first stage (Strahler) order occurring on any element of a stroke is assigned as the Horton ordering of that stroke.

The definition of what constitutes the "main stream" is a key component of this process, and has a large effect on the quality of the results. Horton’s original criteria were that the longest and straightest path should be used to define the main stream. Other determinations of the main stream have included the straightest path [Richardson 1993] or the largest drained area [Pfafstetter 1989]. In many cases, quirks of human geography have influenced the selection of what is considered to be the main stream in defiance of any geometric characteristics. For example, the Missouri river system is a much longer headwater of the Mississippi river than what is named the Mississippi head water. However, the historical development of the United States has assured that this river system is named in a manner that contradicts its geometry. The use of the stroke data model does not enforce any particular choice of main stream; the rules used for building the strokes should simply be the same rules used for selecting the main stream.

Braided streams add an additional level of complexity to the ordering process. Prior to assigning any orders, braids were separated from main channels using the rule that the main channel was the shortest and straightest path to the ocean. Braid channels were assigned a stream order in the first processing step, but the intersection of two braids of the same order, or of a braid and main stream of the same order did not increment the order of the outflowing stream. When the strokes were constructed, they were not allowed to include both braided and main channels. The result was that the main channel in a braided stream would be surrounded by braids of equal or lesser order, while preventing a runaway increase in stream order due to the many intersections within the braid. When removing elements from the network according to their length and Horton order, this scheme prevented any components of the braid from becoming disconnected.

Figure 6: Ungeneralised river network (Attawapiskat River)

Figures 6 and 7 show the generalisation process using strokes as applied to a river network. No post-processing of any kind has been applied to these images, so that the reader may clearly evaluate the effects of this algorithm. In practice, a number of constraints are utilised to refine the result.

Figure 7 : Generalised river network (Attawapiskat River)

5. Practical experience at the National Atlas of Canada.

An automated generalisation system was constructed for production use at the National Atlas of Canada. The system was heavily based on previous work by Richardson and Thomson [Richardson 1993; Thomson & Richardson 1995, 1999; Richardson & Thomson 1996]. The system has been used to generalise the hydrology for a National Atlas of Canada 1:4M scale map of Canada’s three northern territories from source material in the VMAP0 dataset. As part of an ongoing program [Brooks 1999], the source data had been cleaned and attributed, connectivity corrected and directionality computed.

The stroke-building algorithm itself is very efficient. Measured computational efficiency in practice appears to be approximately linear in the number of nodes, as shown in Figure 8. While the times listed here appear to be rather large, recent experiments indicate that at least a ten-fold increase in speed can be achieved by recoding outside of the ArcInfo GIS system. Furthermore, the stroke building algorithm will operate on erroneous datasets. The results will be progressively degraded as the source data degrades, but this allows operation in a production environment on large datasets, where there may not always be the option of detecting and correcting all the errors in a dataset.

Figure 8: Processing time for stroke building is linear

For production generalisation, a number of constraints were applied to the stroke building process to increase the cartographic accuracy.

Strokes could never connect two arcs whose direction of flow was not compatible.
Strokes would always follow a named path, if one existed.
Strokes would never cross from a braid to a non-braid stream.
Strokes could never jump more than two Strahler stream orders
The user could enter constraints to force the stroke to be built along a certain path at a node. These constraints overruled all other considerations.

Once these constraints were satisfied, the stroke building algorithm would choose the "longest and straightest" path. The ordered set of strokes formed the primary basis for generalisation, but could be adjusted by additional fine-tuning, if required. For example paths could be maintained between designated water bodies, e.g. a given lake and the ocean, in a manner analogous to that used for road networks.

Finally, the selection which resulted from the application of the generalisation technique described here was processed using cartographic generalisation techniques, including line simplification, displacement and exaggeration.

The hydrologic network for Canada, at the 1:1M scale contains approximately 500 000 arc features. As a complete dataset, it is difficult to handle this dataset, so for practical reasons it had to be partitioned for processing. Drainage basins partition naturally along the ridge lines of the topography. Road datasets can also be inconveniently large, and their partitioning into sub-networks may similarly be warranted. In this regard it may be noted that when two sub-strokes are combined into a new stroke during stroke-building the attributes of the resulting stroke can usually be obtained from those of the component sub-strokes. This should allow the analysis of a network to be derived from the analysis of component pieces - with concomitant advantages in efficiency and storage.

Once constructed, the perceptual stroke model was used for other applications beyond the generalisation of the data. In particular, it assisted in semi-automatically matching name attributes to river tributaries, greatly increasing the efficiency of that process.

6. Conclusions

The use of the technique of perceptual strokes allows the delineation of meaningful units of a network in much the same way that the human visual cortex perceives these units. This allows relatively simple properties of these meaningful units to be used to determine an order of importance. However, in each application it was necessary, or at least useful, to fine tune the stroke-based determination of generalisation order by additional processing. In the case of transportation networks this fine tuning took the form of procedures previously developed to prevent unnecessary dead-ends being introduced during network attenuation, together with the use of spanning trees to maintain certain connections [Thomson & Richardson 1995].

This method is robust enough to be used on faulty data with reasonable results. Furthermore, arbitrary local constraints could be added to the stroke building method without affecting its overall performance. This allowed additional information, such as names, where available to be used to adjust the systems performance.

Although no extensive testing has been done, it is reasonable to expect that the stroke-based generalisation method will apply to other types of networks, including pipeline and electrical networks. Current work in the National Atlas of Canada is exploring this possibility.

6. Acknowledgments

This work builds on methodologies and techniques developed at CCRS by Dr Dianne Richardson.

Statistics Canada generously provided road network data from their database which was used for the road network experiments.

References

Bertin, J. 1983. Semiology of Graphics, Madison, WI: University of Wisconsin Press.

Biederman, I., Hilton, H. J., and Hummel, J. E. 1991. Pattern goodness and pattern recognition. In: Pomerantz, J. R. and Lockhead, G. R. (Eds.), The Perception of Structure, Washington DC: American Psychological Association.

Brooks, R. 1999. The new and improved base framework for National Atlas data. Proceedings of the ICA 19th International Cartographic Conference (Ottawa, 14-21 August). Also available at http://atlas.gc.ca/english/about_us/index_pres_e.html#framework

Cohen, S. and Ward, L. M. 1989. Sensation and Perception, Orlando, FL: Harcourt, Brace and Jovanovich.

DeLucia, A. and Black, T. 1987. A comprehensive approach to automatic feature generalization, Proceedings of the 13th International Cartographic Conference (Morelia, Mexico), 168-191.

Goudie, A., Ed. 1985. The Encyclopedic Dictionary of Physical Geography. Basil Blackwell, Ltd. Oxford, UK.

Horton, H. 1945. Erosional development of streams and their drainage basins, Bulletin of the Geological Society of America, 56, 275-370.

Lowe, D. G. 1985. Perceptual Organisation and Visual Recognition, Boston: Kluwer Academic.

MacEachren, A. M. 1995. How Maps Work: Representation, Visualization, and Design, New York: Guilford Press.

Mackaness, W. A. 1995. Analysis of urban road networks to support cartographic generalization, Cartography and Geographic Information Systems, 22(4), 306-316.

McMaster, R. B. and Shea, K. S. 1992. Generalization in Digital Cartography, Washington DC: Association of American Geographers.

Palmer, S. E. 1983. The psychology of perceptual organisation: a transformational approach. In: J. Beck et al. (Eds.), Human and Machine Vision, New York: Academic Press, 269-339.

Penn, A. 1993). Intelligent analysis of urban space patterns: graphical interfaces to precedent databases for urban design, Proceedings of Auto Carto 11, 53-62.

Pfafstetter, O. 1989. Classification of hydrographic basins: coding methodology, Rio de Janeiro: DNOS (translated by Verdin, J. P., 1991. U.S. Bureau of Reclamation, Brasilia).

Richardson, D. E. 1993. Automatic Spatial and Thematic Generalization using a Context Transformation Model, Doctoral Dissertation, Wageningen Agricultural University, Netherlands.

Richardson, D. E. 1994. Generalization of spatial and thematic data using inheritance and classification and aggregation hierarchies. In: Waugh, T. C. and Healey, R. G. (Eds.), Advances in GIS Research 2, London: Taylor and Francis, 957-972.

Richardson, D. E., and Thomson, R. C. 1996. Integrating thematic, geometric and topological information in the generalization of road networks. Cartographica, 33(1), 75–83.

Rusak Mazur, E., and Castner, H. W. 1990. Horton’s ordering scheme and the generalisation of river networks, The Cartographic Journal, 27, 104-112.

Shreve, R. L. 1966. Statistical law of stream numbers, Journal of Geology 74: 17-37.

Strahler, A. N. 1957. Quantitative analysis of watershed geomorphology, Trans. of the American Geophysical Union 38: 913-920.

Thomson, R. C. 1999. Interpreting temporal relations in rock micrograph and geological section images, Proceedings of the 1st International Symposium on Imaging Applications in Geology (Liège, Belgium), 229-232.

Thomson, R. C., and Claridge, E. 1989. A 'computer vision' approach to the analysis of crystal profiles in rock sections, Proceedings of the 6th Scandinavian Conference on Image Analysis (Oulu, Finland), 1208-1215.

Thomson, R. C., and Richardson, D. E. 1999. The ‘Good Continuation’ Principle of Perceptual Organization applied to the Generalization of Road Networks, Proceedings of the ICA 19th International Cartographic Conference (Ottawa, 14-21 August), 1215–1223.

Thomson, R. C., and Richardson, D. E. 1995. A graph theory approach to road network generalization, Proceedings of the 17th International Cartographic Conference (Barcelona, Spain), 1871-1880.

Töpfer, F., and Pillewizer, W. 1966. The principles of selection: a means of cartographic generalization, The Cartographic Journal, 3(1), 10-16.

Weibel, R. 1995. Three essential building blocks for automated generalization. In Müller, J. C., Lagrange, J. P and Weibel, R. (Eds.) GIS and Generalization: Methodology and Practice, London: Taylor and Francis, 56-69.

Witkin, A. P., and Tenenbaum, J. M. 1983). What is perceptual organization for?, Proceedings of the 8th International Joint Conference on Artificial Intelligence (Karlsruhe, Germany), 1023-1026.