GeoVISTA Studio: a geocomputational workbench
Mark Gahegan, Masahiro Takatsuka, Mike Wheeler and Frank Hardisty
GeoVISTA Center, Department of Geography, 302 Walker Building, The
Pennsylvania State University, University Park, PA 16802, USA.
Email: mark@geog.psu.edu URL:
http://www.geog.psu.edu/~geovista/
Abstract
One barrier to the uptake of Geocomputation is that, unlike GIS, it has
no system or toolbox that provides easy access to useful functionality.
This paper describes an experimental environment, GeoVISTA Studio,
that attempts to address this shortcoming. Studio is a Java-based,
visual programming environment that allows for the rapid, programming-free
development of complex data exploration and knowledge construction applications
to support geographic analysis. It achieves this by leveraging advances
in geocomputation, software engineering, visualisation and machine learning.
At the time of writing, Studio contains full 3D rendering capability
and offers the following functionality: interactive parallel coordinate plots,
a visual classifier, sophisticated colour selection (including Munsell colour-space),
a spreadsheet, a statistics package, a self-organising map (SOM) and learning
vector quantisation.
Through examples of Studio at work, this paper demonstrates the
roles that geocomputation and visualization can play throughout the scientific
cycle of knowledge creation, emphasising their supportive and mutually
beneficial relationship. A brief overview of the design of Studio
is also given. Results are presented to show practical benefits of a combined
visual and geocomputational approach to analysing and understanding complex
geospatial datasets.
Keywords: geocomputation, visualisation, knowledge discovery, classification,
inference.
1. Introduction
Geocomputation encompasses a wide range of different tools and techniques,
including data mining, knowledge construction, simulation and visualization,
all operating within the geographical realm. These activities take place
along the entire extent of the scientific process, often beginning with
abductive tasks such as hypothesis formation and knowledge construction,
through inductive tasks such as classification and learning from examples
and ending with deductive systems that build prescriptive models (that
are common in spatial analysis and GIS). However, unlike GIS, there is
no standard package or system that currently supplies these different types
of functionality as an integrated whole; instead, users must resort to a
set of disparate (and often clumsy) programs that are difficult to connect
together operationally. This is a serious problem, and currently probably
the biggest barrier to the uptake of geocomputational methods.
Making such methods freely available and able to work in harmony is
a difficult task involving conceptual challenges associated with knowledge
construction as well as practical difficulties in a software engineering
sense. One possible solution is to build a software environment that can
hide some of the engineering, metadata and conceptual problems from the
user, whilst at the same time offering extensibility and the ability to
customise. GeoVISTA Studio is such an environment, offering programming-free
software development for a combination of geocomputation and geographic
visualization activities. Studio employs a visual programming interface,
allowing users to quickly assemble their own applications using a data-flow
paradigm from a library of functionality implemented as JavaBeans™
(JavaBean is a trademark of Sun Microsystems, Inc., 901 San
Antonio Road, Palo Alto, CA 94303 USA.)
After the following motivation section, the roles that geocomputation
and visualization can play throughout the scientific cycle are described
(Section 2), emphasising their supportive and mutually beneficial relationship.
Section 3 then examines some of the alternative methods by which knowledge
is constructed, and shows how they might be integrated conceptually. Following
this, an overview of the architecture of Studio is given (Section
4). Example results from Studio are then presented in Section 5
to show practical benefits of a combined visual and geocomputational approach
to analysing and understanding complex geospatial datasets.
1.1 Reasons for building Studio
It is worth noting at the outset that building tools to support knowledge
construction and other geocomputational activities is not straightforward
because little is known yet about how these activities might be formalised
by a machine, or even how they might interact effectively. So, the building
of a geocomputational environment is at this stage somewhat speculative;
it is not possible to produce a design specification based on functionality,
as is common within the GIS industry. Bearing that in mind, the following
list of points acts as both motivating factors and design goals.
- Humans engage in many different forms of reasoning (Section 3 takes this
up further) when tackling a scientific problem, yet current GIS address
only one explicitly, namely deduction, and provide only ad-hoc support
for others. New types of co-ordinated functionality are needed to generate
the categories, hypotheses, relationships and objects that GIS employ.
- We do not know yet how best to support these knowledge- and hypothesis-construction
activities. By assembling together a range of learning, knowledge discovery
and visualisation tools in the same environment we can more easily investigate
the kinds of tools, linkages, controls and metadata that prove to be effective
in an operational setting. In doing so we hope to learn how to make this
kind of functionality interoperable in the future.
- Stronger links between visualisation and geographical analysis are required.
GIS are getting better at visualising data, but are still largely tied
to the cartographic paradigm, so lack the flexibility and functionality
required to support visualization targeted at knowledge construction. Furthermore,
some geographic simulation models are becoming so complex that it is imperative
to have visual means of tracking and steering their behaviour as they execute
(e.g. coupled atmosphere and ocean climate models, Hibbard et al.,
1996).
- It is very difficult to exchange geographical models with other researchers.
Although standards for exchanging data are now quite advanced, we have
perhaps lost sight of the fact that our data is only useful with appropriate
analytical models (Goodchild 2000), and do not yet know how to make these
interoperable. Studio provides both an environment in which complex
functionality can be linked together into models, and a simple method to
'wrap up' the assembled functionality into an application (a JavaBean)
that can be easily disseminated. Furthermore, using Bean technology, it
is straightforward to extend these models or couple them to additional
methods (Section 4 gives more detail).
- By combining visual and geocomputational approaches within the same environment,
many benefits are realisable; three cases follow. Firstly, a visual interface
allows abductive knowledge discovery agents to report their findings within
a visual domain, thus drawing the expert's attention to potentially significant
patterns within highly multivariate data spaces. Secondly, inductive learning
agents can be trained in these visual data spaces from anomalies and structures
recognised by human experts. Thirdly, visualization allows the behaviour
of machine learning tools to be monitored during training or configuration
as a form of audit and control to ensure correct functioning (e.g. visualization
of hyperplane movement in neural networks). Section 5 provides some practical
examples.
1.2 Studio Capabilities
At the time of writing, Studio has the following functionality:
- full 3D rendering capability including frame-by-frame animation, as is
available in Vis5D (Hibbard et al., 1996) but with the addition
of full control over all visual variables during 'playback';
- interactive parallel coordinate plots, for dynamically visualising the
relationships between many attributes (Inselberg, 1997; Edsall, 1999);
- a visual classifier, to convert continuous variables into mappable discrete
colour ranges (Slocum, 1999: Chapter 4), including manual and Jenks Optimal
methods (Jenks, 1977);
- sophisticated colour selection, including RGB and Munsell colour-spaces
(Slocum, 1999: Chapter 5);
- a spreadsheet, for keeping track of numerical values and for co-ordinating
data selection activities with other components;
- a statistics package, to provide descriptive statistics on both samples
and populations, including some rudimentary spatial statistics and a range
of analysis and classification methods including k-means clustering,
ISODATA and maximum likelihood (Dunteman, 1984);
- self-organising map (SOM) and learning vector quantisation (LVQ) neural
networks (Kohonen, 1995) for classification, pattern analysis and machine
learning in complex feature spaces.
2. The Spectrum of Science
Our efforts to understand and model the world around us take many forms
(Mark et al., 1999). Even when a scientific perspective is specifically
adopted, there is no single universal standpoint or origin from which to
begin; the creation or uncovering of knowledge occurs at many levels and
starts in many places. What we see in GIS, for example, is often a well-ordered
world, comprised of discrete objects drawn from crisp categories, and associated
together with a small and precise set of logical relationships. Analysis
in GIS often starts with these objects, categories and relationships being
accepted as 'given', and proceeds to use them as part of some deterministic
model. The fact that GIS has proved itself to be beneficial in a number
of organisations and practical settings attests to the usefulness of these
starting assumptions. However, we do well to remember that they are assumptions,
nothing more. GIS are successful precisely because prior activities such
as fieldwork and data interpretation produced the objects, categories and
relationships used, from less abstract sources of information.
The idea, often implicit in GIS, that objects and categories have some
sort of 'natural' existence and order just waiting to be 'uncovered', is
a very old one, cropping up in the teachings of Aristotle. It has been
justifiably claimed that humans need categories to function (Lakoff, 1987)
and it appears that GIS do too! However, anyone who has worked with landcover
classification, geological mapping or eco-regions, for example, will well
understand the difficulty in creating classes in the first instance, not
to mention the problem of communicating them effectively to others. Conversely,
those who have needed to use such classes, but who did not play a part
in creating them, will be acutely aware of the frustration of never being
fully certain of what a class is supposed to represent. To make matters
worse, often the only clue to the true identity of a derived class is in
the name or label it is given and possibly a short description in an annotated
legend.
This description highlights a number of problems:
- Constructing categories, objects and relationships requires different tools
to the ones available in GIS.
- Geospatial information is often constructed in a different system to the
one within which it is applied, and increasingly not by the people who
will be the end users.
- There is often a good deal of information loss when data are moved from
the system that created it to one that will apply it.
- This lost detail may later be critical in understanding or correctly applying
some created object.
Geocomputation activities span this full range of knowledge construction,
and indeed are often associated with activities external to GIS because
they are creating information for a GIS to use.
3. Knowledge Construction
Focussing just on categories, this section describes different ways in
which categories can be formed, and some of the tools that might support
their construction. The structure and internal form of categories has long
been debated within philosophy and psychology (e.g. Peirce, 1891; Rosch,
1973; Baker, 1999) and has more recently received interest from the machine
learning community as algorithms are designed to automatically construct
and recognise (label) categories (e.g. Mitchell, 1997; Luger & Stubblefield,
1998; Sowa, 1999). This interest has stemmed from the real need to extract
information from large and complex datasets and reduce large data volumes
and complexities into some manageable form, for example by using data mining
or classification (Piatetsky-Shapiro et al., 1996; Koperski et
al., 1999; Landgrebe, 1999).
3.1 Defining categories
We now briefly address the issue of how categories might be defined in
order to understand better how computational methods might help. A number
of different mechanisms have been proposed by which a category might be
defined, based on cognitive studies of humans. A comprehensive overview
is given by MacEachren (1995: Chapter 4). The following are three of the
most obvious:
- Typical examples, not necessarily real (e.g. Crocodile Dundee as a typical
Australian). Typical (but imaginary) examples are sometimes included in
map legends, in the interests of clarity.
- Exemplars or best examples (e.g. Rock groups: the Beatles, the Stones).
These are defining examples that demonstrate the range or scope of a category,
and about which other members may cluster.
- By some attributes or properties and their relationships (e.g. a hot day
is one where the temperature exceeds 25°C). Most machine learning tools
classify data using attributes only, so their categories are of this form.
The process of constructing a category in the absence of prior knowledge
is often associated with the inferential mode called abduction.
Taking some examples of a category and then producing a generalised description
that can be used, say for classification, uses an inductive mode
of inference. More on inferential mechanisms follows to help motivate the
need for new tools with which to perform geographical analysis. A full
description of inference in the earth sciences is given by Baker (1999).
Deduction
A deductive tool behaves in a deterministic manner. It is not able to
adapt to the particularities of any given dataset so its outcome is defined
purely in terms of methods or rules that are pre-defined. Inductive and
abductive tools will produce different results if the dataset is changed
because they rely on the data to help structure the outcome (see below).
In the example above, defining a day as 'hot' if the temperature exceeds
25°C is deductive. The category would remain the same even if, in reality,
most days have a temperature above 30°C. Deduction is most useful where a
system is clearly understood, as in category type 3 above.
Induction
With induction, characteristics in the specific data under consideration
help to shape the definition of a category. To continue with the above
example we might choose some sample days that we would term as 'hot' (i.e.
the concept of 'hot' is pre-defined by examples) and then construct a general
category from the attributes of the sample. In doing so we might find that
the concept of 'hot' varies with humidity as well as heat, or varies with
place (e.g. from the UK to Australia), or varies with time (from summer
to winter). Induction is very useful if the concept to be defined is complex,
the dataset is complex or we are uncertain how to define the concept deterministically,
but are confident in our ability to point to examples (category types 1
and 2 above). For precisely these reasons, inductively-based classifiers
are now becoming common in remote sensing applications, especially when
dealing with hyperspectral or multitemporal data (Benediktsson et al.,
1990; Foody et al., 1995; German & Gahegan, 1996). They may
well also become a crucial tool for understanding the large geospatial
databases now being created for socio-demographic and epidemiological applications,
simply because they are able to scale up to very large attribute spaces
more readily than conventional deductive approaches (Gahegan, 2000). Field
scientists often perform induction when presented with a number of examples
of some phenomenon from which they must form a mental model of a category,
an instance of the second way of defining categories given above.
Figure 1. Visual analysis
of class separability using a Parallel Coordinate Plot. See text for details.
Figure 1 shows part of an inductive exercise. The categories in this case
have been imposed upon the data (the left-most axis) and two of them, water
(blue strings) and cleared land (green strings) are explored for potential
class separation problems during classification. In this case the
two classes separate well on many of the available attributes, but note
the presence of some strange outliers, especially blue strings with very
high values for Landsat TM-Band-5 and TM-Band-7. These are likely to be
erroneous and should probably be removed prior to classification (see later).
Abduction
Abduction is the inferential mechanism used to generate categories in
the first place. In the Earth sciences, abduction is usually driven by
observations (data) and expertise working together. True abduction must
propose a categorisation and simultaneously give a hypothesis by which
the category can be recognised or defined. It is our own adeptness with
abductive reasoning that makes humans good Earth scientists. For example,
when a field geologist is logging an area, the categories to be used may
not be clear or fixed at the outset and new categories might need to be
created. Furthermore, simple labels are not the only outcome; the categorisation
produced has as its hypothesis an evolutionary geological model explaining
how each category might have come to be. Categories may be based on form,
structure, mechanical and chemical properties, and reliance on any of these
properties may differ from category to category. They may well also reflect
the education, biases and experience of the geologist in question (see
Brodaric et al., also in this volume). In computational data mining,
unsupervised classification or clustering is often used to identify candidate
categories, with the algorithm that separates the classes (class definitions)
provided as the hypothesis. In contrast to the geology example, this is
perhaps a weaker form of abduction because the hypothesis produced only
relates to the attribute values in the data, and not to any additional
(externally held) knowledge.
Figure
2. Using a parallel coordinate plot to search for possible classes in unstructured
data.
An example of the search for possible classes is shown in Figure 2. In
this example, the target landcover categories are undefined and the user
is exploring the clustering in the data that is characteristic of the 'Geology'
attribute. There appears to be some kind of partial relationship between
the remote sensing channels and this attribute, as evidenced by large swaths
of yellow and purple strings concentrating in the TM-Band-4 and TM-Band-5
axes.
3.2 Combining inferential tools and techniques
Having defined these generic types of inference, it should be clear that
geographers are in fact quite skilled in utilising all three, often in
combination and with no pre-defined methodological structure. That is to
say, there is no single mechanism by which these types of inference should
be combined; the task of building a 'system' for knowledge construction
is itself non-deterministic.
Figure 3 shows one possible arrangement for deriving categories from
data, involving iteration between abduction and induction. Visualisation
can play a key role in a number of these stages: in presenting a visual
overview of the data so that categories might be hypothesised, in evaluating
individual examples with respect to their 'representativeness', in portraying
the boundaries between categories (e.g. in feature space) and in showing
the results of applying the new knowledge to structure the data (Lee &
Ong, 1996; Keim & Kriegel, 1996; MacEachren et al., 1999).
Figure
3. One possible scenario for the iterative construction of knowledge, involving
first abductive then inductive reasoning.
The results (Section 5) show examples of knowledge construction and
analysis using a variety of inferential tools and visualization methods,
working in combination.
4. The Design of Studio
In order to carry out the sophisticated data analysis tasks outlined above,
a system has to bring together the various kinds of geocomputational tools
and techniques mentioned in Section 1.2. At its heart Studio has
a component-oriented software building system (called "builder")
that employs a visual programming environment to connect program components
together into useful applications (see Figure 4).
Figure
4. A builder constructs an application by connecting program components.
The builder allows different components, each offering pieces of
the required functionality, to communicate freely with each other. However,
the nature of these connections, i.e. what should be connected and how,
is not clear at the outset (as described in Section 3.2). Consequently,
the system needs also to provide an experimental environment to test and
discover how components should be connected to maximise the effectiveness
of constructing knowledge or otherwise analysing geographical data. To
meet these and other needs, the builder was designed to address
the following points: "Open Standards", "Cross Platform Support", and "Integration
and Scalability".
4.1 Open Standards
A builder has to accommodate components (tools) developed by different
parties. In order to handle this multi-developer problem, component design
must adhere to well-established standards. Studio employs Java as
the system programming language and so uses JavaBean technology to construct
tools. All visualization, geocomputation, machine learning and other components
are implemented in the form of JavaBeans. The JavaBean specification defines
a set of standardised component Application Programming Interfaces (APIs)
for the Java platform. As long as a component is built according to this
specification, it can be incorporated into any JavaBean-capable builder,
as shown in Figure 5.
Figure
5. The builder integrates JavaBeans by using JavaBean APIs.
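To make the JavaBean conventions concrete, the following minimal sketch shows
what a Studio-style component might look like: a public no-argument
constructor, get/set methods for each customisable property, and event support
so that other beans can react to changes. The class ColorClassifierBean and
its numClasses property are illustrative inventions, not part of Studio's
actual code.

    import java.beans.PropertyChangeListener;
    import java.beans.PropertyChangeSupport;
    import java.io.Serializable;

    // A minimal component following the JavaBean conventions.
    public class ColorClassifierBean implements Serializable {

        private int numClasses = 5; // a customisable property
        private final PropertyChangeSupport support = new PropertyChangeSupport(this);

        public ColorClassifierBean() { } // required no-argument constructor

        public int getNumClasses() { return numClasses; }

        public void setNumClasses(int numClasses) {
            int old = this.numClasses;
            this.numClasses = numClasses;
            // A 'bound' property: listeners (e.g. other beans wired up by the
            // builder) are notified whenever the value changes.
            support.firePropertyChange("numClasses", old, numClasses);
        }

        public void addPropertyChangeListener(PropertyChangeListener l) {
            support.addPropertyChangeListener(l);
        }

        public void removePropertyChangeListener(PropertyChangeListener l) {
            support.removePropertyChangeListener(l);
        }
    }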
For the developers of components, this provides a straightforward mechanism
to ensure their components can be employed by many users (see Figure 6),
and is therefore extremely useful for increasing productivity in a postgraduate
laboratory setting! In other words, the end users can employ a wide variety
of JavaBean components developed by various suppliers. When a JavaBean
component is used, a JavaBean-capable builder automatically discovers a
syntactic description of its functionality and input/output methods, as
described below in Section 4.3.
Figure
6. A JavaBean is accepted by any JavaBean-capable builder. A non-JavaBean
component is only usable in a proprietary builder.
4.2 Cross Platform Support
Studio is designed to run on various operating systems (Solaris,
Windows, Linux, IRIX, etc.) and hardware architectures (Intel, SPARC, MIPS,
etc.). Since Studio itself is written in pure Java, Studio
and its JavaBean components will run on any operating system and hardware
combination as long as a Java Virtual Machine is available. (To check on
currently supported platforms, refer to http://java.sun.com/cgi-bin/java-ports.cgi.)
Moreover, Java's network capability allows a user to build network-aware
systems for heterogeneous hardware and operating system environments. For
instance, imagine a situation where two JavaBeans, A and B,
are processing a task together. If component B is carrying out a
computationally heavy task, a user might want to execute this component
on a more powerful machine. If both A and B implement network
functions, they can still communicate with each other as if they were executing
on the same machine (see Figure 7(a)). Even if neither component has
network capabilities, a user can achieve the same result by connecting
them using network-capable JavaBean components (see Figure 7(b)).
Figure
7. Using Java's network capability, JavaBeans on
different (hardware and software) platforms can communicate seamlessly.
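As a rough sketch of how such a network-capable pairing might be realised
with standard Java RMI (the service name, interface and classes below are
hypothetical, not part of Studio), the heavy component B exports a remote
object, and component A then calls it as if it were local:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // The remote contract that component B (the heavy computation) exposes.
    interface ClassifierService extends Remote {
        int[] classify(double[][] samples) throws RemoteException;
    }

    // Runs on the powerful machine: implements the contract and registers itself.
    class RemoteClassifier implements ClassifierService {
        public int[] classify(double[][] samples) {
            int[] labels = new int[samples.length];
            // ... the computationally heavy classification would happen here ...
            return labels;
        }

        public static void main(String[] args) throws Exception {
            ClassifierService stub = (ClassifierService)
                    UnicastRemoteObject.exportObject(new RemoteClassifier(), 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("classifier", stub);
        }
    }

    // On the local machine, component A looks up the stub and calls it as if
    // the classifier were executing locally.
    class LocalCaller {
        public static void main(String[] args) throws Exception {
            Registry registry = LocateRegistry.getRegistry("powerful-host", 1099);
            ClassifierService b = (ClassifierService) registry.lookup("classifier");
            int[] labels = b.classify(new double[][] { { 0.2, 0.7 }, { 0.9, 0.1 } });
        }
    }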
4.3 Integration and Scalability
Studio, as a builder, must be able to integrate various kinds
of components from a multitude of sources. Studio provides a basic
set of JavaBean components, most of which are independently executable.
A user can combine these components to construct useful visualization,
geocomputational, and machine learning systems. However, solutions to a
specific problem might be difficult to build using only the pre-supplied
components; a user might wish to add in locally-produced components, or
might obtain components from some other source (due to the popularity of
Java, many useful JavaBeans are already available on the Internet).
In all cases, Studio has to provide a mechanism to integrate any
JavaBean component, regardless of its source.
In order to meet these requirements for a JavaBean-capable builder, Studio
utilises Java's introspection functions. With these functions, Studio is
able to find out what kinds of input and output are available, as well as
all the customisable properties of a bean, when it is dynamically added
to Studio. A JavaBean sometimes supplies its own tool to customise
itself. Even in this case, Studio is able to incorporate this special
customising tool by using interfaces defined in the JavaBean specification.
With these features, Studio does not need any prior knowledge of
a new JavaBean in order to integrate it into a program developed under
Studio. This is a clear advantage over older mechanisms involving
recompilation and linking, such as those used in third-generation
programming languages.
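A minimal sketch of this introspection step in standard Java follows;
javax.swing.JButton stands in for an arbitrary bean dropped into the builder,
and the printed format is invented:

    import java.beans.BeanInfo;
    import java.beans.Introspector;
    import java.beans.MethodDescriptor;
    import java.beans.PropertyDescriptor;

    // What a builder does when a bean arrives: discover its properties and
    // callable methods without any prior knowledge of the class.
    public class BeanInspector {
        public static void main(String[] args) throws Exception {
            BeanInfo info = Introspector.getBeanInfo(javax.swing.JButton.class);

            for (PropertyDescriptor p : info.getPropertyDescriptors()) {
                System.out.println("property: " + p.getName()
                        + " (" + p.getPropertyType() + ")");
            }
            for (MethodDescriptor m : info.getMethodDescriptors()) {
                System.out.println("method: " + m.getMethod().getName());
            }
        }
    }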
Studio utilises Java's event model to let a user connect
JavaBean components together. When something of interest happens within
a JavaBean, the bean notifies other components by sending an event
object (akin to sending a message). When a user connects two JavaBeans
with a mouse-drag action, an event adapter object, invisible to the user,
is created and registered with the event source JavaBean as an event
listener. When the event adapter detects that something interesting has
happened, it calls an appropriate method on the target JavaBean component.
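The sketch below mimics the kind of adapter the builder generates, reusing
the illustrative ColorClassifierBean from the earlier sketch; the target
bean and its method are likewise hypothetical:

    import java.beans.PropertyChangeEvent;
    import java.beans.PropertyChangeListener;

    public class ConnectionDemo {
        public static void main(String[] args) {
            ColorClassifierBean source = new ColorClassifierBean();
            final SpreadsheetLikeBean target = new SpreadsheetLikeBean();

            // The 'invisible' adapter, registered with the source bean as an
            // event listener; it relays each event to the target's method.
            source.addPropertyChangeListener(new PropertyChangeListener() {
                public void propertyChange(PropertyChangeEvent evt) {
                    if ("numClasses".equals(evt.getPropertyName())) {
                        target.onClassCountChanged((Integer) evt.getNewValue());
                    }
                }
            });

            source.setNumClasses(7); // fires the event; the adapter forwards it
        }
    }

    // A stand-in target bean (hypothetical).
    class SpreadsheetLikeBean {
        void onClassCountChanged(int k) {
            System.out.println("re-grouping rows into " + k + " classes");
        }
    }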
Studio is also capable of 'wrapping' a whole connected graph
of JavaBean components into another single JavaBean. This allows a user
to gradually build up large-scale and complex applications from smaller,
less complex components. Moreover, any JavaBeans constructed at each stage
of this gradual development can be shared or distributed among colleagues,
since they are independent working program components.
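A rough sketch of the wrapping idea, again built from the hypothetical beans
above (serializability and BeanInfo details omitted for brevity): the
connected pair is packaged as a single new bean whose internal wiring is
hidden, and which re-exports only the properties it chooses to make visible.

    // A connected pair of beans packaged as one distributable component.
    public class ClassifiedSpreadsheetBean {

        private final ColorClassifierBean classifier = new ColorClassifierBean();
        private final SpreadsheetLikeBean sheet = new SpreadsheetLikeBean();

        public ClassifiedSpreadsheetBean() {
            // The internal connection is fixed at construction time and is
            // invisible to users of the composite bean.
            classifier.addPropertyChangeListener(evt -> {
                if ("numClasses".equals(evt.getPropertyName())) {
                    sheet.onClassCountChanged((Integer) evt.getNewValue());
                }
            });
        }

        // Re-exported property of the composite.
        public int getNumClasses() { return classifier.getNumClasses(); }
        public void setNumClasses(int k) { classifier.setNumClasses(k); }
    }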
5. Experiments using Studio
Two sample experiments are shown here, both aimed at investigating the
structure inherent in a large and complex dataset.
The first shows the co-ordinated application of: 1) a spreadsheet, showing
numerical values for individual data records; 2) an interactive Parallel
Coordinate Plot (PCP), depicting each record as a single 'string'; 3) a
Visual Classifier (VC), to interactively impose a categorisation on the
data; 4) an unsupervised k-means classifier. Using the Studio
environment described above these components are connected together as
shown in Figure 8. Other ancillary beans to read in and visualise data
also appear in the figure.
Figure 8. Design Box
from Studio showing components connected for data exploration leading
to classification. See text for details. Connections show the flow of data
and co-ordination of activities between the beans.
The spreadsheet, PCP and VC can be used together to help explore an unfamiliar
dataset and lead to the configuration of a successful classification. The
PCP and spreadsheet allow the user to explore the data for outliers, errors
or missing values that can then be removed or corrected prior to classification,
since their inclusion will likely lead to problems. The VC and PCP allow
the user to experiment with different class structures by changing the
colours used for each of the strings, according to some chosen attribute
value. This helps the user to hypothesise structure or relationships within
the data (the beginnings of abduction) and also to select an appropriate
value for k, the number of classes to be used in classification.
To aid co-ordination, the components possess a degree of interaction. For
example, clicking on a string in the PCP will select the appropriate row
in the spreadsheet, and vice versa. Figure 9 shows an outlier in the PCP
(in red) that has been visually identified. Selecting it (with the pointing
device) automatically highlights the offending record in the spreadsheet
(Figure 10). It can then be deleted if required. Other selection behaviours
are also co-ordinated, for example selecting an axis in the PCP will highlight
a column in the spreadsheet, and change the focus of the VC, and so forth.
The selection of appropriate class breaks for visual display using the
VC is shown in Figure 11. The data, minus any problematic examples that
are removed via the PCP or spreadsheet, are then passed to the k-means
classifier, which computes k centroids to represent data classes.
These centroids are then used to generalise to the entire dataset to form
the classified image shown in Figure 12, an inductive step. Without
removal of the problematic data, the final mapped result can be markedly
different to that shown.
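For readers unfamiliar with the algorithm, the compact sketch below shows
the inductive step just described: computing k centroids from the cleaned
training data and then assigning every record to its nearest centroid. It
is a generic illustration of k-means, not Studio's actual classifier
component.

    import java.util.Random;

    public class KMeansSketch {

        // Index of the centroid nearest (in squared Euclidean distance) to x.
        static int nearest(double[] x, double[][] centroids) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < centroids.length; c++) {
                double d = 0;
                for (int j = 0; j < x.length; j++) {
                    double diff = x[j] - centroids[c][j];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            return best;
        }

        // Lloyd's iteration: assign records to centroids, then move each
        // centroid to the mean of its assigned records.
        static double[][] fit(double[][] data, int k, int iterations) {
            Random rnd = new Random(42);
            double[][] centroids = new double[k][];
            for (int c = 0; c < k; c++) {
                centroids[c] = data[rnd.nextInt(data.length)].clone();
            }
            for (int it = 0; it < iterations; it++) {
                double[][] sums = new double[k][data[0].length];
                int[] counts = new int[k];
                for (double[] x : data) {
                    int c = nearest(x, centroids);
                    counts[c]++;
                    for (int j = 0; j < x.length; j++) sums[c][j] += x[j];
                }
                for (int c = 0; c < k; c++) {
                    if (counts[c] == 0) continue; // leave empty clusters in place
                    for (int j = 0; j < sums[c].length; j++) {
                        centroids[c][j] = sums[c][j] / counts[c];
                    }
                }
            }
            return centroids;
        }
    }

Once fit() returns, calling nearest() on every record in the full dataset
produces the class labels used to form the classified image.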
Figure
9. An interactive PCP used to study suitability of a training sample. An
outlier is shown in red, and this can be deleted if required. Strings are
grouped into five classes from a visual classification on the 'Shape' attribute.
Figure 10. The spreadsheet, showing
part of the training data and automatically selecting the troublesome example
highlighted in the previous figure.
Figure
11. The Visual Classifier (VC) used to assign data ranges to the five colours
shown in the Parallel Coordinate Plot above.
Figure 12. Results of
running a k-means classifier on the cleaned up dataset shown above.
The green class represents water; red is cleared land; blue, black and
white represent a mixture of forest vegetation types.
The components described above are shown, as they appear in Studio
operationally, in Figure 13.
Figure
13. A screen snapshot from Studio of the classification
exercise described above.
If a more sophisticated classifier is used, such as Kohonen's Self-Organising
Map (Gahegan & Takatsuka, 1999), one associated problem is the difficulty
of understanding the inner workings of the classifier, which lowers our
confidence in the outcome it produces and often leads to exhaustive testing
as a substitute. However, Studio allows us to simply connect the state of
the hidden layer of neurons at each timestep (a 2D array of distance measures
forming a surface) to the 3D renderer, so we can observe the classifier
training in real time and ensure that it does indeed converge to a reasonable
solution. Figure 14 shows four timesteps from the convergence of such a
neural network and indicates a stable progression towards the final outcome
(a good sign).
Figure
14. Images from the inside of the Self Organising Map. The 3D rendering
shows distance between neighbouring neurons in feature space. Distance
is normalised on the z axis from 0 - 10, with colour and height both visually
encoding this distance. The images show clean convergence of the network
at iterations 100 (top left), 300 (top right), 600 (bottom left) and 900
(bottom right).
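As an illustration of the surface being rendered, the sketch below computes,
for each neuron in a rectangular grid of SOM weight vectors, the mean
feature-space distance to its immediate grid neighbours (a U-matrix-style
measure); the resulting 2D array is what would be handed to the 3D renderer
at each timestep. The grid layout and names are assumptions, not Studio's
SOM implementation.

    // Distance-to-neighbours surface for a rectangular SOM grid.
    public class SomSurfaceSketch {

        static double distance(double[] a, double[] b) {
            double d = 0;
            for (int i = 0; i < a.length; i++) {
                double diff = a[i] - b[i];
                d += diff * diff;
            }
            return Math.sqrt(d);
        }

        // weights[row][col] is the feature-space weight vector of one neuron.
        static double[][] neighbourDistances(double[][][] weights) {
            int rows = weights.length, cols = weights[0].length;
            double[][] surface = new double[rows][cols];
            int[][] offsets = { { -1, 0 }, { 1, 0 }, { 0, -1 }, { 0, 1 } };
            for (int r = 0; r < rows; r++) {
                for (int c = 0; c < cols; c++) {
                    double sum = 0;
                    int n = 0;
                    for (int[] o : offsets) {
                        int rr = r + o[0], cc = c + o[1];
                        if (rr >= 0 && rr < rows && cc >= 0 && cc < cols) {
                            sum += distance(weights[r][c], weights[rr][cc]);
                            n++;
                        }
                    }
                    surface[r][c] = sum / n; // large values mark class boundaries
                }
            }
            return surface;
        }
    }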
The above applications were created without recourse to conventional programming,
but instead by connecting together a series of independently created components.
These components form the backbone of Studio and obviously have
been designed to integrate effectively. However, other geocomputational
tools can be added into the mix with ease.
6. Conclusions and Future Work
With the continued development of advanced computational and visualisation
methods, coupled with a greater understanding of how these can be applied
to geographical problems, and accompanied by breakthroughs in software
engineering, we are entering an exciting era of new possibilities for
geographical analysis.
Studio
represents one approach for taking advantage of these possibilities and
is a serious attempt to address the fundamental problems associated with
knowledge discovery, exploratory analysis, classification and object creation
as they relate to geography. It is perhaps too early to say whether the
design of Studio facilitates these tasks effectively, but our own
experience in developing the applications described above shows a promising
gain in efficiency over traditional programming methods, and a much greater
degree of integration and co-ordination among the component pieces, fostering
easier exploration and better understanding of both tools and data.
With the visual programming environment now completed, future Studio
development effort will focus on specific tools for geographic visualisation
and analysis. Our current plans include interactive scatterplots, Bayesian
knowledge discovery agents, and metadata (including semantic histories)
to allow us to study and communicate the formation of geographic objects
in greater detail. Further information about Studio, including sample
images and downloadable applications and data, is available from http://www.geovista.psu.edu/studio/.
Future developments will also be posted to this site.
References
Baker, V. R. (1999). Geosemiosis. GSA Bulletin, 111(5), 633-645.
Benediktsson, J. A., Swain, P. H. and Ersoy, O. K. (1990). Neural network
approaches versus statistical methods in classification of multisource
remote sensing data. IEEE Transactions on Geoscience and Remote Sensing,
28(4),
540-551.
Dunteman, G. H. (1984). Introduction to multivariate analysis.
Sage Publications, New York, USA.
Edsall, R. M. (1999). The dynamic parallel coordinate plot: visualizing
multivariate geographic data. Proc. 19th International Cartographic
Association Conference, Ottawa, May 1999. URL: http://www.geog.psu.edu/~edsall/JSM99/paper.htm.
Foody, G. M., McCulloch, M. B. and Yates, W. B. (1995). Classification
of remotely sensed data by an artificial neural network: issues relating
to training data characteristics. Photogrammetric Engineering and Remote
Sensing, 61(4), 391-401.
Gahegan, M. (2000). On the application of inductive machine learning
tools to geographical analysis. Geographical Analysis, 32(2),
113-139.
Gahegan, M. and Takatsuka, M. (1999). Dataspaces as an organizational
concept for the neural classification of geographic datasets. Proc. Fourth
International Conference on GeoComputation, Virginia, USA: http://www.geovista.psu.edu/geocomp/geocomp99/Gc99/011/gc_011.htm
German, G. and Gahegan, M. (1996). Neural network architectures for
the classification of temporal image sequences. Computers and Geosciences,
22(9),
969-979.
Goodchild, M. F. (2000). Keynote address, Conference of the Association
of American Geographers, Pittsburgh, 2000.
Hibbard, W. L., Anderson, J., Foster, I., Paul, B. E., Jacob, R., Schafer,
C. and Tyree, M. K. (1996). Exploring coupled atmosphere-ocean models
using Vis5D. International Journal of Supercomputing Applications and
High Performance Computing. 10(2/3), 211-222.
Inselberg, A. (1997). Multidimensional detective. Proc. IEEE conference
on Visualization (Visualization '97), Los Alamitos, CA: IEEE Computer
Society, pp. 100-107.
Jenks, G. F. (1977). Optimal data classification for choropleth maps,
Occasional
paper No. 2. Lawrence, Kansas: University of Kansas, Department of
Geography.
Keim, D. and Kriegel, H.-P. (1996). Visualization techniques for mining
large databases: a comparison. IEEE Transactions on Knowledge and Data
Engineering (Special Issue on Data Mining).
Kohonen, T. (1995). Self-Organizing Maps. Springer-Verlag, Berlin,
Germany.
Koperski, K., Han, J. and Adhikary, J. (1999). Mining knowledge in geographic
data. Comm. ACM (to appear). URL: http://db.cs.sfu.ca/sections/publication/kdd/kdd.html.
Lakoff, G. (1987). Women, Fire and Dangerous Things: What Categories
Reveal about the Mind. Chicago: University of Chicago Press.
Landgrebe, D. (1999). Information extraction principles and methods
for multispectral and hyperspectral image data. In: Information Processing
for Remote Sensing (Ed. Chen, C. H.). River Edge, NJ, USA: World Scientific.
Lee, H. Y. and Ong, H. L. (1996). Visualization support for data mining.
IEEE
Expert Intelligent Systems and their Applications, 11(5), 69-75.
Luger, G. F. and Stubblefield, W. A. (1998). Artificial Intelligence:
structures and strategies for complex problem solving. Reading, MA:
Addison-Wesley.
MacEachren, A. M. (1995). How Maps Work. Guilford Press, NY, USA.
MacEachren, A. M., Wachowicz, M., Edsall, R., Haug, D. and Masters,
R. (1999). Constructing knowledge from multivariate spatio-temporal data:
integrating geographical visualization with knowledge discovery in database
methods. International Journal of Geographic Information Science,
13(4),
311-334.
Mark, D. M., Freksa, C., Hirtle, S. C., Lloyd, R. and Tversky, B. (1999).
Cognitive models of geographical space. International Journal of Geographic
Information Science, 13(8), 747-774.
Mitchell, T. M. (1997). Machine Learning. New York, USA: McGraw Hill.
Peirce, C. S. (1891). The architecture of theories. The Monist, 1, 161-176.
Piatetsky-Shapiro, G., Fayyad, U. and Smyth, P. (1996). From data mining
to knowledge discovery: an overview. In: Advances in Knowledge Discovery
and Data Mining (Eds. Fayyad, U., Piatetsky-Shapiro, G., Smyth, P. and
Uthurusamy, R.). AAAI/MIT Press, pp. 1-35.
Rosch, E. (1973). Natural categories. Cognitive Psychology, 4,
328-350.
Slocum, T. A. (1999). Thematic Cartography and Visualization,
Prentice Hall: New Jersey.
Sowa, J. F. (1999). Knowledge Representation: Logical, Philosophical,
and Computational Foundations. Pacific Grove, CA: Brooks/Cole.