SPIN!-project Working Paper
State-of-the-Art Geographical Data Mining
Andy Turner
a.turner@geog.leeds.ac.uk
Disclaimer
The views expressed in this paper are those of the author and
do not necessarily reflect those of the SPIN!-project
consortium.
Abstract
This SPIN!-project working
paper was drafted in December 2002 to provide an assessment of the state-of-the-art
in geographical data mining (GDM). The
paper contends that the spatial data mining system for data of public interest
developed during the SPIN!-project
(SPIN) is a prototype GDM system which is state-of-the-art in terms of its
architecture and functionality.
Two years ago a SPIN!-project
report was compiled that assessed the state-of-the-art in Exploratory Spatial
Data Analysis (Turner, 2000). Since then
the state-of-the-art has not progressed far.
Although most geographical software that has been developed during this
time has looked to take advantage of lower level improvements, and some
software features have been enhanced and additional functionality has been
incorporated, on the whole there have been no major breakthroughs. This opinion is based on experience and a
careful examination of relevant literature rather than a large scale practical
and critical evaluation. This is unfortunate,
but a consequence of the restricted availability of resources. Hopefully, the subjective assessment of the
state-of-the-art that this paper offers will be useful. It aims to provide a reference for further
research and highlights some developments that are likely to play an important role
in developing future state-of-the-art GDM systems.
Additionally, this working paper details a crucial difference
in the meanings of clustering terms as used in data mining and geography.
1. Background and Introduction
Available computational power has increased at an
accelerating exponential rate (URL 1).
The increasing power comes from faster and more connected processors and
memory, larger and more organised memory banks, and more efficient operating
systems and software. As computational
power increases and the means for human-computer interaction evolve, new
opportunities arise for the analysis of the vast amounts of geographical data that
are collected. The human and computer resources
available for software development are immense, yet the available software for
GDM does not readily facilitate the processing of massive volumes of
geographical data into more useful information that can be analysed in highly
automated ways so as to develop our understanding of geographical phenomena.
Open source development and the Internet are coming of age. This is coinciding with trends to modularise,
develop well specified libraries of functionality and adopt and develop
standards. The data analysis capabilities
of GIS, DMS, mathematical and statistical packages, and other bespoke tools are
continually being enhanced. Cross-fertilisation
is occurring whereby methods developed in one type of data analysis software
are being adapted and incorporated in others. More concise functional toolkits for
specialist applications can be more readily assembled. Methods are also becoming more robust and
packages more interoperable.
The spatial data mining system for data of public interest
being built in the SPIN!-project
(SPIN) is an attempt to integrate geographical information system (GIS) and
data mining system (DMS) functionality in an open and extensible way. In essence SPIN is a GDM system and any such
system will have the following features:
1. a database - for storing, querying and retrieving data;
2. a graphical user interface (GUI) - for display and for interacting with and developing functionality; and,
3. a suite of GDM analysis and GeoVisualisation methods.
Until recently most DMS were backed by databases which had special
handlers for temporal references, but had no special handlers for spatial references
and so could only readily treat them in the same way as general attribute data. Until recently most GIS were backed by databases
which had special handlers for spatial references, but had no special handlers
for temporal references and so could only readily treat them in the same way as
general attribute data. Databases are standardising,
whether or not they are proprietary databases for backing GIS and DMS. Most database software being developed is geared to handle complex data types (e.g. geographical
data) that may have both spatial and temporal references. Open source GDM systems developed in the near future
are likely to be based on PostgreSQL and MySQL (URL 8, URL 9).
Display environments that facilitate interactive and dynamic
linking of data plots, graphs, tables, maps etc., and which enable animation are
likely to be standard in the next generation of GDM systems. Along with interactive display functionality,
GUIs are likely to have visual programming workspaces which can be used to
develop functions which in turn can be encapsulated and added to the widgets (buttons,
menus etc.) and made accessible to command line interfaces and for scripting.
Arguably a GeoVisualisation system
can be based solely on the first two features as described above. What a GDM system needs in addition are
classification, generalisation, cluster detection and inductive
analysis tools geared for analysing large volumes of geographical data.
This section ends by listing the aims of the SPIN!-project:
·
To develop an integrated, interactive,
internet-enabled spatial data mining system for data of public interest.
·
To improve knowledge discovery by providing an
enhanced capability to visualise data mining results in spatial, temporal and
attribute dimensions.
·
To develop new and integrated ways of revealing
complex patterns in spatio-temporally referenced data that were previously
undiscovered using existing methods.
·
To enhance decision making capabilities by developing
interactive GIS techniques, which provide an integrated exploratory and
statistical basis for investigating spatial patterns.
·
To deepen the understanding of spatio-temporal
patterns by visual simulation.
·
To publish and disseminate geographical data mining
services over the internet.
2. State-of-the-art ESDA at the beginning of the
SPIN!-project circa 2000
This section looks back to the state-of-the-art in
exploratory spatial data analysis (ESDA) as it was described by Turner (2000).
At the beginning of the year 2000 most GIS offered:
·
no inductive analysis methods;
·
only very limited statistical functionality;
·
only basic mathematical operations that would work on
tables of data; and,
·
no linked
dynamic display environments.
With the exception of network analysis, most GIS were limited
to fairly basic forms of spatial analysis.
However, it was noted that a new generation of GIS was emerging which
provided linked dynamic display environments (e.g. CDV (Dykes 1997, 2002);
Descartes (Andrienko and Andrienko, 1999); and GeoVista Studio (Gahegan et al., 2000a)). These systems are now
commonly referred to as GeoVisualisation software although
some may reasonably claim to be GDM systems.
GIS were becoming more modular with extensions being
developed that could be plugged in to enhance a core of functionality. The modular architectures came hand in hand
with the establishment of larger communities of users working together by
sharing scripts and indeed playing a part in the development of extensions
married to their specialist requirements.
The Open GIS Consortium (OGC; URL 5) had
begun delivering spatial interface specifications that were intended to encourage interoperability
between technologies that use geographical information. For developers of open source GIS and the
academic community in general, this led to the establishment of many open source
projects and the provision of much freeware GIS. Details on much of this can be obtained via
the Free GIS web site (URL 2).
In 2000 there were numerous data mining systems (DMS) that had
developed largely unconnected to GIS research.
These offered:
·
most multivariate statistical methods including linear
and logistic regression;
·
various classification and modelling tools, such as
decision trees, rule induction methods, neural networks, memory or case based
reasoning and k-means and other so called clustering methods; and,
·
sequence discovery for
finding patterns in time series data.
It had been shown that many of these methods (neural networks
in particular) could be usefully applied to geographical data (Openshaw and Openshaw, 1997; Openshaw and Abrahart, 2000). However, GIS contained little or none of this
functionality.
Despite their powerful, flexible and user-friendly toolkits, even
the most advanced DMS had no mechanism for handling location or spatial
aggregation, or for coping with spatial entities or spatial concepts such as a
region, circle, distance or map topology.
This was so even though the need to support so-called non-standard data types, including
multi-media data, images, audio and so forth, which contain special patterns, had
been recognised (Goebel and Gruenwald, 1999).
There was (and arguably still is) a considerable amount of hype
about the capabilities and functionality of DMS. Vendors argue that no credible evaluation of
their software can be performed by evaluators that have not undertaken (often expensive)
training courses. Furthermore the
systems were (and many still are) tuned to work on very specific hardware with
specially configured operating systems underlying them.
Goebel and Gruenwald (1999) offered
an intelligible survey of data mining and knowledge discovery software tools
and noted the need for seamless integration with databases so that algorithms
are scalable and do not rely on having all data in fast access memory. It was observed that many systems relied on
querying an underlying database (whose data is stored in large but slow memory
banks) and holding the resulting data from the query in the fast access memory banks
(RAM, real memory or “database engine”) in order to run many of the data mining
methods. This often meant that the
methods worked poorly with large volumes of data and were not really as scalable
as data mining often demands.
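The scalability concern can be illustrated with a minimal sketch, assuming Python's built-in sqlite3 module stands in for the underlying database. The table name obs, the column name value and the function streaming_mean are hypothetical and do not come from any of the systems surveyed. Rather than holding the whole query result in fast access memory, rows are fetched in fixed-size batches and only a running summary is kept.

```python
import sqlite3

def streaming_mean(conn, batch_size=1000):
    """Compute a mean by reading query results in batches, not all at once."""
    cur = conn.cursor()
    cur.execute("SELECT value FROM obs")  # hypothetical table and column
    count, total = 0, 0.0
    while True:
        rows = cur.fetchmany(batch_size)  # at most batch_size rows held at a time
        if not rows:
            break
        for (value,) in rows:
            count += 1
            total += value
    return total / count if count else None

# A small in-memory database standing in for a large, slow store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (value REAL)")
conn.executemany("INSERT INTO obs VALUES (?)",
                 [(float(i),) for i in range(1, 10001)])
result = streaming_mean(conn, batch_size=500)
```

Only methods that can be expressed as such incremental summaries scale in this simple way; methods that need repeated passes over the data would require a different design.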
In 2000 most
ESDA tools were based on interactive graphics.
The Geographical Analysis Machine GAM/K was an exception and was already
planned to be integrated as a key component of SPIN. GAM/K is detailed by Openshaw
(1995). More recently, Openshaw (1998a) also detailed a Geographical Explanations
Machine (GEM) which like GAM/K is based on the concept of identifying clusters. It operates similarly to GAM/K in that data from
overlapping circular regions is analysed across a range of different scales and
the result is output as a map.
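The idea of examining overlapping circular regions across a range of scales can be sketched as follows. This is a much simplified illustration and not GAM/K itself: the Poisson test, the threshold alpha, the lattice of circle centres and all function names are assumptions made for the example, and the sketch returns a list of flagged circles rather than producing a map.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k    # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)

def circle_scan(points, centres, radii, alpha=0.01):
    """points: list of (x, y, population, cases).
    Returns (cx, cy, radius, cases, expected) for circles with a
    significant excess of cases relative to the overall rate."""
    total_pop = sum(p[2] for p in points)
    total_cases = sum(p[3] for p in points)
    rate = total_cases / total_pop
    flagged = []
    for r in radii:                      # range of scales
        for cx, cy in centres:           # overlapping circle centres
            inside = [p for p in points
                      if math.hypot(p[0] - cx, p[1] - cy) <= r]
            pop = sum(p[2] for p in inside)
            cases = sum(p[3] for p in inside)
            if pop == 0:
                continue
            expected = rate * pop
            if cases > expected and poisson_upper_tail(cases, expected) < alpha:
                flagged.append((cx, cy, r, cases, expected))
    return flagged

# Synthetic 5 x 5 lattice: 100 people and 1 case at every location,
# except 12 cases at the central location (2, 2).
points = [(x, y, 100, 12 if (x, y) == (2, 2) else 1)
          for x in range(5) for y in range(5)]
centres = [(x, y) for x in range(5) for y in range(5)]
flagged = circle_scan(points, centres, radii=[0.5, 1.0])
```

In this synthetic example only circles that contain the central location are flagged, at both radii.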
In addition to GAM/K and GEM there were a number of other
cluster detection methods. These have
been detailed in other SPIN!-project
reports and those with merit are included in SPIN. Back in 2000 it was also argued that SPIN
should contain a geographically weighted regression method which could be used
to examine the geographical variation in regression model parameter estimates (Fotheringham et al.
2002; Brunsdon et
al. 1996). A case was also made for
including local indicators of spatial association methods and a couple of other
spatial statistical methods.
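How a geographically weighted regression lets parameter estimates vary over space can be indicated with a minimal sketch. It assumes a single explanatory variable, a Gaussian distance-decay kernel and an arbitrarily chosen bandwidth; the function name gwr_fit_at and the synthetic data are hypothetical, and the calibration procedures described by Fotheringham et al. (2002) are not reproduced.

```python
import math

def gwr_fit_at(x0, y0, data, bandwidth):
    """Fit z = a + b * v by weighted least squares centred on (x0, y0).
    data: list of (x, y, v, z). Observations nearer to (x0, y0) receive
    larger weights. Returns the local estimates (a, b)."""
    sw = swv = swz = swvv = swvz = 0.0
    for x, y, v, z in data:
        d = math.hypot(x - x0, y - y0)
        w = math.exp(-0.5 * (d / bandwidth) ** 2)  # Gaussian kernel (assumed)
        sw += w
        swv += w * v
        swz += w * z
        swvv += w * v * v
        swvz += w * v * z
    denom = sw * swvv - swv ** 2
    if denom == 0:
        raise ValueError("explanatory variable has no local variation")
    b = (sw * swvz - swv * swz) / denom
    a = (swz - b * swv) / sw
    return a, b

# Synthetic data along a west-east transect: the true slope rises with x.
data = [(x, 0.0, v, 1.0 + (0.5 + 0.2 * x) * v)
        for x in range(10) for v in (0.0, 1.0, 2.0)]
a_west, b_west = gwr_fit_at(0.0, 0.0, data, bandwidth=1.0)
a_east, b_east = gwr_fit_at(9.0, 0.0, data, bandwidth=1.0)
```

Repeating the fit at many locations and mapping the local slope estimates is what allows the geographical variation in the model parameters to be examined.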
Shortly before the SPIN!-project
began Openshaw (1999) outlined the key design issues
for developing GDM systems and highlighted the need to respect the special
features of geographical information:
·
it is spatially referenced
·
it is often temporally referenced
·
observations are not independent
·
data uncertainty and errors tend to be spatially
structured
·
spatial coverage is rarely global
·
non-stationarity is to be
expected
·
relationships are often geographically localised
·
non-linearity is the norm
·
data distributions are usually non-normal
·
there are often many variables but much redundancy
·
there are often many missing values
·
spatial, temporal and spatio-temporal
clustering is important
·
data can be aggregated and disaggregated in space and
time and in space-time
Turner (2000) noted that the geocyberspace
contained increasing amounts of 3D spatial geographical information and whilst
global coverage was rare some datasets did exist and were increasingly being used
for a wide range of environmental applications. It was argued that functionality to analyse
and visualise 3D spatial information would be of great utility for the SPIN!-project especially for a seismic
application in order to examine the relation between earthquake occurrences, crustal stresses and tidal patterns.
It is appreciated that there are many ways to represent
spatial, temporal and spatio-temporal data, and that
the most appropriate representation depends on what information is required
and/or what phenomenon/process is under study.
In many cases topographic maps, time maps and cartograms provide more
useful visualisations than Euclidean maps (Orford et al., 1998). The ability to develop such representations
in SPIN was discussed but much of the functionality has not been integrated.
In addition to being able to work with 3D spatial data with
temporal references and alternatives to Euclidean representations, it is
necessary to develop methods that:
·
can handle no-data by attempting to minimise assumptions
of what the value of data would be;
·
are scalable and can cope with very large datasets;
and,
·
are robust and
precise so as to handle the potentially large ranges of numbers and detail
involved.
3. The state-of-the-art in GDM
There are a number of summaries regarding the challenges and
achievements of GDM research (see for example Buttenfield
et al., 2001; Yuan et al., 2001; Openshaw and Abrahart,
2000; Openshaw, 1999). These agree that the development of data
mining and knowledge discovery tools for geographical use must be fundamentally
based on using and coping with the special features of geographical information
as listed in the previous section.
SPIN
integrates GIS and spatial data mining functionality in an open and extensible
way using a client-server architecture based on Enterprise JavaBeans (May and Savinov, 2002).
Enterprise JavaBeans (EJB) offers a specification and guidelines for
using Java Remote Method Invocation (RMI) and JDBC (URL ref). Java RMI enables elements of Java programs to
invoke actions on other Java elements on remote machines (URL ref). JDBC is the set of interfaces for connecting
to and using database software (URL ref).
The SPIN architecture was adopted at an
early stage of the SPIN!-project and
has also been adopted by other similar systems (Bertolotto
et al., 2001; Takatsuka
and Gahegan, 2002).
Since the start of the SPIN!-project
the EJB specification has been revised and among other enhancements version 2.1
has added support for web services. EJB
offers a state-of-the-art architecture for developing a GDM system.
SPIN offers state-of-the-art GeoVisualisation
functionality derived from Descartes and CommonGIS
(Voss et al., 2002). GeoVista Studio
developed mainly by a research team based in the
GeoVista Studio has a useful and convenient
deployment mechanism via Java Web Start (Sun, 2002b) and can readily be customised
and packaged up as an applet that can be run in most web browsers. It is moving towards being open source and
towards being based on GT2. GT2 is an open source
Java GIS toolkit for developing standards-compliant solutions. It aims to support Open GIS and other relevant
standards as they are developed.
4. Differences
in clustering terminology
Research into
developing and testing methods of detecting and measuring clustering in space
and over time is being undertaken in epidemiology, criminology, geography, data
mining and science in general.
Unfortunately, there are different interpretations of what a cluster is and confusion about the terms
clustering, spatial cluster and spatial
clustering.
In the field of data mining, the term cluster is generally used to mean things which share similar
characteristics or attributes like classes, sets or groups; and clustering is a term reserved for the
process of classifying, sub-setting or grouping a set of things. It differs from classification in that the
number of clusters is not pre-specified.
A spatial cluster in data
mining is a term which has been used for a collection of spatial objects with
similar locations in space irrespective of their other attributes or
characteristics. The term spatial clustering has thus been reserved
for the process of classifying or grouping spatial objects into spatial clusters without specifying a priori how many spatial clusters there are (Murray, 1997; Estivill-Castro
and Houle, 1999; Han et al., 2001).
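The data mining sense of spatial clustering can be sketched as a grouping based on location alone. The following is a minimal illustration that assumes a simple distance-threshold (single-linkage) rule; it is not one of the methods from the cited surveys, and the function name and the max_gap parameter are hypothetical. Note that the number of clusters is an outcome rather than an input, and that no attribute other than location is used.

```python
import math

def spatial_clusters(points, max_gap):
    """Label (x, y) points so that points linked by a chain of steps,
    each no longer than max_gap, share a cluster label."""
    labels = [None] * len(points)
    next_label = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:  # grow the cluster outwards from the seed point
            i = stack.pop()
            for j in range(len(points)):
                if labels[j] is None:
                    dx = points[i][0] - points[j][0]
                    dy = points[i][1] - points[j][1]
                    if math.hypot(dx, dy) <= max_gap:
                        labels[j] = next_label
                        stack.append(j)
        next_label += 1
    return labels

locations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (50, 50)]
labels = spatial_clusters(locations, max_gap=1.5)
```

For the six locations above the result is three groups, including one containing only the isolated point.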
Although the above offers a reasonable interpretation, it is
only comparatively recently that researchers in the data mining field have
begun to consider clustering and spatial clustering in geographical data. This has led to a clash in terminology, thus
it is important to note that:
·
spatial clustering in data
mining pays no attention to the attributes associated with spatial location;
·
geographic data is of
a higher dimensionality and is not only a set of locations, but comprises a set
of measured attributes which may include temporal references.
The
simplest way of defining a cluster
(as used in epidemiology and much geography) is as a localised excess incidence rate that is
unusual in that there is more of some variable than might be expected. For example, a cluster could be:
·
a
local excess disease rate; an unusual crime rate (hot spots);
·
an
unusual unemployment rate or road traffic accident rate (black spot);
·
a
region of unusually high positive residuals from a model;
·
an unusual concentration of plant species or earthquake epicentres, etc.
Virtually any variable that has a spatial
distribution will contain some degree of pattern or concentration. Clusters are where and when these
concentrations are extreme. There are
essentially two types of cluster:
clusters of excess and clusters of deficit. The former refers to unusually high
concentrations of some rate and the latter refers to unusually low
concentrations of some rate.
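The geographical sense of a cluster, as an unusual local rate, can be sketched by comparing observed counts with the counts expected under an overall rate. This is a minimal illustration assuming a Poisson model and an arbitrary threshold alpha; the region names, figures and function names are hypothetical.

```python
import math

def poisson_cdf(k, mean):
    """P(X <= k) for X ~ Poisson(mean)."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i
        total += term
    return min(1.0, total)

def classify_regions(regions, alpha=0.05):
    """regions: dict of name -> (population, cases).
    Labels each region 'excess', 'deficit' or 'unremarkable' by comparing
    its count with that expected from the overall rate."""
    total_pop = sum(pop for pop, _ in regions.values())
    total_cases = sum(cases for _, cases in regions.values())
    overall_rate = total_cases / total_pop
    result = {}
    for name, (pop, cases) in regions.items():
        expected = overall_rate * pop
        p_high = 1.0 - poisson_cdf(cases - 1, expected)  # P(X >= cases)
        p_low = poisson_cdf(cases, expected)             # P(X <= cases)
        if p_high < alpha:
            result[name] = "excess"
        elif p_low < alpha:
            result[name] = "deficit"
        else:
            result[name] = "unremarkable"
    return result

regions = {"A": (1000, 30), "B": (1000, 10), "C": (1000, 1), "D": (1000, 11)}
classes = classify_regions(regions)
```

Unlike the data mining interpretation, this classification depends on the attribute values (the counts and populations) and not simply on where the regions are.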
It is very important to appreciate
the difference between the interpretations of these terms. Likewise it is important that GDM methods
deal with the special nature of geographical data.
5. The
future state-of-the-art in GDM
There is demand for GDM tools:
·
for analysing geographical data all the time and as
they are produced;
·
that analyse patterns in the spatial locations and
attributes of geographical objects over time;
·
that search for spatial correlation and autocorrelation; and,
·
that work across a range of spatial scales.
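As one illustration of what a search for spatial autocorrelation involves, the following minimal sketch computes the global Moran's I statistic for values on a small set of zones. The choice of Moran's I, the binary adjacency weights and the example values are assumptions made for illustration and are not drawn from SPIN.

```python
def morans_i(values, weights):
    """Global Moran's I.
    values: list of n numbers; weights: n x n list of lists in which
    weights[i][j] is the spatial weight between zones i and j."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * (num / den)

# Four zones in a row; adjacent zones have weight 1.
w = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
i_trend = morans_i([1, 2, 3, 4], w)        # similar values are adjacent
i_alternating = morans_i([1, 4, 1, 4], w)  # dissimilar values are adjacent
```

Positive values indicate that similar values tend to be neighbours and negative values that dissimilar values tend to be neighbours. Recomputing the statistic with weights defined over different neighbourhood sizes is one way such a tool might work across a range of spatial scales.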
6. Conclusions
The spatial data mining system for data of public interest
being developed in the SPIN!-project
(SPIN) is an attempt to integrate state-of-the-art GIS and data mining
functionality in an open and extensible way.
The result is effectively a prototype GDM system which is arguably the
most complete example of its kind. Its
range of functionality is very broad and it has many advanced capabilities;
however, much can be done to improve it.
SPIN has not been subject to much user testing and has not
been scientifically evaluated. Is it
really user-friendly? If there are
patterns hidden in spatial data or indeed relationships encrypted in spatial
information then can SPIN help users to find them by at least pointing them in
the right direction?
SPIN is not really open because it is not open source. If the SPIN!-project
comes to an end then what will happen to it? Is SPIN only to be available to the SPIN!-project consortium for research
purposes? Will it not evolve into an
open source project and become openly distributed?
Currently some algorithms in SPIN are dependent on using
Oracle database software. Would it not be
better if it could optionally use one of the open source databases as well?
In general there is a need for new analysis methods and for more example
applications that show how GDM tools can be used, presenting novel and
meaningful results in a clear and understandable way.
References
and Bibliography
Andrienko G., Andrienko
N. (1999) Interactive Maps for Visual Data Exploration.
In the International Journal of Geographical Information Science 13 (4) 355-374.
Adhikary J. (1996) Knowledge
Discovery in Spatial Databases: Progress and Challenges. Paper presented at an Association for
Computing Machinery Workshop on Research Issues on Data Mining and Knowledge
Discovery,
Alexander F., Boyle P. (2000) Do cancers
cluster? In Eliott
P.,
Alexander F., Boyle P. (1996) Methods for
Investigating Localised Clustering of Disease. International
Agency for Research on Cancer scientific publication 135.
Andrienko N., Andrienko
G. (2001) Intelligent Support for Geographic Data Analysis and Decision Making in the Web.
Geographical Information and Decision Analysis 5 (2) 115-128.
Bertolotto M., McGeown
L., Carswell J., McMahon J. (2001) e-Spatial™
Technology for Spatial Analysis and Decision Making in Web-Based Land
Information Management Systems. In
Journal of Geographic Information and Decision Analysis 5 (2) 95-114.
Bivand R. (2002) Implementing
spatial data analysis software tools in R*.
Paper presented at a Center for Spatially
Integrated Social Science Specialist Meeting on Spatial Data Analysis Software
Tools,
Brunsdon C., Fotheringham
S., Charlton M. (1996) Geographically Weighted Regression: A method for
exploring non-stationarity. In Geographical Analysis,
28 (4), 281-298.
Brunsdon C., MacGill
J., Openshaw S., Turner A., Turton
I., (1999) Testing space-time and more complex hyperspace geographical analysis
tools. Paper presented at the 7th
GISRUK conference,
Buttenfield B., Gahegan M., Miller H., Yuan M. (2001) Geospatial Data
Mining and Knowledge Discovery. A UCGIS
White Paper on Emergent Research Themes submitted to UCGIS Research Committee.
Câmara G., Neves M., Monteiro A (2002)
SPRING and TerraLib: Integrating Spatial Analysis and
GIS. Paper
presented at Center for Spatially Integrated Social
Science Specialist Meeting on Spatial Data Analysis Software Tools,
Carr D., Chen J.,
Diggle P.
(2000) Overview of statistical methods for disease mapping and its relationship
to cluster detection. In Eliott P.,
Dykes J. (2002)
Developing Tools for GeoVisualization Research. Paper presented at Center for Spatially Integrated Social Science Specialist
Meeting on Spatial Data Analysis Software Tools,
Dykes J. (1997) Exploring
spatial data representations with dynamic graphics. In Computers &
Geosciences 23 (4), 347-370.
Edsall R. (1999)
Tools for the Exploration and Multivariate Classification of Large Geographical
Databases. Paper presented at the 95th Annual Meeting of the
Association of American Geographers,
Edsall R., Roedler
A. (2002) An Enhanced GIS Environment for Multivariate
Exploration: a Linked Parallel Coordinate Plot Applied to Urban Greenway Use
Survey Data. Paper presented at the Center for
Spatially Integrated Social Science Specialist Meeting on Spatial Data Analysis
Software Tools,
Estivill-Castro
V. (2002) Why so many clustering algorithms – A position paper. In SIGKDD explorations newsletter of the ACM Special Interest Group on
Knowledge Discovery in Data and Data Mining 4 (1) 65-75.
Estivill-Castro
V., Houle M. (1999) Robust Distance-Based Clustering
with Applications to Spatial Data Mining. Algorithmica 30 (2) 216-242.
Estivill-Castro
V., Lee I. (2001) Data Mining Techniques for Autonomous Exploration of Large
Volumes of Geo-referenced Crime Data. Paper presented at the 6th
International Conference on
Fotheringham S., Brunsdon
C, Charlton M. (2002) Geographically Weighted Regression (Wiley,
Fotheringham S., Brunsdon
C, Charlton M. (2000) Quantitative Geography: Perspectives on Spatial Analysis.
(Sage,
Fotheringham S.,
Charlton M. (1994) GIS and exploratory data analysis: An overview of some basic
research issues. In Geographical Systems, 1 (4), 315-327.
Fotheringham S., Rogerson
P. (1994) Spatial Analysis and GIS. (
Fulcher C., Barnett Y., Barnett
C. (2002) Spatial Analysis Software for Community Decision Support. Paper presented at a Center
for Spatially Integrated Social Science Specialist Meeting on Spatial Data
Analysis Software Tools,
Gahegan M.
(2001) Data mining and knowledge discovery in the geographical domain. White Paper: National Academies Computer
Science and Telecommunications Board. (Intersection of Geospatial Information
and IT content and Knowledge distillation) http://www7.nationalacademies.org/cstb/wp_geo_gahegan.pdf
Gahegan M. (2000) On the application of inductive machine learning tools to
geographical analysis. In Geographical Analysis 32 (2), 113-139.
Gahegan M., Miller
H., Yuan M. (2001) Geospatial Data Mining and Knowledge Discovery. http://www.ucgis.org/emerging/gkd.pdf
(Accessed in December 2002)
Gahegan M., Takatsuka M., Wheeler M., Hardisty
F. (2000a) GeoVISTA Studio: a geocomputational
workbench. Paper presented at Geocomputation 2000,
Gahegan M., Wachowicz M., Harrower M., Rhyne
T-M.
(2000b) The Integration of Geographic Visualization
with Knowledge Discovery in Databases and Geocomputation. In Cartography and Geographical Information
Systems (special issue on the International Cartographic Association research
agenda)
Goebel M., Gruenwald
L. (1999) A survey of Data Mining and Knowledge Discovery Software Tools. In SIGKDD Explorations 1 (1) 20-33.
Han J., Kamber M., Tung A. K. H. (2001) Spatial Clustering Methods in Data Mining: A Survey. In H. Miller and J. Han
(eds.) Geographic Data Mining and Knowledge Discovery (
Hewitson B., Crane R. (eds.) (1994) Neural Nets:
Applications in Geography (Kluwer,
Indulska M., Orlowska
E. (2002) Gravity Based Spatial Clustering.
Paper presented at the 10th Association for Computing
Machinery International Symposium on Advances in Geographical Information
Systems,
Krivoruchko K. (2002) Bridging the
Gap Between GIS and Solid Spatial Statistics. Paper presented at a Center
for Spatially Integrated Social Science Specialist Meeting on Spatial Data
Analysis Software Tools,
Koperski K., Adhikary
J., Han J. (1996) Spatial Data Mining: Progress and Challenges Survey
paper. Paper presented at an Association
for Computing Machinery Workshop on Research Issues on Data Mining and
Knowledge Discovery,
Koperski K., Han J., Adhikary J. (1998) Mining Knowledge in Geographical Data. ftp://ftp.fas.sfu.ca/pub/cs/han/pdf/geo_survey98.pdf
(Accessed December 2002).
Lawson A. (1999) A Review
of Cluster Detection Methods. In Lawson
A., Biggeri A., Böhning D.,
Lesaffre E., Viel J-F., Bertollini R. (eds.) Disease Mapping and Risk Assessment
for Public Health. (Wiley,
Lazarevic A., Fiez T., Obradovic Z. (2000) A
Software System for Spatial Data Analysis and Modeling. Paper presented at the
33rd International Conference on Systems Sciences,
Levine N. (1998) Hot Spot
Analysis Using both the SYSTAT K-Means Routine and a Risk Assessment. Paper presented at the
MacEachren A., Hardisty F., Gahegan M., Wheeler
M., Dai X., Guo D., Takatsuka
M. (2001) Supporting visual integration and analysis of geospatially-referenced
data through web-deployable, cross-platform tools. Paper presented at 20th International
Cartographic Conference,
Masters R., Edsall
R. (2000) Interaction Tools to Support Knowledge Discovery: A Case Study Using
Data Explorer and Tcl/Tk. Paper presented at the Visualization Development
Environments workshop,
May M., Savinov
A. (2002) An integrated platform for spatial data
mining and interactive visual analysis.
Paper presented at the 3rd International Conference on Data
Mining Methods and Databases for Engineering, Finance and Other Fields,
Miller H., Wentz E. (2002) Geographic
Information Systems and Spatial Analysis: Enhancing Analytical Capabilities by
Expanding Geographic Representations. http://www.geog.utah.edu/~hmiller/research.html
(Accessed in December 2002).
Miller H., Han J. (2000)
Discovering Geographic Knowledge in Data Rich Environments: A Report on a
Specialist Meeting. In SIGKDD Explorations 1 (2), 105-107.
Openshaw S. (1999) Geographical
data mining: key design issues. Paper presented at the 4th
International Conference on
Openshaw S.
(1998) Building automated Geographical Analysis and Exploration Machines. In Geocomputation:
A primer, Longley P., Brooks S., Mcdonnell B. (eds.)
(Macmillan Wiley, Chichester) 95-115.
Openshaw S. (1995) Developing
automated and smart spatial pattern exploration tools for geographical
information systems applications. The Statistician 44 (1), 3-16.
Openshaw S., Abrahart
R. (eds.) (2000) GeoComputation (Taylor &
Francis,
Openshaw S., Charlton M., Wymer C. and Craft A.W. (1987). A mark I geographical analysis machine for
the automated analysis of point data sets.
International Journal of Geographical Information Systems, 1, 335-358.
Openshaw S., Openshaw
C. (1997) Artificial Intelligence in geography (Wiley,
(…details incomplete) Openshaw
S.,
Openshaw S.,
Openshaw S.,
Orford S., Dorling D., Harris R
(1998) Review of Visualization in the Social Sciences: A State of the Art
Survey and Report. Report for the
Advisory Group on Computer Graphics.
Pacheco B. (2001)
Assessing the Applicability and Usability of GeoVISTA
Studio for Health Geographics. Published in: The
Paddenburg A., Wachowicz
M. (2001) The Effect of Spatial Generalisation On
Filtering Noise For Spatio-Temporal Analyses. Paper presented at the 6th
International Conference on
Roddick J., Spiliopoulou M. (2002) A survey of Temporal Knowledge
Discovery Paradigms and Methods. IEEE Transactions on Knowledge and Data Engineering 14 (4) 750-767.
Roddick J., Hornsby K., Spiliopoulou M. (2001) Temporal, Spatial and Spatio-Temporal Data Mining Research and Knowledge
Discovery Research Bibliography. http://kdm.first.flinders.edu.au/IDM/STDMBib.html
(Accessed in December 2002)
Roddick J., Lees B. (2001) Paradigms
for Spatial and Spatio-Temporal Data Mining. In Miller H., Han J. (eds.)
Geographic Data Mining and Knowledge Discovery (
Sun (2002a)
Sun (2002b) Java™
Web Start http://java.sun.com/products/javawebstart/ (Accessed in December
2002).
Shekhar S., Vatsavai
R. (2002) Spatial Data Mining Research by the Spatial Database Research Group,
Shekhar S., Huang Y., Wu W., Lu
C., Chawla S. (2001) What’s Spatial About Spatial
Data Mining: Three Case Studies. In
Kumar V., Grossman R., Kamath C., Namburu
R. (eds.) Data Mining for Scientific and Engineering Applications (Kluwer).
Symanzik J., Swayne D., Lang D., Cook D. (2002) Software Integration for
Multivariate Exploratory Spatial Data Analysis. Paper presented at a Center
for Spatially Integrated Social Science Specialist Meeting on Spatial Data
Analysis Software Tools,
Takatsuka M. (2002)
An Open Component-Oriented Visual Programming Environment for Integrating
Geospatial Data Analysis and Visualization Tools. Paper presented at the Center
for Spatially Integrated Social Science Specialist Meeting on Spatial Data
Analysis Software Tools,
Takatsuka M., Gahegan
M. (2002) GeoVista Studio: A Codeless Visual
Programming Environment for Geoscientific Data
Analysis and Visualization. To appear in Computers & Geosciences 28 (10) 1131-1144.
Tango T. (1999) Comparison of General
Tests for Spatial Clustering. In Lawson
A., Biggeri A., Böhning D.,
Lesaffre E., Viel J-F., Bertollini R. (eds.) Disease Mapping and Risk Assessment
for Public Health. (Wiley,
Tung A., Hou J., Han J. (2001) Spatial Clustering in the Presence of
Obstacles. Paper presented at the 17th
International Conference on Data Engineering,
Turner A. (2000) State-of-the-art
Exploratory Spatial Data Analysis. SPIN!-project
working paper.
Turner A.,
Voss H., Andrienko N., Andrienko G., Gatalsky P. (2001) Web-based
Spatio-temporal Presentation and Analysis of Thematic
Maps. In the Cities and Regions
Journal of the Standing Committee on Regional and Urban Statistics and
Research.
Voss H., Andrienko N., Andrienko G. (2002) Exploratory Data Analysis and Decision
Making with Descartes and CommonGIS. Paper presented at a
Center for Spatially Integrated Social Science
Specialist Meeting on Spatial Data Analysis Software Tools,
Wakefield J., Kelsall
J., Morris S (2000) Clustering, cluster detection and spatial variation in
risk. In Eliott
P.,
Wu Y-H., Miller H. (Forthcoming)
Computational Tools for Measuring Space-Time Accessibility within
Transportation Networks with Dynamic Flow. In Journal of Transportation and Statistics special issue on
accessibility 4 (2/3) 1-14.
The
following literature was not acquired and read although it was wanted:
Andrienko N., Andrienko
G., Savinov A., Voss H., Wettschereck
D. (2001) Exploratory
Analysis of Spatial Data Using Interactive Maps and Data Mining. In Cartography and
Geographic Information Science 28
(3) 151-165.
Edsall R. (1999) Development of
Interactive Tools for the exploration of Large Geographic Databases. Paper
presented at the 19th International Cartographic Conference,
MacEachren A., Wachowicz
M., Edsall R., Haug D.,
Masters R. (1999) Constructing
knowledge from multivariate spatiotemporal data: Integrating Geographic
Visualization with Knowledge Discovery in Database Methods. In
a special issue of the International Journal of Geographical Information
Science.
URLs
1. Computer history
http://home.earthlink.net/~mrob/pub/computer-history.html
2. Free GIS
http://www.freegis.org/
3. Geographically Weighted Regression (GWR)
http://www.ncl.ac.uk/geography/GWR
4. GeoTools
http://geotools.sourceforge.net
5. Open GIS Consortium
http://opengis.org/
6. Oracle spatial
http://otn.oracle.com/products/spatial/
7. Postgis
http://postgis.refractions.net/
8. PostgreSQL
http://www.postgresql.org/
9. MySQL
http://www.mysql.com