Stephen Harding
Center for Intelligent Information Retrieval, Department of Computer
Science, University of Massachusetts, Amherst, MA 01003-4610
E-mail: harding@ciirsrv.cs.umass.edu
However, we believe that the cumulative body of research as expressed in the abstracts of the papers, posters, and keynote addresses from the five GeoComputation conferences may best characterize GeoComputation, not the work or definition of any one individual. Consequently, this paper does not attempt to define GeoComputation per se, but explores the scope or nature of GeoComputation by examining the body of research presented at the five conferences between 1996 and 2000 at the University of Leeds, UK, the University of Otago, NZ, the University of Bristol, UK, and Mary Washington College, USA, as well as most abstracts submitted for the conference at the University of Greenwich, UK. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do.
Text analysis software developed by the Center for Intelligent Information Retrieval at the University of Massachusetts was used for the analysis. Word and phrase frequencies in the abstracts for each conference were analyzed separately and then compared. The results provide insight into GeoComputation by describing the range of research topics, core technologies, and concepts encompassed by GeoComputation. General trends and patterns are identified and defined in a semi-quantitative manner.
In the Epilogue of Geocomputation: A Primer, the same book in which Couclelis' paper appears, Macmillan (1998) essentially takes issue with Couclelis' position. He believes that GeoComputation includes the latest forms of computational geography and that it is not an incremental development. He accepts that sound theory is needed, but believes that it has to some extent already been provided by Openshaw, at least as a form of inductivism. Macmillan's "definition" of GeoComputation is much broader than that suggested by Couclelis; he believes GeoComputation ". . . is concerned with the science of geography in a computationally sophisticated environment." (p. 258).
Gahegan (1999), like Couclelis, sees the concern of GeoComputation as ". . . to enrich geography with a toolbox of methods to model and analyze a range of highly complex, often non-deterministic problems." (p. 204). But he views GeoComputation as an enabling technology, one needed to fill the ". . . gap in knowledge between the abstract functioning of these tools . . . and their successful deployment to the complex applications and data sets that are commonplace in geography." (p. 206). He also lists a series of challenges that GeoComputation must overcome, but these problems involve the application of the sophisticated tools available to GeoComputation researchers, as well as the complex problems associated with handling large, unwieldy data sets. Gahegan's is a practical approach to GeoComputation, but one with promise and vision, different from Couclelis' philosophical, possibly pessimistic, perspective.
Now to the work of Stan Openshaw, who, if anyone can be so-called, is the father of GeoComputation. In the Preface to GeoComputation, Openshaw and Abrahart (2000) define GeoComputation as a fun, new word. They see GeoComputation as a follow-on revolution to GIS; once the GIS databases are set up and expanded, GeoComputation takes over. They state that "GeoComputation is about using the various different types of geo-data and about developing relevant geo-tools within the overall context of a 'scientific' approach." (p. ix); it is about solving all types of problems, converting computer "toys" into useful tools that can be usefully applied. And it is about using existing tools to accomplish this and finding new uses for existing tools. They also link GeoComputation to high performance computing. As both Couclelis and Gahegan did, Openshaw and Abrahart list a series of challenges for GeoComputation, but challenges inherent to GeoComputation, not challenges that GeoComputation must overcome to survive.
Openshaw (2000) states that GeoComputation ". . . can be regarded . . . as the application of a computational science paradigm to study a wide range of problems in geographical and earth systems . . . contexts." (p. 3). He identifies three aspects that make GeoComputation special. The first is emphasis on "geo" subjects, i.e., GeoComputation is concerned with geographical or spatial information. Second, the intensity of the computation required is distinctive. It allows new or better solutions to be found for existing problems, and also lets us solve problems heretofore insoluble. Finally, GeoComputation requires a unique mind set, because it is based on ". . . substituting vast amounts of computation as a substitute for missing knowledge or theory and even to augment intelligence." (p. 5). Openshaw clearly sees GeoComputation as dependent upon high performance computing, as suggested above. He sees the challenge for GeoComputation as developing the new ideas, methods, and paradigms needed to use increasing computer speeds to do useful science in a variety of geo contexts.
Openshaw (2000) also looks at definitions of GeoComputation presented by Couclelis (1998), Longley (1998), and Macmillan (1998). He disagrees with Couclelis: GeoComputation is not just using computational techniques to solve spatial problems, it is a major paradigm shift affecting how the computing is applied. Openshaw sees GeoComputation as a much bigger thing, i.e., ". . . the use of computation as a front-line problem-solving paradigm which offers a new perspective and a new paradigm for applying science in a geographical context." (p. 8). Openshaw basically agrees with Macmillan's description of the nature of GeoComputation, but he would probably disagree with the scope: he views GeoComputation as something much larger and broader. GeoComputation relies on the potential of applying high performance computing to solve currently unsolvable or even unknown problems. It awaits the involvement of appropriately innovative, forward-thinking 'geocomputationalists' to achieve that potential.
Openshaw (2000) also notes that not all researchers agree with his definition of GeoComputation. He believes that this may be because other definitions, such as that of Couclelis, focused on the contents of presentations made at the previous GeoComputation conferences. He believes the definition should instead be developed in a more abstract, top-down manner.
These top-down definitions rely on each author's perspective, based on their personal backgrounds in geography, to place GeoComputation in the overall context of geographic and computational research. Longley (1998), however, states that at this point we must assume that GeoComputation is ". . . what its researchers and practitioners do, nothing more, nothing less . . ." (p. 9). Regardless of Openshaw's (2000) comments about this type of definition, we agree with Longley's statement and therefore use the cumulative body of research as expressed in the abstracts of the papers, posters, and keynote addresses from the GeoComputation conferences to characterize GeoComputation. We do not attempt to define GeoComputation, but explore its scope or nature by examining the body of research presented at the five conferences held at the University of Leeds, UK, in 1996; at the University of Otago, Dunedin, NZ, in 1997; at the University of Bristol, UK, in 1998; at Mary Washington College, Fredericksburg, USA, in 1999; and at the University of Greenwich, Chatham, UK, in 2000. We attempt to determine "what's in" and "what's out," as well as evaluating more subtle changes in research emphasis over time. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do (in the context of acceptable material as determined by the individual conference organizers) and use this information to elucidate future trends. What makes this approach different is that we analyze the abstracts of the five conferences in a semi-quantitative way.
The abstracts from the Leeds, Bristol, and Fredericksburg conferences were transferred from the GeoComp 99 CD-ROM proceedings (Diaz, et al., 1999) to a word processor. Abstracts for the Bristol keynote lectures and those from the Dunedin conference were entered manually from Longley, et al. (1998) and Pascoe (1997), respectively. Those from the Chatham conference were downloaded from a web site set up by the conference organizers for review by the International Steering Committee for GeoComputation or received directly from the conference organizers by email. One file was made for each conference. These files were then edited to remove all parentheses, numbers, equations, special characters, bolding, underlining, and italics. All references cited in the text or at the end of an abstract were removed, as were place names and the names of individuals (e.g., "Horton" as in "Horton's method") and institutions. All acronyms and abbreviations were written out in full, except for "GIS" and "www." Finally, the English was standardized using the UK English option in the word processor spell checker. The files were saved in ASCII format and sent to the Center for Intelligent Information Retrieval in the Department of Computer Science at the University of Massachusetts for word and phrase analysis.
Two files were generated for each conference, one containing words and the other, phrases. Each file consists of the list of terms or phrases sorted in decreasing order according to the number of times that word or phrase is used in the abstracts and the number of abstracts in which each word or phrase occurs, as shown in Table 1. The phrase "spatial analysis" thus occurs nine times in eight abstracts and the phrase "data model(s)," six times in four abstracts.
Table 1. Sample Word and Phrase Frequencies

Phrase | Phrase Frequency | Abstract Frequency
spatial analysis | 9 | 8
digital elevation model(s) | |
spatial variation(s) | |
functional pattern | |
data model(s) | 6 | 4
geographic space | |
visibility index(ices) | |

[Frequency values other than those quoted in the text did not survive extraction.]
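The counting step that produces files like Table 1 can be sketched in a few lines; the sketch below assumes a naive whitespace tokenizer rather than the actual CIIR software, but it shows the two statistics involved: total occurrences of a term, and the number of abstracts containing it.

```python
from collections import Counter

def term_statistics(abstracts):
    """Return (term_frequency, abstract_frequency) over a list of abstract texts."""
    term_freq = Counter()      # total occurrences of each term
    abstract_freq = Counter()  # number of abstracts containing each term
    for text in abstracts:
        tokens = text.lower().split()      # naive tokenizer; the CIIR tools are more elaborate
        term_freq.update(tokens)
        abstract_freq.update(set(tokens))  # count each abstract at most once per term
    return term_freq, abstract_freq

# Illustrative input, not actual conference abstracts:
abstracts = [
    "spatial analysis of spatial data",
    "spatial data and neural networks",
]
tf, af = term_statistics(abstracts)
# "spatial" occurs three times in total, in both abstracts
```

Sorting the resulting counters in decreasing order of frequency yields files of the form shown in Table 1.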
WordNet, a dictionary-based lexical database, was used as a lookup table to identify parts of speech when finding noun phrase candidates (Feng and Croft, 2000). The maximum phrase length was set at five consecutive words. Sentence boundaries and word order were respected, so phrase candidates spanning two or more sentences were not included.
A trained Markov model was then applied to extract the noun phrases from the phrase candidates. Delimiting rules included stop words (commonly occurring words such as "the," "a," or "and"), numbers, punctuation (e.g., hyphens, quotation marks, periods), verb patterns, and formatting delimiters, such as table fields and section heads.
A Markov model is a statistical process in which future events are determined by the present state rather than by the path by which that state arose. The phrase detection model uses a set of states for each word position in the phrase: a single term has one set of states, while a five-word phrase has five such state sets. Each word in the vocabulary (all unique words in the collection) has a probability of occurring in each state set. Non-noun words are removed from the end of phrase candidates, and the phrases are clustered based on occurrence frequency above a threshold value. Candidates not passing the threshold are discarded, leaving the proposed phrases.
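The delimiting and length rules above can be sketched as follows. The small stop-word list and the alphabetic filter are simplified stand-ins for the full delimiting rules, and the WordNet lookup and trained Markov model are not reproduced; this only illustrates how candidates are cut at delimiters and capped at five words.

```python
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "is"}
MAX_PHRASE_LEN = 5  # phrases were capped at five consecutive words

def candidate_phrases(sentence):
    """Split one sentence at stop words and punctuation, then emit word runs
    of length 2..MAX_PHRASE_LEN as phrase candidates. Word order is preserved
    and no candidate crosses a delimiter; calling this per sentence keeps
    candidates from spanning sentence boundaries."""
    runs, current = [], []
    for token in sentence.lower().replace(",", " , ").replace(".", " . ").split():
        if token in STOP_WORDS or not token.isalpha():
            if current:
                runs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        runs.append(current)
    phrases = []
    for run in runs:
        for length in range(2, MAX_PHRASE_LEN + 1):
            for start in range(len(run) - length + 1):
                phrases.append(" ".join(run[start:start + length]))
    return phrases

candidate_phrases("the spatial analysis of digital elevation models")
# yields "spatial analysis", "digital elevation", "elevation models",
# and "digital elevation models"
```

In the full system, these candidates would then be filtered by the part-of-speech and frequency-threshold steps described above.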
Each file was then run through a simple C program that adds the frequencies of occurrence for each repeat entry. An intermediate file was produced in which identical phrase entries were summed for single abstracts; this was not done for words. The results of the summation process were then sorted according to word or phrase frequency and abstract frequency for each file. These final summation files were used for the analysis of word and phrase frequencies for each conference: "abstract frequency" in Table 1 thus represents the total number of abstracts that contained the word or phrase for an individual conference. Word or phrase frequency refers to the total number of times the word or phrase is used. The summations ignore case, so "GIS" and "gis," for example, were considered equal. Plural forms were then combined manually into one form for the phrases. Thus, "neural network" and "neural networks" have combined statistics and are entered as "neural network(s)." A test set of data was checked for accuracy, and spot checks indicated the results were satisfactory.
The resulting files, two for each conference, were reviewed, and all words and phrases that occurred in only one abstract were deleted to reduce the files to manageable size. Words and phrases meaningless in a GeoComputation context, e.g., words such as "versa" as in "vice versa" and "priori" as in "a priori," and phrases such as "paper describes," "works well," and "wide variety," were also deleted. Table 2 shows the changes in file size as these steps were accomplished. Finally, the percent frequency for each word or phrase was determined by dividing the number of abstracts in which that word or phrase occurred by the number of abstracts at that particular conference. This not only permitted the words and phrases from one set of conference abstracts to be evaluated in a semi-quantitative manner, but also allowed comparison between conferences. Because plural forms were merged with singular forms for the phrases, some inflation occurs in the phrase percent frequencies.
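The normalization step is simply the share of a conference's abstracts that contain the term, which makes counts from conferences of different sizes directly comparable. A minimal sketch (the abstract counts used here are illustrative, not the actual conference figures):

```python
def percent_frequency(abstract_count, total_abstracts):
    """Percent of a conference's abstracts that contain a given word or phrase."""
    return 100.0 * abstract_count / total_abstracts

# Illustrative only: a word appearing in 18 of 60 abstracts at a small
# conference outranks one appearing in 24 of 120 at a larger conference,
# even though the raw count is lower.
small = percent_frequency(18, 60)   # 30.0
large = percent_frequency(24, 120)  # 20.0
```

This is the quantity tabulated in Tables 3 and 4 and plotted in the figures that follow.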
Table 2. Reduction of Data File Content
Conference
Leeds
Dunedin
Bristol
Fredericksburg
Chatham

[The file sizes at each reduction step did not survive extraction.]
As noted above, in order to compare words among the five conferences, the number of abstracts in which a word occurred for each conference was normalized according to the number of abstracts presented at that conference. Table 3 lists the percent frequencies for the most frequently used words for each conference, and Table 4 shows the percent frequencies for the most frequently used words common to all five conferences. The order of importance was determined by sorting the list using mean percent frequency for each common word. We attempted to restrict ourselves to the 25 most frequently used words at each conference, but this was not possible because word 25 was typically in the middle of a list of words with the same frequency. The source data used for the word analysis is available for the reader's reference.
Table 3: Most Frequently Used Words at Each of the Five Conferences
Leeds (1996) | Dunedin (1997) | Bristol (1998) | Fredericksburg (1999) | Chatham (2000)
data | GIS | data | data | data
spatial | data | spatial | spatial | spatial
GIS | spatial | model | model | analysis
model | information | analysis | information | information
analysis | system | modelling | analysis | GIS
models | analysis | models | time | models
information | models | GIS | GIS | model
time | geographic | time | models | system
modelling | systems | scale | area | process
systems | time | area | modelling | time
number | environmental | information | process | area
areas | model | system | system | areas
system | classification | systems | number | map
process | tool | number | space | scale
area | field | process | systems | modelling
geographic | area | geographic | local | structure
processes | boundaries | processes | areas | number
range | computer | resolution | geographic | statistical
point | modelling | areas | resolution | region
statistical | tools | flow | processes | range
structure | database | space | surface | points
software | land | structure | digital | local
scale | local | environment | distribution | computer
computer | spatially | classification | features | objects
tools | areas | distribution | field | systems
interaction | dimensions | location | tools | classification
 | landscape | numerical | values | software
 | map | | | space
 | maps | | | tools
 | patterns | | | computational
 | points | | | environment
 | positioning | | | series
 | scale | | |

[The percent frequencies originally listed with each word did not survive extraction.]
|
Table 4. Percent Frequencies for the 25 Most Frequently Used Common Words
Word (in decreasing order of mean percent frequency)
data
spatial
GIS
analysis
model
information
models
time
modelling
system
systems
area
process
number
areas
tools
geographic
classification
distribution
processes
computer
field
tool
spatially
global

[The per-conference percent frequencies did not survive extraction.]
From these data, we can see that geocomputationalists are more concerned with data and the spatial nature of their data than they are with modelling, results, or applications. Our data consist of numbers, points, and models, at least some of which are digital. The spatial nature of GeoComputation is exemplified by such words as areas, maps, patterns, scale, space, distribution, location, and region. The tools we use include GIS, statistics, classification, maps, positioning, and images. We apply the tools to processes, land, landscapes, and the environment over time. We deal with information, systems, and all things geographic, and we use computers to do this.
We can also look at tools and applications in this same way. The most common tools are unspecified; the only specific tools, in addition to GIS, among the words with the highest percent frequencies, are classification and computer (Figure 2). This is because most tools are phrases, e.g., neural networks, cellular automata. Emphasis on unspecified tools was least at the Bristol conference, and greatest at the Leeds and Dunedin conferences. If one combines the frequencies for the two words (tool and tools), however, emphasis is quite consistent from conference to conference. The use of GIS was again highest at Leeds and Dunedin, and although percent frequencies were lower at the 1998, 1999, and 2000 conferences, they appear stable, ranging from about 36% to 42%, as noted above. Emphasis on classification was greatest at Dunedin; the percent frequencies at the other four conferences were similar, ranging from just under 15% to just over 17%. Percent frequencies for computer decreased consistently from 1996 through 1999, but rose slightly in 2000. Figure 2 suggests that emphasis on tools in general has diminished over time; percent frequencies for individual tools are becoming more similar.
Longley (1998) implies that GIS provides the basis for GeoComputation, yet others (e.g., the conference announcement for Dunedin) state emphatically that this is not so. Percent frequencies in Table 4 indicate that Longley's statement is true. It should be noted, however, that many references to GIS in the abstracts are made in an almost negative or derogatory way: authors typically refer to advances or improvements that their work has made to "traditional GIS." "Geographic Information Systems" is, of course, a phrase, but all usages were converted to the word "GIS" in the abstracts before word analysis was done. The words "geographic," "information," and "systems," as they appear in Table 4, are in fact separate words. Typical uses of these words are geographic information and information systems.
Only two applications appear in the list of 25 words with the highest percent frequencies: process and processes (Figure 3). Percent frequencies for both were highest at Leeds in 1996. Percent frequencies decrease at Dunedin, but then steadily increase from 1998 through 2000. Overall, percent frequencies for process and processes appear to be decreasing over time.
We can also look at words not included on the most frequently used list, for example, GeoComputation and geocomputational. Figure 4 shows how use of these terms has changed since 1996. Percent frequencies were lowest at Leeds, when the word was first introduced, then increased appreciably at Dunedin. Percent frequencies for GeoComputation decreased from 1997 through 1999, then increased substantially in 2000. Geocomputational has decreased in frequency consistently since 1997. These percent frequencies, however, are quite low. GeoComputation was mentioned in about 3% of the abstracts in 1996; in about 11%, in 1997; in about 7%, in 1998; in just over 2%, in 1999; and in about 12%, in 2000. Geocomputational appeared in about 2% of the abstracts in 1996; 12% in 1997; just over 8% in 1998; and in about 3.5% of the abstracts in 1999 and 2000. Disregarding the Dunedin and Fredericksburg conferences, the percent frequencies have been remarkably consistent from conference to conference. It is possible that the very low percent frequencies in 1999 are due to "first use" of the term in the United States; very few participants at previous conferences were Americans and many authors were thus unfamiliar with the term. GeoComputation is most commonly used in keynote presentations, in the context of definition, nature, and scope.
At this point, temporal and spatial limitations restrict the number of words that can be compared, so we have arbitrarily chosen to track from 1996 to 2000 several application areas and several tools as well as a couple of words that are of general interest to us. We look at applications in human geography and hydrology; and then at the tools statistics, artificial intelligence, and the Internet; and finally, at time and data quality.
Because the term "GeoComputation" was coined with reference to human geography (Openshaw, 2000), let us begin with this disciplinary area. There are many words from all five conferences that relate to this subject, but few are exclusive to human geography or to the various subdisciplines within human geography. We have therefore selected five words that are related to human geography and are applicable to most areas of research within this field: social, demographic, census, city, and urban. The percent frequencies for these words are shown graphically in Figure 5. But first a few caveats: none of these words occur in two or more Dunedin abstracts, so this data set is not included in the analysis; demographic is used in only one Fredericksburg abstract, and was thus not included in the data set analyzed; and city does not appear in the Bristol abstracts. All terms, with the exceptions of demographic and census, have decreased in percent frequency from 1996 to 2000. Percent frequencies for all words were lowest at Fredericksburg in 1999. Percent frequencies for social and city were highest at the Leeds conference. The highest frequency for census was at the Chatham conference, which is not surprising with the 2001 UK census imminent. The low frequency for census at the Fredericksburg conference is surprising for the reverse reason; the 2000 US census was imminent at the time. The highest percent frequency for urban occurred at the Bristol conference in 1998 and the lowest at Fredericksburg. Bristol is the largest city in which a GeoComputation conference has been held, and Fredericksburg is the smallest.
Another application area of interest is hydrology (Figure 6). Papers on hydrology have been presented at all five conferences, although there were few at Dunedin. Eight words were selected for analysis, words that occurred in at least four sets of abstracts and that are unequivocally related to hydrology. These words are: catchment, drainage, flood, hydrologic(al), river, runoff, stream, and water. Percent frequencies were highest in 1998 and lowest in 1997, when only flood and water were used. Interestingly enough, the frequencies for these two words at Dunedin, 8.9% and 13.3%, respectively, were the highest for those words at all five conferences. Percent frequencies were relatively stable for the 1996, 1999, and 2000 conferences, in which these words appeared in about 43% to 52% of the abstracts. With respect to the individual words, percent frequencies for flood and water decreased from 1996 to 2000, and those for hydrologic(al) and stream increased. Percent frequencies for runoff and catchment are generally up; river and drainage appear to be relatively stable. It thus appears that hydrology continues to be a major application area for geocomputational research.
One tool not included in the 25 words with highest percent frequencies that can be addressed with respect to single words is statistics (Figure 7). Percent frequencies for statistical and statistics were highest at the 1996 conference, and then decreased considerably at the 1997 conference, but between 1997 and 2000, percent frequencies appear to have stabilized, appearing in about 6% to 10% and about 3.5% to just under 7% of the abstracts, respectively. Five words that can unequivocally be related to statistics that occurred in abstracts from at least four conferences were chosen for analysis: correlated, multivariate, nonlinear, regression, and variance. All five words were used at the 1996, 1998, 1999, and 2000 conferences; none were used in 1997. Of these five words, only regression and variance appear to be increasing in use, and only multivariate is decreasing. Nonlinear appears to be rather unstable, whereas correlated is relatively stable. All percent frequencies are low; the only percent frequency over 25% is that for nonlinear at Fredericksburg. Mean percent frequencies are below 8.5% for the remaining four words. Traditional statistical analysis, however, has a continuing, albeit low level, presence in the conference series, and thus appears to be here to stay as a geocomputational tool.
Because artificial intelligence plays such a big role within GeoComputation, it is useful to look at tool words related to this specialty. Five words were selected for analysis: neural, fuzzy, expert, genetic, and automata (Figure 8). It is obvious that all five words are actually parts of phrases, i.e., neural network(s), fuzzy logic, expert systems, genetic algorithms and programming, and cellular automata, but we believe analysis of individual words may provide meaningful information about this area of expertise. All five words appear in the abstracts for 1996, 1998, 1999, and 2000; only neural, expert, and genetic occur in the 1997 abstracts. Percent frequencies for all five words were at their highest at Leeds. Expert and genetic decreased in percent frequency from 1996 through 1998, then increased from 1999 to 2000. Automata was relatively stable from 1996 through 1999, but decreased in frequency in 2000. Neural, the word with highest percent frequency, decreased in frequency between 1996 and 1997, and then increased from 1998 through 2000. Fuzzy decreased in frequency between 1996 and 1998, increased in 1999, but then decreased again in 2000. If one ignores the increase in 1999, percent frequencies for fuzzy seem to be decreasing over time. Overall, percent frequencies for these artificial intelligence tools decreased from 1996 to 1998, and then increased in frequency from 1998 to 2000, suggesting a renewal of interest.
One final tool that can be effectively evaluated using individual words is the Internet (Figure 9). Reference to the Internet itself and all associated terms was at a maximum at Fredericksburg and Chatham. Furthermore, percent frequencies, with the exception of the 1997 conference, have steadily increased since 1996. Four words - Internet, web, worldwide web (abbreviated www in word analysis), and online - were selected for analysis. Each word is present in the abstracts from at least three conferences. Percent frequencies are low: only web at Fredericksburg occurred in more than 20% of the abstracts. However, the number of words used at the individual conferences is increasing over time: three words appeared in the Leeds abstracts; one, in the Dunedin abstracts; three, in the Bristol abstracts; and all four, in both the Fredericksburg and Chatham abstracts. Use of the words online (which did not appear until 1998) and Internet is increasing, and the percent frequency of web is generally increasing. Only percent frequencies for worldwide web are generally decreasing. This may result from general changes in word usage over time or from lack of consistency by the person who prepared the abstracts for word and phrase analysis (JE). There are also many references to various web sites in abstracts from all five conferences. These references were not included in the analysis, primarily because they were given to provide information in addition to that found in the abstract about the author's work. Regardless of the low percent frequencies, however, we believe this is an emerging technology in the field of GeoComputation.
Time is an important concept at all five conferences. Three words that deal with time, which occurred in the abstracts of three or more conferences, were selected for analysis: time, spatiotemporal, and temporal (Figure 10). Percent frequencies for all words, except time, are low, with a maximum of just under 17% for temporal at Fredericksburg. All three words were used in the Leeds, Fredericksburg, and Chatham abstracts, and two in the Dunedin and Bristol abstracts. The highest percent frequencies were achieved at Leeds. However, mean percent frequencies have decreased over time, possibly suggesting diminishing interest. Only spatiotemporal is increasing in use. Time itself is decreasing (!!). Only temporal may be stable; the pattern for this word is irregular.
Finally, as noted by Brooks and Anderson (1998), data quality is a key issue in GeoComputation. So before moving to the analysis of phrases, we look at how GeoComputation researchers treat data with reference to data quality, accuracy, errors, and uncertainty (Figure 11). Five words were selected for analysis - quality, accurate, error, errors, and uncertainty - although the word quality may not be unequivocal in this context. The five words were used at the 1996, 1998, 1999, and 2000 conferences; only four were used in 1997. In general, quality is decreasing over time. However, percent frequencies for uncertainty and accurate increased from 1996 through 1999, although they decreased in 2000. Error appears relatively stable, but percent frequencies for errors are decreasing. With the emphasis on modelling in GeoComputation, often using synthetic data or data over which the modeller has little to no control (e.g., data from various Internet sites, digital elevation data, or census data), an understanding of data quality, and the ability to address accuracy, error, and uncertainty quantitatively, is often of crucial importance. One hopes the decreases in the percent frequencies for these words at Chatham are aberrations.
Several caveats are required before we begin analysis of phrases. First, spot checks of the phrase source data show that what the software identified as a phrase is not necessarily one. Examples of such errors include "biochemistry exhibiting reflectance," "dimensional topological," and "important research remains." This problem is at least partly due to the complexity of the English language: "remains," for example, can be either a noun or a verb, and in this case the software identified a verb as a noun, producing a meaningless phrase. Second, we have found that certain phrases of interest to us, such as high performance computing and exploratory data analysis, were not identified by the software. Third, the software is arbitrary in its identification process. For example, we are interested in the phrase "artificial intelligence." Let us say that the software identified this phrase in two abstracts at one conference. But it also identified the words "artificial" and "intelligence" as parts of a different phrase, e.g., "artificial intelligence technologies," in two abstracts. We cannot combine these and say the phrase "artificial intelligence" occurs in four abstracts because we do not know whether the two abstracts in which "artificial intelligence" occurs are the same as the two in which "artificial intelligence technologies" occur. We could determine this by checking every single phrase of interest in every abstract, but because of the laborious nature of this task, we chose to use the results of phrase analysis as is. In this example, then, we would say there were only two occurrences of the phrase "artificial intelligence." Finally, it must be remembered that our data set includes only those phrases that occur in two or more abstracts at each conference. The results of phrase analysis discussed below must thus take these caveats into consideration.
Table 5A. Most Frequently Used Phrases, 1996, 1997, and 2000
(phrases listed in rank order, most frequent first)

1996 (Leeds): spatial data; data set(s); neural network(s); spatial analysis; spatial object(s); spatial distribution; knowledge base; digital elevation model(s); geographic data; genetic algorithm; expert system(s); cellular automata; data model(s); data analysis; data structure(s); spatial relations; sensitivity analysis; fractal dimension; genetic programming; statistical analysis; elevation model(s); time series; raster GIS; mathematical model(s); artificial intelligence; analytical tool(s); modelling tools; physical processes; regression analysis; spatial information; spatial reasoning; spatial relationship(s); statistical model(s); mathematical modelling; regional scale(s)

1997 (Dunedin): spatial data; spatial information; spatial analysis; neural networks; quantitative revolution; digital elevation; aerial photographs; geographic space; geographic data; spatial dimensions; computational geography; resource management; expert systems

2000 (Chatham): spatial data; neural networks; data set(s); spatial analysis; genetic algorithm(s); GIS software; Voronoi diagram(s); census data; cluster analysis; data structure; digital elevation models; geographic information; time series; catchment area; cellular automata; computational tool(s); computer simulation; correlation coefficient; data mining; digital photogrammetry; earth's surface; economic variables; elevation model(s); GeoComputation techniques; GIS technology; GIS application; grid modelling; predictive models; proximity relations; satellite images; self-organizing map; spatial object(s); spatial distribution; spatiotemporal data; SPOT image(s); statistical analysis; structural analysis; temporal scale(s); transition rules; urban system(s); vector data; visual basic; Voronoi modelling
Table 5B. Most Frequently Used Phrases, 1998 and 1999
(phrases listed in rank order, most frequent first)

1998 (Bristol): spatial data; spatial analysis; data set(s); neural network(s); spatial relations; spatial distribution(s); data model(s); digital elevation model(s); cellular automata; enumeration district(s); spatial interaction models; spatial pattern(s); spatial resolution; spatial scale(s); spatial accuracy; rainfall amounts; geographic analysis; catchment area(s); urban area(s); drainage basin; climate change(s); social characteristics; land cover; attribute data; census data; geographic data; spatial databases; fractal dimension(s); GIS environment; soil erosion; geometric feature(s); spatial features; visibility index(ices); spatial information; data integration; human intervention; visual intrusion; process laws; elevation model; computer models; hydrologic model(s); numerical model(s); spatial model(s); hydrologic modelling; numerical modelling; spatial objects; GIS packages; river flow prediction; pore water pressure; house prices; new road(s); transition rules; factor of safety; catchment scale; geographic space; urban space(s); gauging stations; time step; earth's surface; support system; graph theory; visualization tools; sediment transport; soil type(s); spatial variability; spatial variation(s)

1999 (Fredericksburg): data set(s); spatial data; digital elevation model(s); neural network(s); remote sensing; spatial distribution; spatial analysis; fuzzy logic; elevation models; hydrologic modelling; triangulated irregular network; geographic space; spatial statistic(s); data analysis; urban areas; cellular automata; land cover; time period(s); time step(s); operational system(s); decision tree(s); data warehouse; economic activity; genetic algorithm; propagation algorithm(s); spatial autocorrelation; population census; fuzzy classification; statistical classifier(s); supervised classifier; thin clients; GIS community; environmental conditions; spatial correlation; feature data; geographic data; GIS data; hyperspectral data; rainfall data; sample data; sensed data; simulated data; training data; drainage density; urban development; standard deviation; spatial domain; natural environment(s); geological features; linear feature(s); surface features; water flow; vector format; power function(s); data fusion; hyperspectral imagery; spatial information; spatial interpolation techniques; ordinary kriging; hidden layer(s); edge length(s); maximum likelihood; decision maker(s); binary map(s); geologic maps; topographic maps; error matrix(ces); statistical methods; cellular model; data model(s); dimensional models; environmental models; spatial modelling; road network(s); mineral occurrence; field of view; predictor pattern(s); spatial patterns; training phase; gold potential; classification process; spatial processes; surface processes; physical properties; temporal resolution(s); temporal scale(s); training set; cell space; slope stability; cell states; topological structure; spatial structure(s); decision support; earth's surface; self-organizing system; GIS system(s); information system(s); GIS technology; classification tree(s); Delaunay triangulation; ground truth; data values; pixel values
Eleven tools - neural networks, knowledge-based (systems), genetic algorithms, expert systems, cellular automata, genetic programming, raster GIS, artificial intelligence, various unspecified analytical and modelling tools, and spatial reasoning - were used at the Leeds conference. The types of analysis used were spatial, data, sensitivity, statistical, and regression analysis, as well as mathematical modelling. The data used comprised spatial data, data sets, spatial objects, digital elevation models, geographic data, data models and structures, elevation models, time series, and mathematical and statistical models. Results obtained included spatial distributions and relations, fractal dimensions, and spatial information and relationships. This research was applied to physical processes and done at regional scales. If we look at each of the categories in Table 6, we can gain some understanding from these phrases with respect to emphasis at the 1996 conference. The mean ranks (ranks are given in ascending order, so the lowest mean rank is "best") suggest that the 1996 conference was a data-based conference. Next in importance to the data used were tools and analysis and modelling. No products occur among the 35 phrases, and only one application (physical processes).
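The mean-rank comparison underlying Tables 6-10 can be sketched as follows. The ranks and category assignments below are hypothetical stand-ins, not the actual 1996 values:

```python
from statistics import mean

# Each frequent phrase carries its frequency rank (1 = most frequent)
# and a hand-assigned category; a category's mean rank summarizes how
# much emphasis it received at a conference.
ranked_phrases = [
    (1, "spatial data", "data"),
    (2, "data set(s)", "data"),
    (3, "neural network(s)", "tools"),
    (4, "spatial analysis", "analysis and modelling"),
    (5, "spatial object(s)", "data"),
]

def mean_rank_by_category(phrases):
    cats = {}
    for rank, _, cat in phrases:
        cats.setdefault(cat, []).append(rank)
    return {cat: mean(ranks) for cat, ranks in cats.items()}

ranks = mean_rank_by_category(ranked_phrases)
# The lowest mean rank ("data" here) marks the dominant category.
```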
Table 6. Comparison of Categories, Leeds, 1996
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 7. Comparison of Categories, Dunedin, 1997
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 8. Comparison of Categories, Bristol, 1998
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 9. Comparison of Categories, Fredericksburg, 1999
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 10. Comparison of Categories, Chatham, 2000
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications
Table 11. Percent Frequencies for Common Phrases
Phrases: spatial data; spatial analysis; neural network(s)
Table 12. Percent Frequencies for the Types of Data Used by GeoComputation Researchers

Data: spatial data; geographic data; feature data; census data; training data; vector data; GIS data; attribute data; spatiotemporal data; socioeconomic data; raster data; simulated data; sample data; rainfall data; hyperspectral data; (remotely) sensed data; synthetic data; qualitative data; multivariate data; missing data; digital data; aggregate data; (geo)referenced data

Databases: spatial databases; geographic databases; relational databases

Data sets: data sets; training (data) set; geographic data sets
We also deal with models and modelling - a large number of different types - as shown in Table 13. It is obvious from these data that the word model means very different things to different people. No references to specific model(s) and/or modelling were made in more than one abstract in 1997. The most common types of models at the other four conferences are digital elevation and data models, which occur in the abstracts of four conferences, followed closely by elevation models, which occur in the abstracts of three. No pattern emerges from these data, however, because of the absence of references in the Dunedin abstracts, the very small number of references in the Chatham abstracts, and the overall low percent frequencies. In addition to these three, the only types of models mentioned in more than one set of abstracts are computer models, spatial interaction models, spatial models, and hydrologic models. With respect to modelling, hydrologic modelling is more important than any other type, and its importance appears to be increasing, with the exception of Chatham. Percent frequencies for all types of modelling are typically low. In addition to hydrologic modelling, only spatial modelling appears in more than one set of conference abstracts. No trends can thus be identified.
Table 13. Percent Frequencies for the Types of Models and Modelling Used by GeoComputation Researchers

Models: digital elevation model(s); data model(s); elevation model(s); computer model(s); spatial interaction models; spatial model(s); hydrologic models; mathematical model(s); statistical model(s); grid models; geographic model(s); hybrid model(s); cellular model; dimensional models; environmental models; numerical model(s); forest fire model; dynamic model(s); fractal model; fuzzy model; fuzzy logic model(s); regression models; tessellation models

Modelling: hydrologic modelling; spatial modelling; mathematical modelling; Voronoi modelling; elevation modelling; computer modelling; numerical modelling; spatial process modelling; environmental modelling; fuzzy modelling; dimensional modelling; fuzzy logic modelling; spatial interaction modelling
Table 14. Percent Frequencies for the Types of Systems Used by GeoComputation Researchers

Systems: expert system(s); information system(s); decision support system; global positioning system; operational system(s); earth's system(s); urban systems; prototype system; dynamical system(s); GIS system(s); self-organizing system; computer system(s); geographic systems; integrated system; spatial information system
Table 15. Percent Frequencies for the Tools Used by GeoComputation Researchers

Tools: spatial analysis; GIS; neural networks; statistical analysis; remote sensing; algorithms (various types); knowledge base & expert (systems); maps (various types); cellular automata; classification (various types); fractal analysis; fuzzy methods; rules; visualization; web software; artificial intelligence; cluster analysis; geographic/data mining; sensitivity analysis; (object-)oriented programming; geocomputational technology; numerical automata; computational tools; inference engine; ordinary kriging; intelligent agents; model breeding; computer graphics; genetic programming; virtual reality; computer systems
Spatial analysis and GIS are by far the most important tools used in GeoComputation. References to spatial analysis include spatial correlation, distributions, and statistics; spatial operators, processes, and reasoning; and spatial interpolation techniques. Specific phrase references to GIS include GIS functions; ArcInfo and MapInfo; commercial, (two- and three-) dimensional, and raster GIS; the GIS literature; and GIS packages, software, systems, tools, techniques, and technologies. Next in importance are neural networks and statistical analysis, which rank third and fourth, respectively. References to statistical analysis include (linear) regression; statistical and frequency distributions; power functions; maximum likelihood classifiers; statistical methods and modelling; and summary statistics. The types of algorithms referred to include genetic, neural, parallel, propagation, and serial. Remote sensing references include satellite information, SPOT and hyperspectral imagery, and aerial photographs. Classification tools include fuzzy and statistical classifiers, supervised classification, and the classification process; the various types of maps are binary, two- and three-dimensional, geologic, self-organizing, systemic, and topographic. Fractal analysis includes fractal dimension and fractal geometry; fuzzy methods include fuzzy logic and fuzzy set theory; and rules are either simple or transition. Visualization includes tools and methods and dynamic visualizations. Mean percent frequencies for each tool or tool category are listed in Table 15; those with mean percent frequencies equal to or greater than 1% are shown graphically in Figure 15.
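The mean percent frequencies reported in Table 15 average each tool's per-conference values; a minimal sketch, with invented frequencies for one tool:

```python
def mean_percent_frequency(values):
    """Mean of a tool's percent frequencies over the conferences
    at which the tool phrase was recorded."""
    return sum(values) / len(values)

# Hypothetical percent frequencies for one tool at the five conferences.
spatial_analysis = [2.1, 1.8, 2.4, 2.0, 1.7]
mean_pf = mean_percent_frequency(spatial_analysis)

# Tools with a mean percent frequency of 1% or more are the ones
# shown graphically in Figure 15.
prominent = mean_pf >= 1.0
```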
Figure 16 shows the percent frequencies for the tool phrases and categories that occur in two or more sets of abstracts from the five conferences. The only tool phrases or categories that occur at all five conferences are spatial analysis and neural networks; references to GIS, remote sensing, and cellular automata occur at four conferences; and statistical analysis, and various algorithms and maps were mentioned at three conferences. The remaining 21 tools on the list are mentioned in two sets of abstracts, suggesting a large amount of diversity in tool use among geocomputationalists.
Table 16. Applications of GeoComputation, 1996-1999

Applications: hydrology; time; the environment; earth sciences; urban areas; decision support; soil; population census; processes; land cover; climate changes; ecological studies; social sciences; roads; human activities; economic activity; decision maker(s); visual intrusion; artificial life; agricultural production
The most frequent applications of GeoComputation research involve hydrology and time, followed by environmental applications and applications in the earth sciences, excluding soil. The frequency of phrases relating to hydrology - which include catchment areas and scale; drainage basins, density, and networks; water flow; hydrologic modelling; river flow prediction; and rainfall amounts - increases from 1996 to 1998, then decreases in 1999. References to time, which include the right time; real time; time steps, series, periods, and scales; time-consuming processes; and time complexity, are most frequent at Leeds in 1996, decrease in frequency at Bristol in 1998, and then increase at Fredericksburg in 1999, giving some support to Macmillan's (1998) contention that space-time is increasing in importance in geography. Only two application categories show an increase in interest: land cover, which includes land cover and land cover classification, and the earth sciences, including sediment transport, the earth's surface, gold potential, mineral occurrences and deposits, physical properties, and slope stability and angle. References to the environment, which include environmental change, impacts, conditions, and modelling; the natural environment; environmental resource management; and environmental processes, decrease from 1996 to 1999, suggesting either that interest in the environment is decreasing or that research interests are becoming more specific. The frequency of phrases referring to soil - soil properties and types, pore water pressure, and soil erosion - increases from 1996 to 1998, but decreases in 1999. References to processes (biological, chemical, physical, and surface) and to climate change decrease from 1996 to 1999.
Frequencies for applications in human geography generally decrease from 1996 to 1999. References to urban areas, which include urban density, development, and planning, follow a pattern similar to that for time - they are most frequent in 1996, decrease in 1998, and then increase in 1999. References to decision support increase from 1996 to 1999, but occur only in the Leeds and Fredericksburg abstracts. References to roads, which include new roads and road networks, are not present in the 1996 and 1997 abstracts, and frequencies increase from 1998 to 1999. The frequency of phrases referring to the social sciences and social characteristics decreases from 1996 to 1998; such phrases do not occur in the 1999 abstracts. References to the census also follow this pattern, which is surprising given the US census in 2000 and the UK census in 2001. And lastly, the frequency of human activities, which is very low at both Leeds and Bristol, drops to zero at Fredericksburg.
Interest in human geography, based on the five words analyzed, has decreased slightly over time, but human geography is obviously an important and continuing presence in GeoComputation. Interest in hydrology has been somewhat erratic, but hydrology has more or less maintained its status as the most important application area in GeoComputation since 1996. The use of statistics has varied from conference to conference, but emphasis appears to be increasing slightly over time. Artificial intelligence has had an important presence at all five conferences. Although interest appears to have waned a bit between 1996 and 1998, based on the five words we analyzed, it appears to have been increasing since then. The Internet in particular appears to be an emerging tool in GeoComputation, with percent frequencies generally increasing since 1996 for the four words we analyzed. Interest in time appears to have decreased between 1996 and 2000. Interest in data quality generally increased from 1996 to 1999, but unfortunately decreased in 2000.
No trends are evident among the different types of data used at the five conferences; there is, nonetheless, some variation among them. Spatial data, geographic data, census data, and feature data were the most frequently used data phrases, although only spatial data were referred to in all five sets of abstracts. Different types of data were typically mentioned at each conference, most occurring in only one or two sets of abstracts. Emphasis on modelling has remained steady since the GeoComputation conference series was initiated in 1996, although very few types of models or modelling were mentioned in more than one abstract, again emphasizing the diversity in GeoComputation. Although no one type of system has been in continuing use since 1996, reference to various types of systems has remained steady.
Tools have been of utmost importance at all five conferences - two of the five were tool-based. Spatial analysis is by far the most frequently used tool in GeoComputation, although percent frequencies have varied considerably from conference to conference. GIS, neural networks, and remote sensing were also important; each was mentioned in four sets of abstracts. Of the remaining tools identified by the phrase analysis software, only statistical analysis, various algorithms, maps, and cellular automata were mentioned in the abstracts of three conferences. Only two trends are suggested for tools that occurred in three or more sets of abstracts: interest in remote sensing appears to have been increasing since 1996, and interest in statistical analysis, although continuing, appears to be decreasing.
With respect to applications, no application phrases were identified by the phrase analysis software among the 2000 abstracts, so the following refers only to 1996 through 1999. Only hydrology, time, the environment, the earth sciences, and urban areas, all of which are categories, were referred to in three sets of abstracts. The patterns of percent frequencies for all applications are irregular, with the possible exception of the environment, which may be decreasing. No general trends can be identified for applications.
Finally, it is difficult to relate Openshaw's (2000) definition to the findings reported here, because his definition is abstract and futuristic. We agree that GeoComputation researchers deal with a wide range of geographic problems (and this range appears to be increasing), but whether or not this work is being done within the computational science paradigm cannot be directly addressed using our approach. An additional problem is that his definition is predicated upon the use of high performance computing now and in the future: no such phrase was identified by the software as occurring in two or more abstracts at any conference. So perhaps Openshaw's futuristic view of GeoComputation is just that: futuristic.
So how do we describe the scope of GeoComputation? Based on analyses of words and phrases in the abstracts of papers presented at the five conferences between 1996 and 2000, GeoComputation is the practice of analyzing and modelling spatial data, often with GIS and/or neural networks, but using both traditional and newly developed (and developing) computer-based tools as well. These procedures are applied primarily to complex problems involving the analysis of large and complicated data sets or databases, e.g., census data and digital elevation models. Many of the problems addressed could not previously be analyzed because of their complexity and the massive computing power required to handle the data. Overall, the diversity and breadth of GeoComputation are increasing and evolving over time and will continue to do so in the future.
Couclelis, H., 1998, Geocomputation in context: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 17-29.
Diaz, J., Tynes, R., Caldwell, D., and Ehlen, J., eds., 1999, GeoComputation 99, Proceedings of the 4th International Conference on GeoComputation, 25-28 July, 1999, Mary Washington College, Fredericksburg, Virginia: Greenwich, U.K., GeoComputation CD-ROM.
Feng, F. and Croft, W.B., 2000, Probabilistic Techniques for Phrase Extraction: Amherst, MA, University of Massachusetts, Center for Intelligent Information Retrieval, Department of Computer Science, Technical Report IR-187.
Gahegan, M., 1999, What is Geocomputation? Transactions in GIS, vol. 3 (3), pp 203-206.
Longley, P.A., 1998, Foundations: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 3-15.
Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., eds., 1998, Geocomputation, A Primer: Chichester, John Wiley and Sons, 278 p.
Macmillan, B., 1998, Epilogue: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 257-264.
Openshaw, S., 2000, GeoComputation: in GeoComputation, edited by Openshaw, S. and Abrahart, R.J., London, Taylor and Francis, pp 1-31.
Openshaw, S. and Abrahart, R.J., 2000, Preface: in GeoComputation, edited by Openshaw, S. and Abrahart, R.J., London, Taylor and Francis, pp ix-xii.
Pascoe, R.T., ed., 1997, GeoComputation 97, Proceedings of the second annual conference, 26-29 August, 1997, University of Otago, Dunedin, New Zealand: Dunedin, University of Otago, 411 p.