Stephen Harding
Center for Intelligent Information Retrieval, Department of Computer
Science, University of Massachusetts, Amherst, MA 01003-4610
E-mail: harding@ciirsrv.cs.umass.edu
However, we believe that the cumulative body of research as expressed in the abstracts of the papers, posters, and keynote addresses from the five GeoComputation conferences may best characterize GeoComputation, not the work or definition of any one individual. Consequently, this paper does not attempt to define GeoComputation per se, but explores the scope or nature of GeoComputation by examining the body of research presented at the five conferences between 1996 and 2000 at the University of Leeds, UK, the University of Otago, NZ, the University of Bristol, UK, and Mary Washington College, USA, as well as most abstracts submitted for the conference at the University of Greenwich, UK. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do.
Text analysis software developed by the Center for Intelligent Information Retrieval at the University of Massachusetts was used for the analysis. Word and phrase frequencies in the abstracts for each conference were analyzed separately and then compared. The results provide insight into GeoComputation by describing the range of research topics, core technologies, and concepts encompassed by GeoComputation. General trends and patterns are identified and defined in a semi-quantitative manner.
In the Epilogue of Geocomputation: A Primer, the same book in which Couclelis' paper appears, Macmillan (1998) essentially takes issue with Couclelis' position. He believes that GeoComputation includes the latest forms of computational geography and that it is not an incremental development. He accepts that sound theory is needed, but believes that it has to some extent already been provided by Openshaw, at least as a form of inductivism. Macmillan's "definition" of GeoComputation is much broader than that suggested by Couclelis; he believes GeoComputation ". . . is concerned with the science of geography in a computationally sophisticated environment." (p. 258).
Gahegan (1999), like Couclelis, sees the concern of GeoComputation as ". . . to enrich geography with a toolbox of methods to model and analyze a range of highly complex, often non-deterministic problems." (p. 204). But he views GeoComputation as an enabling technology, one needed to fill the ". . . gap in knowledge between the abstract functioning of these tools . . . and their successful deployment to the complex applications and data sets that are commonplace in geography." (p. 206). He also lists a series of challenges that GeoComputation must overcome, but these problems involve the application of the sophisticated tools available to GeoComputation researchers, as well as the complex problems associated with handling large, unwieldy data sets. Gahegan's is a practical approach to GeoComputation, but one with promise and vision, different from Couclelis' philosophical, possibly pessimistic, perspective.
Now to the work of Stan Openshaw, who, if anyone can be so-called, is the father of GeoComputation. In the Preface to GeoComputation, Openshaw and Abrahart (2000) define GeoComputation as a fun, new word. They see GeoComputation as a follow-on revolution to GIS; once the GIS databases are set up and expanded, GeoComputation takes over. They state that "GeoComputation is about using the various different types of geo-data and about developing relevant geo-tools within the overall context of a 'scientific' approach." (p. ix); it is about solving all types of problems, converting computer "toys" into useful tools that can be usefully applied. And it is about using existing tools to accomplish this and finding new uses for existing tools. They also link GeoComputation to high performance computing. As both Couclelis and Gahegan did, Openshaw and Abrahart list a series of challenges for GeoComputation, but challenges inherent to GeoComputation, not challenges that GeoComputation must overcome to survive.
Openshaw (2000) states that GeoComputation ". . . can be regarded . . . as the application of a computational science paradigm to study a wide range of problems in geographical and earth systems . . . contexts." (p. 3). He identifies three aspects that make GeoComputation special. The first is emphasis on "geo" subjects, i.e., GeoComputation is concerned with geographical or spatial information. Second, the intensity of the computation required is distinctive. It allows new or better solutions to be found for existing problems, and also lets us solve problems heretofore insoluble. Finally, GeoComputation requires a unique mind set, because it is based on ". . . substituting vast amounts of computation as a substitute for missing knowledge or theory and even to augment intelligence." (p. 5). Openshaw clearly sees GeoComputation as dependent upon high performance computing, as suggested above. He sees the challenge for GeoComputation as developing the new ideas, methods, and paradigms needed to use increasing computer speeds to do useful science in a variety of geo contexts.
Openshaw (2000) also looks at definitions of GeoComputation presented by Couclelis (1998), Longley (1998), and Macmillan (1998). He disagrees with Couclelis: GeoComputation is not just using computational techniques to solve spatial problems, it is a major paradigm shift affecting how the computing is applied. Openshaw sees GeoComputation as a much bigger thing, i.e., ". . . the use of computation as a front-line problem-solving paradigm which offers a new perspective and a new paradigm for applying science in a geographical context." (p. 8). Openshaw basically agrees with Macmillan's description of the nature of GeoComputation, but he would probably disagree with the scope: he views GeoComputation as something much larger and broader. GeoComputation relies on the potential of applying high performance computing to solve currently unsolvable or even unknown problems. It awaits the involvement of appropriately innovative, forward-thinking 'geocomputationalists' to achieve that potential.
Openshaw (2000) also notes that not all researchers agree with his definition of GeoComputation. He believes that this may be because other definitions, such as that of Couclelis, focused on the contents of presentations made at the previous GeoComputation conferences. He believes the definition should instead be developed in a more abstract, top-down manner.
These top-down definitions rely on each author's perspective, based on their personal backgrounds in geography, to place GeoComputation in the overall context of geographic and computational research. Longley (1998), however, states that at this point we must assume that GeoComputation is ". . . what its researchers and practitioners do, nothing more, nothing less . . ." (p. 9). Regardless of Openshaw's (2000) comments about this type of definition, we agree with Longley's statement and therefore use the cumulative body of research as expressed in the abstracts of the papers, posters, and keynote addresses from the GeoComputation conferences to characterize GeoComputation. We do not attempt to define GeoComputation, but explore its scope or nature by examining the body of research presented at the five conferences held at the University of Leeds, UK, in 1996; at the University of Otago, Dunedin, NZ, in 1997; at the University of Bristol, UK, in 1998; at Mary Washington College, Fredericksburg, USA, in 1999; and at the University of Greenwich, Chatham, UK, in 2000. We attempt to determine "what's in" and "what's out," as well as evaluating more subtle changes in research emphasis over time. In other words, this is a bottom-up approach: we look at GeoComputation in terms of what GeoComputation researchers say they do (in the context of acceptable material as determined by the individual conference organizers) and use this information to elucidate future trends. What makes this approach different is that we analyze the abstracts of the five conferences in a semi-quantitative way.
The abstracts from the Leeds, Bristol, and Fredericksburg conferences were transferred from the GeoComp 99 CD-ROM proceedings (Diaz, et al., 1999) to a word processor. Abstracts for the Bristol keynote lectures and those from the Dunedin conference were entered manually from Longley, et al. (1998) and Pascoe (1997), respectively. Those from the Chatham conference were downloaded from a web site set up by the conference organizers for review by the International Steering Committee for GeoComputation or received directly from the conference organizers by email. One file was made for each conference. These files were then edited to remove all parentheses, numbers, equations, special characters, bolding, underlining, and italics. All references cited in the text or at the end of an abstract were removed, as were place names and the names of individuals (e.g., "Horton" as in "Horton's method") and institutions. All acronyms and abbreviations were written out in full, except for "GIS" and "www." Finally, the English was standardized using the UK English option in the word processor spell checker. The files were saved in ASCII format and sent to the Center for Intelligent Information Retrieval in the Department of Computer Science at the University of Massachusetts for word and phrase analysis.
Two files were generated for each conference, one containing words and the other, phrases. Each file consists of the list of terms or phrases sorted in decreasing order according to the number of times that word or phrase is used in the abstracts and the number of abstracts in which each word or phrase occurs, as shown in Table 1. The phrase "spatial analysis" thus occurs nine times in eight abstracts and the phrase "data model(s)," six times in four abstracts.
Table 1. Sample Word and Phrase Frequencies

Phrase | Phrase Frequency | Abstract Frequency
spatial analysis | 9 | 8
digital elevation model(s) | |
spatial variation(s) | |
functional pattern | |
data model(s) | 6 | 4
geographic space | |
visibility index(ices) | |

[Frequency values other than those quoted in the text did not survive extraction.]
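The counting step that produces files like Table 1 can be sketched in a few lines; the sketch below assumes a naive whitespace tokenizer rather than the actual CIIR software, but it shows the two statistics involved: total occurrences of a term, and the number of abstracts containing it.

```python
from collections import Counter

def term_statistics(abstracts):
    """Return (term_frequency, abstract_frequency) over a list of abstract texts."""
    term_freq = Counter()      # total occurrences of each term
    abstract_freq = Counter()  # number of abstracts containing each term
    for text in abstracts:
        tokens = text.lower().split()      # naive tokenizer; the CIIR tools are more elaborate
        term_freq.update(tokens)
        abstract_freq.update(set(tokens))  # count each abstract at most once per term
    return term_freq, abstract_freq

# Illustrative input, not actual conference abstracts:
abstracts = [
    "spatial analysis of spatial data",
    "spatial data and neural networks",
]
tf, af = term_statistics(abstracts)
# "spatial" occurs three times in total, in both abstracts
```

Sorting the resulting counters in decreasing order of frequency yields files of the form shown in Table 1.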
WordNet, a dictionary-based lexical database, was used as a lookup table to identify parts of speech when finding noun phrase candidates (Feng and Croft, 2000). The maximum phrase length was set at five consecutive words. Sentence boundaries and word order were respected, so phrase candidates spanning two or more sentences were not included.
A trained Markov model was then applied to extract the noun phrases from the phrase candidates. Delimiting rules included stop words (commonly occurring words such as "the," "a," or "and"), numbers, punctuation (e.g., hyphens, quotation marks, periods), verb patterns, and formatting delimiters, such as table fields and section heads.
A Markov model is a statistical process in which future events are determined by the present state rather than by the path by which that state arose. The phrase detection model uses a set of states for each word position in the phrase: a single term has one set of states, while a five-word phrase has five such state sets. Each word in the vocabulary (all unique words in the collection) has a probability of occurring in each state set. Non-noun words are removed from the end of phrase candidates, and the phrases are clustered based on occurrence frequency above a threshold value. Candidates not passing the threshold are discarded, leaving the proposed phrases.
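The delimiting and length rules above can be sketched as follows. The small stop-word list and the alphabetic filter are simplified stand-ins for the full delimiting rules, and the WordNet lookup and trained Markov model are not reproduced; this only illustrates how candidates are cut at delimiters and capped at five words.

```python
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "in", "to", "is"}
MAX_PHRASE_LEN = 5  # phrases were capped at five consecutive words

def candidate_phrases(sentence):
    """Split one sentence at stop words and punctuation, then emit word runs
    of length 2..MAX_PHRASE_LEN as phrase candidates. Word order is preserved
    and no candidate crosses a delimiter; calling this per sentence keeps
    candidates from spanning sentence boundaries."""
    runs, current = [], []
    for token in sentence.lower().replace(",", " , ").replace(".", " . ").split():
        if token in STOP_WORDS or not token.isalpha():
            if current:
                runs.append(current)
            current = []
        else:
            current.append(token)
    if current:
        runs.append(current)
    phrases = []
    for run in runs:
        for length in range(2, MAX_PHRASE_LEN + 1):
            for start in range(len(run) - length + 1):
                phrases.append(" ".join(run[start:start + length]))
    return phrases

candidate_phrases("the spatial analysis of digital elevation models")
# yields "spatial analysis", "digital elevation", "elevation models",
# and "digital elevation models"
```

In the full system, these candidates would then be filtered by the part-of-speech and frequency-threshold steps described above.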
Each file was then run through a simple C program that adds the frequencies of occurrence for each repeat entry. An intermediate file was produced in which identical phrase entries were summed for single abstracts; this was not done for words. The results of the summation process were then sorted according to word or phrase frequency and abstract frequency for each file. These final summation files were used for the analysis of word and phrase frequencies for each conference: "abstract frequency" in Table 1 thus represents the total number of abstracts that contained the word or phrase for an individual conference. Word or phrase frequency refers to the total number of times the word or phrase is used. The summations ignore case, so "GIS" and "gis," for example, were considered equal. Plural forms were then combined manually into one form for the phrases. Thus, "neural network" and "neural networks" have combined statistics and are entered as "neural network(s)." A test set of data was checked for accuracy, and spot checks indicated the results were satisfactory.
The resulting files, two for each conference, were reviewed, and all words and phrases that occurred in only one abstract were deleted to reduce the files to manageable size. Words and phrases meaningless in a GeoComputation context, e.g., words such as "versa" as in "vice versa" and "priori" as in "a priori," and phrases such as "paper describes," "works well," and "wide variety," were also deleted. Table 2 shows the changes in file size as these steps were accomplished. Finally, the percent frequency for each word or phrase was determined by dividing the number of abstracts in which that word or phrase occurred by the number of abstracts at that particular conference. This not only permitted the words and phrases from one set of conference abstracts to be evaluated in a semi-quantitative manner, but also allowed comparison between conferences. Because plural forms were merged with singular forms for the phrases, some inflation occurs in the phrase percent frequencies.
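The normalization step is simply the share of a conference's abstracts that contain the term, which makes counts from conferences of different sizes directly comparable. A minimal sketch (the abstract counts used here are illustrative, not the actual conference figures):

```python
def percent_frequency(abstract_count, total_abstracts):
    """Percent of a conference's abstracts that contain a given word or phrase."""
    return 100.0 * abstract_count / total_abstracts

# Illustrative only: a word appearing in 18 of 60 abstracts at a small
# conference outranks one appearing in 24 of 120 at a larger conference,
# even though the raw count is lower.
small = percent_frequency(18, 60)   # 30.0
large = percent_frequency(24, 120)  # 20.0
```

This is the quantity tabulated in Tables 3 and 4 and plotted in the figures that follow.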
Table 2. Reduction of Data File Content
Conference
Leeds
Dunedin
Bristol
Fredericksburg
Chatham

[The file sizes at each reduction step did not survive extraction.]
As noted above, in order to compare words among the five conferences, the number of abstracts in which a word occurred for each conference was normalized according to the number of abstracts presented at that conference. Table 3 lists the percent frequencies for the most frequently used words for each conference, and Table 4 shows the percent frequencies for the most frequently used words common to all five conferences. The order of importance was determined by sorting the list using mean percent frequency for each common word. We attempted to restrict ourselves to the 25 most frequently used words at each conference, but this was not possible because word 25 was typically in the middle of a list of words with the same frequency. The source data used for the word analysis is available for the reader's reference.
Table 3: Most Frequently Used Words at Each of the Five Conferences
Leeds (1996) | Dunedin (1997) | Bristol (1998) | Fredericksburg (1999) | Chatham (2000)
data | GIS | data | data | data
spatial | data | spatial | spatial | spatial
GIS | spatial | model | model | analysis
model | information | analysis | information | information
analysis | system | modelling | analysis | GIS
models | analysis | models | time | models
information | models | GIS | GIS | model
time | geographic | time | models | system
modelling | systems | scale | area | process
systems | time | area | modelling | time
number | environmental | information | process | area
areas | model | system | system | areas
system | classification | systems | number | map
process | tool | number | space | scale
area | field | process | systems | modelling
geographic | area | geographic | local | structure
processes | boundaries | processes | areas | number
range | computer | resolution | geographic | statistical
point | modelling | areas | resolution | region
statistical | tools | flow | processes | range
structure | database | space | surface | points
software | land | structure | digital | local
scale | local | environment | distribution | computer
computer | spatially | classification | features | objects
tools | areas | distribution | field | systems
interaction | dimensions | location | tools | classification
 | landscape | numerical | values | software
 | map | | | space
 | maps | | | tools
 | patterns | | | computational
 | points | | | environment
 | positioning | | | series
 | scale | | |

[The percent frequencies originally listed with each word did not survive extraction.]
|
Table 4. Percent Frequencies for the 25 Most Frequently Used Common Words
Word (in decreasing order of mean percent frequency)
data
spatial
GIS
analysis
model
information
models
time
modelling
system
systems
area
process
number
areas
tools
geographic
classification
distribution
processes
computer
field
tool
spatially
global

[The per-conference percent frequencies did not survive extraction.]
From these data, we can see that geocomputationalists are more concerned with data and the spatial nature of their data than they are with modelling, results, or applications. Our data consist of numbers, points, and models, at least some of which are digital. The spatial nature of GeoComputation is exemplified by such words as areas, maps, patterns, scale, space, distribution, location, and region. The tools we use include GIS, statistics, classification, maps, positioning, and images. We apply the tools to processes, land, landscapes, and the environment over time. We deal with information, systems, and all things geographic, and we use computers to do this.
We can also look at tools and applications in this same way. The most common tools are unspecified; the only specific tools, in addition to GIS, among the words with the highest percent frequencies, are classification and computer (Figure 2). This is because most tools are phrases, e.g., neural networks, cellular automata. Emphasis on unspecified tools was least at the Bristol conference, and greatest at the Leeds and Dunedin conferences. If one combines the frequencies for the two words (tool and tools), however, emphasis is quite consistent from conference to conference. The use of GIS was again highest at Leeds and Dunedin, and although percent frequencies were lower at the 1998, 1999, and 2000 conferences, they appear stable, ranging from about 36% to 42%, as noted above. Emphasis on classification was greatest at Dunedin; the percent frequencies at the other four conferences were similar, ranging from just under 15% to just over 17%. Percent frequencies for computer decreased consistently from 1996 through 1999, but rose slightly in 2000. Figure 2 suggests that emphasis on tools in general has diminished over time; percent frequencies for individual tools are becoming more similar.
Longley (1998) implies that GIS provides the basis for GeoComputation, yet others (e.g., the conference announcement for Dunedin) state emphatically that this is not so. Percent frequencies in Table 4 indicate that Longley's statement is true. It should be noted, however, that many references to GIS in the abstracts are made in an almost negative or derogatory way: authors typically refer to advances or improvements that their work has made to "traditional GIS." "Geographic Information Systems" is, of course, a phrase, but all usages were converted to the word "GIS" in the abstracts before word analysis was done. The words "geographic," "information," and "systems," as they appear in Table 4, are in fact separate words. Typical uses of these words are geographic information and information systems.
Only two applications appear in the list of 25 words with the highest percent frequencies: process and processes (Figure 3). Percent frequencies for both were highest at Leeds in 1996. Percent frequencies decrease at Dunedin, but then steadily increase from 1998 through 2000. Overall, percent frequencies for process and processes appear to be decreasing over time.
We can also look at words not included on the most frequently used list, for example, GeoComputation and geocomputational. Figure 4 shows how use of these terms has changed since 1996. Percent frequencies were lowest at Leeds, when the word was first introduced, then increased appreciably at Dunedin. Percent frequencies for GeoComputation decreased from 1997 through 1999, then increased substantially in 2000. Geocomputational has decreased in frequency consistently since 1997. These percent frequencies, however, are quite low. GeoComputation was mentioned in about 3% of the abstracts in 1996; in about 11%, in 1997; in about 7%, in 1998; in just over 2%, in 1999; and in about 12%, in 2000. Geocomputational appeared in about 2% of the abstracts in 1996; 12% in 1997; just over 8% in 1998; and in about 3.5% of the abstracts in 1999 and 2000. Disregarding the Dunedin and Fredericksburg conferences, the percent frequencies have been remarkably consistent from conference to conference. It is possible that the very low percent frequencies in 1999 are due to "first use" of the term in the United States; very few participants at previous conferences were Americans and many authors were thus unfamiliar with the term. GeoComputation is most commonly used in keynote presentations, in the context of definition, nature, and scope.
At this point, temporal and spatial limitations restrict the number of words that can be compared, so we have arbitrarily chosen to track from 1996 to 2000 several application areas and several tools as well as a couple of words that are of general interest to us. We look at applications in human geography and hydrology; and then at the tools statistics, artificial intelligence, and the Internet; and finally, at time and data quality.
Because the term "GeoComputation" was coined with reference to human geography (Openshaw, 2000), let us begin with this disciplinary area. There are many words from all five conferences that relate to this subject, but few are exclusive to human geography or to the various subdisciplines within human geography. We have therefore selected five words that are related to human geography and are applicable to most areas of research within this field: social, demographic, census, city, and urban. The percent frequencies for these words are shown graphically in Figure 5. But first a few caveats: none of these words occur in two or more Dunedin abstracts, so this data set is not included in the analysis; demographic is used in only one Fredericksburg abstract, and was thus not included in the data set analyzed; and city does not appear in the Bristol abstracts. All terms, with the exceptions of demographic and census, have decreased in percent frequency from 1996 to 2000. Percent frequencies for all words were lowest at Fredericksburg in 1999. Percent frequencies for social and city were highest at the Leeds conference. The highest frequency for census was at the Chatham conference, which is not surprising with the 2001 UK census imminent. The low frequency for census at the Fredericksburg conference is surprising for the reverse reason; the 2000 US census was imminent at the time. The highest percent frequency for urban occurred at the Bristol conference in 1998 and the lowest at Fredericksburg. Bristol is the largest city in which a GeoComputation conference has been held, and Fredericksburg is the smallest.
Another application area of interest is hydrology (Figure 6). Papers on hydrology have been presented at all five conferences, although there were few at Dunedin. Eight words were selected for analysis, words that occurred in at least four sets of abstracts and that are unequivocally related to hydrology. These words are: catchment, drainage, flood, hydrologic(al), river, runoff, stream, and water. Percent frequencies were highest in 1998 and lowest in 1997, when only flood and water were used. Interestingly enough, the frequencies for these two words at Dunedin, 8.9% and 13.3%, respectively, were the highest for those words at all five conferences. Percent frequencies were relatively stable for the 1996, 1999, and 2000 conferences, in which these words appeared in about 43% to 52% of the abstracts. With respect to the individual words, percent frequencies for flood and water decreased from 1996 to 2000, and those for hydrologic(al) and stream increased. Percent frequencies for runoff and catchment are generally up; river and drainage appear to be relatively stable. It thus appears that hydrology continues to be a major application area for geocomputational research.
One tool not included in the 25 words with highest percent frequencies that can be addressed with respect to single words is statistics (Figure 7). Percent frequencies for statistical and statistics were highest at the 1996 conference, and then decreased considerably at the 1997 conference, but between 1997 and 2000, percent frequencies appear to have stabilized, appearing in about 6% to 10% and about 3.5% to just under 7% of the abstracts, respectively. Five words that can unequivocally be related to statistics that occurred in abstracts from at least four conferences were chosen for analysis: correlated, multivariate, nonlinear, regression, and variance. All five words were used at the 1996, 1998, 1999, and 2000 conferences; none were used in 1997. Of these five words, only regression and variance appear to be increasing in use, and only multivariate is decreasing. Nonlinear appears to be rather unstable, whereas correlated is relatively stable. All percent frequencies are low; the only percent frequency over 25% is that for nonlinear at Fredericksburg. Mean percent frequencies are below 8.5% for the remaining four words. Traditional statistical analysis, however, has a continuing, albeit low level, presence in the conference series, and thus appears to be here to stay as a geocomputational tool.
Because artificial intelligence plays such a big role within GeoComputation, it is useful to look at tool words related to this specialty. Five words were selected for analysis: neural, fuzzy, expert, genetic, and automata (Figure 8). It is obvious that all five words are actually parts of phrases, i.e., neural network(s), fuzzy logic, expert systems, genetic algorithms and programming, and cellular automata, but we believe analysis of individual words may provide meaningful information about this area of expertise. All five words appear in the abstracts for 1996, 1998, 1999, and 2000; only neural, expert, and genetic occur in the 1997 abstracts. Percent frequencies for all five words were at their highest at Leeds. Expert and genetic decreased in percent frequency from 1996 through 1998, then increased from 1999 to 2000. Automata was relatively stable from 1996 through 1999, but decreased in frequency in 2000. Neural, the word with highest percent frequency, decreased in frequency between 1996 and 1997, and then increased from 1998 through 2000. Fuzzy decreased in frequency between 1996 and 1998, increased in 1999, but then decreased again in 2000. If one ignores the increase in 1999, percent frequencies for fuzzy seem to be decreasing over time. Overall, percent frequencies for these artificial intelligence tools decreased from 1996 to 1998, and then increased in frequency from 1998 to 2000, suggesting a renewal of interest.
One final tool that can be effectively evaluated using individual words is the Internet (Figure 9). Reference to the Internet itself and all associated terms was at a maximum at Fredericksburg and Chatham. Furthermore, percent frequencies, with the exception of the 1997 conference, have steadily increased since 1996. Four words - Internet, web, worldwide web (abbreviated www in word analysis), and online - were selected for analysis. Each word is present in the abstracts from at least three conferences. Percent frequencies are low: only web at Fredericksburg occurred in more than 20% of the abstracts. However, the number of words used at the individual conferences is increasing over time: three words appeared in the Leeds abstracts; one, in the Dunedin abstracts; three, in the Bristol abstracts; and all four, in both the Fredericksburg and Chatham abstracts. Use of the words online (which did not appear until 1998) and Internet is increasing, and the percent frequency of web is generally increasing. Only percent frequencies for worldwide web are generally decreasing. This may result from general changes in word usage over time or from lack of consistency by the person who prepared the abstracts for word and phrase analysis (JE). There are also many references to various web sites in abstracts from all five conferences. These references were not included in the analysis, primarily because they were given to provide information in addition to that found in the abstract about the author's work. Regardless of the low percent frequencies, however, we believe this is an emerging technology in the field of GeoComputation.
Time is an important concept at all five conferences. Three words that deal with time, which occurred in the abstracts of three or more conferences, were selected for analysis: time, spatiotemporal, and temporal (Figure 10). Percent frequencies for all words, except time, are low, with a maximum of just under 17% for temporal at Fredericksburg. All three words were used in the Leeds, Fredericksburg, and Chatham abstracts, and two in the Dunedin and Bristol abstracts. The highest percent frequencies were achieved at Leeds. However, mean percent frequencies have decreased over time, possibly suggesting diminishing interest. Only spatiotemporal is increasing in use. Time itself is decreasing (!!). Only temporal may be stable; the pattern for this word is irregular.
Finally, as noted by Brooks and Anderson (1998), data quality is a key issue in GeoComputation. So before moving to the analysis of phrases, we look at how GeoComputation researchers treat data with reference to data quality, accuracy, errors, and uncertainty (Figure 11). Five words were selected for analysis - quality, accurate, error, errors, and uncertainty - although the word quality may not be unequivocal in this context. The five words were used at the 1996, 1998, 1999, and 2000 conferences; only four were used in 1997. In general, quality is decreasing over time. However, percent frequencies for uncertainty and accurate increased from 1996 through 1999, although they decreased in 2000. Error appears relatively stable, but percent frequencies for errors are decreasing. With the emphasis on modelling in GeoComputation, often using synthetic data or data over which the modeller has little to no control (e.g., data from various Internet sites, digital elevation data, or census data), an understanding of data quality, and the ability to address accuracy, error, and uncertainty quantitatively, is often of crucial importance. One hopes the decreases in the percent frequencies for these words at Chatham are aberrations.
Several caveats are required before we begin analysis of phrases. First, spot checks of the phrase source data show that what the software identified as a phrase is not necessarily one. Examples of such errors include "biochemistry exhibiting reflectance," "dimensional topological," and "important research remains." This problem is at least partly due to the complexity of the English language: "remains," for example, can be either a noun or a verb, and in this case the software identified a verb as a noun, producing a meaningless phrase. Second, we have found that certain phrases of interest to us, such as high performance computing and exploratory data analysis, were not identified by the software. Third, the software is arbitrary in its identification process. For example, we are interested in the phrase "artificial intelligence." Let us say that the software identified this phrase in two abstracts at one conference. But it also identified the words "artificial" and "intelligence" as parts of a different phrase, e.g., "artificial intelligence technologies," in two abstracts. We cannot combine these and say the phrase "artificial intelligence" occurs in four abstracts because we do not know whether the two abstracts in which "artificial intelligence" occurs are the same as the two in which "artificial intelligence technologies" occur. We could determine this by checking every single phrase of interest in every abstract, but because of the laborious nature of this task, we chose to use the results of phrase analysis as is. In this example, then, we would say there were only two occurrences of the phrase "artificial intelligence." Finally, it must be remembered that our data set includes only those phrases that occur in two or more abstracts at each conference. The results of phrase analysis discussed below must thus take these caveats into consideration.
Table 5A. Most Frequently Used Phrases, 1996, 1997, and 2000
(phrases listed in rank order, most frequent first)

1996 (Leeds): spatial data; data set(s); neural network(s); spatial analysis; spatial object(s); spatial distribution; knowledge base; digital elevation model(s); geographic data; genetic algorithm; expert system(s); cellular automata; data model(s); data analysis; data structure(s); spatial relations; sensitivity analysis; fractal dimension; genetic programming; statistical analysis; elevation model(s); time series; raster GIS; mathematical model(s); artificial intelligence; analytical tool(s); modelling tools; physical processes; regression analysis; spatial information; spatial reasoning; spatial relationship(s); statistical model(s); mathematical modelling; regional scale(s)

1997 (Dunedin): spatial data; spatial information; spatial analysis; neural networks; quantitative revolution; digital elevation; aerial photographs; geographic space; geographic data; spatial dimensions; computational geography; resource management; expert systems

2000 (Chatham): spatial data; neural networks; data set(s); spatial analysis; genetic algorithm(s); GIS software; Voronoi diagram(s); census data; cluster analysis; data structure; digital elevation models; geographic information; time series; catchment area; cellular automata; computational tool(s); computer simulation; correlation coefficient; data mining; digital photogrammetry; earth's surface; economic variables; elevation model(s); GeoComputation techniques; GIS technology; GIS application; grid modelling; predictive models; proximity relations; satellite images; self-organizing map; spatial object(s); spatial distribution; spatiotemporal data; SPOT image(s); statistical analysis; structural analysis; temporal scale(s); transition rules; urban system(s); vector data; visual basic; Voronoi modelling
Table 5B. Most Frequently Used Phrases, 1998 and 1999
(phrases listed in rank order, most frequent first)

1998 (Bristol): spatial data; spatial analysis; data set(s); neural network(s); spatial relations; spatial distribution(s); data model(s); digital elevation model(s); cellular automata; enumeration district(s); spatial interaction models; spatial pattern(s); spatial resolution; spatial scale(s); spatial accuracy; rainfall amounts; geographic analysis; catchment area(s); urban area(s); drainage basin; climate change(s); social characteristics; land cover; attribute data; census data; geographic data; spatial databases; fractal dimension(s); GIS environment; soil erosion; geometric feature(s); spatial features; visibility index(ices); spatial information; data integration; human intervention; visual intrusion; process laws; elevation model; computer models; hydrologic model(s); numerical model(s); spatial model(s); hydrologic modelling; numerical modelling; spatial objects; GIS packages; river flow prediction; pore water pressure; house prices; new road(s); transition rules; factor of safety; catchment scale; geographic space; urban space(s); gauging stations; time step; earth's surface; support system; graph theory; visualization tools; sediment transport; soil type(s); spatial variability; spatial variation(s)

1999 (Fredericksburg): data set(s); spatial data; digital elevation model(s); neural network(s); remote sensing; spatial distribution; spatial analysis; fuzzy logic; elevation models; hydrologic modelling; triangulated irregular network; geographic space; spatial statistic(s); data analysis; urban areas; cellular automata; land cover; time period(s); time step(s); operational system(s); decision tree(s); data warehouse; economic activity; genetic algorithm; propagation algorithm(s); spatial autocorrelation; population census; fuzzy classification; statistical classifier(s); supervised classifier; thin clients; GIS community; environmental conditions; spatial correlation; feature data; geographic data; GIS data; hyperspectral data; rainfall data; sample data; sensed data; simulated data; training data; drainage density; urban development; standard deviation; spatial domain; natural environment(s); geological features; linear feature(s); surface features; water flow; vector format; power function(s); data fusion; hyperspectral imagery; spatial information; spatial interpolation techniques; ordinary kriging; hidden layer(s); edge length(s); maximum likelihood; decision maker(s); binary map(s); geologic maps; topographic maps; error matrix(ces); statistical methods; cellular model; data model(s); dimensional models; environmental models; spatial modelling; road network(s); mineral occurrence; field of view; predictor pattern(s); spatial patterns; training phase; gold potential; classification process; spatial processes; surface processes; physical properties; temporal resolution(s); temporal scale(s); training set; cell space; slope stability; cell states; topological structure; spatial structure(s); decision support; earth's surface; self-organizing system; GIS system(s); information system(s); GIS technology; classification tree(s); Delaunay triangulation; ground truth; data values; pixel values
Eleven tools - neural networks, knowledge-based (systems), genetic algorithms, expert systems, cellular automata, genetic programming, raster GIS, artificial intelligence, various unspecified analytical and modelling tools, and spatial reasoning - were used at the Leeds conference. The types of analysis used were spatial, data, sensitivity, statistical, and regression analysis, as well as mathematical modelling. The data used comprised spatial data, data sets, spatial objects, digital elevation models, geographic data, data models and structures, elevation models, time series, and mathematical and statistical models. Results obtained included spatial distributions and relations, fractal dimensions, and spatial information and relationships. This research was applied to physical processes and done at regional scales. If we look at each of the categories in Table 6, we can gain some understanding from these phrases with respect to emphasis at the 1996 conference. The mean ranks (ranks are given in ascending order, so the lowest mean rank is "best") suggest that the 1996 conference was a data-based conference. Next in importance to the data used were tools and analysis and modelling. No products occur among the 35 phrases, and only one application (physical processes).
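The mean-rank comparison underlying Tables 6-10 can be sketched as follows. The ranks and category assignments below are hypothetical stand-ins, not the actual 1996 values:

```python
from statistics import mean

# Each frequent phrase carries its frequency rank (1 = most frequent)
# and a hand-assigned category; a category's mean rank summarizes how
# much emphasis it received at a conference.
ranked_phrases = [
    (1, "spatial data", "data"),
    (2, "data set(s)", "data"),
    (3, "neural network(s)", "tools"),
    (4, "spatial analysis", "analysis and modelling"),
    (5, "spatial object(s)", "data"),
]

def mean_rank_by_category(phrases):
    cats = {}
    for rank, _, cat in phrases:
        cats.setdefault(cat, []).append(rank)
    return {cat: mean(ranks) for cat, ranks in cats.items()}

ranks = mean_rank_by_category(ranked_phrases)
# The lowest mean rank ("data" here) marks the dominant category.
```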
Table 6. Comparison of Categories, Leeds, 1996
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 7. Comparison of Categories, Dunedin, 1997
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 8. Comparison of Categories, Bristol, 1998
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 9. Comparison of Categories, Fredericksburg, 1999
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications

Table 10. Comparison of Categories, Chatham, 2000
Categories: Tools; Analysis and modelling; Data; Results; Products; Applications
Table 11. Percent Frequencies for Common Phrases
Phrases: spatial data; spatial analysis; neural network(s)
Table 12. Percent Frequencies for the Types of Data Used by GeoComputation Researchers

Data: spatial data; geographic data; feature data; census data; training data; vector data; GIS data; attribute data; spatiotemporal data; socioeconomic data; raster data; simulated data; sample data; rainfall data; hyperspectral data; (remotely) sensed data; synthetic data; qualitative data; multivariate data; missing data; digital data; aggregate data; (geo)referenced data

Databases: spatial databases; geographic databases; relational databases

Data sets: data sets; training (data) set; geographic data sets
We also deal with models and modelling - a large number of different types - as shown in Table 13. It is obvious from these data that the word model means very different things to different people. No references to specific model(s) and/or modelling were made in more than one abstract in 1997. The most common types of models at the other four conferences are digital elevation and data models, which occur in the abstracts of four conferences, followed closely by elevation models, which occur in the abstracts of three. No pattern emerges from these data, however, because of the absence of references in the Dunedin abstracts, the very small number of references in the Chatham abstracts, and the overall low percent frequencies. In addition to these three, the only types of models mentioned in more than one set of abstracts are computer models, spatial interaction models, spatial models, and hydrologic models. With respect to modelling, hydrologic modelling is more important than any other type, and its importance appears to be increasing, with the exception of Chatham. Percent frequencies for all types of modelling are typically low. In addition to hydrologic modelling, only spatial modelling appears in more than one set of conference abstracts. No trends can thus be identified.
Table 13. Percent Frequencies for the Types of Models and Modelling Used by GeoComputation Researchers

Models: digital elevation model(s); data model(s); elevation model(s); computer model(s); spatial interaction models; spatial model(s); hydrologic models; mathematical model(s); statistical model(s); grid models; geographic model(s); hybrid model(s); cellular model; dimensional models; environmental models; numerical model(s); forest fire model; dynamic model(s); fractal model; fuzzy model; fuzzy logic model(s); regression models; tessellation models

Modelling: hydrologic modelling; spatial modelling; mathematical modelling; Voronoi modelling; elevation modelling; computer modelling; numerical modelling; spatial process modelling; environmental modelling; fuzzy modelling; dimensional modelling; fuzzy logic modelling; spatial interaction modelling
Table 14. Percent Frequencies for the Types of Systems Used by GeoComputation Researchers

Systems: expert system(s); information system(s); decision support system; global positioning system; operational system(s); earth's system(s); urban systems; prototype system; dynamical system(s); GIS system(s); self-organizing system; computer system(s); geographic systems; integrated system; spatial information system
Table 15. Percent Frequencies for the Tools Used by GeoComputation Researchers

Tools: spatial analysis; GIS; neural networks; statistical analysis; remote sensing; algorithms (various types); knowledge base & expert (systems); maps (various types); cellular automata; classification (various types); fractal analysis; fuzzy methods; rules; visualization; web software; artificial intelligence; cluster analysis; geographic/data mining; sensitivity analysis; (object-)oriented programming; geocomputational technology; numerical automata; computational tools; inference engine; ordinary kriging; intelligent agents; model breeding; computer graphics; genetic programming; virtual reality; computer systems
Spatial analysis and GIS are by far the most important tools used in GeoComputation. References to spatial analysis include spatial correlation, distributions, and statistics; spatial operators, processes, and reasoning; and spatial interpolation techniques. Specific phrase references to GIS include GIS functions; ArcInfo and MapInfo; commercial, (two- and three-) dimensional, and raster GIS; the GIS literature; and GIS packages, software, systems, tools, techniques, and technologies. Next in importance are neural networks and statistical analysis, which rank third and fourth, respectively. References to statistical analysis include (linear) regression; statistical and frequency distributions; power functions; maximum likelihood classifiers; statistical methods and modelling; and summary statistics. The types of algorithms referred to include genetic, neural, parallel, propagation, and serial. Remote sensing references include satellite information, SPOT and hyperspectral imagery, and aerial photographs. Classification tools include fuzzy and statistical classifiers, supervised classification, and the classification process; the various types of maps are binary, two- and three-dimensional, geologic, self-organizing, systemic, and topographic. Fractal analysis includes fractal dimension and fractal geometry; fuzzy methods include fuzzy logic and fuzzy set theory; and rules are either simple or transition. Visualization includes tools and methods and dynamic visualizations. Mean percent frequencies for each tool or tool category are listed in Table 15; those with mean percent frequencies equal to or greater than 1% are shown graphically in Figure 15.
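The mean percent frequencies reported in Table 15 average each tool's per-conference values; a minimal sketch, with invented frequencies for one tool:

```python
def mean_percent_frequency(values):
    """Mean of a tool's percent frequencies over the conferences
    at which the tool phrase was recorded."""
    return sum(values) / len(values)

# Hypothetical percent frequencies for one tool at the five conferences.
spatial_analysis = [2.1, 1.8, 2.4, 2.0, 1.7]
mean_pf = mean_percent_frequency(spatial_analysis)

# Tools with a mean percent frequency of 1% or more are the ones
# shown graphically in Figure 15.
prominent = mean_pf >= 1.0
```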
Figure 16 shows the percent frequencies for the tool phrases and categories that occur in two or more sets of abstracts from the five conferences. The only tool phrases or categories that occur at all five conferences are spatial analysis and neural networks; references to GIS, remote sensing, and cellular automata occur at four conferences; and statistical analysis, and various algorithms and maps were mentioned at three conferences. The remaining 21 tools on the list are mentioned in two sets of abstracts, suggesting a large amount of diversity in tool use among geocomputationalists.
Table 16. Applications of GeoComputation, 1996-1999

Applications: hydrology; time; the environment; earth sciences; urban areas; decision support; soil; population census; processes; land cover; climate changes; ecological studies; social sciences; roads; human activities; economic activity; decision maker(s); visual intrusion; artificial life; agricultural production
The most frequent applications of GeoComputation research involve hydrology and time, followed by environmental applications and applications in the earth sciences, excluding soil. The frequency of phrases relating to hydrology - which include catchment areas and scale; drainage basins, density, and networks; water flow; hydrologic modelling; river flow prediction; and rainfall amounts - increases from 1996 to 1998, then decreases in 1999. References to time, which include the right time; real time; time steps, series, periods, and scales; time-consuming processes; and time complexity, are most frequent at Leeds in 1996, decrease in frequency at Bristol in 1998, and then increase at Fredericksburg in 1999, giving some support to Macmillan's (1998) contention that space-time is increasing in importance in geography. Only two application categories show an increase in interest: land cover, which includes land cover and land cover classification, and the earth sciences, including sediment transport, the earth's surface, gold potential, mineral occurrences and deposits, physical properties, and slope stability and angle. References to the environment, which include environmental change, impacts, conditions, and modelling; the natural environment; environmental resource management; and environmental processes, decrease from 1996 to 1999, suggesting either that interest in the environment is decreasing or that research interests are becoming more specific. The frequency of phrases referring to soil - soil properties and types, pore water pressure, and soil erosion - increases from 1996 to 1998, but decreases in 1999. References to processes (biological, chemical, physical, and surface) and to climate change decrease from 1996 to 1999.
Frequencies for applications in human geography generally decrease from 1996 to 1999. References to urban areas, which include urban density, development, and planning, follow a pattern similar to that for time - they are most frequent in 1996, decrease in 1998, and then increase in 1999. References to decision support increase from 1996 to 1999, but occur only in the Leeds and Fredericksburg abstracts. References to roads, which include new roads and road networks, are not present in the 1996 and 1997 abstracts, and frequencies increase from 1998 to 1999. The frequency of phrases referring to the social sciences and social characteristics decreases from 1996 to 1998; such phrases do not occur in the 1999 abstracts. References to the census also follow this pattern, which is surprising given the US census in 2000 and the UK census in 2001. And lastly, the frequency of human activities, which is very low at both Leeds and Bristol, drops to zero at Fredericksburg.
Interest in human geography, based on the five words analyzed, has decreased slightly over time, but human geography is obviously an important and continuing presence in GeoComputation. Interest in hydrology has been somewhat erratic, but hydrology has more or less maintained its status as the most important application area in GeoComputation since 1996. The use of statistics has varied from conference to conference, but emphasis appears to be increasing slightly over time. Artificial intelligence has had an important presence at all five conferences. Although interest appears to have waned a bit between 1996 and 1998, based on the five words we analyzed, it appears to have been increasing since then. The Internet in particular appears to be an emerging tool in GeoComputation, with percent frequencies generally increasing since 1996 for the four words we analyzed. Interest in time appears to have decreased between 1996 and 2000. Interest in data quality generally increased from 1996 to 1999, but unfortunately decreased in 2000.
No trends are evident among the different types of data used at the five conferences; there is, nonetheless, some variation among them. Spatial data, geographic data, census data, and feature data were the most frequently used data phrases, although only spatial data were referred to in all five sets of abstracts. Different types of data were typically mentioned at each conference, most occurring in only one or two sets of abstracts. Emphasis on modelling has remained steady since the GeoComputation conference series was initiated in 1996, although very few types of models or modelling were mentioned in more than one abstract, again emphasizing the diversity in GeoComputation. Although no one type of system has been in continuing use since 1996, reference to various types of systems has remained steady.
Tools have been of utmost importance at all five conferences - two of the five were tool-based. Spatial analysis is by far the most frequently used tool in GeoComputation, although percent frequencies have varied considerably from conference to conference. GIS, neural networks, and remote sensing were also important; each was mentioned in four sets of abstracts. Of the remaining tools identified by the phrase analysis software, only statistical analysis, various algorithms, maps, and cellular automata were mentioned in the abstracts of three conferences. Only two trends are suggested for tools that occurred in three or more sets of abstracts: interest in remote sensing appears to have been increasing since 1996, and interest in statistical analysis, although continuing, appears to be decreasing.
With respect to applications, no application phrases were identified by the phrase analysis software among the 2000 abstracts, so the following refers only to 1996 through 1999. Only hydrology, time, the environment, the earth sciences, and urban areas, all of which are categories, were referred to in three sets of abstracts. The patterns of percent frequencies for all applications are irregular, with the possible exception of the environment, which may be decreasing. No general trends can be identified for applications.
Finally, it is difficult to relate Openshaw's (2000) definition to the findings reported here, because his definition is abstract and futuristic. We agree that GeoComputation researchers deal with a wide range of geographic problems (and this range appears to be increasing), but whether or not this work is being done within the computational science paradigm cannot be directly addressed using our approach. An additional problem is that his definition is predicated upon the use of high performance computing now and in the future: no such phrase was identified by the software as occurring in two or more abstracts at any conference. So perhaps Openshaw's futuristic view of GeoComputation is just that: futuristic.
So how do we describe the scope of GeoComputation? Based on analyses of words and phrases in the abstracts of papers presented at the five conferences between 1996 and 2000, GeoComputation is the practice of analyzing and modelling spatial data, often with GIS and/or neural networks, but using both traditional and newly developed (and developing) computer-based tools as well. These procedures are applied primarily to complex problems involving the analysis of large and complicated data sets or databases, e.g., census data and digital elevation models. Many of the problems addressed could not previously be analyzed because of their complexity and the massive computing power required to handle the data. Overall, the diversity and breadth of GeoComputation are increasing and evolving over time and will continue to do so in the future.
Couclelis, H., 1998, Geocomputation in context: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 17-29.
Diaz, J., Tynes, R., Caldwell, D., and Ehlen, J., eds., 1999, GeoComputation 99, Proceedings of the 4th International Conference on GeoComputation, 25-28 July, 1999, Mary Washington College, Fredericksburg, Virginia: Greenwich, U.K., GeoComputation CD-ROM.
Feng, F. and Croft, W.B., 2000, Probabilistic Techniques for Phrase Extraction: Amherst, MA, University of Massachusetts, Center for Intelligent Information Retrieval, Department of Computer Science, Technical Report IR-187.
Gahegan, M., 1999, What is Geocomputation? Transactions in GIS, vol. 3 (3), pp 203-206.
Longley, P.A., 1998, Foundations: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 3-15.
Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., eds., 1998, Geocomputation, A Primer: Chichester, John Wiley and Sons, 278 p.
Macmillan, B., 1998, Epilogue: in Geocomputation, A Primer, edited by Longley, P.A., Brooks, S.M., McDonnell, R., and Macmillan, B., Chichester, John Wiley and Sons, pp 257-264.
Openshaw, S., 2000, GeoComputation: in GeoComputation, edited by Openshaw, S. and Abrahart, R.J., London, Taylor and Francis, pp 1-31.
Openshaw, S. and Abrahart, R.J., 2000, Preface: in GeoComputation, edited by Openshaw, S. and Abrahart, R.J., London, Taylor and Francis, pp ix-xii.
Pascoe, R.T., ed., 1997, GeoComputation 97, Proceedings of the second annual conference, 26-29 August, 1997, University of Otago, Dunedin, New Zealand: Dunedin, University of Otago, 411 p.