An inductive attack on spatial scale

Nate Currit
The Pennsylvania State University
Department of Geography
302 Walker Building
University Park, PA 16802
E-mail: currit@essc.psu.edu

Abstract

Geocomputation is emerging as an area of emphasis within analytical geography in which inhibitory data assumptions are not made, as they are in traditional inferential statistics (Gould 1970; Gahegan 1999). Separating the data model from assumptions of data structure allows geocomputation techniques to be molded, to a certain degree, by the characteristics of the data themselves. Running parallel to the geocomputation emphasis is an emphasis on examining the driving forces affecting land use and land cover changes (LUCC) (Meyer and Turner II 1994; Liverman, Moran et al. 1998). Driving forces such as population pressures, political institutions, and cultural values, as well as the biophysical land transformations these forces influence, occur at varying scales (Fischer, Rojkov et al. 1995; Easterling 1997; Easterling et al. 1998; Kull 1998). Although Ordinary Least Squares (OLS) regression has been used to model relationships between variables operating at different scales, several drawbacks are incurred due to the inherently rigid nature of the model. This paper presents a framework for addressing scalar dynamics through the implementation of a General Regression Neural Network (GRNN). The first half of the paper outlines why traditional parametric techniques are inappropriate for the analysis of multi-scale problems, and demonstrates why the GRNN is appropriate. Finally, the robustness of the GRNN approach to attacking spatial scale is made explicit through a multi-scale analysis of US Great Plains maize production.

Keywords: General Regression Neural Network, Scalar Dynamics, Geocomputation, US Great Plains, Maize Production

1. Scalar dynamics of land transformation

Geocomputation and land use/land cover change are two sub-fields within the discipline of geography that are currently in vogue. Geocomputation, a fairly new arrival to the geography arena, is interested in implementing inductive, non-parametric, computer-intensive techniques to solve geographic problems (Gahegan 1999). The techniques employed fall along a spectrum with totally data-driven techniques at one end (Openshaw, Turton et al. 1999) and expert or model-driven techniques at the other end (Robinson and Frank 1987). Numerous techniques, each with its own set of strengths and weaknesses, are being applied to solve real world problems (Hewitson and Crane 1994). This article demonstrates the utility of applying a type of artificial neural network, called the General Regression Neural Network (GRNN), to solving traditional regression type problems, even when those problems involve variables operating at different scales.

Stating that the GRNN can be used in modeling processes operating at different scales is more than a trivial statement, especially when considering the current literature on land use and land cover change. Scale issues are inherent in studies examining the physical and human forces driving land use and land cover changes (Fischer, Rojkov et al. 1995; Easterling 1997; Easterling et al. 1998; Kull 1998; Turner II, Skole et al. 1995; Polsky and Easterling III 1999). A recognized need for the analysis of variables operating at multiple scales prompted the definition of scalar dynamics by the Land Use/Cover Change (LUCC) program of the joint International Geosphere-Biosphere and International Human Dimensions of Global Environmental Change Programmes. Environmental problems like deforestation, soil erosion, or decreasing soil fertility result from a combination of forces. Some of these forces, such as state environmental policies, may affect large regions. Other forces, like topography, rainfall, or individual land use practices, vary locally - and it is well understood that these forces do not act alone, but in combination with each other to produce land transformations (Turner II 1995). Another type of scale issue is that of resolution, which in turn leads into data reliability or data certainty issues. Data collected at a gross scale (coarse resolution) are considered less reliable in aiding interpretation of events operating at fine scales (fine resolution) (Goodchild 1999). In order to effectively answer questions regarding land use/land cover changes it is necessary to account for the scale at which variables operate and to account for the scale or resolution (and thus certainty) at which the data were collected.

Given the apparent value of geocomputational techniques in aiding the interpretation of large, complex datasets, and the need to address spatial scale in modeling, this paper will discuss how the two can be integrated. Throughout the discussion the multi-scale GRNN technique will be compared to traditional OLS regression since it has been the traditional method of determining a variable's direct effects on an outcome. Three main steps will be taken to present the advantages the GRNN has over OLS regression. First, the method in which land use/land cover data are generally stored will be discussed in order to understand why such a data storage technique is not appropriate for OLS regression, but is appropriate for the GRNN. GRNN organizational structure will then be outlined. Three methods of interpreting GRNN results will also be examined in this section. Finally, strengths and weaknesses of the GRNN and possible methods of dealing with uncertainty in data are analyzed. The concluding section provides an example in which the GRNN is applied to the multi-scale analysis of corn production in the US Great Plains, making concrete the issues explored in the previous sections.

2. Cartographic Data Structure

One of the tried and true tools of the geographer is the map, which represents data about the real world in its spatial context. Understanding the form in which georeferenced tabular data is stored is necessary in order to evaluate the different ways the GRNN and OLS regression use the data. When all variables are assumed to operate within the bounds of certain areal units, it is appropriate to store the values from each unique areal unit as a single record in the data table. Analysis involving this type of data would be a single scale analysis. Generally the same storage technique is employed even when variables operate at different scales. Let us assume a macro-variable, A, whose operation does not vary within an entire region (Figure 1a). Such variables, often bounded by political boundaries, may include policies, institutions, or trading zones. Within the same bounded region, however, are smaller sub-units (Figure 1b). Certain variables may operate differently within each of the sub-units. These micro-scale variables might include each sub-unit's implementation of a given macro-scale policy, or they may be variables representing local variations of the physical environment. A simple map overlay of the different scale variables would produce a map like that seen in Figure 1c, which cartographically looks like 1b, but which contains a value representative of the macro-scale variable for each individual polygon. The typical structure for storing the data representing each of the polygons is shown in Figure 1d - each record representing one of the micro-scale units. A unique record for each of the smaller scale units is justified because each of the smaller scale units is unique. Even though polygons "A9" and "A8" share qualities (both are influenced by the same "A" variable) they are different (in their micro-scale variation). The same is true for all other combinations of the sub-units - they are unique. Analysis involving this type of data would be multi-scale analysis - where certain variables operate at micro scales and others at macro scales.
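
The storage structure described for Figure 1d can be sketched in a few lines of Python. The region names, unit identifiers, and values below are hypothetical and are not taken from the figure; the sketch only assumes that the first letter of a unit identifier names its macro region.

```python
# Hypothetical macro-scale regions and micro-scale sub-units (not the paper's data).
macro_values = {"A": 10.0, "B": 20.0}   # one value per macro region
micro_values = {                         # one value per sub-unit
    "A1": 3.1, "A2": 2.7, "A3": 3.4,
    "B1": 5.0, "B2": 4.2,
}


def overlay(macro, micro):
    """Return one record per micro unit, each carrying its region's macro value."""
    records = []
    for unit_id, micro_value in sorted(micro.items()):
        region_id = unit_id[0]  # assumed naming: first letter identifies the region
        records.append({"unit": unit_id,
                        "macro": macro[region_id],
                        "micro": micro_value})
    return records


table = overlay(macro_values, micro_values)
for row in table:
    print(row)
```

Every record is unique in its micro-scale value, while the macro-scale value repeats across all records of the same region.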

Although simplistic, the above description of multi-scale variable interaction and representation is important because that data representation causes problems when used in conjunction with OLS. When implemented in OLS regression that data structure violates a number of the assumptions of the OLS model, the most important of which is the assumption of independence of observations. The assumption of independence of observations means that knowing the value at one location does not provide a better than random chance of guessing the value at any other location. In the above example, knowing the temperature value for one micro-region would allow someone to guess that nearby micro-regions had the same temperature value (Figure 1d). Furthermore, repeating macro-scale values for each recorded micro-scale value skews the calculated means, variances, and covariances of the independent (x) variables - values necessary for calculating regression coefficients (Hamilton 1992). In other words, traditional OLS regression cannot be used to analyze variables operating at multiple scales.
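
A small numerical illustration of the skew, using hypothetical values: when a macro-scale value is repeated once per micro-scale record, regions with many sub-units dominate the mean and variance of that variable.

```python
from statistics import mean, pvariance

# Hypothetical macro-scale values and sub-unit counts (illustration only).
macro_by_region = {"A": 10.0, "B": 20.0}
units_per_region = {"A": 8, "B": 2}

# Statistics computed at the scale where the variable actually operates.
region_level = list(macro_by_region.values())

# Statistics computed from the flattened table: one repeated value per sub-unit.
record_level = [macro_by_region[region]
                for region, count in units_per_region.items()
                for _ in range(count)]

print(mean(region_level), pvariance(region_level))
print(mean(record_level), pvariance(record_level))
```

The two sets of statistics differ only because region A contributes eight records and region B two, not because anything about the macro-scale variable changed.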

It must be stated that development efforts are underway in creating data storage methods that take into account a hierarchical data structure (Peuquet 1994; Papadias and Egenhofer 1997). Such data storage techniques are not methods of analyzing multi-scale relationships, however. Because there is a recognized need for multi-scale analysis, techniques like Hierarchical Linear Modeling (HLM) are being developed. HLM was born out of OLS regression and thus inherits most of its assumptions. It does, however, effectively analyze multi-scale data by calculating separate error terms for each scale variable (Polsky and Easterling III 1999). The GRNN, on the other hand, was developed to work under a different set of conditions than those built into OLS or HLM. The GRNN makes no assumptions about the underlying data distributions, does not require independence of observations, and requires no a priori determination of the type of function operating between variables (e.g. linear, exponential, polynomial). Until now the GRNN has not been explicitly conceptualized as a multi-scale data analysis tool (Figure 2).

3. General Regression Neural Network (GRNN)

3.1 Generalized Regression

The General Regression Neural Network (GRNN) is similar to all artificial neural networks (ANNs) in its analogy to the human brain. Like the human brain, ANNs are composed of numerous small neurons (termed nodes in ANN literature) that receive input from certain nodes and in turn stimulate other nodes. A process of learning takes place in the nodes as they are presented with stimuli over a period of time. Artificial neural networks act in much the same way by receiving input (data values) and processing them through a series of nodes that organize themselves so as to best predict a certain output. Receiving stimuli (inputs) numerous times and arriving at the best association between the stimuli and an output is termed learning. The exact method by which the human brain learns is not known, and thus cannot be replicated in a computer, but the idea of interconnected nodes that perform relatively simple tasks and organize themselves so as to best predict an output can be replicated.

Basically there are two types of ANNs - those that predict categorical output, and those that predict continuous output. The GRNN predicts continuous outputs. Two main functions are required of GRNN nodes. Those functions are 1) to calculate the difference between all pairs of input pattern vectors, and 2) to estimate the probability density function of the input variables. The difference between input vectors is calculated as the simple Euclidean distance between data values in attribute space. Weighting the calculated distance of any point by the probability of other points occurring in that area yields a predicted output value. With those two tasks accomplished it is a rather straightforward process of predicting the output. Equation 1
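
Assuming Equation 1 follows the form given by Specht (1991), it expresses the predicted output as the conditional mean of y given x. Written in LaTeX notation, with f(x, y) denoting the joint probability density function of the input and output variables, that form is:

```latex
\hat{y}(x) = E[\, y \mid x \,]
           = \frac{\int_{-\infty}^{\infty} y \, f(x, y) \, dy}
                  {\int_{-\infty}^{\infty} f(x, y) \, dy}
```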

Determining the joint probability density function (pdf) of the variables, f(x, y), is the main difficulty in utilizing equation 1. Before explaining how an estimate of the joint pdf is made, a graphic example of the process accomplished by equation 1 is needed. Figure 3 is a simple bivariate example where the x-axis represents an input (independent) variable, and the y-axis represents an output (dependent) variable. Given the scatterplot displayed, one might determine a predicted y value for the new x value as shown.

The predicted y is reasonable because it is similar to the y values which have x values similar to the new x value. The colors of the scatterplot represent the similarity of the sample data points to the new x. Weights, or levels of similarity, are assigned to each point as determined by their distance from the new x, designated in the scatterplot by level of shading - dark colored points are more similar to the new x than light colored points. Little weight is given to those points that have x values very different from that of the new x value because it is not very probable that the predicted y is affected by the more distant points. Calculating all the y values for the entire range of x values would yield a curve like the one shown above. No a priori assumption was made about the shape of that curve, nor was one made about the distribution of the data. We did, however, visualize the joint probability of x and y values.

3.2 PDF Estimation

A pdf can be defined as the population proportion of any given value (Hamilton 1992). In the univariate case, it is the probability of x occurring within the population. The integral across the range of x's in a population is unity, which means that the pdf is equal to the population distribution of variable x multiplied by a constant. Since it is impossible to obtain population data for geographic analysis problems, as well as most other types of analysis problems, a method of estimating the pdf is necessary in order to compute equation 1 (above). Whereas the GRNN does not make most of the assumptions that OLS regression makes, it does assume that the sample data set is representative of the population from which the sample was drawn. Working under that assumption, Parzen (1962) developed a simple, yet robust method of determining the probability density function of a population from a sample. Each sample data point is assigned a sphere of influence (sigma, known as the Parzen window or smoothing factor), similar to a variance, which is centered over the point. The most common sphere of influence given to each data point is the gaussian curve (which has nothing to do with assumptions of normally distributed data). The summed influence at any given point, multiplied by a constant, provides the probability density function at that point (Parzen 1962; Specht 1991; Masters 1993). Imagine a single variable with each data value from that variable displayed along the x-axis. Further imagine that each data value has a curve centered over it that represents its sphere of influence. Many spheres of influence would be found where data values were concentrated, and points in this area would have a high probability density. Graphing the sum of the spheres of influence, multiplied by a constant, for each point along the x-axis range yields an estimate of the pdf for that variable.
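
The summed spheres of influence can be sketched as follows. This is a minimal univariate illustration with a Gaussian curve and hypothetical sample values; the function name and data are assumptions made for the example.

```python
import math


def parzen_pdf(x, samples, sigma):
    """Parzen estimate of the density at x.

    A Gaussian curve of width sigma is centred on every sample; the curves
    are summed at x and multiplied by a constant so the estimate integrates to one.
    """
    constant = 1.0 / (len(samples) * sigma * math.sqrt(2.0 * math.pi))
    summed = sum(math.exp(-((x - s) ** 2) / (2.0 * sigma ** 2)) for s in samples)
    return constant * summed


# Hypothetical sample: four values concentrated near 1 and one isolated value.
samples = [1.0, 1.2, 1.1, 0.9, 4.0]

dense = parzen_pdf(1.0, samples, sigma=0.3)    # where data values are concentrated
sparse = parzen_pdf(2.5, samples, sigma=0.3)   # between the cluster and the isolated value

# Numerical check that the estimate integrates to (approximately) unity.
step = 0.01
area = sum(parzen_pdf(-5.0 + i * step, samples, 0.3) * step for i in range(1501))
print(dense, sparse, area)
```

The density is high where sample values cluster and low between them, and the constant keeps the integral across the range at unity.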

Although not difficult to understand conceptually, finding the appropriate sphere of influence for a variable can be extremely difficult. Since there is no "true" pdf with which to compare the estimated pdf, another measure of the appropriate sphere of influence must be used. The appropriate sphere of influence is defined as the one that produces the smallest mean square error between the actual and predicted output values. Determination of the appropriate sphere of influence (smoothing factor, sigma) is where the "learning" takes place in the GRNN. The first smoothing factor weights assigned to each variable can take on any value. Only by varying the smoothing factor weights, based on some type of error minimization procedure, can the least square error between predicted and actual outputs be found. Numerous algorithms exist for optimizing sigma, ranging from steepest descent algorithms like backpropagation (which will find a local minimum) to stochastic methods like genetic algorithms (which search more widely and tend to find a better minimum, though not necessarily the global minimum). Although in-depth discussion of optimization techniques is beyond the scope of this paper, a word about model verification is necessary. It is very possible to over train the GRNN so that it predicts the sample actual output values extremely well from the sample input values. This occurs when the sigma weights (spheres of influence) are so small that each sample point has a very high pdf, but all pdf values between points are minuscule - the estimated pdf does not match the "true" pdf. In order to generalize about the population from which the sample was drawn, over training needs to be avoided. To accomplish this the entire sample data set is divided into two sets - a training set and a test set. The GRNN is trained and sigma is optimized on the training set. As training proceeds the error between actual and predicted values becomes smaller and smaller, and at a certain point generalization to the entire data set is compromised. In order to prevent the GRNN from over training, the test set data values are run through the GRNN after each training iteration. When over training begins, the square error of the test set begins to rise, and training stops.
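
The effect of over training can be illustrated with a short sketch. Note that it substitutes a simple search over a fixed list of candidate sigma values for the iterative optimizers mentioned above, and that the data, candidate values, and function names are hypothetical.

```python
import math


def grnn_predict(x, train, sigma):
    """Kernel-weighted average of the training outputs (single input variable)."""
    weights = [math.exp(-((x - xi) ** 2) / (2.0 * sigma ** 2)) for xi, _ in train]
    total = sum(weights)
    if total == 0.0:
        # Every weight underflowed to zero: fall back to the nearest training case.
        return min(train, key=lambda case: abs(case[0] - x))[1]
    return sum(w * yi for w, (_, yi) in zip(weights, train)) / total


def mean_square_error(data, train, sigma):
    """Mean square error of predictions for `data`, using `train` as the stored cases."""
    return sum((y - grnn_predict(x, train, sigma)) ** 2 for x, y in data) / len(data)


# Hypothetical noisy samples of a curved relationship, split into two sets.
train = [(0.0, 0.1), (1.0, 0.9), (2.0, 4.2), (3.0, 8.8), (4.0, 16.1), (5.0, 25.3)]
test = [(0.5, 0.3), (1.5, 2.2), (2.5, 6.3), (3.5, 12.2), (4.5, 20.3)]

candidates = [0.05, 0.1, 0.25, 0.5, 1.0, 2.0]
train_error = {s: mean_square_error(train, train, s) for s in candidates}
test_error = {s: mean_square_error(test, train, s) for s in candidates}

# Keep the sigma that generalizes best to the held-out test set.
best_sigma = min(candidates, key=lambda s: test_error[s])
for s in candidates:
    print(s, train_error[s], test_error[s])
```

With the smallest sigma the training error is essentially zero because each training point predicts itself, yet the test error is not; selecting sigma by test set error guards against that situation.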

With a general understanding of pdf estimation and how that relates to equation 1, the mathematics behind Parzen's density estimator can be substituted into equation 1 to produce the following two equations.

Equation 2; Equation 3
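
Assuming Equations 2 and 3 follow Specht (1991), they can be written in LaTeX notation as below, where x is the new input vector, x_i and y_i are the inputs and output of the i-th of n training cases, and sigma is the smoothing factor. A single sigma shared by all variables is assumed here for simplicity.

```latex
D_i^2 = (x - x_i)^{T} (x - x_i) \tag{2}

\hat{y}(x) = \frac{\sum_{i=1}^{n} y_i \exp\!\left( -\frac{D_i^2}{2\sigma^2} \right)}
                  {\sum_{i=1}^{n} \exp\!\left( -\frac{D_i^2}{2\sigma^2} \right)} \tag{3}
```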

Combining these two equations produces a predicted y in exactly the same way as was done visually in the scatterplot example (Figure 3).

3.3 Implications for Multi-Scale Analysis

The above-explained conceptualization and accompanying mathematics have been shown in numerous situations to be better predictors than OLS regression in single scale analyses. Conceptualization of GRNN suitability in multi-scale analyses has not been examined, however. Since the mathematics of this model do not require the calculation of a variable's mean or variance, the cartographic data structure described in section 2 is appropriate for multi-scale GRNN analysis. Given that each areal unit for which data values are collected is unique, and that only the difference between units of analysis is needed for the GRNN to function properly, results of multi-scale analysis are not skewed by this method as they are in OLS regression. Another form of skewed results in OLS regression occurs when observations are not independent of one another. In such cases autocorrelation is a problem. Relationships between proximate data observations (in either time or space) only become a problem if the analysis technique used assumes they are not related. OLS regression assumes that all observations are independent of one another, and when that assumption is violated OLS regression fails. The GRNN does not assume that all observations are independent (it never calculates correlations), which is another reason why the cartographic data structure is appropriate for GRNN analysis. Also in contrast to OLS regression, the GRNN does not assume that data are normally distributed - it has the ability to estimate any type of data distribution through optimization of the smoothing factor. Finally, the GRNN is not confined to modeling linear relationships between variables. These advantages over OLS regression, and parametric, deductive techniques in general, make the GRNN a formidable contender for consideration in any modeling exercise.

3.4 Network Structure

The mathematics behind the GRNN was developed decades ago, but was soon forgotten because of limitations in computing power. Not until Specht (1991) rediscovered the mathematics was it implemented, this time in neural network form. Remember that neural networks are made up of multiple nodes that individually perform relatively simple tasks, but in combination are able to model complex problems. The GRNN is composed of four layers of nodes (Figure 4). The input layer (A) contains a number of nodes equal to the number of input variables in the model. Each node in the input layer is fully interconnected to each node in the summation layer (B). The number of summation layer nodes is always equal to the number of cases in the training dataset. Each node in the summation layer is assigned a unique training vector. The distance measure of equation 2 is calculated between the summation layer node vector and the input layer vector, and the exponential function is then applied to that distance. All summation layer nodes fully connect to the pattern layer nodes (C), of which there are two: a numerator node and a denominator node. Outputs passed to the numerator node are multiplied by the actual y output value associated with each summation layer node's vector; outputs passed to the denominator node are not. Both pattern layer nodes sum the inputs from the summation layer. The predicted y value is finally calculated in the output layer node (D) by dividing the numerator node value by the denominator node value. After determining the error between actual and predicted y values (comparing D and E), and depending on the optimization technique used to minimize the square error between those values, the above calculation may be run numerous times with a different smoothing factor (sigma) each time. Training stops once a threshold minimum square error value is reached, or when the test set square error begins to rise.
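
The flow through the four layers can be sketched as a single forward pass. The layer labels follow the names used in the text above; the function name, the single shared sigma, the Gaussian form of the exponential, and the data are assumptions made for this illustration.

```python
import math


def grnn_forward(x, train_x, train_y, sigma):
    """One forward pass through the four layers for a single input vector x."""
    # (A) Input layer: one node per input variable.
    inputs = list(x)

    # (B) One node per training case: squared distance to the stored
    #     training vector, followed by the exponential.
    activations = []
    for case in train_x:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(inputs, case))
        activations.append(math.exp(-d2 / (2.0 * sigma ** 2)))

    # (C) Two nodes: the numerator sums activations multiplied by the actual
    #     y of each case, the denominator sums the activations alone.
    numerator = sum(a * y for a, y in zip(activations, train_y))
    denominator = sum(activations)

    # (D) Output layer: predicted y.
    return numerator / denominator


# Hypothetical training cases with two input variables.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
train_y = [0.0, 1.0, 1.0, 2.0]

centre = grnn_forward((0.5, 0.5), train_x, train_y, sigma=0.5)
corner = grnn_forward((0.0, 0.0), train_x, train_y, sigma=0.1)
print(centre, corner)
```

At the centre of the four cases every activation is equal, so the prediction is the plain average of the training outputs; at a corner with a small sigma the prediction is dominated by the case stored at that corner.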

4. GRNN Interpretation

Interpretation of GRNN analysis takes on three main forms - comparison of each variable's final smoothing factor (sigma), an overall model goodness of fit measure (R-squared), and visual analysis (Specht 1991; Masters 1993). As stated earlier, the smoothing factor is a measure of each sample data point's sphere of influence. When sample data points for one variable have greater spheres of influence than sample data points for a second variable, the first variable is said to be more important in predicting an outcome than the second variable. Examination of the relative ranking of sigma weights reveals which input variables are most important in determining the output. Such an examination is analogous to comparing the standardized beta values in OLS regression.

When comparing GRNN results to OLS regression results, analyzing the visual relationship between predictions is also very useful. OLS regression assumes a linear relationship between data values - the GRNN does not. Although viewing a scatterplot of variable relationships prior to running OLS regression allows one to transform the variables so that they are linearly related, bias is added to the model by doing so. Likewise, other types of relationships can be modeled with regression techniques (i.e. exponential, logarithmic, polynomial), but the type of relationship must be assumed prior to running the regression. The GRNN, on the other hand, fits the relationship between variables regardless of the form of their relationship because its predictions are a result of the joint probability density function between variables. No a priori assumption about that relationship is made. Thus, plotting data points, regression lines, and GRNN lines makes possible a visual determination of which technique best fits the data. While in some cases the GRNN will not improve upon standard regression predictions, it nearly always will when the relationships are non-linear.

The final method of analyzing output from the GRNN is to calculate the percent of the variance in the output variable that is explained by the input variables (R-squared). R-squared in the GRNN is the same as in OLS regression analysis - it is the coefficient of multiple determination. In many cases R-squared is the final measure of which model, the GRNN or OLS regression, is the better predictor of the output. Also, multiple GRNN analyses may be run in which the same variables are used, but their scale of analysis is varied. In that situation the model whose R-squared value is highest represents the model with the most appropriate combination of scales.
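
A minimal sketch of the comparison by R-squared, assuming it is computed as one minus the ratio of the residual sum of squares to the total sum of squares. The actual and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """Proportion of the variance in `actual` explained by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_residual / ss_total


# Hypothetical outputs and the predictions of two competing models.
actual = [2.0, 4.0, 6.0, 8.0]
model_a = [2.5, 3.5, 6.5, 7.5]   # tracks the actual values closely
model_b = [5.0, 5.0, 5.0, 5.0]   # always predicts the mean

r2_a = r_squared(actual, model_a)
r2_b = r_squared(actual, model_b)
print(r2_a, r2_b)
```

The same function can score models that use the same variables at different scales, so the combination of scales with the highest value can be identified.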

5. GRNN Limitations

Even though the GRNN is in many cases a better predictor than OLS regression, numerous statistics that OLS regression calculates to aid model interpretation are not present in the GRNN. OLS regression produces unstandardized beta values that represent the change in the output given a unit change in the input. Since GRNN function approximation can be highly non-linear, calculation of such a statistic is unreasonable if not impossible. To a certain point, however, visual analysis of predicted values substitutes for the linear mathematical understanding given in OLS regression. OLS regression also provides significance levels for all of its statistics, another descriptor absent from GRNN analysis. Possible smoothing factors range from 0 to infinity, depending on variable importance and sample size. Without such significance levels, or confidence intervals, it is not possible to know how a small change in the smoothing factors would affect the predicted value. A standardized method of assessing confidence intervals associated with each variable would certainly make GRNN interpretation more robust.

5.1 Data Certainty

Although data certainty is related to the above-discussed issue of confidence intervals, it also deserves its own section. Confidence in overall model goodness of fit, as well as confidence in individual smoothing factor weights, needs to be related to the certainty of the data. All areal units are treated as equal in GRNN analysis because each unit is unique, as explained in Figure 1. However, when multi-scale data from different sources are used to understand the driving forces of land use/land cover change (e.g. soil maps and census data), the scales at which the data were collected are different, in addition to the scales of operation being different. As an extreme example, imagine a national level soil map being used in combination with population data gathered at the county level to predict land use. The manner in which a parcel of land is put to use is influenced by the soil type, and although the national soil map contains useful soil type information, its usefulness is hindered because of the scale represented by the data. A GRNN may find soil type to be very important in determining land use (high smoothing factor weight), but a land-use expert would realize that because of the scale at which the data were collected, its importance might be overstated. The scale at which the data were gathered becomes a form of certainty, or conversely, uncertainty. Currently the GRNN lacks a method to take into account uncertainty based on the scale at which the data were collected. Constraining the way in which smoothing factor weights change in the learning process, based on data certainty, could provide one method of dealing with this type of scale issue.

6. Reifying the Concepts: Modeling maize production in the US Great Plains

Understanding land use and resulting land cover changes is necessary for the equitable and efficient management of natural resources, including agricultural production. For purposes of this article, corn production serves as a surrogate for land use, as it is a product of agricultural input, output, and land management. The purpose is to model those factors that influence the production of maize in the Great Plains region of the US. A single year of production, 1995, will be analyzed, but further studies may include multiple years, as well as multiple scales, in order to examine change over time and space. The analysis presented here is simplified for purposes of demonstrating the ability of the GRNN to handle multi-scale analysis problems. Variables' effects at multiple scales will be analyzed to compare changing interaction between those variables. Scales of operation will be determined by finding the combination of scales that produces the best prediction of actual maize yield per acre.

6.1 Data types and hierarchies

Climatic and soils data, available at the county level, are used to predict per acre maize yields. While all data exist at the county level, it is hypothesized that certain variables operate at scales larger than that of the county. By jointly analyzing those variables that operate at broad spatial scales and those that operate at fine spatial scales the best possible prediction of maize yield per acre can be obtained. If a multi-scale analysis predicts better than a single-scale analysis, the scalar dynamics hypothesis will be supported in this case. Likewise, the GRNN is hypothesized to outperform traditional OLS regression in the joint analysis of multiple scales, further demonstrating the appropriateness of the GRNN, and inappropriateness of OLS, in multi-scale analysis problems.

It is well understood that maize production is the result of numerous biophysical constraints as well as human management of the land. Human management is subject to economic, social, and political pressures that are not taken into account here. In this simplified example the variables used to predict corn production are mean annual temperature, mean annual precipitation, and soil pH. US Agricultural Statistic Districts (ASDs), those federal government defined regions that purport to bound similarly characterized regions, form the macro-scale level of analysis. Each ASD is composed of contiguous counties, counties forming the micro-scale level of analysis. The input, or independent, variables are interpolated spatial averages for each county or ASD. The examples that follow each predict maize yield per acre at the county level, but the input variables are analyzed at both county and ASD scales.
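
The mixing of scales in the input table can be sketched as follows. The county records are hypothetical, and a plain average of county values stands in for the interpolated spatial averages described above; the particular mix of scales shown is only one of the possible combinations.

```python
from collections import defaultdict

# Hypothetical county records:
# (county id, district id, mean annual temperature, mean annual precipitation, soil pH)
counties = [
    ("c1", "d1", 12.0, 500.0, 6.5),
    ("c2", "d1", 14.0, 540.0, 7.0),
    ("c3", "d2", 18.0, 400.0, 7.5),
]


def district_means(records, column):
    """Average one column of the county records within each district."""
    groups = defaultdict(list)
    for record in records:
        groups[record[1]].append(record[column])
    return {district: sum(values) / len(values) for district, values in groups.items()}


temperature_by_district = district_means(counties, 2)
precipitation_by_district = district_means(counties, 3)

# One input vector per county: temperature and precipitation at the district
# scale, soil pH at the county scale.
inputs = [
    (temperature_by_district[district], precipitation_by_district[district], ph)
    for _, district, _, _, ph in counties
]
for row in inputs:
    print(row)
```

Predictions are still made for each county, but counties in the same district share their district-scale values.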

6.2 Results and interpretation

The first steps taken in the analysis of corn production were to examine bi-variate relationships between the input variables and maize yield per acre, at both county and district levels. Besides providing an initial indication of the scale of operation for each variable, the resulting graphs allow people new to multi-scale analysis to visually examine the results of both county and district analyses. Each point in the scatterplots represents a county, and certain counties are part of the same district. Those counties comprising districts 4811 (Northern Texas) and 2030 (Southwestern Kansas) are shown as examples.

Mean annual temperature, at the county level (Figure 5), is linearly related with maize yield per acre. OLS and the GRNN perform similarly, both visually and in terms of R-squared (Table 1). In this case, OLS regression may be used because both the input and output variables are defined at the same scale, the county, and the relationship is nearly linear. There are, however, a number of outliers (those counties that reported no corn production) that skew the line of best fit. The outliers do not influence the GRNN as heavily as they do OLS, and the GRNN explains slightly more variance than OLS. At the district level (Figure 6), mean annual temperature predicts maize yield per acre similarly to the county level, with OLS explaining slightly more variance than before and the GRNN slightly less. The more interesting aspect of this graph is the arrangement of the district values along the x-axis. Since prediction of maize yield per acre is done at the county level, the number of points on the scatterplot is the same as in Figure 5, but the number of x-axis values is substantially decreased. All district values are arranged in columnar form. The result is that OLS is not an appropriate technique for the analysis of this type of problem, in particular because the observations are not independent of one another. As such the line of best fit and statistical descriptors of the model cannot be trusted. Nevertheless, even with temperature values aggregated to the district level, the relationship between maize yield and mean annual temperature remains strongly linear, and both models explain a large percentage of the variance of maize production.

Bi-variate predictions based on mean annual precipitation are a more interesting case than those based on temperature. At the county level (Figure 7), neither the GRNN nor OLS is a very good predictor of maize yield per acre (Table 1). The scatterplot reveals only a slight linear relationship, and both model types make similar predictions. One important point to notice before moving to the district level analysis is the arrangement of districts 4811 and 2030: both districts are found in similar areas of the graph and in fact overlap. The district level analysis (Figure 8) reveals a different scatterplot pattern. Nearly all districts, including 4811 and 2030, are separated very well, and distinguishing between districts is easily accomplished. Even before any attempt to model the relationships, this visual separation suggests that the district boundaries are effective boundaries for defining regions of similar precipitation. Evidence supporting the hypothesis of effective regional delineations comes in the form of a greatly increased R-squared value for the GRNN. OLS, suffering from the limitations discussed above, improves only slightly, and its results should be regarded as skewed.
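The contrast between the two models on district-level inputs can be sketched in a few lines. The GRNN estimate below is the kernel-weighted average of training outputs in the form given by Specht (1991), with a single smoothing factor. The data are invented and deliberately exaggerated so that the columns differ in a non-linear way, the smoothing factor is an assumed value, and both R-squared values are computed on the training data, so this illustrates the mechanism only and does not reproduce Table 1.

```python
import math

# Invented toy data: nine counties in three districts. Each county carries
# its district's precipitation value (x), so the points fall in three columns.
x = [1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 3.0]
y = [50.0, 55.0, 60.0, 120.0, 125.0, 130.0, 60.0, 65.0, 70.0]

def grnn_predict(x_train, y_train, x_new, sigma):
    """GRNN estimate: Gaussian-kernel weighted average of training outputs."""
    weights = [math.exp(-((x_new - xi) ** 2) / (2.0 * sigma ** 2))
               for xi in x_train]
    return sum(w * yi for w, yi in zip(weights, y_train)) / sum(weights)

def ols_fit(xs, ys):
    """Simple least-squares line; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((xi - mean_x) ** 2 for xi in xs)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def r_squared(ys, y_hat):
    """Proportion of variance explained: 1 - SS_residual / SS_total."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(ys, y_hat))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ys)
    return 1.0 - ss_res / ss_tot

SIGMA = 0.3  # assumed smoothing factor, not a value from the paper

grnn_hat = [grnn_predict(x, y, xi, SIGMA) for xi in x]
intercept, slope = ols_fit(x, y)
ols_hat = [intercept + slope * xi for xi in x]

r2_grnn = r_squared(y, grnn_hat)
r2_ols = r_squared(y, ols_hat)
print(f"GRNN R-squared (in-sample): {r2_grnn:.3f}")
print(f"OLS  R-squared (in-sample): {r2_ols:.3f}")
```

With a small smoothing factor the GRNN effectively predicts each column's mean, whereas a single straight line cannot follow columns that rise and then fall. A holdout or leave-one-out evaluation would be a fairer comparison than the in-sample figures printed here.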

Other bi-variate relationships were analyzed in the same fashion as the two described above, and from them the following joint analysis of multiple scales was devised as one that would be effective at modeling maize yield per acre. The input variables for this analysis are mean annual temperature at the district level, mean annual precipitation at the district level, and soil pH at the county level. Naturally, more complex and complete analyses would include more explanatory variables. Of the models examined, this final model is by far the best at predicting maize yield per acre, with an R-squared value of 0.755. Comparing sigma values reveals that mean annual temperature at the district level was the most important variable in predicting maize yield per acre, followed by mean annual precipitation at the district level and finally soil pH at the county level. While soil pH necessarily plays a role in maize production, examination of individual counties throughout the region reveals that nearly all the counties have pH levels suitable for maize production, which would explain why pH is less important in predicting maize yield for individual counties. As revealed by Figure 9, however, it is the combination of high district temperatures and high pH that produces some of the greatest amounts of maize per acre. High pH levels in combination with low mean annual precipitation at the district level also produce high yields of corn (Figure 10). Spatially, the residual yields per acre for each county were distributed throughout the region, except for some gross over-predictions in the southern and northwestern ends of the region (Figure 11). The high residuals can be explained by the fact that most of those counties did not report any corn production in the year 1995. Because the GRNN attempted to generalize over the entire region, it predicted yields in all areas, even when the actual yields for some areas were null. Overall, however, the predictions were quite accurate and outperformed the predictions of OLS regression.
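How per-variable sigma values can carry information about variable importance is sketched below for a GRNN with one smoothing factor per input. The sketch assumes standardized inputs and a Gaussian kernel, under which a very large sigma flattens the kernel along that input so that the input barely affects the estimate. How the software used in this study scales its inputs or reports its sigma values is not stated here, so the direction of that reading is an assumption, and all data values are invented.

```python
import math

def grnn_predict(train_x, train_y, query, sigmas):
    """GRNN estimate with a separate smoothing factor for each input variable."""
    numerator = 0.0
    denominator = 0.0
    for row, target in zip(train_x, train_y):
        # Squared distance with each input scaled by its own sigma.
        d2 = sum(((q - v) / s) ** 2 for q, v, s in zip(query, row, sigmas))
        weight = math.exp(-0.5 * d2)
        numerator += weight * target
        denominator += weight
    return numerator / denominator

# Invented, standardized inputs in the order:
# (district temperature, district precipitation, county soil pH)
train_x = [
    (-1.0, 0.0, -1.0),
    (-1.0, 0.0, 1.0),
    (1.0, 0.0, -1.0),
    (1.0, 0.0, 1.0),
]
train_y = [60.0, 70.0, 120.0, 130.0]  # invented yields per acre

low_ph = (1.0, 0.0, -1.0)
high_ph = (1.0, 0.0, 1.0)

# Equal sigmas: the estimate responds to the change in pH.
equal = (0.5, 0.5, 0.5)
effect_equal = (grnn_predict(train_x, train_y, high_ph, equal)
                - grnn_predict(train_x, train_y, low_ph, equal))

# Very large sigma on pH: the kernel is nearly flat along pH, so the
# same change in pH barely moves the estimate.
flat_ph = (0.5, 0.5, 100.0)
effect_flat = (grnn_predict(train_x, train_y, high_ph, flat_ph)
               - grnn_predict(train_x, train_y, low_ph, flat_ph))

print(f"pH effect with equal sigmas:    {effect_equal:.4f}")
print(f"pH effect with large pH sigma:  {effect_flat:.4f}")
```

If sigma values are tuned to minimize prediction error, an input whose sigma grows very large is one the network has learned it can largely ignore, which is one way sigma values may be compared across variables.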

7. Conclusions

With numerous studies throughout the world dealing with land-use/land cover change issues, all of which involve modeling processes that operate at different scales, a technique that allows for incorporating multi-scale data is necessary. The GRNN shows promise as a tool of choice for multi-scale modeling situations. It is frequently a much better predictor than traditional OLS regression. It does not suffer from numerous data assumptions, such as data normality, linear relationships, or independence of observations. Besides being a reliable predictor, it is also capable of determining which input variables in a multivariate analysis are most influential in determining an output. Even with all of its advantages, the GRNN has some limitations: it cannot provide confidence intervals for its output, it does not provide standardized smoothing factors, and it cannot take into account uncertainty in the data. Methods that address these weaknesses can be developed, however. Incorporation of improved diagnostics will make the GRNN an even stronger method of analyzing the multi-scale relationships so common in land-use/land cover change studies.

Acknowledgements

I would like to thank the Fall 1999 Geocomputation class members for the insightful discussions that forced me to think deeper about inductive computer learning. Special thanks to my instructor Mark Gahegan for helpful comments on the initial draft of this paper, and to my advisor Bill Easterling for further comments and support.

References

Easterling, W. E. (1997). "Why regional studies are needed in the development of full scale integrated assessment modelling of global change processes." Global Environmental Change 7(4): 337-356.

Easterling, W. E., et al. (1998). "Spatial scales of climate information for simulating wheat and maize productivity: the case of the US Great Plains." Agricultural and Forest Meteorology 90: 51-63.

Fischer, G., V. A. Rojkov, et al. (1995). "The Role of Case Studies in Integrated Modeling at Global and National Scales." Task Force Meeting on Modeling Land-Use and Land-Cover Changes in Europe and Northern Asia March: 0-19.

Gahegan, M. (1999). What is GeoComputation?, http://www.ashville.demon.co.uk/geocomp/definition.htm. 1999.

Goodchild, M. F. (1999). "Future Directions in Geographic Information Science." Geographic Information Sciences 5(1): 1-8.

Gould, P. (1970). "Is Statistix Inferens the Geographical Name for a Wild Goose?" Economic Geography 46: 439-448.

Hamilton, L. C. (1992). Regression with Graphics: A Second Course in Applied Statistics. Belmont, California, Wadsworth, Inc.

Hewitson, B. C. and R. G. Crane, Eds. (1994). Neural Nets: Applications in Geography, Kluwer Academic Publishers.

Kull, C. A. (1998). "Leimavo Revisited: Agrarian Land-Use Change in the Highlands of Madagascar." Professional Geographer 50(2): 163-167.

Liverman, D., E. F. Moran, et al., Eds. (1998). People and Pixels: Linking Remote Sensing and Social Science. Washington D.C., National Academy Press.

Masters, T. (1993). Advanced Algorithms for Neural Networks - A C++ Sourcebook. New York, John Wiley and Sons, Inc.

Meyer, W. B. and B. L. Turner II, Eds. (1994). Changes in Land Use and Land Cover: A Global Perspective. Cambridge, Cambridge University Press.

Openshaw, S., I. Turton, et al. (1999). "Using the Geographical Analysis Machine to Analyze Limiting Long-term Illness Census Data." Geographical and Environmental Modelling 3(1): 83-99.

Papadias, D. and M. J. Egenhofer (1997). "Algorithms for hierarchical spatial reasoning." GeoInformatica 1(3): 251-273.

Parzen, E. (1962). "On Estimation of a Probability Density Function and Mode." Annals of Mathematical Statistics 33: 1065-1076.

Peuquet, D. (1994). "It's About Time: A Conceptual Framework for the Representation of Temporal Dynamics in Geographic Information Systems." Annals of the Association of American Geographers 84(3): 441-461.

Polsky, C. and W. E. Easterling III (1999). "A Methodology for Multi-Scale Analysis of Land Use with an Application to the U.S. Great Plains." Submitted to Agriculture, Ecosystems, and Environment: 21.

Robinson, V. B. and A. U. Frank (1987). "On Expert Systems for Geographic Information Systems." Photogrammetric Engineering and Remote Sensing 53(10): 1435-1441.

Specht, D. F. (1991). "A General Regression Neural Network." IEEE Transactions on Neural Networks 2(6): 568-576.

Turner II, B. L., D. Skole, et al. (1995). Land-Use and Land-Cover Change Science Research Plan. IGBP Report No. 35 and HDP Report No. 7. Stockholm and Geneva, International Geosphere-Biosphere Programme and the Human Dimensions of Global Environmental Change Programme: 132.