Modelling Ranunculus Presence in the Rivers Test and Itchen Using Artificial Neural Networks

Christian W. Dawson
Department of Computer Science, Loughborough University, Leicestershire, LE11 3TU, UK
E-mail: C.W.Dawson1@lboro.ac.uk

Robert L. Wilby
Division of Geography, University of Derby, Derby, DE22 1GB, UK
E-mail: R.L.Wilby@Derby.ac.uk

Colin Harpham and Martin R. Brown
Division of Computing, University of Derby, Derby, DE22 1GB, UK
E-mail: C.Harpham@Derby.ac.uk
E-mail: M.R.Brown@Derby.ac.uk

Elspeth Cranston and Edmund J. Darby
Loughborough Ecologists, 226 Charnwood Road, Shepshed, Leicestershire, LE12 9NR, UK
E-mail: LboroughEcols@cs.com

Abstract

Between 1991 and 1998 several macrophyte surveys were made of the Rivers Test and Itchen, UK. Alongside these surveys data were also gathered on environmental, hydrological, and meteorological factors. In an attempt to model macrophyte abundance, several linear models were developed based on the data gathered. While these models provided a useful basis on which factors influencing macrophyte presence could be investigated, it was felt that alternative methods should still be explored. As an alternative, inductive learning methods (for example, artificial neural networks) have been used to model macrophyte presence - specifically Ranunculus spp. Inductive learning algorithms are well suited to problems of this nature due to their ability to handle non-linear, noisy, and inconsistent data. This paper discusses how data were gathered, processed and used to develop the inductive learning models. This paper presents the results of the models developed and compares them with linear statistical techniques.

1. Introduction

This paper presents preliminary results from work that is currently in progress on the development of inductive learning algorithms (for example, artificial neural networks) for predicting the presence of Ranunculus spp. in the Rivers Test and Itchen, Hampshire. During the 1990s a number of surveys were carried out in these rivers by the Loughborough Ecologists (funded by the Environment Agency and the Test and Itchen Association) to establish the presence of various macrophyte species. Preliminary analyses of these data involved the development of standard statistical models (stepwise multiple linear regression and factor analysis) in an attempt to predict the presence of these species from both spatial characteristics (for example, river water depth, width, amount of shading) and temporal variations (for example, meteorological variations, water quality factors etc.). In addition, these models were used to identify those factors having most influence on each macrophyte species (Wilby et al., 1998).

In order to evaluate the influence of environmental factors it was necessary to build models using all the available data. However, although this is useful in terms of developing the most accurate model from those data and identifying key biological responses, there is no means of determining how accurate these models are at predicting species abundance in previously unseen situations. It is only by excluding data from a model's 'training' that one can determine how good that model is at generalising to unknown situations and avoid 'over fitting' the models to the training data.

Although the initial models provided the required results in the first instance it was felt that inductive learning models could provide an alternative, non-linear solution. For example, the statistical models that were developed assumed that the underlying relationship between each species and environmental factors was linear. This is not necessarily the most appropriate relationship in certain cases and alternative non-linear relationships were explored.

Although neural networks have only been used in a limited number of ecological studies in the past (for example, Recknagel et al. [1997] and Whitehead et al. [1997] investigated the use of neural networks for modelling algal blooms, and Lek et al. [1996] developed models linking trout spawning sites and habitat characteristics), they have been used more frequently in environmental and hydrological contexts (for example, Maier and Dandy, 2000; Dawson and Wilby 2000). They would appear to be well suited to problems of this nature due to their ability to handle non-linear, noisy, and inconsistent data.

This paper presents preliminary results from this work. Section 2 introduces inductive learning algorithms and discusses, in more detail, those used in this study. Section 3 discusses how the data were gathered and manipulated, Section 4 presents the results from this preliminary study and Section 5 presents conclusions and recommendations for future work.

2. Inductive Learning

2.1 Introduction

Inductive learning is concerned with inducing general functions from specific training examples. Although there are many varied inductive learning algorithms there is no universal algorithm that can take a sample of training examples from an arbitrary unknown function and produce a good approximation to that function. Each algorithm searches the hypothesis space defined by some underlying representation (for example, a decision tree, a neural network, or a linear or a non-linear function) and each uses a different structure to organise the search through the hypothesis space.

In this paper the performance of three types of inductive learning algorithm is compared. The first is the See5 algorithm, which assumes that the unknown function can be represented by a decision tree. The second is the multi layer perceptron (an artificial neural network), which assumes that the unknown function can be represented by a multilayer, feed forward network of sigmoid units. The third is the radial basis function network, which models the unknown function using a network of Gaussian basis functions.

2.2 Artificial neural networks

Artificial neural networks (ANNs) have experienced a renaissance in the last fifteen years, since the rediscovery and popularisation of the backpropagation algorithm by Rumelhart and McClelland in 1986. Before this time it was very difficult to train neural networks of any practical size and they were generally limited to solving academic rather than 'real world' problems.

ANNs mimic the behaviour of their biological counterparts. They are represented by a number of neurons (cf. brain cells) connected together in a network by a number of links (cf. axons). By adjusting the strength, or weight, of the links between the neurons (cf. synapses) one is able to 'train' an artificial neural network to behave in different ways.

From this simple representation, the neural network engineer is able to implement countless ANN types. For example, ANNs can be constructed in different ways by arranging the neurons into layers or groups. The links joining neurons can be configured to allow data to flow in different directions through a network. Different types of neurons can be used in a network that 'activate' in different ways, and different training algorithms can be employed in an attempt to 'optimise' a network's performance.

Although neural networks come in various forms, by far the most popular structure is the feed forward network architecture shown in Figure 1. In this configuration, numeric inputs to the network are passed forward from an input layer, through one or more hidden layers (one hidden layer is shown in Figure 1) to an output layer. As these data pass through a network they are modified according to 'weights' on each connecting link. At each neuron the values of its inputs are combined and an appropriate transfer function is applied (for example, sigmoid, hyperbolic tangent, etc.). A neuron then produces an output response that is passed on to the next neuron it is linked to (or out of the network). Thus, for a given set of inputs to a neural network, a particular output response is produced. A network can be trained to respond in different ways to different inputs by adjusting the weights linking its neurons together - thus a functional mapping is produced from a set of predictors to a set of predictands.

This architecture represents both the multi layer perceptron (MLP) and the radial basis function network (RBF) - the two neural network types used within this study.

Figure 1 Feedforward ANN Architecture
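The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration only: the sigmoid transfer function is one of the options mentioned in the text, and the toy weights, biases and layer sizes are hypothetical values chosen for the example, not those of any network in this study.

```python
import math

def sigmoid(x):
    """Logistic transfer function, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases):
    """One layer: each neuron sums its weighted inputs, adds a bias,
    and applies the sigmoid transfer function."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Pass inputs through one hidden layer and then the output layer."""
    hidden = layer_forward(inputs, hidden_w, hidden_b)
    return layer_forward(hidden, output_w, output_b)

# Hypothetical toy network: 2 inputs, 2 hidden neurons, 1 output neuron.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, -1.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

outputs = network_forward([1.0, 1.0], hidden_w, hidden_b, output_w, output_b)
```

Changing any entry of `hidden_w` or `output_w` changes `outputs` for the same inputs, which is the sense in which adjusting weights 'trains' the mapping.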

2.2.1 The multi layer perceptron

The MLP is perhaps the most popular neural network architecture in use today. It is a feed forward network of interconnected neurons (see Figure 1) usually trained using the error backpropagation algorithm. This popular algorithm works by iteratively changing a network's interconnecting weights (in proportion to a 'training rate' set by the ANN engineer) such that the overall error (i.e., between observed values and modelled network outputs) is reduced. The error backpropagation algorithm can be adapted in various ways. For example, a momentum term can be introduced in an attempt to speed convergence to an error minimum. In addition, the training rate can be adjusted dynamically to prevent the optimisation process becoming caught in a local error minimum (see Dai and MacBeth, 1997; Dawson, 1996; Magoulas et al., 1997).
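To make the roles of the training rate and the momentum term concrete, the sketch below trains a single sigmoid neuron by gradient descent. This is the delta rule, i.e. backpropagation reduced to one neuron, and not the full multi layer algorithm. The default rates (0.1 and 0.9) and epoch count (400) echo the settings reported in Section 4; the logical OR data set and the function name `train_neuron` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_neuron(samples, training_rate=0.1, momentum=0.9, epochs=400):
    """Train one sigmoid neuron by gradient descent on squared error,
    with a momentum term added to each weight change."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * (n_inputs + 1)        # last entry is the bias
    previous_step = [0.0] * (n_inputs + 1)
    errors = []
    for _ in range(epochs):
        total_error = 0.0
        for inputs, target in samples:
            x = list(inputs) + [1.0]        # constant input feeds the bias
            out = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            err = target - out
            total_error += err * err
            delta = err * out * (1.0 - out)  # error times sigmoid slope
            for i, xi in enumerate(x):
                # New step = gradient step + a fraction of the previous step.
                step = training_rate * delta * xi + momentum * previous_step[i]
                weights[i] += step
                previous_step[i] = step
        errors.append(total_error)
    return weights, errors

# Hypothetical toy data: logical OR of two inputs.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
weights, errors = train_neuron(samples)
```

The list `errors` records the summed squared error per epoch, so plotting it shows how quickly the chosen rates drive the error down.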

Further detail of this training algorithm and the MLP are beyond the intended scope of this paper as they have been documented many times before. The interested reader is directed towards texts such as Gallant (1993) and Bishop (1995) for more details.

2.2.2 The radial basis function network

The radial basis function network (RBF) has been used in a limited number of environmental studies in the past (e.g. Dawson and Wilby, 1999; Jayawardena et al., 1997; Jayawardena and Fernando, 1998; Mason et al., 1996). While some might argue that the RBF is not strictly a neural network, it can be represented in an identical structure to that of the MLP (Figure 1). In this case the RBF simulates the unknown function using a 'network' of Gaussian basis functions in the hidden layer and (usually) linear activation functions in the output layer.

Training an RBF involves two stages. First, the basis functions must be established using an algorithm to cluster data in the training set. Typical ways to do this include Kohonen self organising maps (Kohonen, 1984; 1990), k-means clustering, decision trees (Kubat, 1998), genetic algorithms or orthogonal least squares and Max-Min algorithms (Song, 1996). In this study both k-means clustering and genetic algorithms (discussed below) have been used. K-means clustering involves sorting all objects into a predefined number of groups by minimising the total squared Euclidean distance for each object with respect to its nearest cluster centre.
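A minimal sketch of k-means clustering follows, using the standard Lloyd iteration. The initialisation (random choice of k training points), the iteration cap and the toy data are assumptions made for illustration; the text does not state how the clustering was configured in this study.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def k_means(points, k, iterations=50, seed=0):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centre and moving each centre to the mean of its points."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centres[j]))
            clusters[nearest].append(p)
        new_centres = []
        for j, members in enumerate(clusters):
            if members:
                dims = len(members[0])
                new_centres.append(
                    [sum(m[d] for m in members) / len(members) for d in range(dims)]
                )
            else:
                new_centres.append(centres[j])  # keep an empty cluster's centre
        if new_centres == centres:
            break                               # assignments have stabilised
        centres = new_centres
    return centres

# Hypothetical toy data: two well separated groups in two dimensions.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centres = sorted(k_means(points, k=2))
```

The returned centres would then serve as the positions of the Gaussian basis functions in the hidden layer.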

Once the basis centres have been established it is necessary to fix the weights linking the hidden and the output layers. If neurons in the output layer contain linear activation functions, these weights can be calculated directly using matrix inversion (using singular value decomposition) and matrix multiplication (as used in this study). Alternatively, if a non-linear activation function is used, the weights can be established using an iterative algorithm such as error backpropagation.
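The direct calculation of the output weights can be illustrated with NumPy, whose `np.linalg.pinv` computes a pseudo-inverse by singular value decomposition. The Gaussian width, the toy data and the helper names below are assumptions for the example; the basis widths used in this study are not given in the text.

```python
import numpy as np

def gaussian_design_matrix(inputs, centres, width):
    """Hidden-layer activations: one Gaussian basis function per centre."""
    diffs = inputs[:, None, :] - centres[None, :, :]
    sq_dist = np.sum(diffs ** 2, axis=2)
    return np.exp(-sq_dist / (2.0 * width ** 2))

def solve_output_weights(hidden, targets):
    """Least-squares weights for linear output neurons.
    np.linalg.pinv forms the pseudo-inverse via singular value decomposition."""
    return np.linalg.pinv(hidden) @ targets

# Hypothetical toy problem: four 1-D training points, one centre on each point.
inputs = np.array([[0.0], [1.0], [2.0], [3.0]])
targets = np.array([[0.0], [1.0], [0.0], [1.0]])
centres = inputs.copy()

hidden = gaussian_design_matrix(inputs, centres, width=0.5)
weights = solve_output_weights(hidden, targets)
predictions = hidden @ weights
```

Because the weights come from a single linear solve rather than an iterative search, this stage takes one step regardless of how many epochs an MLP would need.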

Because of the direct calculation of weights in an RBF it is usually much quicker to train than an equivalent MLP. However, in some cases accuracy is compromised and an RBF can be less able to generalise so trade-offs must be made. For more information on the radial basis function the interested reader is directed towards texts such as Schalkoff (1997) and Orr (1996).

2.2.3 Genetic algorithms

The pioneering work of Holland (1975) illustrated how the Darwinian evolution process can be applied, in the form of an algorithm, to solve a wide variety of problems. Due to the biological motivation behind this highly parallel adaptive system it is now called the genetic algorithm (GA).

The GA has a population of individuals competing against one another in relation to a measure of fitness. At each stage of the process some individuals will breed, others will die off, and new individuals will arise through combination and mutation. In its simplest form the GA utilises fixed length character strings and steps through the following algorithm (Goldberg, 1989):

1 Randomly create an initial population of individual character strings (for example, this can be achieved by generating a binary string using computer simulated successive tosses of an unbiased coin).

2 Assign a 'fitness' value to each individual in the population using a chosen measure (for example, mean squared error).

3 Create a new population by applying reproduction, crossover and mutation operations to the individual strings. These operations are applied to chosen population strings with a probability based on each string's fitness. Reproduction simply involves copying an existing individual into the new population. Crossover involves swapping characters between two strings from a randomly chosen starting point in those strings. Mutation involves creating a new string from an existing one by randomly mutating a character.

4 Evaluate the fitness of the new population.

5 If the convergence criterion has not been reached go to step 3.

Following a fixed number of generations the string with the best overall fitness value is chosen. Convergence is not achieved in the usual sense since there is always an element of mutation which reintroduces an individual string to the search space.
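The five steps above can be sketched as a short generational GA over bit strings. The population size (50), crossover probability (0.7) and generation limit (200) follow the values quoted in this section; the binary tournament selection, the default mutation probability of 0.01 and the toy fitness function are assumptions made for the example, since the text does not specify the selection scheme.

```python
import random

def run_ga(fitness, length, pop_size=50, p_crossover=0.7, p_mutation=0.01,
           generations=200, seed=0):
    """Simple generational GA over fixed-length bit strings.
    `fitness` returns an error to minimise (lower is fitter)."""
    rng = random.Random(seed)
    # Step 1: random initial population (simulated coin tosses).
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = min(population, key=fitness)

    def select(scored):
        # Binary tournament: a simple fitness-biased choice (an assumption here).
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        # Steps 2 and 4: score every individual in the current population.
        scored = [(fitness(ind), ind) for ind in population]
        new_population = []
        while len(new_population) < pop_size:
            parent_a, parent_b = select(scored), select(scored)
            if rng.random() < p_crossover:
                # Crossover: swap tails from a random cut point.
                cut = rng.randrange(1, length)
                child_a = parent_a[:cut] + parent_b[cut:]
                child_b = parent_b[:cut] + parent_a[cut:]
            else:
                # Reproduction: copy the parents unchanged.
                child_a, child_b = parent_a[:], parent_b[:]
            for child in (child_a, child_b):
                # Mutation: flip each bit with a small probability.
                mutated = [bit ^ 1 if rng.random() < p_mutation else bit for bit in child]
                new_population.append(mutated)
        population = new_population[:pop_size]
        # Remember the fittest string seen in any generation.
        best = min(population + [best], key=fitness)
    return best

# Hypothetical toy fitness: number of zero bits (minimum when all bits are 1).
def count_zeros(bits):
    return bits.count(0)

best = run_ga(count_zeros, length=20)
```

In the RBF application the bit string would encode the basis centres and `fitness` would return the training-set MSE; the toy fitness here only stands in for that calculation.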

Studies of GAs for function optimisation (De Jong, 1975) have indicated that good performance requires a high probability of crossover, a low probability of mutation and a moderate population size. Generally, the probability of crossover is set between 0.4 and 0.9 and for the purpose of these experiments a value of 0.7 was used. For the probability of mutation (Pmutation) the empirically derived formula (Schaffer et al., 1989) was used:

Pmutation = 1.75 / (N * sqrt(L))

in which L is the string length (varying from 1440 for an RBF with 15 centres to 3840 for an RBF with 40 centres) and N the population size (50 in this case). The maximum number of generations was set at 200 since initial tests showed no improvement using a greater number.
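As a worked check on these figures, the sketch below computes the string length per centre and the resulting mutation probabilities. It takes the Schaffer et al. rule in its commonly quoted form, Pmutation = 1.75 / (N * sqrt(L)), which is an assumption about the exact expression used here; the function name is hypothetical.

```python
import math

def mutation_probability(population_size, string_length):
    """Assumed form of the Schaffer et al. (1989) rule: 1.75 / (N * sqrt(L))."""
    return 1.75 / (population_size * math.sqrt(string_length))

N = 50  # population size quoted in the text

# String length divided by number of centres, for the two quoted extremes.
bits_per_centre = {
    centres: length // centres for centres, length in ((15, 1440), (40, 3840))
}

p_small = mutation_probability(N, 1440)  # RBF with 15 centres
p_large = mutation_probability(N, 3840)  # RBF with 40 centres
```

Both quoted string lengths correspond to 96 characters per centre, which would be 6 per predictor if each centre is encoded across the 16 predictors (an inference, not stated in the text). Under the assumed formula the mutation probability is roughly 0.0009 for the shortest string and 0.0006 for the longest.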

In this application the GA is utilised to evolve an optimal set of basis centres for the RBF network. Singular value decomposition is still used to calculate the second layer weights and the fitness value is the MSE of the training set.

2.3 Decision tree inductive learning algorithms

Decision tree learning algorithms, for example, ID3, See5 and ASSISTANT (Cestnik et al., 1987), search a completely expressive hypothesis space and are used to approximate discrete valued target functions represented by a decision tree. In general, a decision tree represents a set of conjunctions of constraints on the attribute values of an instance.

In these experiments the See5 inductive learning decision tree algorithm was used. This is a revised version of C4.5 and ID3 (Quinlan 1986, 1993) and includes a number of additional options for implementation. For example, the Boosting option causes a number of classifiers to be constructed - when a case is classified, all of these classifiers are consulted before a decision is made. Boosting will often give a higher predictive accuracy at the expense of increased classifier construction time. For these experiments however, data set boosting was not found to give any improvement in prediction accuracy.

In applications with differential misclassification costs, it is sometimes desirable to see what effect costs have on the construction of the classifier. In this case all misclassification costs were the same so this option was not implemented.

The macrophyte data set uses continuous attributes. When a continuous attribute is tested in a decision tree, there are branches corresponding to the conditions:

attribute value <= threshold and attribute value > threshold

for some threshold chosen by See5.

As a result, small movements in the attribute value near the threshold can change the branch taken from the test. There have been many methods proposed to deal with continuous attributes (Quinlan, 1988; Chan et al., 1992; Ching et al., 1995). An option available in See5 uses fuzzy thresholds to soften this knife-edge behaviour for decision trees by constructing an interval close to the threshold. Within this interval, both branches of the tree are explored and the results combined to give a predicted class.
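The difference between a crisp test and a softened one can be illustrated as follows. This is only a sketch of the general idea: the linear blending of the two branches, the interval bounds and the leaf probabilities are all hypothetical, and the exact scheme used inside See5 is not reproduced here.

```python
def hard_branch(value, threshold):
    """Crisp test: exactly one branch is taken."""
    return "low" if value <= threshold else "high"

def soft_branch_weights(value, lower, upper):
    """Weights for the two branches when an interval [lower, upper]
    surrounds the threshold. Linear blending is an illustrative choice."""
    if value <= lower:
        return 1.0, 0.0
    if value >= upper:
        return 0.0, 1.0
    high = (value - lower) / (upper - lower)
    return 1.0 - high, high

def soft_predict(value, lower, upper, low_class_probs, high_class_probs):
    """Combine the class probabilities from both branches and pick the largest."""
    w_low, w_high = soft_branch_weights(value, lower, upper)
    combined = {
        label: w_low * low_class_probs.get(label, 0.0)
        + w_high * high_class_probs.get(label, 0.0)
        for label in set(low_class_probs) | set(high_class_probs)
    }
    return max(combined, key=combined.get), combined

# Hypothetical leaves: the low branch favours class 2, the high branch class 5.
label, combined = soft_predict(
    4.9, lower=4.0, upper=6.0,
    low_class_probs={2: 0.9, 5: 0.1},
    high_class_probs={2: 0.2, 5: 0.8},
)
```

With a crisp threshold of 5.0, values of 4.9 and 5.1 would fall on opposite branches; with the softened test both receive a blend of the two leaves, so the prediction changes gradually across the interval.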

Decision trees constructed by See5 are post pruned before they are presented to the user. The Pruning Certainty Factor governs the extent of this simplification. A higher value produces more elaborate decision trees and rule sets, while a lower value causes more extensive simplification. For this experiment a certainty factor of 25% was used.

3. Data

3.1 Introduction

Between 1991 and 1998 a number of macrophyte surveys were made by the Loughborough Ecologists in the Rivers Test and Itchen. Two different survey types were undertaken - two single downstream longitudinal surveys in 1991 and 1998 (covering reaches of up to 100m) and periodic, temporal surveys (covering reaches of approximately 20m). To assess macrophyte growth with respect to hydrological, meteorological and biological factors those data gathered from the temporal surveys are discussed here.

3.2 Parameter set

The temporal surveys involved establishing six base reference sites on the River Test and three sites on the River Itchen. Each site was visited approximately twenty five times over the seven year period - not necessarily at the same time(s) each year and not necessarily at equal time intervals. During these visits data were gathered on the presence of a number of macrophyte species - Ranunculus, Berula, Callitriche, Zannichellia, Hippuris, and Schoenoplectus. Presence of each species was recorded using a 0 to 8 scale from observations taken in each square metre at each reference site (see Table 2).

Previous research has assumed that the growth of Ranunculus in chalk streams relies on a function of several meteorological, hydrological and physicochemical processes (Wilby et al., 1998). Therefore, in parallel with the survey visits, data were obtained from the Environment Agency (Southern Region) from gauging stations close to the reference sites. These data included water quality data (for example, pH, conductivity, nitrates, phosphate levels, etc.) and hydrological data (daily cumecs). Meteorological data were also obtained from the Weather journal weather logs covering the same period and included factors such as rainfall, sunshine, air temperature etc. In total, 47 parameters were gathered. In order to reduce the complexities of the models developed, and to ease the analyses, it was felt that this parameter set should be reduced in some way. With this in mind a principal components analysis was undertaken to distil the parameters while still retaining as much information from the data as possible.

Before extracting the principal components from the data they were first split into five categories: hydrological, pulse, meteorological, water quality, and precursor water quality (this is a similar categorisation to that used by Wilby et al. [1998]). Hydrological factors included data such as monthly mean flow, three, six, nine and twelve month moving averages of flow. Pulse factors included measures of the number of days in a month that one day's flow exceeded the previous day's flow by +5% and +10%. Meteorological data included monthly mean air temperature, maximum and minimum air temperature, hours of sunshine, etc. Water quality data were recorded both for the current month and the precursor month. Table 1 presents a summary of the total number of parameters in each category and how, through the principal components analysis, these data were reduced from 47 parameters to 13 key factors. Table 1 also shows the percentage of variance explained by the new factors. In this case only 14% of the variance has been discarded when reducing the number of parameters by approximately 72%.

Three additional factors (that could not be comfortably categorised as per Table 1) were included in the development of the inductive learning models - monthly rainfall total (mm), month number (1-12) and site identification number (1 to 9). This led to a total of 16 predictors available for developing inductive learning models of Ranunculus.

Variable	Initial Number of Parameters	Principal Components	Percentage of Initial Variance
Hydrological	11	2	92%
Pulse	6	2	85%
Meteorological	7	3	89%
Water Quality	10	2	80%
Precursor Water Quality	10	4	82%
Total	44	13	86%

Table 1 Parameter classifications used in principal components analysis
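The per-category reduction summarised in Table 1 can be sketched with NumPy as below. The data are synthetic stand-ins (not the survey data), and standardising each variable before extracting components is an assumption, since the text does not say whether the analysis used the correlation or the covariance matrix.

```python
import numpy as np

def principal_components(data, n_components):
    """Standardise the columns, then project onto the leading components.
    Returns the component scores and the fraction of variance retained."""
    standardised = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the component directions, ordered by singular value.
    _, singular_values, vt = np.linalg.svd(standardised, full_matrices=False)
    variance = singular_values ** 2
    retained = variance[:n_components].sum() / variance.sum()
    scores = standardised @ vt[:n_components].T
    return scores, retained

# Synthetic stand-in for one category: 210 records of 11 correlated variables
# driven by 2 hidden signals (not the survey data).
rng = np.random.default_rng(0)
signals = rng.normal(size=(210, 2))
mixing = rng.normal(size=(2, 11))
data = signals @ mixing + 0.3 * rng.normal(size=(210, 11))

scores, retained = principal_components(data, n_components=2)
```

Here `retained` plays the role of the 'Percentage of Initial Variance' column, and `scores` holds the new factors that would replace the original variables as model inputs.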

3.3 Ranunculus abundance

During the surveys of the Rivers Test and Itchen, Ranunculus spp. was classified using a non-linear, 0 to 8 cover category scale at each of the reference sites. Each category represented the percentage cover of Ranunculus as shown in Table 2 below:

Cover Category	% Cover	Mean % Cover	Thermometer Scale
0	0	0	00000000
1	< 0.1%	0.05%	00000001
2	0.1 - 1%	0.55%	00000011
3	1% - 5%	3%	00000111
4	5% - 10%	7.5%	00001111
5	10% - 25%	17.5%	00011111
6	25% - 50%	37.5%	00111111
7	50% - 75%	62.5%	01111111
8	75% +	87.5%	11111111

Table 2 Classification of Ranunculus percentage cover categories

Although categories are the de facto means of representation for See5 and discriminant analysis techniques, there are different ways that categories can be represented as outputs from feed forward neural networks. Three techniques were therefore explored to represent the cover category of Ranunculus spp. in the ANN models. In the first instance the categories 0 - 8 were used as they stood, generated from a single network output. In this case 0 would be represented by output values less than 0.5, cover category 1 by values 0.5 to 1.5, 2 by 1.5 to 2.5, and so on. However, because these categories represent an underlying non-linear scale, it was felt that better results might be obtained by using the mean percentage cover of each category. For example, category 1 was represented by 0.05%, category 2 by 0.55%, category 3 by 3% and so on - see Table 2.
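The first two representations amount to simple decoding rules for a single network output, sketched below. The mean cover values are taken from Table 2; clipping out-of-range outputs to the 0 to 8 scale and decoding a percentage to the category with the nearest mean are assumptions, as the text does not describe how such outputs were handled.

```python
# Mean % cover for categories 0 to 8, from Table 2.
MEAN_COVER = [0.0, 0.05, 0.55, 3.0, 7.5, 17.5, 37.5, 62.5, 87.5]

def category_from_output(value):
    """First representation: round a single output to the nearest category,
    clipped to the 0 to 8 scale (0.5 and above becomes category 1, and so on)."""
    return max(0, min(8, int(value + 0.5)))

def category_from_cover(percent_cover):
    """Second representation: pick the category whose mean % cover is nearest."""
    return min(range(9), key=lambda c: abs(MEAN_COVER[c] - percent_cover))
```

The spacing of `MEAN_COVER` shows the non-linearity of the scale: moving from category 1 to 2 spans half a percentage point, whereas moving from 7 to 8 spans twenty five.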

Better results, however, came from a third alternative - the use of a thermometer scale after that used by Gallant (1993). In this case the neural network has eight output nodes, each of which is trained to produce an output of approximately 0 (values less than 0.2) or approximately 1 (values greater than 0.8). The thermometer scaling shown in Table 2 identifies how each of the cover categories is represented by the output from all eight neurons. For example, cover category 3 would be represented by outputs of one (values greater than 0.8) from nodes 1, 2 and 3 and zeros (values less than 0.2) from the remaining five nodes (nodes 4, 5, 6, 7 and 8). In this way changes from one cover category to the next are represented by a change in only one node's output. In this example, by changing the output from node 4 from zero to one, the network would represent a cover category of 4 (as 00001111). Thus, a network is able to map slight changes in inputs to only minor changes in the output for each change in cover category. For binary and unitary scales this would not be the case.
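The thermometer scale can be expressed as a pair of encode and decode functions. The 0.8 and 0.2 limits come from the text; returning `None` for ambiguous or inconsistent output patterns is an assumption, since the text does not say how such cases were treated.

```python
def thermometer_encode(category, n_nodes=8):
    """Category c becomes c ones, filling from node 1 upward.
    Returned in node order (node 1 first)."""
    return [1 if node <= category else 0 for node in range(1, n_nodes + 1)]

def thermometer_string(category, n_nodes=8):
    """Table 2 layout, with node 1 as the rightmost digit."""
    return "".join(str(bit) for bit in reversed(thermometer_encode(category, n_nodes)))

def thermometer_decode(outputs, on=0.8, off=0.2):
    """Count the 'on' nodes; return None if any output is ambiguous
    or the pattern is not a valid thermometer code."""
    bits = []
    for value in outputs:
        if value > on:
            bits.append(1)
        elif value < off:
            bits.append(0)
        else:
            return None
    category = sum(bits)
    return category if bits == thermometer_encode(category, len(bits)) else None
```

For instance, `thermometer_string(3)` gives `00000111` and `thermometer_string(4)` gives `00001111`, differing in a single digit, which is the property the text relies on.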

3.4 Cross training

Having arranged the data into appropriate predictors and predictands it was then necessary to split the data into appropriate training and test sets. As the macrophyte surveys of nine sites were undertaken on only a limited number of occasions, the total number of data points available was limited. Preliminary work involved splitting the data according to river (Test and Itchen) and/or reference site. However, this led to rather weak inductive learning models that were developed using only a very small number of points. As an alternative, the data were lumped together (to produce 210 data points) and a cross training technique used. Using cross training the data were split so that 93% of the data (196 data points) were used for training the models and the remaining 7% of the data (14 data points) were used for testing. This approach was repeated 15 times so that the models were tested against all of the (unseen) data at some point. Combining the results from these models resulted in a set of 210 modelled Ranunculus cover categories that could be compared directly with observed values.
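The cross training split can be sketched as follows. The fold sizes (15 folds of 14 test points from 210 records) follow the text; shuffling the records before cutting them into blocks is an assumption, as the text does not say how records were allocated to each test set.

```python
import random

def cross_training_folds(n_points=210, n_folds=15, seed=0):
    """Shuffle the record indices once, then cut them into equal test blocks.
    Each fold trains on everything outside its own test block."""
    indices = list(range(n_points))
    random.Random(seed).shuffle(indices)
    fold_size = n_points // n_folds
    folds = []
    for k in range(n_folds):
        test = indices[k * fold_size:(k + 1) * fold_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        folds.append((train, test))
    return folds

folds = cross_training_folds()
```

Each record appears in exactly one test block, so concatenating the 15 sets of test predictions yields one modelled value for every one of the 210 observations.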

4. Results

All models developed were used to predict Ranunculus cover from those factors identified in Section 3.2. For evaluation purposes it was necessary to compare observed (i.e. surveyed) classification with modelled classification for all 210 data points. Because of potential discrepancies between actual cover in the rivers and surveyed cover at the reference sites (observer error), it was decided that a fair test would be to accept those modelled cover classifications that were within one category of the surveyed values. The following table summarises these results (Table 3):

Model	Correct Classifications	% of Total	No. of Parameters
SWMLR	128	60.95	8
MLP	152	72.38	133
RBF (KM)	146	69.52	3600
RBF (GA)	148	70.48	255
See5	137	65.24	43

Table 3 Classification results of all models
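The 'within one category' acceptance rule used to score the models can be written as a short function. The observed and modelled values below are hypothetical and serve only to exercise the rule.

```python
def within_one_accuracy(observed, modelled):
    """Count predictions no more than one cover category from the surveyed value."""
    correct = sum(1 for o, m in zip(observed, modelled) if abs(o - m) <= 1)
    return correct, 100.0 * correct / len(observed)

# Hypothetical cover categories for six records (not survey data).
observed = [0, 3, 5, 8, 2, 6]
modelled = [1, 3, 7, 7, 2, 4]
correct, percent = within_one_accuracy(observed, modelled)
```

Applied to all 210 records, the same calculation gives the '% of Total' column; for example, 152 accepted classifications correspond to 100 * 152 / 210, or 72.38%.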

The MLP was trained with 5, 10 and 20 hidden nodes for 200, 400, 600, 800, 1000, 1500 and 2000 epochs. The momentum rate was set to 0.9 and the training rate to 0.1. The results presented are from a network trained for 400 epochs with 5 hidden nodes.

For RBFs trained using k-means clustering (RBF (KM)), the best result was obtained from a network containing 150 basis centres (i.e. 150 hidden nodes). RBF networks with 5, 10, 20, 50, and 100 basis centres were also developed and tested. When GA training was used the best results were produced by a network containing 15 basis centres.

For comparative purposes a number of other tests were also undertaken. First, in order to assess the ability of the techniques to model the underlying function and compare results with earlier linear models, all the data were used for training and testing. In this case it was very easy to develop a model that could correctly classify all 210 data points. For example, an RBF was constructed within minutes that could correctly classify all the data. This emphasises the accuracy one can achieve by using non-linear techniques on data sets such as these.

In order to assess the impact of precursor macrophyte abundance, additional models were also produced containing these parameters as predictors. In this case model accuracy could be improved by over 10%. For example, an RBF with 50 basis centres was able to correctly classify 172 (81.9%) observations. However, although precursor observations could be included in the models it was felt that a more rigorous test would involve excluding these parameters. If Ranunculus abundance at particular reference sites was relatively stable, the models developed would focus on this predictor to the detriment of identifying influential factors on Ranunculus growth.

Another test involved using all 47 parameters to develop the models rather than the reduced parameter set obtained following principal components analysis. Using all 47 parameters led to poorer models than those presented here. For example, the best result obtained from an RBF with all 47 predictors was 142 (68%) correct classifications. This suggests that any potential loss in information from the reduced data set is compensated by the simplification of the resultant models.

In Table 3 the number of parameters in each model has been presented giving an indication of their parsimony. For the SWMLR model the number of parameters is based on the average number of parameters used for all 15 models produced. The parameter counts for the MLP and RBF models are calculated according to the number of links and, where appropriate, node biases in each network. The number of parameters in See5 is calculated using the mean number of leaf nodes on the decision trees for the 15 folds. This gives the number of conjunctions that are used for making decisions. This approach for comparing the complexity of decision trees with that used by neural networks was used by Dietterich et al. (1995). However, their comparison used decision tree leaf nodes and neural network hidden nodes.

As can be seen, the SWMLR is the most parsimonious model even though its accuracy is around 10% lower than that of other models (and See5 is not too far behind). The RBF(KM) model is particularly complex - the result of selecting a network containing 150 basis centres. It is worth noting that similar results were obtained with an RBF network with only 20 nodes (143 correct classifications) - in which case the model contained 'only' 480 parameters.
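The parameter counts can be checked with a little arithmetic. The sketch below assumes 16 predictors and 8 thermometer outputs, biases on the MLP hidden and output nodes, and no biases in the RBF. These counting conventions are an assumption, chosen because they reproduce the MLP figure and the two RBF (KM) figures quoted; they do not reproduce the 255 listed for RBF (GA), which is left aside here.

```python
def mlp_parameters(n_inputs, n_hidden, n_outputs):
    """Links plus one bias per hidden and output node."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs

def rbf_parameters(n_inputs, n_centres, n_outputs):
    """Centre coordinates plus hidden-to-output links, with no biases."""
    return n_centres * n_inputs + n_centres * n_outputs

mlp_count = mlp_parameters(16, 5, 8)        # 133
rbf_150_count = rbf_parameters(16, 150, 8)  # 3600
rbf_20_count = rbf_parameters(16, 20, 8)    # 480
```

The RBF count grows linearly with the number of centres, at 24 parameters per centre under these assumptions, which is why the 150-centre network is so much larger than the 5-node MLP.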

5. Conclusions

This paper has presented some preliminary results investigating the abilities of inductive learning algorithms to model Ranunculus presence in two UK chalk streams. Although this work is still in its preliminary stages a number of conclusions are beginning to emerge.

1) Although less accurate in terms of predictive abilities, the SWMLR and See5 models are much more parsimonious in terms of parameter usage.

2) If one wishes to identify those factors that have most influence on species abundance, SWMLR and See5 are appropriate models to choose. The SWMLR procedure clearly identifies those parameters as weighted factors in the final model. However, one must remember that SWMLR identifies the strongest combined linear relationship and care must be taken not to overlook excluded factors from the stepwise procedure. See5 identifies significant factors in terms of those with the most entropy i.e. those which are the best discriminators. In contrast, due to the black-box nature of the neural network models, it is not easy to identify significant predictors of species growth from them.

3) In terms of development time, the MLP takes much longer to train than equivalent RBF, SWMLR and See5 models.

In conclusion, the purpose of the model will influence its choice. If one is interested in exploring those factors most influencing species abundance, See5 is the most appropriate of the inductive learning algorithms presented. If one requires an accurate model quickly, an RBF is most suitable. If one needs the most accurate model available then an MLP is shown to be the most appropriate in this case.

Acknowledgements

The authors are grateful to the Environment Agency (Southern Region) and the Test and Itchen Association for funding the original macrophyte surveys.

References

Bishop, C.M. 1995. Neural Networks for Pattern Recognition, Clarendon Press, Oxford, UK.

Cestnik, B. Kononenko, I. Bratko, I. 1987. ASSISTANT 86: A Knowledge Elicitation Tool for Sophisticated Users, in Bratko, I. and Lavrac, N. (eds), Progress in Machine Learning, Sigma Press, UK.

Chan, K.C.C. Ching, J.Y. and Wong, A.K.C. 1992. A Probabilistic Inductive Learning Approach to the Acquisition of Knowledge in Medical Expert Systems. Proc. 5th IEEE Computer Based Medical Systems Symp. Durham NC.

Ching, J.Y. Wong, A.K.C. and Chan, C.C. 1995. Class Dependent Discretisation for Inductive Learning from Continuous and Mixed-mode Data. IEEE Trans. PAMI, 17(7) 641 - 645.

Dai, H. and MacBeth, C. 1997. Effects of learning parameters on learning procedure and performance of a BPNN, Neural Networks, 10, 1505 - 1521.

Dawson, C.W. 1996. A Neural Network Approach to Software Project Effort Estimation, Applications of Artificial Intelligence in Engineering, 1, 229 - 237.

Dawson, C.W. and Wilby, R. 1999. A comparison of artificial neural networks used for river flow forecasting, Hydrology and Earth System Sciences, 3(4), 529 - 540.

Dawson, C.W. and Wilby, R.L. 2000. Hydrological modelling using artificial neural networks, Progress in Physical Geography, in press.

De Jong, K.A. 1975. An analysis of the behaviour of a class of genetic adaptive systems. PhD dissertation, University of Michigan. Dissertational Abstracts International Vol 36(10), 5140B.

Dietterich, T.G. Hild, H. and Bakiri, G. 1995. A Comparison of ID3 and Back Propogation for English Text to Speech Mapping. Machine Learning,18, 51 - 80

Gallant, S.I. 1993. Neural Network Learning and Expert Systems, MIT Press, London.

Goldberg, D.E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning. Reading (MA), Addison-Wesley.

Holland, J. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press.

Jayawardena, A.W. Fernando, D.A.K. and Zhou, M.C. 1997. Comparison of Multilayer Perceptron and Radial Basis Function networks as tools for flood forecasting, Destructive Water: Water-Caused Natural Disasters, their Abatement and Control (Proceedings of the Conference at Anaheim, CA, June), IAHS Publication Number 239, 173 - 181.

Jayawardena, A.W. and Fernando, D.A.K. 1998. Use of Radial Basis Function Type Artificial Neural Networks for Runoff simulation, Computer-aided Civil and Infrastructure Engineering, 13(2), 91 - 99.

Kohonen, T. 1984. Self-organization and associative memory, Springer-Verlag, New York.

Kohonen, T. 1990. The self-organizing map, Proceedings of the IEEE, 78(9), 1464 - 1480.

Kubat, M. 1998. Decision trees can initialize radial-basis function networks, IEEE Transactions on Neural Networks, 9(5), 813 - 821.

Lek, S. Delacoste, M. Baran, P. Dimopoulos, I. Lauga, J. and Aulagnier, S. 1996. Application of neural networks to modelling nonlinear relationships in ecology. Ecological Modelling, 90(1), 39 - 52.

Magoulas, G.D. Vrahatis, M.N. and Androulakis, G.S. 1997. Effective backpropagation training with variable stepsize, Neural Networks, 10, 69 - 82.

Maier, H.R. and Dandy, G.C. 2000. Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications, Environmental Modelling and Software, 15(1), 101 - 123.

Mason, J.C. Tem'me, A. and Price, R.K. 1996. A Neural Network Model of Rainfall-Runoff Using Radial Basis Functions, Journal of Hydraulic Research, 34(4), 537 - 548.

Orr, M.J.L. 1996. Introduction to Radial Basis Function Networks, Centre for Cognitive Science, University of Edinburgh.

Quinlan, J.R. 1986. Induction of Decision Trees. Machine Learning, 1, 81 - 106.

Quinlan, J.R. 1988. Decision Trees and Multi-valued Attributes. In Hayes, J.E. Michie, D. and Richards, J. (eds), Machine Intelligence 11. Oxford University Press.

Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. San Mateo CA: Morgan Kaufmann.

Recknagel, F. French, M. Harkonen, P. and Yabunaka, K. 1997. Artificial neural network approach for modelling and prediction of algal blooms, Ecological Modelling, 96, 11 - 28.

Rumelhart, D.E. and McClelland, J.L. (eds). 1986. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, 1, MIT Press, Cambridge.

Schaffer, J.D. Caruana, R.A. Eshelman, L.J. and Das, R. 1989. A study of control parameters affecting online performance of genetic algorithms for function optimisation. In Schaffer, J.D. (ed) Proceedings of the 11th International Joint Conference on Artificial Intelligence. Morgan Kaufmann, San Mateo.

Schalkoff, R.J. 1997. Artificial Neural Networks, McGraw-Hill.

Song, X.M. 1996. Radial Basis Function Networks for Empirical Modeling of Chemical Process, MSc Thesis, University of Helsinki, http://www.cs.Helsinki.FI/~xianming (28 January 1999).

Whitehead, P.G. Howard, A. and Arulmani, C. 1997. Modelling algal growth and transport in rivers: a comparison of time series analysis, dynamic mass balance and neural network techniques, Hydrobiologia, 349, 39 - 47.

Wilby, R.L. Cranston, L.E. and Darby, E.J. 1998. Factors governing macrophyte status in Hampshire chalk streams: implications for catchment management, Journal of the Chartered Institute of Water and Environmental Management, 12, 179 - 187.

1	Randomly create an initial population of individual character strings (for example, this can be achieved by generating a binary string using computer simulated successive tosses of an unbiased coin).
2	Assign a 'fitness' value to each individual in the population using a chosen measure (for example, mean squared error).
3	Create a new population by applying reproduction, crossover and mutation operations to the individual strings. These operations are applied to chosen population strings with a probability based on each string's fitness. Reproduction simply involves copying an existing individual into the new population. Crossover involves swapping characters between two strings from a randomly chosen starting point in those strings. Mutation involves creating a new string from an existing one by randomly mutating a character.
4	Evaluate the fitness of the new population.
5	If the convergence criterion has not been reached, go to step 3.
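
The five steps above can be sketched in a few lines of Python. This is a minimal illustration only, not the implementation used in this study: the function name `run_ga`, the default parameter values, and the bit-counting fitness in the usage line are all assumptions chosen for the example. The sketch maximises a non-negative fitness, so an error measure such as mean squared error would first need converting (for example, to 1 / (1 + MSE)).

```python
import random


def run_ga(fitness, length=16, pop_size=30, p_cross=0.7, p_mut=0.02,
           max_gens=200, target=None, seed=0):
    """Illustrative genetic algorithm over binary strings (maximises fitness).

    All parameter names and defaults are hypothetical choices for this sketch.
    `fitness` must return a non-negative number for any list of 0/1 values.
    """
    rng = random.Random(seed)
    # Step 1: random initial population (simulated tosses of an unbiased coin).
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(max_gens):
        # Steps 2 and 4: assign a fitness value to every individual.
        scores = [fitness(ind) for ind in pop]
        best = max(pop + [best], key=fitness)  # remember the best string seen
        # Step 5: stop once the (assumed) convergence criterion is met.
        if target is not None and fitness(best) >= target:
            break
        # Step 3: build a new population, selecting parents with a
        # probability proportional to fitness.
        weights = [s + 1e-9 for s in scores]  # guard against all-zero weights
        new_pop = []
        while len(new_pop) < pop_size:
            a, b = rng.choices(pop, weights=weights, k=2)
            a, b = a[:], b[:]  # reproduction: copy the chosen individuals
            if rng.random() < p_cross:
                # Crossover: swap the tails from a randomly chosen point.
                cut = rng.randrange(1, length)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                # Mutation: flip each character with a small probability.
                for i in range(length):
                    if rng.random() < p_mut:
                        child[i] ^= 1
                new_pop.append(child)
        pop = new_pop[:pop_size]
    return best


# Usage: maximise the number of 1s in a 16-character string.
best = run_ga(sum, length=16, target=16)
```

Here convergence is taken to mean either reaching a target fitness or exhausting a fixed number of generations; other criteria, such as no improvement over several generations, would fit the same loop.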