Empirical CA simulation from a high resolution population surface

David Martin and Fulong Wu
Department of Geography, University of Southampton, Southampton, SO17 1BJ United Kingdom
E-Mail: D.J.Martin@soton.ac.uk, F.Wu@soton.ac.uk

Abstract

This paper deals with the application of a cellular automata simulation to urban growth using empirical data for the south east region of the UK. The paper discusses the trade-off between empirical and theoretical considerations when implementing CA simulations, and the nature of the empirical constraints which may be applied. Our case study uses a population surface modelling technique to construct an initial land use model based on 1997 postcode data, and demonstrates the application of two simulation models using alternative growth constraints, based on a variety of empirical data sources for the region. The models are implemented using Arc/Info and AML programs, and the paper includes some exploratory investigation of the resulting urban growth scenarios, demonstrating ways in which simulation in the GIS environment can aid interpretation of the CA results in a real world context.

1. Introduction

There is a growing literature on the application of cellular automata (CA) to simulate the growth of urban settlement forms (e.g. Batty, 1998). CA allow researchers to view the city as a self-organizing system in which the basic land parcels are developed into various land use types. A model of the urban system is thus constructed by the aggregation of uncoordinated local decision-making processes. One of the most important potential uses for such simulations is their ability to model the impact of alternative planning regimes on the development process. CA applications based on hypothetical urban forms can provide valuable insights, but the interpretation of such modelling is hampered by difficulties in relating the modelled form to empirical combinations of settlement and constraints. The use of CA methods to model the future development of real urban systems is made particularly complex by the tension between self-organization and the application of empirical constraints.

This paper describes the application of CA to the simulation of urban growth in the south-east region of the UK, an area currently subject to considerable development pressure. The actual settlement pattern is initially modelled as a fine resolution grid using a population surface modelling technique originally developed for use with census area centroid data (Bracken and Martin, 1995), but here applied to unit postcode information, which offers greater spatial and temporal resolution than that available from the population census or conventional land use mapping. This application is a further development of SimLand (Wu, 1998a), which makes use of Arc/Info for spatial data management, with AML programs to permit the evaluation of a range of alternative local and regional constraints on the development process. We classify the wide variety of factors affecting development into static and dynamic ones. The success or failure of a seed in becoming a developed land use depends on their combined effect on the self-organised process of local growth. This is further dependent upon the threshold which allows such a process to proceed. From the observation of land use states in two time periods, the distribution of land use changes is identified. In general, the threshold and its transformation are used to reflect three types of inputs: the growth rate, which is related to economic activity; regional variation; and policy control. With different thresholds applied, simulation can generate a series of scenarios of urban development. Development scenarios are treated not as place-specific predictions, but as possible realizations of the development process, from which a number of structural indicators can be derived.
This research has a number of original features: it uses detailed empirical spatial data on a fine resolution grid; it integrates global effects with the local self-organisation mechanism of urban growth in a more explicit and parameterised way; it utilizes GIS functionality and thus provides a closer integration with other decision-making tasks; it searches the parameter space through a computationally intensive approach; and finally it uses structural indicators to compare the simulation results with the reality.

2. Cellular automata in urban development simulation

There has been a long tradition of urban modelling, but conventional urban models built upon the neo-classical concept of equilibrium are static. Only recently have microscopic simulation approaches emerged to understand urban dynamics (Batty et al., 1997). Dating back to the spatial diffusion phenomenon modelled by Hägerstrand (1965), cellular models are drawing increasing research attention because the approach is essentially dynamic and thus well suited to characterising urban change. As a modelling framework, cellular automata (CA) have wide appeal due to their simplicity, intuitiveness, flexibility, and transparency. In essence, CA models adopt a computational approach, in the sense that they are only "solvable" through computation, in particular in the medium of the computer. The recent growth in computational power has contributed to the popularity of CA. In particular, graphics-based systems such as GIS provide huge potential for implementing CA models to simulate changes in the urban built environment.

Simulations using CA have been widely applied. However, early attempts were typically more in the nature of metaphors of urban growth with little explicit relationship to underlying behaviour theory (Couclelis, 1985; Batty and Xie, 1994). It is now becoming clear that the CA approach is essentially heuristic and therefore attention should be drawn to the plausibility rather than the 'correctness' of models. With a better understanding of the technique, CA simulation is at the stage of exploring more complex behaviours. In the literature, a variety of ways of defining the transition rules of CA models have been reported (White and Engelen, 1993; Batty and Xie, 1994; Wu and Webster, 1998). These exercises highlight the need for an integrated approach which combines CA's relatively simple abstraction with the behaviourally richer models of urban processes found in the social sciences (Webster et al., 1998). The obstacle to incorporating the richness of urban models is not technical difficulty but rather the theoretical justification for a complex model. The notion that CA reveal complex global patterns as they 'emerge' from a set of simple local transition rules is sound, but urban development, for example land use conversion, is unlikely to be governed by simple rules. This is increasingly apparent from studies of the political economy of urban land use. With all these CA approaches a critical question remains regarding the way the transition rules should be interpreted in economic or other behavioural terms. The behavioural aspect of most CA simulation is still weak.

Despite the variety of modelling approaches, we can envisage a two-dimensional matrix as a general framework (Figure 1). The two dimensions are the global vs. local rules adopted and the empirical vs. theoretical configuration. At each extreme, there are some well-known models. The strictly local rule associated with a pure theoretical configuration is often used in simple metaphoric models. These models follow some well-defined physical processes such as diffusion-limited aggregation (DLA) to simulate generic urban forms. Fractal properties are often presented, as they can be seen in real-world cities (Batty and Longley, 1994). The insights generated from these models are not directly applied in the control of urban growth because of the abstraction of the model, but they are useful in the sense of analogy - the fundamental similarity between the morphology of theoretical and real cities suggests a similar process might in fact provide a plausible characterisation of urban growth. Lying at the other extreme are empirical and global models, which are often "operational" and based on GIS. These are typically cartographic models which use methods such as map overlay and buffering. Factors affecting urban development such as access to roads, distance from the city centre, topography, and land uses are presented as map layers and superimposed and manipulated to generate the final "suitability" index. Often such a process is applied to the whole map area. The method is essentially static. More sophisticated cartographic models can be formalised through map algebra. At this extreme are also empirical population density models. These models are built up from disaggregated spatial units and calibrated through statistical methods.
While they may provide insights into how the population density is related, in a regression sense, to a bundle of locational factors, they do not describe the processes of population density change because a simple extrapolation of existing relationships is problematic. As shown in recent studies on the dynamics of urban spatial structure, the population density surface is emergent in nature, in the sense that a monocentric structure can evolve into a polycentric one if the same relationship is applied repeatedly (Wu, 1998b). The theoretical foundation of conventional urban models, however, is neoclassical urban economics, which assumes that urban systems are always at equilibrium. The best-known theoretical model based on this assumption is the Alonso (1964) urban land use model. Most models in this category are theoretical ones with limited connections to the practices of urban and regional planning. But the city is a self-organising system, as shown by the pioneering work of Peter Allen in the 1970s (Allen, 1997). Between these extremes there are a vast number of hybrid models that mix global and local rules and use different spatial resolutions, and hence achieve different degrees of realism in their urban configurations.

Figure 1: A framework for the classification of urban simulation models

We believe that the design of an appropriate simulation strategy should consider the purpose of modelling and that the urban growth processes can be best articulated at a certain level of abstraction and balance of empirical vs. theoretical, global vs. local dimensions. In this research we propose a framework which allows such an appropriate simulation strategy to be implemented. We give explicit considerations to the parameterised global and local factors affecting urban growth.

In this work, we assess the development potential according to a number of key factors which will be discussed in detail below. Based on this assessment we use the Monte Carlo method to generate development seeds. This seeding process reflects the positive effect, i.e. stimulation, of development factors on urban growth. As this is calculated globally, the emergence of seeds can be seen as the process used in conventional modelling but subject to stochastic effects. The multi-regional development factors, however, also act as a constraint. In this simulation, the constraint is based only on regional population growth projections. If simulated growth exceeds target growth, sites subject to development pressure will be randomly selected so as to fit the overall constrained rate of growth.
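The growth constraint described above can be sketched in a few lines of Python. This is a minimal illustration rather than the AML implementation: the function name `constrain_to_target`, the uniform random sampling and the example cell list are all assumptions.

```python
import random

def constrain_to_target(candidates, target, rng):
    """Keep at most `target` candidate cells, chosen uniformly at random.

    Uniform sampling is an assumption; the text only says that sites are
    "randomly selected" when simulated growth exceeds target growth.
    """
    if len(candidates) <= target:
        return list(candidates)
    return rng.sample(candidates, target)

rng = random.Random(0)
# Hypothetical example: 20 cells under development pressure, target of 8.
candidates = [(r, c) for r in range(4) for c in range(5)]
selected = constrain_to_target(candidates, 8, rng)
```

When fewer cells are available than the target, all of them are kept and the shortfall remains, which is the situation the threshold adjustment is intended to address.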

The seeding process reflects the spontaneous nature of urban growth but the history and existing land uses affect the development as well through local interactions. This is the self-organised aspect of urban growth. To reflect such a characteristic, the development situation is evaluated in a 3 x 3 local kernel, based on the development state at time t. This is a standard CA rule definition. The strength of local growth is calculated as the ratio of developed land to undeveloped sites but in the second model we introduce a more sophisticated measure to reflect the combined global and local effects. The local development pressure is then compared with a threshold to decide whether the site in question can be chosen as a potential development site. The threshold is a critical parameter which reflects how self-organisation might be started. An initial value is used but it is then adjusted according to the projected rate of growth. If the potential growth is lower than the projected one, the threshold will be lowered which means that it is easier for a self-organised process to start off. This adjustment is made during the simulation in which the threshold value varies from iteration to iteration. In theory, the threshold adjustment can reflect three types of effects: the growth rate which is related to population and economic activities, regional variations, and policy control. As a general framework, more rules can be defined separately for failed sites and successful sites but in this simulation we do not consider them to be controlled independently.
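A minimal Python sketch of the 3 x 3 kernel evaluation follows. It assumes that the strength is expressed as the fraction of the eight neighbours that are developed, that cells outside the grid count as undeveloped, and that a fixed illustrative threshold of 0.3 is used; none of these details is specified in the text.

```python
def local_strength(grid):
    """Fraction of developed neighbours (1 = developed) in each cell's 3 x 3 kernel.

    Cells outside the grid are treated as undeveloped (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            count = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if dr == 0 and dc == 0:
                        continue  # skip the centre cell itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        count += grid[rr][cc]
            out[r][c] = count / 8.0
    return out

# Hypothetical land use state at time t (1 = developed, 0 = undeveloped).
grid = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
strength = local_strength(grid)
threshold = 0.3  # illustrative value only
potential = [(r, c) for r in range(3) for c in range(4)
             if grid[r][c] == 0 and strength[r][c] >= threshold]
```

Only the cell at row 1, column 1, which has three developed neighbours, passes the threshold in this example.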

It is worth noting that the development factors in this simulation are updated according to the changed land uses at time t. In particular for the second model which will be elaborated later, development factors such as the local attractiveness are measured according to a gravity type of equation. The size of city clusters changes during the simulation and so does the distance to the nearest edge of settlement.

3. Population surface models

In this work, we have chosen to use a population surface model as the basis for our simulations. Conventional choropleth (shaded area) representations of population distribution suffer from the significant disadvantage that they imply a population density is present at every location on the map, whereas the actual settlement pattern comprises relatively dense clusters of populated land separated by extensive unpopulated regions. Various approaches to population surface construction have been proposed since Tobler (1979), but the variable kernel redistribution algorithm used here was first presented in Martin (1989), and developed with the particular characteristics of UK Census data in mind. This approach has been used to create a series of national surface models, described by Bracken and Martin (1995), and accessible to registered users via the World Wide Web at http://census.ac.uk/cdu/surpop/, which also provides more detailed background to the modelling technique than is appropriate here. The surface construction technique is most recently reviewed in Martin (1996a), which also examines the conceptual and technical differences between surface and zonal models of population distribution.

This method depends on the presence of centroid points for each small area for which population counts are available, and which may be considered to be 'population-weighted'. In the UK Census context, population-weighted centroids are provided by the census offices for each enumeration district (ED), the smallest zone used for the publication of census data in recent censuses. The surface construction algorithm visits each centroid in turn and examines its distance from other local centroids. This distance can be used as an indication of ED size in that region, and a distance decay function is calibrated and used to redistribute the population total at the centroid into the surrounding cells of a raster output matrix. Thus in areas of high population density, with centroids located close together, population may be spread very short distances from the centroid, reflecting small ED sizes. In remote rural areas, population may be spread over larger distances up to a predetermined maximum which is a parameter of the model. Thus cells in urban areas may receive population from several different centroids, while others which are remote from any centroid remain unpopulated. The resulting model embodies a representation of the settlement geography which is one of its most important advantages over zone-based representations, and it is for this reason that we have chosen to use this approach to construct the initial model from which to run our urban development simulations. Although total population has been used in this example, the model may be applied to any count data present at the zone centroid locations. Other applications of census-based surface models produced using this approach, and again taking advantage of the reconstruction of the settlement pattern, may be found in Brainard et al. (1997), Lovett et al. (1997) and Mesev et al. (1995).
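The redistribution idea can be illustrated with a simplified Python sketch. It is not the published algorithm: the kernel width is taken here as the distance to the single nearest other centroid, the decay weight ((w^2 - d^2) / (w^2 + d^2))^alpha is an assumed form, and the function name `build_surface` and the example centroids are hypothetical.

```python
import math

def build_surface(centroids, size, max_radius, alpha=1.0):
    """Redistribute centroid counts onto a size x size grid of unit cells.

    centroids: list of (x, y, count) in cell units.
    Kernel width = distance to the nearest other centroid, capped at
    max_radius (a simplification of the variable kernel approach).
    """
    surface = [[0.0] * size for _ in range(size)]
    for i, (x, y, count) in enumerate(centroids):
        others = [math.hypot(x - ox, y - oy)
                  for j, (ox, oy, _) in enumerate(centroids) if j != i]
        width = min(min(others), max_radius) if others else max_radius
        weights = {}
        for r in range(size):
            for c in range(size):
                d = math.hypot(c + 0.5 - x, r + 0.5 - y)  # distance to cell centre
                if d < width:
                    weights[(r, c)] = ((width**2 - d**2) / (width**2 + d**2)) ** alpha
        total = sum(weights.values())
        if total == 0:
            # Kernel too small to reach any cell centre: use the containing cell.
            weights = {(min(int(y), size - 1), min(int(x), size - 1)): 1.0}
            total = 1.0
        for (r, c), w in weights.items():
            surface[r][c] += count * w / total  # counts are preserved
    return surface

# Two close "urban" centroids and one isolated "rural" centroid (hypothetical).
centroids = [(2.5, 2.5, 100), (3.5, 2.5, 50), (8.5, 8.5, 30)]
surface = build_surface(centroids, size=12, max_radius=3.0)
```

In this example the two close centroids each place their whole count in a single cell, the isolated centroid is spread over the cells within the maximum radius, and cells remote from every centroid stay at zero.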

4. Data sources

The surface modelling technique described above has been applied to UK census data in a number of contexts, but the representation of the detailed settlement pattern is limited by the geographical resolution of the census data. Further, the decennial nature of the census makes it difficult to capture the continually evolving pattern of urban development. Although the census provides a rich range of socioeconomic variables, it is considered that spatial detail and timeliness are of more importance to the current study, and this has led us to consider alternative data sources which might be used with the same modelling procedure. In the UK, the postal geography is the most widely used georeferencing system for socioeconomic data outside the census, and we have therefore opted to apply the technique described above to the postcode system. In order to provide the initial model of population distribution, data have been taken from the directory of enumeration districts and postcodes. Unit postcodes are the smallest component of the UK postcode system. EDs typically contain 200 households and 400 residents, while unit postcodes each refer to around 15 postal addresses. The directory of enumeration districts and postcodes was originally created in association with the 1991 census. The file contains a record for each unique ED/postcode intersection, containing a 100m grid reference for the postcode, a household count and some additional information. Although these grid references are not population-weighted, they represent the location of at least one of the properties known to fall within the smallest unit of the postal system, and the far greater number of data points permits a far more detailed representation of residential geography than is possible with the ED centroid data.

As the postcode geography has changed, the directory has been kept up to date and periodic revisions published which relate 1991 census EDs to contemporary postcode geography. No postcodes are removed from the file, but terminated and reused codes are flagged. New codes are assigned a grid reference and may hence be associated with an ED. The 1995 and 1997 versions of the file also contain a large or small user indicator for each postcode. Large user postcodes typically receive over 25 items of mail per day and are usually commercial addresses. Only the current postcodes have been used from each directory, allowing the resulting models to approximate postal geography in 1995 and 1997. A household count of 15 has been assigned to any new small user postcodes for which household counts are unavailable, representing the typical number of addresses per postcode. There are thus around 850,000 data points available for surface modelling. These locations have been used to provide the centroids for the construction of surface models for a 300km x 300km area of the South-East of England, centred on the London metropolitan area, with a cell size of 200m, resulting in models with 1500 x 1500 cells. Household counts have been used as the variable for redistribution, so that each output model is effectively a household density surface. The basic household surface for 1997 is shown in Figure 2, in which unpopulated regions are shown as white, and increasing population density is represented by darker shades of grey.

Figure 2: 1997 Household density surface

In order to constrain the amount of population growth occurring in the simulation models, a series of official population projections based on 1993 mid-year population estimates at county level have been used. Overall, the area is due to experience around an 8% growth in population to 2016, with the largest growth in a band running from East Anglia, to the north east of the area, around the outside of the metropolitan area to Berkshire in the west. The major urban areas experience static population totals or a small decline. The study area covers all or part of 30 counties, for which population projections to 1996, 2001, 2006, 2011 and 2016 are available. A county map has therefore been created with associated growth rates to constrain the simulation within the officially projected population growth levels. A map of projected population change by county is shown in Figure 3. White represents no projected change, with increasing green intensity representing projected growth, and red intensity a projected fall in population.

Figure 3: Projected population change by county 1993-2016

Major factors affecting the attractiveness of residential areas throughout the region include commuter travel times to London, and accessibility to the national motorway network. Commuters travel to London from everywhere within the study region, and the dominant mode of transport for this long-distance commuting is by train. A London travel time surface has therefore been devised by taking fastest travel times to the appropriate London terminus arriving between 0800 and 0900 on a weekday, as this represents the timing most likely to be used by commuters. Times have been obtained from 130 major stations. Commuter route information was extracted from McGhie (1992) and times were obtained from the Railtrack travel enquiry site at http://www.railtrack.co.uk/travel/. Travel times for each individual cell have been calculated by estimating travel time to the nearest station by crow-fly distance assuming a mean travel speed of 60 km/h, and these times have been added to the fastest available train time. Special treatment has been given to the Isle of Wight, in the centre of the South Coast, which is the only genuine island in the study area containing a significant population, and for which travel times have been increased by an amount equivalent to the necessary passenger ferry crossings. In deriving the full travel time surface, the assumption has been adopted that if a major station is located within 10 minutes' drive time of a given cell, the commuter journey will be routed via that station. At greater distances, the shortest travel time to London is applied, regardless of the distance to the station which must be used. These general assumptions are felt to better reflect commuter behaviour than a simple Euclidean allocation of cells to stations. The travel time surface is illustrated in Figure 4, with increasing red intensity indicating travel time proximity to a London rail terminus, and regional stations shown as green point symbols.
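The routing rule can be sketched in Python as follows. This is one reading of the rule, not the authors' code: the nearest station is used when it lies within 10 minutes at 60 km/h, and otherwise the station giving the smallest combined access and train time is used. The function name, coordinate units (km) and example stations are hypothetical, and the Isle of Wight ferry adjustment is omitted.

```python
import math

SPEED_KMH = 60.0     # mean access speed stated in the text
NEAR_MINUTES = 10.0  # drive-time limit for routing via the nearest station

def london_travel_time(cell_xy, stations):
    """Minutes from a cell to London.

    stations: list of (x_km, y_km, train_minutes_to_london).
    """
    options = []
    for sx, sy, train_min in stations:
        dist_km = math.hypot(cell_xy[0] - sx, cell_xy[1] - sy)
        access_min = dist_km * 60.0 / SPEED_KMH  # crow-fly access time
        options.append((access_min, train_min))
    access, train = min(options)  # station with the shortest access time
    if access <= NEAR_MINUTES:
        return access + train
    # Beyond 10 minutes, take the quickest overall route (an interpretation).
    return min(a + t for a, t in options)

# Hypothetical stations: a slow service at the origin, a fast one 30 km east.
stations = [(0.0, 0.0, 60.0), (30.0, 0.0, 25.0)]
near = london_travel_time((5.0, 0.0), stations)   # routed via the nearby slow station
far = london_travel_time((12.0, 0.0), stations)   # free to use the faster station
```

The example shows the intended behaviour: a cell 5 km from the slow station is routed through it, whereas a cell 12 km away takes the faster service despite the longer access journey.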

Figure 4: London commuting travel time surface

A further aspect affecting development potential in this region is accessibility to the national motorway network. A complete set of motorway access points has been digitized, and distances derived to each cell. Only sections of motorway which are connected to the principal motorway network are included (thus no account is taken of short isolated lengths of urban motorway). The motorway access surface is shown in Figure 5, with increasing blue intensity indicating proximity to a motorway junction, and junctions shown as yellow point symbols.

Figure 5: Distance from access points to national motorway network

5. Implementation

As mentioned above, this simulation model uses a high resolution population surface as the data source. The resolution of the surface is 200m. The study area covers 300 x 300 km, resulting in a grid size of 1500 x 1500. Therefore, in order to maintain spatial detail, a link to a GIS is strongly recommended, although this may lead to a less efficient algorithm from a computational point of view. The loss in computational efficiency is compensated by easy graphic display and the wide range of geo-processing functions available in GIS.

The initial state of the land use is derived from the processed 1997 postcode coverage. We aggregated the large user postcodes, typically commercial users, with the small user postcodes, suggesting the boundaries of existing urban built-up areas. The surface has been constructed using custom Fortran programs, and contains estimated household counts in each cell, derived from the directory of enumeration districts and postcodes. This image provides the initial land use map on which we base our simulation. We assume that the urban built-up area expands at the same rate as projected population growth, which is 8% over the South-East region as a whole. In the model, however, we use the projected growth rates for individual counties. The target growth rates are then translated into an increase in the number of cells per iteration. The number of cells for new development is then used as the constraint. The simulations are run over 23 years, which gives a final scenario for the year 2020. Areas that are not suitable for urban development, such as sea and coastal waters, are coded as nodata and thus excluded from the site selection process.
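The translation of a county growth rate into a per-iteration cell quota can be sketched as follows. The even split across iterations, the use of one iteration per year and the county figures are assumptions made for illustration; the text does not state how the quota is distributed over the run.

```python
def cells_per_iteration(urban_cells, growth_rate, iterations):
    """Split a county's total quota of new cells evenly over the iterations.

    Linear growth is assumed; the first `extra` iterations receive one
    additional cell so that the quotas sum exactly to the total.
    """
    total_new = round(urban_cells * growth_rate)
    base, extra = divmod(total_new, iterations)
    return [base + (1 if i < extra else 0) for i in range(iterations)]

# Hypothetical county: 10,000 urban cells, 8% projected growth, 23 iterations.
quota = cells_per_iteration(urban_cells=10000, growth_rate=0.08, iterations=23)
```

Here the county requires 800 new cells in total, allocated as 35 cells in each of the first 18 iterations and 34 in each of the remaining five.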

The two major factors affecting the attractiveness of residential areas throughout the region are commuter travel times to London and accessibility to the national motorway network. The two surfaces are standardised according to the maximum and minimum values. Here we use a dynamic version, that is, we flag potential development sites on each iteration and then standardise these cells. Therefore, the attributes of these two factors always range from 0 to 1. These two attributes are then added to give the overall attractiveness for residential use. This can be seen as a standard land evaluation process in the form of the multicriteria evaluation (MCE). Basically, an evaluation score can be calculated by weighted summation of standardised development factors. But here we just use equal weights of these two development factors. This suitability score, however, is seen as a measurement of global attractiveness. The development of a site is further dependent on the local evaluation.
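A minimal Python sketch of the standardisation and equal-weight summation is given below. The inversion step (so that shorter travel times and distances score higher) is an assumption, as are the function name `standardise` and the example values.

```python
def standardise(values):
    """Min-max rescale a list of values to the range 0..1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no variation among the flagged cells
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical attributes of four potential development sites.
travel_min = [30.0, 60.0, 90.0, 120.0]  # travel time to London, minutes
motorway_km = [2.0, 10.0, 4.0, 18.0]    # distance to nearest junction, km

# Invert so that better access gives a higher value (an assumption).
t = [1.0 - v for v in standardise(travel_min)]
m = [1.0 - v for v in standardise(motorway_km)]

# Equal weights: the two standardised factors are simply added.
score = [a + b for a, b in zip(t, m)]
```

Because the rescaling is applied only to the cells flagged on the current iteration, the best and worst available sites always receive the extreme values of each factor, whatever their absolute travel times or distances.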

We use the Arc/Info GIS as a data management tool, together with AML programs to permit the evaluation of a range of alternative local and regional constraints on the development processes. The simulation model is a development of SimLand (Wu, 1998a), but differs in that it treats the global and local evaluation in a more explicit way. In the earlier SimLand, the local evaluation is considered just as one development factor in the weighted summation of the suitability score. In this model, however, we treat the effect of local growth as a critical factor that finally decides whether the land use can be successfully converted. The general procedure is outlined in Figure 6. The strength of local evaluation is compared with a threshold to determine whether the self-organisation process can start. The score calculated from the global development factors only determines the probability of seeding new development sites.

Figure 6: Procedure for the simulation of urban growth in SE England

The seeding probability is based on the summation of commuter times to London and the distance to motorway junctions. However, the relationship between the score and the probability should be non-linear, similar in form to a logistic or Poisson distribution. Here, the site generating the highest score at the time of development is taken as a benchmark. The probability of this highest-scoring site being developed is 1. The probability of development decreases with decreasing scores. The non-linear transformation is then used to depress the probability away from the maximum score in order to achieve greater discrimination between cells in any one simulation. In fact, this can be thought of as a form of distance decay - the probability of development decays with distance from the ideal site. The equation used is therefore:

[1]

where p^t_ij is the probability of land conversion from vacant to urban use at the location ij at time t; R^t_ij is the land suitability score at the same location at time t; R^t_max is the maximum score of land suitability at the simulation time t of calculation; and alpha is the dispersion parameter. The value of the dispersion parameter governs the stringency of site selection, with a higher value reflecting a more stringent selection process. We have tested the value in the previous simulation and thus in this model the value is chosen as 5.0.
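A minimal Python sketch of a transformation with the properties described above is given here, assuming the exponential form p = exp[alpha (R / R_max - 1)], which yields a probability of 1 at the maximum score and decays more steeply as alpha increases. The functional form and the function name `seeding_probability` are assumptions made for illustration; only the value alpha = 5.0 is taken from the text.

```python
import math

def seeding_probability(score, max_score, alpha=5.0):
    """Probability of conversion for a cell with suitability `score`.

    Assumed form: exp(alpha * (score / max_score - 1)).
    """
    return math.exp(alpha * (score / max_score - 1.0))

p_best = seeding_probability(2.0, 2.0)               # benchmark site
p_half = seeding_probability(1.0, 2.0)               # half the maximum score
p_strict = seeding_probability(1.0, 2.0, alpha=10.0) # more stringent selection
```

Under this assumed form a site with half the maximum score has a probability of exp(-2.5), roughly 0.08, when alpha is 5.0, which illustrates how strongly the transformation discriminates in favour of the highest-scoring cells.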

The seeding sites are then generated according to the Monte Carlo process in which the probability score is compared with a random value. To control the seeding rate, not all sites will be considered. Instead, a fraction of developable sites are chosen. The ratio of this fraction to the total developable sites is another parameter which can be controlled in simulation, thus varying the model from a sort of CA to a conventional model. The sites generated from the seeding process, albeit non-deterministic, follow general global factors.
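The Monte Carlo seeding step can be sketched in Python as follows. The function name `seed_sites`, the dictionary representation of cell probabilities and the uniform choice of the considered fraction are assumptions, not details given in the text.

```python
import random

def seed_sites(probabilities, fraction, rng):
    """Generate seed cells by comparing each probability with a random value.

    probabilities: dict mapping cell -> conversion probability.
    fraction: share of developable sites considered on this iteration.
    """
    cells = sorted(probabilities)
    n_considered = int(len(cells) * fraction)
    considered = rng.sample(cells, n_considered)
    # A cell becomes a seed when a uniform random draw falls below its probability.
    return [cell for cell in considered if rng.random() < probabilities[cell]]

rng = random.Random(42)
# Hypothetical extreme case: every cell has probability 1, half are considered.
certain = {(r, c): 1.0 for r in range(4) for c in range(5)}
seeds = seed_sites(certain, fraction=0.5, rng=rng)
```

With the fraction set to 1 every developable site is tested on each iteration, while smaller fractions reduce the seeding rate.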

In this simulation, we consider the open space within the existing built-up area to be a special type of use, for example, cemeteries, parks and other protected urban spaces. These areas, surrounded by existing development in the city area, are frequently subject to planning controls and are therefore unlikely to be developed. We use a GRID function called "fill", often used in hydrological modelling to fill the sinks of a surface, to identify these areas and preserve them during the simulation. Thus, although these areas are very attractive according to the multicriteria evaluation function discussed above, they are excluded from the consideration of further development.

Two different approaches to computing local growth strength have been evaluated. The first model simply counts the number of developed sites in a 3x3 neighbourhood. The second model incorporates a spatial interaction equation to calculate the local attractiveness. This considers the size of each continuously extended settlement and the shortest distance to the edge of that settlement. To reflect a non-linear relationship, we use the logarithm of the size of settlements. Both methods lead to an attribute describing the strength of local growth. The value is then standardised over the sites available for development. The distribution of the score, however, is not controllable at each iteration because the attribute is calculated on the basis of a local kernel. The value further depends on the distribution of developed sites or the shape of settlements, which differs from iteration to iteration.
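The identification of enclosed open space can be illustrated without the GRID "fill" function. The Python sketch below substitutes a different technique, a flood fill from the grid edge: any undeveloped cell that cannot be reached from the edge through undeveloped cells is treated as enclosed. The use of 4-connectivity and the function name `enclosed_open_space` are assumptions.

```python
from collections import deque

def enclosed_open_space(grid):
    """Return undeveloped cells (0) that cannot reach the grid edge
    through other undeveloped cells (4-connectivity assumed)."""
    rows, cols = len(grid), len(grid[0])
    reached = set()
    queue = deque()
    # Start from every undeveloped cell on the edge of the grid.
    for r in range(rows):
        for c in range(cols):
            on_edge = r in (0, rows - 1) or c in (0, cols - 1)
            if on_edge and grid[r][c] == 0:
                reached.add((r, c))
                queue.append((r, c))
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and grid[rr][cc] == 0 and (rr, cc) not in reached):
                reached.add((rr, cc))
                queue.append((rr, cc))
    return sorted((r, c) for r in range(rows) for c in range(cols)
                  if grid[r][c] == 0 and (r, c) not in reached)

# Hypothetical settlement (1 = developed) surrounding one open cell.
grid = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
protected = enclosed_open_space(grid)
```

In this example the single open cell at the centre of the settlement is returned as protected, while the undeveloped land outside it remains available for development.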

The initial threshold adopted is 0.8, which means that sites with a local growth strength exceeding 80% of that of the strongest sites can be developed. This is then used to generate a temporary land use grid. The distribution of land development is then evaluated and the number of converted cells is summed over counties. For the counties in which the actual growth is lower than expected, a lower threshold value is adopted to generate the final land use; but for the counties in which the actual growth exceeds the projected growth, the sites to be developed are randomly selected. This is a simplified method of constraining the development rate to the target one, in consideration of the computation involved. Ideally, we would use a loop to adjust the threshold in small steps until it generates the target growth rate. The concept lying behind this adjustment of the threshold is that the process of self-organised growth is dependent upon the demand for land. When the demand is very high, in the sense that there is a gap between the supply from the existing method of development and the demand due to population growth, a slight growth might lead to more growth in the neighbourhood and finally much stronger agglomeration.
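The stepwise adjustment described as the ideal approach can be sketched in Python. This is an illustration of that idea rather than the simplified method actually used in the simulation; the step size of 0.05, the function name `adjust_threshold` and the example strengths are assumptions.

```python
def adjust_threshold(strengths, target, start=0.8, step=0.05, floor=0.0):
    """Lower the threshold in small steps until at least `target` sites qualify.

    strengths: standardised local growth strengths (0..1) of the developable
    sites in one county. Returns the final threshold and the qualifying indices.
    """
    threshold = start
    while threshold > floor:
        n = sum(1 for s in strengths if s >= threshold)
        if n >= target:
            break
        threshold = round(threshold - step, 10)  # avoid floating-point drift
    threshold = max(threshold, floor)
    selected = [i for i, s in enumerate(strengths) if s >= threshold]
    return threshold, selected

# Hypothetical county with six developable sites and a target of four cells.
strengths = [0.95, 0.85, 0.7, 0.66, 0.4, 0.1]
threshold, selected = adjust_threshold(strengths, target=4)
```

In this example only two sites pass the initial threshold of 0.8, and the threshold falls to 0.65 before four sites qualify. Where more sites qualify than the target allows, a random selection among them would be applied, as described in the text.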

The two methods, which represent different views of how development should agglomerate, may lead to quite different spatial forms. With a simple count of the number of developed sites in the neighbourhood, a seed can lead to more dispersed growth, because the seed is considered as important as an existing urban land plot. In the second model, however, local attractiveness also takes into account the location of developed sites and the size of settlements. Thus, an area near a larger settlement is more attractive than one located near a small settlement or a scattered seed.
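The contrast between a large settlement and a scattered seed can be illustrated with a small sketch of the second approach. The exact spatial interaction equation is not reproduced here, so the form used below, log(1 + size) divided by Euclidean distance and maximised over settlements, is purely an assumption for illustration. The 8-neighbour definition of adjacency and the function names are assumptions as well.

```python
import math


def label_settlements(grid):
    """Group developed cells (value 1) into settlements.

    Assumption: cells are adjacent if they touch in any of 8 directions.
    Returns a list of settlements, each a list of (row, col) cells.
    """
    rows, cols = len(grid), len(grid[0])
    seen = set()
    settlements = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != 1 or (r, c) in seen:
                continue
            stack, cells = [(r, c)], []
            seen.add((r, c))
            while stack:
                cr, cc = stack.pop()
                cells.append((cr, cc))
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        nr, nc = cr + dr, cc + dc
                        if (0 <= nr < rows and 0 <= nc < cols
                                and grid[nr][nc] == 1
                                and (nr, nc) not in seen):
                            seen.add((nr, nc))
                            stack.append((nr, nc))
            settlements.append(cells)
    return settlements


def attractiveness(grid):
    """Score each undeveloped cell by its strongest settlement attraction.

    Assumed form: log(1 + settlement size) / distance to the nearest
    cell of that settlement, taking the maximum over all settlements.
    """
    settlements = label_settlements(grid)
    scores = {}
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == 1:
                continue
            best = 0.0
            for cells in settlements:
                # Distance is at least 1 because the cell is undeveloped.
                d = min(math.hypot(r - sr, c - sc) for sr, sc in cells)
                best = max(best, math.log(1 + len(cells)) / d)
            scores[(r, c)] = best
    return scores


# Hypothetical grid: a four-cell settlement on the left, a single seed on the right.
grid = [
    [1, 1, 0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0, 0, 0],
]
scores = attractiveness(grid)
```

The cell next to the four-cell settlement scores higher than the cell next to the isolated seed, whereas a 3x3 neighbour count would treat the two locations more evenly. The brute-force distance search is only suitable for small grids.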

6. Simulation results

The land uses at the year 2020 under the two different views of how local agglomeration affects urban growth are presented in Figures 7 and 8. The results are illustrated for the smaller London area in order to show the nature of the differences between the models, which are not so clearly visible if the entire region is viewed at once. The first model results in small clusters of development which are widely dispersed across the entire study region, while the second produces development which is much more heavily concentrated around the edges of the existing major metropolitan areas.

Figure 7: London area results for simulation 1 (blue: developed at start; green: developed by model; red: undeveloped; black: sea)

Figure 8: London area results for simulation 2 (blue: developed at start; green: developed by model; red: undeveloped; black: sea)

The initial 300 x 300 cell land use model derived from 1997 postcode information has 1.4 million cells which are excluded as sea. Of these, 297827 have urban land uses, spread across 37815 separate settlements (where a settlement is defined as one or more adjacent urban cells surrounded by undeveloped cells). In simulation 1, 12570 new cells are developed, compared to 10265 cells in simulation 2. This difference is accounted for by the use of county-level population projections to constrain the amount of development which is allowed to take place. The second model does not provide sufficient highly attractive cells for development to meet the population requirements of every county, and counties not adjacent to major urban areas remain underpopulated at the end of the simulation.

The two simulations represent rather different development constraints, resulting in alternative development patterns. Our primary concern in this paper is with an implementation of the proposed methodology, rather than with substantive urban development issues, but a brief exploratory analysis is given here, with some further comments on the ways in which the GIS environment could be used to interrogate the results. These models are not intended to predict whether or not development will take place at a particular site, but rather to provide a means of understanding the general nature of the development pattern which would result from a particular combination of constraints. Some initial approaches to the description of change between population surfaces are presented for the same region using 1981 and 1991 census data in Martin (1996b), and similar tools are used here.

Table 1 shows the distribution of the sizes of developments. For both simulations, the table shows the frequencies of settlements of different sizes (measured by numbers of 200m x 200m cells). The continuously built up area of London comprises 36976 cells in the initial population model. In simulation 2, this area grows to a final size of 46934 cells, compared to 41005 cells in simulation 1. Similarly, the other largest urban areas grow to larger final sizes in simulation 2. It is important to note that the total number of cells developed is not the same between the two models, due to the working of the county-level constraints. It is not possible to provide direct measures of the population sizes of settlements changing at different rates, as the development process involves a high degree of merger of adjacent small settlements, and the absorption of small settlements into nearby large ones. Cross-tabulation with population estimates or a neighbourhood classification scheme would permit a more detailed analysis of the types of places most affected by new development under any given scenario.

Table 1: Size distribution of settlements at end of simulations

Number of additional cells	Simulation 1	Simulation 2
1-3	25613	27496
4-15	10274	9645
16-63	1552	1612
64-255	274	309
256-1023	81	82
1024-4095	19	22
4096 and over	2	2
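The size classes in Table 1 are bounded by powers of four. As a hedged illustration of how such a tabulation could be produced from a list of settlement sizes, the following sketch uses the standard library only. The function names and the example sizes are hypothetical, and the top class is assumed to include settlements of exactly 4096 cells.

```python
import bisect

# Lower bounds of the size classes, in cells.
BIN_EDGES = [1, 4, 16, 64, 256, 1024, 4096]


def bin_label(index):
    """Return a readable label for the size class at the given index."""
    lower = BIN_EDGES[index]
    if index == len(BIN_EDGES) - 1:
        return f"{lower} and over"
    return f"{lower}-{BIN_EDGES[index + 1] - 1}"


def size_distribution(sizes):
    """Count settlements in each size class.

    sizes: iterable of settlement sizes in cells (each at least 1).
    """
    counts = [0] * len(BIN_EDGES)
    for size in sizes:
        if size < 1:
            raise ValueError("settlement size must be at least one cell")
        # bisect_right finds the class whose lower bound is <= size.
        counts[bisect.bisect_right(BIN_EDGES, size) - 1] += 1
    return {bin_label(i): n for i, n in enumerate(counts)}


# Hypothetical sizes, including the initial London extent of 36976 cells.
table = size_distribution([1, 3, 4, 15, 16, 4095, 4096, 36976])
```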

One way to examine the difference between the two simulation models is to consider the distribution of development by distance from Central London. Trafalgar Square has been taken as an arbitrary central point, and 25km distance bands computed. The number of developed cells in each band may then be compared for the initial model and each simulation. Table 2 clearly reveals the different distribution of development across the London region resulting from the two simulations. In simulation 1 development occurs across all distance bands, while in simulation 2 it is heavily concentrated in those areas less than 50km from the central area.

Table 2: Percentage increase in area of developed land by 25km distance bands from Central London

Distance band (km)	Developed cells at start	Simulation 1 (%)	Simulation 2 (%)
0-25	34250	1.5	6.3
25-50	45924	6.4	8.6
50-75	49150	5.4	2.3
75-100	46857	4.8	2.2
100-125	38281	4.6	2.1
125-150	36258	3.5	1.5
150-175	34223	2.8	1.3
175-200	13625	3.5	1.6
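The percentages in Table 2 relate newly developed cells in each band to the cells already developed there at the start. A minimal sketch of that calculation, assuming cells are addressed by (row, column) on a 200m grid and that the central point is given as a grid position, might look like this. The function names and the example coordinates are hypothetical.

```python
import math
from collections import Counter

CELL_SIZE_KM = 0.2    # 200m x 200m cells
BAND_WIDTH_KM = 25.0  # width of each distance band


def band_of(cell, centre):
    """Return the index of the distance band containing a cell (0 = 0-25km)."""
    dist_km = math.hypot(cell[0] - centre[0], cell[1] - centre[1]) * CELL_SIZE_KM
    return int(dist_km // BAND_WIDTH_KM)


def percentage_increase(start_cells, new_cells, centre):
    """Percentage increase in developed cells for each band occupied at the start."""
    start = Counter(band_of(c, centre) for c in start_cells)
    added = Counter(band_of(c, centre) for c in new_cells)
    return {band: 100.0 * added[band] / start[band] for band in sorted(start)}


# Hypothetical cells lying along one grid row east of the central point.
centre = (0, 0)
start_cells = [(0, 10), (0, 20), (0, 50), (0, 60), (0, 130), (0, 200)]
new_cells = [(0, 30), (0, 150)]
increase = percentage_increase(start_cells, new_cells, centre)
```

With these toy inputs, band 0 gains one cell against four existing cells and band 1 gains one against two. Bands with no developed cells at the start are omitted, since a percentage increase is undefined for them.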

7. Discussion and conclusion

Only the simplest characteristics of the available datasets have been exploited here. In particular, the distinction between large and small user postcodes has not been used in any attempt to separate out residential and commercial land uses. The postcode dataset is not ideal for this purpose, but it would be possible to generate an acceptable geographical model for commercial properties if a differentiated land use map were required. Separate treatment of commercial land in this way would require the acquisition of plausible county-level growth scenarios for commercial activity, as this is not directly governed by population change. Assumptions have also been made in the present models about population density, with all new (greenfield) development taking place at uniform density. A more sophisticated model could involve varying the development density according to the location and size of development taking place, in order to capture a more realistic range of values. Clearly, a more sophisticated use of these types of simulation tool would require more careful formulation of the empirical constraints on development in various ways. Empirical research on the determinants of land prices for development, and the attractiveness of competing sites would allow a more direct quantification of the effects of commuting distance and local property markets, for example. One aspect of this work which would need to be addressed in attempting a more thorough real world application would be to scale the simulations in order to meet the projected population totals in the different counties. This would produce a more realistic distribution of new development across the entire region.

The objective of these simulations has not been to provide a single predictive model of urban development in the SE region of England to the year 2020. Rather, our interest has been in exploring the ability of CA simulation to produce plausible development scenarios in a real world situation, and to incorporate a range of both theoretical and empirical development constraints. We suggest that there is a valuable role for CA simulation in presenting alternative scenarios for future urban growth, which are based on detailed real-world data and constraints. This represents a combination of theoretical and empirical considerations.

This kind of application of simulation to real world scenarios must be linked to GIS in order to maintain spatial details and to take advantage of easy graphical manipulation and display. It would be quite difficult, if not impossible, to build the whole process of simulation outside GIS. For example, the special treatment of open spaces within existing urban areas makes use of a hydrologic surface function which is readily available in the GIS environment. Our exploratory analysis, which relates the resulting urban growth patterns to settlement sizes and distances from the urban centre, is also dependent on the spatial functionality available within the grid modelling module of the GIS software. These examples illustrate how the available GIS functions can be readily accessed during simulation to avoid the need for reprogramming of spatial analysis functions. Considering computational efficiency, a hybrid system that combines GIS functionality and dynamic process models seems to be an appropriate solution.

References

Allen, P. (1997) Cities and regions as self-organizing systems: models of complexity. Gordon and Breach Science Publishers, Amsterdam.

Alonso, W. (1964) Location and land use. Harvard University Press: Cambridge, Mass.

Batty, M. (1998) Urban evolution on the desktop: simulation with the use of extended cellular automata Environment and Planning A 30, 1943-1967

Batty, M., Couclelis, H. and Eichen, M. (1997) Urban systems as cellular automata, Environment and Planning B 24 159-164

Batty, M. and Longley, P. A. (1994) Fractal Cities. Academic Press, London.

Batty, M. and Xie, Y. (1994) From cells to cities Environment and Planning B 21 531-548

Bracken, I. and Martin, D. (1995) Linkage of the 1981 and 1991 UK Censuses using surface modelling concepts Environment and Planning A 27, 379-390

Brainard, J. S., Lovett, A. A. and Bateman, I. J. (1997) Using isochrone surfaces in travel-cost models Journal of Transport Geography 5 (2), 117-126

Couclelis, H. (1985) Cellular worlds: a framework for modelling micro-macro dynamics Environment and Planning A 17 585-596

Hägerstrand, T. (1965) A Monte Carlo approach to diffusion Archives Européennes de Sociologie VI, 43-67

McGhie, C. (1992) The Royal Insurance London Commuter Guide Whitley, Good Books

Lovett, A. A., Parfitt, J. P. and Brainard, J. S. (1997) Using GIS in risk analysis: a case study of hazardous waste transport Risk Analysis 17 (5), 625-633

Martin, D. (1989) Mapping population data from zone centroid locations Transactions of the Institute of British Geographers NS, 14, 90-97

Martin, D. (1996a) An assessment of surface and zonal models of population International Journal of Geographical Information Systems 10, 973-989

Martin, D. (1996b) Depicting changing distributions through surface estimation in Longley, P. and Batty, M. (eds.) Spatial analysis: modelling in a GIS environment Cambridge, GeoInformation International 105-122

Mesev, V., Longley, P., Batty, M. and Xie, Y. (1995) Morphology from imagery - detecting and measuring the density of urban land-use Environment and Planning A 27 (5), 759-780

OPCS (1995) 1993-based subnational population projections PP3 No. 9, London, OPCS

Tobler, W. R. (1979) Smooth pycnophylactic interpolation for geographical regions Journal of the American Statistical Association 74, 519-530

Webster, C. J., Wu, F. and Zhou, S. (1998) An object-based simulation model for interactive visualisation Proceedings of the 3rd International Conference on Geocomputation, University of Bristol, 17-19 September 1998. http://www.ggy.bris.ac.uk/geocomp/cdrom/32/gc_32.htm

White, R. and Engelen, G. (1993) Cellular automata and fractal urban form: a cellular modelling approach to the evolution of urban land-use patterns Environment and Planning A 25 1175-1189

Wu, F. (1998a) SimLand: a prototype to simulate land conversion through the integrated GIS and CA with AHP-derived transition rules International Journal of Geographical Information Science 12, 63-82

Wu, F. (1998b) An experiment on generic polycentricity of urban growth in a cellular automatic city Environment and Planning B 731-752

Wu, F. and Webster, C. J. (1998) Simulation of land development through the integration of cellular automata and multi-criteria evaluation Environment and Planning B 25, 103-126.