USING NEUROCOMPUTING METHODS TO CLASSIFY BRITAIN'S RESIDENTIAL AREAS

Stan Openshaw, Marcus Blake, School of Geography, Leeds University

Colin Wymer, Department of Town Planning, Newcastle University

1. Introduction

The explosion of spatial information occasioned by the GIS revolution and the ease with which individual databases can now be aggregated to small zones (e.g. census EDs, postcode sectors, wards) emphasises the importance of being able to simplify the resulting multivariate complexity. The Small Area Statistics (SAS) can be regarded as providing a good example of a generic problem of worldwide significance. An ability to apply a multivariate classification procedure to reduce census data, for the 145,716 1991 census EDs in the UK, to a relatively small number of major types of residential area is extremely useful. For instance, most commercial geodemographic classifications segment Britain's residential areas into about 50 types based on a mix of census and non-census data for large numbers of small areas as a means of adding value to data that otherwise could not be used in an area profiling and targeting context. The objective here is somewhat simpler in that only census data are of interest, although the challenge is harder in that the aim is to provide for research and academic purposes the best possible classification of 1991 census EDs for Britain. Previously only the 1981 Super Profiles classification (obtainable from the Essex Data Archive) was freely available for academic purposes.

In general, the quality of any area based census classification reflects three major factors: first, the classification algorithm that is used; second, the manner and extent to which knowledge about socio-spatial structure is used and is represented in the classification; and third, the sensitivity of the technology to what can be considered to be the geographical realities of the spatial census data classification problem. In much previous classification research the quality and utility of the end product has been regarded as being critically dependent mainly upon the performance of the classification algorithm that is employed. Under this assumption, it is quite reasonable to believe that better classifications can only be produced by the application of improved classifiers; see for instance, Openshaw and Wymer (1994) and Evans and Webber (1994). There is, however, one major flaw in this approach; namely, there are limits to the degree of improvement likely to be possible solely by developing or using improved classifiers. Maybe the possibility exists to become more intelligent in the way these classifications are developed and thus seek a quantum leap in performance rather than just a marginal gain of dubious value; Openshaw (1994a). Indeed, this goal of seeking to inject more intelligence into the total classification process rather than to continue down a narrowly focused purely classifier algorithm route is very important. It may also help explain the anomaly whereby the differences in end-user performance between a hastily cobbled together classification and one rigorously produced after a massive expenditure of research effort are not particularly noticeable (see Charlton et al, 1985). In short, it has long been apparent, but perhaps not sufficiently clearly recognised, that the really critical limitation on the performance of small area census classification is not the classification algorithm per se but the other steps in the process. Developing Artificial Intelligence based classifiers is only part of a wider process of being more intelligent in how we go about building better spatial data classifications.

2. The Census Classification Process

The classification of census data usually involves the following steps:

Step 1. Decide upon the purpose of the classification

Step 2. Select variables and data to meet this purpose

Step 3. Apply a classifier to the data

Step 4. Evaluate the results and select the most appropriate number of clusters

Step 5. Label the clusters

Step 6. Embed the results in some easy to use end-user system that allows the classification to be linked to the postcode geography.

It is noted that the critical stages are highly subjective and operational decisions made there may substantially determine the utility of the results (Openshaw and Gillard, 1978). Clearly there is no easy way of turning the whole process into an optimisation problem since there is no single global function that can be optimised which would simultaneously meet all the goals. For example, there is no simple relationship between optimising a statistical measure of classification performance such as the within cluster sum of squares and the end-user's perception of classification performance in a particular context. Indeed, the quality of the results is dependent in some little understood way on the performance of the classifier in step 3, on the usefulness of the results in step 6, on the extent to which the step 5 cluster labels "make sense" or correspond to what is known about socio-economic and demographic structure, and on the degree to which the variables and data selected in step 2 deliver results that are perceived to be useful at the end of the process. Note also that those judging the usefulness of the results may be a number of different users with differing points of view rather than a single homogeneous audience.

One solution is to create 50 or 100 classifications based on different numbers of clusters (and variables), and then select whatever works best in a particular application. Various cross classification validation methods can be used to automate this decision; see Openshaw (1994b). However, this `intelligent geodemographic' approach is probably more relevant to applied commercial uses of census classifications than to a research context, although the principles are transferable. It is also clearly an extreme response, and before commending its universal adoption it is necessary to investigate whether or not the needs of many research users would be better satisfied by developing a single good classification. This again re-emphasises the design of the classification algorithm (Step 3).

The conventional best practice is to apply a K-means nearest neighbour classifier to spatial census data that has been orthonormalised. Table 1 outlines some of the problems associated with this approach. The next question is whether or not it is possible to greatly improve this technology. There are three ways of becoming smarter in developing a census classification.

(1). improve the classification procedure by switching to superior algorithms, for example, simulated annealing approaches and neuroclassification methods;

(2). develop a means of incorporating knowledge into the classification process; and

(3). discover how to make the cluster assignment or allocation stage more sensitive to the nature of the task.

Until fairly recently, insufficient computing power was available to allow much or any progress to be made on any of these themes; for instance, a key requirement in designing new classifiers is computational tractability when faced with 145,716 or so zones to process. There has been, therefore, a tendency to simply continue using old methods dating from the early 1960s on larger and larger data sets. This has led to a failure to evolve the new approaches that are really needed to ensure that the census classification challenge is being properly addressed. There has also been a failure to properly appreciate the complexity of the problem. The relative ease with which virtually any multivariate classifier, when fed with virtually any set of census variables, produces plausible results has tended to disguise the inherent difficulty of the task.

3. A Neuroclassification Procedure

Some of these problems might be avoidable by using a more spatially sensitive census data classifier that can also provide a good representation of the natural levels of fuzziness that seem to characterise spatial data in general and census data in particular. One of the key characteristics of census data is that EDs vary in size and thus the level of precision and resolution of the data varies geographically. This aspect is often ignored. The problem is that this variation is not random but is spatially structured as it tends to reflect systematic urban rural differences and population density. This is partly due to small number problems which result in the most extreme results being found in the smallest areas, which will tend to be more homogeneous and rural rather than urban. On the other hand, the largest EDs tend to be urban and often have very mixed characteristics, but their size produces data values which are much more accurate than is the case for small areas. The conventional classifier gives equal weight to each ED and thus will tend to focus on the more extreme results, which will tend to represent best those areas for which the census data are least reliable. This is the opposite of what the geographer might wish to happen. As a result many of the larger urban EDs with mixed characteristics will tend to be "poorly" classified and there may well be a number of different possible allocations. Of course this may turn out to be a geographical fact of life with census data; however, it is worth investigating whether or not it might be reduced by using either much larger numbers of clusters in a conventional classifier or by switching to a more data sensitive classification process.

Another feature of UK census data concerns the mix of 100 and 10 percent data coding and the data blurring employed by the census agency to ensure confidentiality. These effects are partly handled by taking into account ED size variation but they also operate on an individual variable level. A small area may have highly accurate data for some variables and highly uncertain data for others. It would seem important that these data uncertainties are taken into account by the classifier rather than simply ignored.

It is with these factors in mind that Openshaw (1994) argues that the use of an unsupervised neural net based on Kohonen's (1984) self-organising map provides the basis for a much more sophisticated approach to spatial classification that reduces the number of assumptions that have to be made and neatly incorporates many of the sources of data uncertainty; see Openshaw (1994a). A basic algorithm is as follows:

Step 1. Define the geometry of the self-organising map to be used and its dimensions. Here a grid with 8 rows and 8 columns is used.

Step 2. Initialise a vector of M weights (one for each variable) for each of the 8 by 8 neuronal processing units.

Step 3. Define the parameters that control the training process: block neighbourhood size, training rate, and number of training iterations.

Step 4. Select a census ED at random but with a probability proportional to its population size.

Step 5. Randomise the vector of M variable values to incorporate data uncertainty computed for each variable separately (optional).

Step 6. Identify the neuron which is "closest" to the input data.

Step 7. Update the winning neuron weights and those of all other neurons in its block neighbourhood or vicinity.

Step 8. Reduce slightly the training parameter and the block neighbourhood size.

Step 9. Repeat steps 4 to 8 a very large number of times.

If Step 4 is replaced by a sequential selection process and Step 5 is ignored then the algorithm is essentially the same as a K-means classifier, with a few differences due to the neighbourhood training, which might well be regarded as a form of simulated annealing; it may well provide better results and avoid some local optima. However, from a geographical perspective Step 4 is extremely important because it provides a means of explicitly incorporating spatial data uncertainty into the classification process. The method also provides a very natural means of handling cluster fuzziness without having to impose an arbitrary metric, since the distance between the best and the next best neurons can be readily measured.
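The nine steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the program used in the research: the function names, the linear decay schedules, the starting training rate and block radius, the Gaussian form of the Step 5 perturbation and the squared Euclidean metric are all illustrative choices that the text does not specify.

```python
import numpy as np


def train_som(data, populations, rows=8, cols=8, iterations=20000,
              rate0=0.5, radius0=4, noise_sd=None, seed=0):
    """Train a small self-organising map following Steps 1 to 9.

    data        : (n_eds, m) array of census variable values
    populations : (n_eds,) array of ED population sizes
    noise_sd    : optional (m,) per-variable standard deviations (Step 5)
    All defaults are assumptions made for this sketch.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    populations = np.asarray(populations, dtype=float)
    n, m = data.shape
    # Steps 1-2: a rows x cols grid, one weight vector of length m per neuron,
    # initialised here from randomly chosen EDs (an assumption).
    weights = data[rng.integers(0, n, rows * cols)].copy()
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
    # Step 4 selection probabilities, proportional to population size.
    p = populations / populations.sum()
    for t in range(iterations):
        # Steps 3 and 8: training rate and block size shrink linearly.
        frac = 1.0 - t / iterations
        rate = rate0 * frac
        radius = int(round(radius0 * frac))
        # Step 4: pick one ED at random, weighted by population.
        x = data[rng.choice(n, p=p)]
        # Step 5 (optional): perturb each variable by its own uncertainty.
        if noise_sd is not None:
            x = x + rng.normal(0.0, noise_sd)
        # Step 6: the winning neuron is the closest weight vector.
        winner = np.argmin(((weights - x) ** 2).sum(axis=1))
        # Step 7: update every neuron in the square block around the winner.
        block = np.abs(grid - grid[winner]).max(axis=1) <= radius
        weights[block] += rate * (x - weights[block])
    return weights


def assign(data, weights):
    """Allocate each ED to its closest neuron (cluster)."""
    data = np.asarray(data, dtype=float)
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

Replacing the weighted draw in Step 4 with a sequential pass over the EDs and leaving noise_sd as None gives the K-means-like behaviour described above. For the full set of EDs the distance matrix in assign would need to be computed in batches.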

The simplicity of the self-organising map approach readily lends itself to ad hoc modification designed to improve the quality of the geographic representation offered by the classification. There are various ways of meeting this objective. The simplest is to select an ED, as in the standard algorithm described previously, but then to use a distance weighted average value for the k nearest EDs. This neighbourhood in geographic ED space is gradually reduced as the block neighbourhood in the self-organising map's topological space is also reduced, slowly over many millions of iterations. The logic is to incorporate some notion of local geographical neighbourhood structure into the classification. Here the geographic neighbourhood is limited to the 10th nearest neighbour of each ED.
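The distance weighted averaging described here might look like the following sketch. The inverse-distance weighting function, the function name and the use of ED centroid coordinates are assumptions; the text states only that a distance weighted average over the k nearest EDs is used, with k no larger than 10.

```python
import numpy as np


def neighbourhood_average(data, coords, i, k=10):
    """Distance weighted mean of ED i and its k nearest EDs.

    data   : (n_eds, m) array of census variable values
    coords : (n_eds, 2) array of ED centroid coordinates (assumed available)
    """
    data = np.asarray(data, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Straight-line distance from ED i to every ED.
    d = np.hypot(*(coords - coords[i]).T)
    # ED i itself (distance 0) plus its k nearest neighbours.
    nearest = np.argsort(d)[:k + 1]
    # Assumed weighting: closer EDs count for more.
    w = 1.0 / (1.0 + d[nearest])
    return (w[:, None] * data[nearest]).sum(axis=0) / w.sum()
```

The returned vector would replace the single ED's values as the training input, and k could be reduced in step with the block neighbourhood as training proceeds.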

Another way of attempting the same objective is to change the updating mechanism (see Steps 6 and 7 in the basic algorithm) to update neurons assigned to the k nearest geographical neighbours of the ED being used for training at any particular instance, irrespective of whether these neurons are within the block neighbourhood of the winning neuron. Experimentation suggests that the `OR' rule is slightly better than the `AND' rule. Equally, restricting the neuron updating to only the geographical neighbour related neurons also yielded slightly poorer results. However, the resulting classifications seemed to offer levels of descriptive resolution equivalent to conventional cluster systems with many more clusters in them.

The principal disadvantage of neuroclassification concerns the computationally intensive nature of the method. If the technique is to properly handle and represent the 150,000 cases then large numbers of training iterations (Step 3) are required. In a census application an ability to represent the data is much more important than any generalisation to unseen data, since there is none. This requires many millions of training iterations; indeed runs of up to one billion iterations have been investigated. In practice this means that parallel implementations are required and a parallel supercomputing version is under development. However, it is worth noting that a conventional classification of 150,000 EDs may well require 200 passes through the data. This does not seem much but nevertheless it would be equivalent to 30 million training iterations, and this conventional classifier is much harder to parallelise or vectorise in any worthwhile manner.

Finally, Tables 2 and 3 briefly summarise some of the strengths and weaknesses of a spatial neuroclassification approach.

4. Some Empirical Results

Following the census classification process as described in section 2, a set of 85 broadly representative 1991 census variables was derived; see Blake and Openshaw (1994) for a full description. These variables are listed in Appendix 1. A conventional iterative relocation procedure is then used to create a flock of cluster systems with between 2 and 2,000 clusters in them. The CCP (Census Classification Program) software is described in Openshaw (1983) and is still available at MCC for research uses. Figure 1 shows a plot of the average percentage within cluster sum of squares for these classifications. The resulting curve is very smooth and would apparently confirm the general view that somewhere between 40 and 70 clusters are needed to provide a useful classification of Britain's residential areas; indeed most 1991 census geodemographic systems offer fewer than 60 cluster solutions. However, this disguises the fact that some variables are much better represented than others; for example, variables such as older couples (35-54) without children and couples aged 55-74+ (denoted as A and B in Figure 1) are not well represented. It is this application or data specificity that Openshaw (1994a)'s Intelligent Geodemographic Targeting System (IGT/1) attempts to exploit. In the present context it merely means that a general purpose census classification with a fixed number of clusters will not satisfy all purposes equally well, but maybe in a research context, with the cluster codes being used as a simple index summarising multivariate complexity, it is of little consequence.
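The statistic plotted in Figure 1 can be illustrated with a short sketch. It is not the CCP software: a plain batch K-means stands in for the iterative relocation procedure, and the function names, the random initialisation and the fixed number of passes are assumptions made for illustration.

```python
import numpy as np


def kmeans(data, k, iterations=50, seed=0):
    """Plain batch K-means, used here as a stand-in classifier."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    centres = data[rng.choice(len(data), size=k, replace=False)].copy()
    for _ in range(iterations):
        d = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = data[labels == j]
            if len(members):          # leave an empty cluster where it is
                centres[j] = members.mean(axis=0)
    d = ((data[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1), centres


def percent_within_ss(data, labels, centres):
    """Within cluster sum of squares as a percentage of the total."""
    data = np.asarray(data, dtype=float)
    within = ((data - centres[labels]) ** 2).sum()
    total = ((data - data.mean(axis=0)) ** 2).sum()
    return 100.0 * within / total


def percent_within_ss_by_variable(data, labels, centres):
    """The same percentage for each variable separately.

    Assumes no variable is constant (otherwise its total is zero).
    """
    data = np.asarray(data, dtype=float)
    within = ((data - centres[labels]) ** 2).sum(axis=0)
    total = ((data - data.mean(axis=0)) ** 2).sum(axis=0)
    return 100.0 * within / total
```

Calling kmeans for a range of k and plotting percent_within_ss against k gives a curve of the kind described, while the per-variable version shows which variables remain poorly represented at a given number of clusters.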

For current purposes the neuroclassifier is run with an 8 by 8 matrix of neurons for the 85 variables listed in Appendix 1. A total of 200 million training iterations were used. Step 5 was omitted to reduce the run-time on a workstation to 5 days. The labels that were derived for the resulting clusters are listed in Appendix 2. Comparisons with conventional classifications suggest that the differences appear to be slight in a qualitative sense. Quantitative comparisons are more difficult because it is not clear what the performance measure should be.

It seems then that any preference for a neuroclassifier requires both a significant amount of faith and a judgement about the relative merits along the lines of Tables 1 to 3. This can be backed up by an assessment of whether the results are plausible. Figure 2 shows the distribution of the principal residential area types in Sheffield. This stands up well to both local knowledge and previous research (Haining, Wise and Blake, 1992) on area types in Sheffield. For example, on a broad scale, the classic east-west division found in many industrial cities can be seen, with the west and south-west of the city being dominated by the affluent and climbing categories while the city centre and the east of the city have more struggling and aspiring areas.

5. Fuzziness in the Classification of Small Area Census Data

An illustration of the apparent complexity of the census data classification process is provided by allowing fuzziness to occur in the cluster assignment stage. Openshaw (1989a & b) suggests that there is a particularly easy way of incorporating spatial data uncertainty into the spatial classification process. This illustrates the two principal sources of uncertainty: fuzziness in the geography space and fuzziness in the classification space. Traditionally, neighbourhood effects in geography are regarded as a spatial phenomenon in that people who live "near" to each other tend to share some behavioural characteristics despite other differences. In geodemographic classifications these effects have been implicitly exploited at the enumeration district scale, which is why these classifications are sometimes referred to as neighbourhood classifications. However, this is an extremely crude representation of a highly complex and highly variable spatial phenomenon. Geographers in the GIS era should really be able to do better than this and regard spatial neighbourhood effects in an elastic fashion rather than in a discrete ED geography space. Similarly, fuzziness in the classification space should be exploited rather than ignored. Areas may differ by only very small amounts in the classification but be assigned to very different clusters. This is particularly important with census data because of the lack of social homogeneity of the census ED and the tendency of the classification process to focus on highly distinctive minority characteristics of areas due to small number effects. As a result, it is likely that in many classifications the distinguishing cluster descriptions are minority features that are either created by aggregation effects at the ED scale or represent a profile based on the mixture of different individual household types. It is with great regret that in the UK there is currently no data available which can be used to measure these effects. The ecological fallacy problem needs to be handled rather than ignored in census classifications. Openshaw (1994a) provides a specification of a fuzzy geodemographics system to try to handle these problems. This can be demonstrated by using the results of the neuroclassification procedure.

The first aspect to consider is the structure of the Kth nearest neighbour distances in the classification. Figure 3 shows the histogram of the number of different clusters "near" to each ED. It suggests that perhaps a surprising number of EDs are "near" to more than one cluster and could in fact be assigned with only a relatively small degree of error to a different cluster altogether. Figure 4 shows a map that identifies the location of these "uncertain" EDs in Sheffield. Relatively few areas seem to be without some classification uncertainty. This measure of fuzziness is however only partial in that geographic neighbourhood or distance effects are excluded. It perhaps matters less if an ED can belong to two or more different clusters if these clusters are located nearby than if they are a long way off. The converse may also be important; that is, neighbouring EDs should perhaps tend to belong to the same or similar cluster types.
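One way of counting the clusters "near" to each ED is sketched below. The text does not define "near", so the rule used here (a neuron is near if its distance to the ED exceeds the best distance by no more than a tolerance) and the tolerance value are assumptions, as is the function name.

```python
import numpy as np


def near_cluster_counts(data, weights, tolerance=0.25):
    """Count, for each ED, the neurons that are almost as close as the best.

    data    : (n_eds, m) array of census variable values
    weights : (n_neurons, m) array of trained neuron weights
    The best-matching neuron is always included, so every count is >= 1.
    """
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    d = np.sqrt(((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2))
    best = d.min(axis=1, keepdims=True)
    return (d - best <= tolerance).sum(axis=1)
```

A histogram of the returned counts (for example with np.bincount) gives a summary of the kind shown in Figure 3; EDs with a count above one are the "uncertain" ones.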

To illustrate the further effects of fuzziness, Table 4 provides a cross tabulation of the census EDs in Britain by different levels of uncertainty in both the geography space and the classification space for a few illustrative cluster types. It is immediately apparent that a small amount of fuzziness soon introduces a number of other EDs that could be considered as belonging to each of the clusters. In fact it seems that the all or nothing nature of the conventional census classification is hiding considerable degrees of uncertainty. A surprisingly large number of EDs can in fact be assigned to different clusters. This may well reflect the heterogeneity of the census ED as a geographical entity, Openshaw (1984). However, not all this fuzziness is harmful to the classification as it can be used to improve the local fit of a classification by using geographic neighbourliness as a kind of smoothing operator. In fact, the first column in Table 4 shows the distribution of nearest neighbour geographic distances for EDs in the selected cluster. The distribution varies according to the nature of the cluster. Some are very closely related, for example council multi-storey housing, and others much less so, for example poor semi-detached.
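A cross tabulation with the layout of Table 4 could be assembled as in the sketch below. The band limits are taken from the row and column headings of Table 4, but how each ED's two distances are measured, and the reading of each heading as the upper limit of its band, are assumptions; the geographic distances are assumed to be in metres.

```python
import numpy as np

# Upper limits of the bands, read from the headings of Table 4.
GEOG_EDGES = np.array([0, 100, 200, 300, 400, 500, 750, 1000, 2000, 3000])
SIM_EDGES = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 1.5, 2.0, 3.0])


def fuzziness_table(geog_dist, sim_dist):
    """Count EDs by geographic distance band (rows) and similarity band (columns).

    geog_dist : distance in metres from each ED to the cluster (assumed definition)
    sim_dist  : distance from each ED to the cluster in classification space
    The final row collects EDs more than 3 km away; EDs whose similarity
    distance exceeds the last column heading are left out.
    """
    geog_dist = np.asarray(geog_dist, dtype=float)
    sim_dist = np.asarray(sim_dist, dtype=float)
    rows = np.searchsorted(GEOG_EDGES, geog_dist, side="left")
    cols = np.searchsorted(SIM_EDGES, sim_dist, side="left")
    keep = cols < len(SIM_EDGES)
    table = np.zeros((len(GEOG_EDGES) + 1, len(SIM_EDGES)), dtype=int)
    np.add.at(table, (rows[keep], cols[keep]), 1)
    return table
```

Reading across a row of the result shows how quickly extra EDs become candidates for a cluster as the similarity tolerance is relaxed; reading down a column shows the same effect as the geographic tolerance is relaxed.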

6. Disseminating the Results: GB Profiles `91

Finally, one of the objectives of the present research is to construct a geodemographic profiling system that researchers can easily use. Using Microsoft Visual Basic, an easy-to-use Windows based system called GB Profiles `91 has been developed. This allows the classification of the underlying ED of every unit postcode in Great Britain to be accessed. Its primary use is to allow the academic community easy access to the results of the neuroclassification research.

It has two modes of use: an interactive single postcode search (or Single Search Mode - SSM), which instantly provides the cluster information on the screen, and a multiple postcode search (Multiple Search Mode - MSM), which allows the user to batch process postcodes stored in a file. Some of the windows associated with the MS mode are shown in Figure 5. The Search Setup Window allows the user to select a particular classification and determine which mode of operation to use, SS or MS mode. If MS mode is selected and a file loaded then this is stored in the list box of the Search Window, where postcodes can either be added or removed. When these are processed a record is kept of postcodes which have failed to be found and those which are duplicates. These statistics are provided in the Search Statistics Window. The results of the search are stored in a set of arrays which can be viewed on screen or saved to a file. Further information on the frequency distribution of the clusters found and a more detailed description of the clusters is also provided.
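The batch processing logic of the multiple search mode can be illustrated as follows. GB Profiles `91 itself is written in Visual Basic and its internals are not described here, so this Python sketch is purely illustrative: the function name, the dictionary used as a postcode to cluster lookup and the postcode normalisation rule are all hypothetical.

```python
from collections import Counter


def batch_search(postcodes, lookup):
    """Look up a list of postcodes and keep the search statistics.

    postcodes : iterable of postcode strings, e.g. read from a file
    lookup    : dict mapping a normalised unit postcode to a cluster code
                (a hypothetical stand-in for the real data files)
    Returns the results, the postcodes not found, the duplicates and the
    frequency distribution of the clusters found.
    """
    results, failed, duplicates = {}, [], []
    seen = set()
    for raw in postcodes:
        # Assumed normalisation: drop spaces and use upper case.
        code = "".join(raw.split()).upper()
        if code in seen:
            duplicates.append(code)
            continue
        seen.add(code)
        if code in lookup:
            results[code] = lookup[code]
        else:
            failed.append(code)
    frequencies = Counter(results.values())
    return results, failed, duplicates, frequencies
```

The single search mode corresponds to one lookup of a normalised postcode in the same dictionary.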

The underlying data structure is modular and this will allow different classifications to be loaded and then selected from the interface. Modules that provide photographic images and summary statistics are also developed.

7. Conclusions

The paper has argued that the use of a neuroclassifier provides a much more flexible and potentially superior means of generating census classifications. However, substantially improved results are unlikely until it is possible to improve all aspects of the classification process so that the classification both better represents the complex nature of spatial data and incorporates the meta knowledge that exists about the nature of residential areas in Britain. A start has been made but the really definitive results have yet to be produced.

References

Blake, M. & Openshaw, S., 1994, `Selecting census variables for use in classification research', Working Paper, School of Geography, Leeds University.

Charlton, M., Openshaw, S., & Wymer, C., 1985, `Some new classifications of census enumeration districts in Britain: a poor man's ACORN', Journal of Economic and Social Measurement, 13, 69-98.

Evans, N., and Webber, R., 1994, `Advances in geodemographic classification techniques for target marketing', Journal of Targeting, Measurement and Analysis for Marketing, 2, 313-321.

Kohonen, T., 1984, Self-organization and associative memory, Springer-Verlag, Berlin.

Openshaw, S., 1983, `Multivariate analysis of census data: the classification of areas', in D. Rhind (ed) A Census User's Handbook, Methuen, London, 243-264.

Openshaw, S., 1984, `Ecological fallacies and the analysis of areal census data', Environment and Planning A, 16, 17-31.

Openshaw, S., 1989a, `Learning to live with spatial databases', in M. Goodchild & S. Gopal (eds) The Accuracy of Spatial Databases, Taylor & Francis, London, 264-276.

Openshaw, S., 1989b, `Making geodemographics more sophisticated', Journal of the Market Research Society, 31, 111-131.

Openshaw, S., 1994a, `Developing smart and intelligent target marketing systems: part I', Journal of Targeting, Measurement and Analysis for Marketing, 2, 289-301.

Openshaw, S., 1994b, `Developing smart and intelligent target marketing system', Working Paper 94/3. School of Geography, Leeds University.

Openshaw, S., 1994c, `Neuroclassification of spatial data', in D.C. Hewitson and R.G. Crane (eds), Neural Nets: Applications in Geography, Kluwer, Boston, 53-70.

Openshaw, S. and Gillard, A. A., 1978, `On the stability of a spatial classification of census enumeration district data', in P.W.S. Batey (ed) Theory and Methods in Urban and Regional Analysis, Pion, London, 101-119.

Openshaw, S., & Wymer, C., 1994, `Classification and regionalisation', in S. Openshaw (ed), Census User's Handbook, Longmans, London.

Table 1: Problems with a conventional classification procedure

1.    Use of a correlation matrix which acts as a linear       
      filter.                                                  
2.    Use of principal component scores which use Z score      
      transformation of the data, emphasizing non-normal       
      distributions and affected by spatial dependency.        
3.    All or nothing nature of the classification assignment.  
4.    Single move heuristic which might become stuck in        
      sub-optimum locations.                                   
5.    Global function that is being optimised but with no      
      basis for knowing whether the results are better than    
      random.                                                  
6.    Imposes arbitrary structure (viz. minimum variance) on   
      the data.                                                
7.    No way of handling data outliers and variations in       
      data precision.                                          
8.    No means of including prior knowledge into the           
      classification process

Table 2: Some of the benefits of a neurocomputing spatial classifier

1.    Use of raw data removes the need for an                  
      orthonormalising linear filter.                          
2.    The self-organising nature of a Kohonen map allows       
      structure to emerge rather than be imposed from the      
      top.                                                     
3.    Incorporation of data uncertainty into the               
      classification.                                          
4.    Simplicity and greatly reduced number of source code     
      lines.                                                   
5.    Possible to incorporate prior knowledge into the         
      classification process making it more intelligent.       
6.    Fuzziness of the results is preserved in a
      particularly easy to use form.
7.    Reduction in importance of knowing precisely how many    
      clusters are needed.                                     
8.    Cluster interpretation is easier because the             
      classification takes place in the data space rather      
      than in some transform space.                            
9.    Non-linear technology.                                   
10.   Less likely to be trapped in a local sub-optimum.

Table 3: Some of the problems of a neurocomputing spatial classifier

1.    You need to prove that the potentially superior          
      technology yields improved results by comparison with    
      conventional benchmarks.                                 
2.    Extensive computer run times are needed requiring the    
      use of parallel supercomputing to adequately train on    
      large data sets.                                         
3.    A number of design aspects are entirely subjective, in   
      particular; the number of training iterations, the       
      architecture of the net, the updating process, and the   
      choice of metric for the classification.                 
4.    The current absence of an intelligent framework for      
      using the results.                                       
5.    Lack of experience with the technology.

Table 4: Neural net classification analysis of fuzziness

Cluster No 1: Multi-ethnic council tenants

Members = 2869

             Cluster Similarity         
             Distances                  
Geog. Dist.  0.00    0.25    0.50    0.75    1.00    1.50    2.00    3.00    
0.000        1       0       0       0       0       0       0       0       
100.000      84      106     63      33      18      24      7       1       
200.000      393     448     320     221     129     138     48      9       
300.000      526     669     617     458     330     438     143     42      
400.000      422     577     758     578     417     587     274     91      
500.000      232     484     698     607     542     817     377     144     
750.000      366     775     1461    1533    1405    2344    1211    541     
1000.000     141     604     1095    1330    1403    2476    1525    633     
2000.000     309     1313    2971    3908    4352    8432    5555    2601    
3000.000     127     684     1841    2499    2568    4895    3486    2039    
.gt.3km      269     3396    12412   12605   9490    12150   6648    4081

Cluster No 5: Poor semi detached housing

Members = 2920

             Cluster Similarity         
             Distances                  
Geog. Dist.  0.00    0.25    0.50    0.75    1.00    1.50    2.00    3.00    
0.000        0       1       0       2       0       0       0       0       
100.000      27      45      42      29      13      15      4       1       
200.000      104     187     181     127     70      78      18      3       
300.000      192     228     293     267     172     181     67      9       
400.000      195     285     399     339     295     361     115     18      
500.000      155     228     364     374     362     463     153     52      
750.000      339     516     881     1011    1003    1433    629     217     
1000.000     155     365     656     936     1015    1731    927     357     
2000.000     505     1107    2003    2738    3387    7123    4640    2337    
3000.000     351     1011    1680    2116    2586    5793    4313    2650    
> 3 km       897     3808    7585    8437    8883    20244   16240   10546

Cluster No 6: Council multi-storey housing

Members = 3958

             Cluster Similarity         
             Distances                  
Geog. Dist.  0.00    0.25    0.50    0.75    1.00    1.50    2.00    3.00    
0.000        7       3       2       0       0       0       0       0       
100.000      1374    1269    621     253     150     82      27      16      
200.000      994     1692    1486    765     483     449     161     106     
300.000      445     1172    1390    916     635     746     343     255     
400.000      248     684     1054    858     621     794     467     334     
500.000      151     511     821     685     597     794     484     433     
750.000      209     786     1418    1364    1260    1825    1336    1381    
1000.000     121     375     751     896     877     1516    1226    1516    
2000.000     190     556     1087    1546    1717    3700    3395    5074    
3000.000     83      196     467     700     845     1708    1755    3115    
> 3 km       136     525     1340    2265    2997    8032    9813    20093

Cluster No 19: Well off metro singles

Members = 3849

             Cluster Similarity         
             Distances                  
Geog. Dist.  0.00    0.25    0.50    0.75    1.00    1.50    2.00    3.00    
0.000        4       1       1       0       1       0       0       0       
100.000      1641    883     477     177     146     85      16      3       
200.000      1185    1156    1068    637     429     328     129     51      
300.000      309     701     971     800     565     579     270     163     
400.000      158     505     689     781     544     646     331     245     
500.000      91      313     577     683     546     675     343     307     
750.000      136     449     1128    1316    1151    1510    960     977     
1000.000     58      295     689     884     906     1341    937     1055    
2000.000     116     395     1113    1738    1918    3252    2921    3711    
3000.000     47      161     393     629     812     1769    1690    2823    
> 3 km       104     416     1415    2590    3532    9139    11076   23331
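
A table of this form can be built by cross-tabulating pairs of EDs by geographic distance band and by distance in classification space. The following is a minimal sketch in Python, assuming that each cell counts (cluster member, other ED) pairs falling into that combination of bands. The paper does not spell out the counting rule, so the function name, the use of Euclidean distance, the way band edges are interpreted, and the toy data are all illustrative assumptions rather than the authors' procedure.

```python
from bisect import bisect_left
from math import dist


def distance_crosstab(members, others, geo_edges, sim_edges):
    """Count (member, other) ED pairs by geographic and similarity band.

    Hypothetical helper. `members` and `others` are lists of
    (xy, z) tuples: xy is a grid coordinate, z a vector in
    classification space. Band k on an axis holds distances d with
    edges[k-1] < d <= edges[k]; the last band holds d > edges[-1].
    """
    table = [[0] * (len(sim_edges) + 1) for _ in range(len(geo_edges) + 1)]
    for m_xy, m_z in members:
        for o_xy, o_z in others:
            row = bisect_left(geo_edges, dist(m_xy, o_xy))  # geographic band
            col = bisect_left(sim_edges, dist(m_z, o_z))    # similarity band
            table[row][col] += 1
    return table


# Toy data, invented purely for illustration.
members = [((0.0, 0.0), (0.0, 0.0))]
others = [
    ((0.0, 0.0), (0.0, 0.0)),     # same location, identical profile
    ((50.0, 0.0), (0.3, 0.0)),    # 50 m away, slightly different
    ((5000.0, 0.0), (10.0, 0.0)), # beyond 3 km, very different
]
toy = distance_crosstab(members, others,
                        geo_edges=[0.0, 100.0, 3000.0],
                        sim_edges=[0.0, 0.25, 3.0])
```

The nested loop is quadratic in the number of EDs, so for a full national dataset a spatial index or a distance cut-off would be needed; the sketch is only meant to make the structure of the table concrete.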

Figure 1: A Plot of the Average Percentage Within-Cluster Sum of Squares

Figure 2: The Distribution of Major Classes identified using the Neuroclassification procedure

Figure 3: The Distribution of the Number of Different Clusters "Near" to each ED

Figure 4: Distribution of "Uncertain" EDs within Sheffield

Figure 5: Layout of GB Profiles '91 in Multiple Search Mode

Appendix 1: The Variables used in both Classification Procedures

Demographic Variables

Ref#  Description                                    10%   
1     resident persons in the 0-4 age grp.                 
2     resident persons in the 5-14 age grp.                
3     resident persons in the 15-24 age grp.               
4     resident persons in the 25-44 age grp.               
5     resident persons in the 45-64 age grp.               
6     resident persons in the 65-74 age grp.               
7     resident persons in the 75-84 age grp.               
8     resident persons in the 85+ age grp.                 
9     resident persons who are single                      
10    hhlds( with residents) with children, that           
      have two or more adults                              
11    female residents who are between 16 & 45             
12    resident persons that are married                    
13    residents who are single parents                     
14    resident persons who are of pensionable age          
15    persons aged 16+ who are students

Ethnic Variables

16    residents who are white                              
17    residents who are black                              
18    residents who are Indian                             
19    residents who are Pakistani                          
20    residents who are Bangladeshi                        
21    residents who are Chinese & others

Migration Variables

22    residents that moved last year                       
23    residents that are pensioner migrants

Housing Variables

24    all permanent hhlds that are owned outright          
25    all permanent hhlds that are mortgaged               
26    all permanent hhlds that are HA rented               
27    all permanent hhlds that are LA rented               
28    all permanent hhlds that are unfurnished             
      rented                                               
29    all permanent hhlds that are furnished rented        
30    all hhld spaces that are detached                    
31    all hhld spaces that are semi-detached               
32    all hhld spaces that are terraced                    
33    all hhld spaces that are purpose built flats         
34    all hhld spaces that are converted flats             
35    all hhld spaces that are bedsits                     
36    all permanent hhlds with no central heating          
37    all permanent hhlds with no/shared                   
      bath/shower/WC                                       
38    hhlds with residents which are overcrowded           
39    hhlds with residents which are very                  
      overcrowded                                          
40    hhlds with residents which have more than 6          
      rooms                                                
41    Number of rooms per hhld                             
42    Rooms per person                                     
43    Average hhld size (persons per hhld)                 
44    hhlds with residents with 2 or more cars             
45    Average number of cars per hhld

Household Composition Variables

46    hhlds with residents with 2 or more e.a.             
      persons and no children                              
47    hhlds with residents with a single e.a.              
      person and no children                               
48    hhlds with residents with a married couple           
49    hhlds with residents with children                   
50    hhlds with residents with children and no car        
51    hhlds with residents with a single pensioner         
52    hhlds with residents with a single                   
      non-pensioner                                        
53    hhlds with residents with more than three            
      adults                                               
54    residents aged 16+ in hhlds who are aged             
      16-24 and are without children                       
55    residents aged 16+ in hhlds who are aged             
      16-24 and have children                              
56    residents aged 16+ in hhlds who are aged             
      25-34 and are without children                       
57    residents aged 16+ in hhlds who are aged             
      25-34 and have children                              
58    residents aged 16+ in hhlds who are aged             
      35-54 and are without children                       
59    residents aged 16+ in hhlds who are aged             
      35-54 and have children                              
60    residents aged 16+ in hhlds who are aged             
      55-74 or more

Socio-economic Variables

61    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 1, 2, 3 & 4
62    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 5 & 6
63    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 8, 9 & 12
64    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 7 & 8
65    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 11
66    residents aged 16 and over (employed &         yes
      self-employed) that are in SEG 16 & 17
67    residents aged 16 and over (employed &         yes
      self-employed) that are in manufacturing &
      mining
68    residents aged 16 and over (employed &         yes
      self-employed) that are in agriculture
69    residents aged 16 and over who are
      self-employed
70    residents aged 16 and over who are unemployed
71    residents aged 16 and over who are
      permanently sick
72    residents aged 16 and over who are working
      women (employers or employees)
73    residents aged 16 and over (employed &
      self-employed) that are women working in
      manufacturing (metal etc. not other manuf.)
74    residents aged 16 and over (employed &
      self-employed) that are women working more
      than 41 hours per week
75    residents aged 16 and over who work part-time
76    male workers
77    residents aged 16+ in hhlds who are female,
      married and working
78    Proportion of residents aged 18 and over       yes
      with a (higher) degree
79    hhlds with residents with 2 or more adults
      in employment

Health Variables

80    residents (S02) with LLI                             
81    residents (S02) economically inactive with           
      LLI

Travel-to-work Variables

82    residents aged 16 and over who work at home    yes
83    residents aged 16 and over who go to work      yes
      by car
84    residents aged 16 and over who go to work      yes
      by train/bus
85    residents aged 16 and over who walk to work    yes

Appendix 2: Labeling System for the 64 clusters identified using the Neuroclassification Procedure

Group       Sub-group            Name                        Cluster  
                                                             #        
Struggling  Council Tenants      Multi-ethnic council        1        
            with multiple        tenants                              
            social problems                                           
                                 LA rented Semis             24       
                                 Overcrowded Council         33       
                                 Housing                              
                                 Council tenants in Tower    6 & 7    
                                 Blocks                               
                                 Single Parents Council      29 & 34  
                                 tenants                              
                                 Single Parents in Tower     28 & 30  
                                 Blocks                               
                                 Unskilled Council tenants   45       
            Multi-ethnic, low    Bangladeshi Areas           4        
            income areas                                              
                                 Indian Areas                38       
                                 Multi-ethnic Bedsit Areas   8 & 27   
                                                             & 32     
                                 Poor multi-ethnic singles   62       
            Less Well-off        Terraces                    2 & 36   
            Terraces                                                  
                                 LA rented terraces          10 & 35  
            Fading Industrial    Industrial terraces         43 & 61  
            Areas                                                     
                                 Industrial Council tenants  51       
            Less Well-off        Pensioners Council tenants  17 & 25  
            Pensioners                                       & 31     
                                 Pensioners in converted     18 & 26  
                                 flats                                
                                 Pensioners in HA rented     57       
                                 terraces                             
Aspiring    Young Singles in     Poor young singles &        3, 55    
            Flats                Students                    & 60     
                                 Singles in PBFs             53       
                                 Better-off singles          14 & 54  
            Better-off Council   Council Semis               13       
            Tenants                                                   
            Rural Communities    Rural areas                 44 & 52  
            Armed Services       Young Armed Services        12       
                                 Families                             
Established Semi-detached        Semis                       56       
            Suburbia                                                  
                                 Mortgaged Semis             63       
                                 Owner occupied Semis        5        
            Better-off           Pensioner Migrants          15,      
            Pensioners                                       16, 23   
                                                             & 59     
            Comfortable Middle   Middle Class Suburbia       37       
            Agers                                                     
                                 Wholly owned Semis          21       
                                 The average                 20       
Climbing    Metro Singles        Well-off singles in         14       
                                 Bedsits                              
                                 Well-off singles in PBFs    19       
                                 Well-off singles in         50       
                                 converted flats                      
            Academic centers     Students in Bedsits         41       
Prospering  Wealthy Achievers    Middle aged Managers        46 & 58  
                                 Well-off Middle Aged        9 & 47   
                                 Managers                             
                                 Self-employed Managers      48       
                                 Educated Professionals      22       
            Wealthy Rural        Rich Agriculturalists       11,      
            Communities                                      49, 39   
                                                             & 64     
Unclassified                                                 40       
                                                             42
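
For use alongside the classification, the labelling hierarchy above can be held as a simple lookup from cluster number to group and name. The following is a minimal sketch in Python using a handful of rows transcribed from the table; the structure and the names `LABELS`, `build_lookup` and `CLUSTER_LABELS` are hypothetical and are not part of any distributed software.

```python
# A few rows transcribed from Appendix 2: (group, name, cluster numbers).
# Only a subset is shown; the remaining rows follow the same pattern.
LABELS = [
    ("Struggling", "Multi-ethnic council tenants", [1]),
    ("Struggling", "Council tenants in Tower Blocks", [6, 7]),
    ("Established", "Owner occupied Semis", [5]),
    ("Climbing", "Well-off singles in PBFs", [19]),
    ("Prospering", "Educated Professionals", [22]),
]


def build_lookup(rows):
    """Map each cluster number to its (group, name) pair.

    Hypothetical helper. Raises ValueError if a cluster number is
    listed under more than one name, which guards against
    transcription slips when the full table is entered.
    """
    lookup = {}
    for group, name, clusters in rows:
        for cluster in clusters:
            if cluster in lookup:
                raise ValueError(f"cluster {cluster} is listed twice")
            lookup[cluster] = (group, name)
    return lookup


CLUSTER_LABELS = build_lookup(LABELS)
```

An ED assigned to cluster 19, for example, can then be labelled with `CLUSTER_LABELS[19]`.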