Selecting Variables for Small Area Classifications of 1991 UK Census Data

Marcus Blake & Stan Openshaw.

School of Geography, Leeds University, Leeds. LS2 9JT

INTRODUCTION

Multivariate classification of small area census data is a well established means of providing a descriptive summary of the characteristics of residential areas that is simple to use and understand, and therefore useful (Openshaw, 1983; Openshaw & Wymer, 1994). In the commercial world this has become the basis for a geodemographic targeting industry, with a variety of commercial systems based on the last three censuses. Products such as ACORN, MOSAIC, and PiN have become industry standards (Brown, 1991). In the academic world the 1981 Super Profiles system has also attracted a considerable amount of interest and use (when it was made available in a ready to use form).

A new 1991 census based research classification has been developed under the aegis of an ESRC research project. The new system is called GB-Profiles '91 and has been developed solely for academic purposes. There is no relationship whatsoever with the Super Profiles system of CDMS. GB-Profiles '91 uses the best available computer methods run on the best available computer hardware. It classifies the 150,000 (130,000) smallest areas in Britain for which 1991 census data are available (EDs in England & Wales, OAs in Scotland) into a relatively small number of distinctive residential area types based on an assessment of their multivariate census data profile. It is expected to be of considerable research value, particularly when linked to unit postcodes. In building such a system there are a number of key design issues, particularly the choice of:

1. variables,

2. classification method,

3. result resolution,

4. and incorporation of prior knowledge.

The choice of variables is important because it determines the results to a large extent. Sadly, in many previous studies there is no clear explanation or audit trail as to how the variables were chosen. An exception is the 1981 Super Profiles system; see Charlton et al (1985). The choice of variables has to reflect the purpose of the exercise. It is both tedious and hard, because sensible indicator variables have to be constructed with care from a set of almost 10,000 available census counts for each small area.

Here the small areas used are the Census Enumeration Districts (EDs), and the source of data is the Small Area Statistics (SAS). The SAS is "a predefined set of cross tabulations of two or more census variables which are made available by the Census Offices in a machine readable format for a wide range of different areal units throughout the whole of Great Britain" (Cole, 1993, p. 201).

This statement makes two important points. The first is that the census covers the whole of Great Britain. Every person must, by law, complete the census questionnaire, so coverage is high. The Census Validation Survey (CVS) checks indicated that approximately 299,000 people (0.6% of the enumerated population) were missed by the 1991 census, but this is itself estimated to be only a third of the actual under-enumeration (Wriggins, 1993). Even with an estimated 1 million people missing, the census still represents a broad and accurate data set.

The second point is that it is available for a wide variety of different spatial units, including the small areas (EDs and OAs) at which the census was taken. The SAS is the only Census data set that allows data to be output at this level of resolution, because confidentiality is maintained by two processes: first, by restricting output to those EDs which have more than 50 persons or 25 households, and second, by "blurring" the results (adding +1, 0 or -1 to the census counts in a quasi-random manner). Although these confidentiality enhancing measures do allow data to be released for small areas, they cause severe methodological problems that have to be addressed when classifying census data.
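The two confidentiality processes can be illustrated with a minimal Python sketch. It is not the procedure used by the Census Offices: the function names, the literal reading of the "50 persons or 25 households" rule, the clamping of blurred counts at zero and the use of a seeded pseudo-random generator are all assumptions made for illustration.

```python
import random

# Assumed thresholds, read literally from the text above.
MIN_PERSONS = 50
MIN_HOUSEHOLDS = 25


def is_releasable(persons, households):
    """Literal reading of the rule: more than 50 persons or 25 households."""
    return persons > MIN_PERSONS or households > MIN_HOUSEHOLDS


def blur(count, rng):
    """Add -1, 0 or +1 at random; clamping at zero is an assumption."""
    return max(0, count + rng.choice((-1, 0, 1)))


rng = random.Random(1991)  # fixed seed so the sketch is repeatable
true_counts = [0, 1, 2, 40, 250]
blurred = [blur(c, rng) for c in true_counts]

# Small counts are affected most in relative terms: a true count of 2 can
# become 1, 2 or 3, so a percentage built from it can swing widely.
for true, noisy in zip(true_counts, blurred):
    print(true, noisy)
```

The sketch shows why blurring matters for classification: the perturbation is trivial for a count of 250 but can be a large share of a count of 1 or 2, which is typical of many cells at ED level.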

Objectives & Purpose

It is most important to be clear about the purpose of the exercise. The choice of variables and their specification has to reflect that explicit purpose. It would seem that many previous classifications have, at best, been "purpose vague". The point is obvious, but it is important nonetheless to be quite frank about it: different variables will almost certainly produce different results, and whatever purpose is chosen will probably map onto the available set of 10,000 census variables in many different ways.

The research objective of the census classification project sponsored by the ESRC is to provide a census data representative small area classification of Britain's residential areas. The key phrase here is "census data representative". The aim is not solely to define areas of urban or rural deprivation or areas of affluence, or areas dominated by this or that socio-economic mix that happens to be of narrow or specialist attention and importance. These are, of course, valid objectives for special purpose classifications of Britain, but the objective here is to seek a more general (broad based) census data descriptive representation. It is thought that this will appeal to the majority of the prospective social science users and is in the best traditions of the original geodemographic systems. It is also likely that this goal will provide maximum added-value to the 1991 database.

In practice, this means that a broadly representative set of census variable based indicators needs to be created. It follows, therefore, that in principle the importance of each general type of variable should broadly reflect the nature and content of the original census questionnaire rather than the 100 or so tables of cross-tabulated counts; the latter are a reflection of the perceived user needs for the census outputs. This desire for coverage and representativeness of the census questionnaire is, in practice, modified by an assessment of the sorts of variables previously and typically used, and by the thought that data redundancy may be induced or removed later via statistical means. A consideration of indicators used by previous classifications is also useful, because it represents a not insignificant transfer of intellectual and conceptual thinking that may well be worth re-cycling. Indeed, the lineage of some census indicator variables can be traced back to the 1966 census!

The strategy here, then, is to review the variables used by others and then supplement and modify this list by reference to the principles of coverage and representativeness that are considered so important.

A REVIEW OF THE VARIABLES USED IN PREVIOUS AND CURRENT CLASSIFICATIONS

The Current Commercial Classifications

With the recent acquisition of PinPoint by CACI there are now only five major companies in Britain that provide geodemographic systems and the services that are associated with them (see Table 1). The release of the statistics for the 1991 Census has stimulated the redevelopment of the 1981 based systems. All five have produced new systems. PinPoint had been the first to launch a geodemographic system based on 1991 Census data, but it is understood that CACI will not be marketing this, only providing it to existing users (Sleight, 1993). This commercial situation is presently in a state of flux and therefore information on the variables that compose these systems is commercially sensitive.

Table 1: Major players in the British Geodemographic Industry

Company                    Associated Products        
CACI (+ PinPoint)          ACORN                      
CDMS                       Super profiles             
CCN                        MOSAIC                     
Infolink                   DEFINE                     
Equifax Europe             IMAGES                     
Euro Direct                Neighbours & PROSPECTS     

Table 2 provides a list of the salient features of both the 1981 and 1991 systems. They can be differentiated by the number and type of data sources they use, the spatial unit they are based on, whether Principal Components Analysis is used to reduce redundancy within the variables, and the number of groups used to classify the data. It is interesting that a number of the systems claim to be based on unit postcodes, whereas the 1991 census only reported population and household counts at the ED level. This mixture of census ED and unit postcode data (units which are typically one tenth the size of EDs) causes all manner of problems, and it is by no means clear whether the resulting classifications are actually any improvement over a purely ED based one, although they may often be perceived to be better by the end users.

Table 2: Description of the Major Geodemographic Systems

Classification          Supplier                 Sources of Variables   # of variables       Area Unit  PCA Used  # of groups

1981 SYSTEMS

ACORN                   CACI                     Census81 (only)        41                   ED         no        11, 38

DEFINE                  Infolink                 Census81               67                   Postcode   yes       10; 47; 423
                                                 credit data
                                                 electoral roll
                                                 PAF data

MOSAIC                  CCN                      Census81               38 4 11 link only 1  Postcode   no        12; 38; 57
                                                 credit activity data
                                                 electoral roll
                                                 PAF data
                                                 CCJ data

PiN                     CACI                     Census81 (only)        104                  ED         yes       12; 25; 60

Superprofiles           CDMS                     Census81               55 + 10 (affluence)  ED         yes       11 Lifestyle grps; 37 Groups; 150 Clusters
                                                 TGI data               25? (small scale)

1991 SYSTEMS

ACORN91                 CACI                     Census91 (only)        79                   ED         yes       6 Categories; 17 groups; 54 Types

DEFINE91                Infolink                 Census91                                               yes
                                                 credit activity data
                                                 electoral roll
                                                 unemployment stats

IMAGES                  Equifax Europe (UK) Ltd  Census91                                    Postcode
                                                 NDL data

MOSAIC91                CCN                      Census91               87                   Postcode   No        11; 62
                                                 credit activity data
                                                 electoral roll
                                                 PAF data
                                                 CCJ data
                                                 retail access data

Neighbours & PROSPECTS  Euro Direct Database     Census81 +91                                ED?        ?
                        Marketing Ltd

PiN91                   CACI                     Census91 (only)        49                   ED         No        6, 17, 42

Source: P. Sleight, 1993, Target Market Consultancy

As more and more data are collected in computer readable form, the number of different data sets available to marketing companies increases. Table 2 lists the varied sources used by each of the systems. They range from systems that restrict themselves solely to census variables (ACORN) to systems where the census forms only a minority of the total data used (MOSAIC). The most commonly used non-census data sets are the electoral roll, credit activity data, county court judgments and data associated with retail activity obtained either from surveys or lifestyle databases.

None of these extra-census data sets are currently available to academics so the use of similar variables is not an option considered here. However, it might be noted that mixing different data sources with different levels of spatial resolution and sampling characteristics is not necessarily or automatically an advantage, whilst it is a further source of methodological problems.

Also, it is recognized by most observers (Sleight, 1993; Openshaw, 1993) that there is now a general trend within the industry for systems to become more specific. Companies now produce a range of products to cover the wide range of situations where these systems can be applied. Such focusing is not relevant here because we seek to develop a general purpose census classification as a descriptive summary (or surrogate) of the multivariate complexity of the 1991 census.

Census Variables Used

In an attempt to incorporate previously hard earned knowledge, a detailed review of the census variables incorporated in present geodemographic systems was used to provide an initial idea of which variables to select.

Each of the Census agencies listed in Table 1 was contacted and asked for information on the system concerned; specifically, each was asked for a list of the census variables used in their system. Because the present competitive situation has increased the commercial value of this information, only one company (Infolink) was prepared to provide it. Unfortunately, the Infolink variables appear to be based on the total counts from each of the 88 tables in the SAS, which restricts their utility. A list of the census variables used in the CCN system MOSAIC was acquired (see Appendix A) and this formed the initial basis of the selected variables.

Useful comparisons were made with the census variables used by the 1981 Super Profiles and ACORN systems (see Appendices B & C). The other agencies would only provide general information and the relevant brochures. A list of the variables

Further, some of these brochures provide enough detail to derive the general census topics involved in each of the geodemographic systems. For example, the CACI ACORN brochure provides short descriptions of each of the 54 ACORN types. Type 10.32 is described as "Home Owning areas with Skilled Workers". To produce such a grouping CACI would have had to use Census data on Tenure (more specifically Table S20, households owned outright) and data on social class. Indeed, this illustrates another useful principle: the variables defined here as of interest should be sufficient to identify any of the labeled area types described in the various other commercial systems.

There are also situations where the choice of variable is hard to discern. For example, it is unclear how CACI establish home owning areas (Type 9.28) or what data they use to measure affluence (several adjectives are used in the descriptions: Wealthy, Affluent, Well-off and Prosperous). In this situation no attempt was made to guess; variables derived from these descriptions were only selected when it was obvious from the pen picture that they were included in the classification.

From these three sources (the 1991 and 1981 lists and the analysis of pen descriptions of some of the better documented systems), an in-depth knowledge of the various selections of census variables was acquired. This knowledge was used in selecting the specific variables from the large number available.

DEFINING CENSUS VARIABLE INDICATORS

As stated above, the guiding principle for selecting these variables is that they should be broadly representative of the nature and content of the original census questionnaire so that the classification would be "census data representative". This foundation would then be modified by the assessment of the variables commonly used by commercial systems (described in section 3 above).

The statistics provided by the SAS are those that the OPCS perceived to be needed after an extensive consultation process. For example, there is a strong emphasis on dependency within the 1991 census (lone parents, number of dependent children etc.) because of the recent moves by the government to review the structure of the Welfare State.

The content of the 1991 census forms remains fairly similar to that of 1981. Table 3 provides a list of the questions asked by the 1991 census questionnaire. There were five major changes from the 1981 census. These were additional questions on ethnic origin, long-term illness, the existence of central heating, the term-time address of students, and the number of hours worked in the previous week.

Table 3: 1991 Census Questions

Questions on Households      type of accommodation                      
                             extent of sharing                          
                             tenure                                     
                             number of rooms                            
                             availability of bath & WC                  
                             central heating                            
                             number of cars & vans                      
                             lowest floor level of accommodation        
                             (Scotland only)                            
                                                                        
Questions on the individual  sex                                        
                             date of birth                              
                             marital status                             
                             relationship in household                  
                             ethnic group                               
                             whereabouts on Census night                
                             usual address                              
                             term-time address (for students)           
                             usual address one year ago                 
                             country of birth                           
                             long-term illness                          
                             whether working in week before Census      
                             hours worked weekly                        
                             occupation                                 
                             industry                                   
                             address of place of work                   
                             means of travel to work                    
                             higher qualifications                      
                             Scottish Gaelic (Scotland only)            
                             Welsh language (Wales only)                

These topics have been regrouped under the eight headings listed below.

Table 4: Variable Selection Headings

Selection Groups         Census Topics
Demographic              age, sex, marital status
Ethnic                   birth place, nationality, ethnic group
Housing                  usual residence, housing (number), rooms (number),
                         tenure, household amenities, availability of
                         cars & vans
Household Composition    a combination of most of the other individual and
                         household census topics aggregated to the level of
                         the household, e.g. couple households with
                         dependent child(ren) and no car
Socio-economic           economic position, occupation, place of work,
                         industry, qualifications
Migration                migration
Health                   limiting long term illness
Travel-to-work           journey to work

These eight headings form the framework that structures the selection of potential variables, ensuring that all the important areas of the census are included in the final classification.

In considering suitable variables for a new general residential classification it is important not only to know which variables others have used, but also what the SAS counts represent. The OPCS provide a detailed explanation of the definitions and classifications used to aggregate the census returns into counts and tables (OPCS, 1992).

Equally important with all Census statistics is to remember which population the counts are drawn from. This is especially true of the SAS because the population being counted can vary both between and within tables (Residents in Households, Students and schoolchildren aged 5 and over, or Persons aged 60 and over with limiting long-term illness). The majority of the tables are based on either the usual resident population or the number of households. The full list of denominators used to create the rates is given in Appendix C.
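The point about denominators can be made concrete with a short Python sketch in which every indicator carries its own numerator and base population. The count names and values are hypothetical and are not real SAS cell references; treating a zero base as "no rate" is also an assumption.

```python
# Hypothetical counts for a single ED; the keys are illustrative names,
# not real SAS cell references.
ed_counts = {
    "residents": 412,            # usual residents (person base)
    "households": 168,           # households with residents (household base)
    "residents_aged_0_4": 31,
    "households_no_car": 59,
}

# Each indicator names its own numerator and denominator.
indicators = {
    "pct_infants": ("residents_aged_0_4", "residents"),
    "pct_households_no_car": ("households_no_car", "households"),
}


def rate(counts, numerator, denominator):
    """Percentage rate; returns None when the base population is zero."""
    base = counts[denominator]
    if base == 0:
        return None
    return 100.0 * counts[numerator] / base


rates = {name: rate(ed_counts, num, den)
         for name, (num, den) in indicators.items()}
print(rates)
```

Keeping the denominator alongside each indicator makes it harder to divide a household count by a person base by mistake.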

This section describes and explains the selected variables. They are grouped under each of the eight topic headings listed above. A list of the associated SAS reference codes is provided in Appendix C.

Demographic Variables

Identifying variations in the age and sex structure and marital status is the initial step in distinguishing between different types of populations. All the commercial systems include variables in this category, although they vary according to the size of the age groups.

The SAS (Table S02) breaks down the usual resident population into 5 year age groups and provides counts for the total, single/widowed/divorced, and married men and women. So which variables would best represent this mass of data? In general, commercial systems aggregate the age groups into five or six unequal groups. Here, eight different generations are identified, ranging from infants to the aged.

The population base for these variables is the usually resident population: 1991 base (which is referred to as Residents). This is the most common base used within the SAS and is a count of...

"...all the persons recorded as resident in households in an area, even if they were present elsewhere on census night, plus residents in communal establishments who were present in the establishment on Census night." (OPCS, 1992, p. 7)

Age Structure

Age is an important factor in the life-cycle related behavior of individuals. Here eight age groups were chosen which help to differentiate between several age types, e.g. infants, the recently retired and the aged.
1    0 - 4      infants              
2    5 - 14     children             
3    15 - 24    young adults         
4    25 - 44    adults               
5    45 - 64    middle aged          
6    65 - 74    recently retired     
7    75 - 84    elderly              
8    85+        the aged             
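As an illustration of how such age variables might be built, the Python sketch below collapses five-year age counts into the eight bands above and expresses each as a percentage of residents. The input layout (a mapping from the lower age bound of each five-year group to a count) and the use of percentages are assumptions; the actual SAS cell structure is not reproduced here.

```python
from bisect import bisect_right

# Lower bounds and labels of the eight bands listed above.
BAND_STARTS = [0, 5, 15, 25, 45, 65, 75, 85]
BAND_LABELS = ["infants", "children", "young adults", "adults",
               "middle aged", "recently retired", "elderly", "the aged"]


def age_band_percentages(five_year_counts):
    """Map {lower age bound: count} onto the eight bands, as % of residents."""
    totals = dict.fromkeys(BAND_LABELS, 0)
    for lower_bound, count in five_year_counts.items():
        band = BAND_LABELS[bisect_right(BAND_STARTS, lower_bound) - 1]
        totals[band] += count
    residents = sum(totals.values())
    if residents == 0:
        return dict.fromkeys(BAND_LABELS, 0.0)
    return {label: 100.0 * n / residents for label, n in totals.items()}


# Hypothetical ED with 10 residents in every five-year group from 0-4 to 85+.
example = {lower: 10 for lower in range(0, 90, 5)}
shares = age_band_percentages(example)
print(shares)
```

Because the bands are of unequal width, a flat age distribution such as the one in the example still gives unequal shares; the wider "adults" and "middle aged" bands each absorb four five-year groups.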

Sex & Marital Status

The population as a whole can be split into two groups depending on whether individuals are married or not. Marriage changes the financial and general social situations of individuals and so it is important to differentiate between these two groups. Of course, an increasing number of people also live together as cohabiting couples; these are discussed below.

The increase in the numbers of students, working women and lone parents makes these groups particularly important to differentiate within a 1990s classification.

9    Married population                                          
10   Single population S02                                       

From the analysis of the commercial systems, four other social groups are generally identified as being important: pensioners, working women, lone parents and students. These groups have grown significantly in the last decade and it will become increasingly necessary to differentiate them from the general population.

Pensioners

Along with marriage, another major change in lifestyle is brought about at retirement. The tendency for pensioners to concentrate in certain regions and the fact that most are economically inactive makes this an important group to distinguish.

Pensioners also form an increasingly significant proportion of the population. The OPCS define pensionable age as the minimum age at which a person may receive a national insurance retirement pension i.e. 60 for women and 65 for men.

11    Resident persons who are of                                           
      pensionable age                                                       
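The sex-specific definition of pensionable age can be sketched in Python as follows. The input layout, keyed by sex and the lower bound of each five-year age group, is an assumption made for the example, as are the counts.

```python
PENSION_AGE = {"female": 60, "male": 65}  # OPCS definition quoted above


def pensionable_residents(counts_by_sex_and_age):
    """Sum counts in five-year groups at or above pensionable age.

    `counts_by_sex_and_age` maps (sex, lower age bound) to a count; this
    layout is an assumption made for the sketch.
    """
    return sum(
        count
        for (sex, lower_bound), count in counts_by_sex_and_age.items()
        if lower_bound >= PENSION_AGE[sex]
    )


# Hypothetical counts for one ED.
example = {
    ("female", 55): 12, ("female", 60): 9, ("female", 65): 7,
    ("male", 55): 11, ("male", 60): 8, ("male", 65): 5,
}
total = pensionable_residents(example)
print(total)
```

Note that the 60 to 64 group contributes women but not men, so the variable cannot be derived from age counts that are not split by sex.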

Working Women

The role of women within society has changed rapidly over the last two decades, particularly their place in the labour market. This change is reflected in the census in that the addition of the question on the number of hours worked allows the number of part-time working women to be measured. The inclusion of the more specific variables on working women in couples and their occupations helps to differentiate between households with professionally qualified, career orientated women and those with women who work part-time in service industries (e.g. cleaners).
12    Working women                                                         

Lone Parents

The problems associated with lone parents have been highlighted recently and it has been argued that as a group they have a disproportionate negative influence on society. Whatever the arguments, because of their social needs, they are an important group to differentiate. The variables included in this section aim to differentiate those lone parents that are particularly disadvantaged; those with poorly paid manual or no jobs and those who lack amenities (represented here by a lack of central heating).
13    Total 'Lone' Parents                                                  

Students

Students are forming an increasingly large proportion of British society and because of financial constraints they tend to group together in either large university owned accommodation blocks or in low cost rented flats and bedsits. This high concentration allows these areas to be distinguished by including the total number of resident students enumerated at their term-time addresses.
14    Students 16+                                                  

Error

An imputation procedure is used for the first time in this census to estimate the number and characteristics of residents in enumerated wholly absent households. These additional residents are added to the resident total. This population is therefore an estimate of the number of people actually living within an area. A count of how many individuals were imputed in each ED was included so that clusters containing a large number of EDs in which a high proportion of the population was estimated by imputation can be identified.

15   Total Imputed Residents                                   
     (S19)                                                     
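One way such a check might be carried out once a classification exists is sketched below in Python. The record layout and both thresholds are arbitrary placeholders chosen for illustration; they are not values used in the project.

```python
from collections import defaultdict


def imputed_pct(residents, imputed):
    """Imputed residents as a percentage of all residents in an ED."""
    return 100.0 * imputed / residents if residents else 0.0


def clusters_to_check(eds, ed_pct=20.0, cluster_share=0.5):
    """Return clusters in which many EDs rely heavily on imputation.

    Both thresholds are arbitrary placeholders, not values from the paper.
    """
    flagged = defaultdict(int)
    sizes = defaultdict(int)
    for cluster, residents, imputed in eds:
        sizes[cluster] += 1
        if imputed_pct(residents, imputed) > ed_pct:
            flagged[cluster] += 1
    return sorted(c for c in sizes if flagged[c] / sizes[c] > cluster_share)


# (cluster label, residents, imputed residents) for five hypothetical EDs
eds = [("A", 400, 8), ("A", 350, 12),
       ("B", 200, 70), ("B", 180, 50), ("B", 300, 9)]
result = clusters_to_check(eds)
print(result)
```

A cluster returned by this check is not necessarily wrong, but its profile rests more on estimated than on enumerated residents and may deserve closer inspection.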

Ethnicity

The 1991 Census is the first to include a question on ethnicity; geodemographic systems based on the 1981 Census were restricted to using Country of Birth. The ethnicity question allows different ethnic groups to be compared in terms of, for example, unemployment and amenities (see below). The full classification has 35 classes. For the SAS the OPCS use a six group classification, which seems sensible to use here.
16    White                                                                 
17    Black                                                                 
18    Indian                                                                
19    Pakistani                                                             
20    Bangladeshi                                                           
21    Chinese + Other                                                       

Ethnicity forms an important variable in all the commercial systems; there are many clusters characterized by having a large multi-ethnic component. It is important to be able to distinguish between these clusters by also including associated variables.

The SAS provides a large number of different variables associated with Ethnic Groups. One of the main topics is the tenure associated with these different groups. Although many multi-ethnic areas do tend to be associated with poorer areas, there is a danger of labeling all these areas as less well-off when a significant proportion are not. These 18 variables provide the detail required to differentiate between these areas. Tenure is used to distinguish between financially stable and financially stressed multi-ethnic regions.

                                   Owner      Council
Black (grps)                       22         23
Indian, Pakistani, Ban'deshi       24         25
Chinese & others                   26         27

Migration

An established Census question is whether the address of each person in the household was the same as one year ago. This data allows the OPCS to provide information on the movement of residents. A migrant is therefore defined by the OPCS as 'a person with a different usual address one year ago to that at the time of the Census' (OPCS & GRO(S), 1992b). This is a simple approach and does not distinguish between those who have moved between countries and those who have moved next door.

A migrant household is defined as a household whose head is a migrant (the head of household is the first usually resident adult mentioned on the census form). A wholly moving household is a household whose resident members aged one year and over were all migrants with the same postcode of usual residence one year before the census.
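The two household definitions above can be sketched in code. This is only an illustration under stated assumptions: the `Resident` record and its field names are hypothetical and do not correspond to any OPCS data format; only the rules themselves are taken from the definitions in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Resident:
    # Hypothetical record layout, for illustration only.
    age: int
    is_head: bool                      # first usually resident adult on the form
    moved: bool                        # different usual address one year ago
    previous_postcode: Optional[str]   # postcode of usual residence one year ago


def is_migrant_household(household: List[Resident]) -> bool:
    """A migrant household is one whose head is a migrant."""
    return any(r.is_head and r.moved for r in household)


def is_wholly_moving_household(household: List[Resident]) -> bool:
    """All residents aged one and over are migrants from the same postcode."""
    members = [r for r in household if r.age >= 1]
    if not members or not all(r.moved for r in members):
        return False
    origins = {r.previous_postcode for r in members}
    return len(origins) == 1 and None not in origins


# Example: a couple who moved together, plus a baby born since the move.
family = [
    Resident(age=30, is_head=True, moved=True, previous_postcode="LS2 9JT"),
    Resident(age=29, is_head=False, moved=True, previous_postcode="LS2 9JT"),
    Resident(age=0, is_head=False, moved=False, previous_postcode=None),
]
```

Note that the infant is ignored by the wholly moving test because the definition applies only to members aged one year and over.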

Only the net result of any moves is recorded, so a person who moved away and returned to the same address within the year would not be recorded as a migrant. Similarly, intermediate moves within the year are not recorded.

Different types of move can be distinguished depending on which boundaries are crossed (ward, district, county, standard regions and country). It should be borne in mind that the census includes internal migrants and immigrants to Great Britain, but not of course emigrants from Great Britain who are not enumerated.

Here two migration variables were used: the total number of resident migrants and the number of resident migrants that were also pensioners. The former allowed areas with a large number of new residents to be distinguished, for example new housing estates. The latter identified areas which attracted older residents who have usually recently retired, for example coastal retirement areas and retirement homes.

28   Total migrants                                                  
29   Pensioner migrants                                              

Housing

In any classification of the general residential characteristics of Britain, discrimination between different forms of accommodation is important. There are several variables in the Census that provide information on the type, tenure and amenities of the British housing stock.

Tenure

Tenure describes how the household occupies its accommodation, for example by buying or by renting. The data on tenure is derived from Question H3. In this question there is a basic subdivision between buying and renting a property, with more detailed queries also being asked of these two groups.

Tenure is used by all the commercial classifications in one form or another. Superprofiles includes five variables and ACORN four. Both of these 1981 based systems distinguish between furnished and unfurnished flats. Here more emphasis is placed on home owners because of the increase in their numbers during the 1980s, and the furnished and unfurnished tenures are aggregated under the privately rented category.

Generally the rented category, and especially accommodation rented from Local Authorities etc., has been used by researchers as a measure of lack of resources and residential insecurity. In contrast, because of the financial commitment required to purchase a house, house ownership is seen as a surrogate for long term financial stability.

The OPCS classify tenure into 8 categories; here these are aggregated into six percentage variables.

30   Owned Outright                                                  
31   Mortgaged                                                       
32   Private Rented                                                  
33   Rented from HA, LA, NT                                          

Household Space Type

A Household Space is defined as the "accommodation available for the household" (OPCS, 1992, p. 17). The census defines 22 different types of accommodation, but none of this information is included in the 1981 versions of either Superprofiles or ACORN. The following seem important as they at least provide a mental image of the mix of building types within an area, and therefore an idea of the general feel of an area.
34    detached                                                        
35    semi-detached                                                   
36    terraced                                                        
37    flats                                                           
38    bedsits                                                         

Amenities

The census records the presence or absence of several household amenities, including bath or shower, and WC. In 1991 the presence or absence of central heating was also added. Two variables were included in this list: all permanent households with no central heating, and all permanent households which were either lacking or sharing a bath or shower. Both variables attempt to isolate the households which suffer from a lack of the most basic household amenities. Fortunately, in this country this is a tiny percentage of the housing stock and consequently these variables have become less important over time. The lack of central heating is a more general measure of household deprivation.

Superprofiles and ACORN include variables on households without WCs. Today this is generally agreed to be a universal amenity and is therefore dropped in favour of the central heating variable. However, as a measure of deprivation one problem with this variable is that while a large proportion of households have central heating, many households cannot afford to run it (even more so now there is VAT on fuel).

39   No central heating                                              
40   Lacking bath and shower                                         

Car Availability

The car has become a universal feature in British society despite its expense to both buy and run. Because of the high running costs car availability has been used by many researchers as a measure of short-term financial deprivation. For example, Townsend et al (1986) use the percentage of households lacking a car as a measure of lack of resources. This forms an ingredient in their measure of material deprivation - the Townsend Index. It should be remembered that cars are more important to those living in rural areas than to those in urban areas. Both variables are included in all geodemographic systems.
41    No car                                                          
42    2+ cars                                                         

Density

The number of persons per room (ppr) has also been a common variable in deprivation indices. This provides an indication of overcrowding. Here, as in most systems, a value of 1.5 persons per room is taken as the standard value over which a household is classified as `overcrowded'. This variable has been heavily used by deprivation indices and commercial classifications as a surrogate measure of poverty. However, it is important to note that the definition and meaning of the term `room' varies geographically, culturally, and in terms of size.

The number of households which suffer from overcrowding is only 109,000, 0.5% of the total housing stock; the problems caused by the ecological fallacy are therefore correspondingly increased.

43    More than 1.5 ppr                                               
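The overcrowding indicator can be illustrated with a short sketch. The function name and the input layout (a list of persons and rooms per household) are assumptions made for the example; in practice the SAS supplies ready-made counts of households by persons per room, so this only shows the arithmetic behind the 1.5 ppr threshold.

```python
def overcrowded_percentage(households, threshold=1.5):
    """Percentage of households with more than `threshold` persons per room.

    `households` is an iterable of (persons, rooms) pairs; this layout is
    hypothetical and used only to illustrate the calculation.
    """
    households = list(households)
    if not households:
        return 0.0
    overcrowded = sum(
        1 for persons, rooms in households
        if rooms > 0 and persons / rooms > threshold
    )
    return 100.0 * overcrowded / len(households)


# Four invented households: ppr values of 2.0, 1.5, 0.5 and 2.0.
sample = [(4, 2), (3, 2), (2, 4), (6, 3)]
sample_pct = overcrowded_percentage(sample)
```

A household at exactly 1.5 persons per room is not counted, since the variable is defined as more than 1.5 ppr.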

House Size

This supports the density variable and provides useful information on the amount of living space available to each household. Large houses are associated both with more prosperous and with rural areas. Large houses require a high level of income to run (larger heating costs, community charge etc.) and high earning individuals tend not to be restricted by these costs. A partial exception to this trend is in rural areas where historically there is a higher percentage of large houses, but where the level of economic activity has stagnated, leading to isolated areas of rural deprivation.
44    Households with 7+ rooms                                        

Household Composition

The demographic composition of households affects their ability to generate sufficient income to meet their needs. For example, a household containing two adults and two dependent children is less likely to be dependent on benefits than one containing only one adult with three dependent children. It is therefore important to include information on the numbers of adults and children within households.

Family Unit Type

The OPCS has used a predefined set of rules to allocate residents to one of 60 family unit types (see OPCS Definitions, 1992, Annex C). Within households with two generations the children are only placed in the family unit if they are single (never married) and have "no obvious partner or offspring". Married or divorced offspring are not included, but children who have had an informal relationship (i.e. a cohabiting partner) are included as this cannot be deduced.
45    couple hhld, aged 16-24 without child(ren)
46    couple hhld, aged 16-24 with child(ren)
47    couple hhld, aged 25-34 without child(ren)
48    couple hhld, aged 25-34 with child(ren)
49    couple hhld, aged 35-54 without child(ren)
50    couple hhld, aged 35-54 with child(ren)
51    couple hhld, aged 55-75 plus

From this mass of data in Table S87 the following variables have been picked; again tenure is used as a surrogate for financial stability.
                                       Owner    Council
No Family Household                    52       53
1 Couple Households (no children)      54       55
1 Couple Households (with children)    56       57
2+ Family Households                   58       59

Dependent Households

Dependency is one of the major census topics, primarily because of its influence on the State benefit system. Dependency variables therefore form a significant proportion of this list. The census contains data on several different groups of dependents.
60    Households with dependents                                

Socio-economic Characteristics

Economic Position

These provide a guide both to affluence and to age. The lack of an income question in the 1991 census makes the use of proxies for income very important.
61    Economically active                                       
62    Self-employed                                             
63    Unemployment                                              

A limitation of these figures is that for certain areas of Great Britain they may be substantially out of date because of the extent to which unemployment has increased since April 1991.

Also, a change in the level of unemployment in an area may be more related to the local economy than to the quality of the residential neighbourhood. The late 1980s recession has affected many low unemployment areas, thereby reducing the value of unemployment as an indicator of residential characteristics.

Industry

The industry in which an individual is engaged is determined by the activity of his or her employer, i.e. the nature of the service or product. The classification used is the Standard Industrial Classification, and individuals are assigned according to the name and description of their employer's business collected in question 16.
64    Agric./Forestry/Fishing                                   
65    Energy & Water                                            
66    Manufacturing                                             
67    Construction                                              
68    Distribution & Catering                                   
69    Transport                                                 
70    Banking & Finance                                         

Socio-economic Group

This groups together jobs of similar social and economic status. Allocation is determined by considering the employment status, i.e. economically active or on a government scheme, and the occupation of the head of the household. It provides most of the information provided by the occupation variables, but in a more relevant framework.
71   Professional 1, 2, 3, 4
72   Intermediate & Junior non-manual 5, 6
73   Manual workers 7, 8, 9, 10, 11, 12
74   Farmers & agricultural workers 13, 14, 15
75   Armed Forces & Other 16, 17
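The grouping of socio-economic group (SEG) codes into the five variables above can be written as a simple lookup. This is a sketch only: the group labels and the input layout are hypothetical, and whole-number codes 1 to 17 are used as a simplification (any finer subdivision of SEG codes in the published tables would need to be collapsed first).

```python
# SEG codes per variable, as listed in the table above.
SEG_GROUPS = {
    "professional": range(1, 5),                     # SEGs 1-4
    "intermediate_junior_non_manual": range(5, 7),   # SEGs 5-6
    "manual": range(7, 13),                          # SEGs 7-12
    "farmers_agricultural": range(13, 16),           # SEGs 13-15
    "armed_forces_other": range(16, 18),             # SEGs 16-17
}

# Reverse lookup: SEG code -> aggregated group label.
SEG_TO_GROUP = {
    code: name for name, codes in SEG_GROUPS.items() for code in codes
}


def seg_group_counts(seg_counts):
    """Sum counts keyed by SEG code (1-17) into the five aggregated groups."""
    totals = {name: 0 for name in SEG_GROUPS}
    for code, count in seg_counts.items():
        totals[SEG_TO_GROUP[code]] += count
    return totals


# Invented counts for one small area.
example_seg = {1: 2, 4: 3, 5: 10, 8: 7, 12: 1, 13: 4, 17: 5}
example_groups = seg_group_counts(example_seg)
```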

Qualifications

The question on the qualifications that individuals hold provides information, unavailable from any other source, which is used in the planning of investment in education. It is limited to degrees, professional and vocational qualifications.

It is used here as a measure of education, which is also associated with higher earnings, better levels of health and in general a higher standard of living.

76   Workers with higher degrees                               
77   Workers with other qualifications

Health

For the first time the 1991 Census provided useful morbidity data. The aim of this question was to provide information on the general incidence of morbidity within the population. The question specifically requests that problems due to old age are included. It has been found in pre-census tests that this information correlates well with the use of both GP consultations and in-patient and out-patient hospital visits. This question therefore provides a simple, but nationally comparable, measure of morbidity.

This may well be effective in picking up broad regional differences in the general level of well-being across Britain. Its usefulness as a local residential area discriminator is as yet unexplored.

Limiting Long-term Illness

Here the total number of people who reported being limited in their daily activities by a long-term illness, health problem, or handicap was taken as a general measure of the level of morbidity within the population.

Those residents that were economically inactive because of long-term illness were also included. This population is dependent on social and health care services and/or families for their way of life.

78   Total persons with LLI (S12)                              

Transport-to-Work

The type of transport generally used to get to work (used on the longest part of the journey) provides an indication of the lifestyle of the individual. Cars are expensive to run and require a certain level of income. This generalization is complicated by other circumstances. The need for a car in rural areas is more acute because of the lack of public transport. The proportion of income that any individual in these circumstances may be willing to spend on a car may well be much higher than average. Rural areas may therefore have a higher level of car ownership than would be expected given the income levels normally associated with the rural economy. Equally, where the bus or train networks are more dense i.e. in the south-east of England
79   Train / Bus
80   Car                                                       
81   Work at home                                              

Contextual Variables

To provide an indication of some of the more specialized contexts in which people live, the following variables were included:
82   Medical & Care Ests.
83   Detention centres & Defence Est.
84   Education Ests.
85   Hotels & Other Ests.

CONCLUSIONS

This paper has provided a description or `audit trail' of the process of variable selection for the general residential classifications that have been developed. The objective of the project sponsored by the ESRC is to provide a census data representative small area classification of Britain's residential areas.

It is most important to be clear about the purpose of the exercise. The choice of variables and their specification has to reflect the explicit purpose. It would seem that many previous classifications have, at best, been "purpose vague". It is obvious, but nonetheless important, to be quite frank: different variables will almost certainly produce different results, and whatever purpose is reflected will probably map onto the available set of 10,000 census variables in many different ways, producing many possible different classifications.

The aim of this classification project has been to be "census data representative". The aim is not solely to define areas of urban or rural deprivation or areas of affluence, or areas dominated by this or that socio-economic mix that happens to be of narrow or specialist interest and importance. There are, of course, valid special purpose objectives for a classification of Britain's residential areas, but the objective here is to seek a more general (broad based) census data descriptive representation. It is thought that this will appeal to the majority of prospective social science users and is in the best traditions of the original geodemographic systems. It is also likely that this goal will provide maximum added-value to the 1991 database.

The variables defined here essentially reflect the past experience of the researchers, the variables used by commercial organizations, and the desire for coverage of the census topics. Table 6 summarizes what has been achieved. It is inevitable that coverage is uneven and possibly contains high levels of data redundancy. Whether to reduce, remove or retain that redundancy is a subject for a separate study. The object here was to reduce almost 10,000 potential variables to a much more manageable number for subsequent analysis and classification.