GeoComputation 2000

Integrating rank correlation techniques with GIS for marketing analysis

Lihua Zhao
School of Geography, The University of New South Wales, Sydney, NSW 2052, Australia
E-mail: p2182287@geog.unsw.edu.au

Abstract

The application of GIS in business has grown rapidly in recent years. Research has shown that since more than 80 per cent of all information held by organisations can be geographically referenced, business strategists are finding GIS to be an ideal tool for identifying and expanding markets, and increasing profits. Although most widely used GIS packages perform a range of functions, many of them have weaknesses when it comes to business applications. Combining GIS and other techniques will create appropriate and diverse approaches to problem solving. One advantage of linking statistical methods with GIS is to integrate GIS capabilities with the power of statistical analysis, and to effectively use data from different sources for market analysis. Another advantage is that users can visualise spatial data in different forms: they can visualise the spatial distribution of data on maps prior to further statistical analysis, and visualise spatial data in various statistical graphs and diagrams which may yield more insights into the nature of distributions. In this way, the application of GIS in market analysis may be seen as a tool for reaching a desired solution for the client and not merely as an end in itself.

In this paper, a technique is presented based on rank correlation that is capable of analysing the association between the distribution of customers and background demographic characteristics. The Spearman rank correlation coefficient has been programmed within MapInfo using the MapBasic language. The method is directed towards business needs rather than towards technical niceties. The analysis of the association between customers purchasing motor vehicles and demographic variables from the Census for a region in Sydney, Australia, is used as a case study to demonstrate the potential of the technique.

1. Introduction

Geographical Information Systems (GIS) are a new and valuable tool of the 'information revolution' with the capability of combining attribute and spatial data with mapping systems and cartographic modelling tools. They permit the acquisition, storage, analysis, management and presentation of large amounts of geographic or spatial data (Goodchild, 1992; Tomlin, 1990). In recent years, the application of GIS in business has grown rapidly. Major retailers, automobile dealerships, video rental companies, media organisations, and fast food corporations are just some of the many businesses around the world that have discovered the value of GIS. Research has shown that since more than 80 per cent of all information in an organisation can be geographically referenced, business strategists are finding GIS to be an ideal tool for identifying and expanding markets, and increasing profits.

Tavakoli (1993) has summarised the factors contributing to the increased penetration of GIS in business and marketing, key ones among which include:

  • Widely available software packages, like ArcInfo, ArcView, and MapInfo.
  • The incorporation of Graphical User Interfaces (GUIs) into desktop GIS software, which has made GIS easier to use.
  • The increased availability of various spatial, demographic and socio-economic data in digital form.
  • The dramatic reduction in the cost of hardware.

Despite the widespread adoption of GIS by the business community, there is an increasing awareness that current proprietary GIS packages are limited in their capability to address business objectives because they lack appropriate analytical tools. One aspect of developments in marketing research is the increased emphasis being placed on the use of statistical methods, particularly in unravelling the relationships between a large number of demographic characteristics from Census data and the profiles of customers in an organisation's database. Most packages for statistical analysis are not particularly relevant to marketing and do nothing to help a largely non-technical user community. There is a need to develop "black-box" automated analysis tools, with robust performance, to investigate the spatial associations between customer characteristics and those of background populations in defined catchment areas to generate additional information to support effective marketing activities.

    In this paper, a technique is presented based on rank correlation that is capable of analysing the association between the distribution of customers and background demographic characteristics. The Spearman rank correlation coefficient has been programmed within MapInfo using the MapBasic language. The method is directed towards business needs rather than towards technical niceties. The analysis of the association between customers purchasing motor vehicles and demographic variables from the Census for a region in Sydney, Australia, is used as a case study to demonstrate the potential of the technique.

    2. GIS and marketing analysis

    In today's highly competitive environment, marketing is a customer-orientated operation that is essential for business success. Beaumont and Inglis (1989) argue that marketing departments face real problems in fully understanding their markets and the potential customers for their products and services. Marketing analysis is proving relevant and useful in this context.

    Marketing analysis covers a wide range of topics. Beaumont (1991) notes that conventionally the marketing mix can be summarised as the four Ps: product, price, place and promotion. He suggests that these should be supplemented with a fifth P, that of data processing, which makes it possible to integrate GIS and the marketing mix (see Figure 1). This highlights the central role GIS can play as a tool to integrate the various components of the marketing mix to assist strategic decision making.

    Figure 1. The integration of the marketing mix ('five Ps')
    [after Beaumont, 1991, p. 140]

    Beaumont (1991) has also provided a useful list of the kinds of questions that need to be answered by marketing management (see Figure 2). This shows the various dimensions of a marketing manager's decisions that could be supported using GIS to interrogate various data sets that are typically held by an organisation. These illustrations demonstrate the power of GIS in broadening the perspective of marketing analysis in market research. Birkin (1996) has argued that GIS provide useful technical support for data management in a competitive environment by integrating various sources of information, as well as producing attractive graphic displays of data in map form.

    Figure 2. Fundamental questions in marketing management
    [after Beaumont, 1991, pp. 140-141]

    Marketing research exists to serve the information needs of both operations and strategy development in the business environment. The methods developed in marketing research are as diverse as the problems addressed, and methodologies and concepts have been borrowed from a variety of disciplines. Among these, statistical methods have become very important: most of the standard statistical analytical procedures have been widely applied in marketing research as part of what Lehmann (1985) describes as a general trend towards a more quantitative approach to marketing research.

    An important development that has increased the potential for using GIS and statistical techniques in marketing is the increasing availability of relevant Census data in digital form, of widely developed geodemographic databases, and of Census data that are already integrated with GIS software (Beaumont and Inglis, 1989; Flowerdew and Goldstein, 1989; Batey and Brown, 1995; Martin, 1995; Martin and Longley, 1995; Waters, 1995). Census data, collected on a regular basis, are the source of the most complete, reliable, and widely available demographic information for market research, and this has promoted a number of highly successful geodemographic systems in marketing, for example ACORN (A Classification of Residential Neighbourhoods) and MOSAIC in the UK, and LIFESTYLES and VISION in the USA. Analysing demographic trends, which form the underlying framework for customer analysis, requires access to and appropriate exploitation of official and other external demographic data. It is also important that businesses make full use of the data available internally, that is, the data associated with products and existing customers derived from records of purchase. The power of a GIS in these areas is its ability to manage data from a number of different sources and its capability for interactive visualisation in exploring data and the results of analyses. For this reason, GIS software and Census data are often assembled as packages. For example, ArcView may be purchased together with some Census data, and CData96, which provides data from the 1996 Australian Census of Population and Housing on CD-ROM, makes full use of the features and tools of MapInfo Professional.

    As the benefits of collecting more customer information are gradually realised, the potential for systems that analyse customer data as well as external data sources has dramatically increased. GIS geo-processing functions provide a means for enhancing databases in two respects (Openshaw, 1995): first by linking with other spatial data (e.g. geodemographics), and secondly by adding value to databases. For example, given the strong association between social status, income and location (e.g. by postcode or suburb), the analysis of customer addresses makes it possible to match these with income and social status. Customer data with addresses are now routinely collected for billing and maintenance purposes, and these spatially based customer files can be used directly to describe the current customer base through profiling. Flowerdew and Goldstein (1989) suggest that the most important market research data are product-purchasing profiles, which give social and demographic profiles of those people most likely to purchase a particular product or use a particular service. The profiles of existing customers can be used to highlight the potential for cross-selling services.

    Although most widely used GIS packages perform a range of functions, many of them have weaknesses when it comes to business applications. Combining GIS and other techniques will create appropriate and diverse approaches to problem solving (Maguire, 1995). One advantage of linking statistical methods with GIS is to integrate GIS capabilities with the power of statistical analysis, and to effectively use data from different sources for market analysis. Another advantage of this is that users can visualise spatial data in different forms, for example visualise the spatial distribution of data on maps prior to further statistical analysis, and visualise spatial data in various statistical graphs and diagrams which may yield more insights into the nature of distributions. In this way, the application of GIS in market analysis may be seen as a tool for reaching a desired solution for the client and not merely as an end in itself (Cresswell, 1995).

    3. Methodology

    3.1 Association and correlation

    Association exists if the distribution of one variable is related to the distribution of another variable. Measures of association indicate, in quantitative terms, the extent to which a change in the value of one variable is related to a change in the value of another. There are a large number of association measures, each with its own peculiarities and limitations (Argyrous, 1996). When working with scales that have many distinct values, the word correlation rather than association is used to describe the strength of the relationship that exists between two variables, X and Y, when for each observation the value X_i is paired with its corresponding value Y_i.

    Correlation measures therefore provide a way of determining which demographic variables, taken from Census data, best "agree" with an existing customer profile, taken from a business's records. These Census variables can then be considered the best predictors of potential households within a statistical area that might purchase similar products.

    When searching for such demographic variables, the major concern is to predict the order of pairs of cases. In other words, does the fact that one area has more high income people than another area help to predict whether that area also has more people who might buy a particular brand of product? Because both Census data and business data take a wide range of possible values, and the scales are therefore long, the non-parametric Spearman rank correlation coefficient (r_s) is a suitable measure of the strength of correlation (Lehmann and D'Abrera, 1998). The measure of strength of relationship using r_s is based on a scale ranging from -1.00 (a perfect negative correlation) to +1.00 (a perfect positive correlation). On this scale, zero indicates no correlation at all.

    The method proposed in this paper uses the Spearman rank correlation coefficient which computes the correlation between two sets of ranks using the following formula:

    \[ r_s = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} \tag{1} \]

    where d = number of places by which an object's position differs between the two rankings
    n = number of objects ranked
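
The arithmetic of the Spearman formula can be illustrated with a short Python sketch. This is not the MapBasic code used in the tool; the function name `spearman_from_ranks` and the five-area rankings are hypothetical, and the simple formula is assumed to be applied to rankings without ties.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Spearman r_s from two equal-length lists of ranks.

    Uses r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).
    Assumes no tied ranks; with ties this simple form is only approximate.
    """
    if len(rank_x) != len(rank_y):
        raise ValueError("rankings must have the same length")
    n = len(rank_x)
    if n < 2:
        raise ValueError("need at least two ranked objects")
    # d is the difference between the two ranks of the same object.
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - (6 * sum_d2) / (n * (n ** 2 - 1))


# Hypothetical ranks for five areas (rank 1 = highest value).
customer_rank = [1, 2, 3, 4, 5]
income_rank = [2, 1, 3, 5, 4]
r_s = spearman_from_ranks(customer_rank, income_rank)
```

For these made-up rankings the squared differences sum to 4, so r_s = 1 - 24/120 = 0.8, a strong positive correlation.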

    3.2 The integration of rank techniques within GIS

    There are two basic ways of linking GIS with a statistical procedure. One is based on a loose coupling which relies on an efficient interchange of data input and output between two packages. The other, called close coupling, is designed as a seamless integration which, in this case, links the module used to calculate the rank correlation coefficients with the GIS within a single piece of menu-driven software that both runs the correlation analysis and generates the graphs and maps from the GIS.

    In the analysis tool reported on in this paper, MapInfo is used for the association analysis since CData96 is designed specifically to be used with this particular software. MapBasic, MapInfo's macro language, has been used to develop the analysis tool and to call the Spearman rank correlation within MapInfo. Using MapInfo's graphical user interface (GUI) and the tool added (Figure 3), the user can select any pair of variables (see Figures 4 and 5) simply by clicking the tool box. The analytical procedure, from the ranking of the variables through the calculation of the correlation coefficients to the display of the results on graphs, is therefore implemented completely automatically in a user-friendly environment.

    Figure 3. Screen capture for adding the developed tool

    Figure 4. Screen capture of selecting customer point data

    Figure 5. Screen capture of selecting Census boundary data

    The Census data stored in the tables in CData96 are referenced to Census Collection Districts (CDs), although the data can be aggregated into any other Census or customer boundaries, e.g. postcodes, Local Government areas, sales territories, or marketing areas. Because most customer spatial distribution data are point data, the developed tool has been designed to count customer points according to the boundaries that contain the Census data (or other data). The boundary areas are then ranked on the number of customers (X) and on the values of each Census variable (Y) respectively, giving rank 1 to the area with the highest value since these areas will have the most positive predisposition to purchase the product. The rankings are put into a new table (see Figure 6); r_s is calculated from this new table and its values are displayed as MapInfo graphs.

    Figure 6. The table of rankings
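
The counting and ranking steps described above can be sketched in Python as follows. This is an illustration only, not the MapBasic implementation: it assumes each geocoded customer already carries the code of the area containing it (the tool itself counts points within boundaries in MapInfo), the helper names and all data values are hypothetical, and the averaging of tied ranks is an assumption, since the paper does not say how ties are treated.

```python
from collections import Counter


def count_by_area(customer_areas, all_areas):
    """Count customers per area, keeping areas with zero customers."""
    counts = Counter(customer_areas)
    return {area: counts.get(area, 0) for area in all_areas}


def rank_descending(values):
    """Rank a dict of area -> value, giving rank 1 to the highest value.

    Tied values share the average of the ranks they span (an assumption).
    """
    ordered = sorted(values, key=lambda area: values[area], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j to cover every area tied with the area at position i.
        j = i
        while j + 1 < len(ordered) and values[ordered[j + 1]] == values[ordered[i]]:
            j += 1
        average_rank = (i + 1 + j + 1) / 2
        for area in ordered[i:j + 1]:
            ranks[area] = average_rank
        i = j + 1
    return ranks


# Hypothetical data: the postcode of each geocoded customer, and a
# made-up Census count per postcode area.
customer_postcodes = ["2000", "2000", "2010", "2000", "2021", "2010"]
high_income_households = {"2000": 410, "2010": 520, "2021": 150, "2033": 90}

customer_counts = count_by_area(customer_postcodes, high_income_households)
rank_x = rank_descending(customer_counts)
rank_y = rank_descending(high_income_households)

# One row per area: (rank on customers, rank on the Census variable).
ranking_table = {area: (rank_x[area], rank_y[area]) for area in high_income_households}
```

Areas without any customers are kept with a count of zero so that both rankings cover the same set of areas.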

    Since all of the above procedures are performed automatically during processing, a message window has been designed to display the status of the particular analysis (Figure 7), so that the user can follow what this kind of "black-box" tool is doing.

    Figure 7. Screen capture of message windows

    Because the point customer data have been aggregated to statistical areas, other maps can be produced that show the extent of product penetration and provide more information about catchment areas.

    The system allows the user to identify relationships in a straightforward way with only a minimal understanding of statistics. The user is not presented with complex and cumbersome choices, and the technical issues are kept in the background. Seeing relationships easily on graphs and maps reinforces the power of the system for interrogating data and contributes to a better understanding of demographic predictors.

    4. Demonstration

    The user-friendly system developed here is used to explore the statistical relationships between demographic variables whose predictive value is unknown and the known distribution of existing customers for a particular brand of luxury motor vehicle. The Figures illustrate the way relationships and patterns hidden in the data are revealed to the user.

    Imagine that an automobile dealer wants to identify its potential customer base in a systematic way. Based on data for six months of car sales, the following procedures are involved:

    1. Geocode the existing customer database by street address to generate customer point distribution data.
    2. Identify the potential demographic characteristics of customers likely to purchase similar motor vehicles. The data for demographic characteristics can be downloaded from CData96. The screen capture shown in Figure 8 is a summary of CData96 using MapInfo. Thirty-two basic community profile tables are included in CData96, each of which contains a subset of Census variables, many of which represent categories of demographic variables.

       Figure 8. Screen capture of Census tables from CData96

    3. When downloading the potential demographic variables from the Census tables, the data can be aggregated directly by boundaries other than CD boundaries. In this example, the aggregation is by postcode areas although any statistical division may be used.
    4. The developed procedure can be launched from the Tool Menu (see Figure 9) to explore the statistical relationships between demographic variables and the distribution of existing customers by calculating correlation coefficients for each demographic variable. Higher positive rank correlation coefficients indicate that the particular demographic variable may be a good predictor of the order of postcode areas established on the basis of existing customers' data.

       Figure 9. Screen capture of launching rank tool
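
The final step, calculating a coefficient for each demographic variable and comparing them, might look like the following Python sketch. It is not the MapBasic tool itself: the variable names and every number are made up for illustration (they are not CData96 fields or results from the case study), and the toy data are chosen to contain no ties so that the simple rank-difference formula applies.

```python
def ranks_desc(values):
    """Rank values so that the largest gets rank 1 (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman(x, y):
    """Spearman r_s of two equal-length value lists via rank differences."""
    rank_x, rank_y = ranks_desc(x), ranks_desc(y)
    n = len(x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


# Made-up values for five postcode areas, for illustration only.
customers = [12, 9, 7, 3, 1]
census_variables = {
    "high_income_families": [300, 340, 210, 120, 80],
    "managers": [150, 140, 100, 90, 40],
    "labourers": [60, 80, 120, 150, 200],
}

coefficients = {
    name: spearman(customers, values)
    for name, values in census_variables.items()
}

# List the variables from the strongest positive to the strongest negative correlation.
for name, r_s in sorted(coefficients.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:25s} {r_s:+.2f}")
```

Sorting by coefficient puts the candidate predictors at the top of the list and the negatively correlated variables at the bottom, which mirrors how the graphs are read in the demonstration.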

    Figures 10, 11, and 12 show examples of the results of the analysis. Figure 10 indicates a strong positive correlation (r_s = +0.73) between the variable "A$2000 or above" in the Weekly Family Income table and the distribution of existing customers. This variable can therefore be considered one of the best predictors of market potential. Figure 11 shows the association with Occupation: the high correlations with the variables "Managers_Administrators" and "Professionals" indicate that these are two other good predictors of potential customers. The variable "A$1500_more" from the Monthly Housing Loan Repayment table is a good predictor as well (Figure 12). On the other hand, the negative correlations shown in Figure 11 between the distribution of existing customers and the variables "Labourers_Related Workers" and "Intermediate Production_Transport Related Workers" from the Occupation table suggest that these demographic variables are poor predictors of market potential.

    Figure 10. Correlations between weekly family income and existing customers

    Figure 11. Correlations between occupations and existing customers

    Figure 12. Correlations between monthly housing loan repayment and existing customers

    Figure 13 shows the extent of product penetration. The map shows that sales are concentrated in the vicinity of the sales office, where 31 postcode areas account for 60 per cent of total sales. The remaining 40 per cent of sales are spread across the other 151 postcode areas.

    Figure 13. Car sales penetration
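
A concentration figure of this kind (how many top-selling areas account for a given share of sales) can be derived from the aggregated counts. The Python sketch below shows one way to do it; the function name `areas_for_share` and the sales figures are hypothetical and are not the case-study data.

```python
def areas_for_share(sales_by_area, share):
    """Return the top-selling areas needed to reach `share` of total sales,
    together with the share of sales they actually account for."""
    total = sum(sales_by_area.values())
    if total <= 0:
        raise ValueError("total sales must be positive")
    selected = []
    running = 0
    # Walk through the areas from the highest to the lowest sales count.
    for area, sales in sorted(sales_by_area.items(), key=lambda item: item[1], reverse=True):
        if running >= share * total:
            break
        selected.append(area)
        running += sales
    return selected, running / total


# Made-up sales counts for five areas, for illustration only.
sales = {"A": 50, "B": 30, "C": 10, "D": 6, "E": 4}
top_areas, reached = areas_for_share(sales, 0.6)
```

With these made-up counts, two of the five areas are needed to pass the 60 per cent mark, and together they hold 80 per cent of sales.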

    Analysis of the relationship between sales data and Census data suggests the socio-economic characteristics of the customers who are most likely to buy the product or to spend the most. Using this knowledge and the information from the business's database, it is easy to find the location of potential customers having similar socio-economic characteristics and hence to target specific sub-groups by statistical areas which could be responsive to the marketing of the same or similar brands of motor vehicles.

    5. Conclusion

    Marketing research is very dynamic: methods and products are continually subject to change and upgrading. GIS coupled with statistical techniques provide efficient tools for building and mapping demographic profiles based on information about customers collected and stored in corporate databases. GIS can combine these based on a geographic component, such as postcodes or addresses. Combining demographic data from the Census with a customer database offers the potential for substantially increasing a company's capability for marketing analysis. Understanding customers and their socio-economic characteristics is essential in making good business decisions. By understanding the demographic profiles of their own customers, companies can evaluate the suitability of a particular geographical area for a marketing effort and can better target direct mail, sales brochures, and media advertising geographically.

    Although GIS are increasingly being used as an information technology tool by the business community, more effort still needs to be invested in GIS development for providing analysis in such mission-critical areas as site selection, target marketing, prospect analysis, territory allocation and media planning.

    Further research is being conducted to extend the analysis based on the tool developed in this paper to combine significant demographic predictors into an index, and to call upon GIS to perform various kinds of spatial analysis that together can be used to identify specific geographical areas (i.e. postcode areas) to be targeted in marketing.

    Acknowledgments

    I am very grateful to Professor Barry Garner, School of Geography, University of New South Wales, for his valuable comments on the method and for his help in editing the paper.

    References