A review of some GEOGRAPHICAL tools for Data Mining
Contents
Some introductory comments
But
There is an increasingly serious problem caused by IT developments ..
It’s called...?
DATA
AND..
there is LOTS of it!
Every day there is More DATA than before
More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and More Data and
and
More DATA
=
?
A bigger ...
No! We now store it in Data Archives, also called Data Warehouses and Data Marts!
BUT
Data warehouses are expensive!
Justification is HARD because the potential value can only be expressed in terms of NEW USES that were previously impossible, making economic assessment uncertain
And
As a result, there is a FOCUS on short-term objectives that have little real relevance to the underlying ambitions of data mining
Maximizing Profits
If YOU don’t evolve better DATA-USING technologies, then having access to more and more data, more bandwidth, downsized hardware, faster computers and bigger data repositories WILL NOT HELP at all
Indeed
The problem is that in our current IT age, with a fully computerized bureaucracy and computer-based management systems covering most areas of modern life, data are being created and stored many times faster than they can be processed and used!
As a result, probably 90%+ of all databases are not being fully exploited by state-of-the-art analysis and modelling technologies, and most are not being used at all!
The situation is rapidly becoming WORSE..
Data Mining and Knowledge Discovery to the RESCUE?
Well, that is the HOPE!
My Definition
Data Mining is HOT!
The generic aims and objectives of Data Mining for Knowledge Discovery are fine.. it’s just that much of the technology being used to do it is next to useless, and no one REALLY knows how to do it properly as yet!!!
One problem is that people are focusing on the TRENDY parts and ignoring the broader picture
To exploit your (or other people’s) Data Riches you need 3 things..
You should recognize that much of Data Mining is..
There is a GROSS underestimation of the problems of modelling the BEHAVIOUR of people
Worse still!
There is gross ignorance of the Geographical Dimension
Some common Data Mining Tools
BUT
So WHERE are the DISTINCTLY GEOGRAPHICAL data mining tools?
Geography is unlike any other variable!!
The Geography variable is very SPECIAL because:
A geographical approach to Data Mining
The most USEFUL conventional geographical tool is the MAP
The map is a wonderful data viewing device BUT does little else!
The problem here is that most GIS experts have not the vaguest idea of how to do it!
There is a deep PREJUDICE against Data Mining in Geography
Yet.. Geographical Data Mining is PRECISELY what geography and many other FACT-based social sciences actually need if they are to move forward in the IT Age
The Aggregation Operator
Type 1 Aggregation
Type 2 Aggregation
Flow data aggregation
Both Aggregation Operations make the data
Geographical AGGREGATION tends to be very useful for drawing out the patterns in databases
Both types of Aggregation operations can be applied to Data Warehouse Databases
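The slides do not say exactly what distinguishes Type 1 from Type 2 aggregation, so the following is only a generic sketch and not a reconstruction of either: it aggregates individual point records (x, y, value) into square grid cells, and individual origin-destination moves into zone-to-zone flow counts. The function names, the grid-cell zoning and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical sketch: two generic geographical aggregations.

def aggregate_points_to_grid(records, cell_size):
    """Aggregate individual point records (x, y, value) into square grid cells.

    Returns {(col, row): (record_count, value_total)}.
    """
    cells = defaultdict(lambda: [0, 0.0])
    for x, y, value in records:
        # Floor division assigns each point to the cell that contains it
        key = (int(x // cell_size), int(y // cell_size))
        cells[key][0] += 1
        cells[key][1] += value
    return {key: (count, total) for key, (count, total) in cells.items()}

def aggregate_flows(moves, zone_of):
    """Aggregate individual origin-destination moves into zone-to-zone flow counts.

    moves: iterable of (origin_id, destination_id); zone_of: maps each id to a zone.
    """
    flows = Counter()
    for origin, destination in moves:
        flows[(zone_of[origin], zone_of[destination])] += 1
    return dict(flows)

records = [(0.5, 0.5, 10.0), (1.5, 0.2, 5.0), (0.9, 0.1, 2.0)]
print(aggregate_points_to_grid(records, cell_size=1.0))
# {(0, 0): (2, 12.0), (1, 0): (1, 5.0)}
```

Changing `cell_size` (or swapping the grid for real zones) changes the patterns that appear, which is one reason aggregated results need careful interpretation.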
HOWEVER.. BEWARE!!!!
Database Pattern Summarizers #1
Database Pattern Summarizers #2
There are various EXISTING methods for creating these classifications
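One widely used way of building area classifications of this kind is k-means clustering of standardized census-style variables. The sketch below is a plain, self-contained k-means under that assumption; it is not necessarily one of the methods the slides had in mind, and the six "areas" with two variables each are invented for illustration. Standardizing first matters because the variables are measured in different units.

```python
import random
from statistics import mean, pstdev

# Hypothetical sketch: k-means as one possible way to build an area classification.

def standardize(rows):
    """Rescale each column to z-scores so variables in different units are comparable."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]

def kmeans(rows, k, iterations=100, seed=0):
    """Plain k-means clustering; returns one cluster label per row."""
    rng = random.Random(seed)
    centres = [list(row) for row in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: each area joins its nearest centre (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(row, centres[j])))
            for row in rows
        ]
        # Update step: move each centre to the mean of its member areas
        new_centres = []
        for j in range(k):
            members = [row for row, label in zip(rows, labels) if label == j]
            new_centres.append([mean(col) for col in zip(*members)] if members else centres[j])
        if new_centres == centres:  # converged: no centre moved
            break
        centres = new_centres
    return labels

# Invented data: two census-style variables for six areas
areas = [[0.9, 30], [0.8, 32], [0.85, 31], [0.1, 70], [0.15, 72], [0.2, 68]]
labels = kmeans(standardize(areas), k=2)
print(labels)
```

The resulting labels are the area types that a targeting system would then use.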
You then need to EMBED them in some kind of intelligent targeting system
The Intelligent Geodemographic Targeting Machine (IGTM)
Intelligence due to matching method to context!
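The slides do not describe how the IGTM works internally. As a hedged illustration of one basic ingredient of geodemographic targeting, the sketch below computes a conventional profile index per cluster (100 times the cluster's share of customers divided by its share of the population) and ranks clusters by it. The function names, cluster labels and counts are hypothetical.

```python
# Hypothetical sketch: a simple geodemographic profile index, not the IGTM itself.

def profile_index(customers, population):
    """Index = 100 * (cluster's share of customers) / (cluster's share of population).

    customers, population: {cluster_label: count}. 100 means "as expected";
    values above 100 flag over-represented clusters worth targeting.
    """
    n_cust = sum(customers.values())
    n_pop = sum(population.values())
    return {
        c: 100.0 * (customers.get(c, 0) / n_cust) / (population[c] / n_pop)
        for c in population
        if population[c] > 0  # skip empty clusters to avoid dividing by zero
    }

def rank_clusters(index):
    """Clusters ordered from most to least over-represented."""
    return sorted(index, key=index.get, reverse=True)

customers = {"A": 10, "B": 30, "C": 10}
population = {"A": 500, "B": 300, "C": 200}
index = profile_index(customers, population)
print(rank_clusters(index))  # ['B', 'C', 'A']
```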
Geographical CONTEXT is another very useful predictor
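One simple way to turn geographical context into a predictor is a neighbourhood average: for each location, the mean of some variable over nearby locations. The sketch assumes a fixed distance band (`radius`) and planar coordinates; both are illustrative choices, the function name is hypothetical, and the brute-force double loop is O(n²), so a spatial index would be needed for warehouse-sized data.

```python
import math

# Hypothetical sketch: a neighbourhood (context) average as an extra predictor.

def neighbourhood_mean(points, values, radius):
    """Mean of `values` at the *other* locations within `radius` of each point.

    Returns None for points with no neighbours inside the radius.
    """
    context = []
    for i, (xi, yi) in enumerate(points):
        nearby = [
            values[j]
            for j, (xj, yj) in enumerate(points)
            if j != i and math.hypot(xi - xj, yi - yj) <= radius
        ]
        context.append(sum(nearby) / len(nearby) if nearby else None)
    return context

points = [(0, 0), (1, 0), (5, 5)]
values = [10, 20, 30]
print(neighbourhood_mean(points, values, radius=1.5))  # [20.0, 10.0, None]
```

The context values can then sit alongside each record's own attributes in a predictive model.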
Lifestyle Classification is often important as a surrogate for more complex relationships!
Pattern and Process Models #1
Many of these suggestions could be implemented using fairly well-understood legacy technologies. The NEW aspect is LINKING these models to Data Warehouses using High Performance Computing
Doing better
Various ways of creating BETTER models
Two Dimensional Spatial Pattern Detectors
The Geographical Analysis Machine
The GAM worked as follows
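The slides give no parameters for the GAM, so the sketch below is a minimal illustration of the general idea under stated assumptions: lay a grid of overlapping circles of several radii over the study area, count observed cases and population at risk in each circle, and keep circles whose observed count is improbably high under a Poisson model with the study-wide rate. The grid spacing (0.2 times the radius), `alpha=0.002` and `min_cases=3` are assumptions chosen for illustration, and all names and data are hypothetical.

```python
import math

# Hypothetical GAM-style circle search; spacing, alpha and min_cases are
# illustrative assumptions.

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected)."""
    if observed <= 0:
        return 1.0
    term = math.exp(-expected)  # P(X = 0)
    cdf = term
    for k in range(1, observed):
        term *= expected / k  # P(X = k) from P(X = k - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)  # clamp tiny negative rounding errors

def within(x, y, cx, cy, r):
    return (x - cx) ** 2 + (y - cy) ** 2 <= r * r

def gam(cases, population, radii, bbox, alpha=0.002, min_cases=3):
    """Search overlapping circles for excess case counts.

    cases, population: lists of (x, y, count); bbox: (xmin, ymin, xmax, ymax).
    Returns significant circles as (cx, cy, radius, observed, expected).
    """
    rate = sum(c for _, _, c in cases) / sum(p for _, _, p in population)
    xmin, ymin, xmax, ymax = bbox
    hits = []
    for r in radii:
        step = 0.2 * r  # assumed spacing: neighbouring circles overlap heavily
        for i in range(int((xmax - xmin) / step) + 1):
            for j in range(int((ymax - ymin) / step) + 1):
                cx, cy = xmin + i * step, ymin + j * step
                observed = sum(c for x, y, c in cases if within(x, y, cx, cy, r))
                if observed < min_cases:
                    continue
                at_risk = sum(p for x, y, p in population if within(x, y, cx, cy, r))
                expected = rate * at_risk
                if expected > 0 and poisson_upper_tail(observed, expected) < alpha:
                    hits.append((cx, cy, r, observed, expected))
    return hits

# Invented data: uniform population with a cluster of cases at (2, 2)
population = [(x, y, 100) for x in range(10) for y in range(10)]
cases = [(2, 2, 15), (7, 7, 1), (8, 1, 1), (1, 8, 1), (6, 3, 1), (9, 9, 1)]
hits = gam(cases, population, radii=[1.0], bbox=(0, 0, 9, 9))
print(len(hits), "significant circles")
```

Because thousands of overlapping circles are tested, individual "significant" circles are expected by chance; it is dense concentrations of them that suggest a real cluster.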
The GAM was used to analyze cancer data
The Geographical Correlates Exploration Machine (GCEM)
MAP Explorer (MAPEX)
BUT
Spatial Data Mining
Conventional Data Mining methods focus on the WHAT question
For Example
A marketing example: Predicting Alcohol Sales
Data Mining tools are mainly UNI-SPACE explorers
Developing tri-space database explorers
It’s a HARD problem because the three data domains (space, time and attributes) are characterised by measurements that are not in the same units and cannot be related to each other in any simple way
Space-Time-Attribute Creatures (STACs)
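The slides name STACs without describing them, so the sketch below only illustrates the general idea under explicit assumptions. A hypothetical "creature" is a region with a spatial centre and radius, a time window and an attribute range; each domain keeps its own units, so no combined distance across space, time and attributes is needed. Its score is observed minus expected events, with "expected" computed as if space, time and attribute membership were independent (an assumed baseline). Plain random search stands in for whatever search strategy the original used; all names are hypothetical.

```python
import math
import random

# Hypothetical sketch of a space-time-attribute search region ("STAC-like"),
# scored against an independence baseline and explored by plain random search.

def stac_counts(events, stac):
    """Observed and expected event counts inside one region.

    events: non-empty list of (x, y, t, a). stac: dict with centre x, y, radius r,
    time window t0..t1 and attribute window a0..a1.
    """
    n = len(events)
    in_space = [math.hypot(x - stac["x"], y - stac["y"]) <= stac["r"] for x, y, _, _ in events]
    in_time = [stac["t0"] <= t <= stac["t1"] for _, _, t, _ in events]
    in_attr = [stac["a0"] <= a <= stac["a1"] for _, _, _, a in events]
    observed = sum(s and t and a for s, t, a in zip(in_space, in_time, in_attr))
    # Assumed baseline: the three memberships are independent
    expected = n * (sum(in_space) / n) * (sum(in_time) / n) * (sum(in_attr) / n)
    return observed, expected

def random_stac(rng, bounds):
    """Random region inside bounds = ((xmin, xmax), (ymin, ymax), (tmin, tmax), (amin, amax))."""
    (xmin, xmax), (ymin, ymax), (tmin, tmax), (amin, amax) = bounds
    t0, t1 = sorted(rng.uniform(tmin, tmax) for _ in range(2))
    a0, a1 = sorted(rng.uniform(amin, amax) for _ in range(2))
    return {
        "x": rng.uniform(xmin, xmax), "y": rng.uniform(ymin, ymax),
        "r": rng.uniform(0.0, 0.25 * max(xmax - xmin, ymax - ymin)),
        "t0": t0, "t1": t1, "a0": a0, "a1": a1,
    }

def search_stacs(events, bounds, tries=5000, seed=1):
    """Keep the random region with the largest excess (observed - expected)."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(tries):
        stac = random_stac(rng, bounds)
        observed, expected = stac_counts(events, stac)
        if observed - expected > best_score:
            best, best_score = stac, observed - expected
    return best, best_score
```

The best of many random regions will always look somewhat unusual, so any real explorer of this kind would also need a significance test for the regions it reports.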
Geocyberspace Movies
Robustness
Example 1. Financial Services application
Example 2. Crime Data Analysis
Conclusions
AND
Data Mining and Knowledge Discovery in Databases cannot safely ignore the GEOGRAPHICAL dimension
Finally...
Do not be too SIMPLE-MINDED!