Introduction
- This web page is for organising information about the CSAP meeting on 2007-04-27.
Agenda
- 12.00 – 1.00 Ten minute presentations
- Belinda Wu – “Building migration into a dynamic microsimulation model”
- Dianna Smith – “Modification of a static microsimulation model”
- Kim Procter – “SimObesity: The longest mile is the last mile home”
- Andy Turner – “Individual and Household Level Estimates Based on 2001 UK Human Population Census Data”
- Karyn Morrissey – “Combining Microsimulation and Spatial Interaction Models”
- 1.00 – 1.30 Lunch
- 1.30 – 2.00 Discussion led by Phil Rees, Andy Evans & Karyn Morrissey
People
Notes
- Introduction
- Martin
- Microsimulation as a technique has been around for some time.
- Mention of an article by Alan Wilson and a co-author, published in 1973.
- Building migration into a dynamic microsimulation model
- Belinda
- Student population still a problem.
- Other problems listed
- Behavioural modelling...
- Computational problems of migration modelling...
- How to Validate?
- Modification of a static microsimulation model
- Dianna
- SimHealth
- Linking:
- Health Survey for England
- 2001 Census Data
- Based on Flexible Modelling Framework developed by Kirk Harland
- Listed alternative methods:
- Logit
- Bayesian
- Agents
- Others?
- Discussion
- Martin mentioned that Experian have linked MOSAIC and Health Data in collaboration with Slough PCT
- A bus visited the areas that the data identified as likely to contain a large number of people with undiagnosed diabetes of specific types, in order to diagnose them.
- Graham mentioned that Dianna is working on linking her work to food access, in terms of the accessibility of supermarkets.
- SimObesity: The longest mile is the last mile home
- Kim
- Obesity in children.
- Along the lines of SimBritain and SimHealth
- To link two data sets, choose linking variables that are correlated with the variable being added.
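One simple way to act on this advice is to rank the candidate linking variables in the survey by the absolute Pearson correlation with the variable to be added, and keep the strongest. This is a minimal sketch under assumptions: the notes do not say how correlation was measured, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class LinkVariableSketch {

    /** Pearson correlation of two equal-length samples (NaN if either sample is constant). */
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double mx = 0, my = 0;
        for (int i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
        mx /= n;
        my /= n;
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx, dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    /** Candidate variable names ordered by |correlation| with the target, strongest first. */
    static List<String> rankByCorrelation(Map<String, double[]> candidates, double[] target) {
        List<String> names = new ArrayList<>(candidates.keySet());
        // Negate so the sort is descending; NaN values end up last.
        names.sort(Comparator.comparingDouble(
                (String name) -> -Math.abs(pearson(candidates.get(name), target))));
        return names;
    }
}
```

The top-ranked variables would then be the ones shared by both data sets and used to match records.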
- Individual and Household Level Estimates Based on 2001 UK Human Population Census Data
- Andy
- Seemed to go OK
- Martin asked about computation and how long it took to generate output for Leeds.
- John Stilwell asked about SCAM
- Combining Microsimulation and Spatial Interaction Models
- Karyn
- Accessibility to Health Care
- SMILE
- Living in Ireland Survey
- What is the difference between Location-Allocation (L-A) models and Spatial Interaction Models (SIM)?
- Graham Clarke explained that L-A models are purely distance-based, whereas SIMs have a more complex attractiveness variable built in.
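To make the distinction concrete, here is a minimal Java sketch contrasting a nearest-facility allocation (distance only) with a standard production-constrained spatial interaction model, T_ij = A_i O_i W_j^alpha exp(-beta d_ij), where A_i = 1 / sum_j W_j^alpha exp(-beta d_ij). The attractiveness values W_j, the parameters alpha and beta, and the class name are illustrative assumptions, not details of Karyn's model.

```java
/** Hypothetical illustration: L-A style allocation versus a production-constrained SIM. */
final class SimSketch {

    /** Distance only: each origin is assigned wholly to its nearest facility. */
    static int[] nearestFacility(double[][] d) {
        int[] choice = new int[d.length];
        for (int i = 0; i < d.length; i++) {
            int best = 0;
            for (int j = 1; j < d[i].length; j++) {
                if (d[i][j] < d[i][best]) best = j;
            }
            choice[i] = best;
        }
        return choice;
    }

    /**
     * Production-constrained SIM: flows from origin i are shared between all facilities
     * in proportion to W_j^alpha * exp(-beta * d_ij), scaled so they sum to O_i.
     */
    static double[][] productionConstrained(double[] origins, double[] attractiveness,
                                            double[][] d, double alpha, double beta) {
        double[][] flows = new double[origins.length][attractiveness.length];
        for (int i = 0; i < origins.length; i++) {
            double sum = 0.0;
            for (int j = 0; j < attractiveness.length; j++) {
                flows[i][j] = Math.pow(attractiveness[j], alpha) * Math.exp(-beta * d[i][j]);
                sum += flows[i][j];
            }
            double balancing = 1.0 / sum; // A_i
            for (int j = 0; j < attractiveness.length; j++) {
                flows[i][j] *= balancing * origins[i];
            }
        }
        return flows;
    }
}
```

With distances {{1, 2}, {3, 1}}, attractiveness {1, 10}, alpha = 1 and beta = 0.5, origin 0 is allocated entirely to facility 0 by distance alone, while the SIM sends most of its flow to the more attractive facility 1.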
- Discussion
- John and Adam are obtaining, for England and Wales, the kind of data on referrals and admissions that Karyn wants for Ireland:
- From where do people go and where are they treated?
- Karyn knows the data exist for Ireland, but she has been refused access for her research!
- Cluster Business
- Mark
- 2007-05-14 12:00 to 14:00 Joint Cluster Meeting
- Develop research proposals related to other SoG research clusters
- I wondered about something along the lines of:
- Affordable Housing: The social housing stock, changes over time: A case study in Leeds
- This could involve Stuart Hodkinson, given that he has already talked to MASS and that others in CSAP are interested in the housing market.
- Graham
- Please provide suggestions for invitees to the SoG Global Seminar Series for the 2007-2008 season.
- A professor from Mexico is visiting and sharing the G10 office.
- Lunch
- Discussion after Lunch
- Andy Evans kicked this off :)
- Jin's work is on the N:/
- Open Source?
- How many different beasts exist?
- Talk of developing a Funding Proposal to:
- Integrate all the software and make available as open source.
- Develop a User Friendly Interface
- Mark wants us to focus on developing the ultimate population dataset.
- Karyn suggested that in terms of validation we should look to economics work.
- I had to leave 5 minutes before the scheduled end to make my way to the next meeting.
Preparation
- Individual and Household Level Estimates Based on 2001 UK Human Population Census Data
Andy Turner
- Presentation Slides:
- Outline
- MoSeS: A Brief Introduction to the Demographic Modelling work.
- Focus on Population Initialisation for 2001.
- Future Work
- Detail
- MoSeS: A Brief Introduction to the Demographic Modelling work.
- Population Initialisation for 2001.
- Dynamic Model for 2001 to 2031.
- Enriching the Individual and Household data with variables from health and social survey data.
- Making the results and methods available for researchers and policy makers.
- Focus on Population Initialisation for 2001.
- Overview
- The basic task is to select a well-fitting set of records from the Individual and Household Samples of Anonymised Records (ISAR and HSAR respectively) to constitute census areas.
- The fitness of a set is evaluated by a fitness function that compares aggregate estimates from the selected ISAR and HSAR records with those from the published Census Aggregate/Area Statistics (CAS).
- Mark Birkin has done some work to find a well-fitting set using an Iterative Proportional Sampling (IPS) technique, and I have done some work using Genetic Algorithms (GA).
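A fitness function of this kind can be sketched as follows. The notes do not say which error measure is used, so this minimal Java sketch assumes a total absolute error (TAE) between the counts implied by the selected records and the published CAS counts, one count per variable; the representation of records and all names are hypothetical.

```java
import java.util.List;

final class FitnessSketch {

    /**
     * Total absolute error between aggregates of the selected records and CAS targets.
     * Each record is a vector of its contributions to each variable
     * (for example 1 for "FemalesAge20to24" if the person is a woman aged 20-24).
     * Lower is better; 0 is a perfect fit.
     */
    static long totalAbsoluteError(List<int[]> selected, int[] casTargets) {
        long[] estimate = new long[casTargets.length];
        for (int[] record : selected) {
            for (int k = 0; k < casTargets.length; k++) {
                estimate[k] += record[k];
            }
        }
        long error = 0;
        for (int k = 0; k < casTargets.length; k++) {
            error += Math.abs(estimate[k] - casTargets[k]);
        }
        return error;
    }
}
```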
- Genetic Algorithm Details
- Control Constraints (CC) and Optimisation Constraints (OC)
- CC are measures that *have to* be met in solutions.
- OC are measures that solutions are fitted to as closely as possible.
- A story of development:
- July 2005.
- Readers and formatters of a selected set of Census Aggregate Statistics (CAS) data were developed.
- Reader and formatter of the Individual SAR (ISAR) was developed.
- GA developed with CC of total population.
- Output generated for Leeds based on Output Area CAS.
- Code parallelised and output generated for UK.
- Feb 2006.
- We attempted to use the output as input to Belinda's dynamic simulation.
- Problem: ISAR records could not easily be formed into households
- Belinda had developed a Household Formation Routine (HFR), but in most cases it could not assign all the ISAR records to households.
- Solution:
- Use Belinda's HFR to encourage solutions where ISAR records could be formed into Households.
- New output generated for Leeds and UK.
- April 2006.
- Owing to concerns over the assumptions in the HFR, and given the availability of a Household SAR (HSAR), a new solution was proposed:
- Select the Household Population (HP) from the HSAR and the Communal Establishment Population (CEP) from the ISAR.
- In theory this is OK,
- Although the HSAR is only available for England and Wales.
- The HSAR also has stricter data licence conditions because of confidentiality and disclosure concerns.
- But in practice this has taken longer than hoped.
- New strategy was:
- CC the CEP using CAS001 and CC the HP using CAS003.
- OC based on variables from a set of tables
- Strategy to CC by age is considerably more difficult than simply CC by HP and CEP totals:
- Census data have age variables aggregated in one of a number of different ways, for instance:
- In the ISAR the age variable AGE0 has values 0-15 as single years of age, 16-19, 20-24, 25-29, 30-44, 45-59, 60-64, 65-69, 70-74, and 75-79 as single years of age.
- In the HSAR the age variable AGEH has values 0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56, 58, 60, 62, 64, 66, 68, 70, 72, 74, 76, 78, 80.
- In CAS001 age is grouped as 0-24 as single years of age, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90+.
- In CAS003 age is grouped as 0-19, 20-24, 25-29, 30-34, 35-39, 40-44, 45-49, 50-54, 55-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90+.
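A small check shows why these groupings do not nest. Assuming each AGEH value is the lower bound of a two-year band (so 24 means ages 24-25, and 80 means 80 and over), the sketch below lists the AGEH values whose band straddles a CAS003 group boundary; records in those bands cannot be assigned to a single CAS003 age group. The interpretation of AGEH and the names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

final class AgeBandSketch {

    /** CAS003 age-group lower bounds: 0-19, 20-24, 25-29, ..., 85-89, 90+. */
    static final int[] CAS003_LOWER = {0, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90};

    /** True if the inclusive age range [lo, hi] crosses any group lower bound. */
    static boolean straddles(int lo, int hi, int[] lowerBounds) {
        for (int b : lowerBounds) {
            if (lo < b && b <= hi) {
                return true;
            }
        }
        return false;
    }

    /** AGEH values (assumed two-year bands, 80 = "80 and over") that span more than one group. */
    static List<Integer> ambiguousAgehValues(int[] lowerBounds) {
        List<Integer> result = new ArrayList<>();
        for (int v = 0; v <= 80; v += 2) {
            int hi = (v == 80) ? Integer.MAX_VALUE : v + 1; // top band is open-ended
            if (straddles(v, hi, lowerBounds)) {
                result.add(v);
            }
        }
        return result;
    }
}
```

Under that assumption, AGEH values 24, 34, 44, 54, 64 and 74 each span two CAS003 groups (for example 24-25 spans 20-24 and 25-29), and the top-coded 80 spans 80-84, 85-89 and 90+.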
- To CC on HP age is very hard!
- A household can contain up to 12 individuals.
- Consider the difficulty of ensuring that the total number of individuals is correct.
- Household Records can be ordered by number of individuals.
- To order by the exact age breakdown of the individuals is considerably more complex.
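Ordering households by size can be sketched as an index from household size to candidate records, so that a swap only ever exchanges households of equal size and the total number of individuals stays fixed. The Household record and its fields are hypothetical; a real HSAR record carries many more variables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

final class HouseholdIndexSketch {

    /** Hypothetical minimal household record: an identifier and the ages of its members. */
    record Household(String id, int[] memberAges) {
        int size() { return memberAges.length; }
    }

    /** Groups households by number of members (1 to 12 in the HSAR). */
    static Map<Integer, List<Household>> indexBySize(List<Household> households) {
        Map<Integer, List<Household>> bySize = new TreeMap<>();
        for (Household h : households) {
            bySize.computeIfAbsent(h.size(), k -> new ArrayList<>()).add(h);
        }
        return bySize;
    }

    /** Random replacement of the same size (assumes that size is present in the index). */
    static Household sameSizeReplacement(Household current, Map<Integer, List<Household>> bySize,
                                         Random rng) {
        List<Household> pool = bySize.get(current.size());
        return pool.get(rng.nextInt(pool.size()));
    }
}
```

Keying instead on the exact age breakdown (for example the sorted member ages) would split the pools far more finely, leaving many households with few or no exchangeable alternatives.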
- HP CC is on the Household Reference Person (HRP) only:
- For this CAS003 age is grouped as 0-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79, 80+.
- For CEP CC based on CAS001 age is grouped as 0-15 as single years of age, 16-19, 20-24, 25-29, 30-44, 45-59, 60-64, 65-69, 70-74, 75-79, 80-84, 85-89, 90+.
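Both constraints need a lookup from an age to its CC group. A minimal sketch, assuming each grouping is stored as an ascending array of group lower bounds with the last group open-ended; the arrays transcribe the HRP (CAS003) and CEP (CAS001) groupings listed above, and the names are hypothetical.

```java
import java.util.Arrays;

final class AgeGroupSketch {

    /** HRP grouping from CAS003: 0-19, 20-29, ..., 70-79, 80+. */
    static final int[] HRP_LOWER = {0, 20, 30, 40, 50, 60, 70, 80};

    /** CEP grouping from CAS001: single years 0-15, then 16-19, 20-24, 25-29, 30-44, 45-59, 60-64, ..., 85-89, 90+. */
    static final int[] CEP_LOWER = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15,
                                    16, 20, 25, 30, 45, 60, 65, 70, 75, 80, 85, 90};

    /** Index of the group containing age: the last lower bound that is <= age (-1 if age < 0). */
    static int groupIndex(int age, int[] lowerBounds) {
        int pos = Arrays.binarySearch(lowerBounds, age);
        // When age is not itself a lower bound, binarySearch returns -(insertionPoint) - 1.
        return pos >= 0 ? pos : -pos - 2;
    }
}
```

For example, an HRP aged 19 falls in group 0 (0-19), and a CEP member aged 44 falls in the 30-44 group.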
- GA:
- An initial set of solutions is generated.
- Solutions are bred:
- This is done by mutation: a random number of HSAR and ISAR records are swapped, ensuring that CC are met:
- CC are met by ensuring that any record swapped in has an age in the relevant CC age group.
- Each solution is measured for a goodness of fit.
- The best fitting solutions are kept.
- The breeding, fitness measurement and selection steps are repeated until convergence or until a fixed number of iterations has been completed.
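The loop above can be sketched in Java as follows. This is a simplified illustration, not the MoSeS code: it assumes each candidate record is reduced to a CC group index and a vector of OC contributions, mutation swaps a random number of selected records for records from the same CC group (so CC stay satisfied), fitness is total absolute error against the CAS targets, and "convergence" is approximated by reaching zero error. Crossover is omitted because the notes describe breeding by mutation only; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class GaSketch {

    /** Hypothetical candidate record: its CC group and its contributions to each OC variable. */
    record Cand(int group, int[] oc) {}

    /** A solution with its error; each slot keeps the CC group it was initialised with. */
    record Scored(Cand[] records, long error) {}

    static long error(Cand[] solution, int[] targets) {
        long[] est = new long[targets.length];
        for (Cand c : solution) {
            for (int k = 0; k < targets.length; k++) est[k] += c.oc()[k];
        }
        long e = 0;
        for (int k = 0; k < targets.length; k++) e += Math.abs(est[k] - targets[k]);
        return e;
    }

    static Cand randomFrom(List<List<Cand>> byGroup, int g, Random rng) {
        List<Cand> pool = byGroup.get(g);
        return pool.get(rng.nextInt(pool.size()));
    }

    /** Mutation: swap a random number of records for others from the same CC group. */
    static Cand[] mutate(Cand[] parent, List<List<Cand>> byGroup, int maxSwaps, Random rng) {
        Cand[] child = parent.clone();
        int swaps = 1 + rng.nextInt(maxSwaps);
        for (int s = 0; s < swaps; s++) {
            int i = rng.nextInt(child.length);
            child[i] = randomFrom(byGroup, child[i].group(), rng);
        }
        return child;
    }

    static Scored run(List<List<Cand>> byGroup, int[] ccCounts, int[] targets,
                      int popSize, int maxSwaps, int maxGenerations, long seed) {
        Random rng = new Random(seed);
        List<Scored> pop = new ArrayList<>();
        // Initial solutions meet the CC by construction: ccCounts[g] records from group g.
        for (int p = 0; p < popSize; p++) {
            List<Cand> init = new ArrayList<>();
            for (int g = 0; g < ccCounts.length; g++) {
                for (int n = 0; n < ccCounts[g]; n++) init.add(randomFrom(byGroup, g, rng));
            }
            Cand[] sol = init.toArray(new Cand[0]);
            pop.add(new Scored(sol, error(sol, targets)));
        }
        Comparator<Scored> byError = Comparator.comparingLong(Scored::error);
        for (int gen = 0; gen < maxGenerations; gen++) {
            pop.sort(byError);
            if (pop.get(0).error() == 0) break; // perfect fit found
            List<Scored> next = new ArrayList<>(pop); // parents survive (elitism)
            for (Scored parent : pop) {
                // Breed a child by mutation and measure its fit.
                Cand[] child = mutate(parent.records(), byGroup, maxSwaps, rng);
                next.add(new Scored(child, error(child, targets)));
            }
            // Keep the best-fitting solutions.
            next.sort(byError);
            pop = new ArrayList<>(next.subList(0, popSize));
        }
        pop.sort(byError);
        return pop.get(0);
    }
}
```

Because parents compete with their children, the best error never increases from one generation to the next; the CC hold throughout because no mutation ever moves a slot out of its group.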
- Optimisation Constraints (OC)
- Optimisation Constraint Variables:
- Health:
- PeopleWhoseGeneralHealthWasGood
- PeopleWhoseGeneralHealthWasFairlyGood
- PeopleWhoseGeneralHealthWasNotGood
- PeopleWithLimitingLongTermIllness
- Household Composition:
- LoneParentHouseholdsWithChildren
- OneFamilyAndNoChildren
- MarriedOrCohabitingCoupleWithChildren
- Female:
- Age:
- FemalesAge0to4
- FemalesAge5to9
- FemalesAge10to14
- FemalesAge15to19
- FemalesAge20to24
- FemalesAge25to29
- FemalesAge30to34
- FemalesAge35to39
- FemalesAge40to44
- FemalesAge45to49
- FemalesAge50to54
- FemalesAge55to59
- FemalesAge60to64
- FemalesAge65to69
- FemalesAge70to74
- FemalesAge75to79
- FemalesAge80AndOver
- Marriage Age:
- FemalesMarriedAge0to15
- FemalesMarriedAge16to19
- FemalesMarriedAge20to24
- FemalesMarriedAge25to29
- FemalesMarriedAge30to34
- FemalesMarriedAge35to39
- FemalesMarriedAge40to44
- FemalesMarriedAge45to49
- FemalesMarriedAge50to54
- FemalesMarriedAge55to59
- FemalesMarriedAge60to64
- FemalesMarriedAge65to74
- FemalesMarriedAge75to79
- FemalesMarriedAge80AndOver
- Economic Activity:
- FemalesAge16to24Unemployed
- FemalesAge16to74
- FemalesAge16to74EconomicallyActiveEmployed
- FemalesAge16to74EconomicallyActiveUnemployed
- FemalesAge16to74EconomicallyInactive
- FemalesAge50AndOverUnemployed
- Male:
- Age:
- MalesAge0to4
- MalesAge5to9
- MalesAge10to14
- MalesAge15to19
- MalesAge20to24
- MalesAge25to29
- MalesAge30to34
- MalesAge35to39
- MalesAge40to44
- MalesAge45to49
- MalesAge50to54
- MalesAge55to59
- MalesAge60to64
- MalesAge65to69
- MalesAge70to74
- MalesAge75to79
- MalesAge80AndOver
- Marriage Age:
- MalesMarriedAge0to15
- MalesMarriedAge16to19
- MalesMarriedAge20to24
- MalesMarriedAge25to29
- MalesMarriedAge30to34
- MalesMarriedAge35to39
- MalesMarriedAge40to44
- MalesMarriedAge45to49
- MalesMarriedAge50to54
- MalesMarriedAge55to59
- MalesMarriedAge60to64
- MalesMarriedAge65to74
- MalesMarriedAge75to79
- MalesMarriedAge80AndOver
- Economic Activity:
- MalesAge16to24Unemployed
- MalesAge16to74
- MalesAge16to74EconomicallyActiveEmployed
- MalesAge16to74EconomicallyActiveUnemployed
- MalesAge16to74EconomicallyInactive
- MalesAge50AndOverUnemployed
- Current Stage
- For the last 6 months the results have been scrutinised and the code revisited, identifying various bugs and logic errors.
- Results have not yet passed expert approval.
- The first results were not as good as expected and did not compare favourably with those from the Iterative Proportional Sampling (IPS) method developed by Mark Birkin.
- Intuitively, the GA should perform much better than IPS, but it remains a challenge to show this.
- Hopefully the results generated this week do this...
- Future Work
- Next Steps
- Examine errors graphically and investigate outliers.
- Produce and analyse geographical maps of the errors.
- Run Belinda's dynamic model based on these synthetic data:
- This will either work or we will identify further issues and problems.
- Enrich the ISAR and HSAR records by incorporating variables from the BHPS and other survey and health databases.
- Mark has some Masters students looking into this.
- Develop a publication comparing the IPS and GA methods and results.
- Reproduce ISAR-only results, which can be used more readily by others.
- Tidy and fully document the code.
- The code is written in Java and is open source.
References