Introduction
People
Agenda
- Ken Miller's e-INF_D1.1.1_V2.doc Preliminary Selection of Datasets Draft
- AoB
Documentation
Notes
- Various technical problems
- I am using Access Grid Client 3.0.2 Build 1 on my PC for the first time with a borrowed headset and camera.
- I have no audio out:
- I suspect the microphone I am using or some setting on my PC is wrong.
- Will use textual input via a chat window and signalling to the camera to interact.
- Going through the agenda
- Ken Miller's e-INF_D1.1.1_V2.doc Preliminary Selection of Datasets Draft
- Pascal, Lancaster and MoSeS are all working on, or have an interest in, a Grid enabled BHPS.
- The assessment of which datasets to Grid enable is partly based on which data have been used in social science research (popularity).
- In terms of Census data it might be better to focus firstly on the most recent data (2001)
- Although it has probably been used less than the 1991 data so far, it is probably being used more now and will continue to be for the next few years.
- Harmonisation of long running time series data is regarded as a good Use Case for Grid Enabling the data.
- There is always a trade off between how much you can store and how fast you can process.
- In simulations, storing all the data after every step costs too much storage.
- It is always sensible to store the source data and the formatted version of it (as in a database dump or file).
- Enriched data is formatted data that has been cleaned and integrated.
- The workflow and the programs used to enrich data should be stored, together with their configuration, so that this step can be repeated automatically if desired (for example with different random seeds or with updated inputs).
- http://www.beyond2020.com/
- http://www.ccsr.ac.uk/sars/publications/newsletters/16/sarsnews16.htm
- Some organisations (e.g. the UK Office for National Statistics) are increasingly worried about disclosure risks:
- They worry that other data and data integration in general could lead to disclosures about individuals which could cause them problems.
- As a consequence the data that they release is increasingly anonymised, aggregated, intentionally obfuscated with errors, and subjected to special license conditions.
- This decreases the utility of the data and makes the task of Grid enabling it harder.
- Are pseudo-anonymisation, security, special digital licenses, and legal and ethical guidelines about what can be published and what requires consent needed?
- I think that we should try to break down the barriers preventing the integration of data:
- Perhaps a key challenge in doing this is to simultaneously produce high barriers against data misuse.
- Ken's draft will be reviewed by a review group made up of representatives from various nodes.
- This will include a MoSeS person (probably me).
- AoB
- Workshop in Manchester
- Next Meeting
- Wei Jie to talk about Security
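
A minimal sketch of the point noted above about storing the enrichment workflow and its configuration so that the step can be repeated with a different random seed or updated inputs. Everything here is an assumption made for illustration: the function names (enrich, run_workflow), the record fields (id, age, weight) and the manifest layout are hypothetical and do not come from Ken's draft or from any project code.

```python
import hashlib
import json
import random


def enrich(source_rows, seed):
    """Hypothetical enrichment step: drop incomplete rows, then attach a random weight."""
    rng = random.Random(seed)  # local generator, so a run depends only on its seed
    cleaned = [row for row in source_rows if row.get("age") is not None]
    return [dict(row, weight=round(rng.uniform(0.5, 1.5), 3)) for row in cleaned]


def run_workflow(source_rows, config):
    """Run the enrichment and return the output with a record of how it was made."""
    enriched = enrich(source_rows, config["seed"])
    # Fingerprint the source data so a later rerun can tell whether the inputs changed.
    source_digest = hashlib.sha256(
        json.dumps(source_rows, sort_keys=True).encode("utf-8")
    ).hexdigest()
    manifest = {
        "config": config,
        "source_sha256": source_digest,
        "rows_out": len(enriched),
    }
    return enriched, manifest


# Illustrative source data (made up for this sketch).
source = [{"id": 1, "age": 34}, {"id": 2, "age": None}, {"id": 3, "age": 51}]

first, manifest = run_workflow(source, {"seed": 42})
again, _ = run_workflow(source, manifest["config"])  # replay from the stored configuration
other, _ = run_workflow(source, {"seed": 7})         # same step under a different seed
```

Keeping the source rows and the manifest is enough to regenerate the enriched data on demand, which is one way to avoid storing every intermediate result.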
Action List
- Andy Turner
- Provide the following links to the group at the request of Rob Allan:
References