The aim of this work package is to extend current initiatives on statistical metadata (particularly metadata for statistical processes) to include the representation of statistical models, covering both their underlying structure and their use in particular analyses or data syntheses. In particular, it will provide a representation of algebraic model forms as an extension of existing expression representations, together with records of input data sources (including their versions), the parametric assumptions used in model invocations (including Bayesian distributional assumptions for parameters), and links to the synthesised information produced by model invocations. The work package will also draw on work on generic version control and audit trails.
The work package will draw on existing (and ongoing) work by others on statistical metadata (particularly on process metadata), and on the related implementation activities for metadata within the LATS transport database project being undertaken in London. It will also draw on concurrent work in WP2 to identify the form of models to be considered.
The work will comprise three interrelated activities.
The first is the development of a representation of the underlying form of statistical models as metadata, in a way that is accessible for review by people and execution by software. This will include algebraic expressions and relationships, distributions, variables and parameters, and will not assume that the model can be represented as a single component.
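As an illustration only, the kind of structure described for the first activity might be sketched as follows. This is a minimal sketch under stated assumptions, not the representation the work package will define: the class names (`Symbol`, `Relationship`, `ModelComponent`, `ModelForm`), the storage of expressions as plain text, and the simple linear regression example are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Symbol:
    """A named quantity in the model (hypothetical structure)."""
    name: str
    role: str  # "variable" or "parameter"


@dataclass(frozen=True)
class Relationship:
    """Defines one symbol, either algebraically or by a distribution."""
    target: str      # name of the symbol being defined
    kind: str        # "algebraic" or "distribution"
    definition: str  # expression held as text so that people can review it


@dataclass
class ModelComponent:
    """One part of a model; a model need not be a single component."""
    name: str
    symbols: List[Symbol] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)


@dataclass
class ModelForm:
    """The underlying form of a model, built from one or more components."""
    name: str
    components: List[ModelComponent] = field(default_factory=list)

    def symbols_by_role(self, role: str) -> List[str]:
        """Return the sorted names of all symbols with the given role."""
        names = {s.name for c in self.components for s in c.symbols if s.role == role}
        return sorted(names)


# Hypothetical example: a linear regression split into two components.
mean_part = ModelComponent(
    name="mean_structure",
    symbols=[Symbol("x", "variable"), Symbol("mu", "variable"),
             Symbol("alpha", "parameter"), Symbol("beta", "parameter")],
    relationships=[Relationship("mu", "algebraic", "alpha + beta * x")],
)
error_part = ModelComponent(
    name="error_structure",
    symbols=[Symbol("y", "variable"), Symbol("sigma", "parameter")],
    relationships=[Relationship("y", "distribution", "Normal(mu, sigma)")],
)
regression = ModelForm("linear_regression", [mean_part, error_part])
```

Holding expressions as text keeps the sketch short; a representation intended for execution by software would more plausibly store a parsed expression tree.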
The second is the development of a representation of the way in which a model is used in the context of a statistical database. This will include recording the input information sources used, together with prior settings or assumptions about parameters (including vague assumptions in the form of distributions and dependencies). The representation should allow the use of a model to be reviewed, revised and re-executed. Issues of version control will need to be addressed, since in general the input sources will be dynamically updated.
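A record of model use along the lines described for the second activity could look something like the sketch below. It is an assumption-laden illustration rather than a proposed design: the names `InputSource`, `Prior`, `ModelInvocation` and `is_stale`, the version strings, and the source name `household_survey` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class InputSource:
    """An input information source pinned to the version that was used."""
    name: str
    version: str


@dataclass(frozen=True)
class Prior:
    """A prior setting or assumption about a parameter, held as text."""
    parameter: str
    distribution: str


@dataclass
class ModelInvocation:
    """One use of a model in the context of a statistical database."""
    model_name: str
    inputs: List[InputSource] = field(default_factory=list)
    priors: List[Prior] = field(default_factory=list)

    def is_stale(self, current_versions: Dict[str, str]) -> bool:
        """True if any input source has changed since this invocation ran."""
        return any(current_versions.get(src.name) != src.version
                   for src in self.inputs)


# Hypothetical invocation recorded against one versioned source.
invocation = ModelInvocation(
    model_name="linear_regression",
    inputs=[InputSource("household_survey", "v1")],
    priors=[Prior("beta", "Normal(0, 10)")],
)

unchanged = invocation.is_stale({"household_survey": "v1"})
updated = invocation.is_stale({"household_survey": "v2"})
```

Pinning each source to a version is what allows a use of the model to be reviewed later and flagged for re-execution once a dynamically updated source has moved on.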
The third is the development of a representation of the results of using a model. This will include estimated parameter values (with suitable posterior support or precision information), and also synthesised information generated from the fitted model. The main body of the latter will be stored in standard structures (as for real data), but it is essential to retain the link to the model and the generation process. Issues of version control will arise again.
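The link between synthesised information and the model that generated it might be kept as in the following sketch. Every name here (`ParameterEstimate`, `ModelResult`, `provenance`, the invocation identifier and the table name) is a hypothetical stand-in, and the summary of precision by a single standard error is a simplifying assumption.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class ParameterEstimate:
    """An estimated parameter value with simple precision information."""
    parameter: str
    estimate: float
    std_error: float


@dataclass
class ModelResult:
    """Results of one model invocation, linked back by its identifier."""
    invocation_id: str
    estimates: List[ParameterEstimate] = field(default_factory=list)
    # Names of the standard data structures holding synthesised values.
    synthesised_tables: List[str] = field(default_factory=list)


def provenance(results: List[ModelResult], table_name: str) -> Optional[str]:
    """Return the invocation that generated a synthesised table, if any."""
    for result in results:
        if table_name in result.synthesised_tables:
            return result.invocation_id
    return None


# Hypothetical result set with one synthesised table.
results = [
    ModelResult(
        invocation_id="inv-001",
        estimates=[ParameterEstimate("beta", 0.5, 0.1)],
        synthesised_tables=["synthetic_trips"],
    )
]

source_of_table = provenance(results, "synthetic_trips")
source_of_real_data = provenance(results, "observed_trips")
```

The synthesised values themselves sit in ordinary tables, as for real data; only the result record carries the link back to the model and the generation process.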
Deliverables from this work package are available to User Forum members.
Page last updated: 28 February 2008