Data-level, or object-level, documentation provides information at the level of variables in a database or individual objects such as interview transcripts or pictures. Data-level information can be embedded in data files, for example as variable, value and code labels in an SPSS file or as headers in an interview transcript.
Where possible, variable-level annotation should be embedded within a data file itself. More comprehensive variable-level documentation can also be created using a structured metadata format such as XML.
Structured tabular data should have as documentation (where applicable):
Example: variable 'q11hexw' with label 'Q11: hours spent taking physical exercise in a typical week'. The label gives the unit of measurement and a reference to the question number (Q11).
Example: variable 'p1sex' = 'sex of respondent' with codes '1=female', '2=male', '8=don't know', '9=not answered'.
Examples: Standard Occupational Classification, 2000, a series of codes to classify respondents' jobs; ISO 3166 alpha-2 country codes, an international standard of 2-letter country codes.
Example: '99=not recorded', '98=not provided (no answer)', '97=not applicable', '96=not known', '95=error'.
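The examples above can be gathered into a small machine-readable data dictionary. The following is a minimal sketch in Python, not a prescribed format: the names `DATA_DICTIONARY` and `describe` are hypothetical, and treating codes 8 and 9 of 'p1sex' as missing values, and applying the 95 to 99 codes to 'q11hexw', are assumptions made for illustration.

```python
# Hypothetical data dictionary built from the example variables in the text.
DATA_DICTIONARY = {
    "q11hexw": {
        "label": "Q11: hours spent taking physical exercise in a typical week",
        "values": {},  # continuous variable, so no value labels
        # Assumption: the example missing-data codes apply to this variable.
        "missing": {
            99: "not recorded",
            98: "not provided (no answer)",
            97: "not applicable",
            96: "not known",
            95: "error",
        },
    },
    "p1sex": {
        "label": "sex of respondent",
        "values": {1: "female", 2: "male"},
        # Assumption: codes 8 and 9 are treated as missing data.
        "missing": {8: "don't know", 9: "not answered"},
    },
}


def describe(variable, value):
    """Return (decoded value, is_missing) for one raw value of a variable."""
    entry = DATA_DICTIONARY[variable]
    if value in entry["missing"]:
        return entry["missing"][value], True
    # Fall back to the raw value when no value label is defined.
    return entry["values"].get(value, value), False


if __name__ == "__main__":
    print(describe("p1sex", 1))
    print(describe("p1sex", 9))
    print(describe("q11hexw", 4))
```

Keeping labels, value codes and missing-data codes in one structure means the same information can be used both to decode values and to generate human-readable documentation.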
Uncoded, ungrouped and underived raw data provide more re-use options than those where coding, grouping or derivation has been applied, allowing secondary users to apply their own codes, groupings or derivations.
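As an illustration of why raw values offer more re-use options, the sketch below groups the same made-up weekly exercise hours in two different ways. The values, band boundaries and the function name `group` are all hypothetical. If only the first set of bands had been deposited, the second grouping could not be reconstructed.

```python
# Made-up raw values for hours of exercise in a typical week.
raw_hours = [0, 2, 3.5, 7, 12]


def group(hours, boundaries, labels):
    """Return the label of the first band whose upper bound exceeds hours.

    labels must contain one more item than boundaries; the last label
    covers everything at or above the final boundary.
    """
    for upper, label in zip(boundaries, labels):
        if hours < upper:
            return label
    return labels[-1]


# Grouping chosen by the original researcher (hypothetical bands).
original = [group(h, [1, 5], ["none", "1-4", "5+"]) for h in raw_hours]

# A different grouping that a secondary user might need (hypothetical bands).
secondary = [group(h, [2.5], ["under 2.5", "2.5 or more"]) for h in raw_hours]

if __name__ == "__main__":
    print(original)
    print(secondary)
```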
Embedding data documentation
Many data software packages have facilities for data annotation and description as variable attributes (labels, codes, data type, missing values), table relationships, etc.
A structured dataset may also be accompanied by a codebook detailing all variables and their values. This can be created by importing frequency distribution outputs, created from the software package used, into a word processor, with annotation added where necessary.
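Where a software package's frequency output is not available, a similar codebook entry can be produced with a short script. This is a minimal sketch using only the Python standard library; the function name `codebook_entry`, the output layout and the sample observations are assumptions for illustration, not a standard codebook format.

```python
from collections import Counter


def codebook_entry(name, label, value_labels, observations):
    """Build a plain-text codebook entry with a frequency count per code."""
    counts = Counter(observations)
    lines = [f"{name}: {label}"]
    for code in sorted(counts):
        # Flag codes that occur in the data but have no documented label.
        text = value_labels.get(code, "(no label)")
        lines.append(f"  {code} = {text}: n={counts[code]}")
    return "\n".join(lines)


# Made-up observations for the example variable 'p1sex'.
entry = codebook_entry(
    "p1sex",
    "sex of respondent",
    {1: "female", 2: "male", 8: "don't know", 9: "not answered"},
    [1, 2, 2, 9, 1, 2],
)

if __name__ == "__main__":
    print(entry)
```

The resulting text can be pasted into a word processor and annotated where necessary.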
Structured metadata: XML schemas
More comprehensive variable-level documentation, including basic data dictionary information, question text and question routing instructions, can also be created using a structured metadata format. XML is often used for this, as in the Data Documentation Initiative (DDI). Detailed DDI documentation can be created directly from various software packages, using DDI-specific XML authoring tools.
Such standardised documentation in XML format can be used by data extraction and analysis engines such as Nesstar; see, for example, the datasets included in our Nesstar catalogue.
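To give a sense of what variable-level XML documentation looks like, the sketch below builds a simplified fragment with Python's standard `xml.etree.ElementTree` module. The element names (`codeBook`, `dataDscr`, `var`, `labl`, `qstn`, `qstnLit`, `catgry`, `catValu`) follow DDI Codebook conventions, but the fragment omits namespaces and required study-level elements and has not been validated against the DDI schema. The question text and the function name `variable_element` are hypothetical.

```python
import xml.etree.ElementTree as ET


def variable_element(name, label, question, categories):
    """Build a simplified, DDI Codebook-style <var> element (not validated)."""
    var = ET.Element("var", {"name": name})
    ET.SubElement(var, "labl").text = label
    qstn = ET.SubElement(var, "qstn")
    ET.SubElement(qstn, "qstnLit").text = question
    for code, text in categories.items():
        catgry = ET.SubElement(var, "catgry")
        ET.SubElement(catgry, "catValu").text = str(code)
        ET.SubElement(catgry, "labl").text = text
    return var


code_book = ET.Element("codeBook")
data_dscr = ET.SubElement(code_book, "dataDscr")
data_dscr.append(
    variable_element(
        "p1sex",
        "sex of respondent",
        "What is your sex?",  # hypothetical question text, not from the source
        {1: "female", 2: "male", 8: "don't know", 9: "not answered"},
    )
)

# Serialise the fragment to a string for inspection or saving.
xml_text = ET.tostring(code_book, encoding="unicode")

if __name__ == "__main__":
    print(xml_text)
```

In practice, a DDI-specific authoring tool would produce a complete, schema-valid document; the fragment only shows how labels, question text and codes sit together in one structure.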