Study-level documentation provides high-level information on the research context and design, the data collection methods used, any data preparation and manipulation, and summaries of findings based on the data. This documentation is key to enabling secondary users to make informed use of the data.
Good study-level data documentation includes information on:
- research design and context of data collection: project history, aims, objectives, hypotheses, investigators and funders
- data collection methods: data collection protocols, sampling design, sample structure and representation, work flows, instruments used, hardware and software used, data scale and resolution, temporal coverage and geographic coverage, and digitisation or transcription methods used
- structure of data files, with number of cases, records, files and variables, as well as any relationships among such items
- secondary data sources used and provenance, for example, for transcribed or derived data
- data validation, checking, proofing, cleaning and other quality assurance procedures carried out, such as checking for equipment and transcription errors, calibration procedures, data capture resolution and repetitions, or editing, proofing or quality control of materials
- modifications made to data over time since their original creation and identification of different versions of datasets
- for time series or longitudinal surveys, changes made to methodology, variable content, question text, variable labelling, measurements or sampling, and how panels were managed over time and between waves
- information on data confidentiality, access and any applicable conditions of use
- publications, presentations and other research outputs that explain or draw on the data
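As an illustration of the file-structure item in the list above, the number of cases and the variables in a delimited data file can be summarised with a few lines of code. This is a minimal sketch only: it assumes the data are held as CSV with one header row, and the function name `summarise_structure`, the file name `survey_wave1.csv` and the sample values are hypothetical.

```python
import csv
import io


def summarise_structure(name, text):
    """Return the file name, number of cases and variable names for CSV text.

    Assumes the first row is a header row and every later non-empty row is one case.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)                    # first row holds the variable names
    cases = sum(1 for row in reader if row)  # count non-empty data rows
    return {"file": name, "cases": cases, "variables": header}


# Hypothetical sample data, used here in place of a real data file
sample = "id,age,region\n1,34,North\n2,51,South\n3,28,North\n"
summary = summarise_structure("survey_wave1.csv", sample)
print(summary)
```

A summary of this kind could be copied into a user guide alongside a description of how the files relate to one another, which the code does not attempt to capture.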
Data documentation can be found in reports to funders, technical reports, working papers, lab books or publications. Original questionnaires, interviewer instructions, interview topic guides and experimental protocols are also important data documentation.
Example of online study-level documentation for a data collection in our data catalogue, including variables list, research instructions, questionnaires, coding frames and a user guide for the data.