Preparing data for deposit


"Guidance on preparing and managing data"

Whether depositing large-scale survey data in our curated collection or smaller research collections via our self-deposit system, ReShare, data creators should consult our guidance below on preparing data, ideally before fieldwork or data collection starts. In addition to the summary points noted here, we provide comprehensive best-practice guidance for individual researchers and research support staff on our Manage data web pages.

We run a programme of regular training workshops covering key areas of managing and sharing research data. Please also get in touch with us if you would like to discuss any of these issues further.

Data files

Preparing data files

Allow sufficient time during and towards the end of a project for these preparations. Build in quality control checks for your data capture and cleaning processes:

  • use consistent and meaningful file names that reflect the file content, avoiding spaces and special characters; if data are sensitive or restricted, indicate this in the file name
  • use meaningful and self-explanatory variable names, codes and abbreviations
  • ensure internal consistency checks are completed
  • ensure variable and value labels are complete and consistent for both questionnaire and derived variables
  • remove all your own temporary, administrative or dummy variables that were created for internal purposes and are of no use to researchers
  • ensure no repetition of variables, especially redundancy in derived variables
  • check that the level of detail included in the data is suitable for the agreed access arrangements and licensing
  • apply an appropriate level of anonymisation, e.g. anonymise serial numbers so that they cannot be linked to other sources, apply top coding, or remove cases
  • provide anonymised Primary Sampling Unit information if possible so that researchers can incorporate the sampling design into their analyses
  • check that any textual variables included are suitable for dissemination e.g. no disclosive information or internal comments in free-text variables
  • ensure consistent treatment and labelling of missing values
  • include weights as variables but do not apply them in the deposited data files
  • use our recommended file formats
  • check our recommended transcription format for qualitative textual data
  • if converting data across file formats, check that no data or internal metadata have been lost or changed
  • check whether copyright permission needs to be sought with regard to data ownership
  • finally, make sure that data are complete and try to ensure one deposit only, with any data issues resolved before deposit
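Some of the checks above can be automated. The following is a minimal sketch in Python, not an official tool: the function names are hypothetical, "special characters" is assumed to mean anything other than letters, digits, dots, underscores and hyphens, and data are assumed to be held as a list of dictionaries, one per case.

```python
import re
from collections import defaultdict

# Assumption: a "safe" file name uses only letters, digits, dots,
# underscores and hyphens (no spaces or other special characters).
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def unsafe_file_names(names):
    """Return the file names that contain spaces or special characters."""
    return [name for name in names if not SAFE_NAME.fullmatch(name)]


def duplicated_variables(rows):
    """Group variable names whose values are identical in every case.

    `rows` is a list of dicts mapping variable name to a hashable value.
    Each returned group is a candidate for redundancy and needs a
    manual check before any variable is removed.
    """
    columns = defaultdict(list)
    for row in rows:
        for name, value in row.items():
            columns[name].append(value)

    by_values = defaultdict(list)
    for name, values in columns.items():
        by_values[tuple(values)].append(name)

    return [group for group in by_values.values() if len(group) > 1]
```

For example, `unsafe_file_names(["survey_2020_wave1.csv", "survey 2020.csv"])` flags only the second name, because it contains a space. Identical values do not prove that a variable is redundant, so the output of `duplicated_variables` is best treated as a list of cases to review.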

Requirements for publishing in Nesstar

Nesstar is the UK Data Service’s online data browsing, analysis, subsetting and download tool that enables easy access to richly documented variables. It supports instant tabulation and graphing. Full question text, universe and routing information are displayed alongside variable names, code values and labels, and frequencies.

A selection of key data, typically from government departments, are made available through our Nesstar service. These require additional processing work to render them suitable for user-friendly online browsing, including:

  • variable and value labels must be clear and consistent, and must not be truncated
  • non-compliant characters, such as &, @ and <>, should be removed
  • question text should be made available in as structured a format as possible, e.g. XML or spreadsheet.
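The character clean-up can be scripted before deposit. This is a minimal sketch in Python, assuming that only the characters named above (&, @, < and >) are non-compliant; the full set for Nesstar may be larger, and the function name `clean_label` is hypothetical.

```python
import re

# Assumption: only the characters named in the guidance are non-compliant.
NON_COMPLIANT = re.compile(r"[&@<>]")


def clean_label(label):
    """Remove non-compliant characters, then tidy the leftover whitespace."""
    stripped = NON_COMPLIANT.sub("", label)
    # Removing a character can leave a double space behind; collapse it.
    return re.sub(r"\s+", " ", stripped).strip()


labels = {"inc1": "Income & benefits <net>", "cont": "Contact @ home"}
cleaned = {name: clean_label(text) for name, text in labels.items()}
# cleaned == {"inc1": "Income benefits net", "cont": "Contact home"}
```

Removing a character can change how a label reads ("Income & benefits" becomes "Income benefits"), so cleaned labels should still be reviewed by hand for clarity.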

Confidential data

Access and licensing
