Anonymisation

Anonymisation is a valuable tool that allows data to be shared whilst preserving privacy. Anonymising data means changing identifiers in some way: removing, substituting, distorting, generalising or aggregating them.

A person's identity can be disclosed from:

  • Direct identifiers such as names, postcode information or pictures
  • Indirect identifiers which, when linked with other available information, could identify someone, for example information on workplace, occupation, salary or age

You must decide which information to keep so that the data remain useful, and which to change. Removing key variables, applying pseudonyms, generalising and removing contextual information from textual files, and blurring image or video data could result in important details being missed or incorrect inferences being made. See example 1 and example 2 for balancing anonymisation with keeping data useful for qualitative and quantitative data.

Anonymising research data is best planned early in the research to help reduce anonymisation costs, and should be considered alongside obtaining informed consent for data sharing or imposing access restrictions. Personal data should never be disclosed from research information, unless a participant has given consent to do so, ideally in writing.

Qualitative data

When anonymising qualitative data such as transcribed interviews, textual or audio-visual data, use pseudonyms or generic descriptors to replace identifying information, rather than blanking it out.

Consider the level of anonymity required to meet the needs agreed during the informed consent process. Agreeing with participants during the consent process on what may and may not be recorded or transcribed is often a much more effective way of creating data that accurately represent the research process and the contribution of participants than editing the data afterwards. For example, if an employer's name cannot be disclosed, it should be agreed in advance that it will not be mentioned during an interview. This is easier than spending time later removing it from a recording or transcript.

Best practices for anonymising text

  • Do not collect disclosive data unless this is necessary, e.g. do not ask for full names if they cannot be used in the data
  • Plan anonymisation at the time of transcription or initial write-up (longitudinal studies may be an exception if relationships between waves of interviews need special attention for harmonised editing)
  • Use pseudonyms or replacements that are consistent within the research team and throughout the project, for example use the same pseudonyms in publications and follow-up research
  • Use 'search and replace' techniques carefully so that unintended changes are not made and misspelled words are not missed
  • Identify replacements in text clearly, for example with [brackets] or using XML tags such as <seg>word to be anonymised</seg>
  • Keep unedited versions of data for use within the research team and for preservation
  • Create an anonymisation log of all replacements, aggregations or removals made and store such a log separately from the anonymised data files
  • Consider redacting statements where there is an increased risk of harm or disclosure.
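
Several of the practices above (careful search and replace, marking replacements with [brackets], and keeping an anonymisation log) could be sketched in Python as follows. This is a minimal illustration only, not a tool from this guidance: the function name `pseudonymise`, the replacement table and the layout of the log are all assumptions made for the example.

```python
import re
from collections import Counter


def pseudonymise(text, replacements):
    """Replace whole-word occurrences of each key in `replacements`
    with its value in [brackets].

    Returns the edited text and a log of what was replaced. The log
    contains the original terms, so it should be stored separately
    from the anonymised files. (Hypothetical helper for illustration.)
    """
    if not replacements:
        return text, []

    # Longest terms first, so a full name wins over a shorter name inside it.
    terms = sorted(replacements, key=len, reverse=True)
    # \b limits matches to whole words: "Andy" will not match "Andyson".
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    )

    counts = Counter()

    def substitute(match):
        original = match.group(0)
        counts[original] += 1
        return f"[{replacements[original]}]"

    # A single pass, so text inserted by one replacement is never
    # itself matched by another.
    new_text = pattern.sub(substitute, text)

    log = [
        {"original": term, "replacement": replacements[term], "count": counts[term]}
        for term in replacements
    ]
    return new_text, log


if __name__ == "__main__":
    # Sample sentence and pseudonyms invented for this example.
    sample = "Andy met Julie in town. Andyson stayed at home."
    edited, log = pseudonymise(sample, {"Andy": "colleague 1", "Julie": "colleague 2"})
    print(edited)
    for entry in log:
        print(entry)
```

A count of zero in the log is a useful prompt to check for misspelled variants that an exact match would miss. The matching is case-sensitive and exact, so the output still needs reading through by a person.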

Our text anonymisation helper tool can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data, but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, e.g. as names, companies, birth dates, addresses, educational institutions and countries.
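
The same find-and-highlight idea can be sketched outside MS Word. The Python below is an independent illustration, not the macro tool itself and not a description of how it is implemented: the name `flag_candidates` and the exact pattern are assumptions. Like the tool, it changes nothing in the text; it only lists numbers and words starting with a capital letter for a person to review.

```python
import re

# A word starting with a capital letter, or a number that may contain
# internal separators (as in dates or times). Assumed pattern for
# illustration; it only recognises the ASCII capitals A-Z.
CANDIDATE = re.compile(r"\b(?:[A-Z]\w*|\d+(?:[./:-]\d+)*)")


def flag_candidates(text):
    """Return (position, matched text) pairs for possible disclosive items.

    Nothing is edited: the result is a review list only.
    """
    return [(match.start(), match.group(0)) for match in CANDIDATE.finditer(text)]


if __name__ == "__main__":
    # Sample sentence invented for this example.
    sample = "My home in Norwich is 20 minutes away."
    for position, item in flag_candidates(sample):
        print(position, item)
```

Words at the start of a sentence are flagged too, so the list will contain false positives, and disclosive details written in lower case will not appear in it at all.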

Transcript anonymisation example

In an interview transcript a person's name is replaced with a pseudonym or with a tag that typifies the person [farmer Bob, paternal grandmother, council employee]. This is also done when reference is made to other identifiable people. An exact geographical location may be replaced with a meaningful descriptive term that typifies the location [southern part of town, near the local river, a moorland farm, his native village]. See this example with markup.

Examples of 'over' and 'under' anonymisation

Original: So my first workplace was Arronal which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.

Example A, too heavy: So my first workplace was X which was about X minutes from my home in X. My best colleagues from day one were X, X and X and in fact, I am still very good friends with X to this day. X lives in the same parish still with her husband X and their X X. 

Example B, too light: So my first workplace was [name] which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.

Anonymising audio-visual data

Anonymisation of audio-visual data, such as editing of digital images or audio recordings, should be done sensitively. Bleeping out real names or place names is acceptable, but disguising voices by altering the pitch in a recording, or obscuring faces by pixellating sections of a video image, significantly reduces the usefulness of data. These processes are also highly labour-intensive and expensive.

If confidentiality of audio-visual data is an issue, it is better to obtain the participant's consent to use and share the data unaltered. Where anonymisation would result in too much loss of data content, regulating access to the data may be a better strategy.

We urge researchers to consider at an early stage the implications of depositing materials containing confidential information, and to get in touch with us to discuss any potential issues.


The add-on MS Word anonymisation macro tool for qualitative data is available from our tools collection.