Anonymisation is a valuable tool that allows data to be shared whilst preserving privacy. Anonymising data means changing identifiers in some way, for example by removing, substituting, distorting, generalising or aggregating them.
A person's identity can be disclosed from:
You decide which information to keep for data to be useful and which to change. Removing key variables, applying pseudonyms, generalising and removing contextual information from textual files, and blurring image or video data could result in important details being missed or incorrect inferences being made. See example 1 and example 2 for balancing anonymisation with keeping data useful for qualitative and quantitative data.
Anonymising research data is best planned early in the research to help reduce anonymisation costs, and should be considered alongside obtaining informed consent for data sharing or imposing access restrictions. Personal data should never be disclosed from research information, unless a participant has given consent to do so, ideally in writing.
When anonymising qualitative data such as transcribed interviews, textual or audio-visual data, pseudonyms or generic descriptors should be used to edit identifying information, rather than blanking-out information.
Consider the level of anonymity required to meet the needs agreed during the informed consent process. Agreeing with participants in advance, during the consent process, what may and may not be recorded or transcribed is often a more effective way than later editing of creating data that accurately represent the research process and the contribution of participants. For example, if an employer's name cannot be disclosed, it should be agreed in advance that it will not be mentioned during an interview. This is easier than spending time later removing it from a recording or transcript.
Best practices for anonymising text
Our text anonymisation helper tool can help you find disclosive information to remove or pseudonymise in qualitative data files. The tool does not anonymise or make changes to data, but uses MS Word macros to find and highlight numbers and words starting with capital letters in text. Numbers and capitalised words are often disclosive, e.g. as names, companies, birth dates, addresses, educational institutions and countries.
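The idea behind such a helper can be sketched in a few lines of Python. This is not the MS Word macro itself, only an illustration of the same approach under stated assumptions: the names `find_candidates` and `highlight` and the `>>…<<` markers are hypothetical, and the pattern only recognises ASCII capital letters. Like the tool, it flags candidates for a human to review and changes nothing else, so words that are merely capitalised at the start of a sentence will also be flagged.

```python
import re

# A candidate is either a number (digits, optionally joined by / . , : -
# as in dates or times) or a word that starts with an ASCII capital letter.
CANDIDATE = re.compile(r"\b(?:\d+(?:[/.,:-]\d+)*|[A-Z]\w*)")


def find_candidates(text):
    """Return (offset, matched text) pairs for every candidate in the text."""
    return [(m.start(), m.group()) for m in CANDIDATE.finditer(text)]


def highlight(text, open_mark=">>", close_mark="<<"):
    """Wrap every candidate in visible markers; the words are left unchanged."""
    return CANDIDATE.sub(
        lambda m: f"{open_mark}{m.group()}{close_mark}", text
    )


if __name__ == "__main__":
    sample = "My first workplace was Arronal, about 20 minutes from Norwich."
    print(highlight(sample))
```

The output still needs a person to decide which highlighted items are actually disclosive and how each should be treated.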
Transcript anonymisation example
In an interview transcript a person's name is replaced with a pseudonym or with a tag that typifies the person [farmer Bob, paternal grandmother, council employee]. This is also done when reference is made to other identifiable people. An exact geographical location may be replaced with a meaningful descriptive term that typifies the location [southern part of town, near the local river, a moorland farm, his native village]. See this example with markup.
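Once the replacements have been decided, applying them consistently across a transcript can be scripted. The following is a minimal sketch, assuming a hand-made mapping from each identifying term to its pseudonym or descriptor; the function name `pseudonymise` and the mapping entries are hypothetical, and the whole-word matching assumes each term begins and ends with a letter or digit.

```python
import re


def pseudonymise(text, replacements):
    """Replace each identifying term with its bracketed pseudonym or descriptor."""
    if not replacements:
        return text
    # Try longer terms first so that a full name wins over a first name alone.
    terms = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
    )
    return pattern.sub(lambda m: f"[{replacements[m.group()]}]", text)


if __name__ == "__main__":
    # Hypothetical mapping agreed for one transcript.
    mapping = {
        "Arronal": "first employer",
        "Norwich": "home city",
        "Julie": "friend Anna",
    }
    line = "So my first workplace was Arronal which was about 20 minutes from my home in Norwich."
    print(pseudonymise(line, mapping))
```

Keeping the mapping in one place means the same person or place always receives the same replacement, which preserves the relationships in the text. The mapping itself is identifying and should be stored separately from the anonymised files.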
Examples of 'over' and 'under' anonymisation
Original: So my first workplace was Arronal which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.
Example A, too heavy: So my first workplace was X which was about X minutes from my home in X. My best colleagues from day one were X, X and X and in fact, I am still very good friends with X to this day. X lives in the same parish still with her husband X and their X X.
Example B, too light: So my first workplace was [name] which was about 20 minutes from my home in Norwich. My best colleagues from day one were Andy, Julie and Louise and in fact, I am still very good friends with Julie to this day. She lives in the same parish still with her husband Owen and their son Ryan.
Anonymising audio-visual data
Anonymisation of audio-visual data, such as editing of digital images or audio recordings, should be done sensitively. Bleeping out real names or place names is acceptable, but disguising voices by altering the pitch in a recording, or obscuring faces by pixellating sections of a video image, significantly reduces the usefulness of data. These processes are also highly labour-intensive and expensive.
If confidentiality of audio-visual data is an issue, it is better to obtain the participant's consent to use and share the data unaltered. Where anonymisation would result in too much loss of data content, regulating access to the data may be a better strategy.
We urge researchers to consider and judge at an early stage the implications of depositing materials containing confidential information and to get in touch to consult on any potential issues.