Digitising data

Non-digital data can be converted to digital form in a variety of ways, depending on their format and condition. Information can be entered manually by keyboard into a text or database template. Images can be captured with a document scanner or by digital photography, and text can be digitised from image scans using optical character recognition (OCR).

Digitising audio-visual material

Images can be scanned or photographed and saved in TIFF format. Audio is best digitised to WAV format, and video to MPEG or Motion JPEG 2000 format.

Digitising text

Textual data can be digitised to different levels depending on the quality of the writing or typeface.


Text can be scanned as an image and saved as a TIFF file. This is the best method for information in poor typeface, legible handwritten text, or text with multiple tables and graphs. If information needs to be anonymised, it should be blacked out with a black marker on a copy (preferably not the original) before scanning. Precious materials should be photocopied before being fed into a multi-feed scanner, in case they get damaged. If a digital camera is used to capture images, it should have sufficient resolution, measured in megapixels. The camera should be secured on a horizontal mount, with good overhead or dedicated lighting. Camera images should be transferred to safe media and the files kept well organised.
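As a rough guide to what "sufficient resolution" means for a camera, the sketch below estimates the megapixels needed to capture a page at a chosen density. The function name and the 300 pixels-per-inch target are illustrative assumptions, not figures from this guidance, and the result is a lower bound because a page rarely fills the camera frame exactly.

```python
import math

MM_PER_INCH = 25.4


def required_megapixels(width_mm: float, height_mm: float, ppi: int = 300) -> float:
    """Estimate the megapixels needed to capture a page at a given density.

    The 300 ppi default is an assumed target, not a stated requirement.
    """
    # Convert each page dimension to pixels, rounding up to whole pixels.
    width_px = math.ceil(width_mm / MM_PER_INCH * ppi)
    height_px = math.ceil(height_mm / MM_PER_INCH * ppi)
    return width_px * height_px / 1_000_000


# Example: an A4 page (210 mm x 297 mm) at the assumed 300 ppi target.
a4 = required_megapixels(210, 297, ppi=300)
print(f"A4 at 300 ppi needs at least {a4:.1f} megapixels")
```

In practice some margin should be allowed on top of this estimate, since the page will not align perfectly with the sensor.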

Searchable PDF/A

If the original document has multiple pages, the resulting scanned TIFF files can be collated into a searchable PDF/A file using 'Paper Capture' in Adobe Acrobat. The degree of searchability is much lower than that of a rich text version, but it is achieved at a much lower cost in time and labour.

Bookmarked PDF/A

The PDF/A file can be bookmarked by contents page and headings. This creates a series of embedded links within the document, which makes long documents much easier to navigate.

Rich text version via OCR

Text in a good typeface can be scanned as an image file and then processed with optical character recognition (OCR) software. Some training of the system may be required to enable it to recognise non-standard words, such as technical terminology. The resulting OCR text must be checked and proofread against the original source, as errors do occur; this can be rather time-consuming. The resulting file can be formatted and saved as a word-processed file.
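Part of the proofing step can be supported by a simple comparison. The sketch below assumes that a short passage from the original has been keyed by hand to serve as a reference, and lists the words where the OCR output departs from it. The function name and sample strings are hypothetical; it uses only Python's standard difflib module and does not replace proofreading the full text.

```python
import difflib


def ocr_differences(reference: str, ocr_text: str) -> list[tuple[str, str]]:
    """Return (reference, ocr) word pairs where the OCR output differs."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Keep every non-matching span: replacements, insertions and deletions.
        if tag != "equal":
            diffs.append((" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2])))
    return diffs


# Hypothetical sample: a hand-keyed reference and a typical OCR misreading.
reference = "The modern archive holds 12 interview transcripts"
ocr_text = "The rnodern archive holds l2 interview transcripts"
for expected, found in ocr_differences(reference, ocr_text):
    print(f"expected {expected!r}, OCR gave {found!r}")
```

A high rate of differences in the sample suggests the typeface may be too poor for OCR, and that scanning as an image or manual transcription may be the better route.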


Text can be manually transcribed by keyboard from the original source. In this process the new version should be kept as close to the original as possible; if changes are made, such as correcting typos, they should be indicated in brackets.
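The convention of marking changes in brackets can also be applied consistently by script. The following is a minimal sketch: the function name, the use of square brackets and the sample sentence are assumptions made for illustration, since the guidance only asks that changes be indicated in brackets.

```python
import re


def correct_with_brackets(text: str, corrections: dict[str, str]) -> str:
    """Replace each whole-word typo with its correction in square brackets."""
    for typo, fix in corrections.items():
        pattern = r"\b" + re.escape(typo) + r"\b"
        # A function replacement avoids any special handling of backslashes.
        text = re.sub(pattern, lambda match, fix=fix: f"[{fix}]", text)
    return text


# Hypothetical transcript line containing two typos from the original source.
transcript = "We recieved the questionaire on Monday."
corrected = correct_with_brackets(
    transcript, {"recieved": "received", "questionaire": "questionnaire"}
)
print(corrected)
```

Keeping the list of corrections alongside the transcript gives a record of every departure from the original.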

Preparing paper for scanning

In preparation for scanning, papers can be photocopied, both to protect the originals and to improve readability by adjusting the tone or brightness of poor-quality items. Scanning may involve several passes, so it is best not done on the paper originals.
