Recommended formats

File formats recommended by the UK Data Service

The table contains guidance on file formats recommended and accepted by the UK Data Service for data sharing, reuse and preservation.

You may need to convert your data files to a preservation file format.

We welcome queries from researchers about appropriate file formats for working and preservation, particularly early in the research process. If you are unsure of the suitability of your file formats for the data you want to deposit with the UK Data Service, please get in touch.
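
As a rough self-check before getting in touch, a folder of files can be screened against the extensions listed in the table below. The following is a minimal sketch, not a UK Data Service tool: the names `LISTED_EXTENSIONS` and `screen_files` are hypothetical, only the audio and video rows are transcribed, and an extension match says nothing about whether the file content is actually valid in that format.

```python
from pathlib import Path

# Extensions transcribed from the audio and video rows of the table below.
# The mapping name and structure are illustrative assumptions.
LISTED_EXTENSIONS = {
    "audio": {".flac", ".mp3", ".aif", ".wav"},
    "video": {".mp4", ".ogv", ".ogg", ".mj2", ".mov", ".wmv", ".webm"},
}


def screen_files(filenames, data_type):
    """Split filenames into those whose extension is listed for data_type and the rest."""
    listed = LISTED_EXTENSIONS[data_type]
    ok, unlisted = [], []
    for name in filenames:
        # Compare case-insensitively so "clip.WAV" matches ".wav".
        suffix = Path(name).suffix.lower()
        (ok if suffix in listed else unlisted).append(name)
    return ok, unlisted


ok, unlisted = screen_files(
    ["interview01.WAV", "interview02.flac", "notes.m4a"], "audio"
)
print("listed:", ok)
print("not listed:", unlisted)
```

Files that end up in the second list are candidates for conversion or for a query about suitability.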

Type of data
Recommended formats
Other acceptable formats
Quantitative tabular data with extensive metadata.

A dataset with variable labels, code labels, and defined missing values, in addition to the matrix of data.

Proprietary formats of statistical packages e.g. SPSS (.sav), Stata (.dta), .sas7bdat.

Delimited text and command (‘setup’) file (SPSS, Stata, SAS, etc.) containing metadata information.

Structured text or mark-up file containing metadata, e.g. DDI XML file.

SPSS portable format (.por).

MS Access (.mdb/.accdb).

Quantitative tabular data with minimal metadata.

A matrix of data with or without column headings or variable names, but no other metadata or labelling.

Comma-separated values (CSV) file (.csv).

Tab-delimited file (.tab).

Including delimited text of given character set with SQL data definition statements where appropriate.

Delimited text of given character set – only characters not present in the data may be used as delimiters (.txt).

Widely-used formats: MS Excel (.xls/.xlsx), MS Access (.mdb/.accdb), OpenDocument Spreadsheet (.ods).

Geospatial data.

Vector and raster data.

ESRI Shapefile (essential – .shp, .shx, .dbf, optional – .prj, .sbx, .sbn).

Geo-referenced TIFF (.tif, .tfw).

CAD data (.dwg).

Tabular GIS attribute data.

ESRI Geodatabase format (.mdb).

MapInfo Interchange Format (.mif) for vector data.

Keyhole Mark-up Language (.kml).

Adobe Illustrator (.ai), CAD data (.dxf or .svg).

Binary formats of GIS and CAD packages.

Qualitative data.

Textual.

eXtensible Mark-up Language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml).

Rich Text Format (.rtf).

Plain text data, ASCII (.txt).

Hypertext Mark-up Language (.html).

Widely-used formats: MS Word (.doc/.docx).

Some software-specific formats: NUD*IST, NVivo and ATLAS.ti.

Digital image data.

TIFF version 6 uncompressed (.tif).

Digital Imaging and Communications in Medicine (DICOM) (.dcm, .dcm30) – for CT/MRI data.

JPEG (.jpeg, .jpg) but only if created in this format.

TIFF (other versions) (.tif, .tiff).

Adobe Portable Document Format (PDF/A, PDF) (.pdf).

Standard applicable RAW image format (.raw).

Photoshop files (.psd).

BMP (.bmp) but only if created in this format.

PNG (.png) but only if created in this format.

Digital audio data.

Free Lossless Audio Codec (FLAC) (.flac).

MPEG-1 Audio Layer 3 (.mp3) but only if the original was created in this format.

Audio Interchange File Format (.aif).

Waveform Audio Format (.wav).

Digital video data.

MPEG-4 (.mp4).

OGG video (.ogv, .ogg).

Motion JPEG 2000 (.mj2).

MOV (.mov).

Windows Media Video (WMV) (.wmv).

WebM (.webm).

Documentation and scripts.

Rich Text Format (.rtf).

PDF/A or PDF (.pdf).

HTML (.htm).

OpenDocument Text (.odt).

R Markdown files (.rmd) (with HTML version as well).

Plain text (.txt).

Widely-used proprietary formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx).

XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0.
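
The rule given above for delimited text, that only characters not present in the data may be used as delimiters, can be checked programmatically before export. The sketch below is one possible approach using Python's standard `csv` module. The candidate list and the function names `pick_safe_delimiter` and `write_delimited` are assumptions made for illustration, and the check covers delimiters only, so it assumes cell values contain no line breaks or double quotes.

```python
import csv
import io

# Candidate delimiters in order of preference (an illustrative choice).
CANDIDATES = [",", "\t", ";", "|"]


def pick_safe_delimiter(rows, candidates=CANDIDATES):
    """Return the first candidate that appears in no cell value, or None."""
    for delim in candidates:
        if not any(delim in str(cell) for row in rows for cell in row):
            return delim
    return None


def write_delimited(rows, delimiter):
    """Serialise rows as delimited text with no quoting or escaping."""
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quoting=csv.QUOTE_NONE,  # raises csv.Error if a cell would need escaping
        lineterminator="\n",
    )
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    ["id", "town", "income"],
    [1, "Colchester, Essex", "21,500"],
    [2, "Leeds", "18,250"],
]

delimiter = pick_safe_delimiter(rows)  # the comma occurs in the data, so it is skipped
if delimiter is None:
    raise ValueError("every candidate delimiter occurs in the data")
text = write_delimited(rows, delimiter)
print(repr(delimiter))
print(text)
```

Because the sample values contain commas, the sketch falls back to the tab character, which corresponds to the tab-delimited (.tab) option in the table. Writing with `csv.QUOTE_NONE` is a deliberate choice here: the export fails loudly rather than silently quoting a value that clashes with the delimiter.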