How should I format my data?

File formats

The hardware and software used to create and interact with your research data may become obsolete, so to ensure long-term accessibility you may need to convert data from the original format into standard formats that are suitable for preservation. All digital information is designed to be interpreted by computer programs that make it understandable, and is, by its nature, software-dependent.

Although many software packages are backward compatible and can import data created in previous versions, the safest way to guarantee long-term data access is to convert data to standard formats that most software can interpret.

Formats more likely to be accessible in the future are:

  • non-proprietary
  • based on an open, documented standard
  • in common use by the research community
  • using a standard representation (e.g. ASCII, Unicode)
  • unencrypted
  • uncompressed.
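
One of the criteria above, standard representation, can be checked for text files with a few lines of code. The following is a minimal sketch, assuming a Python environment; the function name detect_text_encoding is hypothetical, and the check only confirms that the bytes decode cleanly as ASCII or UTF-8, not that the content is meaningful.

```python
from typing import Optional


def detect_text_encoding(raw: bytes) -> Optional[str]:
    """Return 'ascii' or 'utf-8' if the bytes decode cleanly, else None.

    Hypothetical helper for illustration: it tries the stricter encoding
    first, so pure ASCII content is reported as 'ascii'.
    """
    for encoding in ("ascii", "utf-8"):
        try:
            raw.decode(encoding)
            return encoding
        except UnicodeDecodeError:
            continue
    return None


# Example: check some sample byte strings.
samples = [b"plain text", "caf\u00e9".encode("utf-8"), b"\xff\xfe\x00"]
for sample in samples:
    print(sample, "->", detect_text_encoding(sample))
```

A result of None suggests the file uses another encoding or is not plain text, and may need converting before deposit.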

Here are some recommended file formats for different types of data:

Type of Data | Recommended File Formats for Sharing, Reuse and Preservation
Quantitative tabular data with extensive metadata

A dataset with variable labels, code labels and defined missing values, in addition to the matrix of data

  • SPSS portable format (.por)
  • SPSS, Stata, SAS, etc.
  • Some structured text or markup file containing metadata information (e.g. DDI XML file)

Quantitative tabular data with minimal metadata

A matrix of data with or without column headings or variable names, but no other metadata or labelling

  • .csv
  • .tab

Geospatial data

Vector and raster data

  • ESRI Shapefile (essential: .shp, .shx, .dbf; optional: .prj, .sbx, .sbn)
  • Geo-referenced TIFF (.tif, .tfw)
  • CAD data (.dwg)

Qualitative data

Textual

  • eXtensible Mark-up language (XML) text according to an appropriate Document Type Definition (DTD) or schema (.xml)
  • Rich Text Format (.rtf)
  • Plain text data, ASCII (.txt)

Digital image data

  • .tif

Digital audio data

  • .flac

Digital video data

  • .mp4
  • .jp2

Documentation

  • .rtf
  • .pdf
  • .odt

Although Microsoft Office programs are proprietary, they are in widespread use and are provided by default by the University, so it is expected that some files will be in Microsoft formats.


Converting data

When you offer data to a data archive for preservation, you should convert the data to a preferred preservation format yourself; you know your data best, so you are in the best position to ensure data integrity during conversion.

When data are converted from one format to another through export or by data translation software, certain changes may occur to the data. After conversion, check the data for errors or changes:

  • for data held in statistical packages, spreadsheets or databases, some data or metadata (such as value definitions, decimal numbers or formulae) may be lost, or data may be truncated; and

  • for textual data, editing such as highlighting or headers/footers may be lost.
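
For tabular data, a check of this kind can be automated by reading the converted file back and comparing it with the original. The following is a minimal sketch, assuming the data are available in Python as a list of records and the target format is CSV; the function names, field names and sample values are all hypothetical. It catches changed row counts, truncated text and lost decimal places, but not lost labels or formulae, which still need checking by hand.

```python
import csv
import io


def write_csv(rows, fieldnames):
    """Export records to CSV text (the 'converted' copy)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def compare_with_original(original_rows, csv_text, fieldnames):
    """Re-read the converted CSV and list every difference from the original."""
    converted_rows = list(csv.DictReader(io.StringIO(csv_text)))
    problems = []
    if len(converted_rows) != len(original_rows):
        problems.append(
            f"row count: {len(original_rows)} -> {len(converted_rows)}"
        )
    for index, (before, after) in enumerate(zip(original_rows, converted_rows)):
        for name in fieldnames:
            # CSV stores everything as text, so compare text forms.
            if str(before[name]) != after[name]:
                problems.append(
                    f"row {index}, {name}: {before[name]!r} -> {after[name]!r}"
                )
    return problems


# Hypothetical sample data.
fieldnames = ["id", "height_m", "comment"]
original = [
    {"id": 1, "height_m": 1.8234, "comment": "measured twice, second reading kept"},
    {"id": 2, "height_m": 1.6, "comment": ""},
]

# A faithful export: no differences are expected.
clean_copy = write_csv(original, fieldnames)
print(compare_with_original(original, clean_copy, fieldnames))

# Simulate a lossy export that rounds decimals and truncates text.
lossy_rows = [
    dict(row, height_m=round(row["height_m"], 1), comment=row["comment"][:10])
    for row in original
]
lossy_copy = write_csv(lossy_rows, fieldnames)
for problem in compare_with_original(original, lossy_copy, fieldnames):
    print(problem)
```

Comparing text forms is a deliberate simplification here: it flags any change in how a value is written, which is the cautious choice when the aim is to confirm that nothing was altered.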


Contact

Research Data team

University of St Andrews Library
North Street
St Andrews
Fife
KY16 9TR
Scotland, United Kingdom

Tel: (01334) 462343