
Documentation and metadata

In this section:

  1. Documentation
  2. Metadata

Documenting data and annotating it with metadata is crucial for making data accessible, understandable and usable. It supports the preservation, sharing and reuse of data, whether by yourself or by others in the future, and makes it easier for your data to be cited and your effort to be acknowledged.

Documentation and metadata can be provided at three main levels:

  1. Study: high-level information about the study, its context and design, methods, required data preparations and manipulations and its main findings.
  2. Dataset: information about the content, context and origin of a dataset, including its purpose, temporal characteristics, geographic location, authorship, access conditions and terms of use.
  3. Data: information about the variables in a spreadsheet or database, or about individual data objects such as interview transcripts or pictures.

Documentation

Documentation about data should be provided at both the study and the data level.

Supporting documentation

Supporting information can often be found in laboratory notebooks, questionnaires and interview guides, final reports or catalogue metadata. It typically provides context for the data, instructions for its use and reuse, and information about any potential restrictions. Supporting information is often used to provide study-level documentation (see, for example, the documentation provided for the "Health Survey for England, 2010" at the UK Data Service).

Supporting information files are also especially well suited to qualitative data, where they can assist with the anonymisation process. One example is a data list, which summarises the items within a data collection, assigns each item a unique identifier and records its biographical characteristics or main features. For interviews, a data list could include:

  • Interview ID,
  • age,
  • gender,
  • occupation,
  • location,
  • interview place,
  • interview date,
  • transcript file name,
  • recording file names.
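As an illustration, a data list for interviews could be kept as a simple CSV file with one row per interview. The Python sketch below writes such a list using the fields above; the column names, the `write_data_list` helper and the sample row are hypothetical, and a real data list would follow your project's own conventions.

```python
import csv
import io

# Columns mirror the interview fields listed above (names are illustrative).
FIELDS = [
    "interview_id", "age", "gender", "occupation", "location",
    "interview_place", "interview_date", "transcript_file", "recording_files",
]

def write_data_list(rows, stream):
    """Write one row per interview; reject rows with missing fields."""
    writer = csv.DictWriter(stream, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        missing = set(FIELDS) - set(row)
        if missing:
            raise ValueError(f"row {row.get('interview_id')} lacks {sorted(missing)}")
        # DictWriter also raises ValueError for keys not in FIELDS (extrasaction="raise").
        writer.writerow(row)

# Hypothetical entry; real values come from your own data collection.
rows = [{
    "interview_id": "INT001", "age": 34, "gender": "female",
    "occupation": "teacher", "location": "Region A",
    "interview_place": "participant's home", "interview_date": "2023-05-12",
    "transcript_file": "INT001_transcript.docx",
    "recording_files": "INT001_part1.wav;INT001_part2.wav",
}]

buffer = io.StringIO()
write_data_list(rows, buffer)
print(buffer.getvalue())
```

Because every interview is identified by its ID, the same identifier can be used in transcript and recording file names, which keeps the data list and the files in step.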

Embedded documentation

Embedded documentation is best suited to data-level documentation. It is most common for quantitative (tabular) data, but it can also take the form of a separate file (e.g. a README text file) inside a file archive containing a dataset.

Examples of embedded metadata include

  • headers/variable names,
  • units,
  • field labels,
  • value codes,
  • reference to external classification schemes,
  • instructions on how derived variables have been created.
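A minimal sketch of how such data-level documentation might accompany a tabular dataset: a codebook records each variable's label, unit, value codes, any external classification scheme, and how derived variables were calculated, and is rendered as plain text for a README file. The variable names, codes, the codebook layout and the `render_readme` helper are all hypothetical; this is not a formal standard.

```python
# Hypothetical codebook for a small survey table.
CODEBOOK = {
    "resp_id": {"label": "Respondent identifier"},
    "height_cm": {"label": "Measured height", "unit": "cm"},
    "weight_kg": {"label": "Measured weight", "unit": "kg"},
    "smoker": {"label": "Current smoker", "codes": {1: "Yes", 2: "No", -9: "Refused"}},
    "occupation": {"label": "Occupation", "scheme": "ISCO-08 (4-digit codes)"},
    "bmi": {
        "label": "Body mass index",
        "unit": "kg/m^2",
        "derived": "weight_kg / (height_cm / 100) ** 2, rounded to 1 decimal place",
    },
}

def render_readme(codebook):
    """Render the codebook as plain-text lines suitable for a README file."""
    lines = ["Variables", "========="]
    for name, info in codebook.items():
        unit = f" [{info['unit']}]" if "unit" in info else ""
        lines.append(f"{name}: {info['label']}{unit}")
        for code, meaning in info.get("codes", {}).items():
            lines.append(f"    {code} = {meaning}")
        if "scheme" in info:
            lines.append(f"    Classification: {info['scheme']}")
        if "derived" in info:
            lines.append(f"    Derived as: {info['derived']}")
    return "\n".join(lines)

print(render_readme(CODEBOOK))
```

Keeping the codebook as structured data rather than free text means the same source can later be exported to a more formal metadata format.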

For qualitative data, embedded metadata may take the form of headers in interview transcripts. Some metadata can also be embedded in a file's document properties (e.g. in Windows), and more extensive, structured metadata can be created using formats such as XML (see the UK Data Archive for an example extract from a UK Data Service DDI catalogue record in XML format).
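To illustrate what structured XML metadata can look like, the following Python sketch uses the standard library's `xml.etree.ElementTree` to build a small record from Dublin Core elements (namespace `http://purl.org/dc/elements/1.1/`). This is a simplified assumption for illustration, not a DDI record: DDI uses its own, much richer schema. The `<record>` wrapper element, the `build_record` helper and all field values are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialise elements with the "dc:" prefix

def build_record(fields):
    """Build a <record> element with one Dublin Core child per (name, value) pair."""
    record = ET.Element("record")  # wrapper element name is an assumption
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record

record = build_record([
    ("title", "Example interview study, 2023"),
    ("creator", "Example Research Group"),
    ("date", "2023"),
    ("coverage", "United Kingdom"),
    ("rights", "Available under a data access agreement"),
])

ET.indent(record)  # pretty-print in place; requires Python 3.9+
print(ET.tostring(record, encoding="unicode"))
```

Because the output is well-formed XML, it can be parsed back and queried by machines, which is what makes such metadata searchable.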


Metadata

Metadata (i.e. data about data) is standardised, structured and searchable information about a dataset or data file. It can describe the content of a data file, its context, origin and purpose, and its geographical and temporal settings. An example of such structured information about a UK Data Service record can be found on the UK Data Service website.

Although metadata should always provide a similar set of information, there is no one-size-fits-all template for the creation of metadata. Instead, metadata is typically created using a controlled vocabulary defined by metadata standards that exist for different disciplines, types of data and types of repositories.
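A minimal sketch of what using a controlled vocabulary means in practice: a metadata record is checked so that every field name belongs to an agreed set of terms and required fields are filled in. Here the vocabulary is assumed to be the 15 elements of the Dublin Core Metadata Element Set; the choice of required fields and the `check_record` helper are hypothetical, and a real repository would apply its own standard and rules.

```python
# The 15 elements of the Dublin Core Metadata Element Set (version 1.1).
DUBLIN_CORE = {
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
}
# Hypothetical repository rule: these fields must always be filled in.
REQUIRED = {"title", "creator", "date", "identifier"}

def check_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for field in sorted(set(record) - DUBLIN_CORE):
        problems.append(f"'{field}' is not in the controlled vocabulary")
    for field in sorted(REQUIRED):
        if not record.get(field):
            problems.append(f"required field '{field}' is missing or empty")
    return problems

record = {
    "title": "Example interview study, 2023",
    "creator": "Example Research Group",
    "date": "2023",
    "author": "J. Smith",  # not a Dublin Core element; "creator" is the agreed term
}
for problem in check_record(record):
    print(problem)
```

The record above would be flagged twice: once for the non-standard term "author" and once for the missing identifier. Agreeing on terms in this way is what allows records from different sources to be searched and compared consistently.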


The UK Data Service guidance on documenting data and the Digital Curation Centre provide additional guidance and examples on creating metadata at the study, dataset and data level, including data-level metadata for quantitative and qualitative data, discipline-specific metadata, use cases from existing data repositories, and tools which help to collect metadata in a standardised way.


Please don't hesitate to contact the RDM team for further information and advice.