Citing data

Data are a vital part of the research process, and citing them properly should be a standard feature of research publications.

Why cite data?

Data citation:

  • acknowledges the author's sources
  • makes identifying data easier
  • promotes the reproduction of research results
  • makes it easier to find data
  • allows the impact of data to be tracked
  • provides a structure which recognises and can reward data creators.

How do I cite datasets?

Citing data using persistent identifiers is rapidly growing in importance, especially given the increasing need to track citations as a way of understanding the impact of research.

Many journals and style manuals already use Digital Object Identifiers (DOIs) to identify publications, and this practice needs to be extended so that research data are identified in the same standard way.

Although requirements for data citation differ across disciplines, a generally recommended form for a data citation is:

  • Creator(s) (PublicationYear). Title. Publisher. Identifier

It may also be useful to include the optional properties Version and ResourceType, in which case the data citation generally takes the form:

  • Creator(s) (PublicationYear). Title. Version. Publisher. ResourceType. Identifier
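As an illustration of the element order above, the following Python sketch assembles a citation string from its parts, leaving out Version and ResourceType when they are not supplied. The function name `format_data_citation`, the semicolon between creators and the "Version" prefix (borrowed from the example citation on this page) are assumptions made for illustration, not part of any citation standard; always check the style your journal or repository requires.

```python
from typing import Optional, Sequence


def format_data_citation(
    creators: Sequence[str],
    year: int,
    title: str,
    publisher: str,
    identifier: str,
    version: Optional[str] = None,
    resource_type: Optional[str] = None,
) -> str:
    """Build a citation in the order
    Creator(s) (PublicationYear). Title. Version. Publisher. ResourceType. Identifier
    omitting the optional Version and ResourceType when they are not given.

    Hypothetical helper: the separators used here are illustrative assumptions.
    """
    if not creators:
        raise ValueError("at least one creator is required")
    # Assumption: multiple creators are separated by semicolons.
    parts = [f"{'; '.join(creators)} ({year})", title]
    if version:
        # Assumption: the version is prefixed with the word "Version".
        parts.append(f"Version {version}")
    parts.append(publisher)
    if resource_type:
        parts.append(resource_type)
    parts.append(identifier)
    # Each element is separated from the next by a full stop and a space.
    return ". ".join(parts)
```

Leaving out `version` and `resource_type` produces the shorter recommended form; passing them inserts those elements in the positions shown in the longer form.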

For citation purposes, it is recommended that DOI names are displayed as linkable, permanent URLs.
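A minimal Python sketch of turning a DOI name into a linkable URL, assuming the https://doi.org/ resolver and that DOIs may arrive as bare names (such as `10.1000/182`), as `doi:` strings or as older `dx.doi.org` links. The helper name `doi_to_url` and the list of prefixes it strips are illustrative choices, not an official API.

```python
from urllib.parse import quote

# Hypothetical list of prefixes that often precede a DOI name (matched case-insensitively).
_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a DOI name as a linkable https://doi.org/ URL (illustrative helper)."""
    name = doi.strip()
    lowered = name.lower()
    for prefix in _PREFIXES:
        if lowered.startswith(prefix):
            # Drop the prefix and any whitespace that followed it.
            name = name[len(prefix):].strip()
            break
    # All DOI names begin with the directory indicator "10."
    if not name.startswith("10."):
        raise ValueError(f"not a DOI name: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return "https://doi.org/" + quote(name, safe="/")
```

For instance, `doi_to_url("doi:10.1000/182")` returns `https://doi.org/10.1000/182`. Percent-encoding keeps characters such as `#` or `?`, which can occur in DOI names, from being read as part of the URL syntax.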

When the identifier is persistent, a data producer or researcher can remain confident that a citation will always lead to the original information about the object, even if the location of that object changes.

Below is a typical example of a data citation:

Bloggs, J.; Brander, S. (2014). Chemical and mineral compositions of sediments. Version SA9992c. UK Data Archive. [Computer file].

You should cite any existing datasets you use: consistent citation is what allows datasets to be regarded as legitimate academic outputs in their own right.

If you use data that a repository has released under an open license, you should cite them, even under a CC0 license, which does not legally require attribution; citing data is expected as a matter of good scholarly practice. Where a data paper describes the dataset, include a reference to that data paper, followed by a reference to the data in the repository itself. Citing the data paper also rewards the authors for sharing their data, because these citations can be tracked in the same way as citations to any other scholarly paper. For this to work, both citations must appear in the reference list of the article and include the DOIs (or whatever other identifier the repository uses).

Where can I find data?

There is a growing number of digital repositories, with varying content types (e.g. articles, datasets, images) and disciplinary foci. Most of these can be found in one of the following registries:


