Data citation and access statements
Data citation is important because it:
- acknowledges the author's sources,
- makes identifying data easier,
- promotes the reproduction of research results,
- makes it easier to find data,
- allows the impact of data to be tracked,
- provides a structure which recognises and rewards data creators.
It is good practice to cite any existing datasets you use. If you use data that a repository has released under an open licence, you are expected to cite it, even where the licence (such as CC0) does not legally require attribution, and should do so with a full citation as described at How to cite data. It is also good practice to give a full citation for datasets published in a data journal.
Citing the data paper also rewards the authors for sharing their data, because these citations can be tracked in the same way as for any scholarly paper. You should therefore include a reference to the data paper describing the data, followed by a reference to the data in the repository itself. For the citations to be tracked, they must appear in the references section of the article and include the DOI (or whichever other identifier the repository uses).
How to cite data
Although there might be differing requirements for data citation across disciplines, a general recommendation for a data citation is:
Creator (Publication year). Title. Publisher. Identifier
It may also be useful to include the optional properties version and resource type:
Creator (Publication year). Title. Version. Publisher. Resource type. Identifier
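As a rough illustration, the pattern above could be assembled from a dataset's metadata fields. This is only a minimal Python sketch: the `Dataset` class, its field names, the "; " separator between creators and the `format_data_citation` function are hypothetical and do not correspond to any repository's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Dataset:
    """Hypothetical container for the citation properties listed above."""
    creators: Sequence[str]
    year: int
    title: str
    publisher: str
    identifier: str                      # e.g. a DOI written as a full URL
    version: Optional[str] = None        # optional property
    resource_type: Optional[str] = None  # optional property


def format_data_citation(ds: Dataset) -> str:
    """Build 'Creator (Year). Title. Version. Publisher. Resource type. Identifier'."""
    creators = "; ".join(ds.creators)
    parts = [f"{creators} ({ds.year})", ds.title]
    if ds.version:
        parts.append(f"Version {ds.version}")
    parts.append(ds.publisher)
    if ds.resource_type:
        parts.append(ds.resource_type)
    # Strip trailing full stops so that joining with ". " never doubles them.
    body = ". ".join(part.rstrip(".") for part in parts)
    return f"{body}. {ds.identifier}"


# Usage with placeholder values (not a real dataset).
example = Dataset(
    creators=["Surname, A.", "Other, B."],
    year=2020,
    title="Example dataset title",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
    version="1.0",
    resource_type="Dataset",
)
print(format_data_citation(example))
```

Optional properties that are left as `None` are simply omitted, which yields the shorter form without version and resource type.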
Similar to publications, datasets should be cited using persistent identifiers such as digital object identifiers (DOIs), which, unlike standard web links, allow permanent linkage to the digital object (dataset). DOIs are coupled with metadata, which can be modified over time to keep track of the locations and characteristics of the objects they identify. This makes it easier to keep track of the objects and information about them in the long term and to easily automate processes such as sharing this information with other electronic systems, e.g. databases, websites etc.
It is recommended that DOIs are displayed as linkable URLs in the format 'https://doi.org/10...'. Below are typical examples of data citations:
Bloggs, J.; Brander, S. (2014). Chemical and mineral compositions of sediments. Version SA9992c. UK Data Archive. [Computer file]. https://doi.org/10.5255/UKDA-SN-6614-2
Wahl, P. et al. Discovery of a strain-stabilized smectic electronic order in LiFeAs (dataset). University of St Andrews Research Portal. https://doi.org/10.17630/c47ff360-09d4-4620-9ebf-bbb8792fb808 (2018)
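DOIs are often supplied in other forms, for example as a bare identifier, with a "doi:" prefix, or with an older resolver address. The following Python sketch shows one way to convert these to the recommended linkable format. The function name `doi_to_url` is hypothetical, and the pattern assumes that a DOI starts with "10." followed by four to nine digits, a slash and a suffix without spaces; unusual DOIs may need a looser check.

```python
import re

# Optional resolver or "doi:" prefix, then the DOI itself ("10.<digits>/<suffix>").
# The 4-9 digit range is an assumption that covers commonly issued prefixes.
_DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def doi_to_url(raw: str) -> str:
    """Return the DOI as a linkable 'https://doi.org/10...' URL."""
    match = _DOI_PATTERN.match(raw.strip())
    if match is None:
        raise ValueError(f"Not a recognisable DOI: {raw!r}")
    return f"https://doi.org/{match.group(1)}"


# Usage: all three inputs produce the same linkable URL.
for value in (
    "10.5255/UKDA-SN-6614-2",
    "doi:10.5255/UKDA-SN-6614-2",
    "https://doi.org/10.5255/UKDA-SN-6614-2",
):
    print(doi_to_url(value))
```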
Please note that journals might also have specific requirements regarding the citation of secondary data sources as well as datasets underpinning the publication. Authors should therefore also always check journals' citation guidelines.
Citing data underpinning a publication
If you are making available the data underpinning your publication, or other digital outputs created as a result of your study, you should also include a data accessibility statement in the publication. Many funders' research data policies now mandate a data accessibility statement, and many journals now include a section for it in their article templates.
We recommend placing the data accessibility statement either in the section provided in your journal's article template or as a sentence within the acknowledgements section, and also including the citation in the list of references as described above, e.g.:
"The research data [supporting/underpinning] this publication can be accessed at [DOI] [reference number]."
"The research data supporting this publication can be accessed at https://doi.org/10.17630/c47ff360-09d4-4620-9ebf-bbb8792fb808.
Wahl, P. et al. Discovery of a strain-stabilized smectic electronic order in LiFeAs (dataset). University of St Andrews Research Portal. https://doi.org/10.17630/c47ff360-09d4-4620-9ebf-bbb8792fb808 (2018)."
Please note that journals might have specific guidelines regarding the formatting and wording of data access statements and where they should be included. Authors should check these guidelines to ensure that their datasets are cited correctly.
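If statements are prepared for several publications, the recommended wording can be filled in programmatically. The Python sketch below is illustrative only: the function `data_access_statement` and its parameters are hypothetical, and the wording should still be checked against the journal's own guidelines.

```python
from typing import Optional


def data_access_statement(
    doi_url: str,
    wording: str = "supporting",
    reference_number: Optional[int] = None,
) -> str:
    """Fill in the recommended data access statement template."""
    if wording not in ("supporting", "underpinning"):
        raise ValueError("wording must be 'supporting' or 'underpinning'")
    statement = (
        f"The research data {wording} this publication can be accessed at {doi_url}"
    )
    # Point to the full citation in the reference list, if a number is given.
    if reference_number is not None:
        statement += f" [{reference_number}]"
    return statement + "."


# Usage with a placeholder DOI and reference number.
print(data_access_statement("https://doi.org/10.1234/example", reference_number=12))
```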
Data access statements for different scenarios
The above examples are well suited to cases where data has been deposited in our institutional repository, Pure, or in an external repository or archive. However, statements that describe how underpinning data, software or other digital outputs can be accessed (or why they cannot be accessed) should also be included in other scenarios.
Some use cases and examples:
- Data is provided as supplementary files on the journal website: "All data supporting this study is provided as supplementary information accompanying this paper."
- Data has been submitted to a discipline-specific archive (without a DOI): "All reads are publicly available and have been submitted to the European Nucleotide Archive with the ENA accession number PRJEB28455 at https://www.ebi.ac.uk/ena/data/view/PRJEB28455"
- More than one digital output exists: "Gene expression data are publicly available under GenBank accession number [ACC NUMBER] at [URL]. Computational code for data processing is available at the University of St Andrews Research Portal at [DOI]."
- Secondary data was used or re-analysed: "This study is based on the re-analysis of existing data, which is openly available and cited in the manuscript. Additional documentation on data processing and analysis is available at the University of St Andrews Research Portal at [DOI]."
- Sensitive data: "The raw data underpinning this study cannot be made publicly available due to ethical concerns. An anonymised dataset and further information about the data and conditions for access are available at the University of St Andrews Research Portal [DOI]."
- Commercial/Intellectual Property constraints:
  - Temporary embargo: "The research data underpinning this publication will be available from the University of St Andrews Research Portal at [DOI] following a [X]-month embargo to allow for commercialisation of the results."
  - Subject to non-disclosure agreements or approval: "The research data underpinning this publication can only be made available subject to [a non-disclosure agreement/approval by a data access committee]. Information about the data and conditions for access is available at the University of St Andrews Research Portal [DOI]."
- No new data has been produced (e.g. theoretical work, review articles): "No new data were created during the study."
Further examples of data access statements are included in guidance developed by the University of Bath.
Please don't hesitate to contact the RDM team for further information and advice.