Describing and documenting your data

Describing your research data and keeping the associated documentation up to date provides context for your data, tracks its provenance, and makes it easier for you to find and use in the long term, and for others to discover on the web.

The importance of documenting your data during the collection and analysis phases of your research cannot be overestimated, particularly if your research is going to be part of your scholarly record. Documentation is also increasingly required by funders as part of their policies on making data open to the wider community.

Describing and documenting your research data helps to ensure that it can be:

  • discovered by the right people
  • preserved
  • verified (with evidence of logical processes and methods)
  • reused in the short, medium, and long term
  • accessed in the long term

It also saves time when finding the data and helps researchers understand it.

Documentation

Data documentation explains how data were created or digitised, what data mean, what their content and structure are, and any manipulations that may have taken place. It ensures that data can be understood during research projects, that researchers continue to understand data in the longer term and that re-users of data are able to interpret the data. Good documentation is also vital for successful data preservation.

Good documentation for research data contains both study-level information about the research and data creation, and descriptions and annotations at the variable, data item or data file level. It includes all the contextual information needed to help a future user interpret the data properly, for example:

• information about when, why, and by whom the data were created
• what methods were used
• an explanation of acronyms, coding, or jargon.

It is good practice to begin documenting your data at the start of your research project and to continue to add information as the project progresses. You should also include procedures for documentation in your data planning activities.
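
Variable-level documentation is often kept as a data dictionary (codebook) stored alongside the data. The following is a minimal sketch in Python, assuming a hypothetical survey dataset: the variable names, codes, column headings and the helper functions `write_data_dictionary` and `undocumented_columns` are illustrative assumptions, not a prescribed format.

```python
import csv
import io

# Hypothetical variables for illustration only; replace with your own.
DATA_DICTIONARY = [
    {"variable": "resp_id", "description": "Anonymised respondent identifier",
     "type": "string", "units_or_codes": "n/a"},
    {"variable": "age", "description": "Age at time of interview",
     "type": "integer", "units_or_codes": "years"},
    {"variable": "emp_status", "description": "Employment status",
     "type": "categorical",
     "units_or_codes": "1=employed; 2=unemployed; 9=not stated"},
]

FIELDS = ["variable", "description", "type", "units_or_codes"]


def write_data_dictionary(rows, handle):
    """Write one CSV row per variable to an open text file handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def undocumented_columns(data_columns, rows):
    """Return the data columns that have no entry in the data dictionary."""
    documented = {row["variable"] for row in rows}
    return [col for col in data_columns if col not in documented]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_data_dictionary(DATA_DICTIONARY, buffer)
    print(buffer.getvalue())
```

Running a check such as `undocumented_columns` whenever a new variable is added is one way to keep documentation in step with the data as the project progresses.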


About metadata

Data cannot be shared effectively unless it is described with sufficient metadata, that is, "data about data". Metadata are a subset of core data documentation: records of the essential information about data that you have created or used during your research.

When writing metadata, it is useful to think of what another researcher would need to locate your data and briefly assess whether it would be useful for their research without having to look through all of the data itself.

Research metadata is a significant and developing area, with specific descriptive standards used to enable sharing, access, interpretation and reuse of research data. Dublin Core is the best known and most widely used of these.

Research funders now generally require researchers to create metadata and make it openly available, notably to describe complex datasets and thus facilitate access and reuse.
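
As an illustration of what a Dublin Core description can look like, the sketch below builds a small XML record using Python's standard library. The element names (`title`, `creator`, `date`, `format`, `rights`) and the namespace come from the Dublin Core element set; the `metadata` wrapper element, the `to_dublin_core` helper and the example values are assumptions made for this sketch, and a repository may expect a different wrapper or profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def to_dublin_core(record):
    """Build an XML element from a dict of Dublin Core element -> value(s)."""
    root = ET.Element("metadata")  # wrapper name is an assumption
    for element, values in record.items():
        # Accept a single string or a list (e.g. several creators).
        if isinstance(values, str):
            values = [values]
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element}")
            child.text = value
    return root


# Hypothetical dataset description for illustration only.
example = {
    "title": "Coastal erosion survey 2019",
    "creator": ["A. Researcher", "B. Collaborator"],
    "date": "2019-08-31",
    "format": "text/csv",
    "rights": "CC BY 4.0",
}

xml_text = ET.tostring(to_dublin_core(example), encoding="unicode")
print(xml_text)
```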


What should I include in my metadata?

A good question to ask when creating metadata is “What information would I need to understand and use this data in twenty years?”

Potentially useful information includes:

  • Title
  • Date
  • Subject descriptors
  • Creator(s) [creator of the dataset and main researchers involved]
  • Funder(s)
  • File format
  • File name/path
  • Storage location of the data/URL
  • Subject
  • Rights
  • Access information
  • Keywords
  • Time references for the data (key dates associated with the data: start, end, release, etc.)
  • Geospatial information
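
A record covering some of the elements above could be kept as a simple JSON file next to the data. This is a minimal sketch only: the key names, the example values and the choice of which fields to treat as required are assumptions for illustration, and Pure or your chosen repository will have its own field names.

```python
import json

# Field names follow the list above; treating these five as required
# is an assumption made for this sketch.
REQUIRED = ["title", "date", "creators", "file_format", "rights"]


def missing_fields(record):
    """Return required fields that are absent or empty in a metadata record."""
    return [name for name in REQUIRED if not record.get(name)]


# Hypothetical record for illustration only.
record = {
    "title": "Interview transcripts, coastal communities project",
    "date": "2021-03-15",
    "creators": ["A. Researcher"],
    "funders": ["Example Funding Council"],
    "file_format": "text/plain",
    "storage_location": "",
    "rights": "",  # left empty on purpose to show the check
    "keywords": ["oral history", "coast"],
    "time_coverage": {"start": "2020-01-01", "end": "2020-12-31"},
}

print(json.dumps(record, indent=2))
print("Missing:", missing_fields(record))
```

A check like `missing_fields` can flag gaps early, while the information is still easy to recover.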

Pure, our research information system, supports most of these elements in its current version. More information on recording datasets in Pure is available here.


When should I create metadata?

Best practice is to decide on a format or template for your metadata at the very beginning of your project, and continue to add information as it progresses. It is easier to capture metadata at the point of collection or creation, rather than trying to remember things at a later date.

This should provide you with a full metadata catalogue at the end of your project that contains information about all of the data created and allows you to locate specific pieces of data easily. This will benefit you as well as others, as it should prevent duplication of data over the course of long-term projects. It will also prove useful when submitting your data to a data repository.
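
One lightweight way to build such a catalogue is to append a row to a shared CSV file each time a data file is created. The sketch below assumes a hypothetical set of columns and two hypothetical helpers, `add_to_catalogue` and `find_in_catalogue`; it is one possible template, not a required format.

```python
import csv
import os
import tempfile

# Column names are an assumption; agree your own at the start of the project.
COLUMNS = ["file_name", "title", "creator", "created", "description"]


def add_to_catalogue(catalogue_path, entry):
    """Append one entry to a CSV catalogue, writing a header if the file is new."""
    is_new = (not os.path.exists(catalogue_path)
              or os.path.getsize(catalogue_path) == 0)
    with open(catalogue_path, "a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=COLUMNS)
        if is_new:
            writer.writeheader()
        writer.writerow(entry)


def find_in_catalogue(catalogue_path, text):
    """Return entries whose title or description contains the given text."""
    text = text.lower()
    with open(catalogue_path, newline="", encoding="utf-8") as handle:
        return [row for row in csv.DictReader(handle)
                if text in row["title"].lower()
                or text in row["description"].lower()]


if __name__ == "__main__":
    # Demonstration in a temporary folder; the entry is hypothetical.
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "catalogue.csv")
        add_to_catalogue(path, {
            "file_name": "tides_2021.csv",
            "title": "Tide gauge readings 2021",
            "creator": "A. Researcher",
            "created": "2021-02-01",
            "description": "Hourly sea level readings",
        })
        print(find_in_catalogue(path, "tide"))
```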


Are there subject-specific metadata standards?

A detailed list of discipline-specific metadata standards has been compiled by the Digital Curation Centre (DCC). The listing comprises metadata standards, extensions and tools. Your choice of metadata standard will also be influenced by specific requirements from funders and by the conventions of your discipline.

If your field doesn't have an established metadata standard, or if you just need a simpler system to keep track of data internally, consider that there are three main types of metadata addressed by most standards:

  • descriptive: describes the data for identification and discovery
  • structural: how datasets are related or put together
  • administrative: creation date, file type, rights management, etc.
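
For a simple internal system, the three types can be kept as three groups within one record. The following sketch uses a Python dataclass; the class name `DatasetMetadata`, the keys inside each group and the example values are assumptions for illustration only.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class DatasetMetadata:
    """A dataset record grouped by the three main metadata types."""
    # Identification and discovery.
    descriptive: dict = field(default_factory=dict)
    # How datasets are related or put together.
    structural: dict = field(default_factory=dict)
    # Creation date, file type, rights management, etc.
    administrative: dict = field(default_factory=dict)


# Hypothetical values for illustration only.
record = DatasetMetadata(
    descriptive={"title": "Tide gauge readings",
                 "keywords": ["tides", "sea level"]},
    structural={"part_of": "Coastal monitoring project",
                "files": ["2020.csv", "2021.csv"]},
    administrative={"created": "2021-02-01",
                    "file_type": "text/csv",
                    "rights": "CC BY 4.0"},
)

print(asdict(record))
```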

Describing your data adequately also helps you cite it; more information on citing your data can be found here.


Can I automate the creation of metadata?

We are aware that collecting metadata can be labour-intensive, and one of our main aims is to encourage data creators to deposit their data by making the process simple and straightforward. We will look to automate metadata creation as much as possible across the University. This is a developing field, and the research community is working to establish how far metadata creation can be automated as the need becomes more urgent. We would be particularly interested to hear of any existing practices for automating metadata creation within Schools or Research Groups (please email research-data@st-andrews.ac.uk).

Some software programs already create metadata automatically, while others, such as Microsoft Office programs, also allow the metadata to be edited. The examples below are from Microsoft Word; other Microsoft Office programs have similar fields.

User-editable fields in Microsoft Word

Title            Company      Edit time
Tags             Manager      Templates
Comments         Manager      Last modified (time and date)
Status           Author       Created (time and date)
Categories       File size    Last printed (time and date)
Subject          Pages        Last modified by
Hyperlink base   Words
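
Some administrative metadata is already recorded by the operating system for every file and can be harvested without manual entry. The sketch below is one possible approach in Python, not a University tool: the function names `harvest_file_metadata` and `harvest_directory` and the output keys are assumptions, and the file format is only guessed from the file extension.

```python
import mimetypes
import os
from datetime import datetime, timezone


def harvest_file_metadata(path):
    """Collect metadata the operating system already records for a file."""
    info = os.stat(path)
    guessed_type, _encoding = mimetypes.guess_type(path)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": os.path.basename(path),
        "file_path": os.path.abspath(path),
        "file_size_bytes": info.st_size,
        # Guessed from the extension only; None if the extension is unknown.
        "file_format": guessed_type,
        "last_modified": modified.isoformat(),
    }


def harvest_directory(root):
    """Harvest metadata for every file below a directory."""
    records = []
    for folder, _subfolders, files in os.walk(root):
        for name in sorted(files):
            records.append(harvest_file_metadata(os.path.join(folder, name)))
    return records


if __name__ == "__main__":
    # Lists metadata for files under the current working directory.
    for item in harvest_directory("."):
        print(item)
```

Descriptive fields such as title, subject and rights still need human input, so output like this is best treated as a starting point for a fuller record.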

Contact

Research Data team

University of St Andrews Library
North Street
St Andrews
Fife
KY16 9TR
Scotland, United Kingdom

Tel: (01334) 462343