What are research data?

Research data are not limited to any single form and may include:

  • Documents (text and Word)
  • Spreadsheets
  • Laboratory notebooks
  • Field notebooks
  • Diaries
  • Questionnaires
  • Transcripts
  • Codebooks
  • Audiotapes
  • Videotapes
  • Photographs
  • Films
  • Test responses
  • Slides
  • Artefacts
  • Specimens
  • Samples
  • Collection of digital objects acquired and generated during the process of research
  • Data files
  • Database contents (video, audio, text, images)
  • Models
  • Algorithms
  • Scripts
  • Contents of an application
    • Input
    • Output
    • Logfiles for analysis software
    • Simulation software
    • Schemas
  • Methodologies and workflows
  • Standard operating procedures and protocols

What are datasets?

A dataset is a defined collection, or logically complete set, of data with common or related elements.

The term ‘dataset’ is used throughout this guide to mean a logically complete set of data; some systems or services prefer the terms ‘data product’ or ‘data package’.

Datasets are:

  • A group of data files, usually numeric or encoded, together with the documentation files (such as a codebook, technical or methodology report, or data dictionary) that explain their production or use. Generally, a dataset is unusable for sound analysis by a second party unless it is well documented. A good-quality data collection can be enhanced by including contextual information, for example information about the study, observation or investigation, or about the structure of the data itself.
  • Typically centred on an event or study.
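
As a rough illustration of the description above, a dataset can be pictured as data files bundled with the documentation that explains them. The Python sketch below models only that idea. The class name `Dataset`, its fields, the `is_documented` check and all file names are hypothetical, and are not taken from any repository or metadata standard.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    """Hypothetical model: data files plus the documentation that explains them."""
    title: str
    data_files: List[str] = field(default_factory=list)
    documentation_files: List[str] = field(default_factory=list)

    def is_documented(self) -> bool:
        # Assumption for this sketch: "documented" simply means that at least
        # one data file and at least one documentation file are present.
        return bool(self.data_files) and bool(self.documentation_files)


# Illustrative file names only.
survey = Dataset(
    title="Example survey study",
    data_files=["responses.csv"],
    documentation_files=["codebook.pdf", "methodology_report.pdf"],
)
undocumented = Dataset(title="Raw export", data_files=["export.csv"])

print(survey.is_documented())
print(undocumented.is_documented())
```

In practice, whether a dataset is well documented depends on the quality of the documentation and not just its presence; the check here is deliberately simplistic.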

Types of dataset may include:

  • a spreadsheet of numerical data
  • a collection of interview transcripts, field notes, audio recordings, readings or photographs resulting from a research project
  • a database containing survey data, numeric data files, input data and scripts used to model scenarios

Specific examples of datasets are:

  • Sequence and structure databases
  • Image sets from satellite observations
  • Atmospheric temperature records
  • Population and species inventories
  • World Economic Outlook, 1980-2018

Classification of research data

Research data are generated for different purposes and through different processes:

Observational: Data captured in real time, usually irreplaceable. For example, neuroimages, sample data, sensor data, survey data.
Experimental: Data captured from laboratory equipment. The data are often reproducible but reproduction would be costly. For example, gene sequences, chromatograms, chemical toroid magnetic field data.
Simulation: Data generated from test models. For example, climate, mathematical or economic models.
Derived or compiled: Data produced by processing or combining existing data. The data are reproducible but reproduction would be costly. For example, text and data mining, 3D models, compiled databases.
Reference or canonical: A (static or organic) conglomeration or collection of smaller (peer-reviewed) datasets, most probably published and curated. For example, gene sequence databanks, chemical structures, or spatial data portals.

More Information

* Although "data" is plural, you may see "data is" used interchangeably on associated sites.


Research Data team

University of St Andrews Library