What are research data?
Types of research data
The UKRI Concordat on Open Research Data defines research data as "the evidence that underpins the answer to the research question".
- These might be quantitative information or qualitative statements collected by researchers in the course of their work by experimentation, observation, modelling, interview or other methods, or information derived from existing evidence.
- Data may be raw or primary (e.g. direct from measurement or collection), derived from primary data (e.g. cleaned up or extracted from a larger dataset), or derived from existing sources where the rights may be held by others.
- Data may also be understood as "relational" or "functional" components of research: what counts as data depends on whether and how researchers use it as evidence for claims.
Different types of research data have different implications for how they are handled and preserved:
Observational: Data captured in real time, for example neuro-images, sample data, sensor data, and survey or interview data. These data are usually irreplaceable and hard or impossible to recreate.
Experimental: Data captured from laboratory equipment by the researcher or by a service they use, for example gene sequences, chromatograms or toroid magnetic field data. These data are often reproducible, but reproducing them can be costly.
Simulation: Data generated from test models, for example climate, mathematical or economic models. The output datasets are often very large, but the model code and its inputs may be sufficient to regenerate the results.
Derived or compiled: Data produced by processing or combining other data, for example through text and data mining, or in the form of 3D models or compiled databases. These data are reproducible, but reproducing them can be costly.
Reference or secondary: A static or continually growing ("organic") collection of smaller, often peer-reviewed datasets, usually published and curated elsewhere for re-use. Examples include gene sequence databases, chemical structure databases and spatial data portals.
Some examples of research data
Research data can be seen as the collection of digital objects acquired or generated during the research process. They can take many forms and names, and may include:
- Data files
- Database contents (video, audio, text, images)
- Laboratory notebooks
- Field notebooks
- Standard operating procedures, protocols and workflows
- Test responses
- Contents of an application
- Log files for analysis software
- Simulation software
What are datasets?
The term ‘dataset’ is used throughout this guide to mean a logically complete set of data with common or related elements. Datasets are sometimes also called ‘data products’ or ‘data packages’.
- A dataset consists of a group of data files together with the documentation files that explain how the data were produced or how to use them (codebook, technical or methodology report, data dictionary, etc.), as well as information about the structure of the data themselves.
- A good-quality dataset can be further enhanced by including contextual information, such as details of the study, observation or investigation.
- Datasets are typically centred on an event or study.
Types of datasets may include:
- a spreadsheet of numerical or encoded data,
- a collection of interview transcripts, field notes, audio recordings, readings or photographs resulting from a research project,
- a database containing survey data, numeric data files, input data, and the scripts or code used to model scenarios,
- a database of population or economic data,
- sets of images analysed during a research project,
- a collection of sequence and structure data.