Secondary data
Research involving humans - response to COVID-19
15 January 2021
Due to the current circumstances, researchers must consider using online or remote methods where at all possible.
Any in-person research or research involving travel must be permissible, safe and ethical.
Due to the current circumstances and restrictions in Scotland and the rest of the UK, research involving in-person contact with human participants is only permissible in very limited situations. For more information see the University research and coronavirus page.
Researchers must check the University coronavirus information pages and travel and fieldwork guidance frequently and before commencing any activity to ensure they are complying with current requirements.
For more information on the ethical review process at this time, see the interim guidance for research involving humans.
Using secondary data can be a good alternative to collecting data directly from participants (primary data), removing the need for face-to-face contact.
Secondary data relating to living human subjects often requires ethical approval depending on the source and nature of the data. The extent to which the ethical review application form must be completed also depends on the source and nature of the data.
This guidance covers some of the ethical issues relating to the use of secondary data and how these affect the ethical application process.
-
Ethical review and approval is not required for secondary data that:
- does not relate to living human subjects
- relates to deceased human subjects and which:
  - does not contain any sensitive information about living human subjects
  - does not contain health or census information from the last 100 years
- is completely and robustly anonymised
If research involves any of the above but there are additional ethical issues, an ethical review application may be required. Researchers should discuss this with their School ethics committee (SEC).
If research involves any of the above but the data source requires assurances regarding data management or an ethical review, an ethical review application or data management plan may be required. Researchers should discuss this with their SEC or Research Data Management.
-
Secondary data - internal datasets
Secondary datasets may sometimes be sourced from within the University, i.e. data collected as part of previous projects within a School. It is important to consider whether re-use of this data is in line with the original ethical approval and the consent given by participants. Two steps may be required: an amendment to the original ethical approval to allow the data to be shared, and a new ethical review application for the new research project (if sufficiently different).
Internally sourced data should still be acknowledged and appropriately referenced, and given the same consideration as other secondary data sources in respect of access and permissions, data management and confidentiality. Researchers should also consider whether using this type of secondary data is appropriate for their needs (i.e. whether it meets the requirements for an academic research project).
Secondary data - large quantitative datasets
Large quantitative datasets, such as census data, health data, household surveys and market research, are a commonly used source of secondary data.
Several sources give access to these types of data. What is required to access them varies by source and by the nature of the data, for example:
- ‘open’ datasets where the data is freely available to download
- ‘closed’ datasets where users must register with the data source but that require minimal additional work
- datasets that contain more sensitive information and where users may have to complete paperwork such as a data management plan.
Sometimes more sensitive datasets can only be accessed via a secure web portal and no local copies retained.
Secondary data - qualitative and mixed-methods data
Secondary qualitative data is less common, largely due to the difficulty in anonymising qualitative data. However, there are sources of secondary qualitative data including the UK Data Service and library data such as oral histories, diaries and biographies.
Secondary data - biological data
There are several resources for access to biological data, including human-related data. Use of biological data and bioinformatics is a wide area with several ethical concerns around confidentiality, the implications of research into DNA and genomics, bias and profiling, and the sensitivity of identifying risk levels related to disease. Researchers planning research involving biological data or bioinformatics should consult disciplinary guidelines and organisations, and colleagues with specific expertise. If using secondary data of this type, researchers must ensure they do so in accordance with the requirements of the data sources. Researchers should also check whether any NHS ethical approval, governance or R&D approvals are required.
-
Access, permissions and consent
Secondary data must always be accessed and used in accordance with the requirements of the data source, GDPR and the common law duty of confidentiality. Secondary data must always be appropriately referenced and acknowledged. Researchers should always act in accordance with the Principles of Good Research Conduct, even when working with secondary data.
Researchers should check whether their use is in line with the consent originally obtained from participants and seek assurances on this from the data source.
Where data is obtained in anonymous form, researchers should be conscious of the risk of de-anonymising data through triangulation of several data points or sets.
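As a rough illustration of the triangulation risk, the sketch below counts how many records share each combination of indirectly identifying fields. The records, the field names and the helper smallest_group_size are hypothetical and invented for this example; they are not taken from any data source or University procedure. A smallest group of 1 means at least one person is unique on that combination of fields and could potentially be singled out.

```python
from collections import Counter


def smallest_group_size(records, fields):
    """Return the size of the smallest group of records that share the
    same combination of values for the given fields (0 if no records)."""
    groups = Counter(
        tuple(record[field] for field in fields)
        for record in records
    )
    return min(groups.values(), default=0)


# Hypothetical, made-up records for illustration only.
records = [
    {"age_band": "30-39", "postcode_area": "DD1", "occupation": "nurse"},
    {"age_band": "30-39", "postcode_area": "DD1", "occupation": "teacher"},
    {"age_band": "30-39", "postcode_area": "DD2", "occupation": "nurse"},
    {"age_band": "40-49", "postcode_area": "DD1", "occupation": "nurse"},
    {"age_band": "40-49", "postcode_area": "DD1", "occupation": "teacher"},
    {"age_band": "30-39", "postcode_area": "DD1", "occupation": "nurse"},
]

# On age band alone, every record shares its value with at least one other.
k_age = smallest_group_size(records, ["age_band"])

# Combining three fields leaves some records unique.
k_all = smallest_group_size(
    records, ["age_band", "postcode_area", "occupation"]
)

print(k_age, k_all)
```

In this made-up example no one is unique on age band alone, but combining age band, postcode area and occupation isolates individual records. A check of this kind is only a first indication and does not replace the assurances and requirements of the data source.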
While some open access datasets are freely available, the data source or controller commonly sets conditions and requirements around who can access the data and how it is used. For example, this might include:
- that researchers sign terms of use
- that researchers have a comprehensive data management plan
- that researchers can provide assurances around the security of the data once in their possession
- verification that the person accessing the data has a legitimate reason, e.g. evidence that they are a researcher at a recognised institution
- that the data be accessed via a secure portal
- that no local copies are retained
- that any copies of the data be destroyed within a certain timescale (may require a destruction certificate)
- that the raw data be processed by the data source into an anonymised form before it is released
In the latter examples, where there are more complex requirements and the data source is providing a service such as preparing and moderating access, costs may be incurred that would need to be factored into researchers' plans and budgets.
-
Ethical issues to consider
The ethical application form includes an early filter question on use of secondary sources. This means that if researchers are using secondary data with no additional ethical issues they can skip to the end of the form – the declarations section. If, however, there are ethical issues, researchers should describe these and how they will be mitigated in the ‘Ethical Considerations’ free text field later in the form.
If data are particularly sensitive, or it is required by the data source, researchers may wish to complete the Data Management section of the ethical review application form (Word) or a separate data management plan.
When making an application for ethical approval of research using secondary data, researchers should consider:
- Is the proposed research in line with the participants' original consent? Can the data source provide assurances on participants' original consent?
- How will the data be managed? If there is identifiable, personal or sensitive data, how will confidentiality be maintained and data kept secure?
- Will the proposed research and the use, management and storage of the data meet the data source's requirements? Have all the appropriate documents been completed and permissions granted?
- Will the data source be acknowledged and referenced?
- Are there any copyright issues around the data?
- Is there any risk of de-anonymising participants by pulling together several data sources?
- Will using this data or combining it with other data risk bias or ‘profiling’ of a particular group?
- How will you present the data or analysis? Will this ensure the confidentiality and anonymity of participants?
- Will the data identify individuals as being at risk of a condition or disease of which they might otherwise have been unaware?
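On the question of presentation, one common precaution when reporting aggregated counts is to withhold very small cells, since a count of one or two people in a category can point to individuals. The sketch below is illustrative only: the function suppress_small_counts, the area labels and the threshold of 5 are assumptions made for this example, and the data source's own disclosure rules take precedence.

```python
def suppress_small_counts(counts, threshold=5):
    """Return a copy of `counts` in which any value below `threshold`
    is replaced with None so that it is not published.

    The default threshold of 5 is an assumption for illustration; the
    appropriate value depends on the data source's requirements.
    """
    return {
        category: (count if count >= threshold else None)
        for category, count in counts.items()
    }


# Hypothetical aggregated counts, invented for illustration only.
counts_by_area = {"Area A": 42, "Area B": 3, "Area C": 17, "Area D": 0}

published = suppress_small_counts(counts_by_area)
print(published)
```

Note that if row or column totals are published alongside the table, a single suppressed cell can sometimes be recovered by subtraction, so further cells may also need to be withheld.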
-
Data sources
The UK Data Service – this is one of the core UK sources of secondary data, including government data such as the Household Survey, plus an increasing amount of qualitative data and data collected as part of research funded by UK research councils https://www.ukdataservice.ac.uk/
The Office for National Statistics – this is the UK’s recognised national statistics institute and conducts the census in England and Wales, amongst other large national and regional surveys https://www.ons.gov.uk/
The Scottish Government’s statistics publications – these often include aggregated statistics reporting regional level (rather than individual level) data, though some more detailed datasets are available for older data https://www.gov.scot/publications/?publicationTypes=statistics&page=1
NHS Digital data and statistics publications – this includes details about clinical indicators, health and social data, though again this is often aggregated and at a regional level rather than individual level data https://digital.nhs.uk/data-and-information/data-collections-and-data-sets/data-sets
Information Services Division (ISD) Scotland – this includes Scottish health and social care data, often aggregated and at a regional level https://www.isdscotland.org/
Data.gov.uk – a new resource for ‘open’ UK government data https://data.gov.uk/
British Library – the British Library holds a number of collections including oral histories, biographies and newspaper articles https://www.bl.uk/collection-guides/oral-history#
Qualitative Data Repository – a qualitative data repository hosted by Syracuse University https://qdr.syr.edu/
European Molecular Biology Laboratory – European Bioinformatics Institute (EMBL-EBI) https://www.ebi.ac.uk/
Health Informatics Centre (HIC) – local health informatics service linking health data https://www.dundee.ac.uk/hic/
Open access data directories
OpenAire.eu – A searchable directory of open access datasets such as those accompanying publications https://explore.openaire.eu/
JISC Directory of Open Access Repositories (OpenDOAR) – a searchable directory of open access repositories http://v2.sherpa.ac.uk/opendoar/
-
Association of Internet Researchers – ethics guidance http://aoir.org/ethics/
The European Commission (2018) – Use of previously collected data (‘secondary use’). Ethics and Data Protection, VII, 12-14 https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/ethics/h2020_hi_ethics-data-protection_en.pdf
Irwin, S. (2013). Qualitative secondary data analysis: Ethics, epistemology and context. Progress in development studies, 13(4), 295-306. https://doi.org/10.1177/1464993413490479
Morrow, Virginia and Boddy, Janet and Lamb, Rowena (2014) The ethics of secondary data analysis. NCRM Working Paper. NOVELLA. http://eprints.ncrm.ac.uk/3301/
Rodriquez, L. (2018) Secondary data analysis with young people. Some ethical and methodological considerations from practice. Children’s Research Digest Volume 4, Issue 3. The Children’s Research Network. https://childrensresearchnetwork.org/knowledge/resources/secondary-data-analysis-with-young-people
Salerno, J., Knoppers, B. M., Lee, L. M., Hlaing, W. M., & Goodman, K. W. (2017). Ethics, big data and computing in epidemiology and public health. Annals of epidemiology, 27(5), 297-301. https://doi.org/10.1016/j.annepidem.2017.05.002
UK Data Service guidance on secondary analysis https://www.ukdataservice.ac.uk/use-data/secondary-analysis.aspx