Organising your data

Once you create, gather, or start analysing data and files, they can quickly become disorganised. It can be time-consuming to find files and difficult to distinguish between different versions of data.

Therefore, you should decide how you will name and structure files and folders consistently. If you work with others on data, devise a system in conjunction with colleagues. Create supporting documentation and metadata as you work, to avoid having to produce this at the end of a project. Supporting documentation and metadata make data reuse possible by yourself and others in the future by providing context.

Here are a few tips for organising your data files:

1) File names

Well-organised file names and folder structures make it easier to find and keep track of data files. Decide on a file naming convention at the start of your project. Useful file names:

  • are consistent - within a research group, it's recommended that you agree on file naming conventions early on in the project.
  • are concise but informative.
  • classify broad file types.
  • are meaningful to you and your colleagues.
  • allow you to find the file easily.
  • do not contain special characters or spaces.
  • do not conflict when moved from one location to another.
  • outlast the file creator who originally named the file.
  • take account of how scalable your file naming policy needs to be; e.g. if you want to include the project number, don't limit your project number to two digits, or you can only have ninety-nine projects.

It is useful if your school/project agrees on the following elements of a file name:

  • Version number (see Version control below)
  • Vocabulary: choose a standard vocabulary for file names, so that everyone uses a common language.
  • Identification: project acronyms, researcher's initials, name of research team, etc.
  • Punctuation: decide on conventions for if and when to use punctuation symbols, capitals, hyphens and spaces.
  • Dates: agree on a logical use of dates so that they display chronologically i.e. YYYY-MM-DD.
    • Date of creation
    • Publication date
  • Order: confirm which element should go first, so that files on the same theme are listed together and can therefore be found easily.
  • Numbers: specify the number of digits that will be used in numbering so that files are listed numerically, e.g. 001, 002, etc.
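
As an illustration of how the elements above might be combined, here is a minimal Python sketch. The function name build_file_name, the order of the elements, and the underscore separator are assumptions chosen for the example, not a prescribed convention; agree on your own with your colleagues.

```python
from datetime import date


def build_file_name(project: str, item: str, number: int, created: date,
                    version: int, digits: int = 3) -> str:
    """Join the agreed elements in a fixed order, separated by underscores."""
    return "_".join([
        project,                       # identification, e.g. a project acronym
        f"{item}{number:0{digits}d}",  # zero-padded so 002 sorts before 010
        created.isoformat(),           # YYYY-MM-DD sorts chronologically
        f"v{version:02d}",             # version number
    ])


names = [
    build_file_name("Abdn", "Interview", 10, date(2014, 12, 22), 1),
    build_file_name("Abdn", "Interview", 2, date(2014, 3, 5), 1),
]
# Plain alphabetical sorting now matches numerical order.
print(sorted(names))
```

Because the numbers are padded to a fixed width and the dates are written year first, an ordinary alphabetical listing in a file browser puts the files in the intended order.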

While computers add basic properties to a file (such as the date it was uploaded), this is not reliable data management; it is better to represent such essential information through the folder structure, or in the file name itself.

Renaming large numbers of files

There are occasions when you may want to rename large numbers of files at once. This is often the case with digital images such as photographs, whose default file names are simply numbers. The easiest way to do this is by using batch renaming, where one command renames all the files in a sequence. An example of such a tool is the Bulk Rename Utility (download required).
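
If you prefer a script to a dedicated tool, batch renaming can also be sketched in a few lines of Python. This is only an illustration: the function names plan_renames and apply_renames, the "*.jpg" pattern, and the prefix-plus-number scheme are assumptions for the example. The renaming is split into a planning step and an applying step so that the new names can be checked before anything on disk changes.

```python
from pathlib import Path


def plan_renames(folder: Path, prefix: str, pattern: str = "*.jpg",
                 digits: int = 3):
    """Return (old_path, new_path) pairs without touching the disk."""
    files = sorted(folder.glob(pattern))
    return [
        (old, old.with_name(f"{prefix}_{i:0{digits}d}{old.suffix.lower()}"))
        for i, old in enumerate(files, start=1)
    ]


def apply_renames(plan) -> None:
    """Rename each file, refusing to overwrite anything that already exists."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists")
        old.rename(new)
```

A cautious way to use this is to call plan_renames on a copy of the folder, print each pair to review the proposed names, and only then pass the plan to apply_renames.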


2) File structures

A consistent file and folder structure is essential for accessibility and makes it easier to find and keep track of data files.

  • Develop a system that works for your project.
  • Use file names to classify broad types of files.
  • Create meaningful but brief file names (“Abdn_ID437_Interview01_2014-12-22” is clearer than “Year01” or “Autumn14”).
  • Capitalize each word to differentiate it.
  • Avoid using special characters in a file name (\ / : * ? " < > | [ ] & $).
  • Use underscores or hyphens instead of periods or spaces.
  • Capture place, time, and theme: extremely useful, even if done in a highly abbreviated manner.
  • Reverse dates so they sort usefully (YYYY-MM-DD), e.g. filename_2014-12-22.
  • Capture document version control (v01, v02, v03 instead of filenaming_latest).
  • Be consistent. It is only effective if everyone in the group follows the rules consistently.
  • Take time occasionally to reassess your folder or tag structure: perhaps moving old, unused items to a folder called ‘Archive’ or something similar so they don’t clutter up the screen.
  • Most operating systems default to a hierarchical file structure – files inside folders, which may be nested inside other folders. This is really useful if your material can easily be grouped into relatively discrete categories.
  • In planning a hierarchical folder structure, aim for a balance between breadth and depth – so no one category gets too big, but also so that you don’t have to click through endless folders to find a file.
  • It may be more helpful to use a tag-based system – where each file is assigned one or more tags, or labels. This makes it easier to have overlapping categories, and files can be categorised in multiple ways simultaneously (by subject, by author, and by the project it relates to, for example). Some modern operating systems will allow you to add tags to files; file tagging software is also available.
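
The advice above on special characters, spaces and periods can be applied automatically. The following is a minimal Python sketch, assuming a hypothetical helper called clean_stem that works on the part of the file name before the extension; the exact set of characters removed is taken from the list above and can be adjusted to your own convention.

```python
import re

# The special characters named in the list above.
_UNSAFE = re.compile(r'[\\/:*?"<>|\[\]&$]')


def clean_stem(stem: str) -> str:
    """Tidy the part of a file name that comes before the extension."""
    stem = _UNSAFE.sub("", stem)          # drop special characters
    stem = re.sub(r"[\s.]+", "_", stem)   # spaces and periods -> underscore
    return re.sub(r"_+", "_", stem).strip("_")


print(clean_stem("Interview 01: final?"))  # Interview_01_final
```

The extension is left out deliberately so that the period before it is not replaced; add it back after cleaning.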

3) Versioning

It is important to identify and distinguish versions of research data files consistently. This ensures that a clear audit trail exists for tracking the development of a data file and identifying earlier versions when needed. You will therefore need to establish a method, one that makes sense to you, for indicating the version of your data files.

  • A common form for expressing data file versions is to use whole numbers (1, 2, 3, etc.) for major version changes and a decimal for minor changes, e.g. v1, v1.1, v2.6.
  • Beware of using confusing labels: revision, final, final2, definitive_copy as you may find that these accumulate.
  • Record every change irrespective of how minor that change may be.
  • Discard or delete obsolete versions (whilst retaining the original 'raw' copy).
  • Use an auto-backup facility (if available) rather than saving or archiving multiple versions.
  • Turn on versioning or tracking in collaborative documents or storage utilities such as Wikis, GoogleDocs, etc.
  • Consider using version control software e.g. Subversion, TortoiseSVN.
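
The major/minor numbering described above can be sketched in Python as follows. The helper name bump_version and the "_v" label at the end of the file name are assumptions for the example; this illustrates the numbering scheme only and is not a substitute for version control software.

```python
import re

# A trailing label such as _v1 or _v1.2 at the end of the name.
_VERSION = re.compile(r"_v(\d+)(?:\.(\d+))?$")


def bump_version(stem: str, major: bool = False) -> str:
    """Return the stem with its trailing version label incremented."""
    match = _VERSION.search(stem)
    if match is None:
        return f"{stem}_v1"                 # no label yet: start at v1
    major_no = int(match.group(1))
    minor_no = int(match.group(2) or 0)     # "v1" counts as minor version 0
    if major:
        label = f"_v{major_no + 1}"         # major change resets the minor part
    else:
        label = f"_v{major_no}.{minor_no + 1}"
    return stem[:match.start()] + label


print(bump_version("Image3_v1"))                # Image3_v1.1
print(bump_version("Image3_v1.1", major=True))  # Image3_v2
```

Note that after v1.9 this scheme produces v1.10, which sorts before v1.2 in an alphabetical listing; if you expect many minor versions, consider padding the minor number.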

Some structured examples of maintaining version control, in the form [document name] [version number] [status: draft/final]:

  • Smith_interview_July2010_V1_DRAFT
  • Lipid-analysis-rate-V2_definitive
  • 2001_01_28_ILB_CS3_V6_AB_edited
  • 2014_08_23_Image3 or Image3_1.3

You can also use a version control table or file history alongside the data file to provide more details of changes to the file; e.g.:

File name     Changes to file
Image3_1.0    Minor revisions made
Image3_1.1    Minor revisions made
Image3_1.2    Further minor revisions
Image3_2.0    Substantive changes
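
A file history like this can also be kept as a small CSV file stored next to the data. The sketch below is one possible approach in Python; the function name log_change and the extra "Date" column are assumptions added for the example.

```python
import csv
from datetime import date
from pathlib import Path


def log_change(history: Path, file_name: str, change: str, when: date) -> None:
    """Append one row to the history file, writing the header on first use."""
    is_new = not history.exists()
    with history.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["File name", "Date", "Changes to file"])
        writer.writerow([file_name, when.isoformat(), change])
```

Calling log_change each time a new version is saved builds up the table row by row, and the resulting file opens in any spreadsheet program.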

Version control strategies for collaborative work

If you are working collaboratively with colleagues on data, decide on a strategy that suits you all to ensure nobody gets confused about which version is which.

You could:

  • Use software that lets you control who can edit files (e.g. MS Word), deciding who has “read only” permission to a file and who has “write” permission.
  • Use versioning or file sharing software.
  • Merge multiple entries or edits manually i.e. one person is responsible for the final version.

4) Managing references

  • Reference management software can be used to store details of all the articles, books, and other sources you make use of in your research, and to automatically generate citations in written work.
  • You can also use reference management software to store copies of articles (usually as PDFs), and to record your own notes. Some software packages offer additional features, such as the ability to annotate PDFs.
  • Popular reference managers include:

Contact

Research Data team

University of St Andrews Library
North Street
St Andrews
Fife
KY16 9TR
Scotland, United Kingdom

Tel: (01334) 462343