Organising data

In this section:

  1. File naming
  2. Versioning
  3. Organising files on disk

File naming

Structured and consistent file naming is another pillar of accessible and re-usable data. File naming conventions should be established within Schools or research groups and adhered to at every stage.

File names should

  • be meaningful for current and future users
    • order file name elements logically and consistently
  • be concise and scalable
    • use capital letters to delimit words
    • omit articles (a, the) and conjunctions (and, or, but, etc.)
    • use two-digit numbers, e.g. 'v01' not 'v1'
    • write dates back to front, in the format 'YYYYMMDD', so that files sort chronologically
    • avoid repetition, e.g. don't repeat folder names in file names
  • avoid spaces, dots and special characters (e.g. & or ? or !). Some characters are not permitted in file names, and the set of forbidden characters differs between operating systems (see examples for Windows)
    • use underscores (_) to separate elements in a file name where needed
  • classify the type of file (don't rely purely on the file metadata provided by the operating system)
    • include date and description, especially for recurring events/files
  • include versioning
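
As an illustration, the conventions listed above can be checked automatically before files are shared. The following Python sketch is one possible approach, not an official tool. The function name check_file_name and the exact rules it applies (only letters, digits, underscores and hyphens are accepted; a version label is a 'v' followed by at least two digits) are assumptions made for this example and should be adapted to the conventions agreed in your School or research group.

```python
import re


def check_file_name(file_name: str) -> list:
    """Return a list of convention problems found in file_name (empty if none)."""
    # Split off the extension at the last dot; a name without a dot has no extension.
    stem, dot, _extension = file_name.rpartition(".")
    if not dot:
        stem = file_name

    problems = []
    if " " in stem:
        problems.append("contains spaces")
    if "." in stem:
        problems.append("contains dots besides the extension separator")
    # Assumption: hyphens are tolerated alongside letters, digits and underscores.
    # Spaces and dots are excluded here because they are reported separately above.
    if re.search(r"[^A-Za-z0-9_. -]", stem):
        problems.append("contains special characters")
    # A 'v' followed by a single digit (e.g. 'v1') should be written 'v01'.
    if re.search(r"(?<![A-Za-z])v\d(?!\d)", stem):
        problems.append("version number has fewer than two digits")
    return problems


print(check_file_name("SmithInterview_20071007_v01Draft.docx"))
print(check_file_name("final report v1.docx"))
```

The first call returns an empty list, while the second reports the spaces and the single-digit version number. The check is deliberately limited: it cannot judge whether a name is meaningful or concise, which still needs a human reader.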

Please note that special file naming conventions exist for files that will be uploaded to any part of the University's website, including group and school web pages. Please refer to the University's digital standards service manual for further information.


Versioning

It is important to identify and distinguish versions of data files consistently. This ensures that a clear audit trail exists for the development of a data file and the content presented within it and that earlier versions can be identified when needed.

Approaches to expressing file versions

A common approach to expressing data file versions is to use ordinal numbers (v01, v02, v03, etc.) for major version changes and a second level for minor changes (v01, v01_1, v02_6, etc.). This can be combined with other elements such as a date or descriptive label to obtain a structured file name of the form:

[description/date]_[version number]_[status: draft/final/editor], for example:

  • SmithInterview_20071007_v01Draft.docx
  • 20010128_ILB_CS3_v06_AB.docx
  • Image3_1.png
  • 20140823_Image3.png
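
Where many files are produced by scripts, names of this form can be generated rather than typed by hand, which helps keep the elements in a consistent order. The sketch below is a minimal example under stated assumptions: the function build_file_name and its parameters are hypothetical, and it places the description first, then an optional date, then a two-digit version with an optional status label attached, as in the first example above.

```python
from datetime import date
from typing import Optional


def build_file_name(description: str, version: int, extension: str,
                    on: Optional[date] = None, status: str = "") -> str:
    """Compose '[description]_[YYYYMMDD]_v[NN][status].[extension]'."""
    parts = [description]
    if on is not None:
        # Dates are written back to front so that names sort chronologically.
        parts.append(on.strftime("%Y%m%d"))
    # Pad the version to two digits: 1 becomes 'v01'.
    parts.append(f"v{version:02d}{status}")
    return "_".join(parts) + "." + extension


print(build_file_name("SmithInterview", 1, "docx",
                      on=date(2007, 10, 7), status="Draft"))
# prints: SmithInterview_20071007_v01Draft.docx
```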

In addition, a version control table or notes within a file can be used to record versions, dates, authors and details of changes to a file.

Good practice for version control in file names

  • Avoid confusing, accumulating labels, e.g. revision, final, final2, definitive_copy.
  • Record every change irrespective of how minor that change may be.
  • Discard or delete obsolete versions whilst retaining the original 'raw' copy.
  • Turn on versioning or tracking in collaborative documents and storage utilities such as MS Office applications, Wikis, Google Docs, etc.
    • Create a new version when appropriate by accepting all changes and incrementing the version number.
  • Consider using version control software, e.g. Git, which is covered by regular Software Carpentry workshops available through PDMS. See Managing code for more information.
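
For files that are versioned through their names rather than through version control software, incrementing the version number can also be scripted so that the label never drifts into 'final2'-style names. The following is a sketch only: next_version_name is a hypothetical helper, and it assumes the version label is the first 'v' followed by digits that is not part of a longer word.

```python
import re

# Assumption: a version label is 'v' plus digits, not preceded by a letter
# (so the 'v' in a word such as 'Interview' is ignored).
VERSION_PATTERN = re.compile(r"(?<![A-Za-z])v(\d+)")


def next_version_name(file_name: str) -> str:
    """Return file_name with its major version number incremented by one."""
    match = VERSION_PATTERN.search(file_name)
    if match is None:
        raise ValueError(f"no version label found in {file_name!r}")
    digits = match.group(1)
    # Keep the original zero padding, e.g. 'v06' becomes 'v07'.
    bumped = f"v{int(digits) + 1:0{len(digits)}d}"
    return file_name[:match.start()] + bumped + file_name[match.end():]


print(next_version_name("20010128_ILB_CS3_v06_AB.docx"))
# prints: 20010128_ILB_CS3_v07_AB.docx
```

The helper only computes the new name. Copying the file under that name, and keeping the earlier version until it is truly obsolete, is left to the user.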

If you are working collaboratively on data, agree on a common versioning strategy. You could:

  • Use software to control the file-editing rights.
  • Use versioning or file sharing software.
  • Agree that one team member will merge multiple entries or edits manually, e.g. using the 'compare' functionality in Microsoft Word.


Organising files on disk

  • Files should be organised in a logical and easy-to-access folder structure that reflects the hierarchy between files most appropriate for the project.
  • Keep file and folder names concise and avoid repetitions.
  • Consider that some operating systems impose path-length limitations:
    • Windows limits paths to 260 characters by default, which also affects files and folders synced from cloud services such as OneDrive.
    • Even on operating systems with more generous path-length limits (e.g. Unix-like systems), paths should be kept as short as possible while remaining meaningful. This avoids problems when directories are transferred or synced to other equipment for analysis, or shared with collaborators who might use a different operating system.
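
Before transferring or syncing a project folder, it can be useful to check whether any path would approach the Windows limit mentioned above. The Python sketch below is a rough check, not a guarantee of compatibility. The function find_long_paths and its target_prefix parameter (the hypothetical folder on the destination machine under which the project will be placed) are assumptions made for this example.

```python
from pathlib import Path

WINDOWS_MAX_PATH = 260  # the limit quoted in the text above


def find_long_paths(root, limit=WINDOWS_MAX_PATH, target_prefix=""):
    """List (length, relative path) pairs that reach or exceed limit.

    The length is measured as if the contents of root were placed under
    target_prefix on the destination machine.
    """
    root = Path(root)
    too_long = []
    for path in sorted(root.rglob("*")):
        relative = path.relative_to(root).as_posix()
        full_length = len(target_prefix) + len(relative)
        # Treat the limit itself as too long, to stay on the safe side.
        if full_length >= limit:
            too_long.append((full_length, relative))
    return too_long


if __name__ == "__main__":
    # Hypothetical destination folder; replace with the real one.
    for length, relative in find_long_paths(".", target_prefix="C:/Users/someone/OneDrive/"):
        print(length, relative)
```

Any paths reported are candidates for shortening, for example by removing folder names that are repeated in file names.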

Additional advice on file formats, organisation, versioning and quality control for data, including its application to qualitative data (transcription) and digitisation, is available from the UK Data Service and the Records Management team at the University of Edinburgh.

Please don't hesitate to contact the RDM team for further information and advice.