Is your data safe?

Is your data safe? was one of the themes of the recent exhibition in the Library to mark this year's 40th anniversary of the computer service within the University. In the exhibition, this topic was limited to displaying storage media ranging from the early days of computing within the University to the present, most of which - although they were in use not very many years ago - can no longer be read on modern computers.

[Image: miscellaneous storage media of various ages: floppy disks, tapes and cartridges, a CD, paper tape, a memory stick and punched cards]

Closely related to storage media are other technical issues, such as the hardware and software that are used to produce computer files. Many of us will have encountered difficulties trying to open files that were created on an older computer in a previous version of the software we are using today. Some of us may have found that the software package originally used has become obsolete and incompatible with today's computers and that, although the files exist on a storage medium that can still be read, it is impossible to open them.

In addition to technical issues, the safety of data will be jeopardised if it is improperly managed, whether through bad planning, shortage of resources, or lack of awareness or knowledge of what to do. Management issues also include arrangements to maintain data after its creation and the proper provision of metadata.

In this and the following two issues of the LIS Newsletter we will elaborate on the broad areas mentioned above. We will publish a number of case studies of projects around the UK whose data was, owing to various factors, either lost or difficult and expensive to restore. We include these examples in our Newsletter because the problems they describe are generic: we can learn from them and draw conclusions for our own work. These case studies are reproduced by kind permission of the Digital Preservation Coalition (http://www.dpconline.org).

This month's case studies provide examples of:

1 Newham Museum Archaeological Service

Newham Museum Archaeological Service was closed down in 1998. Its digital archive was passed to the Archaeology Data Service (ADS) by the London Borough of Redbridge. The archive represents some 10 years of fieldwork and incorporates the work of other units that had previously been closed including those associated with the Passmore Edwards Museum and the Manor Valley Museum. The archive as delivered consists of about 230 floppy disks containing over 6000 files totalling over 130 MB of data. The files were in a variety of proprietary software formats and versions, some of which are now 'archaic'.

This case serves as an example of organisational failure leading to loss. It should be noted that, strictly speaking, none of the data has actually been lost. Indeed, the first priority for action by the ADS was to migrate all files to the ADS file server, where they are included in the general backup strategy and are hence safely preserved. However, in terms of access and use, parts of the files have become inaccessible. This might better be described as 'information loss' rather than 'data loss'. Some, if not all, of the inaccessible information could be recovered, but no resources are available to do so. Those parts of the data which were recovered successfully are online at:

http://ads.ahds.ac.uk/catalogue/projArch/newham/newham_intro.cfm

For a published discussion of this case, see: Austin T, Robinson DJ & Westcott KA (2001) "A digital future for our Excavated Past" in Z Stancic and T Veljanovski (eds) Computing Archaeology for Understanding the Past: CAA 2000, BAR International Series 931, Archaeopress, Oxford, pp 289-296.
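The Newham archive arrived as thousands of files in a mixture of formats, and the first priority was to copy them somewhere safe. As an illustration only, and not a description of how the ADS actually works, the following minimal Python sketch shows one way such a delivery might be inventoried once it has been copied to a directory. All function names here are hypothetical. It records a checksum for every file, so that a later copy can be compared with the original, and counts files by extension as a rough guide to the formats present.

```python
import hashlib
from collections import Counter
from pathlib import Path


def build_manifest(root):
    """Return (relative path, size in bytes, SHA-256 digest) for each file under root."""
    root = Path(root)
    manifest = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Reading whole files is fine for floppy-sized data; stream larger files.
        data = path.read_bytes()
        manifest.append(
            (path.relative_to(root).as_posix(), len(data), hashlib.sha256(data).hexdigest())
        )
    return manifest


def count_by_extension(manifest):
    """Count files per lower-cased extension (a rough hint at the formats present)."""
    return Counter(
        Path(name).suffix.lower() or "(none)" for name, _size, _digest in manifest
    )


def changed_files(old_manifest, new_manifest):
    """List paths from old_manifest that are missing or have a different checksum."""
    new_digests = {name: digest for name, _size, digest in new_manifest}
    return [
        name for name, _size, digest in old_manifest if new_digests.get(name) != digest
    ]
```

Running build_manifest on the original delivery and again on each later copy, then passing both results to changed_files, would show whether any file has been altered or dropped along the way. A file extension is only a hint: it does not reveal which version of a package wrote the file, which is exactly the kind of metadata that needs to be recorded separately.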

2 Rescue of Complete Archaeological Projects (RECAP)

The ADS is also involved in the RECAP project, which is attempting to recover data from some of the major excavations sponsored by English Heritage over the last twenty years. RECAP looks at 18 projects, a small sample, which range from the archaeology of the whole of Lincoln to smaller, more compact archives. The results are a mixed bag because the projects were diverse. More importantly, digital data was not seen as an important component of the project plans until after the projects had been completed: the RECAP data was never intended for public consumption, but was largely a by-product of the analysis and publication process. On top of this, the RECAP data is now missing, so there are two tasks: first to find out where it is, and second to find out what state it is in and whether it can be recovered for access.

3 Archaeological Records of Europe - Networked Access (ARENA)

The Tarraconensis project, from the 1980s, was an archaeological landscape survey (see: http://ads.ahds.ac.uk/catalogue/projArch/tarra_var_2003/). The record consisted of some very old databases, which the depositor updated. Because the databases were only useful alongside other data sets that existed in paper form, they became part of a larger digitisation project. All the maps for the landscape survey were in Aldus Freehand v2 for the Apple Mac. This software needed a separate package with an 'export' function to save its data in other formats, and the ADS was unable to find that package. To reinstate the data, therefore, they had to scan plans from the project publication itself.

The Tarraconensis project illustrates a combination of two factors. First, the ADS was unable to find the right version of obsolete software without expending great effort and resource in the search. Second, data loss was avoided only because the paper originals from an initial scanning exercise still existed and could be re-scanned.

http://ads.ahds.ac.uk/arena/

There is a short description of this (and other issues arising from ARENA) in: Kenny J and Austin T (2004) "Data preservation: Exploring the 'rescue' role of the Archaeology Data Service" in Content Management Focus vol 3 issue 5, pp 25-29.