SCAA - Saskatchewan
Council for Archives and Archivists



Electronic records: Basic concepts in preservation and access

by Jeff O'Brien
SCA Outreach Archivist
February, 1998

I. Stating the Problem

The management and preservation of electronic records is perhaps the most critical issue affecting archivists and other information professionals today. While it does not seem as if the much-touted "paperless office" will become a reality anytime soon, most records today are created electronically, and an increasingly larger percentage of them are maintained that way throughout their existence.

Put simply, electronic records are physically and intellectually fragile. They are easy to manipulate (creating questions about reliability and authenticity) and can be rendered unusable not only through physical degradation of the medium upon which they are created but also through changes in available hardware and software.

Physically, electronic media are less stable than their various predecessors (paper, parchment, clay tablets, microfilm, etc.) with commensurately shorter lifespans. Magnetic tape and computer disks can deteriorate to the point of unreadability within 30 years. Optical media are slightly more durable, but neither comes close to matching the stability of acid-free paper or a clay tablet. The following table provides an indication of relative lifespans versus information storage densities. As it shows, there is a price to be paid for the advances we have made in our ability to store information.

The Dilemma of Modern Media: Information Density v. Life Expectancy

[Figure: information density vs. lifespan of record media]

(from Paul Conway, Preservation in the Digital World, The Commission on Preservation and Access, March 1996, p. 4)


A more compelling concern is that of hardware and software obsolescence and incompatibility. The computer industry has changed dramatically in the last 50 years, and the rate at which changes are introduced is itself increasing. Punch cards, once the mainstay of automated information storage systems, are an excellent example. In the absence of the proper reader, a computer punch card is not a record; it is a bookmark. Similarly, in the absence of a program that reads HTML (HyperText Markup Language, the coding language of Web pages) and the proper computer monitor, the formatting, links, images and other characteristics that are part of the information conveyed by the document that you are presently reading would be inaccessible.

Other examples abound. If the computer at one's workstation lacks a 5.25 inch drive, one cannot read information stored on disks of that format. The 6.0 version of a word processor will not read files created on the newer 7.0 version. A web site with unmaintained hypertext links loses some of its functionality and its ability to present information in the way it was designed. So when we talk about the preservation of electronic records, we are referring to them both as physical objects and as intellectual concepts. If either side of the equation is ignored, the records cease to be comprehensible. If it cannot be read, it is not information.

While some electronic records (e-mail and word processor documents, for example) can be printed to paper and preserved that way, an increasing percentage of the electronic records we now create cannot be printed out without destroying most of their meaning. For example, while individual entries in a database could conceivably be printed to paper, the system as a whole, with the inter-relationships between information and the different "views" accorded different users, would be completely lost. The challenge has been for archivists to discover ways to effectively preserve and provide access to electronic information.

In the Life Cycle approach, records go through three distinct stages: Active (when they are being used by their creators); Semi-active (when they are no longer being used but might still need to be consulted from time to time); and Inactive, when they are of no further value to the creating organization. It is at this point, somewhere between the Records Centre and the garbage can, that Archivists intercede, rescuing those few records with long-term significance and preserving them for historical and cultural reasons that can be quite different from the reasons for which they were created.

It should be obvious that to treat electronic records in this way is to spell their certain doom. To begin with, electronic records do not necessarily abide by the clear-cut line between the three stages. At what point does the information in a database become Semi-Active, for example? Or does information which is no longer used merely get deleted and lost forever? Similarly, how is electronic mail managed to ensure that it is kept beyond the immediate needs of the receiver? More importantly, what happens to electronic records which are no longer maintained in a current operating environment but instead are saved to disk, say, and stuffed in a drawer? It is one thing to discover a box of files that someone stored in a closet 20 years ago, but it is quite another to discover a box of computer disks of the same vintage. The chances are that the information they contain will not be salvageable without a great deal of expense and effort, if at all.

Thus the single most important key to long-term preservation of electronic records is the active involvement of Archivists - or at least an awareness of and respect for archival concerns - from the moment a record or record-keeping system is conceived. Only then is there any chance that a record will be created which will continue to be accessible and maintain the necessary qualities of authenticity and reliability beyond the time that it is being used for current business purposes.

II. Some Solutions - a short introduction

A great deal has been written about the problems of dealing with electronic records in archives and in the context of current administrative affairs, some of which can be located through the Links page and Bibliography links at the bottom of this document. Perhaps the most sage advice, however, is that of the Australian Greg O'Shea, who said "Records is records", meaning that the essential qualities of records are the same regardless of format. With electronic records the problems become ones of methodology and procedure: given a vulnerable storage medium, how does one ensure that the information held in records stored electronically remains accessible over time and how does one ensure that it provides reliable evidence of the actions it documents?

As suggested earlier, it is vital that long-term preservation issues be discussed from the moment the record or record-keeping system is conceived. The first priority must be the setting up of formal records schedules which identify the records in an unambiguous fashion and provide retention and disposition instructions regarding them. It is impossible to implement long-term preservation measures for records without knowing what records are to be preserved. When scheduling paper records the focus seems to be on deciding what can be destroyed, and when. The surviving material, slated for permanent preservation, will, if properly stored, pretty much look after itself. Those dealing with electronic records, however, must focus their attention on the records which need to be kept. There is no such thing as "benign neglect" in the world of electronic records management.
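As an illustration only, a records schedule entry of the kind described above might be modelled as follows. This is a minimal sketch: the field names, the series identifier and the wording of the disposition instruction are hypothetical and are not drawn from any actual schedule.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ScheduleEntry:
    """One line of a (hypothetical) records schedule."""
    series_id: str        # unambiguous identifier for the records series
    title: str            # plain-language description of the series
    retention_years: int  # years to retain after the record is closed
    disposition: str      # e.g. "destroy" or "transfer to archives"


def disposition_date(entry: ScheduleEntry, closed: date) -> date:
    """Return the date on which the disposition instruction takes effect."""
    target_year = closed.year + entry.retention_years
    try:
        return closed.replace(year=target_year)
    except ValueError:
        # Closed on 29 February and the target year is not a leap year.
        return closed.replace(year=target_year, day=28)


def action_due(entry: ScheduleEntry, closed: date, today: date) -> str:
    """Return the scheduled disposition if it is due, otherwise 'retain'."""
    if today >= disposition_date(entry, closed):
        return entry.disposition
    return "retain"
```

The point of the sketch is that the schedule, not the storage medium, decides which records must be actively looked after and until when.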

The physical storage environment for electronic records is very similar to that of other record media: temperature and humidity have to be controlled (59 degrees Fahrenheit with an RH of 40% is ideal) as well as dust, airborne pollutants, pests and other factors which damage or destroy record material. Magnetic tape (including video and audiotape) should be rewound under constant tension after processing and at regular intervals thereafter. Perhaps most importantly, a statistical sample of all electronic records in storage should be read annually in order to identify real or impending catastrophic information loss.
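The annual sample reading described above could be sketched roughly as follows. The sketch assumes, purely for illustration, that a checksum was recorded for each record when it went into storage; the use of SHA-256, the manifest structure and the function names are assumptions and are not part of the original recommendation.

```python
import hashlib
import random


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of a record's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def audit_sample(holdings, manifest, sample_size, seed=None):
    """Read a random sample of stored records and report any that fail.

    holdings: dict mapping record id -> bytes as currently read from the medium
    manifest: dict mapping record id -> digest recorded when the record was stored
    Returns (ids sampled, ids that are missing or no longer match the manifest).
    """
    rng = random.Random(seed)
    ids = sorted(manifest)
    chosen = rng.sample(ids, min(sample_size, len(ids)))
    failures = []
    for record_id in chosen:
        data = holdings.get(record_id)
        if data is None or sha256_of(data) != manifest[record_id]:
            failures.append(record_id)
    return chosen, failures
```

A failure in the sample is a signal to examine the rest of the holdings stored on the same medium, not proof that only the sampled record is damaged.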

The physical well-being of electronic records is of little importance if they fall victim to hardware or software obsolescence. Although some companies are advertising CD-ROM disks that will last 100 years, it is a sure bet that the hardware and software necessary to read them will not be available at that time. Paper and other "eye-readable" records can be arranged, described, processed, boxed and then pretty much left alone until someone wants to look at them again regardless of whether that happens ten - or a hundred - years down the road. Electronic records require constant vigilance and regular maintenance throughout their lives to ensure that they continue to be accessible.

According to Dr. Charles Dollar, in order to maintain this accessibility it is necessary to address three areas: readability, retrievability and intelligibility.

Readability means that the information can be read and manipulated by hardware other than that for which it was originally created or on which it is presently stored. "Typically," suggests Dollar, "non readability involves some aspect of an older storage device (a tape or disk) that makes it physically incompatible with existing equipment. This is generally called hardware obsolescence." Retrievability is the capacity of specific information ("identifiable records or parts of records") to be retrieved. "Ensuring the retrievability of records requires the functionalities of the original operating system or device driver - which are also likely to become obsolete." (Charles Dollar, "Archivists and Records Managers in the Information Age," Archivaria 36 (Autumn 1993), p. 46). He adds: "Unless there are built-in migration paths, or newer generations of the software offer backwards compatibility to older versions of the software, access to the records will be impossible." (Ibid.)

Finally, in order to continue to be accessible the records must also be intelligible to those using them:

At the simplest level [intelligibility] operates when two computer systems either use or understand the same digital representation of the information and this representation is translated into a form that humans recognize and understand. An ASCII text file is the best example of this level of intelligibility. The second level occurs when two computer systems can use or understand the same representation of the information (ASCII) but when the representation is presented to users it does not carry sufficient information (i.e., it is not self-referential) for a human to understand its content. Usually this problem is associated with both coded and numeric data, and the intelligibility of such information can be assured only by the use of documentation defining the values presented by the numbers and codes. The third level of intelligibility occurs when two different software applications functioning in two different computing environments can process the same digital data with the same results. This is particularly difficult for digital images, where proprietary image file headers and compression techniques are used. Digital data, and in particular digital images, that can only be processed within a specific proprietary environment are especially vulnerable as this environment becomes obsolete. (Ibid.)
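The second level of intelligibility can be illustrated with a small sketch. The line of ASCII text below is perfectly readable by any system, but its codes mean nothing without the accompanying documentation. The field layout, the codes and their meanings are all invented for this example.

```python
# Hypothetical codebook: the documentation that gives coded values their meaning.
CODEBOOK = {
    "status": {"1": "active", "2": "semi-active", "3": "inactive"},
    "department": {"07": "Finance", "12": "Public Works"},
}

# Hypothetical field layout for one line of a comma-delimited ASCII file.
FIELDS = ["file_number", "status", "department"]


def decode_row(line, fields=FIELDS, codebook=CODEBOOK):
    """Translate one coded ASCII line into values a human can understand."""
    values = line.strip().split(",")
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} fields, found {len(values)}")
    decoded = {}
    for name, raw in zip(fields, values):
        table = codebook.get(name)
        if table is None:
            decoded[name] = raw  # uncoded field: keep the value as written
        else:
            decoded[name] = table.get(raw, f"undocumented code {raw!r}")
    return decoded
```

If the codebook is lost, the file remains readable and retrievable in Dollar's terms, but "2" and "07" can no longer be interpreted.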

Perhaps the most common tools for maintaining the long-term viability of electronic records are data conversion, migration and re-copying. Most electronic records administrators are familiar with these procedures and quite a bit has been written about them. Essentially they involve moving information from older software and/or hardware platforms on to newer ones, ensuring that data will not be lost due to technological changes. This can be a time-consuming and costly process, however, depending on the volume of material and the frequency of re-copying. As well, while readability can be maintained this way, retrievability and intelligibility may not. Dollar suggests that international standards "that support interoperability and upward migration paths across technology generations" will be part of the answer. (Ibid.)
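A very simple case of data conversion is moving text from a legacy character encoding to a current one. The sketch below assumes, for illustration only, a record stored in EBCDIC (Python's "cp037" codec) that is to be converted to UTF-8; the function name and the choice of encodings are assumptions, and real conversions of word processor or database formats are far more involved.

```python
def convert_text(legacy_bytes: bytes, source_encoding: str,
                 target_encoding: str = "utf-8") -> bytes:
    """Convert text from a legacy encoding to a newer one, with a check.

    Decoding is strict, so bytes that are invalid in the source encoding
    raise an error instead of being silently replaced.
    """
    text = legacy_bytes.decode(source_encoding)
    # Round-trip check: re-encoding must reproduce the original bytes exactly,
    # otherwise something was lost or altered in the conversion.
    if text.encode(source_encoding) != legacy_bytes:
        raise ValueError("round-trip check failed; conversion is not lossless")
    return text.encode(target_encoding)


# Usage: simulate a legacy EBCDIC record and convert it.
legacy = "Minutes of meeting, 1975".encode("cp037")
converted = convert_text(legacy, "cp037")
```

The round-trip check speaks only to readability. As noted above, retrievability and intelligibility have to be safeguarded by other means.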

Yet another area which must be addressed both in current user environments and in the long-term storage of electronic records is that of authenticity. Is the record as it presently exists precisely the same record as was originally created? Dollar suggests three ways to help protect authenticity: "secure client-server architecture" (in which the user only has "read-access" to the records); digital signatures, which use digital "hash digests" to indicate if a record has been altered in any way; and digital "time-stamping", which also generates a hash digest indicating if a record has been changed. With digital time-stamping, "A recipient either next week or a hundred years from now could validate [the record's] authenticity" using a "public verification key" which, like a digital signature, allows users to tell if records have been changed in any way (Dollar, p. 43).
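The idea behind a hash digest can be shown in a few lines. The sketch below uses SHA-256 from Python's standard library, which is an assumption made for illustration; Dollar does not name a particular algorithm. It shows only the digest comparison and is not a full digital signature or time-stamping scheme, both of which also involve cryptographic keys and a trusted record of when the digest was made.

```python
import hashlib
import hmac


def digest(record: bytes) -> str:
    """Return the SHA-256 hash digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()


def is_unaltered(record: bytes, recorded_digest: str) -> bool:
    """Compare a record's current digest with the one recorded earlier."""
    return hmac.compare_digest(digest(record), recorded_digest)


# Usage: record a digest at creation, then check the record later.
original = b"Council approved the 1998 budget on February 3."
stored_digest = digest(original)
altered = b"Council approved the 1998 budget on February 4."
```

Changing a single character of the record produces a different digest, so `is_unaltered(altered, stored_digest)` is false while `is_unaltered(original, stored_digest)` is true.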

Electronic records management is a complex subject with few easy answers. As records creating bodies turn more and more to electronic means for the creation, distribution and storage of their records, the need to devise effective strategies for ensuring the authenticity and long-term preservation of this material becomes even more critical. Archivists are wont to claim that their collections hold "the collective memory of society". If this is true, then we face the prospect of a possible world-wide epidemic of electronic senile dementia, with legal, financial and administrative ramifications which could have serious consequences for all who make and use records. Yet, as we have seen, answers do exist.

Planning and vigilance are the mainstays of a successful electronic records management system; the old saying that "An ounce of prevention is worth a pound of cure" is nowhere more applicable than in the care and maintenance of electronic records. Records are created in the context of human interaction and are kept to provide evidence of that interaction. Thus the basic lessons of records management apply to all records, regardless of form or medium: records are created to support the functions of their creators and must continue to do so for as long as is necessary. Although it would seem to go without saying, it is all too easy to forget that information which is no longer accessible ceases to be information. In the long run, neglect, apathy and poor planning are just as destructive to electronic records as a can of gasoline and a lit match.

© 2001 Saskatchewan Council for Archives and Archivists.
All Rights Reserved