Electronic records: Basic concepts in preservation and access
by Jeff O'Brien
SCA Outreach Archivist
February, 1998
I. Stating the Problem
The management and preservation of electronic records is perhaps the most
critical issue affecting archivists and other information professionals
today. While it does not seem as if the much-touted "paperless office"
will become a reality anytime soon, most records today are created
electronically, and an increasingly larger percentage of them are maintained
that way throughout their existence.
Put simply, electronic records are physically and intellectually fragile.
They are easy to manipulate (creating questions about reliability and authenticity)
and can be rendered unusable not only through physical degradation of the medium
upon which they are created but through changes in available hardware and
software.
Physically, electronic media are less stable than their various predecessors
(paper, parchment, clay tablets, microfilm, etc.), with commensurately shorter lifespans.
Magnetic tape and computer disks can deteriorate to the point of unreadability
within 30 years. Optical media are slightly more durable, but neither comes
close to matching the stability of acid-free paper or a clay tablet. The
following table provides an indication of relative lifespans versus information
storage densities. As it shows, there is a price to be paid for the advances
we have made in our ability to store information.
[Figure: The Dilemma of Modern Media: Information Density v. Life Expectancy]
(from Paul Conway, Preservation in the Digital World,
The Commission for Preservation and Access, March 1996, p. 4)
A more compelling concern is that of hardware and software obsolescence and
incompatibility. The computer industry has changed dramatically in the last
50 years, and the rate at which changes are introduced is itself increasing.
Punch cards, once the mainstay of automated information storage systems, are
an excellent example. In the absence of the proper reader, a computer punch
card is not a record; it is a bookmark. Similarly, in the absence of a program
that reads HTML (HyperText Markup Language, the coding language of
Web pages) and the proper computer monitor, the formatting, links, images
and other characteristics that are part of the information conveyed by
the document that you are presently reading would be inaccessible.
Other examples abound. If the computer at one's workstation lacks a 5.25-inch
drive, one cannot read information stored on disks of that format. The 6.0
version of a word processor will not read files created on the newer 7.0
version. A web site with unmaintained hypertext links loses some of its
functionality and its ability to present information in the way it was designed.
So when we talk about the preservation of electronic records we are referring
to them both as physical objects and as intellectual concepts. If either
side of the equation is ignored the records cease to be comprehensible. If it
cannot be read, it is not information.
While some electronic records (e-mail and word processor documents, for example) can
be printed to paper and preserved that way, an increasing percentage of the
electronic records we now create cannot be printed out without destroying
most of their meaning. For example, while individual entries in a database
could conceivably be printed to paper, the system as a whole with the
inter-relationships between information and the different "views" accorded
different users would be completely lost. The challenge has been for archivists
to discover ways to effectively preserve and provide access to electronic
information.
In the Life Cycle approach, records go through three distinct stages: Active
(when they are being used by their creators); Semi-active (when they are no
longer being used but might still need to be consulted from time to time);
and Inactive, when they are of no further value to the creating organization.
It is at this point, somewhere between the Records Centre and the garbage can, that
Archivists intercede, rescuing those few records with long-term significance
and preserving them for historical and cultural reasons that can be quite different from
the reasons for which they were created.
It should be obvious that to treat electronic records in this way is to spell
their certain doom. To begin with, electronic records do not necessarily abide
by the clear-cut line between the three stages. At what point does the information
in a database become Semi-Active, for example? Or does information which is no
longer used merely get deleted and lost forever? Similarly, how is electronic mail
managed to ensure that it is kept beyond the immediate
needs of the receiver? More importantly, what happens to electronic records which
are no longer maintained in a current operating environment but instead are saved
to disk, say, and stuffed in a drawer? It is one
thing to discover a box of files that someone stored in a closet 20 years
ago, but it is quite another to discover a box of computer disks of the same
vintage. The chances are that the information they contain will not be salvageable
without a great deal of expense and effort, if at all.
Thus the single most important key to long-term preservation of electronic records
is the active involvement of Archivists - or at least an awareness of and respect for archival
concerns - from the moment a record or record-keeping system is conceived. Only then is
there any chance that a record will be created which will continue to be accessible and
maintain the necessary qualities of authenticity and reliability beyond the time that it
is being used for current business purposes.
II. Some Solutions - a short introduction
A great deal has been written about the problems of dealing with electronic
records in archives and in the context of current administrative affairs, some
of which can be located through the Links page and Bibliography links at the
bottom of this document. Perhaps the most sage advice, however, is that of
the Australian Greg O'Shea, who said "Records is records", meaning that the
essential qualities of records are the same regardless of format. With electronic
records the problems become ones of methodology and procedure: given a vulnerable
storage medium, how does one ensure that the information held in records stored
electronically remains accessible over time and how does one ensure that it
provides reliable evidence of the actions it documents?
As suggested earlier, it is vital that long-term preservation issues be discussed
from the moment the record or record-keeping system is conceived. The first priority
must be the setting up of formal records schedules which identify the records in
an unambiguous fashion and provide retention and disposition instructions regarding
them. It is impossible to implement long-term preservation measures for records
without knowing what records are to be preserved. When scheduling paper records the
focus seems to be on deciding what can be destroyed, and when. The surviving material,
slated for permanent preservation, will if properly stored pretty much look after
itself. Those dealing with electronic records, however, must focus their attention
on the records which need to be kept. There is no such thing as "benign neglect" in
the world of electronic records management.
The physical storage environment for electronic records is very similar to that of
other record media: temperature and humidity have to be controlled (59 degrees
Fahrenheit with an RH of 40% is ideal) as well as dust, airborne pollutants, pests and
other factors which damage or destroy record material. Magnetic tape (including
video and audiotape) should be rewound under constant tension after processing and
at regular intervals thereafter. Perhaps most importantly, a statistical sample of
all electronic records in storage should be read annually in order to identify real
or impending catastrophic information loss.
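The annual sample reading described above lends itself to automation. What follows is only a minimal sketch in Python, not a prescribed procedure. It assumes that a checksum (here SHA-256) was recorded for each file when it was placed in storage; the names sha256_of and audit_sample, the manifest mapping and the 10% default sample are all hypothetical choices made for illustration.

```python
import hashlib
import random
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_sample(manifest: dict, root: Path, fraction: float = 0.1,
                 seed: Optional[int] = None) -> list:
    """Read a random sample of stored files and return the names that fail.

    manifest maps a file name (relative to root) to the digest recorded
    at the time of storage (an assumed convention, not from the article).
    A file fails if it cannot be read or if its digest no longer matches.
    """
    names = sorted(manifest)
    if not names:
        return []
    sample_size = min(len(names), max(1, round(len(names) * fraction)))
    failures = []
    for name in random.Random(seed).sample(names, sample_size):
        try:
            intact = sha256_of(root / name) == manifest[name]
        except OSError:
            intact = False  # missing or unreadable file counts as a failure
        if not intact:
            failures.append(name)
    return failures
```

A non-empty result from audit_sample would be the signal to inspect the rest of the holdings on the same medium, since a sample can only reveal problems in the files it happens to read.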
The physical well-being of electronic records is of little importance if they fall
victim to hardware or software obsolescence. Although some companies are advertising
CD-ROM disks that will last 100 years, it is a sure bet that the hardware and software
necessary to read them will not be available at that time. Paper and other "eye-readable"
records can be arranged, described, processed, boxed and then pretty much left alone
until someone wants to look at them again regardless of whether that happens ten - or
a hundred - years down the road. Electronic records require constant vigilance and
regular maintenance throughout their lives to ensure that they continue to be accessible.
According to Dr. Charles Dollar, in order to maintain this accessibility it is necessary
to address three areas: readability, retrievability and intelligibility.
Readability means that the information can be read and manipulated by hardware other than that
for which it was originally created or on which it is presently stored. "Typically,"
suggests Dollar, "non readability involves some aspect of an older storage device (a tape or
disk) that makes it physically incompatible with existing equipment. This is
generally called hardware obsolescence." Retrievability is the capacity of specific
information ("identifiable records or parts of records") to be retrieved. "Ensuring
the retrievability of records requires the functionalities of the original operating
system or device driver - which are also likely to become obsolete." (Charles Dollar,
"Archivists and Records Managers in the Information Age," Archivaria 36 (Autumn
1993), p. 46). He adds: "Unless there are built-in migration paths, or newer generations of the software offer
backwards compatibility to older versions of the software, access to the records will
be impossible." (Ibid.)
Finally, in order to continue to be accessible the records must also be intelligible to
those using them:
At the simplest level [intelligibility] operates when two computer systems either
use or understand the same digital representation of the information and this
representation is translated into a form that humans recognize and understand. An
ASCII text file is the best example of this level of intelligibility. The second
level occurs when two computer systems can use or understand the same representation
of the information (ASCII) but when the representation is presented to users it
does not carry sufficient information (i.e., it is not self-referential) for a human
to understand its content. Usually this problem is associated with both coded and
numeric data, and the intelligibility of such information can be assured only by the
use of documentation defining the values presented by the numbers and codes. The third
level of intelligibility occurs when two different software applications functioning
in two different computing environments can process the same digital data with the
same results. This is particularly difficult for digital images, where proprietary
image file headers and compression techniques are used. Digital data, and in
particular digital images, that can only be processed within a specific proprietary
environment are especially vulnerable as this environment becomes obsolete. (Ibid.)
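Dollar's second level can be illustrated with a small example. The sketch below, in Python, is hypothetical throughout: the field names, the codes and the CODEBOOK structure are invented for illustration and do not come from Dollar's article. It shows a plain ASCII data file that any system can read but that means nothing to a human until the accompanying documentation is applied.

```python
import csv
import io

# Hypothetical documentation ("codebook") preserved alongside the data file.
CODEBOOK = {
    "status": {"1": "active", "2": "semi-active", "3": "inactive"},
    "medium": {"P": "paper", "T": "magnetic tape", "O": "optical disk"},
}

# The data itself: readable ASCII, but not intelligible on its own.
RAW_DATA = "record_id,status,medium\n1001,2,T\n1002,3,P\n"


def decode_rows(raw_text, codebook):
    """Replace coded values with their documented meanings.

    Values in fields that have no codebook entry (or codes that are
    not documented) are passed through unchanged.
    """
    decoded = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        decoded.append({
            field: codebook.get(field, {}).get(value, value)
            for field, value in row.items()
        })
    return decoded


for row in decode_rows(RAW_DATA, CODEBOOK):
    print(row)
```

Without CODEBOOK the values "2" and "T" survive perfectly well as bits, which is the point: the documentation has to be scheduled and preserved together with the data it explains.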
Perhaps the most common tools for maintaining the long-term viability of electronic
records are data conversion, migration and re-copying.
Most electronic records administrators are familiar with these procedures and quite
a bit has been written about them. Essentially they involve moving information from
older software and/or hardware platforms onto newer ones, ensuring that data will
not be lost due to technological changes. This can be a time-consuming and costly
process, however, depending on the volume of material and the frequency of re-copying.
As well, while readability can be maintained this way, retrievability and
intelligibility may not. Dollar suggests that international standards "that support
interoperability and upward migration paths across technology generations" will be
part of the answer. (Ibid.)
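As one small illustration of data conversion, the sketch below re-encodes text from an older character encoding into a current one and records a digest of both versions, so that the conversion itself is documented. It is a simplified example under stated assumptions: the function name convert_text and the shape of the returned record are hypothetical, and EBCDIC (Python's "cp037" codec) stands in for whatever legacy encoding a real collection might hold.

```python
import hashlib


def convert_text(raw: bytes, source_encoding: str,
                 target_encoding: str = "utf-8") -> dict:
    """Re-encode text and return the result with a simple audit record.

    Decoding raises UnicodeDecodeError if the bytes are not valid in the
    source encoding, which is preferable to silently writing bad data.
    """
    text = raw.decode(source_encoding)
    converted = text.encode(target_encoding)
    return {
        "data": converted,
        "source_encoding": source_encoding,
        "target_encoding": target_encoding,
        "source_sha256": hashlib.sha256(raw).hexdigest(),
        "target_sha256": hashlib.sha256(converted).hexdigest(),
    }


# Bytes as they might sit on an old mainframe tape (EBCDIC, code page 037).
legacy_bytes = "RETENTION SCHEDULE 42".encode("cp037")
result = convert_text(legacy_bytes, "cp037")
print(result["data"].decode("utf-8"))
```

A conversion of this kind preserves readability only. Whether the converted file remains retrievable and intelligible depends on everything around it, such as indexes, file structures and documentation, which is the limitation noted above.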
Yet another area which must be addressed both in current user environments and in the
long-term storage of electronic records is that of authenticity. Is the record as it
presently exists precisely the same record as was originally created? Dollar suggests
three ways to help protect authenticity: "secure client-server architecture" (in which the
user only has "read-access" to the records); digital signatures, which use digital
"hash digests" to indicate if a record has been altered in any way; and
digital "time-stamping", which also generates a hash digest indicating
if a record has been changed. With digital time-stamping, "A recipient either
next week or a hundred years from now could validate [the record's] authenticity" using a
"public verification key" which, like a digital signature, allows users to tell if
records have been changed in any way. (Dollar, p. 43).
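The idea of a hash digest can be shown in a few lines. The Python sketch below is an illustration only: it uses SHA-256, a current algorithm that postdates Dollar's article, and the function names are hypothetical. A bare digest is also not a digital signature or a time-stamp in itself; those schemes additionally bind the digest to a key or to a point in time, which is not shown here.

```python
import hashlib
import hmac


def digest(record: bytes) -> str:
    """Return the SHA-256 hash digest of a record as a hex string."""
    return hashlib.sha256(record).hexdigest()


def is_unaltered(record: bytes, stored_digest: str) -> bool:
    """Check a record against the digest stored when it was created."""
    return hmac.compare_digest(digest(record), stored_digest)


original = b"Minutes of the Board, 12 February 1998"
stored = digest(original)  # kept separately from the record itself

print(is_unaltered(original, stored))                       # same bytes
print(is_unaltered(original.replace(b"12", b"13"), stored))  # one date changed
```

Changing even a single character of the record produces a different digest, so the comparison fails. The check is only as trustworthy as the place where the original digest is kept, which is why signatures and time-stamping services are layered on top of it.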
Electronic records management is a complex subject with few easy answers. As
records creating bodies turn more and more to electronic means for the creation,
distribution and storage of their records, the need to devise effective strategies
for ensuring the authenticity and long-term preservation of this material becomes
even more critical. Archivists are wont to claim that their collections hold "the
collective memory of society". If this is true then we face the prospect of a
possible world-wide epidemic of electronic senile dementia, with legal,
financial and administrative ramifications which could have serious consequences
for all who make and use records. Yet, as we have seen, answers do exist.
Planning and vigilance are the mainstays of a successful electronic records management
system; the old saying that "An ounce of prevention is worth a pound of cure" is
nowhere more applicable than in the care and maintenance of electronic records.
Records are created in the context of human interaction and are kept to provide
evidence of that interaction. Thus the basic lessons of records management
apply to all records, regardless of form or medium: records are created to support
the functions of their creators and must continue to do so for as long as is
necessary. Although it would seem to go without saying, it is all too easy to forget
that information which is no longer accessible ceases to be information. In the long
run, neglect, apathy and poor planning are just as destructive to electronic records
as a can of gasoline and a lit match.