31 December 2008

More on Digital Archival Storage

“Bit Preservation: A Solved Problem?” by David Rosenthal discusses the problems with our current understanding of the reliability of data storage systems. He examines the (comical) claims of storage system vendors and of optimistic researchers, shows that those claims are meaningless and untestable, and proposes a new metric: “bit half-life”.

The most abstract model of a bit preservation system is as a black box, into which a string of bits S(0) is placed at time T(0) and from which at subsequent times T(i) a string of bits S(i) can be extracted. The system is successful if S(i) = S(0) for all i.

No real-world system can be perfect and eternal, so real systems will fail. The simplest model of these failures is analogous to the decay of radioactive atoms. Each bit in the string independently is subject to a random process that has a constant small probability per unit time of causing its value to flip. The time after which there is a 50% probability that a bit will flip is the “bit half-life”.
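
To get a feel for what this metric implies, here is a minimal sketch of the decay model in Python. It rests on assumptions that are mine, not the paper's: "flip" is read as "has flipped at least once" (flipping back is ignored), bits decay independently, and the function names and the petabyte-for-a-century figures are purely illustrative.

```python
import math

# Assumption: exponential decay, as with radioactive atoms. A bit with
# half-life H has probability 1 - 2**(-t/H) of having flipped by time t.
# All names below are hypothetical, chosen for this illustration only.

def flip_probability(t: float, half_life: float) -> float:
    """Probability that a single bit has flipped by time t.

    t and half_life must be in the same unit (e.g. years).
    expm1 keeps precision when t is tiny compared to half_life.
    """
    return -math.expm1(-math.log(2) * t / half_life)

def expected_flipped_bits(n_bits: float, t: float, half_life: float) -> float:
    """Expected number of flipped bits among n_bits independent bits."""
    return n_bits * flip_probability(t, half_life)

def half_life_for_target(n_bits: float, t: float, max_expected_flips: float) -> float:
    """Half-life needed so that at most max_expected_flips bits are
    expected to have flipped among n_bits after time t."""
    p = max_expected_flips / n_bits  # tolerated per-bit flip probability
    return -t * math.log(2) / math.log1p(-p)

if __name__ == "__main__":
    # Illustrative numbers only: one petabyte (8e15 bits) kept for 100 years
    # with an expected loss of at most one bit.
    h = half_life_for_target(8e15, 100, 1)
    print(f"required bit half-life: {h:.3e} years")
```

Under these assumptions the required half-life comes out on the order of 10^17 years, far longer than any experiment could hope to measure directly.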


There is no escape from the problem that the size of the data collections to be preserved and the times for which they must be preserved mean that experimental confirmation that the technology chosen is up to the job is not economically feasible. Even if it were, the results would not be available soon enough to be useful. What this argument demonstrates is that, far from bit preservation being a solved problem, it is in a very specific sense an unsolvable problem.
