How can we archive our data long-term?

On 2010-08-13 20:28, Will Honea wrote, on the thread “SuSE 8.2 personal + Old PC + Firefox 3.6?”:

(I’m starting a new thread on this one)

> That leads to a question: where to go for archival storage? I bogged down
> when it came time to decide WHERE to put the data. There have been all sorts
> of discussions over the years on this topic but I have no idea of the
> current state of the art. Anything as reliable as the 1970 era punch
> cards/paper tape yet?

Very good question… Good enough for another thread.

I do not really know.

I remember when I saw the first mention of optical media, for audio, in the Spanish translation of
the magazine “Scientific American”, here called “Investigación y Ciencia”. Some time later they
reported on the modifications done to use CDs to store data. Of course, nothing usable for common
mortals… er, users.

The thing is, the CD was considered then so reliable as to be “eternal”.

Then, I suppose, makers started to cut corners for consumers, and the expected shelf life could be
as short as a decade.

The only reliable archival medium, with a proven record, is ink and paper. A track record of
centuries. Even millennia.

So… could we use paper again for our archives?

Not as text. Dots. Pages full of dots that can be scanned again and converted back to the original code.

Is it doable? What data density could we get per page? Laser or inkjet? Colour/greys/B&W? At what
resolution? How many printer pixels per information dot? Redundancy / forward error correction?

Food for thought, eh?
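To make the idea concrete, here is a toy sketch in Python (everything in it is made up for illustration): each byte becomes eight printed dots plus one even-parity dot, so a single misread dot in a byte can at least be detected. Real forward error correction would need something stronger, such as Reed-Solomon codes.

```python
# Toy dot-grid encoder: '#' = ink dot, '.' = blank.
# 8 data dots + 1 even-parity dot per byte (crude error DETECTION only).
def byte_to_dots(b: int) -> str:
    bits = [(b >> i) & 1 for i in range(7, -1, -1)]  # MSB first
    parity = sum(bits) % 2
    return "".join("#" if x else "." for x in bits + [parity])

def encode_page(data: bytes, cols: int = 8) -> str:
    rows = []
    for i in range(0, len(data), cols):
        rows.append(" ".join(byte_to_dots(b) for b in data[i:i + cols]))
    return "\n".join(rows)

def decode_dots(cell: str) -> int:
    bits = [1 if c == "#" else 0 for c in cell]
    assert sum(bits[:8]) % 2 == bits[8], "parity mismatch: dot misread"
    return sum(bit << (7 - i) for i, bit in enumerate(bits[:8]))

# Round trip: "print", then "scan" the grid back into bytes.
page = encode_page(b"archive me")
recovered = bytes(decode_dots(c) for row in page.splitlines() for c in row.split(" "))
assert recovered == b"archive me"
```

The grid could go to a laser printer and later be recovered with a scanner plus thresholding; the parity dot is the cheapest possible redundancy, one dot per byte.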


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

Carlos E. R. wrote:

> On 2010-08-13 20:28, Will Honea wrote, on the thread “SuSE 8.2 personal +
> Old PC + Firefox 3.6?”:
>
> (I’m starting a new thread on this one)
>
>> That leads to a question: where to go for archival storage? I bogged
>> down when it came time to decide WHERE to put the data. There have been
>> all sorts of discussions over the years on this topic but I have no idea
>> of the
>> current state of the art. Anything as reliable as the 1970 era punch
>> cards/paper tape yet?
>
In the mid-90s I worked for a company (medical databases, medical imaging)
where this was a big issue. The result was using magneto-optical (not CD)
drives and media with a guaranteed lifetime of 30 years for the written
data. Cost per disk, at roughly 1 GB each, was at that time 150 DM (I think
at the beginning it was even more expensive), which was then about 100 US$.
I do not remember the exact specs of the MOs and WORMs used, or the vendor
name, or whether this type of media is still in use (of course the hospitals
will still have the drives to read the data).

But this leads to another interesting question. It is not enough to have
media for long-term storage; you also need to be sure that it can be read
with some device and used after a long time, maybe such a long time that the
companies which made the drives no longer exist.

Martin Helm wrote:

> I do not remember the exact specs of the MO’s
> and WORM’s used or the vendor name and if this type of media is still in
> use

The manufacturer name was plasmon or plassmon or similar.

Martin Helm wrote:
>
> The manufacturer name was plasmon or plassmon or similar.
Found it (just as an example; I think there are also competitors with
similar solutions):
http://www.plasmon.com/media/mo.html
The media are now bigger (9 GB); the guarantee is still 30 years.

But rocks outlive paper lol

Anything requiring electronic gear to read is not “long term storage”, regardless of how long the media lasts, because the electronics capable of reading it will no longer work or exist after 50 or 100 years. The Babylonians showed us what long term storage of information is all about. We, in turn, will be the “lost culture”. Nothing will remain. This is the irony of history: our explosion of information will leave no trace.

vodoo wrote:

>
> Anything requiring electronic gear to read is not “long term storage”,
> regardless how long the media last. Because the electronics capable to
> read it will no longer work or exist after 50 or 100 years. The
> Babylonians showed us what long term storage of information is all
> about. We - in turn - will be the “lost culture”. Nothing will remain.
> This is the irony of history: our explosion of information will leave no
> trace.
>
>
Not really, we still read books printed on paper (I prefer it to ebooks; I
have some ebooks but many more than a thousand printed books, counting only
the ones I read), there are still paintings, there are still things carved in
stone.
Even the data from the beginning of the computer age is still there, copied
over and copied over and copied over: from punchcards and printouts to floppy
disks, tapes, to CDs, to the internet, back to CDs and DVDs, back to the
net.
True, it can be lost if everything breaks down, but the same happens (and
happened in some cases) to the eternal writings set in stone, if the last
person able to read them dies, if the culture and language is lost.
Archived information which is still there but with no way to read it is not
an invention of the 20th and 21st centuries.
This problem has existed for thousands of years.

On 2010-08-13 23:00, Martin Helm wrote:
> Carlos E. R. wrote:

> But this leads to another interesting question. It is not enough to have
> media for long term storage you also need to be sure that it can be read
> with some device and used after a long time, maybe such a long time that the
> companies which made the drives do not even exist.

Yes, that is indeed a problem.

That’s why I keep that old computer: it is the only one where the original software used to create
those backups runs OK. On newer hardware it failed.

NASA has that problem too, on a big scale. They have, for example, lots of data recorded from their
long-distance missions that they cannot read, because the computers that made those tapes no longer
exist. I have the vague recollection that they had to get somebody to recreate equivalent tape
readers, from scratch, connected to some modern computer, and start retrieving data.

Yes, there are two problems: the hardware, and then the encoding format used. There is an open
standard I read about somewhere for long-term archival. I think they have to store the original
documentation, the result (PDF, whatever), the software… and, as the years pass, the new software
to convert to the next standard format, plus the converted data.

But that is perhaps overkill for us plain users.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

Carlos E. R. wrote:

>
> Yes, there are two problems: hardware, and then, encoding format used.
> There is an open standard I read somewhere about for long term archival. I
> think they have to store the original documentation, the result (PDF,
> whatever), the software… and after the years, the new software to
> convert to the next standard format, and the converted data.
>
> But that is an overkill for us plain users, perhaps.
>
From a practical point of view as a user (and I am not rich enough for
expensive solutions), the problem of storage media can in most cases be
solved simply by copying the content to newer media.
At some point in time I copied my floppy disks to the hard disk and burnt
them onto CD (just one example over the years).
The problem with the formats is much more serious, as you mention, and this
is my experience too.
Long ago I made presentations with Harvard Graphics, and some years later I
wanted to reuse one of them as a base for something new. I could not read or
print it with anything I had. OK, it was not that my life depended on it,
but it was more than annoying. Now, again a few years later, I do not care
about those old things.
But there are many things like that.
For the important things, the only way I can handle it (due to limitations
of time, money and whatever else) is to copy it, and not only copy it but
try to convert it to a format which is hopefully usable for the next few
years. Then repeat this process, repeat it again, and so on.
And sometimes I simply print it and keep the hardcopy.
(And a lot of things, to be honest, are simply not worth even this simple
effort.)
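The copy-to-newer-media step is only safe if each copy is verified before the old media is retired. A minimal sketch of that check (the function names are mine, not from any particular tool):

```python
# Compare old and new copies bit-for-bit via SHA-256 before
# retiring the old media.
import hashlib

def sha256_of(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so huge archives do not need to fit in RAM.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_copy(original: str, copy: str) -> bool:
    return sha256_of(original) == sha256_of(copy)
```

If `verify_copy()` returns False, the migration failed and the old media must be kept until a good copy exists.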

Carlos E. R. wrote:

> Yes, there are two problems: hardware, and then, encoding format used. There is an open standard I
> read somewhere about for long term archival. I think they have to store the original documentation,
> the result (PDF, whatever), the software… and after the years, the new software to convert to the
> next standard format, and the converted data.
>
> But that is an overkill for us plain users, perhaps.

Not such overkill as you might think: medical records have been mentioned previously, but there also exist legal documents (even for private people) and tax records, also in the public realm. My desktop machine has a 5.25" floppy, a 3.5" hard-shell floppy, CD/DVD, and, refurbished, a teletype paper-tape reader, a punch-card reader, a drum drive, and Phimon drives, all from the pre-PC era, just so I am not limited as new technology comes out. The interesting thing here is that up until Mandrake 9.1 I had full freedom to access and use any device I wanted. openSUSE, Ubuntu, Fedora, and Mandriva don’t allow me such backward capabilities. The move to scrap the old technology in favor of the more limited new devices is worrisome.

gogalthorp wrote:

>
> But rocks outlive paper lol

And archeology shows that clay tablets are also viable targets :wink:


Will Honea

techwiz03 wrote:

> Not so overkill as you might think, medical records have been
> previously mentioned, but there also exists legal documents (even for
> private people), tax records also in the public realm. My desktop
> machine has a 5.25" floppy, 3.5" hardflopy, CD/DVD, and refurbished:
> teletype paper reader, card punch reader, drum drive, and Phimon drives
> all from pre-PC era just so I am not limited as new technology comes
> out. The interesting thing here is that up until Mandrake 9.1 I could
> have full freedom to access and use any device I wanted. openSUSE,
> Ubuntu, Fedora, and Mandriva don’t allow me such backward capabilities.
> The move to scrap the old technology in favor of the more limited new
> devices is worrisome.

But you miss one key point: with open source, I can insert (and have
inserted) a driver for specific devices. Such efforts are needed for many
pre-IDE disc technologies and other I/O avenues, but your thesis still holds.

The missing element is more one of interpreting, not just “reading”, the
data. Even with a “Rosetta Stone” document, that’s not always viable given
the plethora of proprietary formats/encodings used over the years. And
consider a PGP-encrypted document with no known copy of the key(s)…

If someone has the money to spare, this would be a very good project to
undertake.


Will Honea

Yes, and probably more food for recycling. Quality is also an issue for paper. The long-lasting paper you referred to was handcrafted, and the earliest stuff was parchment, a laborious process. The photocopier-type paper used today with inkjet/laser printers wouldn’t last that long; we would also see the physical storage problem return, i.e. archive space, and the costs would escalate for that and for better-quality paper. I suppose NASA could utilize “space” for storage and a non-corrosive environment, but we probably couldn’t afford that.

On 2010-08-14 11:36, consused wrote:
>
> Carlos E. R.;2206093 Wrote:
>> The only reliable archival media, with a proven record, is ink and paper.
>> A track record of
>> centuries. Even Millennia.
>>
>> So… could we use paper again for our archives?
>>
>> Not as text. Dots. Pages full of dots that can be scanned again and
>> converted to the original code.
>>
>> Is it doable? What data density could we get per page? Laser or inkjet?
>> Colour/Greys/BW? At what
>> resolution? How many printer pixels per information dot? Redundancy /
>> forward error recovery?
>>
>> Food for thought, eh?
>>
>>
>
> Yes, and probably more food for recycling. Quality is also an issue for
> paper. The long lasting paper you referred to was handcrafted, and
> earliest stuff was parchment - a laborious process. The photocopier
> type paper used today with inkjet/laser printers wouldn’t last that
> long, would see the physical storage problem return i.e. archive space,
> and the costs would escalate for that and better quality paper. I
> suppose NASA could utilize “space” for storage and a non-corrosive
> environment, but we probably couldn’t afford that.

True enough,

Quick calculation: if the dot density could be as high as 300 DPI, a page of 8×10 inches would hold
2400 × 3000 = 7,200,000 dots; at one bit per dot, that is the massive info of… about 900 KB. Wow.

:-}

In practice I don’t think we could really use more than 100 DPI, which means 800 × 1000 = 800,000
dots, about 100 KB per page. We could perhaps use colour dots, three colours and black, 4 values
(two bits) per dot, doubling that. Even so… a single CD would still need thousands of pages, so
there goes our paper archive to the idea dustbin. The density is not good enough.
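The arithmetic above in throwaway Python, with the 8×10 inch page taken from the discussion; note that the dot count scales with the square of the DPI:

```python
# Dots per page = (dpi * width) * (dpi * height): the SQUARE of the DPI matters.
def page_capacity_bytes(dpi, width_in=8, height_in=10, bits_per_dot=1):
    dots = int(dpi * width_in) * int(dpi * height_in)
    return dots * bits_per_dot // 8

assert page_capacity_bytes(300) == 900_000                  # ~900 KB at 300 DPI
assert page_capacity_bytes(100) == 100_000                  # ~100 KB at 100 DPI
assert page_capacity_bytes(100, bits_per_dot=2) == 200_000  # 4-level colour dots
```

Even the optimistic 300 DPI figure means roughly 750 pages per 650 MB CD before any redundancy overhead.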

Perhaps it could work for text documents stored in this way, which are easier to scan and recover:
real text on one side of the page, dots on the other.

(Dot matrices scanned by laser are used here in Spain for some government forms. They are created
with Adobe Reader, some plugin, and a special server, I understand.)


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

On 2010-08-14 02:06, Martin Helm wrote:
> Carlos E. R. wrote:
>
>>
>> Yes, there are two problems: hardware, and then, encoding format used.
>> There is an open standard I read somewhere about for long term archival. I
>> think they have to store the original documentation, the result (PDF,
>> whatever), the software… and after the years, the new software to
>> convert to the next standard format, and the converted data.
>>
>> But that is an overkill for us plain users, perhaps.
>>
> From a practical point of view as a user (and I am not rich enough for
> expensive solutions) the problem of media for storage can in most cases be
> solved simply by copying the content to a newer media.

That is what I intended to do, but did not manage to do.

My old data was (is) in backups made by proprietary software (PC Tools). It was a good method at the
time: good speed, reliable (forward error correction), good data density (for the time). The data is
still retrievable.

The snag is that the software wasn’t even compatible with the next generation of PCs. Something
changed with the floppy drives, or with DMA, and it would not work.

So you say: migrate the data.

True: but the only method I had of migrating the data from one computer to another was floppies,
which made it impossible in practice. Some files were bigger than one floppy. I would have needed
hundreds of floppy transfers. It was not practically feasible, and I had no network. I had to wait
for another technology to do the transfer.

After a few years, I stopped caring.

I could do it now, if I wanted. But… now I’m too lazy to start.

My point is, there may be data migration stages over the years that are so big a chore that they
aren’t done when they could be done. Later is perhaps too late. Even big outfits like NASA have been
caught by this problem.

Suppose I have five hundred or more DVDs in storage. Copying them over to whatever comes next is
going to be a big chore, too.

> The problem with the formats is much more serious as you mention and this is
> my experience too.

Indeed.

> I made long ago presentations with harvard graphics and some years later I
> wanted to reuse one of it as base for something new. I could not read and
> print it with anything I had. Ok, this was not that my life depended on it,
> but it was more than annoying. Now again a few years later I do not care
> about that old things.

True.

I have old documents written in WordPerfect that for a long time I could not even read. I think that
oowriter now can, but I haven’t tried; they are on old disks stored I don’t remember where.

> And sometimes I simply print it and keep the hardcopy.

Yep.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

On 2010-08-14 03:06, techwiz03 wrote:
>
>> Yes, there are two problems: hardware, and then, encoding format used.
>> There is an open standard I
>> read somewhere about for long term archival. I think they have to store
>> the original documentation,
>> the result (PDF, whatever), the software… and after the years, the
>> new software to convert to the
>> next standard format, and the converted data.
>>
>> But that is an overkill for us plain users, perhaps.
> Not so overkill as you might think, medical records have been
> previously mentioned, but there also exists legal documents (even for
> private people), tax records also in the public realm. My desktop
> machine has a 5.25" floppy, 3.5" hardflopy, CD/DVD, and refurbished:
> teletype paper reader, card punch reader, drum drive, and Phimon drives
> all from pre-PC era just so I am not limited as new technology comes
> out.

That’s some hardware.

> The interesting thing here is that up until Mandrake 9.1 I could
> have full freedom to access and use any device I wanted. openSUSE,
> Ubuntu, Fedora, and Mandriva don’t allow me such backward capabilities.
> The move to scrap the old technology in favor of the more limited new
> devices is worrisome.

True.

Which means: keep running old hardware and software. In this case, I think a virtual host (modern)
running a guest with the old software would not work, as the host would not have the drivers for the
hardware.

At least with open source, if you have the resources, you can maintain that old hardware in use. If
the drivers were closed, no way.

It is something, at least.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” GM (Elessar))

Will Honea wrote:

> gogalthorp wrote:
>
>>
>> But rocks outlive paper lol
>
> And archeology shows that clay tablets are also viable targets :wink:
>
What about an open source clay tablet printer? No, bad idea, that is write-only. Better a clay
tablet disk drive?

Carlos E. R. wrote:

> So you say: migrate the data.
>
> True: but the only method I had of migrating the data from one computer to
> another was floppies - which meant it was impossible. Some files were
> bigger than one floppy. I needed hundreds of floppy transfers. It was not
> practically feasible. I had no network. I had to wait for another
> technology to do the transfer.
>
What I wrote is not a general-purpose method for everything; you can and will
always run into situations where a method fails. It is simply something
which I use.
The copy-and-migrate way to keep data accessible is of course limited, but
so far it works for me for the most important things.
I also lost old data in the past, because it was in formats I later could
not read, or on floppy disks I forgot to copy at the right time.
What I learnt for my own data over time (not in my job, where there are
people responsible for such things) is to avoid proprietary solutions. Keep
things simple and use things like tar.gz for archives, where the format is
described and known. Have copies of complex documents I would like to keep
available for a long time also in a less complex format which is not binary
but, in the worst case, human-readable (plain text), and graphics in
standard formats which are also not proprietary.
Finding a solution for multimedia will most likely be an incredible pain (I
do not convert everything I have into another format).

All this is of course limited, all this can fail and has disadvantages.
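A sketch of that tar.gz habit, using only the Python standard library (the file and function names are mine): pack the files and write a plain-text SHA-256 manifest next to the archive, in the same layout `sha256sum -c` understands, so every future migration can be verified.

```python
import hashlib
import tarfile

def pack_with_manifest(paths, archive_name):
    """Create a tar.gz plus a .sha256 manifest for later verification."""
    with tarfile.open(archive_name, "w:gz") as tar:
        for p in paths:
            tar.add(p)
    with open(archive_name, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    manifest = archive_name + ".sha256"
    with open(manifest, "w") as f:
        f.write(f"{digest}  {archive_name}\n")  # sha256sum-compatible line
    return manifest
```

After copying both files to new media, recomputing the digest (or running `sha256sum -c` on the manifest, path permitting) confirms the copy is intact before the old media is discarded.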

What about something like the Binary Format Description (BFD) Language for any purely binary data?
At least it is a little less retro!
If I understand it correctly (and there is a good chance that I don’t), any binary data could be
archived as hardcopy ASCII text, albeit with a negative compression rate (but that’s XML for you,
isn’t it?).
Terry.
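That reading matches the general idea. As an illustration only (this is not BFD itself), here is binary-to-printable-hardcopy in miniature: base64 lines, each carrying its own CRC32, so a mistyped or mis-scanned line is caught when the page is typed back in.

```python
import base64
import zlib

def to_hardcopy(data: bytes, width: int = 60) -> str:
    """Render binary data as printable lines, each ending in its CRC32."""
    text = base64.b64encode(data).decode("ascii")
    lines = []
    for i in range(0, len(text), width):
        chunk = text[i:i + width]
        lines.append(f"{chunk} {zlib.crc32(chunk.encode()):08x}")
    return "\n".join(lines)

def from_hardcopy(page: str) -> bytes:
    """Re-enter the printed lines; a failed CRC pinpoints the garbled line."""
    chunks = []
    for line in page.splitlines():
        chunk, crc = line.rsplit(" ", 1)
        assert zlib.crc32(chunk.encode()) == int(crc, 16), "line garbled"
        chunks.append(chunk)
    return base64.b64decode("".join(chunks))

assert from_hardcopy(to_hardcopy(b"\x00\x01binary blob")) == b"\x00\x01binary blob"
```

The overhead is indeed negative compression: base64 alone costs a third more space, before the checksums.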

A sad reality… keep old working equipment on an old working PC and have archival access to questionably needed old data, or go with new equipment on a new PC, which in today’s market has an effective lifespan matching the attention span of an average 1-year-old.

Carlos E. R. wrote:

>> The interesting thing here is that up until Mandrake 9.1 I could
>> have full freedom to access and use any device I wanted. openSUSE,
>> Ubuntu, Fedora, and Mandriva don’t allow me such backward capabilities.
>> The move to scrap the old technology in favor of the more limited new
>> devices is worrisome.
>
> True.
>
> Which means, keep running old hardware and software. In this case, I think a virtual host (modern)
> running a guest with the old software would not work, as the host would not have the drivers for the
> hardware.
>
> At least, with open source, if you have the resources you can maintain that old hardware in use. If
> the drivers were closed, no way.

Drivers are the problem. While I have the source tucked away in archives of Mandrake 9.1 somewhere, as I recall, when I sought to integrate them into openSUSE at the kernel level there were library errors, compile errors, and timing errors; in short, I was left with a borked kernel. Maybe one day, when I have more time than brains, I’ll convert the code to assembly language.