opendedup & openSUSE 12.1/12.2 & AMD64

I have recently come across the opendedup project and am looking at installing it. However, I have done a Google search and a search of these forums, and I see no pertinent threads about any experience with this package on these OSes. Does anyone have any input before I walk into the NFS 12.2 trap?

Appreciate any discussion…

-ejb

I came across opendedup myself not that long ago and did a quick personal evaluation of its requirements. I didn’t find anything that stuck out; it should install without any problem. Still, if you are especially wary, the project has created a virtual appliance you can try.

TSU

On 02/23/2013 05:26 AM, ejboshinski wrote:
> Appreciate any discussion…

i’d never heard of this until your post…so i read (at
http://opendedup.org/whatisdedup) this:

According to wikipedia, "Data deduplication is a specific form of
compression where redundant data is eliminated, typically to improve
storage utilization. In the deduplication process, duplicate data is
deleted, leaving only one copy of the data to be stored. However,
indexing of all data is still retained should that data ever be
required. Deduplication is able to reduce the required storage
capacity since only the unique data is stored. For example, a typical
email system might contain 100 instances of the same one megabyte
(MB) file attachment. If the email platform is backed up or archived,
all 100 instances are saved, requiring 100 MB storage space. With
data deduplication, only one instance of the attachment is actually
stored . . .

which sounds kinda cool…in the example given you get to ‘save’ 99
MB of space in this one case alone! yippee…
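
to make the idea concrete, here is a toy sketch of content-addressed
storage in python (my own illustration of the principle, NOT opendedup’s
code; the real thing works on chunks and runs in java, see below): every
blob is stored under the hash of its contents, so 100 identical
attachments occupy exactly one slot:

```
import hashlib

# toy content-addressed store: data is kept once per unique content hash;
# file names are just pointers to a hash (illustration only, not how
# opendedup is actually implemented)
store = {}    # sha256 hex digest -> bytes
catalog = {}  # file name -> sha256 hex digest

def save(name, data):
    digest = hashlib.sha256(data).hexdigest()
    store.setdefault(digest, data)   # stored only if not already present
    catalog[name] = digest

attachment = b"x" * (1024 * 1024)    # the 1 MB attachment from the example
for i in range(100):                 # 100 mails carry the same attachment
    save("mail_%03d/attachment" % i, attachment)

logical = sum(len(store[d]) for d in catalog.values())   # what users "see"
physical = sum(len(v) for v in store.values())           # what is kept
print(logical // 2**20, "MB referenced,", physical // 2**20, "MB stored")
```

and that also shows the flip side i get to next: there is exactly one
physical copy, so whatever happens to it happens to all 100 ‘files’ that
point at it…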

but, wait a second, there is another way to look at it: if in all
this compression (and then decompression for use) one byte gets
corrupted, then suddenly ALL 100 of those 1 MB files are
corrupted…if each of those 100 files was crucial to the document
it was attached to, then instead of losing one document because
one byte was lost, you would have lost 100!

not only that, but just like the old (last century) DoubleSpace
application [which “doubled” the apparent size of your disk via
compression] there is a speed penalty to pay because every time
you retrieve anything it has to be uncompressed (by the CPUs)…

now, you have not explained your situation and your need, but i’d
say that the price of 1 MB (or 10,000 MB) of storage has fallen so
much in the last 10 years that if you are working with the ‘average’
home’s storage needs you might be better off in the long run to leave
this new look at an OLD trick alone and just buy a few more gigabytes
of local drives…on the other hand, if you are a google
data-farm administrator there might be millions of dollars to
save–but a word to the wise, before jumping in on this i’d wanna:

-find out more about that “open” in the product’s name…that is, i
do not see anywhere that it says it is either “open source” or
“free”, and i’m sure i wouldn’t wanna get caught in a trap where i
learn two years down the road that it is CLOSED source and the price
just went from free to a big bunch!

-what is the time penalty that has to be paid for the compression
step and the over and over and over and over decompression steps??

-how safe would my data actually be, once compressed?

-how much data risk and decompression time cost am i willing to PAY
in order to save how much actual storage space? (how much does the
storage space saved cost versus the cost to replace 100 times the
data lost??)

-the kind of data you have will directly drive how much space can be
saved…and you can bet it will seldom if ever even get close to
approaching a savings on the order of: start with 100 MBs space
needed and end up with only 1 MB of space needed…

-sure are a lot of moving parts! that is, i see it relies on Java 7
(i guess to provide a “cross-platform” compression/decompression
engine). so, do you have to have Oracle Java, or openJDK, or the
convoluted Java version provided by MS? and will those three always
produce exactly the same compression and decompression results…every
time? [if not: trouble. you can count on this: if it is possible
for MS to tweak its Java just a little bit so that it is no longer
possible to read a disk with Oracle’s or openJDK’s Java–well,
anytime MS is involved, you should never expect all things to work
together smoothly, ever!]

so…ymmv, and i might be looking at this all wrong! (but, it sounds
like DoubleSpace Revisited to me, with the added complication of a
critical Java application doing the work…and with all of the same
risks! read especially the section titled “Bugs and data loss” in
the Wikipedia article on DoubleSpace.)

one thing is for sure: i wouldn’t trust my emails from Aunt Tillie to
this wonderful new space-savings thing just to avoid buying a few
terabytes of hard drives…


dd
openSUSE®, the “German Engineered Automobile” of operating systems!
http://tinyurl.com/DD-Caveat

On 2013-02-24 10:12, dd wrote:

> but, wait a second, there is another way to look at it: if in all this
> compression (and then decompression for use) one byte gets corrupted
> then suddenly ALL 100 of those 1 MB files are corrupted…if each of

That is true of any compression method.

For example, if you do a tar archive backup which is then compressed (a
.tar.gz or .tgz), a single error in the compressed stream and everything
from the damaged point on is lost.

Other methods compress each file individually, then archive the lot.
This way, an error only destroys one file.
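
A quick sketch of the difference in plain Python with gzip (just to show
the failure mode; the byte flip is simulated, and this has nothing to do
with the product the OP asks about):

```
import gzip

files = {"a.txt": b"first file " * 1000, "b.txt": b"second file " * 1000}

# method 1: concatenate everything, then compress once (like .tar.gz)
whole = gzip.compress(b"".join(files.values()))

# method 2: compress each file individually, then collect them (like zip)
each = {name: gzip.compress(data) for name, data in files.items()}

def corrupt(blob):
    # flip one byte in the middle of the compressed data
    damaged = bytearray(blob)
    damaged[len(damaged) // 2] ^= 0xFF
    return bytes(damaged)

# one flipped byte in the single stream: the whole archive fails to unpack
try:
    gzip.decompress(corrupt(whole))
except Exception as exc:
    print("single stream: everything unreadable:", exc)

# the same flipped byte in one member only hurts that member
broken = dict(each)
broken["a.txt"] = corrupt(broken["a.txt"])
for name, blob in broken.items():
    try:
        gzip.decompress(blob)
        print(name, "still fine")
    except Exception as exc:
        print(name, "lost:", exc)
```

One bad byte in the single stream makes the whole thing unreadable; with
per-file compression only the damaged member is lost.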

There are more sophisticated methods of backup or archival. I have used,
in MS-DOS times, proprietary tools that combined compression with
forward error correction, which is a step ahead of simple error detection.
I have not seen any Linux backup software doing this, not even in the
oss repo, unless we build our own combination with par or par2.

(search for ‘forward error correction’. It is used by NASA for their
loooong-distance communications, for example.)
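
As a toy illustration of the idea, here is simple XOR parity in Python.
This is my sketch of the principle (the same one RAID 5 uses); par2
itself uses Reed-Solomon codes, which can repair more than one missing
block:

```
import os

def xor_parity(blocks):
    # one parity block: byte-wise XOR of equally sized data blocks
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

# four data blocks plus one parity block stored alongside them
data = [os.urandom(4096) for _ in range(4)]
parity = xor_parity(data)

# pretend block 2 was destroyed; XOR of the parity and the survivors
# rebuilds it exactly
lost = 2
survivors = [blk for i, blk in enumerate(data) if i != lost]
recovered = xor_parity(survivors + [parity])
assert recovered == data[lost]
print("block", lost, "recovered from parity")
```

The redundancy is written before anything goes wrong, which is what the
“forward” in the name refers to.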

> not only that, but just like the old (last century) DoubleSpace
> application [which “doubled” the apparent size of your disk via
> compression] there is a speed penalty to pay because every time you
> retrieve anything it has to be uncompressed (by the CPUs)…

It is negligible compared to the read speed of mechanical devices such
as hard disks :-)

Even back when DoubleSpace was designed and used, you actually gained
speed on most hardware, because hard disks were very slow relative to
the CPU; having to write or read half the sectors made the overall
operation faster. Curious, eh? :-)

Of course, for that you use fast algorithms with a low compression
ratio. If the intention is backup or archival, you usually compress
more, since speed is less of an issue.
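
A back-of-the-envelope sketch of why that worked. The figures below are
my own illustrative assumptions, not measurements:

```
# illustrative numbers: a slow early-90s disk vs. a fast LZ decompressor
disk_mb_per_s = 1.0     # assumed raw disk throughput
decomp_mb_per_s = 5.0   # assumed decompression speed on the same-era CPU
size_mb = 10.0          # logical size of the data being read
ratio = 2.0             # assumed 2:1 compression

plain_read = size_mb / disk_mb_per_s                    # 10.0 s
compressed_read = ((size_mb / ratio) / disk_mb_per_s    #  5.0 s to read 5 MB
                   + size_mb / decomp_mb_per_s)         # +2.0 s to decompress
print("uncompressed read:            %.1f s" % plain_read)
print("compressed read + decompress: %.1f s" % compressed_read)
```

Fetching half the sectors more than pays for the CPU time, as long as
decompression is faster than the disk.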

Interestingly, NTFS does have file and directory compression, a
successor of DoubleSpace: you can tag individual files or entire trees
for compression. Ext2/3/4 has flags to mark files as compressed, but the
code behind them was never implemented. Shame on Linux, IMNSHO.

btrfs does have compression if you enable it, I heard. Of course, if you
use replication in that filesystem, space usage increases…

> now, you have not explained your situation and your need, but i’d
> say that the price of 1 MB (or 10,000 MB) of storage has fallen so much
> in the last 10 years that if you are working with the ‘average’ home’s
> storage needs you might be better off in the long run to leave this new
> look at an OLD trick alone and just buy a few more gigabytes of local
> drives…on the other hand, if you are a google data-farm
> administrator there might be millions of dollars to save–but a word to
> the wise, before jumping in on this i’d wanna:

Compression is nice. My desktop has about 2.5 TB of storage. To do a
backup I need at least that much external space, maybe doubled for
alternating backups. A 3 TB Seagate HD costs between 100 and 300 euros at
Alternate, plus enclosure, tax and P&P.

I don’t call that “cheap”.

However, I know nothing about this particular product the OP asks about.
I’m curious, though.


Cheers / Saludos,

Carlos E. R.
(from 12.1 x86_64 “Asparagus” at Telcontar)