
Thread: opendedup & openSuSe 12.1/12.2 & AMD64

  1. #1

    Default opendedup & openSuSe 12.1/12.2 & AMD64

    I have recently come across the opendedup project and am looking at installing it. However, a Google search and a search of these forums turn up no pertinent threads about experiences with this package on these openSUSE versions. Does anyone have any input before I walk into the NFS 12.2 trap?


    Appreciate any discussion...

    -ejb

  2. #2
    Join Date
    Jun 2008
    Location
    San Diego, Ca, USA
    Posts
    10,823
    Blog Entries
    1

    Default Re: opendedup & openSuSe 12.1/12.2 & AMD64

    I found opendedup myself not that long ago and did a quick personal evaluation of its requirements. I didn't find anything that stuck out; it should install without any problem. Still, if you are especially wary, the project has created a virtual appliance you can try.

    TSU

  3. #3
    dd NNTP User

    Default Re: opendedup & openSuSe 12.1/12.2 & AMD64

    On 02/23/2013 05:26 AM, ejboshinski wrote:
    > Appreciate any discussion...


    i'd never heard of this until your post...so i read (at
    http://opendedup.org/whatisdedup) this:

    According to wikipedia, "Data deduplication is a specific form of
    compression where redundant data is eliminated, typically to improve
    storage utilization. In the deduplication process, duplicate data is
    deleted, leaving only one copy of the data to be stored. However,
    indexing of all data is still retained should that data ever be
    required. Deduplication is able to reduce the required storage
    capacity since only the unique data is stored. For example, a typical
    email system might contain 100 instances of the same one megabyte
    (MB) file attachment. If the email platform is backed up or archived,
    all 100 instances are saved, requiring 100 MB storage space. With
    data deduplication, only one instance of the attachment is actually
    stored . . ."
    which sounds kinda cool...in the example given you get to 'save' 99
    MBs of space in this one case alone! yippee...
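
    to make that concrete, here is a toy sketch of my own (NOT
    opendedup's actual code...the chunk size and file names are made up)
    of how hash-based dedup keeps just one physical copy of those 100
    attachments:

        import hashlib

        store = {}   # chunk hash -> chunk bytes (the single stored copy)
        files = {}   # file name -> list of chunk hashes

        def save(name, data, chunk_size=4096):
            hashes = []
            for i in range(0, len(data), chunk_size):
                chunk = data[i:i + chunk_size]
                h = hashlib.sha256(chunk).hexdigest()
                store.setdefault(h, chunk)   # kept only the first time it is seen
                hashes.append(h)
            files[name] = hashes

        attachment = b"x" * (1024 * 1024)        # the 1 MB attachment
        for n in range(100):
            save("mail_%d.bin" % n, attachment)  # 100 "copies" of it

        print(len(files), "files,", len(store), "unique chunk(s) actually stored")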

    but, wait a second, there is another way to look at it: if in all
    this compression (and then decompression for use) one byte gets
    corrupted then suddenly ALL 100 of those 1 MB files are
    corrupted....if each of those 100 files were crucial to the document
    it was associated with, then instead of losing one document due to
    one byte being lost, you would have lost 100!

    not only that, but just like the old (last century) DoubleSpace
    application [which "doubled" the apparent size of your disk via
    compression] there _is_ a speed penalty to pay because _every_ time
    you retrieve anything it has to be uncompressed (by the CPUs)..

    now, you have not explained _your_ situation and _your_ need, but i'd
    say that the price of 1 MB (or 10,000 MB) of storage has fallen so
    much in the last 10 years that if you are working with the 'average'
    home's storage needs you might be better off in the long run to leave
    this new look at an OLD trick alone and just buy a few more gigabytes
    of local drives.......on the other hand, if you are a google
    data-farm administrator there might be millions of dollars to
    save--but a word to the wise, before jumping in on this i'd wanna:

    -find out more about that "open" in the product's name...that is, i
    do not see anywhere where it says it is either "open source" or
    "free" and i'm sure i wouldn't wanna get caught in a trap where i
    learn two years down the road it is CLOSED source and the price just
    went from free to a big bunch!

    -what is the time penalty that has to be paid for the compression
    step and the over and over and over and over decompression steps??

    -how safe would my data actually be, once compressed?

    -how much data risk and decompression time cost am i willing to PAY
    in order to save how much actual storage space? (how much does the
    storage space saved cost versus the cost to replace 100 times the
    data lost??)

    -the kind of data you have will directly drive how much space can be
    saved....and you can bet it will seldom if ever even get close to
    approaching a savings on the order of: start with 100 MBs space
    needed and end up with only 1 MB of space needed..

    -sure are a lot of moving parts! that is, i see it relies on Java 7
    (i _guess_ to provide a "cross-platform" compression/decompression
    engine). so, do you have to have Oracle Java, or openJDK, or the
    convoluted Java version provided by MS? and will those three always
    produce _exactly_ the same compressions and decompressions.....every
    time? [if not: trouble. you *can* count on this: if it is possible
    for MS to tweak its Java just a little bit so that it is no longer
    possible to read a disk with Oracle's or openJDK's Java--well,
    anytime MS is involved, you should never expect all things to work
    together smoothly, ever!]

    so....ymmv, and i might be looking at this all wrong! (but, it sounds
    like DoubleSpace Revisited to me, with the added complication of a
    critical Java application doing the work....and, with all of the same
    risks! read especially the section titled "Bugs and data loss" in
    http://en.wikipedia.org/wiki/DriveSpace)

    one thing for sure i wouldn't trust my emails from Aunt Tillie to
    this new wonderful savings thing _just_ to not need to buy a few
    terabytes of hard drives..

    --
    dd
    openSUSE®, the "German Engineered Automobile" of operating systems!
    http://tinyurl.com/DD-Caveat


  4. #4
    Join Date
    Feb 2009
    Location
    Spain
    Posts
    25,547

    Default Re: opendedup & openSuSe 12.1/12.2 & AMD64

    On 2013-02-24 10:12, dd wrote:

    > but, wait a second, there is another way to look at it: if in all this
    > compression (and then decompression for use) one byte gets corrupted
    > then suddenly ALL 100 of those 1 MB files are corrupted....if each of


    That is true of any compression method.

    For example, if you do a tar archive backup, which is then compressed (a
    .tar.gz or .tgz), a single error in the compressed stream and all is lost.

    Other methods compress each file individually, then archive the lot.
    This way, an error only destroys one file.
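
    Here is a rough sketch of my own (file names invented, nothing to do
    with any particular backup tool) of the two approaches, using
    Python's tarfile and gzip modules:

        import gzip
        import tarfile

        files = ["mail.mbox", "photo.jpg", "notes.txt"]   # hypothetical files

        # Method 1: tar everything, then compress the whole archive (.tar.gz).
        # One bad byte in the compressed stream can take the rest with it.
        with tarfile.open("backup.tar.gz", "w:gz") as tar:
            for name in files:
                tar.add(name)

        # Method 2: compress each file individually, then archive the lot.
        # A damaged member only destroys that one file.
        with tarfile.open("backup.tar", "w") as tar:
            for name in files:
                gz_name = name + ".gz"
                with open(name, "rb") as src, gzip.open(gz_name, "wb") as dst:
                    dst.write(src.read())
                tar.add(gz_name)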

    There are more sophisticated methods of backup or archival. I have used,
    in MsDos times, proprietary methods that combined compression with
    forward error recovery, which is a step ahead of simple error detection.
    I have not seen any Linux backup software using this, not even in the
    oss repo, unless we roll our own combination with par or par2.

    (search for 'forward error recovery'. It is used by NASA for their
    llllong distance communications, for example.)
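
    As a sketch of what I mean by rolling our own (this assumes the par2
    command line tool is installed; the file name is invented):

        import subprocess

        archive = "backup.tar.gz"   # hypothetical archive to protect

        # Create recovery files with about 10% redundancy next to the archive.
        subprocess.run(["par2", "create", "-r10", archive + ".par2", archive],
                       check=True)

        # Later, before restoring: verify, and repair if something got damaged.
        if subprocess.run(["par2", "verify", archive + ".par2"]).returncode != 0:
            subprocess.run(["par2", "repair", archive + ".par2"], check=True)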

    > not only that, but just like the old (last century) DoubleSpace
    > application [which "doubled" the apparent size of your disk via
    > compression] there _is_ a speed penalty to pay because _every_ time you
    > retrieve anything it has to be uncompressed (by the CPUs)..


    It is negligible compared to the read speed of mechanical devices such
    as hard disks :-)

    Even at the time doublespace was designed and used, you actually gained
    speed on most hardware, because hard disks were very slow compared to
    recent hardware, so having to write or read only half the sectors made
    the overall operation faster. Curious, eh? :-)

    Of course, for that you use fast algorithms with a low compression
    ratio. If the intention is backup or archival, you usually compress
    more, since speed is less of an issue.
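
    A quick sketch of my own to see the trade-off (use any largish file
    you have at hand; the path here is just an example):

        import time
        import zlib

        with open("/usr/share/dict/words", "rb") as f:   # any big-ish file will do
            data = f.read()

        for level in (1, 9):   # 1 = fast, little compression; 9 = slow, more compression
            start = time.time()
            compressed = zlib.compress(data, level)
            print("level %d: %.1f%% of original size in %.3f s"
                  % (level, 100.0 * len(compressed) / len(data), time.time() - start))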

    Interestingly, NTFS does have directory compression, a successor of
    doublespace. You can tag individual files or entire trees for
    compression. Ext2/3/4 has flags to mark files as compressed, but the
    code to achieve this was never implemented. Shame on Linux, IMNSHO.

    btrfs does have compression if you enable it, I hear. Of course, if you
    use replication in that filesystem, space use increases...


    > now, you have not explained _your_ situation and _your_ need, but i'd
    > say that the price of 1 MB (or 10,000 MB) of storage has fallen so much
    > in the last 10 years that if you are working with the 'average' home's
    > storage needs you might be better off in the long run to leave this new
    > look at an OLD trick alone and just buy a few more gigabytes of local
    > drives.......on the other hand, if you are a google data-farm
    > administrator there might be millions of dollars to save--but a word to
    > the wise, before jumping in on this i'd wanna:


    Compression is nice. My desktop has about 2.5TB of storage. To do a
    backup I need at least that external space, maybe duplicated for
    alternating backups. A 3TB Seagate HD costs between 100 and 300 euros at
    Alternate. Plus box, tax, P&P.

    I don't call that "cheap".


    However, I know nothing about this particular product the OP asks about.
    I'm curious, though.

    --
    Cheers / Saludos,

    Carlos E. R.
    (from 12.1 x86_64 "Asparagus" at Telcontar)
