Backup ideas that can detect missing files?

Ok, this is the dilemma I’ve thought about for years. I have about 10 years of home videos and pictures on my computer that I can’t lose. I run CrashPlan to back up my files, but I have hundreds of folders holding pictures. If a folder is suddenly empty or the pictures become corrupt, how will I know? I don’t check pictures from 10 years ago often; if they go missing or become corrupt, I may not notice for months or years, and by then my backups are long gone. It would be nice to find a backup program that backs up all my files and, if it detects any missing files, alerts me… sends me a report of what files were there yesterday but are missing today. Does such a program exist?

(I am now more worried because today I noticed that a lot of folders containing pictures from about ten years ago are empty. They may have been empty for the last year… I don’t look at the pictures daily. All I know is that I checked my backups and, as of 2 months ago, the files were not there. I have one other hope of getting them back when I get home. But I need to guarantee my home videos and pictures will be available 20 to 40 years from now.)

How are you doing the backup? Done correctly, older files should not be erased just because the source files are missing.

Simply copying files to the backup media should preserve all files. But if you do some sort of partition imaging that overwrites the backup, then when files go missing from the source, they go missing from the image too.

I’m currently using CrashPlan to back up. The problem is that a few weeks ago I formatted, reinstalled my OS, and restarted the backups, so I can’t restore anything from before that reinstall. If I had somehow been receiving reports of deletions in the folder structures I’m backing up, I would have caught these folders disappearing.

You should consider backing up to optical media, specifically Blu-ray or DVD (not CD),
then storing that data disc in a controlled environment, e.g. airtight, away from light and heat, with zero humidity, and maybe even in nitrogen or a vacuum.
Done properly, it should be good indefinitely and not subject to degradation.

And as always develop a redundancy of three…
ie
At least 3 copies
At least 3 different ways of making copies
Stored in at least 3 geographically separate locations
etc.

TSU

The other way I’m hoping I can get the pictures back: a year or two ago, I copied all my home videos and pictures to DVD, and I’m 99% sure they’re sitting in a box in my closet. When I get home tonight, I’m going to look through those to see if I can find the pics for the empty folders. The missing pics are from about 10 years ago, so my 1-2 year old DVD backups should have them. I need to stay on top of it, though, and make more regular DVD backups.

For long term storage I use Amazon Glacier and CrossFTP for management. It’s currently $0.004 per GB per month, so I pay $1.08 US per month for 271 GB of storage.

The CrossFTP program can be used to synchronize in either direction, and shows what will be copied or deleted. I don’t remember the cost, but it wasn’t much and updates are free.

Just an FYI for anyone considering Glacier (it’s under consideration for a project I’m working on):
It’s incredibly cheap to upload and store long term, but be prepared for a massive cost if you ever want to retrieve more than 5% of your files in a given month.
Plus, files deleted before they have been stored for 90 days incur an early-deletion fee.

More restrictions in the fine print, which is extensive.

TSU

You are correct, but for long-term storage the need to retrieve everything in a hurry would be unusual.

But even bulk retrieval at a low rate is pretty cheap.

https://aws.amazon.com/glacier/faqs/

So, I did find a lot of pictures last night on my backup DVDs in the closet. Still, the part of the question I’m most interested in is not so much how to back up, but whether a script could detect a major change in my files and warn me that files might be missing, so I don’t go months or years before noticing that a ton of files just vanished.
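For what it’s worth, a small cron-able script can do exactly that: record the file inventory on each run and report anything that was present last time but is gone now. This is only a sketch; the watched directory, snapshot path, and the `--check` flag are my own placeholder choices, so adjust them for your setup.

```python
#!/usr/bin/env python3
"""Warn when files present at the last run have since disappeared.
Sketch only: WATCH_DIR and SNAPSHOT are placeholder paths."""
import json
import os
import sys

WATCH_DIR = os.path.expanduser("~/Pictures")             # directory to monitor
SNAPSHOT = os.path.expanduser("~/.file-inventory.json")  # saved inventory from last run

def list_files(root):
    """Return the set of all file paths under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.add(os.path.join(dirpath, name))
    return found

def check(root=WATCH_DIR, snapshot=SNAPSHOT):
    """Compare the current inventory against the saved one,
    save the new inventory, and return any paths that vanished."""
    current = list_files(root)
    missing = []
    if os.path.exists(snapshot):
        with open(snapshot) as f:
            previous = set(json.load(f))
        missing = sorted(previous - current)
    with open(snapshot, "w") as f:
        json.dump(sorted(current), f)
    return missing

if __name__ == "__main__" and "--check" in sys.argv:
    gone = check()
    if gone:
        print(f"WARNING: {len(gone)} files present at the last run are now missing:")
        for path in gone:
            print("  " + path)
        sys.exit(1)  # non-zero exit so cron mails the output
```

Run it daily from cron with `--check`; since cron mails any output, you only hear from it when something has disappeared, instead of finding out years later.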

Personally, I back up everything to a local NAS using rsync via a cron script, keeping 14 days of backups. The log files are mailed to me and I examine them daily to see if anything unexpected has happened. This is my first line of defence.

I also do a daily backup to Amazon S3 using the s3cmd sync utility, via a cron script, for anything that I deem not to be long-term storage, with a 2-month retention policy on changed files. Again, the log files are emailed to me so I can check for surprises.

As mentioned above, I use Amazon Glacier for long-term storage (things like photos, receipts, etc.), which I sync manually to avoid accidentally deleting something that has disappeared locally.

Almost anything is doable, but exact help will depend on what cloud you are using and what software you are using.

Not necessarily.
Since the following blog was written, Amazon has clarified some of their vague pricing, but AFAICS the actual costs described in the blog article have not changed.

So: whatever you push to Glacier is reasonable as long as you won’t want to retrieve more than a few files in a month.
But if, like the author of the blog, you decide you need or want a large number of files within days, it can get pretty expensive, more than making up for the previous savings.

A brief summary of the problems described in the article:
The author wanted to transfer his Glacier stash of music files out of Glacier to save the dollar or so per month.
There apparently was (or is) some kind of governor that breaks extremely large jobs into a series of smaller jobs, and it’s not certain when those jobs will run; they typically complete within 4 hours (hardly instant, but faster than days or weeks). Not knowing what was happening, and working with unfamiliar CLI tools, he kept re-invoking the transfer, which caused numerous starts and stops, and of course Amazon charged him for every single one of those failed transfers.
One lesson learned: be prepared for surprises. It’s possible to try to save a dollar or two a month and end up with a $150 bill instead.

TSU

This is a link that explains the current download rates (it has changed, as noted at the start of the referenced blog post):
https://aws.amazon.com/glacier/pricing/?nc=sn&loc=3

So in my case, if I decided to download the entire 250 GB at the bulk rate, it would work out to 250 * 0.09 (transfer out) + (250 - 10) * 0.0025 (retrieval) + 60000 * 0.025 / 1000 (retrieval requests) = ~$24.60.
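Spelling out that arithmetic (rates are the ones quoted here, which may not be current; check the AWS pricing page):

```python
# Bulk-retrieval estimate for 250 GB, using the per-GB and per-request
# rates quoted above (not necessarily today's prices).
GB = 250
TRANSFER_OUT_PER_GB = 0.09      # internet transfer out, $/GB
BULK_RETRIEVAL_PER_GB = 0.0025  # bulk retrieval, $/GB, first 10 GB free
REQUESTS = 60_000               # retrieval requests
PER_1000_REQUESTS = 0.025       # $ per 1000 requests

transfer = GB * TRANSFER_OUT_PER_GB            # 22.50
retrieval = (GB - 10) * BULK_RETRIEVAL_PER_GB  # 0.60
requests = REQUESTS * PER_1000_REQUESTS / 1000 # 1.50
total = transfer + retrieval + requests
print(f"~${total:.2f}")  # ~$24.60
```

The transfer-out charge dominates, which is why the comparison with a plain S3 download below comes out so close.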

For 250 GB of static storage, Glacier is $1/mo, S3 is $5.55/mo. A 250 GB S3 download is $22.41, so it’s about the same as Glacier, it just shows up quicker.

A nifty calculator for all AWS services is here:
https://calculator.s3.amazonaws.com/index.html
The price of this cloud storage just keeps dropping. I can’t believe what I spend on keeping my NAS current; there is no payback at all, and it’s not a whole lot faster :slight_smile:

For many years I have used www.syncovery.com on Windows for backup (not for synchronizing).
I don’t know if all features are available in the Linux version, but it has many options, like “move files in the backup to folder x and keep y old versions if the original files are deleted or modified”. It can detect renamed and moved files in the source and rename or move them in the backup.
The standard edition costs only ~30 $ or €.

Besides that, you could store your original files on a file system like Btrfs and “scrub” it (search for corrupted files) every week.
Btrfs keeps checksums for all data and can detect more problems than other file systems.

For my data, I synchronize to external HDs using Unison. Whenever an external HD comes on sale at an irresistible bargain, I buy another one, so I now have a few. I rotate them, which keeps me with synchronized backups going back a ways.

In addition, those externals have separate partitions where I keep full-disk backups (made using Clonezilla) going back a few years. After a couple of years, you can go into the Clonezilla backup directory and remove the backup files for the system partition, leaving just the backup files for the data partitions, since you no longer need the system backups for, say, openSUSE v1.5 or something like that. :wink:

Unison has a nice GUI that shows files that have been deleted, files that are new, and files that have changed, and gives you the opportunity to decide what to do about each.

I find this method to be essential in my computing environment.

As for Cloud, or anything else like that, I refuse to let unknown people store any of my systems or data.