Sync tool (with "live diff")

Just starting with openSUSE, I need a backup tool
(preferably with a GUI for KDE).
It doesn't have to be sophisticated - for me it's sufficient
to have regular syncs (of selected dirs and files) to an external HDD.

On Windows (NTFS) my experience was that doing this
by comparing the sync objects (source vs. sync destination) takes very long
(for large numbers of objects) and thereby also strains the disks.

I heard that macOS uses a cleverer method: it "live-logs"
all operations on the selected objects, and when it comes time
to sync, it simply replays these ops at the sync destination.
I didn't find anything like that for Windows, so I wrote an app for myself.
(Typical time for a weekly sync of 7 TB of data:
2 hours for a complete diff, 10 minutes with my tool.)

So my question (finally! :slight_smile: ):
What is the best way to do this on openSUSE (ext4)?
Will a full diff be just as slow as on Windows?
(ext4 does journaling anyway, after all…)

What tool would be appropriate for me?
Syncthing, Back In Time, … ?

Take a look at rsync. It’s designed to do just that. Set up a cron job to run it automatically.

I’m sure there are front-ends for it (or tools to build a command line based on the options you want). The *nix way of doing things often is to combine multiple tools that meet specific needs to handle a job.

A quick search turns up grsync as a front-end that runs in KDE.
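
If you go the cron route, a weekly crontab entry could be as simple as this (paths are placeholders; edit your crontab with crontab -e):

    # m h dom mon dow: run every Sunday at 03:00
    # --delete mirrors deletions to the backup; leave it out to keep old files
    0 3 * * 0  rsync -a --delete /home/user/ /run/media/user/backup/home/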

Yes, I have already "heard" about rsync (and its GUI grsync).
But as far as I know it just does the not-so-clever diff-on-everything (see above).
I.e. no "live-logging", and moreover it doesn't recognize renames, moves etc.
Maybe I'm wrong?

I’ve only used rsync, for years. Reliable, many options, simple to use.

Some GUI tools (like GrSync) are simply a front-end to rsync.

If you have to have a GUI tool, I’d suggest you research Back-in-Time, GrSync, TimeShift.

Keep in mind, many tools come and go, so be sure whatever tool you consider is in active development.

Yes, thank you.
I do not doubt at all that rsync is reliable and all…
My question is just about resource usage (time, and with it disk strain, see above):
Does rsync use the ext4 journal entries, or does it do a full diff (= "no memory")?

rsync is file system agnostic. It acts on the directories and files as defined by the arguments and configurations.

That's what I suspected.
As there are so many dependencies (ext4 vs. NTFS, and the file system access code),
it's probably best if I just try some apps and see how long they take…

I have no idea what you are talking about or what you are after. In any case, NTFS is not a Linux file system type, and while rsync will most probably copy files from it as from all others, the faked owner/group/permissions will be copied along to the backup file system (which is, I hope, a Linux file system), so things will then differ from the original.

In any case, in my opinion there are basically two ways to make backups:

  • doing it on the file level and thus copying part(s) of the Unix/Linux directory tree (which is of course file system agnostic, as all file access is);
  • doing it on the file system level and thus copying complete file systems (most often whole partitions), which many call "cloning" (also basically file system agnostic, because you copy all bytes as they are) - a sketch of both follows below.

Both have their use cases. It depends on you to decide what disasters you want to be prepared for, designing an appropriate backup scenario for it, implement it and test it AND the disaster recovery.
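
A rough sketch of both, with placeholder paths and device names (be very careful with anything like the second command, and double-check the devices first):

    # file level: copy (part of) the directory tree
    rsync -a /home/user/ /mnt/backup/home/

    # file system level: clone a complete partition, byte for byte
    dd if=/dev/sda2 of=/dev/sdb1 bs=4M status=progress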

Here’s a nice article about Grsync, which utilizes rsync. Personally, I only use rsync, but a few months ago, I tried Grsync out of curiosity.

You can also try unison. I use it for all my backups and syncs. It's got (basically) the same features as rsync, can run on the command line in the background (fastest), but also has a generic GUI for setup, troubleshooting etc.

Thank you for the feedback so far! :slight_smile:

Still, there seems to be a misunderstanding of my main aim, so I'll try again:

On my Windows system there was a "home" folder of about 6 TB.
It's not the whole folder I need to sync, but several folders and several files, here and there.
Moreover, in the "system" folder there are some config files I needed to sync too.
For me it was sufficient to do a sync update once a week.
Within that week I typically changed files, folders etc. all over the place:
create, change, delete, rename, move, …
All in all about, say, 200 changes.
When it came to the weekly sync, a typical sync app on Windows
checked all the places (folders etc.) entered in the app's settings.
If I chose to compare content-wise, this took about 2-3 hours.
If I chose to use Windows' archive bit, it still took more than 1 hour.

So, please keep in mind there are only 200 specific changes I actually made.
If these changes had been logged (as a journaling fs does),
I could simply go through that list and re-do only these 200 changes at the sync destination.
And even better: if it recognized the renames and moves
(which are simple directory (inode) entries), it would speed up even more.
I heard that Apple uses this kind of sync/backup; but there was nothing for Windows.
So I coded this "merchant with a pencil behind his ear" for myself:
"DelaSync" (Delayed Sync) was born!

And for the typical week from above it took about 10 minutes on average!

I was pretty sure that on Linux this kind of clever
sync would have existed for decades…
As I'm just a beginner there… I asked you.

How do sync tools on openSUSE do it?
Which principle do they use (brute-force bitwise comparison of the whole sync set)?
And how long would my sketched scenario (above) take with them?
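
(For illustration only: I imagine the "merchant's list" principle on Linux could be sketched with inotifywait from inotify-tools - this is just the idea, not what DelaSync does:)

    # append one line per file operation to a "merchant's list"
    inotifywait -m -r -e create,delete,modify,moved_from,moved_to \
        --format '%T %e %w%f' --timefmt '%F %T' \
        ~/Documents ~/Pictures >> ~/sync-journal.log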

The features you’re asking about aren’t, as far as I know, part of rsync, because they would depend on specific filesystem features.

I use rsync to back up data on a weekly basis from my internal drive to a trio of external drives. It runs in the background, and the details are unimportant to me, because (a) it’s in the background, and (b) it’s sufficiently fast for what I’m looking for.

Tracking every file rename and comparing (for example) checksums of every file becomes a highly time-intensive process; at a guess, it would require building a large database of every file on the system, updating it with every single change to any file in the filesystem, and effectively “de-duping” the records on a regular or continuous basis.

What you’re looking to do isn’t really that complex until you add in the constraints on “how”.

You could also increase the frequency at which you run the backup, and that might (depending on usage patterns) reduce the amount of time for each backup.

No. Not at all necessary.
If you have fs journaling (optional on NTFS, always on ext4), everything is there in a few bytes.
You don't need any checksums, only the merchant and his pencil :slight_smile:

Moreover, as I said: DelaSync did the job. One line per action.
In sum even less, as the merchant's list can be reduced logically.

There comes a point where “optimization” becomes counterproductive. Try rsync - like I said, I use it to manage multiple terabytes of data, and it handles it just fine. :slight_smile:

Yes, I will do that… "Suck it and see" :slight_smile:

But I will have to use a script for that, as rsync only handles a single folder per call, if I get that right?

I use a script, and I would recommend that. I run it multiple times to do my backups - just my home directory, so it’s not too complex, but if you have multiple directories you want to sync, yes, a separate command for each would be needed.

Personally, I use:

rsync -av /source/ /dest/ > /path/to/logfiles/backup-$(date '+%Y-%m-%d').log

For the subsequent commands, I append (>> rather than >).
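
Put together, a minimal script along those lines might look like this (paths are placeholders):

    #!/bin/bash
    # one rsync call per directory; > starts the log, >> appends to it
    LOG=/path/to/logfiles/backup-$(date '+%Y-%m-%d').log
    rsync -av /home/user/Documents/ /mnt/backup/Documents/  > "$LOG"
    rsync -av /home/user/Pictures/  /mnt/backup/Pictures/  >> "$LOG"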

Turning on compression just increases the time for local copies (reading and compressing in memory, then decompressing and writing, doesn't add value or improve performance).

rsync does use a delta transfer for network-based sync, but for local file sync, I don’t believe it does - diffing binary files and applying them is generally horribly inefficient - part of the reason why source control systems generally recommend against storing binary files in them (the storage requirements grow very rapidly).

The man page is pretty detailed - probably doesn’t go into the full technical details, but it does explain a lot about how rsync works. Worth a read.
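
If you want to experiment, the man page documents -W/--whole-file (the default when both paths are local) and its negation:

    # force the delta algorithm even for a local copy (usually slower on local disks)
    rsync -av --no-whole-file /source/ /dest/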


I use something similar to @hendersj for the backup of the backup :wink: I use a remote destination for backup over ssh…

    rsync --msgs2stderr -av \
        --exclude /some/directories \
        /source/ \
        user@remote:/dest/

I just go

unison -ui text sync-this

while in
/home/user/.unison/sync-this.prf
all the details can be specified, as in this example:

# sync the Firefox-Profile 

root = /home/user/.mozilla/
root = /home/userserver/.mozilla/

# "force" will force direction from the specified root to the other.
# For new installation, to be sure, set force to the "safe" root: 
# force = /home/userserver/.mozilla/

# With the next line it won't bother you with prompts and will run fastest:
batch = true

confirmbigdel = true
group = true
owner = true
times = true
# Ignore paths are relative to the roots:
ignore = Path firefox/Crash Reports

There are so many options…
If you skip -ui text you'll get the GUI.
As with rsync, you can put it into a script - as I do. Roots can also point to just one file.

Convert ext4 to btrfs and use btrfs send/receive: Search results for 'send/receive @karlmistelberger order:latest' - openSUSE Forums
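
A minimal sketch of that approach (snapshot names and mount paths are placeholders; the -p parent makes the send incremental):

    # take a read-only snapshot, then send only the changes since the previous one
    btrfs subvolume snapshot -r /home /home/.snapshots/home-new
    btrfs send -p /home/.snapshots/home-old /home/.snapshots/home-new \
        | btrfs receive /run/media/user/backup/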

Not everybody wants or needs btrfs. Different users, different needs.