How to make sure data is copied correctly?

Hi,

I have a hard drive that I’m making a backup of.

The folders hold data from a lot of smaller hard drives that was copied to this large hard drive, hence the names.

I copied the data using rsync (rsync -rtuva /mnt/2e95f1fc-5192-442e-ac29-094850059dee/ /run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/). Both hard drives are ext4 (one is 4TB, the other is 2TB), but when I run ‘du -s’ or ‘du -sb’ on the two copies, the sizes differ:

linux:/ # du -sb mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Large
243860180733 mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Large
linux:/ # du -sb run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Large
243860188925 run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Large

linux:/ # du -sb mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Small
287493189953 mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Small
linux:/ # du -sb run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Small
287493210433 run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Small

linux:/ # du -sb mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Small-Moved
287493185857 mnt/2e95f1fc-5192-442e-ac29-094850059dee/500G-Small-Moved
linux:/ # du -sb run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Small-Moved
287493202241 run/media/rofe/6b926642-267e-4e14-91d3-82953370aaf1/500G-Small-Moved

Why is it different in size and how do I make sure that the data was copied correctly?

- Ronni

Use a code block for machine output. It is easier to read. I’ve requoted but with a code block.

If there are any files with hard links, then where there is only one actual copy on the source, there would be two copies on the destination. That would probably account for the size difference.

On 2014-12-22 22:06, nrickert wrote:
>
> rofe;2684690 Wrote:

>> Why is it different in size and how do I make sure that the data was
>> copied correctly?

When I have that suspicion, I use ‘mc’ to display the sizes of the
directories and compare both disks using the two panes. When I see a
directory whose size differs, I go into it and repeat until I find the
culprit.
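
The same drill-down could be scripted. Below is a rough Python sketch, assuming both trees are mounted and readable. The function names tree_size and differing_subdirs are made up for illustration, and only regular files are counted so that the sizes of the directory entries themselves do not skew the totals.

```python
import os

def tree_size(root):
    """Sum the sizes in bytes of all regular files below root (symlinks are skipped)."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                total += os.lstat(path).st_size
    return total

def differing_subdirs(src, dst):
    """Return the immediate subdirectories of src whose total size differs in dst."""
    differing = []
    for entry in sorted(os.listdir(src)):
        src_path = os.path.join(src, entry)
        if not os.path.isdir(src_path) or os.path.islink(src_path):
            continue
        dst_path = os.path.join(dst, entry)
        # A subdirectory that is missing on the destination counts as a difference.
        if not os.path.isdir(dst_path) or tree_size(src_path) != tree_size(dst_path):
            differing.append(entry)
    return differing
```

Call differing_subdirs() on the two top-level directories, then again on each name it returns, until the culprit turns up.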

> Use a code block for machine output. It is easier to read. I’ve
> requoted but with a code block.

Thanks.

> If there are any files with hard links, then where there is only
> one actual copy on the source, there would be two copies on the
> destination. That would probably account for the size difference.

rsync can preserve them, but the option is not part of -a, so it has to
be added explicitly:

-H, --hard-links preserve hard links
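
To check whether hard links are involved at all, something like the following Python sketch could be run against the source tree. It is only a sketch: the function names are made up for illustration, and it only sees links whose paths both lie inside the tree being scanned.

```python
import os
import stat
from collections import defaultdict

def hard_link_groups(root):
    """Group regular files under root that have a link count above one by (device, inode)."""
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            # st_nlink > 1 means another directory entry points at the same inode.
            if stat.S_ISREG(st.st_mode) and st.st_nlink > 1:
                by_inode[(st.st_dev, st.st_ino)].append(path)
    return {key: sorted(paths) for key, paths in by_inode.items()}

def extra_bytes_without_hard_links(root):
    """Estimate the additional bytes a copy needs if every hard link becomes a separate file."""
    extra = 0
    for paths in hard_link_groups(root).values():
        size = os.lstat(paths[0]).st_size
        extra += size * (len(paths) - 1)
    return extra
```

If hard_link_groups() comes back empty, hard links cannot explain the size difference.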


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

On Mon, 22 Dec 2014 20:46:01 +0000, rofe wrote:

> Why is it different in size and how do I make sure that the data was
> copied correctly?

In addition to the comment about hard links perhaps having something to
do with it, also remember that du reports disk space usage, not total
file size.

If the block sizes on the drives are different, you’ll probably get
different usage statistics.

If you write a 1 byte file to a disk that has 32 KB blocks, du will
report 32 KB used for that file.

If you write a 1 byte file to a disk that has 64 KB blocks, du will
report 64 KB used for that file.

If you use du with the ‘-b’ parameter (short for --apparent-size
--block-size=1), that will give you numbers based on bytes rather than
block sizes. See the du man page for more info.
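
To make the difference between the two numbers concrete, here is a small Python sketch that reports both for a single file. It assumes Linux, where st_blocks is counted in 512-byte units; the function name apparent_and_allocated is made up for illustration.

```python
import os

def apparent_and_allocated(path):
    """Return (apparent size in bytes, bytes allocated on disk) for one file."""
    st = os.lstat(path)
    # st_blocks is counted in 512-byte units on Linux, whatever the filesystem block size is.
    return st.st_size, st.st_blocks * 512
```

On a filesystem with 4 KB blocks, a 1-byte file would typically show 1 apparent byte and 4096 allocated bytes, although the exact figure depends on the filesystem.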

Jim


Jim Henderson
openSUSE Forums Administrator
Forum Use Terms & Conditions at http://tinyurl.com/openSUSE-T-C

On Tue, 23 Dec 2014 01:11:18 +0000, Jim Henderson wrote:

> On Mon, 22 Dec 2014 20:46:01 +0000, rofe wrote:
>
>> Why is it different in size and how do I make sure that the data was
>> copied correctly?
>
> In addition to the comment about hard links perhaps having something to
> do with it, also remember that du reports disk space usage, not total
> file size.
>
> If the block sizes on the drives are different, you’ll probably get
> different usage statistics.
>
> If you write a 1 byte file to a disk that has 32 KB blocks, du will
> report 32 KB used for that file.
>
> If you write a 1 byte file to a disk that has 64 KB blocks, du will
> report 64 KB used for that file.
>
> If you use du with the ‘-b’ parameter, that will give you numbers based
> on bytes rather than block sizes. See the du man page for more info.

To answer the question in the subject, however - when using rsync, use
the -c option to make it decide which existing files to skip based on a
checksum. Files whose checksums match are skipped because their contents
are the same; the decision is no longer based on file size/timestamp or
other ‘quick’ methods.

You can also check individual files by running md5sum or sha1sum on the
source and destination copies and comparing the results. If you wanted
to script that, you probably could pretty easily, but the comparison for
all files might take a while.
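
A minimal sketch of such a script, in Python. It uses SHA-256 from hashlib rather than md5sum or sha1sum (any of them would do for spotting copy errors), and the names file_digest and compare_trees are made up for illustration. It reads every file on both disks in full, so expect it to run for a long time on a few hundred GB.

```python
import hashlib
import os

def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to keep memory use low."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_trees(src, dst):
    """Return (missing, mismatched): paths absent from dst, and paths whose content differs."""
    missing, mismatched = [], []
    for dirpath, _dirnames, filenames in os.walk(src):
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            if os.path.islink(src_path) or not os.path.isfile(src_path):
                continue  # skip symlinks and special files
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path):
                missing.append(rel)
            elif file_digest(src_path) != file_digest(dst_path):
                mismatched.append(rel)
    return sorted(missing), sorted(mismatched)
```

Two empty lists mean every regular file on the source has a copy with identical content on the destination.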

Jim



Another way, at the GUI level, is to use Properties from the context menu
that appears when you right-click in Dolphin.

Any number of files and directories can be selected; then select Properties.
The output reports the number of directories, the number of files, and the
bytes used.

If the formatting of the data sources is the same, the bytes used can be
exactly the same.

Sometimes they can be out by a block; it’s assumed this is due to block and
track boundaries not matching.
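
For those without a GUI, roughly the same three numbers can be collected with a few lines of Python. This is only a sketch: the function name summarize is made up, and it is not necessarily how Dolphin counts (symlinks, for instance, may be treated differently).

```python
import os

def summarize(root):
    """Return (directories, files, bytes) found below root."""
    dirs = files = size = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            files += 1
            # lstat reports the size of the entry itself, not of a symlink's target.
            size += os.lstat(os.path.join(dirpath, name)).st_size
    return dirs, files, size
```

Running it on both copies and comparing the two tuples shows at a glance whether a directory or file went missing.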

On Tue, 23 Dec 2014 20:06:01 +0000, keellambert wrote:

> Another way, at the GUI level, is to use Properties from the context
> menu that appears when you right-click in Dolphin.
>
> Any number of files and directories can be selected; then select
> Properties. The output reports the number of directories, the number of
> files, and the bytes used.
>
> If the formatting of the data sources is the same, the bytes used can be
> exactly the same.
>
> Sometimes they can be out by a block; it’s assumed this is due to block
> and track boundaries not matching.

Matching file sizes aren’t a guarantee, though, that the contents were
copied correctly. :-)

Jim



On 2014-12-24 00:21, Jim Henderson wrote:

>> Sometimes they can be out by a block; it’s assumed this is due to block
>> and track boundaries not matching.
>
> Matching file sizes aren’t a guarantee, though, that the contents were
> copied correctly. :-)

No, it is just a double check to see whether we missed a file because of
some mistake we made on the command line.

If we see different sizes, we know we goofed. If they are the same, then
we assume rsync did a good job ;-)


Cheers / Saludos,

Carlos E. R.

On Wed, 24 Dec 2014 03:03:07 +0000, Carlos E. R. wrote:

> On 2014-12-24 00:21, Jim Henderson wrote:
>
>>> Sometimes they can be out by a block; it’s assumed this is due to block
>>> and track boundaries not matching.
>>
>> Matching file sizes aren’t a guarantee, though, that the contents were
>> copied correctly. :-)
>
> No, it is just a double check to see whether we missed a file because of
> some mistake we made on the command line.
>
> If we see different sizes, we know we goofed. If they are the same, then
> we assume rsync did a good job ;-)

Sure, that’s reasonable. Just pointing out that if the goal is to ensure
that data is copied correctly (as indicated in the subject line of the
thread), a file size check isn’t sufficient.

But you know this. :-)

Jim

