Interpreting ls output

I’m comparing the directory contents of 2 disks after an rsync.

The following command

ls -X -1 --recursive --size --all --almost-all -l

produces output like this:

36 -rw------- 1 david users 34779 May 26 07:56 30154f2a9d7d1c220d3b883f82ab76f2.png


Doing this on each disk sometimes produces slightly different results.

Q1) What does the 36 refer to?

Q2) What's the best way to compare the contents of 2 directories?

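I believe that is the allocated size: the amount of disk space assigned to the file, which the --size option prints in blocks (1024-byte units by default in GNU ls). A 34779-byte file on a file system with 4096-byte blocks takes 9 blocks, that is 36864 bytes or 36 such units.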
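To see where that first column comes from, one can put the byte size and the allocated space side by side. This is only a minimal sketch, assuming GNU coreutils (stat -c, ls --block-size); the file name sample.png and the temporary directory are made up for illustration and are not from the original post.

```shell
#!/bin/bash
# Sketch: compare a file's byte size with the space allocated to it.
# Assumes GNU coreutils; "sample.png" is a placeholder name.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
file="$workdir/sample.png"

# Write exactly 34779 bytes, the size shown in the example line.
head -c 34779 /dev/urandom > "$file"

bytes=$(stat -c %s "$file")      # length in bytes (the size column of ls -l)
blocks=$(stat -c %b "$file")     # number of allocated blocks, in units of %B
unit=$(stat -c %B "$file")       # size in bytes of one such block, usually 512
allocated=$(( blocks * unit ))   # bytes actually reserved on disk

echo "bytes=$bytes allocated=$allocated"
# The first column printed by --size, forced to 1024-byte units:
ls --size --block-size=1024 -l "$file"
```

On a file system with 4096-byte blocks I would expect the allocated figure to be 36864 bytes, i.e. 36 units of 1024 bytes, but the exact value depends on the file system.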

Q2) What's the best way to compare the contents of 2 directories?

That depends on what you are looking for.

I have sometimes used something like this:


ls -l directoryA | sort > file1
ls -l directoryB | sort > file2
diff file1 file2
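A listing from ls -l also carries timestamps, owners and link counts, so the diff can flag files whose contents are the same. A hedged variation, assuming GNU find (for -printf): list only the relative path and the byte size of each regular file, then diff the two listings. The helper name list_tree and the demo directories are hypothetical.

```shell
#!/bin/bash
# Sketch: compare two trees by relative path and byte size only.
# Assumes GNU find; list_tree is a made-up helper name.
list_tree() {
    # %P = path relative to the starting directory, %s = size in bytes
    (cd "$1" && find . -type f -printf '%P\t%s\n' | LC_ALL=C sort)
}

# Small demonstration with two throw-away directories.
a=$(mktemp -d); b=$(mktemp -d)
trap 'rm -rf "$a" "$b" "$a.list" "$b.list"' EXIT
mkdir "$a/sub" "$b/sub"
printf 'hello\n' > "$a/sub/one.txt"
printf 'hello\n' > "$b/sub/one.txt"
printf 'only in a\n' > "$a/two.txt"

list_tree "$a" > "$a.list"
list_tree "$b" > "$b.list"

# diff exits 0 when the listings match and 1 when they differ.
if diff "$a.list" "$b.list"; then
    result="trees match"
else
    result="trees differ"
fi
echo "$result"
```

This only checks names and sizes, not contents, and file names containing tabs or newlines would confuse the listing.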

I’m trying to make an exact copy of a directory onto a second disk.

Why is this number sometimes different on the second disk?

Thank you.

The block size could be different, though that is unlikely.

More likely, you are dealing with a sparse file. A sparse file can have a large file size but only a small number of disk blocks. For example, if you created a new empty file, did a seek to position 1 MB and wrote there, you would get a sparse file. Reading that file would return zero-valued bytes until you reached the actual data. But no disk blocks are allocated for those zeros.

When you copy the file without any sparse handling, the zeros are copied as real data. So the destination file won’t be sparse and will have more blocks assigned. The “cp” command and several other commands actually have options for handling sparse files.
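The effect is easy to reproduce. The following is a sketch under assumptions: GNU coreutils (truncate, stat -c, cp --sparse) and a file system that supports holes. The file names are placeholders.

```shell
#!/bin/bash
# Sketch: create a sparse file and compare its length with its allocation.
# Assumes GNU coreutils and a file system that supports holes.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
sparse="$workdir/sparse.bin"

# Extend an empty file to 1 MiB without writing data, then append 5 real bytes.
truncate -s 1M "$sparse"
printf 'data\n' >> "$sparse"

apparent=$(stat -c %s "$sparse")   # length in bytes
allocated=$(( $(stat -c %b "$sparse") * $(stat -c %B "$sparse") ))

# Copy once writing the zeros out, once re-creating the holes.
cp --sparse=never  "$sparse" "$workdir/full.bin"
cp --sparse=always "$sparse" "$workdir/holes.bin"

echo "apparent size: $apparent bytes"
echo "allocated:     $allocated bytes"
ls --size --block-size=1024 -l "$workdir"

# The contents are identical whatever the allocation looks like.
cmp "$sparse" "$workdir/full.bin" && cmp "$sparse" "$workdir/holes.bin" \
    && echo "contents identical"
```

The first column of the ls listing will typically differ between full.bin and the other two files, even though cmp reports no difference in content.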

You mean the number at the beginning of the output line? I also have no idea; I never saw it before. It is not in a simple

ls -l

It must be the result of one of the many options you use. Did you try adding them one by one until you see this number appear?

BTW, I do not think the -1 option is useful here. It is when using

ls

because that shows only file names and, without -1, arranges them in several columns where possible. I do not think the longer lines of an ls -l need this.

Also I am not sure why you use --size. Why not simply leave the size as it is, in bytes?

Another remark from me is that rsync is designed to copy files. I doubt it guarantees that copying a tree of files results in something that is byte for byte equal to the original. It does so for the contents of the individual files. But directories may end up organised differently, I assume, and when the results are on a file system of a different type that is almost certain IMO.

I use rsync to copy from 1 disk to another, identical disks, same file system.

Based on my experience, I know that rsync can be fussy at times. I’ve set the file permissions to be identical on both disks before running rsync.

I run a script where I pipe the output of ls into a file for each disk, then use diff to compare those 2 files.



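ls --version
echo "  "
# List each tree recursively; --size adds the allocated-blocks column.
# Quote the variables and stop if a cd fails, so ls never runs in the wrong place.
cd "$DIR1" || exit 1
ls -X -1 --recursive --size --all --almost-all ___David &> "$FILE1"
cd "$DIR2" || exit 1
ls -X -1 --recursive --size --all --almost-all ___David &> "$FILE2"
cd "$DIR3" || exit 1
diff "$FILE1" "$FILE2" &> "$FILE3"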


I feel pretty good when the number of files is exact, but the amount of disk space is usually different.

Any suggestions for a better way to get an exact file copy for a 2Tb directory?

Is there any way to massage a file beforehand, so that when it is copied, there are no changes to the copied file?

Thank you.

It turns out that “rsync” does have a “--sparse” option.

Based on my experience, I know that rsync can be fussy at times. I’ve set the file permissions to be identical on both disks before running rsync.

I often use

rsync -an -v --delete source-directory/. destination-directory/.

and that works pretty well for me. But it does not get hard links right, though there’s actually an option for that (-H, --hard-links).

I feel pretty good when the number of files is exact, but the amount of disk space is usually different.

The disk space used by a directory will vary. And the existence of sparse files will also cause variation.

Any suggestions for a better way to get an exact file copy for a 2Tb directory?

Exactly what do you mean by “exact”?

I am still a bit unsure about your goal (see also http://www.catb.org/~esr/faqs/smart-questions.html#goal).

Maybe it is just about the terms you use. Sometimes it looks as if you use the terms “directory” and “disk” as synonyms, which they are not. E.g. when you talk about a 2Tb directory, which would be beyond any practical meaning (think of the many Tb all those millions of files administered within such a directory would take alone).

So, is this about a directory with all the files administered by it (the complete tree of directories and files starting from this directory), a file system (mounted at that directory you talk of), a disk partition, a whole disk? And yes, some of these options may turn out to be (almost) the same. But nevertheless it is better to be sure we all mean the same.



Exactly what do you mean by "exact"?


For me in this case, exact means identical, so that if I were to delete the source directory, I would not lose any data.



Also I am not sure why you use --size. Why not simply leave the size as it is, in bytes?


I’ll try this.



It turns out that "rsync" does have a "--sparse" option.


I’ll try this too.



Maybe it is just about the terms you use. Sometimes it looks as if you use the terms "directory" and "disk" as synonyms, which they are not. E.g. when you talk about a 2Tb directory, which would be beyond any practical meaning (think of the many Tb all those millions of files administered within such a directory would take alone).


Thank you for letting me know my word choice could be better.

By 2Tb directory, I mean a directory with 2Tb of data in it.

This is how my data is organized, everything under one directory. This makes the rsync script simple.

Thank you.

Well, rsync has been around for decades. I would say you can by now trust that it is able to do this basic functionality.
In all the years I have used rsync for all sorts of goals (backup of course amongst them), it never failed to copy files to a place and state from which they could be recovered.

But everyone to his own hobby of course. I am only afraid that you will get busy with all sorts of side effects of the test you are doing (which in themselves may be interesting subjects, like “what is the meaning of that extra number in front of the output lines of ls in certain circumstances”).

BTW, quoting people’s text is done with the QUOTE tags: the button with “speaking cloud” just left of the CODE tags # button.

Yes, that’s what I assumed you meant. And I assume that you mean that in the recursive sense. That is, your count includes the data in subdirectories.

In a strict sense, a directory contains only names and related info (inode numbers). The data is in a file. I can put a 2Tb file into a directory. But the actual data in the directory is just the name and inode number of that file. It is, of course, convenient to talk as if the data in the file were part of the directory. But many directories are only 4096 bytes in length (as shown by “ls -l”). You would use the “du” command to get the total data, including what is in the files listed by the directory.

I mention this because it relates to some of your questions. If I have a directory with many files, then the size of the directory could be quite a bit larger than 4096 bytes. But maybe I then delete most of the files (or move them to a different directory). The size of the directory stays large, even if it contains only one or two file names – because most file systems are not actively shrinking directories to minimal size. If I now copy that directory to a new disk, the copied directory will be newly created with only the one or two file names. So the size of the copied directory will be much smaller. Does that count as an exact copy?

That last sentence is mostly a rhetorical question. It does not need an answer. But it helps explain why I asked what you mean by “exact copy”.
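The directory-size effect described above can be observed directly. This is a rough sketch assuming GNU stat and find; whether the size stays large after the deletions depends entirely on the file system, so no particular numbers are promised here. The file names are invented for the demonstration.

```shell
#!/bin/bash
# Sketch: watch a directory's own size before and after deleting entries.
# Assumes GNU stat and find; the result depends on the file system.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

empty_size=$(stat -c %s "$dir")    # size of the new, empty directory

# Create many entries so the directory itself has to grow.
for i in $(seq 1 2000); do
    : > "$dir/file_with_a_fairly_long_name_$i"
done
full_size=$(stat -c %s "$dir")

# Remove all entries except the first one and measure again.
find "$dir" -type f ! -name '*_1' -delete
after_size=$(stat -c %s "$dir")

echo "empty: $empty_size  full: $full_size  after delete: $after_size"
```

If the last two numbers are equal, a freshly made copy of that directory would show a smaller size than the original although it holds the same single file.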

The discussion is meandering away from the original post, but for me the discussion about rsync is very relevant.

In my experience, when copying a Tb size from-directory to an EMPTY to-directory, rsync fails every time. The exact reason is not clear to me, but I think it’s related to permissions combined with the rsync delete option to remove empty directories.

To get around rsync stopping, I always do a first pass with cp to get the files there, then I do the rsync pass using the checksum compare option (--checksum).

The rsync --sparse option will be explored next.

One more side note. I read an article about a system administrator having to move petabyte amounts of data, and the associated copy verification effort. The strategy used was to make two copies of the original directory to the new disk. Then a diff was done between the 3 directories. Only when the 3 diff files were identical or explained was the copy considered to have very high integrity, and only then were the original and the extra copy directory deleted. This is the backstory for interpreting the ls command. I am trying to duplicate this 3 directory copy concept.
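For a comparison of three trees like this, checksums of the file contents avoid the block-count and directory-size noise that ls reports. A minimal sketch, assuming GNU coreutils and findutils (sha256sum, sort -z, xargs -0 -r); the helper name manifest and the demo directories are hypothetical, and this is not the procedure from the article.

```shell
#!/bin/bash
# Sketch: verify copies by comparing per-file checksums of whole trees.
# Assumes GNU coreutils/findutils; manifest is a made-up helper name.
manifest() {
    # One "checksum  ./relative/path" line per regular file, in a stable order.
    (cd "$1" && find . -type f -print0 | LC_ALL=C sort -z | xargs -0 -r sha256sum)
}

# Demonstration: an original and two copies in throw-away directories.
orig=$(mktemp -d); copy1=$(mktemp -d); copy2=$(mktemp -d)
trap 'rm -rf "$orig" "$copy1" "$copy2"' EXIT
mkdir "$orig/photos"
printf 'picture data\n' > "$orig/photos/a.png"
printf 'notes\n' > "$orig/readme.txt"
cp -a "$orig/." "$copy1/"
cp -a "$orig/." "$copy2/"

m0=$(manifest "$orig"); m1=$(manifest "$copy1"); m2=$(manifest "$copy2")

if [ "$m0" = "$m1" ] && [ "$m0" = "$m2" ]; then
    verdict="all three trees match"
else
    verdict="mismatch: inspect with diff"
fi
echo "$verdict"
```

For a tree of 2Tb it would be more practical to write each manifest to a file and diff the files, and reading every byte three times will take a while.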

I very much appreciate openSUSE and have weaned myself off Windows, where I was using robocopy and teracopy.

On your original thread title and post.

I have the strong idea that the extra number at the start of each line has something to do with the --size option. (I already suggested that you test this, but it is too late for that now.) I assume the --size option does not replace the size in bytes that is already in the ls -l listing, but adds something.

Also, as I have indicated earlier, I doubt if using ls for the purpose you have is a sound idea. Too many unknowns here, of which a few have already been indicated (sparse files, directory sizes do not shrink).

A few years ago I cooked up a simple little throw-away Ruby script to provide me with exact numbers and sizes of directory contents, comparable between all filesystems and operating-system platforms I’ve encountered.

The script (which I called »sum-it-up«) proved itself quite useful and »fungible«:

#!/usr/bin/ruby -w
# vim:ai:nu:et:sta:sts=4:sw=4

sum, folders, other = 0, 0, 0

d = Dir["**/*"]

d.each do |f| 
    if test(?f,f) 
        sum += File.size(f)
    else 
        if test(?d,f)
            folders += 1
        else
            other += 1
            p f
        end
    end 
end

puts
puts "%20d folders" % folders if folders > 0 
puts "%20d #{other} non-plain files." % other if other > 0 
puts "%20d bytes in #{d.size - folders - other} files." % sum 


Here is what it looks like:


rig:~ ▶ cd code/
rig:~/code ▶ sum-it-up   # invoke as ruby scriptname otherwise
"cmdcontrol/combine"

                 565 folders
                   1 1 non-plain files.
           322593981 bytes in 6154 files.
rig:~/code ▶ _

The one non-plain file is a broken link to a »combine« command. All I’m usually concerned with is that the correct number of folders, files and the byte-exact amount of file data are present at two distinct locations.

Note: the script ignores »invisible files« (.directory, .DS_Store, dot-config files etc.), which I prefer. One can quickly replace the line »d = Dir["**/*"]« with »d = Dir.glob("**/*", File::FNM_DOTMATCH)« to recursively match those »dot« files as well. Similarly, one could quickly build in any exceptions, path prunings and other stuff like checksumming contents of some or all files.
Maybe this can be of use to you guys as well. Cheers!