coreutils-8.25-3.3.i586 "cp", "mv", etc.: "value too large for defined data type"

With the current 32-bit Tumbleweed (i586), copying or moving a large file (7GiB) with “cp” or “mv” leads to the error message “value too large for defined data type”.
Despite this, the file is copied or moved correctly.
The version of the coreutils is “coreutils-8.25-3.3.i586”.
The filesystem type is ext4 with the features “has_journal ext_attr resize_inode dir_index filetype needs_recovery extent flex_bg sparse_super large_file huge_file uninit_bg dir_nlink extra_isize”.

Is this just a cosmetic error (the files are ok after the operation) or is it a time bomb?
Did I configure something wrongly?

Yeah,
Although I haven’t touched a large 32-bit system in a long time,
I wouldn’t be surprised if such a system had file system limitations of 4GB and in some cases 2GB.

It would be a fundamental limitation of 32-bit.

You can use the command “split” to split into smaller files and later “cat” to re-integrate.
The following link also describes using 7zip, which I haven’t tried but which looks interesting:
http://askubuntu.com/questions/54579/how-to-split-larger-files-into-smaller-parts

TSU

FWIW
Here is an example using split and cat…

7GB file named “BigFile”
You know that you want to avoid a 4GB file size limit.
As a bonus, you know that your file transfer method isn’t very good at transferring files larger than 10MB, so you want to move individual pieces smaller than that. Transferring files locally from one part of your file system to another might not have a problem with large files, but some transfer methods (e.g. over HTTP/HTTPS) can have this kind of limitation.

Split your large file into smaller files
You should know that by default your file chunks will be named with the prefix “x” followed by a letter suffix (xaa, xab, … xaz, xba, …), so make sure there are no existing files in the same directory that start with “x”
Now, run the following command, which makes each file chunk 10MB in size

split --bytes=10MB BigFile

You can inspect the result

ls x*

Move your pieces to their new location (target_destination is a placeholder for the directory you want)

mv x* target_destination/

In the target directory, run the following to re-integrate the pieces

cat x* > BigFile

If you’d rather have only a few (say, 3) big chunks instead of a large number of little chunks, you can run the following instead

split --number=3 BigFile
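A minimal shell sketch of the same round trip, with a check that the re-joined file matches the original. It assumes GNU coreutils (split, cat, cmp); the function names and the “.part.” prefix are hypothetical, chosen so the chunks can’t collide with unrelated files starting with “x”.

```shell
# Minimal sketch of the split/cat round trip with a verification step.
# The ".part." prefix is a hypothetical naming choice, not split's default.

# Split $1 into chunks of size $2 (GNU split units: 10MB = 10*1000*1000
# bytes, 10M = 10*1024*1024 bytes), named "$1.part.aa", "$1.part.ab", ...
split_file() {
    split --bytes="$2" -- "$1" "$1.part."
}

# Re-join the chunks of $1 into "$1.joined". The shell expands the glob in
# sorted order, which matches the alphabetical suffixes split generates.
join_file() {
    cat -- "$1".part.* > "$1.joined"
}

# Succeed only if the re-joined file is byte-for-byte identical to $1.
verify_file() {
    cmp -s -- "$1" "$1.joined"
}
```

For the 7GB example: run split_file BigFile 10MB, move the BigFile.part.* pieces, then run join_file BigFile in the target directory (verify_file only works where the original is still present).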

HTH,
TSU

No, that depends only on the file system, and is independent of whether you have a 32bit or a 64bit system.
E.g. FAT16 does indeed have a 2 GiB file size limit, and FAT32 has 4 GiB.
The standard btrfs, though, has a maximum file size of 16 EiB (also on 32bit systems), and ext4’s limit ranges from 16 GiB to 16 TiB.

See Comparison of file systems - Wikipedia.

And the Linux kernel’s file access functions use 64bit integers on 32bit systems too.

So it may be a “bug” in coreutils: if an application stores some size value in a 32-bit variable (such as a long, or an off_t without large file support), that value is limited to 2 GiB (signed) or 4 GiB (unsigned) on 32bit.
It sounds rather “cosmetic” though, if the files are copied/moved properly.
And to repeat, ext4 does support 7 GiB big files in any case.
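To put numbers on the 2/4 GiB limits mentioned above, a small POSIX shell sketch (the function and variable names are hypothetical) computes the largest values a signed and an unsigned 32-bit integer can hold and checks file sizes against them.

```shell
# Worked arithmetic (sketch): where the 2 GiB and 4 GiB limits come from,
# and why a 7 GiB file does not fit in a 32-bit size variable.
INT32_MAX=$(( (1 << 31) - 1 ))           # 2147483647, just under 2 GiB
UINT32_MAX=$(( (1 << 32) - 1 ))          # 4294967295, just under 4 GiB
SEVEN_GIB=$(( 7 * 1024 * 1024 * 1024 ))  # 7516192768 bytes

# Succeeds (exit status 0) if size $1 fits in a signed 32-bit integer
fits_int32() { [ "$(( $1 <= INT32_MAX ))" -eq 1 ]; }

# Succeeds if size $1 fits in an unsigned 32-bit integer
fits_uint32() { [ "$(( $1 <= UINT32_MAX ))" -eq 1 ]; }
```

7516192768 is also the size shown in the ls -l output later in the thread; it exceeds both limits, while ext4 itself has no problem storing it.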

I started to write a paragraph saying that a file operation would be independent of whatever the file system supports, then deleted it: although a 32-bit file system can have “extensions” installed to support addressing beyond 32 bits, a file operation like a “move” likely isn’t a streaming operation where the bits are transferred in smaller chunks (e.g. standard 256k disk blocks), but would have to handle the file as a single “big” object… And a 32-bit array has a hard 4GB limit. That’s why I stated the 4GB limit, but for other reasons there may be a smaller (e.g. 2GB) limit.

That said,
My reasoning on why 32-bit coreutils may have a 4GB limit is only a reasonable analysis based on general principles on my part, and was in no way based on a specific inspection of, or knowledge about, exactly how 32-bit coreutils works.

Hence, as observed, there is no problem with a 7GB file residing on the file system, but there is a problem when it’s moved, more likely off the existing partition (a move within the same partition may bypass certain restrictions and simply re-make the file system pointer).

TSU

Seriously?
Loading a 7GiB file completely into memory wouldn’t be possible on 32bit, yes, but it would likely also cause big problems on 64bit systems (especially with less RAM…).

Of course the bits will be transferred in smaller chunks.

A “file copy” basically means reading data from the input file and writing to the output file until there’s no more data left.
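The read/write loop described here can be illustrated with dd, which copies in fixed-size blocks. This is a sketch assuming GNU dd (for bs=1M and status=none), not how cp is actually implemented; the function name is hypothetical.

```shell
# Minimal sketch of a chunked copy, assuming GNU dd: dd read()s up to bs
# bytes at a time and write()s each block out, repeating until read()
# returns 0 at end of file. Memory use is bounded by the block size, not by
# the file size. This only illustrates the loop; cp has its own code.
chunked_copy() {
    # $1 = source, $2 = destination, bs=1M means up to 1 MiB per read
    dd if="$1" of="$2" bs=1M status=none
}
```

Usage: chunked_copy BigFile /some/other/place/BigFile copies a 7 GiB file while holding at most 1 MiB of it in memory at any time.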

> My reasoning on why 32-bit core-utils may have a 4GB limit is only a reasonable analysis on general principles on my part, and in no way was based on a special inspection and knowledge exactly how 32-bit core-utils works.

As I said, the Linux kernel uses 64bit integers for file sizes and the like (even on 32bit), so there should be no limit (at least no 2/4 GiB limit) unless the file system has one.

And I very much doubt that cp or mv would have such a limit; they don’t even have to care about the file size if they only read chunks of data (and I’m sure they do).
I really think this is just a cosmetic problem.
I can’t find this exact error message anywhere in the coreutils source code though, so it’s hard to say what it really means.

I just tested this after today’s updates with coreutils-8.25-3.4.i586, kernel 4.8.3 and the file systems ext4 and XFS. I don’t see such messages.

Hendrik

I have been testing again and found out:
The error message is displayed only if “-a” is used with “cp”, or if “mv” is used across file system boundaries.

With “cp -a” I get this message too and some attributes are missing:

 ls -l test*
-rw-r--r-- 1 hendrik users 7516192768 23. Okt 20:32 test.img
-rw------- 1 hendrik users 7516192768 23. Okt 20:32 test.copy

Would you care to open a bug report?

Hendrik

Sure.
Could you please tell me the right place for this report?
“glibc” for example was at sourceware.org.

The coreutils FAQ says:

28 Value too large for defined data type
It means that your version of the utilities were not compiled with large file support enabled. …
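If the FAQ’s explanation applies, the question is whether the programs (or the libraries they call) were built with large file support. As a starting point, glibc’s getconf can report the compiler flags that enable it. A sketch, assuming glibc’s getconf; the function names are hypothetical.

```shell
# Sketch, assuming glibc's getconf. Prints the compiler flags that enable
# large file support (64-bit off_t) on this platform; typically
# "-D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64" on 32-bit glibc and
# empty on 64-bit, where off_t is 64 bits wide by default.
lfs_flags() {
    getconf LFS_CFLAGS
}

# Prints the width of "long" in bits: 32 on i586, 64 on x86_64.
long_bits() {
    getconf LONG_BIT
}
```

Usage: printf 'LONG_BIT=%s LFS_CFLAGS=%s\n' "$(long_bits)" "$(lfs_flags)". Code built on i586 without those flags uses a 32-bit off_t, and the stat-family calls it makes fail with EOVERFLOW for files that are too large.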

Hmm, then why does the main task complete successfully and setting permissions fail?

Of course. When mv is used within the same file system, there is no copying at all. Only the file system administration is changed (the file name is altered).

A real mv action across file systems is impossible; that is why mv falls back to a cp action in those cases, just for the convenience of the user.
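Whether mv can simply rename or has to fall back to copying can be roughly predicted by comparing device numbers. A minimal sketch, assuming GNU stat (%d prints the device number in decimal); the function name is hypothetical, and this is only a heuristic, since rename(2) also fails with EXDEV across two mount points of the same file system.

```shell
# Sketch, assuming GNU stat: succeeds if $1 and $2 report the same device
# number (%d), i.e. probably live on the same file system, where mv can use
# a plain rename(2). Heuristic only: rename(2) also fails with EXDEV across
# two mount points of the same file system (e.g. bind mounts).
same_filesystem() {
    [ "$(stat -c %d -- "$1")" = "$(stat -c %d -- "$2")" ]
}
```

Usage: same_filesystem BigFile /mnt/usb && echo "rename" || echo "copy + delete", where /mnt/usb is just an example target. Only the “copy + delete” case goes through the copying code that prints the error.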

But as with most “automatic” actions, the user then loses contact with reality and does not understand why one mv action finishes immediately while the next takes so much time rotfl!

Well, maybe.
But this doesn’t explain why setting the permissions fails after copying, with the error “value too large for defined data type”.
“mv” doesn’t call the “cp” command but is doing the copy internally.
Probably both programs use the same library function for copying.

After wolfi’s last post,

I took a closer look at the mv utility…

When I finally located the source code, I found that it didn’t provide any answers…
It only handles various “move file” scenarios, e.g. whether the source is a file or folder and whether the specified target is a file or folder, and it references the C method “movefile”.

No amount of searching the Internet turned up anything describing the inner workings of a C movefile method, so that’s where I stopped my investigation. IMO it’s almost certainly part of whatever C library is used for compiling; if that’s the case, then that’s who <might> take a look at this. But, as I implied in my earliest posts… If someone is maintaining a 32-bit C library, why would they build something that removes the 32-bit boundaries? Without knowing all the unknown parameters of where that library would be deployed, it would be very risky to assume that something more than 32-bit capable, yet billed as 32-bit, would function everywhere.

So,
Bottom line, the reason I didn’t post earlier is that I don’t know whether documentation exists anywhere to prove or disprove what wolfi suggests (that a C move or copy method has no concept of the file size and only streams the bits). And even if that is true in some cases (which I still highly doubt), it almost certainly wouldn’t always be true if my guess is right that the real move functionality is in the library, because then it would depend on which library was used for compiling.

The alternative would be if someone wanted to run the code in debugging mode… maybe.

IMO,
TSU

The movefile() function is part of mv’s source code itself.
coreutils->mv.c
It basically just calls do_move() though, which is part of the same source file, and in turn calls copy(), part of copy.c.

> If someone is maintaining a 32-bit c library, why would they build something that removes the 32-bit boundaries?

To make it actually work and/or be useful. :wink:
Files larger than 2 GiB have existed for a very long time…

I suggest opening a bug report at bugzilla.opensuse.org, because I’m not sure whether this is an upstream problem.
I just tested the same operation (cp -a) on the same hardware with Arch Linux i686, coreutils 8.25-2, kernel 4.8.3.
And on Arch I don’t get these error messages. So it might be openSUSE-specific.

Hendrik

Thx for pointing out that movefile() is within the source itself, and that it then references do_move() which then references copy(), that’s all there.

copy() is very big, and I can already see there is information a bug report would need to include…
I see a method called “punch hole” that I’m not familiar with, which appears to be critical to avoiding a pre-allocation issue on XFS,
And, I see numerous methods specific to various file systems.

So the issue might be specific to a file system, and the file system should be included in the bug report.

In fact, for anyone who has been able to replicate the problem on their machine (particularly on XFS), it might be interesting to check whether the same problem exists on BTRFS or any other available file system. The same goes for whether the problem is seen in another distro: it may be important to know which file system is used.

And although I haven’t seen enough of the code to know how important it is, I do see code in there that could relate to streaming (various references to fifo) and to when it might or might not be used. So a quick, somewhat superficial skim of the embedded comments suggests that streaming <may> be implemented for some operations.

TSU

I tried to debug the issue and got to lib/get-permissions.c (coreutils source code), line 46:

ctx->acl = acl_get_fd (desc);

This function seems to return the error.
But that’s it for today. Maybe more tomorrow…

Hendrik

So it should continue in “libacl”, within “acl_get_fd.c”, which in turn uses “fgetxattr”.

…or “acl_from_mode”
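The message text itself comes from glibc: it is strerror(EOVERFLOW), which would explain why it doesn’t turn up in the coreutils sources. One way to see which system call actually returns it is to run cp -a under strace and filter for EOVERFLOW. A sketch, assuming strace is installed; the function name is hypothetical, and tracing all syscalls is a deliberately broad choice so that no suspect (fgetxattr, an fstat variant, …) is filtered out in advance.

```shell
# Sketch: run "cp -a" under strace (assumed to be installed), following
# child processes (-f), and print only system calls that failed with
# EOVERFLOW. strace writes its trace to stderr, hence 2>&1.
# Prints nothing if no call failed that way.
trace_eoverflow() {
    # $1 = source file, $2 = destination
    strace -f cp -a -- "$1" "$2" 2>&1 | grep EOVERFLOW
}
```

Usage: trace_eoverflow test.img test.copy. On an affected system the matching line names the failing call and ends in “= -1 EOVERFLOW (Value too large for defined data type)”, which would show whether it is fgetxattr or a stat call behind acl_from_mode.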