SOLVED: What's with "utf8=true" in the mount for vfat?

Hello

I’m curious about the option “utf8=true” that so many folk include in the advice they give here for mounting a fat32 / fat / vfat partition in openSUSE. It doesn’t appear as an option in the man page for mount. Only these four forms are listed: “utf8”, “utf8=no”, “utf8=false” and “utf8=0”, but not “utf8=true”.

So I suppose that using the undocumented option “utf8=true” is like using any bogus option, e.g. “rainbows=red”: it simply gets ignored. At first glance, then, it seems to me that a non-existent option like “apples=pears” or “utf8=true” will have the reverse of the intended effect, i.e. it will leave utf8 disabled.

Am I making sense or am I missing something?

thanks
swerdna

PS: oh and can someone explain in simpler terms what’s intended by enabling utf8? I don’t understand enough about this stuff to figure it out easily.

I don’t see any reference to utf8=true in the doco. It’s not clear whether the “=true” part is simply ignored, making utf8=true the same as plain utf8, or whether utf8=anything-unrecognised means utf8 is off. I suspect the former.

For clarity I would suggest only these options in any howto you are writing: utf8 or utf8=0.

I think that was a poor specification by whoever put the utf8 support in. I would have preferred utf8 and noutf8 as the choices.

UTF8 is a way of encoding Unicode characters with one or more octets. Unicode handles all the world’s languages. utf8 is preferred unless there is some reason you have to deal with older encodings like iso8859-X (latin1) in existing filenames. Also utf8 is preferred in application messages, encoding web pages and such. It will futureproof your development where language handling is concerned.

SOLVED: Thanks ken_yap. I will stop advising utf8=true and advise only utf8 in future.

For interest: the Ubuntu forums advise this option: “iocharset=utf8”. Is that an alternative to the option utf8, producing the same result, or does it have a different effect?

That appears to be a different option:

Character set to use for converting between 8 bit characters and 16 bit Unicode characters. The default is iso8859-1. Long filenames are stored on disk in Unicode format.

If you are starting to use utf8 then this would be advisable. It’s not clear exactly what this means, I’d have to look it up, but I suspect this refers to filenames that could have iso8859-1 characters in them. The first 128 codes of ASCII, ISO8859-1 and UTF8 are identical. For the European characters with diacritics (accents), they are different between ISO8859-1 and UTF8. So a conversion has to be done going from ISO8859-1 to UTF8.
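To make the point about the shared first 128 codes concrete, here is a minimal Python sketch. It only compares the byte sequences the three encodings produce and says nothing about how the vfat driver itself works; the sample characters are my own choice.

```python
# Compare how ASCII, ISO8859-1 (Latin-1) and UTF-8 encode the same characters.
plain = "A"      # inside the shared 7-bit range
accented = "Å"   # U+00C5, outside the 7-bit range

# For a plain ASCII letter, all three encodings give the same single byte.
ascii_plain = plain.encode("ascii")
latin1_plain = plain.encode("latin-1")
utf8_plain = plain.encode("utf-8")

# For an accented letter, Latin-1 uses one byte and UTF-8 uses two.
latin1_accented = accented.encode("latin-1")
utf8_accented = accented.encode("utf-8")

print(ascii_plain, latin1_plain, utf8_plain)
print(latin1_accented, utf8_accented)
```

Because the bytes differ for the accented letter, anything written in one encoding and read in the other needs a conversion step.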

This difference between ISO8859-1 and UTF8 is the reason why you sometimes see web pages with odd characters where left quotes and right quotes would normally appear. I think accented A is one of those interlopers. The text has been stored in ISO8859-1 and rendered in UTF8 or vice versa.

Thanks again.

I think that I’ll just leave reference to iocharset out altogether (too esoteric).

Reason for the iocharset=utf8:

On vfat, and I think possibly NTFS and Joliet disks, what you write as a file name is not necessarily what is stored, due to the Microsoft legacy. There is some name mangling going on: capitalized and uncapitalized versions of file names coexist, as do long file names and the 8+3 file name. Try storing the file “AAO” in upper case and then the file “ÅÄÖ”, also in upper case, on a vfat disk without the iocharset option set. See what “ls” returns.

The name iocharset is poorly chosen and doesn’t really have to do with input and output but is used as part of the file name mangling procedures.

I see in man mount there is also an option, shortname=, available to handle the 8+3 filenames.

Thanks for that thisoldman. I’m going to try that. I just plugged in a usb pen drive and ran sudo mount to see where it was and got this return:

/dev/sdd1 on /media/disk type vfat (rw,nosuid,nodev,shortname=lower,flush,utf8,uid=1000)

The interesting things for me here are the “shortname=lower” and the “utf8” options that the devs have programmed in. They clearly agree with ken_yap that “utf8” is the go.
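If you want to check those options in a script rather than by eye, something like the following Python sketch might do. The function name `parse_mount_line` is made up for illustration, and the pattern assumes the exact layout of the line quoted above (it would not cope with a mount point containing spaces).

```python
import re

# The line reported by "sudo mount" above.
line = "/dev/sdd1 on /media/disk type vfat (rw,nosuid,nodev,shortname=lower,flush,utf8,uid=1000)"

def parse_mount_line(text):
    """Split one line of mount output into device, mount point, type and options.

    Hypothetical helper: assumes the "DEV on DIR type FS (opts)" layout.
    """
    match = re.match(r"(\S+) on (\S+) type (\S+) \((.*)\)$", text)
    if match is None:
        raise ValueError("unrecognised mount line")
    device, mountpoint, fstype, raw_options = match.groups()
    options = {}
    for item in raw_options.split(","):
        key, sep, value = item.partition("=")
        # Bare flags such as "utf8" are stored as True.
        options[key] = value if sep else True
    return device, mountpoint, fstype, options

device, mountpoint, fstype, options = parse_mount_line(line)
print(fstype, options.get("utf8"), options.get("shortname"))
```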

I’ll get back to you on the iocharset experiment you suggested.

OK I tried the mount experiment.

If I have the mount with no mention of utf8 in the options and then copy a file named ÅÄÖ.txt onto it, then ls returns ÅÄÖ.txt.
If I then unmount it and remount it with the option “utf8” applied, ls then returns Ã?Ã?Ã?.txt.
Then if I unmount it and remount it with the option “iocharset=utf8” applied, ls returns Ã?Ã?Ã?.txt.

If I delete that file, mount the drive with the option “utf8” applied and re-add the file, then no matter which utf8-related option I mount with subsequently, including none at all, ls reproduces the name faithfully.
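One possible reading of the Ã?Ã?Ã?.txt result, sketched in Python. This assumes the shell was running in a UTF-8 locale and that the first mount used the default iocharset of iso8859-1 quoted earlier; it imitates the byte mix-up only and is not the kernel's actual code path.

```python
name = "ÅÄÖ.txt"

# A UTF-8 shell hands the kernel these bytes for the file name.
utf8_bytes = name.encode("utf-8")

# If each byte is then taken to be one ISO8859-1 character, every accented
# letter turns into "Ã" followed by an unprintable control character.
misread = utf8_bytes.decode("latin-1")

# Stand in for a terminal that shows unprintable characters as "?".
visible = "".join(ch if ch.isprintable() else "?" for ch in misread)
print(visible)

# The mix-up is reversible, which would explain why the name looks fine
# again when the drive is mounted the same way it was written.
restored = misread.encode("latin-1").decode("utf-8")
print(restored)
```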

So I think I should advise always to use the option “utf8” as a future proofing of the filesystem (to quote ken_yap).

Does that all make sense?

PS @ken_yap I just used the ÅÄÖ.txt experiment to check whether “utf8=true” is recognised by the mount command as a legitimate option, and it is.

Useful experiment. I’ll try to remember these useful results. :slight_smile:

The thing that I don’t understand about the iocharset documentation is that it talks about 16 bit Unicode characters. That would be utf16 (which comes in Little Endian and Big Endian variants BTW), another way of encoding Unicode characters. I have some vague memory this has to do with how M$ stores Unicode characters in their filesystems but I never looked into it. So it’s all M$'s fault. :slight_smile:
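For what it's worth, the two UTF-16 byte orders are easy to see from Python. This is only a sketch of the encodings themselves; which variant a given filesystem uses on disk is not something this snippet shows.

```python
ch = "Å"  # U+00C5

# UTF-16 uses 16-bit units; the two variants differ only in byte order.
utf16_le = ch.encode("utf-16-le")  # low byte first
utf16_be = ch.encode("utf-16-be")  # high byte first

# UTF-8 encodes the same character as a different two-byte sequence.
utf8 = ch.encode("utf-8")

print(utf16_le, utf16_be, utf8)
```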

Let’s whip them and shoot them to death like patriotic Linuxers rotfl!

Picked this up from an Arch Linux bug report discussion about UTF-8 and iocharset.
*Comment by Roman Kyrylych…*
I have Russian Windows XP which stores files on FAT32 filesystem in cp866.
codepage=866,iocharset=utf8 works great with uk_UA.UTF-8 locale.
Linux stores all files on FAT32 in cp866 as defined by codepage, but for displaying filenames it displays them in UTF-8 as defined by iocharset=utf8.
When I create file with Cyrillic chars in filename from Linux - Windows sees them correctly.
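The relationship between the two encodings in that comment can be sketched in Python. The file name is an arbitrary example of mine, and the snippet only shows that cp866 bytes and UTF-8 bytes can represent the same Cyrillic name; it does not reproduce what the vfat driver does with codepage= and iocharset=.

```python
name = "Привет.txt"  # arbitrary Cyrillic example

# cp866 is a single-byte code page: one byte per Cyrillic letter.
cp866_bytes = name.encode("cp866")

# UTF-8 needs two bytes for each of these Cyrillic letters.
utf8_bytes = name.encode("utf-8")

print(len(cp866_bytes), len(utf8_bytes))

# Converting between the two goes through Unicode and loses nothing
# for characters that exist in cp866.
converted = cp866_bytes.decode("cp866").encode("utf-8")
print(converted == utf8_bytes)
```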