openSUSE 13.1: Lots of character encoding issues

Hello,

I have a newly installed openSUSE 13.1 with many character encoding issues.

Example 1 - Console:
linux@vm-os131:~/test> mkdir öäüßÖÄÜ
linux@vm-os131:~/test> ls
???

Example 2 - Dolphin under KDE: the folder from Example 1 is displayed correctly, but when I, for example, create a file inside it and then try to delete it again, I get “the file or folder xyz does not exist”.

Example 3 - Windows-Share: On a share mounted with mount.cifs, the special characters are not displayed even in Dolphin, whether or not I use the parameter “iocharset=utf8”.

My Config:

linux@vm-os131:~/test> uname -a
Linux vm-os131 3.11.6-4-desktop #1 SMP PREEMPT Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux

linux@vm-os131:~/test> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Setting LANG to de_DE.UTF-8 makes no difference.

Greetings
SH

On 2014-03-31 13:06, SheriffHobbes wrote:
>
> Hello,
>
> I have a newly installed openSUSE 13.1 with many character encoding
> issues.
>
> Example 1 - Console:
> linux@vm-os131:~/test> mkdir öäüßÖÄÜ
> linux@vm-os131:~/test> ls
> ???

Please use code tags for printouts and commands (the ‘#’ button in the
forum editor).

It makes things easier to read, and sometimes it is critical: the forum
software can make important alterations to the text otherwise.


Example 1 - Console:
linux@vm-os131:~/test> mkdir öäüßÖÄÜ
linux@vm-os131:~/test> ls
???????

Are you using a virtual machine? If so, what type, what host?

What filesystem are you using on Linux, on that “test” directory?

What options did you choose for the installation?


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

Sorry for not using the code tags.

Are you using a virtual machine? If so, what type, what host?

Yes. I’ve seen the same issue on Win2012R2 Hyper-V and VMware Workstation 8. What do you mean by host? I don’t think the underlying hardware the VM runs on can make any difference.

What filesystem are you using on Linux, on that “test” directory?

The default that was offered during installation: ext4.

What options did you choose for the installation?

I chose a standard installation with KDE (no Gnome), German keyboard, everything else en_US.

On 2014-03-31 14:06, SheriffHobbes wrote:
>
> Sorry for not using the code tags.

No, it is common for newcomers not to know about this, so we have to
tell them :-)

>> Are you using a virtual machine? If so, what type, what host?
>
> Yes. I’ve seen the same issue on Win2012R2 Hyper-V and VMware
> Workstation 8. What do you mean by host? I don’t think the underlying
> hardware the VM runs on can make any difference.

It should not, but maybe it does. Yours is a strange problem, I don’t
know (yet?) what could cause it.

On vmware, the default is autoinstallation controlled by vmware. I
recommend deselecting that when creating the virtual machine, and telling
it you will do the installation yourself, later. The problem is that it
uses what I think is an obsolete or wrong autoyast profile.

>> What filesystem are you using on Linux, on that “test” directory?
>
> The default that was offered during installation: ext4.

Then that’s not the cause.

>> What options did you choose for the installation?
>
> I chose a standard installation with KDE (no Gnome), German keyboard,
> everything else en_US.

Did you verify the checksum of the installation ISO image you downloaded?

As you are using a virtual machine, did you do the installation using
the ISO file directly, or did you burn the DVD? In the second case, boot
the DVD and select the entry for media self verification. If you used a
USB stick, then you cannot.

Did you deselect packages?

Did the installation report problems with any package?

Did you add extra repositories?

Did you run “yast online update”? If not, please do and allow it to
complete all the updates it wants.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

henk@boven:~/test> mkdir öäüßÖÄÜ
henk@boven:~/test> l
totaal 120
drwxr-xr-x  5 henk wij  4096 31 mrt 17:29 ./
drwxr-xr-x 84 henk wij  4096 31 mrt 17:24 ../
drwxr--r--  2 henk wij  4096 13 jul  2013 bestanden/
-rw-r--r--  1 henk wij    21  8 jul  2013 file
drwxr-xr-x  2 henk wij  4096 31 mrt 17:29 öäüßÖÄÜ/
-rw-------  1 henk wij     3 31 mrt  2013 spsp
-rw-r--r--  1 henk wij   805 10 dec 11:50 stderr
-rw-r--r--  1 henk wij 87874 10 dec 11:50 stdout
drwxr-xr-x  2 henk wij  4096  8 jul  2013 wdir/
henk@boven:~/test> cd öäüßÖÄÜ/
henk@boven:~/test/öäüßÖÄÜ> touch aap
henk@boven:~/test/öäüßÖÄÜ> l
totaal 8
drwxr-xr-x 2 henk wij 4096 31 mrt 17:30 ./
drwxr-xr-x 5 henk wij 4096 31 mrt 17:29 ../
-rw-r--r-- 1 henk wij    0 31 mrt 17:30 aap
henk@boven:~/test/öäüßÖÄÜ>

So, it works correctly here.

This locale does not exist. It should be en_US.utf8.

???

As root

boven:~ # locale
LANG=POSIX
LC_CTYPE=en_US.UTF-8
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=
boven:~ #

as user:

henk@boven:~> locale
LANG=nl_NL.UTF-8
LC_CTYPE="nl_NL.UTF-8"
LC_NUMERIC="nl_NL.UTF-8"
LC_TIME="nl_NL.UTF-8"
LC_COLLATE="nl_NL.UTF-8"
LC_MONETARY="nl_NL.UTF-8"
LC_MESSAGES="nl_NL.UTF-8"
LC_PAPER="nl_NL.UTF-8"
LC_NAME="nl_NL.UTF-8"
LC_ADDRESS="nl_NL.UTF-8"
LC_TELEPHONE="nl_NL.UTF-8"
LC_MEASUREMENT="nl_NL.UTF-8"
LC_IDENTIFICATION="nl_NL.UTF-8"
LC_ALL=
henk@boven:~> 

Full of UTF-8. And imho I did not do anything to change it to this manually.

You are right, I made a typo when testing (there is no locale directory with this name but IIRC it is mapped internally).

Still, the symptoms are similar. Maybe look under /usr/lib/locale and use one of the exact names there for a test?
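The "mapped internally" remark can be illustrated with a small sketch. It assumes that the C library normalises the codeset part of a locale name by keeping only letters and digits and lowercasing them; `normalize_codeset` is a hypothetical helper written for this illustration, not a real command, and it only approximates that behaviour:

```shell
# normalize_codeset: rough approximation (an assumption, see above) of how
# the codeset part of a locale name is normalised: drop everything that is
# not a letter or digit, then lowercase the rest.
normalize_codeset() {
    printf '%s' "$1" | tr -cd '[:alnum:]' | tr '[:upper:]' '[:lower:]'
    echo
}

normalize_codeset "UTF-8"
normalize_codeset "utf8"
normalize_codeset "ISO-8859-1"
```

Under that assumption, en_US.UTF-8 and en_US.utf8 end up naming the same locale. The names actually installed on a system can be listed with `locale -a`.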

On 2014-03-31 18:46, arvidjaar wrote:
>
> SheriffHobbes;2634147 Wrote:
>>>
> Code:
> --------------------
> > >
> > linux@vm-os131:~/test> uname -a
> > Linux vm-os131 3.11.6-4-desktop #1 SMP PREEMPT Wed Oct 30 18:04:56 UTC 2013 (e6d4a27) x86_64 x86_64 x86_64 GNU/Linux
> >
> > linux@vm-os131:~/test> locale
> > LANG=en_US.UTF-8
> > LC_CTYPE="en_US.UTF-8"
> --------------------
>>>
>
> This locale does not exist. It should be en_US.utf8.


cer@Telcontar:~> mkdir öäüßÖÄÜ
cer@Telcontar:~> cd öäüßÖÄÜ
cer@Telcontar:~/öäüßÖÄÜ> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_ES@euro
LC_TIME=en_DK.UTF-8
LC_COLLATE=POSIX
LC_MONETARY=es_ES@euro
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=es_ES@euro
LC_NAME=es_ES@euro
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE=es_ES@euro
LC_MEASUREMENT=es_ES@euro
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
cer@Telcontar:~/öäüßÖÄÜ>


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

@Carlos: I always do manual installation, never auto. I installed from ISO, but checked for correct md5sum first. I didn’t deselect packages, only added a few. No errors from the installer. I ran zypper update, it upgraded several hundred packages - no change.

@Henk: I have the same locales for root, and the same as yours for the user if you swap nl_NL for en_US (or de_DE, which makes no difference).

Another strange thing:

linux@vm-os131:~/test> cd öäüßÖÄÜ/
linux@vm-os131:~/test/öäüßÖÄÜ> ls ..
???????

Some further testing:

linux@vm-os131:~> locale
LANG=nl_NL.UTF-8
LC_CTYPE="nl_NL.UTF-8"
LC_NUMERIC="nl_NL.UTF-8"
LC_TIME="nl_NL.UTF-8"
LC_COLLATE="nl_NL.UTF-8"
LC_MONETARY="nl_NL.UTF-8"
LC_MESSAGES="nl_NL.UTF-8"
LC_PAPER="nl_NL.UTF-8"
LC_NAME="nl_NL.UTF-8"
LC_ADDRESS="nl_NL.UTF-8"
LC_TELEPHONE="nl_NL.UTF-8"
LC_MEASUREMENT="nl_NL.UTF-8"
LC_IDENTIFICATION="nl_NL.UTF-8"
LC_ALL=
linux@vm-os131:~> ls test/
???????

For root it gets even stranger: If I do the “ls test/” when I’m logged in via ssh, I get the same results as with user linux. However, when I start KDE as root, I get the bad result not only for the shell, but also for Dolphin (!). However, this time the Windows share is being displayed correctly!! I’m getting mad here! :-)

The only way to see this folder on the Console correctly is by setting locale to de_DE.ISO-8859-1. However, it doesn’t change the Dolphin problems and I want UTF-8 and not German on the Console (which it does as a side effect).

To begin with: NEVER log in to the GUI as root.

Then, I doubt that the locale has anything to do with this. But I admit that I do not fully understand what the .UTF-8 part really does there. In any case the language part (en_US or whatever) is imho not important. It sees to it that messages are output in that language, like:

henk@boven:~> df -h
Bestandssysteem Grootte Gebruikt Besch Geb% Aangekoppeld op
/dev/sda2           20G     5,3G   14G  29% /
devtmpfs           987M      44K  987M   1% /dev
tmpfs              999M      84K  999M   1% /dev/shm
tmpfs              999M     3,4M  996M   1% /run
tmpfs              999M        0  999M   0% /sys/fs/cgroup
tmpfs              999M     3,4M  996M   1% /var/run
tmpfs              999M     3,4M  996M   1% /var/lock
/dev/sda5           20G     5,3G   14G  29% /mnt/B
/dev/sda6           99G      68G   26G  73% /mnt/B/home
/dev/sda3           92G      52G   40G  57% /home
henk@boven:~> LANG=C df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda2        20G  5.3G   14G  29% /
devtmpfs        987M   44K  987M   1% /dev
tmpfs           999M   84K  999M   1% /dev/shm
tmpfs           999M  3.4M  996M   1% /run
tmpfs           999M     0  999M   0% /sys/fs/cgroup
tmpfs           999M  3.4M  996M   1% /var/run
tmpfs           999M  3.4M  996M   1% /var/lock
/dev/sda5        20G  5.3G   14G  29% /mnt/B
/dev/sda6        99G   68G   26G  73% /mnt/B/home
/dev/sda3        92G   52G   40G  57% /home
henk@boven:~>

But when I have a Dutch environment (as I have), I can use any Unicode character in a file name and it will be displayed correctly if there is a font installed which has that Unicode character (else you get replacements like empty boxes or inverted ?).

henk@boven:~/test> mkdir नमस्ते
henk@boven:~/test> l
totaal 124
drwxr-xr-x  6 henk wij  4096  1 apr 10:13 ./
drwxr-xr-x 84 henk wij  4096  1 apr 10:12 ../
drwxr--r--  2 henk wij  4096 13 jul  2013 bestanden/
-rw-r--r--  1 henk wij    21  8 jul  2013 file
drwxr-xr-x  2 henk wij  4096 31 mrt 17:30 öäüßÖÄÜ/
-rw-------  1 henk wij     3 31 mrt  2013 spsp
-rw-r--r--  1 henk wij   805 10 dec 11:50 stderr
-rw-r--r--  1 henk wij 87874 10 dec 11:50 stdout
drwxr-xr-x  2 henk wij  4096  8 jul  2013 wdir/
drwxr-xr-x  2 henk wij  4096  1 apr 10:13 नमस्ते/
henk@boven:~/test>

I guess we are searching in the wrong direction. Personally, I would check what the file system type is, but you already told us it is ext4. Did you mount it without extra options outside the defaults?

To begin with: NEVER log in to the GUI as root.

It was only for this test and the machine was not connected to the internet.

I guess we are searching in the wrong direction. Personally, I would check what the file system type is, but you already told us it is ext4. Did you mount it without extra options outside the defaults?

I think that’s true. Maybe it’s a font issue?

This is my fstab. It’s made by Yast, I didn’t modify it:

linux@vm-os131:~> cat /etc/fstab
/dev/system/swap     swap                 swap       defaults              0 0
/dev/system/root     /                    ext4       acl,user_xattr        1 1
/dev/sda1            /boot                ext4       acl,user_xattr        1 2

Btw, I’ve made two CentOS 6.5 installations in the same virtual environments and they don’t show this behavior.

Did you also do the “Dolphin” test? Can you create a file inside those special character folders and move it to trash with Dolphin? What about Windows fileshares (if you have a Windows)?

That is not what the word NEVER implies.

Maybe. But I assume you can create and see those characters in a text file?

Looks normal to me.

What about Windows fileshares (if you have a Windows)?

I just started Dolphin, moved into that directory नमस्ते, right clicked and chose Create New > Text file (I never do this normally). It appeared. I right clicked on it and chose Move to Dustbin. It is now in the dustbin.

I do not have any Windows. But I can imagine that on a non-Linux (or very out-dated Linux) file system, names in Unicode are not stored correctly.

BTW all this is not from a VM.

I also have a little test for you to do. For this it is best to have that directory as the only file in a test directory, to minimise the output.

stat *

Go one step up and look what view says about the directory test (or how you named it):

cd ..
view test

(You leave view by typing :q and then hit Return).

It actually sounds like your desktop environment has a different locale setting than what you see in the shell. This would perfectly explain it. When you create the file, your GUI program (terminal) translates the input keysyms into 8859-1 code points. “mkdir” does not care and creates the file. Later “ls” attempts to interpret the filenames as UTF-8, and a byte from the upper half of 8859-1 on its own is not valid UTF-8.
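A minimal sketch of that mismatch, assuming an iconv that exits non-zero on illegal input (as the glibc one does). The helper names `is_utf8`, `name_latin1` and `name_utf8` are made up for this illustration, and the name öäüßÖÄÜ is written as octal escapes so that the script does not depend on the encoding of the terminal or editor:

```shell
# is_utf8: succeed only if stdin is valid UTF-8.
# Assumption: iconv returns a non-zero status on an illegal input sequence.
is_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

# The name "öäüßÖÄÜ" in ISO-8859-1 (7 bytes) and in UTF-8 (14 bytes).
name_latin1() { printf '\366\344\374\337\326\304\334'; }
name_utf8()   { printf '\303\266\303\244\303\274\303\237\303\226\303\204\303\234'; }

# Check both byte sequences the way a UTF-8 aware "ls" would have to.
for enc in latin1 utf8; do
    if "name_$enc" | is_utf8; then
        echo "$enc bytes: valid UTF-8"
    else
        echo "$enc bytes: not valid UTF-8"
    fi
done
```

The 8859-1 bytes should be reported as not valid UTF-8, which is the situation in which a UTF-8 “ls” falls back to printing question marks.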

It indeed seems that somewhere a non UTF8 encoding is used. That should not be the case in any Linux for a long time already.

But where does this come from? It seems to be a new installation, so who has introduced that silly 8859-1 and where?

I think the question is, what the OP means with “Console”. He never mentioned using a GUI program (terminal) AFAICS.
Sounds to me that’s in a text mode console, and maybe the UTF8 support isn’t loaded, causing the problem?
I even think there’s a bug report about that somewhere.

@SheriffHobbes:
Is the problem still there, when you run “mkdir” in xterm, konsole, or gnome-terminal (inside the graphical session)?
Or is this what you’re actually doing anyway?

Btw, at least in konsole, you can change the character coding in the profile’s settings and in the “View” menu. Maybe this is set wrong then?

@Henk: I know what the word NEVER implies, but what could I possibly damage in a root GUI that I couldn’t in a root shell?
Anyway, here’s the answer to your questions: Yes, I see the characters Ok in a program like Kate. The fileshare I use is the same I used in CentOS 6.5, it mounts to a Windows 7.
This is the output of your test:

" ============================================================================                                                                              
" Netrw Directory Listing                                        (netrw v149)
"   /home/linux/test
"   Sorted by      name
"   Sort sequence: \/]$,\<core\%(\.\d\+\)\=\>,\.h$,\.c$,\.cpp$,\~\=\*$,*,\.o$,\.obj$,\.info$,\.swp$,\.bak$,\~$
"   Quick Help: <F1>:help  -:go up dir  D:delete  R:rename  s:sort-by  x:exec
" ============================================================================
../
./
<f6><e4><fc><df><d6><c4><dc>/
.swp

@wolfi323: With Console I mean either connecting via ssh or using Terminal in KDE. It makes no difference. In KDE Terminal I can set UTF-8 or Western European ISO xyz, it doesn’t make a difference either.

There’s another weird thing I’ve found out: In Yast, I can go to system -> language -> details. There I can deselect UTF-8 Encoding. If I do that, the characters in the Console (Shell) are Ok, the Dolphin problems remain.

On 2014-04-01 14:06, SheriffHobbes wrote:

> @wolfi323: With Console I mean either connecting via ssh or using
> Terminal in KDE. It makes no difference. In KDE Terminal I can set UTF-8
> or Western European ISO xyz, it doesn’t make a difference either.

Using ssh means, I guess, using something like “putty” from the host
machine to the guest. That adds another complicated variable to the mix,
but it is interesting that it also fails.

> There’s another weird thing I’ve found out: In Yast, I can go to system
> -> language -> details. There I can deselect UTF-8 Encoding. If I do
> that, the characters in the Console (Shell) are Ok, the Dolphin problems
> remain.

Somehow, your system is in fact using ISO-8859-1 encoding, not utf8,
despite the configuration. My guess is that some important packages
failed to install or configure.

You could try another desktop, instead of KDE. LXDE is small, if not,
try XFCE.


Cheers / Saludos,

Carlos E. R.
(from 13.1 x86_64 “Bottle” at Telcontar)

<f6><e4><fc><df><d6><c4><dc>

This is definitely NOT UTF8 encoded Unicode for öäüßÖÄÜ.
It should be:

b6c3 a4c3 bcc3 9fc3 96c3 84c3 9cc3

(as it is on my system).
The notation in both is different, but I assume you see the hex code in both. When we write them in the same way and rearrange the second because of the byte order on Intel CPUs, we get

f6 e4 fc df d6 c4 dc
c3b6 c3a4 c3bc c39f c396 c384 c39c

The first is one of the “old” ISO code pages (probably the one you mentioned earlier) and cannot be interpreted as correct UTF-8, which leads to the ? when trying.

What this shows is that it goes wrong on creation, while the interpretation is done the UTF-8 way.
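The two notations can be reproduced with a short sketch. `dump_bytes`, `swap_pairs` and `name_utf8` are hypothetical helper names for this illustration. `swap_pairs` only emulates what a dump tool that groups bytes into 16-bit words shows on a little-endian machine (hexdump without options behaves that way); it is not what any particular tool does internally:

```shell
# dump_bytes: print stdin as hex, one byte at a time, in file order.
dump_bytes() { od -An -v -tx1 | xargs; }

# swap_pairs: emulate a 16-bit little-endian word dump by swapping
# the two bytes of every pair.
swap_pairs() { sed 's/\(..\) \(..\)/\2\1/g'; }

# The name "öäüßÖÄÜ" as UTF-8, written as octal escapes.
name_utf8() { printf '\303\266\303\244\303\274\303\237\303\226\303\204\303\234'; }

name_utf8 | dump_bytes                                  # bytes in file order
name_utf8 | dump_bytes | swap_pairs                     # as a word-based dump shows them
name_utf8 | iconv -f UTF-8 -t ISO-8859-1 | dump_bytes   # the same name in ISO-8859-1
```

Dumping one byte at a time (`od -tx1`) avoids the byte-order confusion altogether, so it is the safer way to inspect a suspicious file name.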

P.S.
Discussing the “root using GUI” subject would lead us wide off topic. Only thing I like to say is: the more you convince yourself “I know what I am doing” and “I took precautions”, the bigger your surprise will be.