How do I set the locale?

Hi!

When I update the system, I get a warning after certain packages have been installed (e.g. LibreOffice, VLC, etc.) that says:


/etc/profile.d/lang.sh: line 80: warning: setlocale: LC_COLLATE: cannot change locale (en_DE.UTF-8): no such file or directory
/etc/profile.d/lang.sh: line 76: warning: setlocale: LC_CTYPE: Cannot change locale (en_DE.UTF-8): no such file or directory

When I click on an odt file named

Att göra.odt

to open it with LibreOffice, I’m informed that the file named

Att g??ra.odt

does not exist.
However, I can open the file with Calligra.

Any advice on how I can set the locale?

I’m on Tumbleweed, KDE Frameworks 5.36.

I am not sure that I understand all of what is going on in this area. Thus some observations that may help or not.

The problem you see with the incorrectly interpreted file name is IMHO not so much about locale as about character encoding. All file names in Linux should have been encoded in UTF-8 for a long time already. Apparently some software here interprets the name as having another encoding.

The expression en_DE in your variables is a bit strange. I would expect de_DE for German as used in Germany. As you show it, it is English as used in Germany. I have no idea what the meaning is in this particular case and if many programs will support it. For me it is much clearer what e.g. en_AU means: English as used in Australia.

Thanks for helping out!

I found in KDE’s System settings -> Personalization -> Regional settings, that I have set:

  • Language: American English
  • Formats: Region Germany - English (en_DE)

From what I understand, having chosen en_DE means that the format for numbers, currency, date and time etc will be according to German customs, but the content written in English. For example, here’s the difference between three different formats:


en_US:
Time: Sunday, August 27, 2017 3:02 PM
Currency: 24 €

en_DE: 
Time: Sunday, 27 August 2017 15.02  
Currency: $24

de_DE:
Time: Sonntag, 27. August 2017 15.02
Currency: 24 €

In other words, I want to read in English, but the content formatted according to German customs.
This is the only place where I’ve found that en_DE is being used.

YAST
I’ve also found in YAST → System → Language, that I have “English (US)” set as primary language.
When I click Details → Detailed Locale Setting, I can further choose between a number of locales, such as en_AU, en_GB, en_NZ etc.

If I switch primary language to “German - Deutsch”, and then click on Details → Detailed Locale Setting, I can choose between de_AT, de_BE etc, but there is no en_DE.

Final note, in YAST → System → etc/sysconfig Editor → System → Environment → Language, there are a bunch of settings, among others RC_LANG, RC_LC_CTYPE and RC_LC_COLLATE.

Since I don’t know what they stand for, I haven’t dared to experiment with changing them. Under RC_LANG there is an explanation that reads:


File: /etc/sysconfig/language

Possible Values: POSIX, ca_ES.ISO-8859-1, ca_ES.UTF-8, cs_CZ.ISO-8859-2, cs_CZ.UTF-8, da_DE@euro, da_DK.ISO-8859-1, da_DK.UTF-8, de_DE@euro, de_DE.ISO-8859-1, de_DE.UTF-8, el_GR.ISO-8859-7, el_GR.UTF-8, en_GB.ISO-8859-1, en_GB.UTF-8, en_IE@euro, en_IE.ISO-8859-1, en_US.ISO-8859-1, es_ES@euro, es_ES.ISO-8859-1, es_ES.UTF-8, fr_FR@euro, fr_FR.ISO-8859-1, fr_FR.UTF-8, gl_ES@euro, gl_ES.ISO-8859-1, gl_ES.utf-8, hr_HR.ISO-8859-2, hu_HU.ISO-8859-2, hu_HU.UTF-8, it_IT@euro, it_IT.ISO-8859-1, it_IT.UTF-8, ja_JP.eucJP, ja_JP.UTF-8, lt_LT.ISO-8859-13, lt_LT.UTF-8, nl_NL@euro, nl_NL.ISO-8859-1, nl_NL.UTF-8, ru_RU.ISO-8859-5, ru_RU.KOI8-R, ru_RU.UTF-8, sk_SK.ISO-8859-2, sk_SK.UTF-8, tr_TR.ISO-8859-9, tr_TR.UTF-8, ko_KR.eucKR, ko_KR.UTF-8, zh_TW.Big5, zh_TW.UTF-8, zh_CN.GB2312, zh_CN.UTF-8 *or any value*

Default Value: 

Configuration Script: OpenOffice.org, groff, ispell, kde, kdm, profiles, susehelp, susewm, tetex, wdm

Description: 
Local users will get RC_LANG as their default language, i.e. the
environment variable $LANG . $LANG is the default of all $LC_*-variables,
as long as $LC_ALL is not set, which overrides all $LC_-variables.
Root uses this variable only if ROOT_USES_LANG is set to "yes"

Currently RC_LANG is set to en_US.UTF-8.
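
The precedence described in that help text (LC_ALL first, then the specific LC_* variable, then LANG) can be sketched as a small shell function. This is only an illustration: `effective_locale` is a made-up helper name, it looks at environment variables only, and it does not check whether the named locale is actually installed.

```shell
# effective_locale CATEGORY
# Prints the locale name that would apply to CATEGORY (e.g. LC_CTYPE),
# using the order LC_ALL, then CATEGORY itself, then LANG, then "C".
# Hypothetical helper for illustration; it only reads variables.
effective_locale() {
    category=$1
    case $category in
        ''|*[!A-Z_]*)
            echo "usage: effective_locale LC_<CATEGORY>" >&2
            return 2
            ;;
    esac
    # Look up the variable whose name is stored in $category.
    eval "specific=\${$category:-}"
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "$specific" ]; then
        printf '%s\n' "$specific"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf 'C\n'
    fi
}
```

Calling `effective_locale LC_COLLATE` in a session where only LANG is set simply echoes the value of LANG, which is why a bad LANG value shows up in the warnings for every category.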

Hope the above is of any help. Any further thoughts/advice is much appreciated!

You are correct about the mix of one language with characteristics of another that is defined in expressions like en_DE.

I am not sure that each and every application will honour such definitions, but KDE might do it.

I also guess you are correct in all your other explanations about the LC_* variables. But as I said in my first post here, I doubt that those have anything to do with your problem. All of those are basically about using a certain language for output and formatting it according to the habits of a certain region.

To put it short: your problem has nothing to do with locale.

You are not complaining about a program that uses the wrong language or format when it wants to tell you something. You are talking about a file name that is in UTF-8 encoding on your system (coded as such in the directory where it lives), and that is correctly “decoded” by some programs, but wrongly by at least one other program.
You could start by checking for yourself how it looks in the directory with

ls -l <path/to/the/file>

(where you of course have to replace <path/to/the/file> with the correct path).
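
Because the terminal and ls may themselves render the name wrongly, it can help to look at the raw bytes of the name instead. A minimal sketch, assuming a POSIX `od` and `xargs` are available; `name_hex` is a made-up helper name, and the example name is written with explicit octal escapes so that it does not depend on the encoding of the terminal.

```shell
# name_hex NAME
# Prints the bytes of NAME as hex pairs separated by single spaces,
# independent of the current locale. Hypothetical helper for illustration.
name_hex() {
    # od -An -tx1 dumps one hex pair per byte without the offset column;
    # xargs joins the output lines and trims the extra spaces.
    printf '%s' "$1" | od -An -tx1 | xargs
}

# "g\303\266ra" is "göra" written with the two UTF-8 bytes of ö spelled out.
name_hex "$(printf 'g\303\266ra')"    # prints: 67 c3 b6 72 61
```

If the name in the directory contains `c3 b6` where the ö should be, it is stored as UTF-8 and only the display is wrong.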

I will try to recreate your problem on Leap 42.2 (no TW available).
I tried already on 13.1, but no problem there.

This is what I did on a 42.2 (LibreOffice 5.3.3.2).

Started Writer and created a small document. Stored it as test.odt.

From the CLI made a copy

cp test.odt töst.odt

Started Dolphin and surfed to the directory.
Clicked on töst.odt
Result: LO started and opened the document.

Thus I cannot recreate it, or I am doing something different from what you do.

Ah, got you!

Here’s the output from the command:


bv@linux-0xse:~> ls -l /home/bv/Documents/Att_göra.odt
-rw-r--r-- 1 bv users 11818 Jan  3  2017 /home/bv/Documents/Att_g??ra.odt

So, my system can’t read certain characters even though I’m using UTF-8 encoding.
I’m still wondering why the warning is that there is no such file as “en_DE”. Are odd characters, such as “å ä ö”, part of en_US.UTF-8? (I don’t know if that’s a relevant/meaningful question to ask - “I’m a dentist by profession” so to speak :slight_smile:)

Shortly before going to sleep.

My first reaction: this is ridiculous.
I hope I can come to a better conclusion tomorrow. Or of course there may be others that understand it.

Did you change your locale with YaST, or edit /etc/profile? Changing your language in the Plasma 5 settings is a per-user config, and installing with zypper (rpm) is done as root, so root should not really see your en_DE setting.
Another thing is that bash is not really dependent on the Plasma 5 settings and should see the extra Latin chars. AFAIK you don’t need UTF-8 for those, as they’re part of the extended ASCII char set.
Now, if you tweaked something in YaST or changed a system-wide setting, strange things will happen.

ps
I remember a similar issue with Cyrillic and Windows, as cmd.exe uses a specific font that does not have the char for the letter ј (it’s not Latin j), so it prints out ? instead: a file named боја.txt is printed out as бо?а.txt. I’m speculating that you changed the default font in Konsole to one that does not have the extra Latin glyphs?

a few more thoughts
en_DE is meaningless. You can set American English (or Canadian) as the default language for Plasma 5 and the region Deutschland, but that would still use the en_US (or en_CA) locale. There is no en_DE locale because en_US is not the same as en_CA or en_UK; unlike Deutsch, English is not quite the same worldwide. What would the setting be if you chose British English, en_UK_DE?
There is a flaw in your logic.

did you maybe start plasma 5 as root?

Got some sleep, and thus got an idea.

henk@boven:~/test> LANG=de_DE.UTF-8 ls
a::b       chars  lönekonto  phptest  py      sp      stderr  t-chown     wdir
bestanden  ctest  oo         plaatje  red     sparkz  stdout  unicode     Лшадсщ
ccc        file   photos     ps       script  spsp    sw      urlmetraar  नमस्ते
henk@boven:~/test> LANG=en_DE.UTF-8 ls
a::b       chars  l??nekonto  phptest  py      sp      stderr  t-chown     wdir
bestanden  ctest  oo          plaatje  red     sparkz  stdout  unicode     ????????????
ccc        file   photos      ps       script  spsp    sw      urlmetraar  ??????????????????
henk@boven:~/test>

As you see here, en_DE gives those bad results.

Thus I apologize for saying that the locale is not the problem.

The LANG variable en_DE.UTF-8 apparently formats the output as if UTF-8 is NOT part of it and uses a different encoding.
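
The effect can be imitated without touching any locale settings. The following is only a sketch of the visible result, not of what ls does internally: it treats every byte as one character and shows anything outside printable ASCII as "?". `show_as_c_locale` is a made-up name, and the test names are written with octal escapes for their UTF-8 bytes.

```shell
# show_as_c_locale NAME
# Mimics the display of NAME when each byte is handled as a single
# character and every byte outside printable ASCII is shown as "?".
# Hypothetical helper for illustration only.
show_as_c_locale() {
    # The set is newline plus the range from space to tilde; -c selects
    # every byte that is NOT in that set and replaces it with "?".
    printf '%s\n' "$1" | LC_ALL=C tr -c '\n -~' '?'
}

# "ö" is the two bytes \303\266 in UTF-8, so it turns into two "?".
show_as_c_locale "$(printf 'Att g\303\266ra.odt')"    # prints: Att g??ra.odt
```

Each Cyrillic letter is also two bytes in UTF-8, which would explain why a six-letter name shows up as twelve question marks.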

Now, which component to blame (and to submit a bug against)?

I am not with I_A. While it looks at first that en_US, en_AU, en_EN, etc. and fr_FR, fr_CN etc. make sense and en_DE does not, this is not the case.

As explained in posts a bit further up, en_DE means to use the English language, but with DE (Germany) as the region, which means e.g. that the English language is used for messages, but that numbers and the like are formatted according to DE habits: 10,000,000.00 vs. 10.000.000,00.

Not used by many, but something expats may love.
There are definitions for this. See e.g. https://gist.github.com/heftig/4740516

The problem seems to be that the UTF-8 part is not correctly interpreted here.

And see this:

henk@boven:~/test> ls Лшадсщ
Лшадсщ
henk@boven:~/test> LANG=C ls Лшадсщ
????????????
henk@boven:~/test> LANG=C.UTF-8 ls Лшадсщ
????????????
henk@boven:~/test> 

I would have assumed glibc-locale? (This package supplies the various locale definitions.)

I am using:

karl@erlangen:~> locale
LANG=de_DE.UTF-8
LC_CTYPE="de_DE.UTF-8"
LC_NUMERIC="de_DE.UTF-8"
LC_TIME="de_DE.UTF-8"
LC_COLLATE="de_DE.UTF-8"
LC_MONETARY="de_DE.UTF-8"
LC_MESSAGES="de_DE.UTF-8"
LC_PAPER="de_DE.UTF-8"
LC_NAME="de_DE.UTF-8"
LC_ADDRESS="de_DE.UTF-8"
LC_TELEPHONE="de_DE.UTF-8"
LC_MEASUREMENT="de_DE.UTF-8"
LC_IDENTIFICATION="de_DE.UTF-8"
LC_ALL=
karl@erlangen:~> 

You may display available values:

karl@erlangen:~> locale -a | grep -i DE
de_AT
de_AT@euro
de_AT.utf8
de_BE
de_BE@euro
de_BE.utf8
de_CH
de_CH.utf8
de_DE
de_DE@euro
de_DE.utf8
de_IT
de_IT.utf8
de_LI.utf8
de_LU
de_LU@euro
de_LU.utf8
fy_DE
gez_ER@abegede
gez_ET@abegede
hsb_DE
hsb_DE.utf8
ks_IN@devanagari
nds_DE
sd_IN@devanagari
karl@erlangen:~> 

Any of the above should work and you should be able to use them. But you may not define your own values.

Thanks. That is indeed useful information. It shows what is supported on the system (and en_DE is not one of them, thus it falls back to C I guess).
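
A quick way to test for one specific locale is to compare it against that list. A minimal sketch: `locale_in_list` and `normalize_locale` are made-up helper names, and the only normalization done is the codeset spelling, since `locale -a` prints `utf8` where LANG usually says `UTF-8`.

```shell
# normalize_locale NAME
# Rewrites ".UTF-8", ".utf-8" or ".UTF8" to ".utf8" so that both spellings
# compare equal. Hypothetical helper for illustration.
normalize_locale() {
    printf '%s\n' "$1" | sed 's/\.[Uu][Tt][Ff]-\{0,1\}8/.utf8/'
}

# locale_in_list NAME
# Reads locale names from stdin (one per line, e.g. the output of
# "locale -a") and succeeds if NAME is among them.
locale_in_list() {
    wanted=$(normalize_locale "$1")
    while IFS= read -r line; do
        [ "$(normalize_locale "$line")" = "$wanted" ] && return 0
    done
    return 1
}

# Typical use on a real system:
#   locale -a | locale_in_list en_DE.UTF-8 || echo "en_DE.UTF-8 is not installed"
```

Reading the list from stdin keeps the function independent of the machine it runs on, so the same check works on saved output from another system.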

Others do exist and are defined (see my link earlier), but apparently not on this system. I do not know if installing them is possible. E.g. putting the data I pointed to on the system in the correct place could be part of adding that one.
More Google search might help here (I remember I once did and got a pretty good idea how it should work, but forgot most of the details).

I’ve never edited /etc/profile. I have used Yast and KDE’s system settings to change language settings (didn’t know which one was responsible for what parts). But I haven’t changed locale settings in a while, so I don’t know if it is connected or how to backtrack to find the solution.

My Konsole is using the font “Hack”. Don’t know if it is the default font or not.

Here are my outputs:


bv@linux-0xse:~> locale
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
LANG=en_DE.UTF-8
LC_CTYPE="en_DE.UTF-8"
LC_NUMERIC="en_DE.UTF-8"
LC_TIME="en_DE.UTF-8"
LC_COLLATE="en_DE.UTF-8"
LC_MONETARY="en_DE.UTF-8"
LC_MESSAGES="en_DE.UTF-8"
LC_PAPER="en_DE.UTF-8"
LC_NAME="en_DE.UTF-8"
LC_ADDRESS="en_DE.UTF-8"
LC_TELEPHONE="en_DE.UTF-8"
LC_MEASUREMENT="en_DE.UTF-8"
LC_IDENTIFICATION="en_DE.UTF-8"
LC_ALL=

And here is the second output:


bv@linux-0xse:~> locale -a | grep -i DE
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
de_AT
de_AT.utf8
de_AT@euro
de_BE
de_BE.utf8
de_BE@euro
de_CH
de_CH.utf8
de_DE
de_DE.utf8
de_DE@euro
de_IT
de_IT.utf8
de_LI.utf8
de_LU
de_LU.utf8
de_LU@euro
fy_DE
gez_ER@abegede
gez_ET@abegede
hsb_DE
hsb_DE.utf8
ks_IN@devanagari
nds_DE
sd_IN@devanagari


Feels like we’re getting closer :slight_smile:

Well, as should be clear by now, there is no “en_DE” locale (so the system falls back to “C”).
IOW, there is no DE (“deutsch”/“german”) variant for english, which should be obvious.

You can set the system’s default locale in YaST->System->Language, in /etc/sysconfig/language (RC_LANG, although I’m not completely sure that this is still used in Tumbleweed) or via localectl.
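
To see what the sysconfig file currently says without opening an editor, something like the following could be used. It is a sketch under the assumption that /etc/sysconfig/language uses plain shell `NAME="value"` assignments, as sysconfig files normally do; `rc_lang_from` is a made-up helper name.

```shell
# rc_lang_from FILE
# Prints the value of RC_LANG as set in FILE (empty if FILE does not set it).
# Assumes FILE consists of shell-style NAME="value" lines.
# Hypothetical helper for illustration.
rc_lang_from() {
    [ -r "$1" ] || return 1
    (
        unset RC_LANG          # ignore any value inherited from the caller
        . "$1" >/dev/null 2>&1 || exit 1
        printf '%s\n' "${RC_LANG:-}"
    )
}

# Typical use on a real system (the path must contain a slash):
#   rc_lang_from /etc/sysconfig/language
```

The file is sourced inside a subshell so that none of its variables leak into the calling shell.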

You can override it for the user in KDE’s systemsettings5, “Region Settings”-> Formats.

Sorry, but that is NOT obvious. It might be true that the original definition only tried to care for cases like Portuguese as in Portugal and Portuguese as in Brazil (and for LANGUAGE that is still true as far as I can see because, as you say, there is no German variant of English), but cases like en_DE, meaning the English language with date/time, numbers, currency, etc. formatted in German style, are defined (link to the definition of en_DE somewhere above).

But this particular en_DE is not available (by default or at all?) on openSUSE. And indeed the fallback is to C.

Now as I see it, there is another aspect on what the OP reports.

Dolphin does not seem to have any problem interpreting the bytes of a file name as UTF-8 and displaying it correctly on the screen. Then the OP “clicks on it” and, because of the file association with the file name suffix, Dolphin starts LibreOffice Writer with that file name as parameter. I assume it simply copies the string of bytes into the parameter field of the Writer call. So the only thing that Writer has to do is open that file. No interpretation whatsoever is needed, just copying the argument into the file name field of the fopen() or similar call. But it looks as if Writer is interpreting the bytes as if they were pure ASCII (which results in ? in those places where bytes do not have a printable ASCII character) and then concludes that that character string (and not the byte string it got originally) represents no existing file. :frowning:

The OP reports that (the same or similar) file can be opened with Calligra (a program I do not know). Now the questions are:

  • Is Calligra also started by clicking on a file name in Dolphin and using the file name suffix association?
  • If not, is this a case of starting Calligra and then using its Open menu item and dialogue?
  • If the second method is used with Calligra, what happens if that second method is used with Writer?

After all, when we compare Writer with Calligra for certain behaviour, we should make the two cases as similar as possible.

???
It should be obvious that Germany only uses german (“deutsch”, “de”) as official language, not english (“en”).

It might be true that the original definition only tried to care for cases like Portuguese as in Portugal and Portuguese as in Brazil (and for LANGUAGE that is still true as far as I can see because, as you say, there is no German variant of English),

$LANGUAGE sets the language (only), and normally uses just the 2-letter specifier, e.g. LANGUAGE=“de”.

but cases like en_DE, meaning the English language with date/time, numbers, currency, etc. formatted in German style, are defined (link to the definition of en_DE somewhere above).

You can set the formatting of date/time, numbers, currency, etc, via the other LC variables, like LC_NUMERIC or LC_TIME.
And you can set the language (independent of the locale) via LANGUAGE.
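
As a sketch of that approach, a command can be started with one locale for LANG and another for the formatting categories. `run_with_formats` is a made-up helper name, and the choice of categories below is an assumption about what “formatting” should cover; both locales have to be installed for the result to be useful.

```shell
# run_with_formats LANG_LOCALE FORMAT_LOCALE COMMAND [ARGS...]
# Runs COMMAND with LANG set to LANG_LOCALE and the formatting categories
# set to FORMAT_LOCALE. Hypothetical helper for illustration.
run_with_formats() {
    lang=$1
    formats=$2
    shift 2
    (
        unset LC_ALL       # LC_ALL would override everything set below
        LANG=$lang
        LC_TIME=$formats
        LC_NUMERIC=$formats
        LC_MONETARY=$formats
        LC_MEASUREMENT=$formats
        LC_PAPER=$formats
        export LANG LC_TIME LC_NUMERIC LC_MONETARY LC_MEASUREMENT LC_PAPER
        "$@"
    )
}

# Typical use, assuming both locales are installed:
#   run_with_formats en_US.UTF-8 de_DE.UTF-8 date
```

Note that LC_TIME also carries the weekday and month names, so with de_DE.UTF-8 there the dates would likely come out in German. A locale such as en_GB.UTF-8 or en_DK.UTF-8 for LC_TIME may be closer to “English text, European formats”, if it is installed.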

But this particular en_DE is not available (by default or at all?) on openSUSE. And indeed the fallback is to C.

There is no locale named “en_DE” at all AFAIK.
It is possible to define custom locales though, but I have no idea how.

I don’t think it’s Writer that interprets the string in any way.
But the system (e.g. fopen, or rather the file system) does, and there is a difference in the encoding. UTF-8 uses two or more bytes for certain characters; if you interpret those bytes as a single-byte encoding such as ISO-8859 (which is roughly what the “C” locale does), the result gets messed up.
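
The byte counts are easy to check from the shell. A minimal sketch: `byte_count` is a made-up helper name, and the non-ASCII characters are written as octal escapes of their UTF-8 bytes so that the example does not depend on the terminal's encoding.

```shell
# byte_count STRING
# Prints the number of bytes in STRING, regardless of the current locale.
# Hypothetical helper for illustration.
byte_count() {
    # wc -c counts bytes; tr removes the padding some wc versions add.
    printf '%s' "$1" | wc -c | tr -d ' '
}

byte_count o                            # prints: 1
byte_count "$(printf '\303\266')"       # prints: 2  (ö, U+00F6)
byte_count "$(printf '\340\244\250')"   # prints: 3  (न, U+0928)
```

A program that assumes one byte per character therefore sees two or three unknown characters where there is really only one.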