Invalid encoding on file names

Hi all,

I’m having problems with files that have special (non-English) characters on my 11.3 install. I have a lot of files with Portuguese characters (like ç, â, ã, etc.), but they now all have “(invalid encoding)” at the end of the file name, and the special character has been replaced with a question mark.

I’m able to use special characters when writing text (OpenOffice, gedit, browser), but not in file names. I have now also noticed that some websites show the same problem.

I’m attaching a few screenshots. My system language is set to english_us with UTF-8, but my environment shows something different. My browser (Chrome) is also set to Unicode-UTF8.

This is how this thread looks for me:
http://imagebin.org/index.php?mode=image&id=154751

File list in Gnome:
http://imagebin.org/index.php?mode=image&id=154752

System Settings:
http://imagebin.org/index.php?mode=image&id=154753

victor@opensuse:~> grep -v '^#' /etc/sysconfig/language | sed '/^ *$/d' ; set | grep -i lang ; grep -i lang .bashrc | grep -v '^#' ; grep -i lang .profile | grep -v '^#'
INPUT_METHOD=""
RC_LANG="en_US.UTF-8"
RC_LC_ALL=""
RC_LC_MESSAGES=""
RC_LC_CTYPE=""
RC_LC_COLLATE=""
RC_LC_TIME=""
RC_LC_NUMERIC=""
RC_LC_MONETARY=""
RC_LC_PAPER=""
ROOT_USES_LANG="ctype"
AUTO_DETECT_UTF8="no"
INSTALLED_LANGUAGES="pt_BR,en_US"
GDM_LANG=C
LANG=C
grep: .profile: No such file or directory

Any help is appreciated.

Thanks,

Vic.

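You probably inherited those files from a release where Latin-1 (ISO-8859-1) was the charset, and now the charset is UTF-8. Some byte sequences that are valid in Latin-1 are invalid in UTF-8: for example, “ç” is the single byte 0xE7 in Latin-1, which on its own is not a valid UTF-8 character. Just rename those files.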
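With many files, renaming by hand gets tedious. Below is a minimal bash sketch, assuming the broken names really are Latin-1 bytes: it uses iconv to find names that are not valid UTF-8 and renames each one to its UTF-8 equivalent. The function name `fix_latin1_names` is hypothetical; it only handles one directory level, skips hidden files, and will not overwrite an existing file.

```shell
#!/bin/bash
# Hypothetical helper: rename files in one directory whose names are not
# valid UTF-8, ASSUMING those names were written in Latin-1 (ISO-8859-1).
# Not recursive; hidden (dot) files are skipped by the glob.
fix_latin1_names() {
    local dir=${1:-.}
    local path name new
    for path in "$dir"/*; do
        [ -e "$path" ] || continue            # glob matched nothing
        name=${path##*/}
        # iconv exits non-zero when its input is not valid UTF-8
        if printf '%s' "$name" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
            continue                          # already valid UTF-8: leave it alone
        fi
        # Every byte sequence is valid Latin-1, so this conversion succeeds
        new=$(printf '%s' "$name" | iconv -f ISO-8859-1 -t UTF-8) || continue
        if [ -e "$dir/$new" ]; then
            echo "skipping: '$dir/$new' already exists" >&2
            continue
        fi
        mv -- "$path" "$dir/$new" && echo "renamed: $new"
    done
}
```

Usage: `fix_latin1_names ~/Documents`, then check the printed names before running it on other directories.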

Web pages are a different problem: that is due to the site not identifying the charset of the page correctly.

But I also have this problem when creating new files. That shouldn’t be the case, right?

On 2011-05-24 20:36, victorbrca wrote:
>
> But I also have this problem when creating new files. That shouldn’t be
> the case right?

On what filesystem? How is it mounted? Check the output of “mount” and post
it here.
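For a single directory, both questions can also be answered with a small bash sketch like the one below. It assumes GNU df and the Linux /proc/mounts file; `fs_info` is a hypothetical helper name. It prints the mount point, filesystem type and mount options of the directory you pass in (an ntfs-3g volume usually shows up with a FUSE type such as fuseblk).

```shell
#!/bin/bash
# Hypothetical helper: show where a directory is mounted, and with which
# type and options. Assumes GNU df and Linux /proc/mounts. Mount points
# containing spaces are not handled (df prints them raw, /proc/mounts escapes them).
fs_info() {
    local path=${1:-.}
    local mnt
    # "df -P" prints a header plus one line; its last field is the mount point
    mnt=$(df -P -- "$path" | awk 'NR == 2 {print $NF}')
    echo "mount point: $mnt"
    # /proc/mounts fields: device, mount point, type, options, dump, pass
    awk -v m="$mnt" '$2 == m {print "type: " $3; print "options: " $4}' /proc/mounts
}
```

Usage: `fs_info ~/Desktop` or `fs_info /media/photos`.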


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

On either NTFS or ext4 I get the same error:

/dev/disk/by-id/ata-WDC_WD5001AALS-00L3B2_WD-WCASY8531358-part5 swap                 swap       defaults              0 0
/dev/disk/by-id/ata-WDC_WD5001AALS-00L3B2_WD-WCASY8531358-part6 /                    ext4       acl,user_xattr        1 1
/dev/disk/by-id/ata-WDC_WD5001AALS-00L3B2_WD-WCASY8531358-part7 /home                ext4       defaults              1 2
#/dev/disk/by-id/ata-WDC_WD10EADS-00L5B1_WD-WCAU47566382-part1 /media/movies        ext3       defaults              1 2
/dev/disk/by-id/ata-ST3500630A_9QG4X821-part1 /media/photos        ntfs-3g    defaults,locale=en_US.UTF-8 0 0

Image
http://imagebin.org/index.php?mode=image&id=154950

On 2011-05-25 04:36, victorbrca wrote:
>
> On either NTFS or ext4 I get the same error:

On NTFS there are other restrictions. There is a translation involved,
depending on the fstab options used and on how the partition was formatted
in Windows.

On ext4, no.

> Image: http://imagebin.org/index.php?mode=image&id=154950

No, try with the command “mv” on the command line. A GUI command is not an
acceptable test.
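A command-line test can also be made independent of the keyboard and terminal settings. The bash sketch below (the helper name `show_name_bytes` is hypothetical) builds the name “mendonça” from escaped UTF-8 bytes, creates and renames the file with touch and mv, then prints each stored name as hex with od, so you can see what is actually on disk regardless of how it is displayed.

```shell
#!/bin/bash
# Hypothetical helper: print the raw bytes of every file name in a directory
# (default: .), independent of how the terminal or file manager renders them.
show_name_bytes() {
    local f
    for f in "${1:-.}"/*; do
        [ -e "$f" ] || continue               # glob matched nothing
        printf '%s:' "${f##*/}"
        printf '%s' "${f##*/}" | od -An -v -tx1
    done
}

# Build the name from escaped bytes so nothing has to be typed in the terminal
dir=$(mktemp -d)
name=$(printf 'mendon\303\247a')              # "mendonça" encoded as UTF-8
touch -- "$dir/$name"
mv -- "$dir/$name" "$dir/$name-renamed"
show_name_bytes "$dir"                        # a UTF-8 "ç" appears as c3 a7
```

If the name on disk contains `c3 a7`, the filesystem stored valid UTF-8 and any remaining “?” is a display or locale problem.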


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

I can move and rename it from the shell without error, but I see the following two behaviours:

1- I cannot type the special character within gnome-terminal; I have to paste it from somewhere else, like gedit
2- It displays the special character as a question mark, but name completion (Tab) shows the right character

victor@opensuse:~/Desktop> touch mendonça
victor@opensuse:~/Desktop> ltr
total 760K
-rwxr-xr-x  1 victor users  234 2010-12-08 21:27 Windows.desktop*
drwxr-xr-x  5 victor users 4.0K 2011-04-08 19:17 ToMove/
drwxr-xr-x  9 victor users 4.0K 2011-05-11 14:25 Photos/
drwxr-xr-x 12 victor users 4.0K 2011-05-23 19:18 Temp/
-rw-r--r--  1 victor users 715K 2011-05-24 21:42 Screenshot-Speed Dial - Google Chrome.png
-rw-------  1 victor users  26K 2011-05-24 21:42 bookmrxs
-rw-r--r--  1 victor users    0 2011-05-25 15:46 mendon?a
victor@opensuse:~/Desktop> rm mendonça   # name completion used here
victor@opensuse:~/Desktop> ltr
total 760K
-rwxr-xr-x  1 victor users  234 2010-12-08 21:27 Windows.desktop*
drwxr-xr-x  5 victor users 4.0K 2011-04-08 19:17 ToMove/
drwxr-xr-x  9 victor users 4.0K 2011-05-11 14:25 Photos/
drwxr-xr-x 12 victor users 4.0K 2011-05-23 19:18 Temp/
-rw-r--r--  1 victor users 715K 2011-05-24 21:42 Screenshot-Speed Dial - Google Chrome.png
-rw-------  1 victor users  26K 2011-05-24 21:42 bookmrxs

On 2011-05-25 22:06, victorbrca wrote:
>
> I can move it and rename from the shell without error, however I see the
> following two behaviours:

Ok, so that means the filesystem is alright. You have problems with the
user interface.

> 1- I cannot type the special character within gnome-terminal; I have to
> paste it from somewhere else, like gedit
> 2- It displays the special character with a question mark, however name
> completion (tab) shows the right character

Interesting.

What is the output of the command ‘locale’? Paste it here.

I just tried to type the same filename as you in my gnome-terminal: no
problem. I have the US locale, though (actually, a mixture of US and ES
locales).

Gnome uses the same locale as the CLI, so it is important that it is set correctly.

I’m also testing another user with the es_ES locale […] No problems in
gnome-terminal.


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

I was thinking it’s a discrepancy somewhere in my system variables… Look at the difference between these two outputs:

victor@opensuse:~> grep -v '^#' /etc/sysconfig/language | sed '/^ *$/d'
INPUT_METHOD=""
RC_LANG="en_US.UTF-8"
RC_LC_ALL=""
RC_LC_MESSAGES=""
RC_LC_CTYPE=""
RC_LC_COLLATE=""
RC_LC_TIME=""
RC_LC_NUMERIC=""
RC_LC_MONETARY=""
RC_LC_PAPER=""
ROOT_USES_LANG="ctype"
AUTO_DETECT_UTF8="no"
INSTALLED_LANGUAGES="pt_BR,en_US"
victor@opensuse:~> locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

I tried changing it to “en_US.UTF-8” and creating a folder; the shell automatically replaced “ç” with a “?” as I typed it:

victor@opensuse:~/Desktop> echo $LANG
C
victor@opensuse:~/Desktop> LANG="en_US.UTF-8"
victor@opensuse:~/Desktop> echo $LANG
en_US.UTF-8
victor@opensuse:~/Desktop> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
victor@opensuse:~/Desktop> mkdir mendon?a

I tried setting GDM_LANG, then running nautilus and repeating the test (creating a new folder named “mendonça”)… same issue.

victor@opensuse:~/Desktop> set | grep -i lang
GDM_LANG=C
LANG=en_US.UTF-8
victor@opensuse:~/Desktop> GDM_LANG="en_US.UTF-8"
victor@opensuse:~/Desktop> export GDM_LANG
victor@opensuse:~/Desktop> nautilus

On 2011-05-26 07:06, victorbrca wrote:
>
> I was thinking it’s a discrepancy somewhere within my system
> variables… Look at the diffence between two outputs:
>
>
> victor@opensuse:~> grep -v '^#' /etc/sysconfig/language | sed '/^ *$/d'
> INPUT_METHOD=""
> RC_LANG="en_US.UTF-8"
> RC_LC_ALL=""
> RC_LC_MESSAGES=""
> RC_LC_CTYPE=""
> RC_LC_COLLATE=""
> RC_LC_TIME=""
> RC_LC_NUMERIC=""
> RC_LC_MONETARY=""
> RC_LC_PAPER=""
> ROOT_USES_LANG="ctype"
> AUTO_DETECT_UTF8="no"
> INSTALLED_LANGUAGES="pt_BR,en_US"

That is more or less typical; I have nearly the same:


cer@Telcontar:~> grep -v '^#' /etc/sysconfig/language | sed '/^ *$/d'
RC_LANG="en_US.UTF-8"
RC_LC_ALL=""
RC_LC_MESSAGES=""
RC_LC_CTYPE=""
RC_LC_COLLATE="POSIX"
RC_LC_TIME=""
RC_LC_NUMERIC=""
RC_LC_MONETARY=""
ROOT_USES_LANG="ctype"
AUTO_DETECT_UTF8="no"
RC_LC_PAPER="es_ES@euro"
INSTALLED_LANGUAGES="en_GB,en_US,es_ES"
INPUT_METHOD=""
cer@Telcontar:~>


The ‘locale’ output shows what actually governs your user session:

victor@opensuse:~> locale
LANG=C
LC_CTYPE="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_COLLATE="C"
LC_MONETARY="C"
LC_MESSAGES="C"
LC_PAPER="C"
LC_NAME="C"
LC_ADDRESS="C"
LC_TELEPHONE="C"
LC_MEASUREMENT="C"
LC_IDENTIFICATION="C"
LC_ALL=

And that is your problem: with everything set to “C”, the character set is plain ASCII, which has no “ç”.
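A quick way to check this in any terminal is `locale charmap`, which prints the character set of the current locale. The bash sketch below wraps it in a hypothetical helper, `check_utf8_locale`; with glibc, the “C” locale reports ANSI_X3.4-1968 (plain ASCII) rather than UTF-8.

```shell
#!/bin/bash
# Hypothetical helper: report whether the current session's character set
# (as seen by "locale charmap") is UTF-8, and show the relevant variables if not.
check_utf8_locale() {
    local charmap
    charmap=$(locale charmap 2>/dev/null)
    if [ "$charmap" = "UTF-8" ]; then
        echo "OK: character set is UTF-8"
        return 0
    fi
    echo "Problem: character set is '${charmap:-unknown}', not UTF-8" >&2
    echo "LANG=${LANG-} LC_CTYPE=${LC_CTYPE-} LC_ALL=${LC_ALL-}" >&2
    return 1
}
```

Run `check_utf8_locale` in a fresh terminal after logging in again; a non-UTF-8 result means the session is still not picking up the intended locale.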


cer@Telcontar:~> locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC=es_ES@euro
LC_TIME=en_DK.UTF-8
LC_COLLATE=POSIX
LC_MONETARY=es_ES@euro
LC_MESSAGES="en_US.UTF-8"
LC_PAPER=es_ES@euro
LC_NAME=es_ES@euro
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE=es_ES@euro
LC_MEASUREMENT=es_ES@euro
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
cer@Telcontar:~>

Mine is atypical, because some settings are US, some are ES (and one DK).
But it is correct and it works (I can explain how I do it once the setup
below works for you).

The main setting is done in “/etc/sysconfig/language”, and is used in
“/etc/profile.d/lang.sh”.

You should have RC_LANG="en_US.UTF-8", or the equivalent for your language,
and then log in again.
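To spot the kind of discrepancy seen earlier in this thread, one option is to compare the configured default with what the session actually received. The bash sketch below is a minimal, hypothetical helper (`compare_rc_lang`) that reads RC_LANG from /etc/sysconfig/language (or a file you pass) and compares it with the current LANG; if they still differ after a fresh login, something in the login path may be overriding the setting.

```shell
#!/bin/bash
# Hypothetical helper: compare RC_LANG in a sysconfig-style file with the
# LANG of the current session. Returns 0 if they match, 1 otherwise.
compare_rc_lang() {
    local file=${1:-/etc/sysconfig/language}
    local rc_lang
    # Extract the value between the quotes of a line like RC_LANG="en_US.UTF-8"
    rc_lang=$(sed -n 's/^RC_LANG="\(.*\)"$/\1/p' "$file")
    echo "RC_LANG in $file: ${rc_lang:-<unset>}"
    echo "LANG in this session: ${LANG:-<unset>}"
    [ "$rc_lang" = "${LANG-}" ]
}
```

Usage: `compare_rc_lang` right after logging in; for Vic's setup above it would report RC_LANG en_US.UTF-8 against a session LANG of C.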


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Carlos E. R. wrote:
> LC_TIME=en_DK.UTF-8
>
> Mine is atypical, because some settings are US, some are ES (and one DK).

Yes, it’s annoying to have to use DK for that. Sometimes I’ve had to
generate that locale just to get the time.

On 2011-05-26 12:09, Dave Howorth wrote:
> Carlos E. R. wrote:
>> LC_TIME=en_DK.UTF-8
>>
>> Mine is atypical, because some settings are US, some are ES (and one DK).
>
> Yes, it’s annoying to have to use DK for that. Sometimes I’ve had to
> generate that locale just to get the time.

I use .i18n for those changes :-)

I don’t remember who told me to use en_DK for the time, but it is fortunate
that it is so easy.
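A minimal ~/.i18n along those lines might look like the sketch below. It assumes, as this thread describes, that the file is read at login by openSUSE's /etc/profile.d/lang.sh, and that the en_DK.UTF-8 locale is installed (as noted above, it sometimes has to be generated first). en_DK is typically chosen for LC_TIME because it gives ISO-style YYYY-MM-DD dates with a 24-hour clock.

```shell
# ~/.i18n (hypothetical example): per-user locale overrides read at login.
# Only the time/date format is changed; everything else follows RC_LANG.
export LC_TIME=en_DK.UTF-8
```

Log out and back in, then run `locale` to confirm that only LC_TIME changed.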


Cheers / Saludos,

Carlos E. R.
(from 11.2 x86_64 “Emerald” at Telcontar)

Thanks for all the replies, Carlos. I figured out the solution… which surprised me.

I made a few changes to my system… including “/etc/profile.d/lang.sh”, “/etc/X11/xim.d/none” and “INPUT_METHOD” in “/etc/sysconfig/language”… and, surprisingly, what fixed it was using “LANG=en_US” without UTF-8.

So there are two options:

1- Change it per user in $HOME/.i18n: export LANG="en_US"
2- Change it system-wide in /etc/sysconfig/language: RC_LANG="en_US"
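Option 2 can also be scripted. The bash sketch below uses a hypothetical helper, `set_rc_lang`, that rewrites the RC_LANG line with GNU sed, keeping a .bak copy of the file; it needs root for the real /etc/sysconfig/language, and editing the file by hand works just as well. It assumes the value contains no “/” or “&” characters, which holds for locale names such as en_US or en_US.UTF-8.

```shell
#!/bin/bash
# Hypothetical helper: set RC_LANG in a sysconfig-style file (default:
# /etc/sysconfig/language, root required). GNU sed -i keeps FILE.bak as backup.
set_rc_lang() {
    local value=$1
    local file=${2:-/etc/sysconfig/language}
    if ! grep -q '^RC_LANG=' "$file"; then
        echo "no RC_LANG line in $file" >&2
        return 1
    fi
    # Replace the whole RC_LANG line with the new quoted value
    sed -i.bak "s/^RC_LANG=.*/RC_LANG=\"$value\"/" "$file"
}
```

Usage, as root: `set_rc_lang en_US`, then log out and back in so the new value is picked up.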

Vic.