Problem with French character display

Hi,

I am working on an English (US) KDE installation of openSUSE 11.2 with a US keyboard, and I am having trouble configuring my system to work on French documents in Kate, Kwrite or Eclipse.
For instance, when I save the content of this link http://www.cecill.info/licences/Licence_CeCILL_V2-fr.txt to my hard drive and open the file in Kate, Kwrite or Eclipse, all accented characters are displayed as black lozenge signs with a ? inside. What should I fix in my configuration to have those characters show correctly?

A related but maybe slightly different situation is the following: when I use a French keyboard layout in Eclipse (using the shortcut Ctrl + Alt + K after setting it up in Configure Desktop > Regional & Language), I can type and display accented characters correctly. However, as soon as I spell check my document (I believe Eclipse uses aspell for that), all the words containing the accented characters that I typed are flagged as wrong. Eclipse suggests corrections that themselves show up with black lozenge signs with a ? inside… frustrating :\

Any idea would be appreciated. Please, forgive me if some important setting information are missing, I am fairly new to the Linux world.

Sebastien

You say “when I save …”. That is the crucial moment. You do not say how you saved it (from which program). In any case, when I click the link and open it in Kwrite, I see the bad characters. I am also warned that it was opened on the assumption that it is UTF-8 (encoded Unicode), but that non-UTF-8 bytes were found. Thus it seems that your document was created in, e.g., Latin-1, but is nevertheless being interpreted as UTF-8 encoded Unicode.
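To illustrate that mismatch, here is a minimal Python sketch. It is not what Kwrite does internally, and the sample string is made up: it only shows that Latin-1 bytes decoded as UTF-8 fail on each accented character, and that a lenient decoder substitutes U+FFFD, the replacement character that is drawn as a black lozenge with a ? inside.

```python
# A made-up sample; "ç" is stored as the single byte 0xE7 in Latin-1.
latin1_bytes = "Licence française".encode("latin-1")

# 0xE7 followed by an ASCII letter is not a valid UTF-8 sequence, so a
# lenient decoder replaces it with U+FFFD (the replacement character).
as_utf8 = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding the file was really written in works fine.
as_latin1 = latin1_bytes.decode("latin-1")

print(as_utf8)
print(as_latin1)
```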

Also remember that Linux is UTF-8 Unicode by default, and if you imported this file from a non-Linux system (as I stated above, you do not say where it originated, so this is a guess), this could be a problem. Conversion on your Linux system is of course possible.

EDIT: I downloaded it and opened it with vi. vi sees that it is Latin1 (vi tries to guess by interpreting all the bytes before it decides what to use).

Hi,

I used firefox 3.5.9 on my opensuse box and went to ‘Licenses’ (http://www.cecill.info/licences.en.html). I then right-clicked on the “as a plain text document” link in front of License in French version 2 and selected “saved the link as …”.

Does that mean that I should change the default encoding system somehow?

It means imho that the site you load from uses ISO 8859-1 (Latin-1) as its character set (it also does so in the HTML page you refer to). And the plain text file is downloaded as is.

That HTML page isn’t a problem because the web site declares (in the HTTP protocol and inside the page itself) that it is ISO 8859-1, and thus your web browser knows what to do.
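As a rough illustration of the HTTP side, the character set travels in the Content-Type header. The Python sketch below pulls it out with the standard library; the helper name `charset_from_content_type` and the two header values are hypothetical examples, not headers captured from that site.

```python
from email.message import Message


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_charset() lowercases the value and returns None if absent.
    return msg.get_content_charset()


# Illustrative header values only (assumptions, not taken from the site).
print(charset_from_content_type("text/html; charset=ISO-8859-1"))
print(charset_from_content_type("text/plain"))
```

Once the body has been saved to disk, that header is gone, which is why the editor is left guessing.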

The text file as stored on your system is just bytes. Nothing on the system records what encoding it is in (though we now know), and there is no defined way of declaring it inside the file either. Now the software could make an intelligent guess, and that is what vi does. It seems that Kwrite doesn’t, but just assumes the Linux (and nowadays worldwide accepted) default, UTF-8 encoded Unicode.
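The kind of guess described here can be sketched in Python as trying candidate encodings in order and keeping the first one that decodes without error. This is an assumption about the general approach, not vi's actual code; `guess_encoding` and its candidate list are hypothetical.

```python
def guess_encoding(data, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes data without error."""
    for name in candidates:
        try:
            data.decode(name)
        except UnicodeDecodeError:
            continue
        return name
    raise ValueError("none of the candidate encodings fits")


# Latin-1 accepts every possible byte, so it only makes sense as the
# last candidate: anything placed after it would never be tried.
print(guess_encoding("Licence française".encode("latin-1")))
print(guess_encoding("Licence française".encode("utf-8")))
```

Such a guess can still be wrong: a file in some other single-byte encoding would also be reported as latin-1 here.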

If this is just one (or maybe a few) of these files, I can tell you how to change them to UTF-8 Unicode. This is, however, not an automated process that can be applied to hundreds of files (maybe there is such a method, but we would have to search for it).

Open a terminal (e.g. konsole).
Make the directory where the file is your working directory:

cd <the-directory>

and open the file with vi:

vi Licence_CeCILL_V2-fr.txt

then type:

:set

and hit return. You will now see at the bottom:

:set
--- Options ---
  helplang=nl         ruler               showmatch           ttymouse=xterm2
nomodeline            scroll=14           ttyfast
  backspace=indent,eol,start
  fileencoding=latin1
  fileencodings=ucs-bom,utf-8,default,latin1
Press ENTER or type command to continue

And you see that the present encoding is latin1. Now type:

:set fileencoding=utf-8

which makes vi convert all special latin1 bytes into multi-byte UTF-8 sequences when the file is written.
Then type:

:wq

to write the file back and to quit vi.

It should be in UTF-8 now.
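For many files, the same conversion could be scripted rather than done by hand in vi. Here is a minimal Python sketch, assuming that every file which is not already valid UTF-8 is Latin-1; the function names and the `*.txt` pattern are hypothetical choices, and it is safer to try it on copies first because files are rewritten in place.

```python
from pathlib import Path


def convert_to_utf8(path, source_encoding="latin-1"):
    """Rewrite path as UTF-8 in place; return True if it was changed."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8 (or plain ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    # Assumption: anything that is not UTF-8 is in source_encoding.
    text = raw.decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))
    return True


def convert_directory(directory, pattern="*.txt"):
    """Convert every matching file in directory; return the changed paths."""
    return [p for p in sorted(Path(directory).glob(pattern)) if convert_to_utf8(p)]
```

Calling `convert_directory(".")` would convert the matching files in the current directory and return the list of those it changed. Working on bytes keeps the line endings untouched.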

pomchip wrote:
> Any idea would be appreciated.

i admit it's a kludge, but what i did works here:

  1. go to http://www.cecill.info/licences/Licence_CeCILL_V2-fr.txt (i
    used firefox, but i suppose most would work)

  2. with mouse drag to copy text to clipboard

  3. open KWrite (i guess other text editors or maybe even OpenOffice
    Writer would work)

  4. paste text into the open, blank document

  5. name and save doc…done.

no, i don’t know why it didn’t work the more obvious way (which you
tried)–but i suspect if you put a network sniffer on the line and
closely inspect the stream from www.cecill.info i guess you will see
it specifies a ‘code-page’ or unicode whatzit that tells the browser
how to render with correct French do-dads [sorry for all the technical
lingo:]…

> Please, forgive me if some important setting information are
> missing, I am fairly new to the Linux world.

-=welcome=- fairly new to the Linux world…and, do NOT give up as you
proceed…there are thousands of examples of little ways that make it
hard for all of the computers and languages on earth to work together…

hang around and ask more when you find other frustrations in path to
happiness with Linux…do NOT take this as a brush off: you might
find a real (non-kludge) solution is known and available to you in our
French lingo sister site, here:
http://forums.opensuse.org/language-specific-forums/francais-french/


DenverD (Linux Counter 282315)

On 2010-06-11, pomchip <pomchip@no-mx.forums.opensuse.org> wrote:
> I used firefox 3.5.9 on my opensuse box and went to ‘Licenses’
> (http://www.cecill.info/licences.en.html). I then right-clicked on the
> “as a plain text document” link in front of License in French version 2
> and selected “saved the link as …”.

There is no such thing as a “plain text document” once you need anything
more than ASCII. Any use of accents in a text file assumes some encoding. And
since the encoding is not specified in the document, most of the time you’re
left to guess which one was used. You might even stumble on a file that
contains a mixture…
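A small Python illustration of why the encoding has to be known: the same byte decodes to different characters under different encodings. The byte value and the `decodings` helper are just examples chosen for this sketch.

```python
def decodings(raw, names):
    """Decode the same bytes under several encodings, keyed by encoding name."""
    return {name: raw.decode(name) for name in names}


# One byte, two Western European encodings, two different characters:
# the currency sign in ISO 8859-1 and the euro sign in ISO 8859-15.
result = decodings(bytes([0xA4]), ("iso-8859-1", "iso-8859-15"))
for name, char in result.items():
    print(name, char)
```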

> Does that mean that I should change the default encoding system
> somehow?

No, what you’re doing is OK. Save the file to your hd, and try to open it
with your text editor. Just don’t simply click on the file. Start Kwrite and
use File/Open. In that dialog, you can select an encoding manually.

If you need to keep that text for reading, print it to a PDF. If you need to
edit it, use copy-paste (from the web page or from kwrite) to put it in a
word processing document (e.g. OpenOffice Writer). At least, those specify
what is in them.



You normally have to configure documents as French in each application in order to get the correct Aspell dictionary to work.

But I have also had a recent problem with the Aspell French dictionary suddenly not working with documents where it worked satisfactorily in the past.