How to change from UTF-8 to ISO-8859

I’m banging my head here, desperately searched everywhere & tried everything for 2 days now…
Problem is, I’m running OpenSuse 12.1 on a sftp-server with LANG=en_US.UTF-8, and I need to change to ISO-8859 so both Linux & Windows users can send files with Swedish characters in filename.

This is a minimal install, so no X.
I have tried changing language & locale with YaST, but “# locale” still shows sv_SE.UTF-8, although the primary language is English!?
sv_SE.ISO-8859-1 seems not to exist, so I tried en_US.ISO-8859-1. YaST allows the change, but after a reboot it is still sv_SE.UTF-8.
If I manually edit /etc/sysconfig/console and /etc/sysconfig/language and reboot, I still have sv_SE.UTF-8.
The error message is something like “cannot use sv_SE.ISO-8859-1, no such file”.
Then I execute SuSEconfig, and locale shows sv_SE.ISO-8859-1, with no error message.
But filenames are still weird; they are still all UTF-8.

The closest I got is after also creating /etc/bash.bashrc.local with one line “export LANG=en_US.ISO-8859-1”.
After reboot “# locale” shows en_US.ISO-8859-1 - but look what happens:
(Oops, can’t attach image??) I’ll try to describe:

2 files created, one with LANG set to UTF-8, one with LANG set to ISO-8859-1:
* ls shows both files with weird 'ÅÄÖ' (Swedish characters) in the filename; only the uppercase ones are weird, though.
* # ls <tab>: now tab completion shows the files with correct filenames; 'ÅÄÖ' displays correctly.
* checking the files' charset with “# file -bi” shows both are UTF-8
* In WinSCP on Win7 both filenames are weird.
* Files sent in with WinSCP are also weird, charset=ISO-8859-1
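Note that `file -bi` reports the character set of a file's *contents*, not of its name, so it cannot tell you how the filename itself is encoded. A minimal Python sketch that inspects the raw bytes of the names instead (the script and its helper names are hypothetical). It can only say which encoding the bytes are *consistent* with, since any byte string is valid ISO-8859-1:

```python
import os
import sys


def classify_name(raw: bytes) -> str:
    """Say which encoding a raw filename (bytes) is consistent with."""
    try:
        raw.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Any byte string decodes as ISO-8859-1, so this is only a guess.
        return "not utf-8 (maybe iso-8859-1/cp1252)"


def scan(directory: str) -> dict:
    """Map each raw entry name in `directory` to its classification.

    Passing a bytes path makes os.listdir return bytes, so the names are
    not decoded through the current locale first.
    """
    return {name: classify_name(name) for name in os.listdir(os.fsencode(directory))}


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, kind in sorted(scan(target).items()):
        print(f"{kind:38} {name!r}")
```

Running it on the upload directory would show whether a given name is stored as UTF-8 or as single-byte ISO-8859-1/cp1252, independently of what the current LANG setting makes `ls` display.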

And I need to automate file transfer, using WinSCP for that on Windows. Haven’t found any other win-app that works better.
So, what I want to do is switch from UTF-8 to ISO-8859.
Anybody knows how I can do that?

Edit: Oh yes, System Keyboard is set to Swedish.

On 06/13/2012 11:26 AM, pingu 2 wrote:
> Anybody knows how I can do that?

sorry, i can’t directly solve your problem…however, having fought the
problem since '92 in four different operating systems (MS DOS, Windows,
OS/2 and now Linux) i can tell you how to avoid the problem:

never use non-ASCII characters in file or directory names [Swedish i
don’t know, but for Danish i use å=aa, æ=ae and ø=oe, and though the
problem is not solved, it is avoided]

yes, i know that is inconvenient and i hope someone can help you solve
the problem (so far, i have found avoiding it is the easiest for me,
especially over time…that is, you may have to solve it AGAIN and AGAIN
and AGAIN, each time you move to a different OS or distro, or even just
a new version of openSUSE! and don’t forget that you might solve it for
openSUSE 12.1 with ext4 and next week wanna switch to ext3 or btrfs or
LVM or whatever, and find even more problems with non-ASCII characters…)

ymmv


dd

pingu 2 wrote:
> I’m banging my head here, desperately searched everywhere & tried
> everything for 2 days now…
> Problem is, I’m running OpenSuse 12.1 on a sftp-server with
> LANG=en_US.UTF-8, and I need to change to ISO-8859 so both Linux &
> Windows users can send files with Swedish characters in filename.

That sounds like a bad idea. I’m no expert but as I understand it UTF-8
is the future and ISO-8859 is the past. And both Linux and Windows
understand Unicode AFAIK, so it would be better to figure out what is
not working with a Unicode setup. Hopefully somebody who really knows
will come along …

On 2012-06-13 12:05, Dave Howorth wrote:
> pingu 2 wrote:
>> I’m banging my head here, desperately searched everywhere & tried
>> everything for 2 days now…
>> Problem is, I’m running OpenSuse 12.1 on a sftp-server with
>> LANG=en_US.UTF-8, and I need to change to ISO-8859 so both Linux &
>> Windows users can send files with Swedish characters in filename.
>
> That sounds like a bad idea. I’m no expert but as I understand it UTF-8
> is the future and ISO-8859 is the past. And both Linux and Windows
> understand Unicode AFAIK, so it would be better to figure out what is
> not working with a Unicode setup. Hopefully somebody who really knows
> will come along …

I concur, I don’t think it is possible to change the entire system back to ISO.

It would rather be a mapping problem with the application.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

On 06/13/2012 12:43 PM, Carlos E. R. wrote:
> I don’t think it is possible to change the entire system back to ISO.
>
> It would rather be a mapping problem with the application.

excellent point!
look at the configuration of ftp being used…

however: still, my recommendation is to avoid non-ASCII characters


dd

dd:
Thanks, but can’t do that.
The problem is that lots of people are sending in files, there’s no way we can make them all avoid Swedish characters.
djh_novell, robin_listas

Well, me & dd argue against you.
UTF-8 is a major headache when you need to share files between Windows and Linux.
It does not matter what distro & what Windows-version, it’s always the same:
Use Swedish characters in one system, read the file from other system = Swedish characters are corrupted.
UTF-8 might be great when you only use English, but here in Sweden people normally speak & write in Swedish…

But let’s take a closer look, maybe there’s some other way around it.
We’re dealing with very sensitive information sent in from all over Sweden.
To protect this information we have a setup with one SFTP-server running OpenSuse 12.1 as kind of a “flood-gate”.

  • Files are sent in to the SFTP-server. This server can not reach anything outside itself, including Internet.
  • A server running Win2008R2 fetches the files automatically every 3 minutes. This is done with a few small scripts and a “cron-job” on Win-server (whatever the name is for that function in Windows).
  • The actual copying from the SFTP server to the Win-server is done with WinSCP's command line (scripted).
    I have created a session in WinSCP's GUI, where I can set UTF-8 to Auto, On or Off.
    Auto does the same as Off: it never recognizes UTF-8, so filenames are not read correctly, files are not found and thus not handled.
    On makes WinSCP read filenames fine, but when they are copied they can’t be written. The error message shows it wants to write the file with weird characters in place of the Swedish ones.
    So in short:
    pscp & winscp can either read & write in UTF-8 OR read & write in cp1252.
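The two modes described here correspond to the two halves of one transcoding step: decoding the server's UTF-8 bytes into characters, then encoding those characters for a program that writes names through the Windows cp1252 (“ANSI”) code page. A minimal Python sketch of that step, under the assumption that the receiving side really needs cp1252 bytes. The function name is hypothetical and is not part of WinSCP or pscp:

```python
def utf8_name_to_ansi(raw: bytes, codepage: str = "cp1252") -> bytes:
    """Transcode a UTF-8 filename (as sent by the server) into a Windows ANSI code page.

    Raises UnicodeDecodeError if `raw` is not valid UTF-8, and
    UnicodeEncodeError if a character has no slot in `codepage`.
    """
    text = raw.decode("utf-8")     # step 1: server bytes -> characters
    return text.encode(codepage)   # step 2: characters -> bytes in the local code page


if __name__ == "__main__":
    for name in ("Åäö.txt", "€-faktura.pdf", "Dvořák.mp3"):
        raw = name.encode("utf-8")
        for cp in ("cp1252", "iso-8859-1"):
            try:
                print(f"{name!r} -> {cp}: {utf8_name_to_ansi(raw, cp)!r}")
            except UnicodeEncodeError:
                print(f"{name!r} -> {cp}: not representable")
```

It also shows that cp1252 and ISO-8859-1 are not interchangeable: “€” exists in cp1252 (byte 0x80) but not in ISO-8859-1, and neither single-byte code page can hold a name such as “Dvořák”. Any converter has to decide what to do with such names.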

If I can’t change charset in Suse, can you think of any other way to pull the files from SFTP-server?
We don’t want any unnecessary services running on SFTP-server.
We don’t want to map a drive from Suse - would have to be samba since Windows support for NFS is broken and Dokan doesn’t refresh.
Ideas?

On 2012-06-13 13:05, dd@home.dk wrote:
> On 06/13/2012 12:43 PM, Carlos E. R. wrote:
>> I don’t think it is possible to change the entire system back to ISO.
>>
>> It would rather be a mapping problem with the application.
>
> excellent point!
> look at the configuration of ftp being used…

When we mount Windows filesystems, one parameter is precisely the charset
to use on it.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

pingu 2 wrote:
> dd:
> Thanks, but can’t do that.
> The problem is that lots of people are sending in files, there’s no way
> we can make them all avoid Swedish characters.
> djh_novell, robin_listas
> Well, me & dd argue against you.

Well no, I don’t see anything that DD says that argues against me.

> UTF-8 is a major headache when you need to share files between Windows
> and Linux.
> It does not matter what distro & what Windows-version, it’s always the
> same:
> Use Swedish characters in one system, read the file from other system =
> Swedish characters are corrupted.

Just not true. There are many people who do this all the time. I agree
that it is a headache because you need to understand far more about how
every component of each system works. The key is to make completely sure
exactly what encoding is used at every point in the system, and exactly
what code is responsible for any transcoding needed. As DD said, that
includes all the filesystem implementations.

I don’t use Windows any more and have forgotten pretty much everything I
knew about it. But http://winscp.net/eng/docs/faq_utf8 says “SFTP
protocol specification requires that client and server uses UTF-8
encoding (Unicode) for file names.” So apparently you have no choice. It
insists you use UTF-8. Get used to it.

> But let’s take a closer look, maybe there’s some other way around it.
> We’re dealing with very sensitive information sent in from all over
> Sweden.
> To protect this information we have a setup with one SFTP-server
> running OpenSuse 12.1 as kind of a “flood-gate”.
> * Files are sent in to the SFTP-server. This server can not reach
> anything outside itself, including Internet.

I don’t understand that. How are files sent to the server via SFTP if it
cannot reach the Internet?

> * A server running Win2008R2 fetches the files automatically every 3
> minutes. This is done with a few small scripts and a “cron-job” on
> Win-server (whatever the name is for that function in Windows).
> * The actual copying from SFTP-server to Win-server is done with
> winscp:s commandline (scripted).
> I have created a session in WinSCP:s gui, there I can choose to set
> UTF-8: Auto, On or Off.
> Auto does the same as Off, never recognizes UTF-8, resulting in
> failure to read filenames correctly -files are not found and thus not
> handled.
> On makes winscp read filename fine, but when they are copied can’t be
> written. Error message shows it wants to write file with weird
> characters in place of Swedish.

I think it highly unlikely that WinSCP has such fundamental bugs. It
seems much more likely that you have misconfigured it or misused it or
somesuch.

Instead of saying “weird characters”, you should certainly be able to
tell us exactly what encoding is being used, and then we should be able
to figure out what is going wrong.

Make a detailed list of what character encoding is in use at every step
of the way, and explain what setting of each program accounts for the
encoding that is in use.
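One way to follow this advice is to compare the raw bytes each program actually sees against what each candidate encoding would produce for the same letters. A small Python sketch (the helper name is hypothetical; `bytes.hex` with a separator needs Python 3.8 or later):

```python
CANDIDATES = ("utf-8", "cp1252", "iso-8859-1")


def byte_table(text: str) -> dict:
    """Map each candidate encoding to the hex bytes it would use for `text`."""
    table = {}
    for enc in CANDIDATES:
        try:
            table[enc] = text.encode(enc).hex(" ")
        except UnicodeEncodeError:
            table[enc] = "(not representable)"
    return table


if __name__ == "__main__":
    for ch in "ÅÄÖåäö€":
        print(ch, byte_table(ch))
```

For example, “Å” is `c3 85` in UTF-8 but the single byte `c5` in both cp1252 and ISO-8859-1. A hex dump of a filename as received at each step (on Linux, for instance `ls | od -An -tx1`) can then be matched against this table.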

> So in short:
> pscp & winscp can -either- read & write in UTF-8 -OR- read & write in
> cp1252.

So if they can read & write in UTF-8, what is the problem?

> If I can’t change charset in Suse, can you think of any other way to
> pull the files from SFTP-server?

I don’t believe you need another way. You simply need to figure out why
the way you are using is not working.

On 2012-06-13 13:42, Dave Howorth wrote:
> pingu 2 wrote:
>> dd:
>> Thanks, but can’t do that.
>> The problem is that lots of people are sending in files, there’s no way
>> we can make them all avoid Swedish characters.
>> djh_novell, robin_listas
>> Well, me & dd argue against you.
>
> Well no, I don’t see anything that DD says that argues against me.

Me neither, I agree with dd and you.

> I don’t use Windows any more and have forgotten pretty much everything I
> knew about it. But http://winscp.net/eng/docs/faq_utf8 says “SFTP
> protocol specification requires that client and server uses UTF-8
> encoding (Unicode) for file names.” So apparently you have no choice. It
> insists you use UTF-8. Get used to it.

Right.

>> On makes winscp read filename fine, but when they are copied can’t be
>> written. Error message shows it wants to write file with weird
>> characters in place of Swedish.
>
> I think it highly unlikely that WinSCP has such fundamental bugs. It
> seems much more likely that you have misconfigured it or misused it or
> somesuch.

Dunno, winscp might be trying to do translation.

Maybe he can use samba to retrieve the files from the ftp server, because
samba will convert the names not valid for Windows. If the server is not
accessible via samba, then import to another Linux machine via sftp (ssh
transport), and from this one, inside the network, via samba to the Windows
machine.

> Instead of saying “weird characters”, you should certainly be able to
> tell us exactly what encoding is being used, and then we should be able
> to figure out what is going wrong.

And what Windows version…


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

> The key is to make completely sure
> exactly what encoding is used at every point in the system, and exactly
> what code is responsible for any transcoding needed.

Exactly what I’m trying to do! But I’m told I can’t do that, encoding is locked to UTF-8.

> So apparently you have no choice. It
> insists you use UTF-8. Get used to it.

Well, Windows doesn’t use UTF-8…

> How are files sent to the server via SFTP if it
> cannot reach the Internet?

The server can’t reach out, but sftp/ssh can get in.
(And the Win-server is not reachable from the Internet at all.)

> I think it highly unlikely that WinSCP has such fundamental bugs. It
> seems much more likely that you have misconfigured it or misused it or
> somesuch.

I have only changed the “UTF-8” setting (Auto, On, Off), and I only started with that because filenames were corrupted.

> Instead of saying “weird characters”, you should certainly be able to
> tell us exactly what encoding is being used, and then we should be able
> to figure out what is going wrong.

Sigh…
Encoding on SFTP-server: UTF-8.
Encoding in Windows: cp1252.
WinSCP can either read the filenames correctly on the SFTP server when UTF-8 is On, or write files correctly on the Win-server when UTF-8 is Auto or Off.

> I don’t believe you need another way. You simply need to figure out why
> the way you are using is not working.

I have figured it out - it’s because Suse uses UTF-8, Windows don’t.

Windows versions used: Windows XP, Windows 7 Pro, Windows Server 2008R2

>> Well, me & dd argue against you.
> Well no, I don’t see anything that DD says that argues against me.

+1


dd

dd, now I’m confused.
You didn’t mean this?

> sorry, i can’t directly solve your problem…however, having fought the
> problem since '92 in four different operating systems (MS DOS, Windows,
> OS/2 and now Linux) i can tell you how to avoid the problem:
>
> never use non-ASCII characters in file or directory names [Swedish i
> don’t know, but for Danish i use å=aa, æ=ae and ø=oe, and though the
> problem is not solved, it is avoided]
>
> yes, i know that is inconvenient and i hope someone can help you solve
> the problem (so far, i have found avoiding it is the easiest for me,
> especially over time…that is, you may have to solve it AGAIN and AGAIN
> and AGAIN, each time you move to a different OS or distro, or even just
> a new version of openSUSE! and don’t forget that you might solve it for
> openSUSE 12.1 with ext4 and next week wanna switch to ext3 or btrfs or
> LVM or whatever, and find even more problems with non-ASCII characters…)

On 2012-06-13 14:16, pingu 2 wrote:
> Well, Windows doesn’t use UTF-8…

No, that is not correct; Windows does support Unicode. It uses UTF-16, in
fact. XP not. FAT doesn’t; you need NTFS.

Wikipedia
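To make the point concrete: UTF-8 and UTF-16 are two encodings of the same Unicode repertoire, so converting a name between them loses nothing, whereas a single-byte code page such as cp1252 holds only a small subset. A minimal Python check (helper names are illustrative; UTF-16 little-endian is used because NTFS stores names as 16-bit Unicode code units):

```python
def roundtrip(name: str, encoding: str) -> bool:
    """True if `name` survives encode -> decode through `encoding` unchanged."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeEncodeError:
        return False


def utf8_to_utf16le(raw: bytes) -> bytes:
    """Re-encode UTF-8 filename bytes as UTF-16 little-endian."""
    return raw.decode("utf-8").encode("utf-16-le")


if __name__ == "__main__":
    for name in ("Årsrapport", "España", "Dvořák"):
        print(name, {enc: roundtrip(name, enc) for enc in ("utf-8", "utf-16-le", "cp1252")})
```

Every name survives UTF-8 and UTF-16, while “Dvořák” does not survive cp1252. The lossy step is the conversion to a legacy code page, not the one between the Linux and NTFS representations.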

> I -have- figured it out - it’s because Suse uses UTF-8, Windows don’t.

Forget it, you cannot force Linux to use ISO code pages nowadays. Terminals
perhaps, but not the filesystem.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

pingu 2 wrote:

… some pretty unreadable text …

>> The key is to make completely sure
>> exactly what encoding is used at every point in the system, and exactly
>> what code is responsible for any transcoding needed.
> Exactly what I’m trying to do! But I’m told I can’t do that, encoding is
> locked to UTF-8.
>> So apparently you have no choice. It
>> insists you use UTF-8. Get used to it.
> Well, Windows doesn’t use UTF-8…
>> How are files sent to the server via SFTP if it
>> cannot reach the Internet?
> The server can’t reach out, but sftp/ssh can get in.
> (And the Win-server is not reachable from Internet at all.)
>> I think it highly unlikely that WinSCP has such fundamental bugs. It
>> seems much more likely that you have misconfigured it or misused it or
>> somesuch.
> I have only configured the “UTF-8” Auto, On, Off - and started with
> that because filenames were corrupted.
>> Instead of saying “weird characters”, you should certainly be able to
>> tell us exactly what encoding is being used, and then we should be able
>> to figure out what is going wrong.
> Sigh…
> Encoding on SFTP-server: UTF-8.
> Encoding in Windows: cp1252.
> -Winscp can either read the filenames correct on SFTP-server when UTF-8
> is On, or write files correct on Win-server when UTF-8 is Auto or Off.-
>> I don’t believe you need another way. You simply need to figure out why
>> the way you are using is not working.
> I -have- figured it out - it’s because Suse uses UTF-8, Windows don’t.

No, it’s not as simple as that. Character sets aren’t properties of the
system as a whole. Every separate piece of text uses some particular
character set. Every separate program has its own encode and decode
routines to write its outputs and read its inputs respectively. Any one
of them can be different to what you’re expecting. You need to do a
detailed examination of what’s actually happening, not keep saying “my
head hurts”.
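The point that the bytes stay the same while each program's decoder decides what you see can be shown in a few lines of Python (helper names are hypothetical). Under ISO-8859-1 the second UTF-8 byte of Å, Ä and Ö (0x85, 0x84, 0x96) lands in the C1 control range 0x80 to 0x9F, while for å, ä and ö (0xA5, 0xA4, 0xB6) it lands on printable characters. That difference may be related to the earlier observation that only the uppercase letters looked broken, though the thread does not confirm it.

```python
SWEDISH = "ÅÄÖåäö"


def decode_as(raw: bytes, encoding: str) -> str:
    """Decode the same bytes the way a program set up for `encoding` would."""
    return raw.decode(encoding, errors="replace")


def c1_controls(text: str) -> list:
    """List the C1 control characters (U+0080..U+009F) found in `text`."""
    return [f"U+{ord(ch):04X}" for ch in text if 0x80 <= ord(ch) <= 0x9F]


if __name__ == "__main__":
    raw = SWEDISH.encode("utf-8")   # the bytes stored on the Linux side
    for enc in ("utf-8", "cp1252", "iso-8859-1"):
        text = decode_as(raw, enc)
        print(f"{enc:10} {text!r}  C1 controls: {c1_controls(text)}")
```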

This looks interesting and it’s by a Scandinavian; perhaps you’ll
believe him http://www.cs.tut.fi/~jkorpela/chars.html

And here’s Microsoft:
http://msdn.microsoft.com/en-us/library/windows/desktop/dd374083(v=vs.85).aspx

“Microsoft Windows provides support for the many different written
languages of the international marketplace through Unicode and
traditional character sets.”

“Unicode is a worldwide character encoding standard … It is supported
by many operating systems, all modern browsers, and many other products.
New Windows applications should use Unicode to avoid the
inconsistencies of varied code pages and to aid in simplifying
localization.”

[my emphasis]

On 06/13/2012 02:16 PM, pingu 2 wrote:
>
> encoding is locked to UTF-8. . .
> So apparently you have no choice. . .
> Encoding on SFTP-server: UTF-8. . .
> Encoding in Windows: cp1252. . .
> Suse uses UTF-8, Windows don’t.

from where i sit i see several alternatives you might want to pursue:

  • run a different ftp server which is not “locked” to UTF-8

  • set all Windows input machines for non-proprietary encoding (good luck!!)

  • refuse inputs from machines not using UTF-8 :-)

  • refuse inputs with non-ASCII characters

  • insert an inbound script to purge Swedish characters and replace with
    the recognized two letter (ASCII) alternatives…

  • if there are no usable ftp daemons, don’t use Linux… (i’m sure MS
    will happily sell you an overly expensive and horribly easy to crack
    “server” which requires more hardware capability to do as much work)
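The inbound-script suggestion in the list above could look roughly like this Python sketch, run on the SFTP server after upload and before the Windows side fetches the files. The mapping (dd's Danish two-letter spellings extended to the Swedish letters) and the function names are assumptions made for illustration:

```python
import os
import unicodedata

# Hypothetical mapping: dd's Danish two-letter spellings, extended to the
# Swedish letters. Adjust to whatever convention your users expect.
TWO_LETTER = {
    "å": "aa", "ä": "ae", "ö": "oe", "æ": "ae", "ø": "oe",
    "Å": "Aa", "Ä": "Ae", "Ö": "Oe", "Æ": "Ae", "Ø": "Oe",
}


def ascii_name(name: str, fallback: str = "_") -> str:
    """Return an ASCII-only version of `name`."""
    # NFC first, so a decomposed "a" + combining ring is treated as "å".
    name = unicodedata.normalize("NFC", name)
    out = []
    for ch in name:
        if ch in TWO_LETTER:
            out.append(TWO_LETTER[ch])
        elif ord(ch) < 128:
            out.append(ch)
        else:
            out.append(fallback)   # any other non-ASCII character
    return "".join(out)


def rename_inbound(directory: str) -> list:
    """Rename non-ASCII entries in `directory` in place; return (old, new) pairs.

    If the ASCII name is already taken, the entry is skipped rather than
    overwritten (os.rename would silently replace it on Linux).
    """
    renamed = []
    for name in sorted(os.listdir(directory)):
        new = ascii_name(name)
        if new == name:
            continue
        dst = os.path.join(directory, new)
        if os.path.exists(dst):
            continue
        os.rename(os.path.join(directory, name), dst)
        renamed.append((name, new))
    return renamed
```

The existence check and the rename are not atomic, so a real deployment would want to run this only when no upload is in progress; and any letter outside the mapping is simply replaced by `_`, which loses information.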

the bottom line is this is not a Linux problem…Linux uses the world
wide standard Universal Character Set Transformation Format
(http://en.wikipedia.org/wiki/UTF-8) which is the required Internet
Standard and flows effortlessly to Unix, AIX, Solaris, BSD, OS X,
Android, and many other operating systems…while MS uses something else.


dd

On 06/13/2012 02:46 PM, pingu 2 wrote:
> dd, now I’m confused.
> You didn’t mean this?

sure i meant it…but i apologize if i wrote so unclearly that you
misunderstood it to the point that you thought i was arguing against
Dave and Carlos (who both first posted AFTER me)…

anyway, both Dave and Carlos know a whole lot more about this than i do
(i’ve only used Linux since '98 and am not a real hacker, like them)…

so, if i disagree with them (which i seldom do) there is a REAL good
chance that i am wrong (or at least confused)…


dd

On 2012-06-13 14:03, Carlos E. R. wrote:
> On 2012-06-13 13:42, Dave Howorth wrote:

Problem verified, solution found.

>> I think it highly unlikely that WinSCP has such fundamental bugs. It
>> seems much more likely that you have misconfigured it or misused it or
>> somesuch.
>
> Dunno, winscp might be trying to do translation.
>
> Maybe he can use samba to retrieve the files from the ftp server, because
> samba will convert the names not valid for Windows. If the server is not
> accessible via samba, then import to another Linux machine via sftp (ssh
> transport), and from this one, inside the network, via samba to the Windows
> machine.

I have verified the procedure.

I temporarily installed W7 into a virtual setup with vmplayer. I created in
Linux a set of files with complicated names (UTF-8):


cer@Telcontar:~> ls tmp/aaa
España  balón  diëresis  p

Then I looked at those files with winscp from Windows 7, and they are
indeed wrong.

Then I copied those same files to a share that I export via samba, and
looked at them from Windows: they are correct.

susepaste

So the procedure that works is:

External Linux server —> via scp —> Internal Linux machine —> via
samba —> Internal Windows machine.

It has to be done, WinSCP doesn’t do the correct transfer of UTF filenames.
It is not the fault of Linux, nor of Windows. Linux-Windows transfer via
samba works.


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)

Carlos E. R. wrote:
> It has to be done, WinSCP doesn’t do the correct transfer of UTF filenames.
> It is not the fault of Linux, nor of Windows. Linux-Windows transfer via
> samba works.

Did you read and make use of the information in the link posted earlier
in the thread:

http://winscp.net/eng/docs/faq_utf8

On 2012-06-21 12:19, Dave Howorth wrote:
> Carlos E. R. wrote:
>> It has to be done, WinSCP doesn’t do the correct transfer of UTF filenames.
>> It is not the fault of Linux, nor of Windows. Linux-Windows transfer via
>> samba works.
>
> Did you read and make use of the information in the link posted earlier
> in the thread:
>
> http://winscp.net/eng/docs/faq_utf8

No… ok, reading.

I have default settings. “UTF-8 encoding for filenames” is set to “auto”;
I put it to “on” and connect - then names are treated correctly.

Interesting.

But the OP said:

> I have created a session in WinSCP:s gui, there I can choose to set
> UTF-8: Auto, On or Off.
> Auto does the same as Off, never recognizes UTF-8, resulting in
> failure to read filenames correctly -files are not found and thus not
> handled.
> On makes winscp read filename fine, but when they are copied can’t be
> written. Error message shows it wants to write file with weird
> characters in place of Swedish.

ON did not work for him apparently, so I did not even try. I believed the
statement.

Or… he mentions “scripts”. The problem could be in the scripting: the
names there do not use the same convention as the real filenames, so they fail to match.
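A minimal Python illustration of that hypothesis, assuming the script file is saved in one encoding and read by a tool that assumes another. It does not model WinSCP's actual script handling; the filename and helper name are hypothetical:

```python
WANTED = "Årsrapport.txt"   # hypothetical name as it exists on the server


def name_as_read(script_bytes: bytes, assumed_encoding: str) -> str:
    """How a tool that assumes `assumed_encoding` would read a name from a script."""
    return script_bytes.decode(assumed_encoding, errors="replace")


if __name__ == "__main__":
    saved_ansi = WANTED.encode("cp1252")   # script saved by a cp1252 ("ANSI") editor
    saved_utf8 = WANTED.encode("utf-8")    # the same script saved as UTF-8
    print(name_as_read(saved_ansi, "utf-8") == WANTED)   # False
    print(name_as_read(saved_utf8, "utf-8") == WANTED)   # True
    print(name_as_read(saved_utf8, "cp1252"))            # Ã…rsrapport.txt
```

If the names written in the script and the names coming from the server pass through different decoders like this, the lookup fails even though both sides “contain” the same filename.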


Cheers / Saludos,

Carlos E. R.
(from 11.4 x86_64 “Celadon” at Telcontar)