Why is MySQL now packaged for utf8?

This is a rant and not a cry for help. After installing openSUSE 11.2, I noticed that mysql now defaults to utf8. Connecting to older hosts with the mysql client produces a complete mess. Changing the system-wide encoding in yast2 to latin1 gives a mess too.

And best of all: this major change was made in a minor release and it gets no mention in the file README.SuSE. All the documentation still claims that latin1 is the default.

Solution? Get the source rpm, undo the change in the spec file, comment out patch35, re-build, re-install. :open_mouth:

I presume it’s because latin1 with Swedish collation (latin1_swedish_ci) was no longer suitable for many users.

Changing things in YaST won’t do any good; you need to use MySQL’s own charset and collation settings (for example the CHARACTER SET and COLLATE clauses) to work with character sets and collations other than the default. I suspect more users already do this than don’t.

Better question: why the hell are people still NOT using UTF-8?

@Chrystine
Yes, I agree with you; however, I recently ran into a problem with UTF-8. I had to do some work in Delphi 7 (from 2002, I believe).
It took real “acrobatics” in the code to get a working UTF-8 connection, because back in those days UTF-8 support was not widespread.
The closest I could get was WideChar, and that had serious problems: the components that talk to the database use ANSI strings, so as soon as a Portuguese accented character came in, I got insertion errors.

What I find to be a problem is that MySQL and PostgreSQL do not have the ability to convert an already created database from UTF-8 to something else.

+1

Or autodetect and convert when inserting or updating the DB?
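A rough sketch of the “autodetect and convert” idea, in Python rather than anything MySQL provides, assuming the incoming bytes are either UTF-8 or latin1 (the helper name `to_utf8` is hypothetical). It is only a heuristic: latin1 text with accented letters is rarely valid UTF-8, but plain ASCII passes either way, and some latin1 byte sequences happen to be valid UTF-8 too.

```python
def to_utf8(raw: bytes) -> str:
    """Guess the encoding of raw bytes: try UTF-8 first, then fall back to latin1.

    Hypothetical helper; assumes the input is either UTF-8 or latin1.
    Decoding as latin1 never fails, because every byte 0-255 maps to a character.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("latin-1")


if __name__ == "__main__":
    print(to_utf8("ação".encode("utf-8")))    # ação
    print(to_utf8("ação".encode("latin-1")))  # ação (fell back to latin1)
```

The fallback order matters: trying latin1 first would always "succeed" and never detect anything.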

The mysql command-line client does not read (getenv) the LANG environment variable. It uses the compiled-in default, which can be overridden with a command-line option. But as far as I can see, the variable character_set_client cannot be set in the configuration files. Changing the default will break a lot of scripts.

Everyone was free to adapt the charset and collation used by the server, either in the config file or even in the init script. That works with every setup. MySQL is, by design, multi-charset capable. But the client must produce the correct charset for the terminal’s locale setting.
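To illustrate the “complete mess” you get when the client and the terminal or server disagree, here is a small Python model (not MySQL code): the same bytes written in one charset and read in another. The function name `misread` is made up for the example.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text in one charset, then decode the bytes as another.

    Models a client and a terminal (or server) that disagree on the charset.
    errors="replace" turns invalid byte sequences into U+FFFD instead of raising.
    """
    return text.encode(written_as).decode(read_as, errors="replace")


if __name__ == "__main__":
    word = "ação"
    # UTF-8 bytes shown on a latin1 terminal: each accented letter becomes two characters
    print(misread(word, "utf-8", "latin-1"))  # aÃ§Ã£o
    # latin1 bytes shown on a UTF-8 terminal: the accented bytes are invalid UTF-8
    print(misread(word, "latin-1", "utf-8"))
```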

Many scripts (and their programmers) have a hard time dealing with utf8. When the width of a string on screen no longer matches its number of bytes, the display breaks and alignment problems are just around the corner. There are many good reasons to use an 8-bit charset, as long as it does the job.

Setting the default to latin1 gives everyone the freedom to choose a complex charset instead and treat the data accordingly, but setting the default to utf8 effectively reduces mysql to ‘utf-8 only’. That’s a bad choice IMHO.

That is simply bad programming: assuming that strlen(s) equals the number of characters in s. It’s a Eurocentric view. And even within Europe, Russians will take issue. It’s a little like the US/UK programmers of an earlier age assuming that it was OK to discard the 8th bit in octets. How did you Europeans like that assumption? It’s an international world now.
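The strlen() point can be shown in a few lines of Python: a UTF-8 string’s byte count differs from its character count, so padding by bytes breaks column alignment while padding by characters keeps it. This is a sketch for Latin and Cyrillic text only; wide CJK characters would additionally need a display-width measure. The helper names are illustrative.

```python
def pad_by_bytes(s: str, width: int) -> str:
    """Naive padding that treats the UTF-8 byte count as the display width (wrong)."""
    return s + " " * (width - len(s.encode("utf-8")))


def pad_by_chars(s: str, width: int) -> str:
    """Pad by character count instead; fine for Latin and Cyrillic text."""
    return s.ljust(width)


if __name__ == "__main__":
    names = ["Muller", "Müller", "Москва"]
    for name in names:
        # characters vs. bytes: 6/6, 6/7, 6/12
        print(len(name), len(name.encode("utf-8")))
    for name in names:
        # the first column comes out ragged, the second stays aligned
        print("|" + pad_by_bytes(name, 14) + "|" + pad_by_chars(name, 14) + "|")
```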

> Setting the default to latin1 gives everyone the freedom to choose a complex charset instead and treat the data accordingly, but setting the default to utf8 effectively reduces mysql to ‘utf-8 only’. That’s a bad choice IMHO.

There is no problem with setting the default either way. As I have pointed out in another thread, internally they are all octets anyway. You just have to make sure that your input and output characteristics match the charset you chose. The charset declarations of the server, database, table and column (each one overriding the one before it) simply serve as metadata to aid conversion and collation. If you control exactly how the data is accessed, the metadata can even be set wrong and it would not affect your app.

I actually went through this experience. I have a 5.0 server which defaults to Latin-1 but I was storing UTF-8 data in it. My web app is UTF-8 throughout. I wondered what would happen if I went and changed the column characteristics to UTF-8 after the data had been put in. Answer: absolutely nothing, because I was not using any collation or conversion functions, and my app sticks with UTF-8 all the way.
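Why nothing broke in that case can be modelled in Python, under the assumption that the server stores the app’s octets untouched and only the label says latin1. latin1 assigns a character to every one of the 256 byte values, so the bytes round-trip losslessly. The last call shows the risk: a step that genuinely converts the stored bytes from latin1 to UTF-8 would double-encode data that was already UTF-8. This is an illustration, not MySQL’s internals; all function names are made up.

```python
def app_writes(text: str) -> bytes:
    """The web app sends UTF-8 bytes; assume the server stores the octets unchanged."""
    return text.encode("utf-8")


def server_view_latin1(stored: bytes) -> str:
    """What the data looks like under a latin1 label: one character per byte.

    latin1 maps all 256 byte values to a character, so this never fails.
    """
    return stored.decode("latin-1")


def app_reads(stored: bytes) -> str:
    """The app reads the raw octets back and decodes them as UTF-8."""
    return stored.decode("utf-8")


def convert_latin1_to_utf8(stored: bytes) -> bytes:
    """A real conversion: interpret the bytes as latin1, then re-encode as UTF-8."""
    return stored.decode("latin-1").encode("utf-8")


if __name__ == "__main__":
    stored = app_writes("ação")
    print(server_view_latin1(stored))                 # aÃ§Ã£o (wrong label, bytes intact)
    print(app_reads(stored))                          # ação (lossless round trip)
    print(app_reads(convert_latin1_to_utf8(stored)))  # aÃ§Ã£o (double-encoded)
```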

I am always astonished to see how people confuse Switzerland and Sweden. :cry:

Pray explain yourself. No one in this thread has confused Switzerland and Sweden so far.

Pray explain yourself??? What is it you want him to do?? rotfl!

He knew I was going to post, I confuse both Switzerland and Sweden with everything else in the world.

OK guys, latin1_swedish_ci is a perfectly valid collation for a couple of Alpine peasants who did NOT invent Edam cheese. At least this thread now meets my orthodox anarchistic attitude (it was a rant from the beginning). rotfl!

rotfl!rotfl!

And sorry for my previous OT post :wink:

Switzerland, isn’t that where Mandela comes from? Or was it Mma Ramotswe? No, that was Botswana. Or was it Lesotho? Or Ubuntu? Or Obongo? Oh I am so confused. rotfl!