
Thread: just wondering what happens to utf-8 chars in posts

  1. #1
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,686
    Blog Entries
    4

    just wondering what happens to utf-8 chars in posts

    Euro sign €
    a umlaut ä
    e grave è
    c cedilla ç
    n tilde ñ

    Admins feel free to delete this post anytime.

    Edit: looks like the forum software does the right thing with European characters. I should try Asian characters sometime.

  2. #2
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,686
    Blog Entries
    4

    Re: just wondering what happens to utf-8 chars in posts

    The current time from BBC's Chinese site:

    2008年07月17日 格林尼治标准时间03:18北京时间 11:18发表

  3. #3
    Join Date
    Jun 2008
    Location
    Earth - Denmark
    Posts
    10,730

    Re: just wondering what happens to utf-8 chars in posts

    > Edit: looks like the forum software does the right thing with European
    > characters.


    unfortunately, those do not make it to all forum members in a readable
    format, yet..

    because the nntp gateway just doesn't know how to send'em...
    a KNOWN problem that i am NOT :-) complaining about...just informing
    ken_yapa that not all will see those characters the same way he saw them
    in his test... here they looked more like _&_#_2_4_1_;_ without the _s
    (which i put in so the web software wouldn't read them out correctly
    again)
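
    For anyone reading on the NNTP side: the string above is an HTML numeric character reference, where the number is the character's Unicode code point in decimal. A minimal Python sketch of turning such references back into characters, assuming the gateway passes them through unchanged (the sample string `raw` is made up for illustration, not captured from the gateway):

```python
import html

# Hypothetical sample of what an NNTP reader might receive from the gateway.
raw = "n tilde &#241;, Euro sign &#8364;"

# html.unescape resolves numeric (and named) character references.
decoded = html.unescape(raw)
print(decoded)

# The reverse direction: replace every non-ASCII character with a
# decimal character reference so the text survives an ASCII-only channel.
encoded = decoded.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(encoded)
```

    A newsreader that does not do this step would show the reference text literally, which matches what is described above.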

    --
    DenverD (Linux Counter 282315)
    A Texan in Denmark

  4. #4
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    26,529
    Blog Entries
    15

    Re: just wondering what happens to utf-8 chars in posts

    On Thu, 17 Jul 2008 15:49:31 GMT
    DenverD <spam.trap@Texan.dk> wrote:

    > > Edit: looks like the forum software does the right thing with
    > > European characters.

    >
    > unfortunately, those do not make it to all forum members in a
    > readable format, yet..
    >
    > because the nntp gateway just doesn't know how to send'em...
    > a KNOWN problem that i am NOT :-) complaining about...just informing
    > ken_yapa that not all will see those characters the same way he saw
    > them in his test... here they looked more like _&_#_2_4_1_;_ without
    > the _s (which i put in so the web software wouldn't read them out
    > correctly again
    >

    Hi
    If the text was a cut/paste then that's why we see the said funny
    characters.

    --
    Cheers Malcolm (Linux Counter #276890)
    SLED 10 SP2 i586 Kernel 2.6.16.60-0.23-default
    up 19:31, 2 users, load average: 1.20, 1.43, 0.88
    GPU GeForce Go 6600 TE/6200 TE Version: 173.14.09


  5. #5
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,686
    Blog Entries
    4

    Re: just wondering what happens to utf-8 chars in posts

    Actually the European diacritics were done with the Compose key and the Chinese characters with cut and paste, because I haven't got CJK input yet. Too bad the NNTP gateway doesn't do the right thing for you. But seeing as the characters are apparently encoded correctly as XML entities, isn't it an issue with the NNTP reading software? It's quite a complex issue, though, and the text could still be displayed wrongly if the server assumes Latin-1 and the client UTF-8 or vice versa. You'd have to look at the charset parameter of the Content-Type header in the NNTP article, plus the exact entities sent for each diacritic.
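
    The header check mentioned above could look roughly like this in Python. This is only a sketch: the article text is a made-up example rather than real gateway output, and it assumes the charset is declared as a parameter of the Content-Type header.

```python
from email import message_from_string

# Hypothetical article; a real gateway may send different headers.
article = (
    "From: someone@example.invalid\n"
    "Subject: test\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "body\n"
)

msg = message_from_string(article)
# get_content_charset() returns the charset parameter in lower case,
# or None if the header does not declare one.
declared = msg.get_content_charset()
print(declared)

# The same character is stored as different bytes in each charset,
# so a client that assumes the wrong one shows the wrong glyphs.
text = "\u00f1"  # n tilde
latin1_bytes = text.encode("latin-1")    # one byte: 0xF1
utf8_bytes = text.encode("utf-8")        # two bytes: 0xC3 0xB1
misread = utf8_bytes.decode("latin-1")   # UTF-8 bytes read as Latin-1
print(latin1_bytes, utf8_bytes, misread)
```

    If the declared charset and the actual bytes disagree, the result is the two-character garbage in `misread` rather than a single ñ.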

    Not that I'm expecting a flood of users of diacritics, though it would be nice for European users, let alone CJK users, to be able to spell their names correctly. But I'm curious about how international the forum software is. Some of the software out there is in the dark ages. You often see smart quotes like “these”, which are also non-ASCII characters, mess up webpages.

  6. #6
    Join Date
    Jun 2008
    Location
    UTC+10
    Posts
    9,686
    Blog Entries
    4

    Re: just wondering what happens to utf-8 chars in posts

    BTW, this is what happens when UTF-8 is not properly converted to ISO-8859-1 when rendered by the forum software:

    openSUSE Weekly News, Issue 31 - openSUSE Forums

    That â€™ is actually the right single quote character (’). I don't have any idea where it got mangled. And <deity> knows what you NNTP subscribers are seeing.
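
    The kind of mangling described above is consistent with UTF-8 bytes being decoded with a single-byte charset. A rough Python sketch, under the assumption that the browser treats a page labelled ISO-8859-1 as Windows-1252 (many browsers do, but where exactly the forum goes wrong is not known):

```python
right_quote = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# In UTF-8 the character takes three bytes: E2 80 99.
utf8_bytes = right_quote.encode("utf-8")
print(utf8_bytes)

# Assumption: the bytes are read as Windows-1252, which maps each
# byte to its own character, giving three characters instead of one.
mangled = utf8_bytes.decode("cp1252")
print(mangled)

# Under strict ISO-8859-1 the last two bytes are invisible control
# characters, so only the a-circumflex would be visible.
strict = utf8_bytes.decode("latin-1")
print(repr(strict))

# Reversing the wrong decoding recovers the original character.
repaired = mangled.encode("cp1252").decode("utf-8")
print(repaired == right_quote)
```

    The first character of the mangled form is an a with a circumflex in both cases.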

  7. #7
    Join Date
    Jun 2008
    Location
    Podunk
    Posts
    26,529
    Blog Entries
    15

    Re: just wondering what happens to utf-8 chars in posts

    On Fri, 18 Jul 2008 07:36:04 GMT
    ken yap <ken_yap@no-mx.forums.opensuse.org> wrote:

    >
    > BTW, this is what happens when UTF-8 is not properly converted to
    > ISO8859-1 when rendered by the forum software:
    >
    > 'openSUSE Weekly News, Issue 31 - openSUSE Forums'
    > (http://tinyurl.com/5w6u9w)
    >
    > That ’ is actually the right quote character . I don't have any
    > idea where it got mangled. And <deity> knows what you NNTP
    > subscribers are seeing.
    >
    >

    Hi
    I see an a with a ^ above it rather than what I assume should be a '

    --
    Cheers Malcolm (Linux Counter #276890)
    SLED 10 SP2 i586 Kernel 2.6.16.60-0.23-default
    up 1 day 15:03, 1 user, load average: 0.26, 0.27, 0.25
    GPU GeForce Go 6600 TE/6200 TE Version: 173.14.09


  8. #8
    Join Date
    Jun 2008
    Location
    Earth - Denmark
    Posts
    10,730

    Re: just wondering what happens to utf-8 chars in posts

    > And <deity> knows

    :-) yep, sometimes kinda ugly...

    --
    DenverD (Linux Counter 282315)
    A Texan in Denmark
