which is correct. I assume that your case is also correct, because the output is interpreted as one character only. The only thing is that you do not seem to have a font in that application (terminal emulator) that contains the character. I have.
BTW, it could be that you will NOT see the character I see on my screen and in this post, because you may not have the font that contains this character in your browser.
You mean the Linux text-mode console? That is not going to work: the text-mode font has room for only 512 characters, so anything it does not know about is printed as this funny question mark.
which, IMHO, shows that the output is correct, but that no font is available to show the glyph FULL MOON SYMBOL.
The application interpreted the bytes it got (F09F8C95) and translated them back into U+1F315. Not finding a glyph for it in any of the installed fonts, it drew a box with 01F 315 in it. A replacement glyph, but nevertheless correct.
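To make the byte-level reasoning above concrete, here is a minimal Python sketch that decodes the four bytes F09F8C95 and looks up the character's name. The two-row, three-digit layout of the hex box is an assumption based on the "01F 315" box described above; different terminals and toolkits draw these replacement boxes differently.

```python
import unicodedata

# The four bytes the application received, as quoted above.
raw = bytes.fromhex("F09F8C95")

# Decoding them as UTF-8 yields exactly one character.
ch = raw.decode("utf-8")
code_point = ord(ch)

print(len(ch))                  # 1
print(f"U+{code_point:04X}")    # U+1F315
print(unicodedata.name(ch))     # FULL MOON SYMBOL

# Assumed hex-box layout: six hex digits split into two groups of three.
digits = f"{code_point:06X}"
print(digits[:3], digits[3:])   # 01F 315
```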
I originally did not post to this thread because I have no Python knowledge, nor did I understand the word “surrogates” in this context. I only posted later because I saw the OP trying to prove something with echo, where I thought his interpretation was not correct.
In the meantime I have read more of this thread, and I understand that surrogates have something to do with UTF-16. I do know something about Unicode and UTF-8, but since UTF-16 is something of a niche on Linux and certainly not the default encoding there, my post may not be very relevant to the OP's problem.
Henk van Velden
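Since surrogates keep coming up: UTF-16 can only hold 16 bits per unit, so a code point above U+FFFF, such as U+1F315, is stored as a pair of "surrogate" units. A minimal sketch of the standard arithmetic, cross-checked against Python's built-in codecs; the helper name `to_surrogate_pair` is made up for illustration.

```python
from __future__ import annotations


def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP need surrogates")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


high, low = to_surrogate_pair(0x1F315)
print(f"{high:04X} {low:04X}")                           # D83C DF15

# Cross-check: UTF-16 big-endian (no BOM) versus UTF-8 for the same character.
print("\U0001F315".encode("utf-16-be").hex().upper())    # D83CDF15
print("\U0001F315".encode("utf-8").hex().upper())        # F09F8C95
```

The same character is thus two 16-bit units in UTF-16 but four bytes in UTF-8; the surrogate values never appear in UTF-8 at all.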
Your replies, and the others', are not only interesting but very helpful, and they are always appreciated. From what I've read, UTF-16 is the default Unicode encoding on Windows (leave it to them to be contrary). I installed Freefont on the Windows comp and its equivalent (I think), TEXfreefont, on the openSuse comp. It supposedly supports oodles of Unicode glyphs. No joy though, except… (please see my next post)
So I wanted to check what my new fonts gave me, and wrote this small script (sorry if it's not pretty):
```python
# Second run shown: code points 57344-1114111, appended with mode "a".
# For the first run (Plane 0, the BMP, up to 55295), use range(0, 55296)
# with mode "w". The two runs skip 55296-57343 (0xD800-0xDFFF), which
# UTF-16 reserves for surrogate pairs; those cannot be encoded as UTF-8.
with open("unicode_symbols", "a", encoding="utf-8") as file:
    # range() stops before its end value, so 1114112 includes U+10FFFF.
    for a in range(57344, 1114112):
        file.write(f"Decimal: {a} Hex: {hex(a)} Binary: {bin(a)} Character: {chr(a)}\n")
```
Lo and behold, there was the elusive full moon. So it's there; I just can't display it on the screen. I am now beginning to suspect it might be the PyCharm IDE I'm using, since it's the only common thread between the Windows comp and the openSuse comp. My next step will be to write a small script in Notepad (Windows) and KWrite (openSuse) and see if that makes a difference.
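The script's comments skip 55296-57343. A short sketch of why, using only built-in Python: those code points are the UTF-16 surrogate block, `chr()` will create them, but the UTF-8 codec refuses to encode them, so writing them to a UTF-8 file would fail. The constant names are illustrative, not from the original script.

```python
# The UTF-16 surrogate block, as skipped by the script.
SURROGATE_FIRST = 0xD800  # 55296
SURROGATE_LAST = 0xDFFF   # 57343

# chr() happily creates a lone surrogate...
lone = chr(SURROGATE_FIRST)
# ...but the UTF-8 codec refuses to encode it.
try:
    lone.encode("utf-8")
    reason = None
except UnicodeEncodeError as err:
    reason = err.reason
print("UTF-8 encode failed:", reason)

# The two runs together cover every other code point up to U+10FFFF.
first_run = range(0, SURROGATE_FIRST)
second_run = range(SURROGATE_LAST + 1, 0x10FFFF + 1)
total = len(first_run) + len(second_run)
print(total)  # 1112064
```

The full moon (decimal 127765, hex 0x1f315) falls inside the second run, which matches finding it in the file.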
"Lo and behold, there was the elusive full moon"
I'm confused. This sounds like echoing a UTF-8 sequence from the shell does not work, but outputting the same sequence to the same terminal from Python does. I have a hard time believing it … but whatever works for you.
Actually, the little script writes to a text file, not to the terminal. Neither the shell nor Python will output the characters to the screen. So I'm confused too. Well, openSuse 13.2 is coming out soon; I'll do a clean install and see whether something I did jazzed things up.
Which of course changes everything, because Python can easily use a different encoding when printing to stdout than when writing to a file. You omit critical details, which makes it pointless to keep playing guessing games. Good luck.
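The stdout-versus-file distinction can be checked directly. A minimal sketch, assuming Python 3: `print()` encodes with whatever `sys.stdout.encoding` reports for the current console (a terminal, an IDE run window, or whatever `PYTHONIOENCODING` overrides it to), whereas a file opened with `encoding="utf-8"` always gets UTF-8. The code page cp1252 is used only as an example of a legacy encoding that has no FULL MOON SYMBOL.

```python
import sys

moon = "\U0001F315"  # FULL MOON SYMBOL

# print() encodes text with the stream's encoding, which depends on the
# environment the script runs in, not on the script itself.
print("stdout encoding:", getattr(sys.stdout, "encoding", None))

# Try encoding the same character with two different encodings.
results = {}
for enc in ("utf-8", "cp1252"):
    try:
        results[enc] = moon.encode(enc).hex().upper()
    except UnicodeEncodeError:
        results[enc] = None  # this encoding cannot represent U+1F315
print(results)  # {'utf-8': 'F09F8C95', 'cp1252': None}
```

Running the same two lines inside PyCharm and in a plain terminal, and comparing the reported stdout encoding, would show whether the IDE console is the difference.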