openSUSE Forums > Looking For Something Other Than Support? » Which fonts contain which Unicode points?

  #1 (permalink)  
09-Jul-2009, 08:21
hcvv (Wise Penguin)
Join Date: Jun 2008
Location: Netherlands
Posts: 1,904
Which fonts contain which Unicode points?

I am looking for a tool (or something else) that can tell me whether a Unicode code point is available in one of the fonts installed on my system and, if so, in which one.
Example: which font(s) on my system have a glyph for U+1E5B (LATIN SMALL LETTER R WITH DOT BELOW, from Latin Extended Additional)?

And the reverse: can I get a list of all the characters (Unicode code points) that are available in a font on my system?
Example: which Unicode points are covered in the Lucida Sans font on my system?

Background:
When visiting some websites I found that I am missing some glyphs; the characters are displayed as open rectangles. When this happens on a page with e.g. Devanagari, you install indic-fonts from the OSS repo and you are done. But apart from the name 'indic' there is not much to tell you that this is the package to install. And even after installing indic-fonts as a try, I cannot find out exactly what I have. The Devanagari works, but in fact there is also Bengali and more in it.

On the other hand I am still missing glyphs. From my research they must be characters like LATIN SMALL LETTER R WITH DOT BELOW, but which font should I install?
__________________
Henk van Velden
#2 (permalink)
09-Jul-2009, 09:27
FeatherMonkey (Wise Penguin)
Join Date: Mar 2008
Posts: 1,545
Re: Which fonts contain which Unicode points?

Mmm, well, I can't specifically help, but I did find something that should. I did try fc-list :lang=hi which, if my understanding is correct, will show the fonts that cover that language code. But even though I had plenty turn up and can find ṛ (so I guess you get a square), I have others I can't find.
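
fontconfig can also be asked about a single code point instead of a language. A minimal sketch, assuming the fc-list tool from fontconfig is installed and accepts a charset element in its pattern; the helper names charset_query and fonts_with are made up for this example:

```python
import shutil
import subprocess


def charset_query(codepoint: int) -> list[str]:
    """Build an fc-list command asking for families that cover one code point."""
    # fontconfig patterns take the code point as hex, without the "U+" prefix.
    return ["fc-list", f":charset={codepoint:x}", "family"]


def fonts_with(codepoint: int) -> list[str]:
    """Return the family names fc-list reports, or [] when fc-list is not installed."""
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        charset_query(codepoint), capture_output=True, text=True, check=False
    )
    # One family per output line; drop blanks and duplicates.
    return sorted({line.strip() for line in result.stdout.splitlines() if line.strip()})


if __name__ == "__main__":
    # U+1E5B LATIN SMALL LETTER R WITH DOT BELOW
    print(" ".join(charset_query(0x1E5B)))
    for family in fonts_with(0x1E5B):
        print(family)
```

Typed directly in a shell, the same query would be fc-list ":charset=1e5b" family. An empty result suggests that no installed font claims the code point.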

Knowing bugger all about fonts really, and even less about Indic languages, I was surprised by how many I did get. Perhaps this page will assist: DebianInstaller/GUIFonts - Debian Wiki

Then, though I can see ṛ, I can't see DEVANAGARI LETTER SHORT A as determined from here: Devanagari - Test for Unicode support in Web browsers. But on going to this page, Unicode Character 'DEVANAGARI LETTER SHORT A' (U+0904), I can now find a font that does support it, plus a browser test.

Well, hopefully that lot will help you. Me, I'm even more confused about how all this font stuff works, and respect to all those font designers out there. (You can have a play with fontforge, but I couldn't see a way of directly extracting what you want; one font at a time may take a while.)
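
For the reverse question (which code points does one font cover), fontconfig may be able to print a font's charset. The sketch below assumes a fontconfig new enough to support the --format option and assumes the charset is printed as space-separated hex ranges such as "20-7e a0-ff"; both are assumptions to check on your own system. The function names are invented for illustration, and only the parsing part is exercised here:

```python
def charset_command(family: str) -> list[str]:
    """Build an fc-list command that prints the charset of a font family."""
    # The format string keeps a literal backslash-n, as it would inside shell quotes.
    return ["fc-list", "--format", "%{charset}\\n", family]


def parse_charset(text: str) -> list[tuple[int, int]]:
    """Turn 'lo-hi' and single-value hex tokens into (first, last) integer ranges."""
    ranges = []
    for token in text.split():
        start, _, end = token.partition("-")
        first = int(start, 16)
        last = int(end, 16) if end else first
        ranges.append((first, last))
    return ranges


def covers(ranges: list[tuple[int, int]], codepoint: int) -> bool:
    """True when the code point falls inside any of the ranges."""
    return any(first <= codepoint <= last for first, last in ranges)


if __name__ == "__main__":
    # Made-up sample text in the assumed format, not real output from any font.
    sample = "20-7e a0-ff 1e5a-1e5b 2013"
    ranges = parse_charset(sample)
    print(covers(ranges, 0x1E5B))  # the sample ranges include U+1E5B
    print(covers(ranges, 0x0904))  # the sample ranges do not include U+0904
```

Feeding the real output of the command into parse_charset would give the full coverage of a font such as Lucida Sans in one go.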
__________________
Man first, have a try at Info, have a look at Wiki, if all that fails Scroogle!!!!!
If I've helped click on the Rep button I don't know what it does but it sounds cool.
#3 (permalink)
09-Jul-2009, 10:16
FeatherMonkey (Wise Penguin)
Join Date: Mar 2008
Posts: 1,545
Re: Which fonts contain which Unicode points?

OK, I seem to have solved that. Try this font: Download Code2000. It has added support for a fair bit.

I ended up here, What is Unicode?, after reading about the cursive script problem here: Unicode fonts and tools for X11

I then added Code2000 because of the earlier recommendation. Now on the Unicode page I notice it's only Deseret that is not rendering correctly, and the Devanagari page is rendering correctly for all characters.

Kudos to whoever is offering Code2000. Had I a need for the additional language support, I for one would give him a little.

Well I learnt a little bit more today
__________________
Man first, have a try at Info, have a look at Wiki, if all that fails Scroogle!!!!!
If I've helped click on the Rep button I don't know what it does but it sounds cool.
#4 (permalink)
09-Jul-2009, 10:22
hcvv (Wise Penguin)
Join Date: Jun 2008
Location: Netherlands
Posts: 1,904
Re: Which fonts contain which Unicode points?

Hello FM,

Thanks for the post. It gives some nice starting points, especially those about the DEVANAGARI LETTER SHORT A. I have the same as you: a square, although other Devanagari characters are shown correctly. I will try to dig further into these links.

For your information I will try to explain a bit about this font stuff.
Let us skip all the earlier encoding schemes and stick to the all-singing, all-dancing solution of Unicode. Unicode has room for over a million code points and as such can accommodate all scripts of the world (and more, I even think they included Klingon). Now all those codes must end up as readable glyphs on the screen. For the rendering of these glyphs, font descriptions are used. When you have U+0041 (that is the notation: U+ so you see it is a Unicode point, and 0041 as a hex number), you want to see an A on the screen, but depending on the font it might be a differently shaped A.
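
The U+ notation is easy to play with in Python. A small sketch using only the standard unicodedata module; the helper names describe and parse_notation are made up for this example:

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Format a code point as U+XXXX followed by its Unicode character name."""
    name = unicodedata.name(chr(codepoint), "<unnamed>")
    return f"U+{codepoint:04X} {name}"


def parse_notation(text: str) -> int:
    """Convert a string such as 'U+0041' into the integer code point."""
    if not text.upper().startswith("U+"):
        raise ValueError(f"not in U+XXXX notation: {text!r}")
    return int(text[2:], 16)


if __name__ == "__main__":
    print(describe(0x0041))          # U+0041 LATIN CAPITAL LETTER A
    print(describe(0x1E5B))          # U+1E5B LATIN SMALL LETTER R WITH DOT BELOW
    print(parse_notation("U+0041"))  # 65
```

The code point only says which character is meant; the shape drawn on screen still comes from whichever font supplies the glyph.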

Now when you design such a font, you normally design it for e.g. users of Latin and you do not include all the million-plus characters possible. So every font you find to install has either only Latin, or maybe Cyrillic and Greek included, but not Klingon. So when you want to see Klingon, you need at least one font that includes glyphs for the Unicode points of Klingon. But which font does have Klingon?
When you go to YaST > Software and search for 'font' you get a lot of packages, but the descriptions are very scanty about what is inside. E.g. the description of indic-fonts says:
Quote:
This package contains many professional Indian language TrueType fonts contributed by the community and some also donated by organizations to open source. All fonts are available under GPL.
Which does not help anybody who types 'Bengali' or 'Devanagari' in the search field!

At the moment I am not looking for Devanagari (like you, I have it, except for the SHORT A, which is not much used in Hindi). I was looking for the Latin letters that are used for transcribing Devanagari into Latin. These include letters with dots below (a t without the dot is pronounced with the tongue against the teeth, with the dot below it is pronounced with the tongue against the palate; they are different characters in Devanagari). I found the font Gentium (one-click install via Webpin) and I can include R WITH DOT BELOW in a document with OO, but the website still shows a square. I failed to find out whether the website uses the Unicode point I thought it would use.
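
One way to check which Unicode points a page really uses is to paste a bit of its text into a short script. A minimal sketch, assuming the text has been copied out of the page unchanged; the function name non_ascii_points is invented here. It also shows that an r with dot below can arrive either as one precomposed character or as a plain r followed by a combining dot:

```python
import unicodedata


def non_ascii_points(text: str) -> list[str]:
    """List each distinct non-ASCII character as 'U+XXXX NAME', in order of appearance."""
    seen = []
    for ch in text:
        if ord(ch) > 0x7F:
            label = f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"
            if label not in seen:
                seen.append(label)
    return seen


if __name__ == "__main__":
    precomposed = "\u1e5b"   # single character: r with dot below
    decomposed = "r\u0323"   # plain r followed by COMBINING DOT BELOW
    print(non_ascii_points(precomposed))
    print(non_ascii_points(decomposed))
    # NFC normalization turns the two-character form into the precomposed one.
    print(unicodedata.normalize("NFC", decomposed) == precomposed)
```

If the page sends the two-character form, the font needs a glyph for U+0323 and the renderer has to position it, which may explain a square even when U+1E5B itself is covered.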

You will understand that I then went looking for a method to find out which Unicode points are covered on my system, hence my thread.

That was a lot of typing. Now back to your links and further investigation
__________________
Henk van Velden
#5 (permalink)
09-Jul-2009, 10:40
FeatherMonkey (Wise Penguin)
Join Date: Mar 2008
Posts: 1,545
Re: Which fonts contain which Unicode points?

I found out more, but I don't know why this happens; it is strange behaviour after installing Code2000.

Now on the distro I'm on I have to go back to the old way and run fc-cache, but look at this in FF 3.5.
Without Code2000: [screenshot not preserved]
And then with Code2000: [screenshot not preserved]

That is strange: it is not until I install Code2000, which did give me a load more than I had before, that it recognises I can have U+0971 with other font sets. Something is going on here ...

Well, I'm still not really sure what I'm talking about, but the menu page here, http://www.alanwood.net/unicode/index.html, has some character sets I still can't render. I suspect a little reading would solve that. E.g. I noticed the same person who did Code2000 also does a Code2001 that covers Aegean Numbers; as for me needing them, shrug, hehe.

OK well I'll leave you to it.
__________________
Man first, have a try at Info, have a look at Wiki, if all that fails Scroogle!!!!!
If I've helped click on the Rep button I don't know what it does but it sounds cool.
#6 (permalink)
09-Jul-2009, 10:46
hcvv (Wise Penguin)
Join Date: Jun 2008
Location: Netherlands
Posts: 1,904
Re: Which fonts contain which Unicode points?

It looks like Code2000 has a lot. But did you notice that, again, the exact info about which points are there is missing? And again, I am not searching for Devanagari, but for what the Unicode website you also found calls Latin Extended Additional.

The Code2000 person also talks about the 'Basic Multilingual Plane of Unicode', which I do not understand at all. IMHO there are no planes in Unicode. The planes were part of the old ISO solution, which had several 0080 - 00FF tables for different languages. Unicode has them all in one big table.
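
For what it is worth, the Unicode standard does use the term: the code space U+0000 to U+10FFFF is divided into 17 planes of 65,536 code points each, and the Basic Multilingual Plane is plane 0 (U+0000 to U+FFFF). A small sketch of that arithmetic; the function name plane is made up for this example:

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
MAX_CODEPOINT = 0x10FFFF  # last code point in the Unicode code space


def plane(codepoint: int) -> int:
    """Return the plane number (0 to 16) that a code point belongs to."""
    if not 0 <= codepoint <= MAX_CODEPOINT:
        raise ValueError(f"outside the Unicode code space: {codepoint:#x}")
    return codepoint // PLANE_SIZE


if __name__ == "__main__":
    print(plane(0x1E5B))   # 0: Latin Extended Additional is in the BMP
    print(plane(0x0904))   # 0: Devanagari is in the BMP
    print(plane(0x10400))  # 1: Deseret starts at U+10400, outside the BMP
    print((MAX_CODEPOINT + 1) // PLANE_SIZE)  # 17 planes in total
```

So a font that advertises BMP coverage can include Latin Extended Additional and Devanagari, while scripts such as Deseret or Aegean Numbers sit in plane 1.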

He also warns for things like 'Code2000 can populate charts for Oriya and Kannada — but can not yet properly display text in those scripts.' I do not know if e.g. a web page has such characters and you have a font that does display text correct and C2000, if the correct displaying font is chosen by the browser.
So because it contains unfinished parts, it could corrupt your installation.

To find out one has to test. And for testing I would like to have the tool I asked for
__________________
Henk van Velden
#7 (permalink)
09-Jul-2009, 10:56
FeatherMonkey (Wise Penguin)
Join Date: Mar 2008
Posts: 1,545
Re: Which fonts contain which Unicode points?

I think I can find out which points are missing from here: Unicode characters not supported by the Code2000 font

As for Kannada script - Wikipedia, the free encyclopedia, I don't get squares, but god knows whether that is due to Code2000.

And Latin Extended Additional - Test for Unicode support in Web browsers renders fine with Code2000, and without it I seem to have a few fonts providing it.
__________________
Man first, have a try at Info, have a look at Wiki, if all that fails Scroogle!!!!!
If I've helped click on the Rep button I don't know what it does but it sounds cool.
#8 (permalink)
09-Jul-2009, 11:04
malcolmlewis (Global Moderator)
Join Date: Jun 2008
Location: Podunk
Posts: 4,680
Re: Which fonts contain which Unicode points?

Quote:
Originally Posted by hcvv
Looks that Code 2000 has a lot. But did you notice that again the exact
info about what points are there is missing? And again, I am not
searching for Devanagari, but for (what the Unicode website you also
found calls) Latin Extended Additional.

The Code2000 person also talks about 'Basic Multilingual Plane of
Unicode', which I do not understand at all. IMHO there are no planes in
Unicode. The planes where part of the old ISO solution which had several
0080 - 00FF tables for different languages. Unicode has them all in one
big table.

He also warns for things like 'Code2000 can populate charts for Oriya
and Kannada — but can not yet properly display text in those
scripts.' I do not know if e.g. a web page has such characters and you
have a font that does display text correct and C2000, if the correct
displaying font is chosen by the browser.
So because it contains unfinished parts, it could corrupt your
installation.

To find out one has to test. And for testing I would like to have the
tool I asked for
Hi
What desktop are you using? If Gnome, use the Character Map application
to view all your fonts. I see the following info in Character Details:

Code:
ṛ

U+1E5B LATIN SMALL LETTER R WITH DOT BELOW

General Character Properties

In Unicode since: 1.1
Unicode category: Letter, Lowercase
Canonical decomposition: U+0072 LATIN SMALL LETTER R + U+0323 COMBINING
DOT BELOW

Various Useful Representations

UTF-8: 0xE1 0xB9 0x9B
UTF-16: 0x1E5B

C octal escaped UTF-8: \341\271\233
XML decimal entity: &#7771;

Annotations and Cross References

Notes:
• Indic transliteration
• see ISO U+15919 <not assigned> on the use of dot below versus ring
below in Indic transliteration

See also:
• U+0325 COMBINING RING BELOW

Equivalents:
• U+0072 LATIN SMALL LETTER R U+0323 COMBINING DOT BELOW
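
The representations listed above can be recomputed for any character with the Python standard library. A minimal sketch; the function name representations and the dictionary keys are invented for this example, not anything Character Map itself provides:

```python
import unicodedata


def representations(ch: str) -> dict[str, str]:
    """Compute a few encodings and properties of a single character."""
    utf8 = ch.encode("utf-8")
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "decomposition": unicodedata.decomposition(ch),
        "utf8": " ".join(f"0x{b:02X}" for b in utf8),
        "utf16": "0x" + ch.encode("utf-16-be").hex().upper(),
        "octal": "".join(f"\\{b:03o}" for b in utf8),
        "xml": f"&#{ord(ch)};",
    }


if __name__ == "__main__":
    for key, value in representations("\u1e5b").items():
        print(f"{key}: {value}")
```

For U+1E5B this gives the same UTF-8 bytes (0xE1 0xB9 0x9B), octal escape (\341\271\233) and decimal entity (&#7771;) as the listing.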
--
Cheers Malcolm °¿° (Linux Counter #276890)
SUSE Linux Enterprise Desktop 11 (x86_64) Kernel 2.6.27.23-0.1-default
up 12 days 1:29, 2 users, load average: 0.23, 0.14, 0.09
GPU GeForce 8600 GTS Silent - Driver Version: 185.18.14

#9 (permalink)
09-Jul-2009, 12:37
hcvv (Wise Penguin)
Join Date: Jun 2008
Location: Netherlands
Posts: 1,904
Re: Which fonts contain which Unicode points?

First a try: ṛṙṭṣ
These are four characters in the 1E00 - 1EFF range. I generated them in OO. They look fine in OO, but cut and paste here gives open squares for me.

And this भारत , done the same way, works perfectly for me.

@malcolmlewis.
That is a bunch of information from Gnome, but I am using KDE 3.5.
You don't, by chance, know the name of the program behind it?
__________________
Henk van Velden
#10 (permalink)
09-Jul-2009, 12:43
hcvv (Wise Penguin)
Join Date: Jun 2008
Location: Netherlands
Posts: 1,904
Re: Which fonts contain which Unicode points?

@FM
The Kannada I have, but that is because of indic-fonts (or another one; I installed several others of that kind).
But the Latin Extended Additional are all squares.

In short, you have convinced me, I am going to install that C2000.
__________________
Henk van Velden