I am looking for a tool (or something else) that can tell me whether a Unicode code point is available in one of the installed fonts on my system. And if so, in which one.
Example: in which font(s) on my system is a glyph for U+1E5B (LATIN SMALL LETTER R WITH DOT BELOW from Latin Extended Additional)?
And the reverse. Can I get a list of all the characters (Unicode points) that are available in a font on my system?
Example: which Unicode points are covered in the Lucida Sans font on my system?
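One possible way to script the first question, assuming fontconfig's fc-list is installed (it normally is on a Linux desktop), is to filter on the charset property. This is only a rough Python sketch: the helper names are made up for illustration, and splitting on commas assumes that family names themselves contain no commas.

```python
import shutil
import subprocess


def charset_pattern(codepoint: int) -> str:
    """Build a fontconfig pattern that matches fonts covering one code point."""
    return f":charset={codepoint:x}"


def parse_families(fc_list_output: str) -> list[str]:
    """Turn the output of `fc-list PATTERN family` into sorted, unique names."""
    families = set()
    for line in fc_list_output.splitlines():
        # A font may report several comma-separated family names; keep the first.
        name = line.split(",")[0].strip()
        if name:
            families.add(name)
    return sorted(families)


def fonts_with_codepoint(codepoint: int) -> list[str]:
    """Ask fontconfig which installed families claim a glyph for the code point."""
    if shutil.which("fc-list") is None:
        raise RuntimeError("fc-list not found; is fontconfig installed?")
    result = subprocess.run(
        ["fc-list", charset_pattern(codepoint), "family"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_families(result.stdout)
```

Calling fonts_with_codepoint(0x1E5B) should then list the families that cover LATIN SMALL LETTER R WITH DOT BELOW. An empty list would mean that no installed font has it.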
On visiting some websites I found that I am missing some glyphs: the characters are displayed as open rectangles. When this happens on a page with e.g. Devanagari, you install indic-fonts from the OSS repo and you are done. But apart from the name ‘indic’ there is not much to tell you that this is the package to install. And even after installing indic-fonts as a trial, I cannot find out what exactly I have. The Devanagari works, but in fact there is also Bengali and more in it.
On the other hand I am still missing glyphs. I worked out that they must be characters like LATIN SMALL LETTER R WITH DOT BELOW, but which font should I install?
Mmm, well, I can’t help specifically, but I did find something that should. I did try fc-list :lang:hi which, if my understanding is correct, will show the fonts with that language coding… I had plenty turn up and I can see ṛ (so I guess you get a square), but there are others I can’t find.
Well, hopefully that lot will help you. Me, I’m even more confused about how all this font stuff works, and respect to all those font designers out there. (You can have a play with fontforge, but I couldn’t see a way of directly extracting what you want; one font set at a time may take a while.)
Thanks for the post. It gives some nice starting points, especially those about the DEVANAGARI LETTER SHORT A. I have the same as you: a square, although other Devanagari characters are shown correctly. I will try to dig further into these links.
For your information I will try to explain a bit about this font stuff.
Let us skip all the earlier encoding schemes and stick to the all-singing, all-dancing solution of Unicode. Unicode has room for over a million code points and as such can accommodate all scripts of the world (and more, I even think they included Klingon). Now all those codes must end up as readable glyphs on the screen. For the rendering of these glyphs font descriptions are used. When you have U+0041 (that is the notation: U+ so you see it is a Unicode point, and 0041 as a hex number), you want to see an A on the screen, but depending on the font it may be a differently shaped A.
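To make the U+ notation concrete, here is a small Python sketch using only the standard unicodedata module. The function name describe is just something picked for the illustration.

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Format a code point in the usual 'U+XXXX NAME' notation."""
    char = chr(codepoint)
    # Fall back to a placeholder for code points without a character name.
    name = unicodedata.name(char, "<unnamed>")
    return f"U+{codepoint:04X} {name}"


print(describe(0x0041))  # U+0041 LATIN CAPITAL LETTER A
print(describe(0x1E5B))  # U+1E5B LATIN SMALL LETTER R WITH DOT BELOW
```

Note that this only tells you what the code point is. Whether you actually see a glyph for it still depends on the fonts you have installed.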
Now when you design such a font, you normally design it for e.g. users of Latin and you do not include all the million possible characters. So every font you find to install has either only Latin, or maybe Cyrillic and Greek included, but not Klingon. So when you want to see Klingon, you need at least one font that includes glyphs for the Unicode points of Klingon. But which font does have Klingon?
When you go to YaST > Software and search for ‘font’ you get a lot of packages, but the descriptions are very scanty about what is inside. E.g. the description of indic-fonts says:
This package contains many professional Indian language TrueType fonts contributed by the community and some also donated by organizations to open source. All fonts are available under GPL.
Which does not help anybody who types ‘Bengali’ or ‘Devanagari’ in the search field!
At the moment I am not looking for Devanagari (like you, I have it, except for the SHORT A, which is not much used in Hindi). But I was looking for the Latin characters that are used for transcribing Devanagari into Latin. These contain letters with dots below (a t without the dot is pronounced with the tongue against the teeth, one with the dot below is pronounced with the tongue against the palate; they are different characters in Devanagari). I found the font Gentium (one-click install via Webpin) and I can include R WITH DOT BELOW in a document with OO, but the website still shows a square. I failed to find out whether the website used the Unicode point I thought it would use.
You will understand that I then sought for a method to find out which Unicode points are covered on my system, hence my thread.
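For the reverse question (which code points does one font cover), fontconfig can, as far as I know, print the coverage of a font as space-separated hexadecimal ranges, for example with fc-match --format='%{charset}\n' 'Lucida Sans' or with fc-query on a font file. The Python sketch below only parses such a string. The sample text in the comment is made up and the function names are hypothetical. Also keep in mind that fc-match returns the best match, which may be a different font when the requested one is not installed.

```python
def parse_charset(text: str) -> list[range]:
    """Parse fontconfig charset output such as '20-7e a0-17f 1e5b'."""
    ranges = []
    for token in text.split():
        first, _, last = token.partition("-")
        start = int(first, 16)
        # A token without '-' is a single code point.
        end = int(last, 16) if last else start
        ranges.append(range(start, end + 1))
    return ranges


def covers(ranges: list[range], codepoint: int) -> bool:
    """Return True when the code point falls inside one of the ranges."""
    return any(codepoint in r for r in ranges)


def count_codepoints(ranges: list[range]) -> int:
    """Total number of code points covered by all ranges."""
    return sum(len(r) for r in ranges)
```

With the ranges parsed, covers(ranges, 0x1E5B) answers the question for a single character, and iterating over the ranges gives the full list for the font.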
That was a lot of typing. Now back to your links and further investigation.
That is strange: it was not until I installed Code2000, which gave me a load more than I had before, that it recognised I can have U+0971 with other font sets. Something is going on here …
Well, I am still not really sure what I’m talking about, but the menu page at http://www.alanwood.net/unicode/index.html has some character sets I still can’t render. I suspect a little reading would solve that. For instance, I noticed that the same person who did Code2000 does a Code2001 that covers Aegean Numbers. As for me needing them, shrug, hehe…
It looks like Code2000 has a lot. But did you notice that again the exact info about which points are in it is missing? And again, I am not searching for Devanagari, but for (what the Unicode website you also found calls) Latin Extended Additional.
The Code2000 person also talks about the ‘Basic Multilingual Plane of Unicode’, which I do not understand at all. IMHO there are no planes in Unicode. The planes were part of the old ISO solution, which had several 0080 - 00FF tables for different languages. Unicode has them all in one big table.
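For what it is worth, the Unicode standard does divide its code space (U+0000 to U+10FFFF) into 17 planes of 65,536 code points each, and the Basic Multilingual Plane is simply plane 0 (U+0000 to U+FFFF). A minimal Python sketch of that arithmetic follows. The function name plane is only chosen for the example.

```python
def plane(codepoint: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= codepoint <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {codepoint:#x}")
    # Each plane holds 0x10000 code points, so the plane is the part above 16 bits.
    return codepoint >> 16


print(plane(0x1E5B))   # 0: Latin Extended Additional is in the BMP
print(plane(0x10107))  # 1: the Aegean Numbers block lies outside the BMP
```

So a font that says it covers the Basic Multilingual Plane can include U+1E5B, but it tells you nothing about blocks such as Aegean Numbers.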
He also warns about things like ‘Code2000 can populate charts for Oriya and Kannada — but can not yet properly display text in those scripts.’ What I do not know is this: if a web page has such characters and you have both C2000 and a font that does display the text correctly, will the browser choose the font that displays correctly?
So because it contains unfinished parts, it could corrupt your installation.
To find out one has to test. And for testing I would like to have the tool I asked for lol!
What desktop are you using? If Gnome, use the Character Map application to view all your fonts. I see the following info in Character Details:
U+1E5B LATIN SMALL LETTER R WITH DOT BELOW
General Character Properties
In Unicode since: 1.1
Unicode category: Letter, Lowercase
Canonical decomposition: U+0072 LATIN SMALL LETTER R + U+0323 COMBINING DOT BELOW
Various Useful Representations
UTF-8: 0xE1 0xB9 0x9B
C octal escaped UTF-8: \341\271\233
XML decimal entity: &#7771;
Annotations and Cross References
• Indic transliteration
• see ISO 15919 on the use of dot below versus ring below in Indic transliteration
• U+0325 COMBINING RING BELOW
• U+0072 LATIN SMALL LETTER R U+0323 COMBINING DOT BELOW
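The representations listed in Character Details can also be reproduced without a desktop tool. This is a small Python sketch with the standard library only. The function name and the dictionary keys are made up for the illustration.

```python
import unicodedata


def representations(char: str) -> dict[str, str]:
    """Compute a few of the representations shown in Character Details."""
    utf8 = char.encode("utf-8")
    decomposed = unicodedata.normalize("NFD", char)
    return {
        "utf8_hex": " ".join(f"0x{b:02X}" for b in utf8),
        "utf8_octal": "".join(f"\\{b:03o}" for b in utf8),
        "xml_decimal": f"&#{ord(char)};",
        "decomposition": " + ".join(f"U+{ord(c):04X}" for c in decomposed),
    }


for key, value in representations("\u1e5b").items():
    print(f"{key}: {value}")
# utf8_hex: 0xE1 0xB9 0x9B
# utf8_octal: \341\271\233
# xml_decimal: &#7771;
# decomposition: U+0072 + U+0323
```

The decomposition line matters for the missing-glyph problem: a page may send r followed by U+0323 instead of the single U+1E5B, and then the font has to handle the combining dot.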
I installed C2000. I tried Alan Wood’s test page and had squares. Then I tried FF instead of Konqui and, behold, it worked. Now the big question: I did not test FF on this problem earlier, so I still do not know whether C2000 did it, or the earlier install of Gentium (but I will go on testing this, deinstalling C2000, etc.).
@FF. Those Alan Wood pages are as good as a local tool, I think. (Using FF, that is.)
@Malcolm. I will look into gucharmap, thanks for that one.
That you see the first test line (with the dotted r and t and so on) means you have such a font (as I now have two, I think). That you do not see the Devanagari means you do NOT have such a font (I can recommend indic-fonts from the OSS repo).
Still strange that Konqui does see the indic-fonts from the moment I installed them and refuses to see the Gentium/C2000 ones where FF and OO do.
And as a last word I will admit that I tried to view this thread in FF because I wanted of course to see my own test lines. BUT FF GIVES AN ERROR on forums.opensuse.org??? It warns about a redirection loop!!!
More investigating: Lucida Sans is providing the Indic glyphs… well, the basic ones (missing the few we found out about), as it is the only font showing. As for the other, I am not sure: I had plenty of fonts with it, yet the top one in the browser check didn’t give me a font.
I installed gucharmap. I had to search a bit in my KDE menu to find it. It is under (I now translate back from Dutch into English) Special Tools > Editor > Gnome Special Characters (Character Table). A bit of a strange place, but the tool is OK. I think that this is as close as you can get to my original question. Thanks.