Which fonts contain which Unicode points?

I am looking for a tool (or something else) that can tell me whether a Unicode point is available in one of the fonts installed on my system and, if so, in which one.
Example: in which font(s) on my system is a glyph for U+1E5B (LATIN SMALL LETTER R WITH DOT BELOW from Latin Extended Additional)?
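
One possible way to answer this kind of question on a system with fontconfig is to ask fc-list for fonts whose character set includes the code point. The sketch below is only an illustration: the helper names charset_pattern and fonts_covering are made up for this example, and it assumes the fc-list command is on the PATH (it returns an empty list if not).

```python
import shutil
import subprocess


def charset_pattern(char: str) -> str:
    """Build a fontconfig pattern that matches fonts covering `char`."""
    # fontconfig expects the code point as a hexadecimal number.
    return f":charset={ord(char):x}"


def fonts_covering(char: str) -> list[str]:
    """Return the family names of installed fonts that cover `char`.

    Hypothetical helper: it shells out to fc-list and returns an empty
    list when fontconfig's command line tools are not installed.
    """
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", charset_pattern(char), "family"],
        capture_output=True,
        text=True,
    )
    names = {line.strip() for line in result.stdout.splitlines()}
    return sorted(name for name in names if name)


if __name__ == "__main__":
    # U+1E5B LATIN SMALL LETTER R WITH DOT BELOW
    for family in fonts_covering("\u1e5b"):
        print(family)
```

The same query can be typed directly in a shell as fc-list ":charset=1e5b" family.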

And the reverse. Can I get a list of all the characters (Unicode points) that are available in a font on my system?
Example: which Unicode points are covered in the Lucida Sans font on my system?
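
For the reverse direction, a rough sketch is to let fontconfig print the charset element of a font and expand it into individual code points. This assumes a reasonably recent fontconfig whose fc-list accepts --format and prints the charset as space-separated hexadecimal ranges such as "20-7e a0-ff"; older versions may print it differently. The function names are invented for this example.

```python
import shutil
import subprocess


def parse_charset(text: str) -> set[int]:
    """Expand tokens like '20-7e' or '1e5b' into a set of code points."""
    points: set[int] = set()
    for token in text.split():
        start, _, end = token.partition("-")
        first = int(start, 16)
        last = int(end, 16) if end else first
        points.update(range(first, last + 1))
    return points


def codepoints_in_font(family: str) -> set[int]:
    """Return the code points covered by the installed font `family`.

    Hypothetical helper: returns an empty set if fc-list is missing or
    no font with that family name is installed.
    """
    if shutil.which("fc-list") is None:
        return set()
    result = subprocess.run(
        ["fc-list", "--format", "%{charset}\n", family],
        capture_output=True,
        text=True,
    )
    return parse_charset(result.stdout)


if __name__ == "__main__":
    covered = codepoints_in_font("Lucida Sans")
    print(f"{len(covered)} code points covered")
    print("U+1E5B covered:", 0x1E5B in covered)
```

If several styles of the same family are installed, their character sets are merged into one set here, which may or may not be what you want.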

Background:
When visiting some websites I found that I am missing some glyphs: the characters are displayed as open rectangles. When this happens on a page with e.g. Devanagari, you install indic-fonts from the OSS repo and you are done. But apart from the name ‘indic’ there is not much to tell you that this is the package to install. And even after installing indic-fonts as a try, I cannot find out exactly what I have. The Devanagari works, but in fact there is also Bengali and more in it.

On the other hand I am still missing glyphs. From what I could find out, they must be characters like LATIN SMALL LETTER R WITH DOT BELOW, but which font should I install?

Mmm, well, I can’t specifically help, but I did find something that should. I tried fc-list :lang=hi which, if my understanding is correct, will show the fonts that cover that language code… Plenty turned up and I can see ṛ (so I guess you get a square), but there are other characters I can’t find.
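
For reference, fontconfig's pattern syntax for a language query is :lang=hi. A small sketch of running that query from a script follows; the helper names are invented for this example and it assumes fc-list is installed. Note that language coverage is a coarser check than looking up one particular code point, which may be why a font listed for Hindi can still lack a rarely used character.

```python
import shutil
import subprocess


def parse_families(output: str) -> list[str]:
    """Turn fc-list output (one family per line) into a sorted list."""
    families = set()
    for line in output.splitlines():
        # A font can report several comma-separated family names;
        # keep only the first one.
        name = line.split(",")[0].strip()
        if name:
            families.add(name)
    return sorted(families)


def families_for_language(lang: str) -> list[str]:
    """Return families fontconfig considers to cover `lang` (e.g. 'hi').

    Hypothetical helper: returns an empty list if fc-list is missing.
    """
    if shutil.which("fc-list") is None:
        return []
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True,
        text=True,
    )
    return parse_families(result.stdout)


if __name__ == "__main__":
    for family in families_for_language("hi"):
        print(family)
```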

Knowing bugger all about fonts really, and even less about Indic languages, I was surprised by how many I did get. Perhaps this page will assist: DebianInstaller/GUIFonts - Debian Wiki

Though I can see ṛ, I can’t see DEVANAGARI LETTER SHORT A, as determined from here: Devanagari - Test for Unicode support in Web browsers. But on going to this page, Unicode Character ‘DEVANAGARI LETTER SHORT A’ (U+0904), I can now find a font that does support it, and a browser test.

Well, hopefully that lot will help you. Me, I’m even more confused about how all this font stuff works, and respect to all those font designers out there. (You can have a play with fontforge, but I couldn’t see a way of directly extracting what you want; one font at a time may take a while.)

OK, I seem to have solved that. Try this font: Download Code2000. It has added support for a fair bit.

I ended up here, What is Unicode?, after reading about the cursive script problem here: Unicode fonts and tools for X11

I then added Code2000 because of the earlier recommendation. Now on the Unicode page I notice it’s only Deseret that is not rendering correctly, and the Devanagari page is rendering correctly for all characters.

Kudos to whoever is offering Code2000. Had I a need for the additional language support, I for one would give him a little.

Well I learnt a little bit more today :wink:

Hello FM,

Thanks for the post. It gives some nice starting points, especially those about the DEVANAGARI LETTER SHORT A. I have the same as you: a square, although other Devanagari characters are shown correctly. I will try to dig further into these links.

For your information I will try to explain a bit about this font stuff.
Let us skip all the earlier encoding schemes and stick to the all singing, all dancing solution of Unicode. Unicode has over a million places for character encoding and as such can accommodate all scripts of the world (and more, I even think they included Klingon). Now all those codes must end up as readable glyphs on the screen. For the rendering of these glyphs, font descriptions are used. When you have U+0041 (that is the notation: U+ so you see it is a Unicode point, and 0041 as a hex number), you want to see an A on the screen, but it might also be an A drawn in a different style.
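
To make the U+ notation concrete, here is a small sketch using only Python's standard unicodedata module. The describe function is just a name chosen for this example; the character names come from the Unicode database shipped with Python, not from any font.

```python
import unicodedata


def describe(codepoint: int) -> str:
    """Format a code point in U+XXXX notation with its Unicode name."""
    char = chr(codepoint)
    name = unicodedata.name(char, "<no name>")
    return f"U+{codepoint:04X} {name}"


print(describe(0x0041))  # U+0041 LATIN CAPITAL LETTER A
print(describe(0x1E5B))  # U+1E5B LATIN SMALL LETTER R WITH DOT BELOW

# The code space runs from U+0000 to U+10FFFF.
print(f"{0x10FFFF + 1:,} code points in total")  # 1,114,112
```

Whether a given code point shows up as a readable glyph still depends on some installed font providing one.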

Now when you design such a font, you normally design it for e.g. users of Latin and you do not include all the million or so possible characters. So every font you find to install has either only Latin, or maybe Cyrillic and Greek included, but not Klingon. So when you want to see Klingon, you need at least one font that includes glyphs for the Unicode points of Klingon. But which font does have Klingon?
When you go to YaST > Software and search for ‘font’ you get a lot of packages, but the descriptions are very scanty about what is inside. E.g. the description of indic-fonts says:

This package contains many professional Indian language TrueType fonts contributed by the community and some also donated by organizations to open source. All fonts are available under GPL.
Which does not help anybody who types ‘Bengali’ or ‘Devanagari’ in the search field!

At the moment I am not looking for Devanagari (like you, I have it, except for the SHORT A, which is not much used in Hindi). But I was looking for the Latin characters that are used for transcribing Devanagari into Latin. These include letters with dots below (a t without the dot is pronounced with the tongue against the teeth, one with the dot below is pronounced with the tongue against the palate; they are different characters in Devanagari). I found a font, Gentium (one click install via Webpin), and I can include R WITH DOT BELOW in a document with OO, but the website still shows a square. I failed to find out whether the website uses the Unicode point I thought it would use.

You will understand that I then looked for a method to find out which Unicode points are covered on my system, hence my thread.

That was a lot of typing. Now back to your links and further investigation :slight_smile:

I found out more, though I don’t know why this happens; it is strange after installing Code2000.

Now on the distro I’m on I have to go back to the old way and run fc-cache, but look at this … in FF 3.5
Without:
http://img43.imageshack.us/img43/1708/without.png

And then with

http://img66.imageshack.us/img66/1433/withx.png

That is strange: it is not until I install Code2000, which did give me a load more than I had before, that it recognises I can have U+0971 with other font sets. Something is going on here …

Well, I’m still not really sure what I’m talking about, but the menu page from here http://www.alanwood.net/unicode/index.html has some character sets I still can’t render. I suspect a little reading would solve that. E.g. I noticed that the same person who did Code2000 also does a Code2001 that covers Aegean Numbers. As for me needing them, shrug, hehe…

OK well I’ll leave you to it.

It looks like Code2000 has a lot. But did you notice that again the exact info about what points are there is missing? And again, I am not searching for Devanagari, but for (what the Unicode website you also found calls) Latin Extended Additional.

The Code2000 person also talks about the ‘Basic Multilingual Plane of Unicode’, which I do not understand at all. IMHO there are no planes in Unicode. The planes were part of the old ISO solution, which had several 0080 - 00FF tables for different languages. Unicode has them all in one big table.
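
A note on the term: the Unicode standard itself does divide its code space into 17 planes of 65,536 code points each, and the Basic Multilingual Plane is plane 0 (U+0000 to U+FFFF). A minimal sketch of the arithmetic follows; plane_of is a name made up for this example.

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
MAX_CODEPOINT = 0x10FFFF  # last code point, in plane 16


def plane_of(codepoint: int) -> int:
    """Return the Unicode plane number (0 to 16) of a code point."""
    if not 0 <= codepoint <= MAX_CODEPOINT:
        raise ValueError(f"not a Unicode code point: {codepoint:#x}")
    return codepoint // PLANE_SIZE


print(plane_of(0x1E5B))   # 0: Latin Extended Additional is in the BMP
print(plane_of(0x0904))   # 0: Devanagari is in the BMP as well
print(plane_of(0x10400))  # 1: Deseret starts at U+10400
```

So a font that only covers the BMP would not be expected to render scripts such as Deseret.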

He also warns about things like ‘Code2000 can populate charts for Oriya and Kannada, but can not yet properly display text in those scripts.’ If a web page has such characters and you have both a font that displays the text correctly and C2000, I do not know whether the browser will choose the font that displays it correctly.
So because it contains unfinished parts, it could spoil the rendering on your installation.

To find out one has to test. And for testing I would like to have the tool I asked for lol!

I think I can find out what points are missing from here Unicode characters not supported by the Code2000 font

As for Kannada script - Wikipedia, the free encyclopedia, I don’t get squares, but god knows whether that is due to Code2000.

And Latin Extended Additional - Test for Unicode support in Web browsers renders fine with code2000 and without I seem to have a few providing it…
http://img403.imageshack.us/img403/4862/latin.png

Hi
What desktop are you using? If Gnome, use the Character Map application
to view all your fonts, I see the following info in Character Details;


ṛ

U+1E5B LATIN SMALL LETTER R WITH DOT BELOW

General Character Properties

In Unicode since: 1.1
Unicode category: Letter, Lowercase
Canonical decomposition: U+0072 LATIN SMALL LETTER R + U+0323 COMBINING
DOT BELOW

Various Useful Representations

UTF-8: 0xE1 0xB9 0x9B
UTF-16: 0x1E5B

C octal escaped UTF-8: \341\271\233
XML decimal entity: &#7771;

Annotations and Cross References

Notes:
• Indic transliteration
• see ISO U+15919 <not assigned> on the use of dot below versus ring
below in Indic transliteration

See also:
• U+0325 COMBINING RING BELOW

Equivalents:
• U+0072 LATIN SMALL LETTER R U+0323 COMBINING DOT BELOW
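
Most of the values shown above can be reproduced without a GUI. The following is a minimal sketch using Python's standard unicodedata module; the representations function and its dictionary keys are just names picked for this example.

```python
import unicodedata


def representations(char: str) -> dict[str, str]:
    """Collect a few encodings and properties of a single character."""
    utf8 = char.encode("utf-8")
    return {
        "code point": f"U+{ord(char):04X}",
        "name": unicodedata.name(char),
        "category": unicodedata.category(char),
        "decomposition": unicodedata.decomposition(char),
        "utf-8": " ".join(f"0x{byte:02X}" for byte in utf8),
        "utf-16": "0x" + char.encode("utf-16-be").hex().upper(),
        "octal utf-8": "".join(f"\\{byte:03o}" for byte in utf8),
        "xml entity": f"&#{ord(char)};",
    }


for key, value in representations("\u1e5b").items():
    print(f"{key}: {value}")
```

This only describes the character itself; it says nothing about which installed font has a glyph for it.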


Cheers Malcolm °¿° (Linux Counter #276890)
SUSE Linux Enterprise Desktop 11 (x86_64) Kernel 2.6.27.23-0.1-default
up 12 days 1:29, 2 users, load average: 0.23, 0.14, 0.09
GPU GeForce 8600 GTS Silent - Driver Version: 185.18.14

First a try: ṛṙṭṣ
These are four characters in the 1E00 - 1EFF range. I generated them in OO. Looks fine in OO, but cut and paste here gives open squares to me.

And this भारत , done the same way, works perfectly for me.

@malcolmlewis.
That is a bunch of information from Gnome, but I am using KDE 3.5.
You do not, by chance, know the name of the program behind it?

@FM
The Kannada I have, but that is because of indic-fonts (or another one, I installed several others of that kind).
But the Latin Extended Additional are all squares.

In short, you have convinced me, I am going to install that C2000.

Hi
It’s gucharmap, I’m sure there is an equivalent KDE3 version…

Just had a look at the web forum page for your post, and the first line
renders correctly; the second line is gobbledygook using Firefox and
default sans-serif.



Mmm well I can get both of those without c2000 neither show up funny… So I’ll try the sets I’ve got…

ttf.msfonts
gsfonts
xorg-fonts-100dpi, 75dpi and misc

I suspect you have an FF tweak, as it looks fine on my default 11.1 too. What happens if you test by just renaming .mozilla?

http://img113.imageshack.us/img113/7046/suselinux20090709191748.png

I installed C2000. I tried Alan Wood’s test page and had squares. Then I tried FF instead of Konqui and, behold, it worked. Now the big question: I did not test FF on this problem earlier, so I still do not know whether C2000 did it or the earlier install of Gentium (but I will go on testing this, deinstalling C2000, etc.).

@FM. Those Alan Wood pages are as good as a local tool I think. (Using FF that is).

@Malcolm. I will look into gucharmap, thanks for that one.
That you see the first test line (with the dotted r and t and so on) means you have such a font (as I have now two I think). That you do not see the Devanagari means you do NOT have such a font (I can recommend indic-fonts from the OSS repo :slight_smile: ).

Still strange that Konqui does see the indic-fonts from the moment I installed them and refuses to see the Gentium/C2000 ones where FF and OO do.

And as a last word I will admit that I tried to view this thread in FF because I of course wanted to see my own test lines. BUT FF GIVES AN ERROR on forums.opensuse.org??? It warns about a redirection loop!!!

I’m going to take a complete stab in the dark here and guess that it would be konq font settings and being so tied into kde I would guess system font choices would make the change.

I just checked and konqueror is the same as FF I have no problem with either set and my Suse is very vanilla don’t even think it has media playback set up.

I’m on 11.1 4.2.4 release 2 though, so it is not apples to apples, but I’m inclined to think it is a system setting rather than the DE choice.

More investigating: Lucida Sans is providing the Indic glyphs… well, the basic ones (missing the few we found out about), as it is the only one showing. As for the other, I’m not sure; I had plenty of fonts with it, yet the top one in the browser check didn’t give me a font.

Kde set with all at sans serif

Hi
As root user run fonts-config and then as user run fc-cache to update
your font files.



Deinstalled C2000. FF still works (on Gentium?).

As I had installed C2000 for the user, I now installed it for the system. Konqui still wrong. openSUSE 10.3 and KDE 3.5 here. All at sans serif.

I have the Lucida sans already for years (for Devanagari), but installed the indic-fonts when someone on the forum asked for Bengali and I wanted to test it out.

I got the forums in FF. It seems that when I am still logged in in Konqui, I cannot use FF for the forums. But the error is ridiculous.

I now see the dotted characters in the forums, at last :stuck_out_tongue:

And indeed, I now tried the page on Wikipedia that originally gave me squares on transcribed Sanskrit, and that also renders correctly.

So it seems that my problem was severely worsened by the fact that I tested with Konqui (which normally does not let me down).

@malcolm. Just saw your latest post and run both tools. Will now log out/in to see what happens. Will be back.

Konqui still wrong!

Maybe I should switch over to FF as default browser. My wife did already and I never hear any complaints about it.
That will make the use of NoScript also the default. Maybe I feel more secure then :wink:

@Malcolm
I installed gucharmap. Had a bit of a search in my KDE menu to find it. It is under (I now translate back from Dutch into English) Special Tools > Editor > Gnome Special Characters (Character Table). Bit of a strange place, but the tool is OK. I think that this is as close as you can get to my original question. Thanks.