Working towards a unified text layout engine for the free desktop software stack

In October last year I attended a meeting in Boston where various parties dealing with text layout met. This included members of the Pango/Gtk+, KOffice, OpenOffice/ICU and Scribus projects. Right now we duplicate a fair amount of work to support complex scripts and exotic languages. Only Pango and Qt already share some code, originating from the FreeType project, for interpreting OpenType tables. At that meeting we agreed that it would be nice to share more of the work and to build a common layer that can support not only OpenType but also SIL's Graphite or Apple's AAT. Most importantly, it would provide a single place in the free desktop software stack to add support for new complex languages, and it would give consistent text shaping behavior across applications.

As a first important step, Trolltech decided to relicense and contribute its existing code. Lars and I have been working on this over the past month(s), and we have now finally got around to pushing our changes out to a git repository. Based on the existing HarfBuzz code, we've created a first version of a common API, ported our shaping engines that operate on top of OpenType, and added some fallbacks for fonts that do not provide the tables necessary for shaping.
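
The split between OpenType-driven shaping and built-in fallbacks can be pictured as a simple dispatch on the tables a font carries. This is a minimal Python sketch of that idea only; `Font`, `shape`, `opentype_shape` and `fallback_shape` are hypothetical names, not the HarfBuzz API, and the two engines are stubs.

```python
from dataclasses import dataclass, field


@dataclass
class Font:
    # Hypothetical font model: only the set of OpenType table tags it contains.
    tables: set[str] = field(default_factory=set)


def opentype_shape(text: str, font: Font) -> str:
    # Stub for an engine that would apply the font's GSUB/GPOS lookups.
    return "opentype"


def fallback_shape(text: str, font: Font) -> str:
    # Stub for built-in rules (reordering, basic mark placement) used when
    # the font has no layout tables of its own.
    return "fallback"


def shape(text: str, font: Font) -> str:
    """Choose a shaping path depending on which layout tables the font provides."""
    if {"GSUB", "GPOS"} & font.tables:
        return opentype_shape(text, font)
    return fallback_shape(text, font)


if __name__ == "__main__":
    print(shape("\u0ba4\u0bcb\u0b9a\u0bc8", Font({"GSUB", "GPOS"})))
    print(shape("\u0ba4\u0bcb\u0b9a\u0bc8", Font()))
```

The point of the design is that callers only ever see `shape()`; whether a font is well equipped or not is decided in one place.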

So now it's screenshot time, to demonstrate the beauty of shaped complex text. Thanks to frequent recent visits to a Sri Lankan restaurant here in Oslo, and to Girish's help, I decided to use some Tamil words for the demonstration. We usually start the meal with a kind of savory doughnut called Vadai (வடை). The "main" meal then consists of a kind of crêpe called Dosa (தோசை), pieces of which are torn off by hand and dipped into some chutney or Sambar (சாம்பார்). You'd be surprised to see how well Trolltech engineers manage to eat with just their hands! Here's a picture linked in from Wikipedia that shows this kind of pancake:


With our changes, HarfBuzz now shapes the word Dosa like this (rendered using FreeType):

Rendering of Dosa

If a toolkit just renders the characters individually, the word comes out incorrect and looks like this:

Incorrect rendering of Dosa

As you can see by comparing the two renderings, glyphs sometimes need to be reordered. This reordering can happen at the level of characters (think QChars in a QString) as well as at the level of glyph indices. Without it, the resulting glyphs are just garbage and don't make any sense. (Well, they don't make sense to me either way, because unfortunately I can't read Tamil, but Girish confirmed that only the first form is correct.)
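
To make the character-level reordering concrete, here is a toy Python sketch. It is not HarfBuzz code, and it assumes every syllable is one consonant followed by one vowel sign, ignoring conjuncts and everything else a real shaper handles. In தோசை the OO sign (U+0BCB) is a two-part vowel: Unicode canonical decomposition splits it into EE (U+0BC7), drawn left of த, and AA (U+0BBE), drawn right of it. The AI sign (U+0BC8) after ச is also drawn on the left.

```python
import unicodedata

# Tamil vowel signs stored AFTER their consonant but drawn to its LEFT:
# U+0BC6 (E), U+0BC7 (EE), U+0BC8 (AI).
PRE_BASE_VOWEL_SIGNS = {"\u0bc6", "\u0bc7", "\u0bc8"}


def is_tamil_consonant(ch: str) -> bool:
    # Tamil consonant letters lie in U+0B95..U+0BB9 (a coarse range check).
    return "\u0b95" <= ch <= "\u0bb9"


def visual_order(word: str) -> str:
    """Toy logical-to-visual reordering for simple consonant+vowel syllables."""
    # NFD splits two-part vowel signs, e.g. U+0BCB -> U+0BC7 + U+0BBE.
    chars = unicodedata.normalize("NFD", word)
    out: list[str] = []
    for ch in chars:
        if ch in PRE_BASE_VOWEL_SIGNS and out and is_tamil_consonant(out[-1]):
            # Move the vowel sign in front of the consonant it belongs to.
            out.insert(len(out) - 1, ch)
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    dosa = "\u0ba4\u0bcb\u0b9a\u0bc8"  # தோசை in logical (storage) order
    print([f"U+{ord(c):04X}" for c in visual_order(dosa)])
    # ['U+0BC7', 'U+0BA4', 'U+0BBE', 'U+0BC8', 'U+0B9A']
```

A real engine does the same kind of move on glyph indices after the font's substitutions have run, which is why reordering shows up at both levels.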

Shaped with HarfBuzz, the delicious Sambar that we dip the Dosa pieces in looks like this:

Rendering of Sambar

This is the incorrect, unshaped rendering for comparison:

Incorrect rendering of Sambar

... where you can see that the dot marks are not placed neatly above the letters they belong to.
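
Placing those dots (the Tamil pulli, U+0BCD) is a positioning problem rather than a reordering one. OpenType offers GPOS mark-to-base attachment for this: the mark is shifted so its anchor point lands on an anchor point of the preceding base glyph. Below is a simplified Python sketch of that idea. It assumes one anchor per glyph and uses made-up metrics; real fonts define anchors per mark class.

```python
from dataclasses import dataclass


@dataclass
class Glyph:
    name: str
    advance: int             # horizontal advance in font units
    anchor: tuple[int, int]  # attachment point (x, y) in the glyph's own coordinates
    is_mark: bool = False


def position_glyphs(glyphs: list[Glyph]) -> list[tuple[str, int, int]]:
    """Lay glyphs out left to right, attaching each mark to the preceding base
    glyph so that the mark's anchor coincides with the base's anchor."""
    positions = []
    pen_x = 0
    base_x, base = 0, None
    for g in glyphs:
        if g.is_mark and base is not None:
            x = base_x + base.anchor[0] - g.anchor[0]
            y = base.anchor[1] - g.anchor[1]
            positions.append((g.name, x, y))  # marks do not advance the pen
        else:
            positions.append((g.name, pen_x, 0))
            base_x, base = pen_x, g
            pen_x += g.advance
    return positions


if __name__ == "__main__":
    # Hypothetical metrics for the "ம்ப" part of Sambar.
    run = [
        Glyph("ma", 700, (350, 720)),
        Glyph("pulli", 0, (40, 0), is_mark=True),
        Glyph("pa", 600, (300, 720)),
    ]
    print(position_glyphs(run))
    # [('ma', 0, 0), ('pulli', 310, 720), ('pa', 700, 0)]
```

Rendering each character on its own amounts to drawing the pulli at the pen position with no offset, which is exactly the misplacement visible in the unshaped screenshot.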

None of this is really a new feature: the free software desktops, as well as Mac OS X and Windows, have supported these kinds of advanced text rendering techniques for a while (although it's sad to see that, for example, the Firefox shipped by default with Ubuntu seems to render this correctly only if I set my locale to an Indic one, while Konqueror always gets it right). But as scripts change and software needs to be adapted, it becomes even more important that this can be done in one central place. For example, we recently received patches for Qt that adapt tables in the shaper to recent developments in the Bengali script, permitting certain previously disallowed combinations of vowels and consonants. If we succeed with HarfBuzz, such changes will only need to be made in one place instead of patching Qt, OpenOffice, Pango and perhaps others.
