The Unicode Standard and Unicode Fonts

Unicode fonts are different from non-Unicode fonts because they are mapped using Unicode codes, allowing many more characters to be included compared to ANSI mapping. Unicode also allows scripting, which substitutes characters when a certain combination is typed.

Currently at version 5.0, the Unicode standard makes typing in Gujarati very easy. Microsoft Windows offers support for Unicode fonts at the system level, making it possible to type filesnames in Gujarati, search in Gujarati, and even change the whole interface of Windows with Gujarati menus and dialog boxes. Microsoft Windows provides the Shruti font, the default Gujarati Unicode font. The font is used in conjunction with the Gujarati keyboard layout designed specifically for the language. The keyboard layout will be discussed later.

Unicode fonts are an attempt to map all the characters of all the languages in character tables. The site Unicode Character Code Chart by Script lists all the languages and their respective code ranges. Each language has fixed range of codes assigned them. However, Unicode allows flexibility with ligature (or character) substitutions.

Languages in Unicode Fonts

Not every Unicode font has all characters of each language, however. Some Unicode fonts are specific to a language. Shruti, for example, has characters for English and Gujarati. Windows 7 contains many new fonts, but most are for Devanagari and other Indic languages. Unfortunately, there aren't many Gujarati Unicode fonts available as most developed are non-Unicode. Arial Unicode MS, weighing in at 20 MB, is one font that contains multiple languages, including the Latin family, Indic family, Hebrew, Greek, and East Asian languages. The image on the right shows the Symbol dialog box for inserting characters found in Microsoft Word. You can click on Subset to see which languages are included with a particular font. Some of the Indic languages are shown for Arial Unicode MS in the image.

Ligature Substitutions and Positioning

A well developed font has thoroughly defined ligature substitution tables. These substitution tables are basically "scripts" that automatically process your input and substitute appropriate characters when a certain sequence of characters are typed. Punctuation positions can also be modified as necessary with these tables.

shcha

A great example is the substitution of શ + ચ to yield શ્ચ. The table above shows the variety in the way this conjunct can be formed. Shruti font yields a single character which is seen most often in type. Arial Unicode MS and Lohit Gujarati both substitute the combination with two separate ligatures, one half form and one full form. These are probably more often used in writing. The difference is determined by the developers of the font. Note that શ and ચ have fixed character codes defined by the standard (see Tables 2 and 5), while શ્ચ would be defined with substitution tables created by the developer. Hence, there is rigidity with the basic characters, but flexibility with the half forms, conjuncts, and certain punctuation positioning.

Punctuation Positioning

The placement of the "reph" (the character for half form of an R) is an example of punctuation positioning. In ર્દ, we see the reph is aligned to the middle of દ. In ર્પ, it's aligned to the right.

Gujarati Unicode Tables

The Tables 1, 2, and 3 below show each Gujarati character and their Unicode code assignment. Tables 4, 5, and 6 also show the names of these characters with Windows ALT codes and HEX codes. In Windows applications, ALT codes can be used to input the character with the combination of ALT+Code. Unicode codes can be used to input Gujarati in HTML. Using a Gujarati keyboard layout, however, one does not need any of these codes.

Table 1: Gujarati Vowels
અ આ ઇ ઈ ઉ ઊ ઋ
ૠ ઍ એ ઐ ઑ ઓ ઔ

Table 2: Gujarati Consonants
ક ખ ગ ઘ ઙ ચ છ જ ઝ ઞ
ટ ઠ ડ ઢ ણ ત થ દ ધ ન
પ ફ બ ભ મ ય ર લ ળ વ
શ ષ સ હ

Table 3: Gujarati Punctuations (shown with ક as an example)
કઁ કં કઃ ક઼ કઽ કા કિ કી કુ કૂ
ઁ ં ઃ ઼ ઽ ા િ ી ુ ૂ
કૃ કૄ કૅ કે કૈ કૉ કો કૌ ક્
ૃ ૄ ૅ ે ૈ ૉ ો ૌ ્

Table 4: Numbers
૦ ૧ ૨ ૩ ૪ ૫ ૬ ૭ ૮ ૯
੬ ૱


The following tables list the names of each Gujarati character as they are defined in the Unicode tables, along with their ANSI codes, which are used for Microsoft Windows based computers, and HEX codes, which are used for Linux variants and Mac OS X.