-----------------------------------------------------------------------
20Sep2002##########IndicComputing Bytes#########################Issue02
-----------------------------------------------------------------------
PEOPLE INTERESTED IN THE FIELD: This is an impressive list of people working in, or interested in, the Indic Computing field. It is based on those who attended, or could not make it to, the Sept 15-16 Indic-Computing Workshop at Bangalore.
If you would like to get in touch with any of them, you can locate their contact details via Tapan S. Parikh tap2k@yahoo.com:
Dr. U B Pavanaja (Kannada Ganaka Parishad, Bangalore), Joseph Koshy (Hewlett-Packard, Bangalore), Brij Sethi (H-P, Bangalore), Sunil Abraham (Mahiti, Bangalore), RVS Sastry (IISc, Bangalore), C.V. Srinatha Sastry (KGP, Bangalore), Kalika Bali (Picopeta Simputers, Bangalore), N Anitha (IISc, Bangalore), Abraham K Mathen (H-P, Bangalore), K Nagarajan (H-P, Bangalore), Sayamindu Dasgupta (ILUG-Calcutta).
Also on the list are Manoj R Annadurai and Aboo Thanish (Chennai Kavigal), Dr. Hema Murthy (IIT-Madras, Chennai), Rajkumar S (Free Software Foundation, Kerala), Arun M (FSF, Tiruvananthapuram), Prof Pat Hall (Open University, London), G Karunakar (Netcore, Mumbai), Tapan Parikh (Mumbai), Venkatesh Hariharan (IndLinux, Mumbai), G. Nagarjuna (TIFR/FSF-Mumbai), Prakash Advani (Netcore, Mumbai), Raveesh Gupta (Microsoft, New Delhi), Ravi Kant and Pankaj Kaushal (Sarai, New Delhi), Mita Radhakrishnan and Tapas Desrousseaux (Auroville Language Lab, Pondicherry), Ashish Kotamkar (Mithi, Pune), Ravi Pande (font designer, Pune), Vijay Pratap Singh Aditya (Ahmedabad), Ms Neepa Shah (Gujarat Vidyapeeth, Ahmedabad), Dr Samir Kelekar (KonkaniNet, Goa/Bangalore), Susan Uskudarli (Bangalore), Abhas Abhinav and Vikram Singh (DeepRoot Linux, Bangalore), KSR Anjaneyulu (H-P, Bangalore), Durgesh Rao (NCST, Mumbai), Narasimha Murthy, TB Dinesh, CS Ramalingam, Naveen and Suzanne (H-P, Bangalore).
Other members who could not participate, but are interested in or working on the subject, are:
Bala Pillai (Tamil Net, Australia), Manoranjan Kumar Singh (NCST, Bangalore), CV Radhakrishnan (River Valley Technologies, Kerala), Dr Srinath Srinivasa (IIIT-B, Bangalore), Dr Vinay L Deshpande (Ncore Technologies, Bangalore), Prof Swami Manohar (Picopeta Simputers, Bangalore), Dr Sri Ganesh and Prof A G Ramakrishnan (H-P, Bangalore), Abhijit Das (IISc-Bangalore), Swayandipta Pal Chaudhuri (Perl Mongers, Calcutta), Vinay Chhajalani (Webduniya, Indore), Suresh Babu (INAPP, Thiruvananthapuram), Baiju M (FSF-Tvm), Keyur Shroff (NCST, Mumbai), Srinath Shanbag (NCST, Mumbai), Dr. Pushpak Bhattacharya (IIT-Bombay, Mumbai), Osama Manzar (4Cplus.net, New Delhi), Aman Grewal (CHiPS, Raipur), M K Saravanan (Centre for Singapore Internet Research), Frank Pohlmann, Mahesh Pai, Edward Cherlin, Owen Taylor, Eric Mader, Gaspar Sinai (Yudit), Deborah W Anderson (Script Encoding Initiative), Free Standards Group, Asmus Freytag and Joseph Becker and Kenneth Whistler (Unicode), Prof Ken Keniston (MIT), Supreet (Sarai). +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
INDIAN TONGUES, NOT AVAILABLE: Dulce Felix dulce@cybermultilingual.com of http://www.cityradio.nu offers submissions to Japanese search engines, Chinese search engines, German search engines, Hispanic search engines etc.
Felix says: "Please note that at this point we do not provide website promotion services in any of the Indian languages." Chinese, Korean and Japanese are among the Asian languages offered. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
COMMENT FROM KOLKATA: In a discussion on the Linux-Bangalore non-tech list linux-bangalore-non-tech@yahoogroups.com, P.K. Sharma pksharma@cal.vsnl.net.in of Calcutta had a point to make.
Responding to a report on the recent Bangalore Indic-Computing meet, he argued: "I find this info quite useful. In Calcutta we are working on bringing Bengali into Linux. A member claims success in it too!..." +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
LINKS TO GETTEXT AND EMACS INTERNATIONALIZATION: Richard Stallman rms@gnu.org, founder of the Free Software Foundation (FSF), responded to a query about who the right people to contact are regarding the internationalization of GNU/Linux (especially for Indian languages). He wrote: "The maintainer of GNU Gettext is haible@ilog.fr. handa@etl.go.jp works on internationalization of Emacs." Maybe we should be contacting such quarters more regularly, to keep our concerns on their minds. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTES FROM AN AMAZING LINGUIST IN THE US: Edward Cherlin edward@webforhumans.com creates international, multilingual Web sites, and is active in internationalization standards and implementation. He's based in Cupertino, CA 95014.
He offered some interesting comments:
ON FONTS AND TOOLS: Responding to Dr Pavanaja's point that Pfaedit can create only glyph sets and cannot make an Opentype font with embedded tables for glyph substitution, glyph positioning, distance, etc, Cherlin argues: "Right. However, it is open source, so adding the ability to write Opentype tables should be straightforward. See also GOTE (GNU OpenType Editor), currently described as 'rather alpha'."
"There are other commercial font editors. Fontlab 4.0 from Tiro Typeworks can create Opentype fonts, and in a future version will be able to handle non-BMP character codes," he says.
"Graphite's (a good toolkit for rendering) developers are trying to revive it, perhaps in combination with Pango, which has joined Li18nux, which has joined the Free Standards Group."
WITH AN OPENTYPE FONT AND RENDERING MECHANISM, WRITING A KEYBOARD DRIVER IS QUITE EASY: Says Dr Cherlin: "Right. For Unix, it is a matter of looking up the correct codes to enter into a text file. Mac is more work, and Windows requires membership in MSDN to handle keyboard layouts completely. Tavultesoft Keyman is a free program to create keyboard layouts, but it operates at a different level from Microsoft's own keyboards."
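To illustrate the remark that a keyboard layout comes down to looking up the correct codes, here is a minimal Python sketch of a layout as a lookup table from keys to Unicode characters. The key assignments in KEYMAP are made up for this example and do not follow INSCRIPT, Keyman, or any other real layout; only the Devanagari code points themselves are real.

```python
# Hypothetical phonetic-style layout: Latin key -> Devanagari character.
# The key assignments are illustrative only, not a real layout standard.
KEYMAP = {
    "k": "\u0915",  # DEVANAGARI LETTER KA
    "m": "\u092E",  # DEVANAGARI LETTER MA
    "l": "\u0932",  # DEVANAGARI LETTER LA
    "a": "\u093E",  # DEVANAGARI VOWEL SIGN AA
    "x": "\u094D",  # DEVANAGARI SIGN VIRAMA (halant)
}


def type_keys(keys):
    """Translate a sequence of key presses into Unicode text.

    Keys with no entry in KEYMAP pass through unchanged.
    """
    return "".join(KEYMAP.get(key, key) for key in keys)


if __name__ == "__main__":
    text = type_keys("kml")
    print(text)
    print([f"U+{ord(ch):04X}" for ch in text])
```

The driver only emits character codes; turning a sequence such as KA + VIRAMA + another consonant into a conjunct shape is left to the font and the rendering engine.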
SORTING TEXT: "Text to be sorted must go through several steps before strings can be compared. UTR#10 discusses preprocessing, normalization, array formation, and forming sort keys. There is also consideration of 'override mechanisms (tailoring) for creating language-specific orderings.'," says Cherlin.
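As a rough sketch of two of the steps named above, normalization and forming sort keys, the Python below builds a simplified sort key from the standard unicodedata module. This is not an implementation of UTR#10: the real algorithm uses multi-level weights and tailoring tables, which are left out here, and the function name sort_key and the sample words are chosen only for illustration.

```python
import unicodedata


def sort_key(text):
    """Build a simplified, single-level sort key.

    Step 1: normalize to NFD so canonically equivalent spellings
            become the same sequence of code points.
    Step 2: use the code points themselves as weights (a real
            collation table would supply proper weights here).
    """
    normalized = unicodedata.normalize("NFD", text)
    return tuple(ord(ch) for ch in normalized)


if __name__ == "__main__":
    qalam = "\u0958\u0932\u092E"  # spelled with precomposed QA (U+0958)
    kamal = "\u0915\u092E\u0932"  # begins with KA (U+0915)
    khag = "\u0916\u0917"         # begins with KHA (U+0916)
    words = [qalam, khag, kamal]

    # Raw comparison puts the U+0958 word after every plain consonant.
    print(sorted(words))
    # After normalization QA is KA + NUKTA, so it sorts next to KA words.
    print(sorted(words, key=sort_key))
```

Without the normalization step, the same word typed as U+0958 or as U+0915 U+093C would land in two different places in a sorted list.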
Dr Cherlin has written a market research study, "Non-Latin Font Technology and Markets" (1990), and in 1994, wrote and published a study, "The Worldwide Impact of the Unicode Character Set Standard". He is in the process of taking over maintenance of the Unicode HOWTO for Linux from Bruno Haible.
Some of the languages he has learnt in life include Hebrew at the synagogue starting at age eight, a year of Latin in eighth grade, French and Russian in High School, Swahili and a little Chinese in an after-school club, more French and Russian in college, Korean in the Peace Corps, Japanese in Japan, a little Pali and Sanskrit in his Buddhist training, Chinese at Durham University in the UK, APL from his father, Tolkien's Dwarvish and Elvish, Classical Greek (Euclid), Yiddish, Spanish, German, and a little Italian and Portuguese on his own, the invented language Loglan on his own, the invented language Lojban with the Logical Language Group, various Slavic languages plus Georgian and Armenian with the Slavyanka Russian Chorus, Tabla bols in both Devanagari and Arabic script. Amazing! He is currently helping Tex Texin on his Compelling Unicode Demo with Yiddish, Cherokee, Azeri, and Burmese examples.
Says he: "If I had time, I would look at Farsi next, particularly the astronomical and mathematical works of Omar Khayyam, and of course his poetry, too. But for now I am sticking with writing systems rather than languages. I am creating a Unicode APL font, and prodding people to do the necessary Indic and South Asian Opentype fonts and rendering so that everyone else can get on with the real work." He's available for consulting contracts, or even a full-time job. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
MORE FROM CHERLIN: This is the factual position, as he describes it -- "India has 18 official languages written in 10 different alphabets: Devanagari (used for Hindi, Marathi, and others), Bengali, Gurmukhi (Punjabi), Gujarati, Oriya, Malayalam, Kannada, Tamil, Telugu, and Latin (English). In addition, more than 800 other languages spoken in India do not have official status. Mandrake Linux, one popular distribution, includes keyboards and fonts for Bengali, Devanagari, Gujarati, Gurmukhi, and Tamil, five of the nine Indic writing systems. Unfortunately, many applications do not accept these characters, and those that accept them may not handle them correctly." +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WHAT'S THE REAL DIFFICULTY?: Edward Cherlin of Web for Humans, an international Web development company based in Cupertino, California, says, "The problems of rendering each standard Indic script are reasonably well understood, and will be solved soon in Pango. The real difficulty is with languages that have never been written, or are written in non-standard variants of the official scripts. The only organization I know of that has been working seriously on this problem is Summer Institute of Linguistics (SIL), and their work is stalled for lack of funds." Cherlin is active in Unicode, Li18nux, Pango, Free Standards Group, and other organizations working on Indic and other unsupported writing systems, especially on the problem of getting all of the interested parties into contact with each other. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
NOTE, SOMETHING ABOUT PANGO AND LI18NUX: Owen Taylor otaylor@redhat.com is the founder of the Pango project. Li18nux is working on standards for keyboard input, among other things, in conjunction with the Linux Standard Base of the Free Standards Group. They have focused first on Input Methods for Chinese, Japanese, and Korean, but when Pango's Indic support is complete they will extend their standard to include it.
At the toolkit level, Gtk and Qt are the most widely used toolkits, which helps. Gtk already has a good framework through the Pango project, and basic-level support for Indian languages. Qt also now has Unicode-level support for all languages, but rendering is not yet ready. However, Pango is independent of Gtk, and can be used with Qt or any other software.
GNU, Li18nux and Pango are focusing on Opentype, which is the only font format that provides the glyph mapping tables needed to support Indic conjuncts. GOTE, the GNU OpenType Editor, will be the essential tool for this effort when it is completed. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
WinXP AND APPLE: Cherlin explains other issues too. As he puts it: "Some argue today that only Microsoft's WinXP has any kind of Indian language support worth speaking about, even though Apple has provided Indian language kits for many years.
"There is confusion about Unicode support for Indic writing systems, since Unicode does not provide character codes for conjunct glyphs. Many in India still think that this is a design flaw in Unicode, whereas the Unicode designers argue that it is a necessary design decision so that we can escape from the current broken Indic rendering techniques.
"The set of conjuncts is needed is not determined solely by the writing system and language. It is font-specific, and can therefore only be supported by font glyphs, not character encoding. Unfortunately, PostScript and TrueType fonts do not support the correct mapping tables, and the problem can only be solved with Opentype fonts.
"In contrast, rendering Indic scripts using PostScript or TrueType fonts requires encoding the conjuncts directly in the text stream, rather than the letters composing them, and requires non-standard software to translate between the sequence of letters from the keyboard and the sequence of conjunct characters in a non-standard font. The result is text that cannot be sorted and searched properly, where spelling and grammar checkers cannot operate. It is hard on users to have to wait so long for proper support of Indic scripts through Unicode, but the results are guaranteed to justify the delay." +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
TEX USERS' CONTRIBUTION: The Indian TeX Users Group now has a project to fund font designers in all the Indian languages who are ready to write fonts and donate them under the GPL to TUGIndia. They have thus secured 'Keli', a Malayalam font family in various weights and shapes, written by Hashim and released under the GPL. "We do hope to get more fonts in other languages to fill up the gaps. We hope to use the savings generated with TUG2002 (to be held in India in September 2002) exclusively for this purpose," says Radhakrishnan in Thiruvananthapuram. Maybe these friends should get in touch with Pango and Li18nux. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
UNITYPE GLOBAL OFFICE, AN ADD-ON TO MS-OFFICE: Cherlin suggests, for those who can afford it, Unitype Global Office, an add-on to Microsoft Office which supports Hindi, Marathi, Nepali, Sanskrit, Punjabi, Gujarati, Bengali, Assamese, Tamil, Telugu, Maldivian, Kannada, Malayalam, Urdu, Pashto, Dari, and many other languages. See http://www.unitype.com/globaloffice.htm. Although it uses non-Unicode encoded fonts and a non-standard rendering engine, Global Office and Microsoft Office together are capable of writing Unicode files that can be viewed correctly with Opentype Indic fonts when they become available. +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
HARDLY ANY GPL-ed: SIL's Fonts in Cyberspace pages at http://www.sil.org/computing/fonts/ and Alan Wood's Unicode Resources at http://www.hclrss.demon.co.uk/unicode/fontsbyrange.html both list fonts for every major writing system, but hardly any are GPL-ed. This is about to change, according to Cherlin. "Several projects and numerous individuals are working on Free Unicode fonts, now that commercial Opentype font editors such as Tiro Typeworks Fontlab 4.0 are available. Finishing the GNU OpenType Editor (GOTE) will speed things up much more." +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
SOME INTERESTING LINKS:
Pango http://www.pango.org
Graphite http://www.sil.org/computing/graphite/
Li18nux http://www.li18nux.org
Free Standards Group http://www.freestandards.org/
Mandrake http://www.mandrake.com
-----------------------------------------------------------------------
Compiled in public interest from material on the Net by:
-----------------------------------------------------------------------
Frederick Noronha * Freelance Journalist * Goa * India 832.409490 / 409783
BYTESFORALL www.bytesforall.org * GNU-LINUX http://linuxinindia.pitas.com
Email fred@bytesforall.org * Mobile +9822 122436 (Goa) * Saligao Goa India
Writing with a difference... on what makes *the* difference