Re: [ILUG-BOM] Special chars, how to type

1 Aug 2001


On Tue, 31 Jul 2001, Manish Jethani wrote:
> ...
> BTW what are the chars with the 8th bit set (128-255) doing?  Is
> that by any means standard?  I've seen fonts having the (C) and
> (TM) symbols lurking somewhere in that range.
> Manish J.
Before the ISO-10646 standard arrived, there were the ISO-8859-*
standards. In each of these standards the range 0-127 is reserved for
ASCII characters and the range 128-255 for the characters of other
languages. For example, the following codesets are defined:
8859-1 - Western Europe, Latin America (also known as Latin 1)
8859-2 - Eastern Europe
8859-5 - Cyrillic
8859-8 - Hebrew
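
To make the overlap concrete, here is a small sketch that decodes the
same byte under each of the codesets listed above. It assumes Python 3
and its built-in codecs; the variable names are mine and purely
illustrative.

```python
import unicodedata

# Codec names as Python spells them for the four parts listed above.
CODESETS = ["iso-8859-1", "iso-8859-2", "iso-8859-5", "iso-8859-8"]

high_byte = b"\xa9"   # a byte with the 8th bit set (decimal 169)
low_bytes = b"ILUG"   # plain ASCII, every value below 128

# Map each codeset to the character it assigns to byte 0xA9.
decoded = {name: high_byte.decode(name) for name in CODESETS}

for name, ch in decoded.items():
    # Print the code point and name rather than the glyph itself,
    # so the output does not depend on the terminal's own encoding.
    print(f"{name}: 0xA9 -> U+{ord(ch):04X} {unicodedata.name(ch)}")

# The lower half (0-127) decodes identically under every part.
ascii_results = {low_bytes.decode(name) for name in CODESETS}
print(ascii_results)
```

With these codecs, 8859-1 and 8859-8 both put the copyright sign at
0xA9, while 8859-2 and 8859-5 put a letter there.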
This way, if a codeset has a copyright or trademark character, it
falls in the extended-ASCII range (128-255). These ISO-8859-*
standards are very similar to our ISCII (Indian Script Code for
Information Interchange), where the range 128-255 is reserved for the
various Indic scripts. When a user selects Hindi, the Hindi characters
take their place in that range, and when the user selects Tamil, the
Tamil characters take their place in the same range. The advantage of
this scheme is that, where the scripts share a common layout as the
Indic scripts do in ISCII, transliteration from one script to another
is possible directly. The disadvantage is that no more than two
scripts (ASCII plus one other) can be displayed simultaneously,
because the character codes overlap.
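
The transliteration advantage can be sketched in a few lines. Python
ships no ISCII codec, so this sketch assumes the Unicode Indic blocks
instead, which keep the parallel layout inherited from ISCII:
corresponding letters sit at the same offset within each 128-code
block. The function name and the block table are my own, for
illustration only.

```python
# Base code points of some Unicode Indic blocks (each 128 codes wide).
BLOCK_BASE = {
    "devanagari": 0x0900,
    "bengali": 0x0980,
    "gujarati": 0x0A80,
    "tamil": 0x0B80,
}

def transliterate(text, src, dst):
    """Move every character of the src block to the same offset in dst.

    Naive on purpose: it does not check whether the target script
    actually assigns a letter at that offset.
    """
    src_base, dst_base = BLOCK_BASE[src], BLOCK_BASE[dst]
    out = []
    for ch in text:
        cp = ord(ch)
        if src_base <= cp < src_base + 0x80:
            cp = cp - src_base + dst_base
        out.append(chr(cp))
    return "".join(out)

word = "\u0915\u092e\u0932"  # Devanagari letters KA, MA, LA
gujarati = transliterate(word, "devanagari", "gujarati")
tamil = transliterate(word, "devanagari", "tamil")
print([f"U+{ord(c):04X}" for c in gujarati])
print([f"U+{ord(c):04X}" for c in tamil])
```

Characters outside the source block (ASCII, for instance) pass through
unchanged, which mirrors the way the lower half stays fixed while the
upper half is swapped.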
The international standard ISO-10646 defines the Universal Character
Set (UCS). UCS is a superset of all other character set standards. UCS
contains the characters required to represent practically all known
languages. Not all systems are expected to support all the advanced
mechanisms of UCS such as combining characters. Therefore, ISO-10646
specifies the following three implementation levels:
Level 1 :
Combining characters and Hangul Jamo characters (a special,
more complicated encoding of the Korean script, where Hangul syllables
are coded as two or three subcharacters) are not supported.
Level 2 :
Like level 1, except that for some scripts a fixed list of combining
characters is now allowed (e.g., for Hebrew, Arabic, Devanagari,
Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam,
Thai and Lao). These scripts cannot be represented adequately in UCS
without support for at least certain combining characters.
Level 3 : 
All UCS characters are supported, such that for example mathematicians
can place a tilde or an arrow (or both) on any arbitrary character.
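
What separates the levels is whether combining characters are allowed,
so here is a rough sketch for spotting them. It assumes Python 3 and
its standard unicodedata module; the helper name is hypothetical, and
testing the general category is only an approximation of what the
levels regulate.

```python
import unicodedata

def has_combining_marks(text):
    """True if any character is a mark (general category Mn, Mc or Me)."""
    return any(unicodedata.category(ch).startswith("M") for ch in text)

precomposed = "\u00e9"          # e with acute, a single code point
decomposed = "e\u0301"          # e followed by COMBINING ACUTE ACCENT
devanagari_ki = "\u0915\u093f"  # KA followed by VOWEL SIGN I
x_tilde = "x\u0303"             # x followed by COMBINING TILDE

for label, s in [("precomposed", precomposed),
                 ("decomposed", decomposed),
                 ("devanagari_ki", devanagari_ki),
                 ("x_tilde", x_tilde)]:
    print(label, len(s), has_combining_marks(s))

# NFC normalization composes a sequence when a precomposed character
# exists; x with tilde has none, so it stays two code points.
composed = unicodedata.normalize("NFC", decomposed)
still_two = unicodedata.normalize("NFC", x_tilde)
print(composed == precomposed, len(still_two))
```

The Devanagari pair shows why level 1 is not enough for the Indic
scripts, and the x with tilde is the kind of arbitrary combination
that only level 3 admits.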
The Unicode Standard published by the Unicode Consortium contains
exactly the ISO 10646-1 Basic Multilingual Plane at implementation
level 3. All characters are at the same positions and have the same
names in both standards.
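
For the (C) and (TM) symbols from the original question, the position
and name can be looked up directly. This is a sketch assuming Python 3,
whose unicodedata module carries the names from the Unicode Character
Database; the helper name is my own.

```python
import unicodedata

def describe(ch):
    """Return (code point, character name, whether it lies in the BMP)."""
    cp = ord(ch)
    return cp, unicodedata.name(ch), cp <= 0xFFFF

copyright_info = describe("\u00a9")
trademark_info = describe("\u2122")

for cp, name, in_bmp in (copyright_info, trademark_info):
    print(f"U+{cp:04X} {name} (BMP: {in_bmp})")

# The first 256 UCS positions coincide with ISO 8859-1, so the
# Latin 1 byte 0xA9 and UCS position 0xA9 are the same character.
same_as_latin1 = b"\xa9".decode("iso-8859-1") == chr(0xA9)
print(same_as_latin1)
```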
In addition, the Unicode Standard defines much more of the semantics
associated with some of the characters, and is in general a better
reference for implementors of high-quality typographic publishing
systems. Unicode specifies algorithms for rendering the presentation
forms of some scripts (say Arabic), for handling bi-directional text
that mixes, for instance, Latin and Hebrew, for sorting and string
comparison, and much more.
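
Some of these extra semantics are visible as per-character properties.
The sketch below, assuming Python 3 and its unicodedata module, reads
the bi-directional class that the bidi algorithm works from, and maps
an Arabic presentation form back to its base letter. It only inspects
the properties; it does not implement the algorithms themselves.

```python
import unicodedata

samples = {
    "latin_a": "A",
    "hebrew_alef": "\u05d0",
    "arabic_alef": "\u0627",
    "digit_one": "1",
}

# Bi-directional class of each sample character.
bidi_class = {key: unicodedata.bidirectional(ch)
              for key, ch in samples.items()}
print(bidi_class)

# U+FE8E is the final presentation form of Arabic alef; compatibility
# normalization (NFKC) maps it back to the base letter U+0627.
base_letter = unicodedata.normalize("NFKC", "\ufe8e")
print(f"U+{ord(base_letter):04X}")
```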
The ISO 10646 standard on the other hand is not much more than a
simple character set table, comparable to the well-known ISO 8859
standard. It specifies some terminology related to the standard,
defines some encoding alternatives, and it contains specifications of
how to use UCS in connection with other established ISO standards such
as ISO 6429 and ISO 2022. There are other closely related ISO
standards, for instance ISO 14651 on sorting UCS strings.
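
As an illustration of the encoding alternatives, here is a sketch
assuming Python 3. Its codecs are named utf-8, utf-16 and utf-32 rather
than UCS-2 and UCS-4, but for BMP characters such as these two the
big-endian 16-bit and 32-bit forms give the same bytes.

```python
text = "\u00a9\u2122"  # COPYRIGHT SIGN, TRADE MARK SIGN

utf8 = text.encode("utf-8")        # variable width, 1 to 4 bytes each
utf16 = text.encode("utf-16-be")   # two bytes per BMP character
utf32 = text.encode("utf-32-be")   # four bytes per character

for label, data in [("utf-8", utf8), ("utf-16-be", utf16),
                    ("utf-32-be", utf32)]:
    print(f"{label}: {len(data)} bytes, {data.hex()}")
```

Note that in UTF-8 neither character is a single byte in the 128-255
range any more, which is the practical difference from the ISO-8859-*
codesets.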
- Keyur

