Special chars, how to type


Manish Jethani

31 Jul 2001

3:29 p.m.

I would like to know how I can type special characters, like the copyright and trademark symbols, on the terminal. There must be a set of key combinations documented somewhere. I believe these characters fall within the ASCII range.

Please don't give emacs-specific solutions. I don't use emacs.

Tnx.

Manish


Philip S Tellis

31 Jul

4:01 p.m.

On Tue, 31 Jul 2001, Manish Jethani wrote:

...

I would like to know how I can type special characters, like the copyright and trademark symbols, on the terminal. There must be a set of key combinations documented somewhere. I believe these characters fall within the ASCII range.

I don't think they fall in the ASCII range. Remember, ASCII is 7-bit (0-127). The first 32 (0-31) are non-printable control characters. 127 is DEL (delete). There are 62 alphanumeric values and 33 non-alphanumeric printables, counting space. That accounts for all 128 of them. (c) and TM don't fall in here.
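
One way to check the tally is to classify the 128 ASCII code points directly. The following is a minimal Python sketch, not part of the original reply; the list names are made up for illustration. It counts DEL (127) with the control characters and the space character with the printable non-alphanumerics.

```python
import string

# Classify every 7-bit ASCII code point (0-127).
alnum_chars = string.ascii_letters + string.digits

# Control characters: 0-31 plus DEL (127).
control = [c for c in range(128) if c < 32 or c == 127]
# Letters and digits: 26 + 26 + 10.
alnum = [c for c in range(128) if chr(c) in alnum_chars]
# Everything else: space and punctuation.
other_printable = [c for c in range(128) if c not in control and c not in alnum]

print(len(control), len(alnum), len(other_printable))

# The copyright and trademark signs lie outside the 0-127 range.
print(ord("\u00a9") > 127, ord("\u2122") > 127)
```

The three groups add up to 128, so there is no room left in ASCII for either symbol.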

Philip

-- When you don't know what to do, walk fast and look worried. Visit my webpage at http://www.ncst.ernet.in/~philip/ Read my writings at http://www.ncst.ernet.in/~philip/writings/ MSN philiptellis Yahoo! philiptellis

Keyur Shroff

8:17 p.m.

On Tue, 31 Jul 2001, Manish Jethani wrote:

...

I would like to know how I can type special characters, like the copyright and trademark symbols, on the terminal. There must be a set of key combinations documented somewhere. I believe these characters fall within the ASCII range.

Please don't give emacs-specific solutions. I don't use emacs.

No. These characters don't fall within the ASCII range. In fact, the copyright symbol has Unicode value U+00A9 and the trademark symbol has Unicode value U+2122. Not all fonts contain these characters, so first of all make sure that your font contains them.
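
The code points mentioned above can be inspected with Python's standard `unicodedata` module. This is only an illustrative sketch; the helper name `is_ascii` is hypothetical and not from the thread.

```python
import unicodedata

def is_ascii(ch):
    """Return True if the character lies in the 7-bit ASCII range (0-127)."""
    return ord(ch) < 128

for ch in ("\u00a9", "\u2122"):
    # Print the code point, the official character name, and the UTF-8 bytes.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex(), is_ascii(ch))
```

Both characters need more than one byte in UTF-8, which is another sign that they sit outside ASCII.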

If you want a non-X solution, you can define a keyboard map and load it using the 'loadkeys' utility. In your keyboard map, define a compose key as follows:

    alt keycode 100 = Compose

You can find the keycode of a particular key with the 'showkey' utility. Here we are using the Alt and AltGr key combination (left and right Alt keys together) as the compose key.

Then add your compose definition:

    compose 'c' 'o' to copyright

Instead of creating a new keyboard map, you can start from the existing one by sending the output of 'dumpkeys' to a file and then appending the above two lines to that file, e.g.:

    dumpkeys > mykeyboard.map

Now load the keyboard map using:

    loadkeys mykeyboard.map

Finally, to type the copyright symbol, press the compose key (the Alt + AltGr combination in this case), followed by 'c' and then 'o'.

For more information see:

    man keymaps
    man showkey
    man dumpkeys
    man loadkeys

In X, you can see all the glyphs in your font by running:

    xfd -fn "<your XLFD font name>"

The XLFD (X Logical Font Description) font name is the name listed by 'xlsfonts'.

If you are using a Unicode font, you can easily bind the Unicode value to some key and then use that keyboard map to type that particular character.

If you are not using a Unicode font, you must find out which character code is mapped to the glyph and then bind that character code to some keycode.

Apart from this, there is the concept of combining characters. You can use the xkb utility to load a keyboard map in which you have predefined combining characters and a combining key. Once that keyboard map is loaded, XIM (X Input Method) will take care of these combining characters.

- Keyur.

Manisha Joshi

9:26 p.m.

Hi, I don't know much about the multilingual support X Windows provides. I wanted to know about XIM (X Input Method) and the xkb support for Indian languages. Do we have to have our own XIM installed to support input in Devanagari? What is this XIM all about? Is it like a keyboard handler routine which maps conjuncts and combinations of characters to a single glyph (which is required in Devanagari)?

manisha

----- Original Message ----- From: "Keyur Shroff" keyur@konark.ncst.ernet.in To: linuxers@mm.ilug-bom.org.in Sent: Tuesday, July 31, 2001 3:07 PM Subject: Re: [ILUG-BOM] Special chars, how to type


Linuxers mailing list Linuxers@mm.ilug-bom.org.in http://mm.ilug-bom.org.in/mailman/listinfo/linuxers

Keyur Shroff

1 Aug

5:23 p.m.

On Tue, 31 Jul 2001, Manisha Joshi wrote:

...

I wanted to know about XIM (X Input Method) and the xkb support for Indian languages.

XKB keyboard description files are in the /usr/X11R6/lib/X11/xkb/symbols directory. In the newer X releases (X11R6.1 and higher) there are two "standard" input methods: the original one, working through the xmodmap utility, and the new one called Xkb (X KeyBoard). Both require a keyboard description file. With xmodmap you can directly use a keyboard map that you have defined. xkb requires you to compile the keyboard definition file into one with the .xkm extension. This .xkm file can then be used by xkb.

...

Do we have to have our own XIM installed to support input in Devanagari? What is this XIM all about? Is it like a keyboard handler routine which maps conjuncts and combinations of characters to a single glyph (which is required in Devanagari)?

XIM/XOM support was originally built into Xlib by Japanese developers to handle the complex Japanese language. It was designed and developed with the CJK (Chinese-Japanese-Korean) scripts in mind. In XIM, an input method server runs and handles the whole input mechanism. XIM is not very suitable for Indic scripts, and it is overcomplicated. Moreover, there is no document available describing how to write an XIM. XOM is not very suitable for Indic scripts either, as it is built on top of the underlying "draw string" functions in Xlib. These underlying Xlib API functions don't pass the information that our complex Indic scripts require. For example, Indic scripts are phonetic, and characters may change their position and appearance as new characters are typed in. There is no client-side or server-side support available to handle this.
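
To illustrate how characters change position and appearance in Devanagari, here is a small Python sketch that lists the stored code points behind two short strings. It is an illustration added for this point only, not code from XIM, XOM, or Xlib, and the variable names are hypothetical.

```python
import unicodedata

# "ki": stored as KA followed by VOWEL SIGN I, although the vowel sign
# is drawn to the left of KA when rendered.
ki = "\u0915\u093f"

# "ksha": stored as KA + VIRAMA + SSA, but normally rendered as one
# conjunct glyph.
ksha = "\u0915\u094d\u0937"

for text in (ki, ksha):
    # The stored order and count differ from what appears on screen.
    print(len(text), [unicodedata.name(c) for c in text])
```

A plain "draw string" call that paints one glyph per code point, left to right, cannot produce either form correctly, which is the gap described above.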

As XIM and XOM are not very much suitable for Indic script, providing a solution based on this is very difficult.

There are two alternative solutions. The first is to modify the underlying framework (sometimes called "soft changes") so that the "draw string" functions change their behaviour to render Indic script characters _properly_. The other is to bypass the "draw string" functions completely and introduce a new API in Xlib so that the client can pass the necessary information to the server.

Documents on XOM/XIM API are available with source code of XFree86.

- Keyur

Manish Jethani

31 Jul

11:04 p.m.

Sometime today, Keyur Shroff wrote:

...

No. These characters don't fall within ASCII range. In fact copyright symbol has Unicode value U+00A9 and TradeMark symbol has Unicode value U+2122. Not all fonts contains these

Thanks, Keyur. That was really comprehensive. I'll try it out, though I may not want to use it since these chars don't fall in the ASCII range.

BTW what are the chars with the 8th bit set (128-255) doing? Is that by any means standard? I've seen fonts having the (C) and (TM) symbols lurking somewhere in that range.

Manish J.

Philip S Tellis

11:24 p.m.

Sometime Today, Manish Jethani assembled some asciibets to say:

...

BTW what are the chars with the 8th bit set (128-255) doing? Is that by any means standard? I've seen fonts having the (C) and (TM) symbols lurking somewhere in that range.

For the IBM PC, those are Extended ASCII, or something like that. Basically, anything with the 7th bit (there is no 8th bit) set is not portable across devices. Some terminals use 7 bit characters with one parity bit, others use the 7th bit as a stop bit. In general, the 7th bit is stripped from a character to get its ASCII code.

Philip

-- But let me tell you, the slim lazy Homer you knew is dead. Now I'm a big fat dynamo. -- Homer Simpson King-Size Homer Visit my webpage at http://www.ncst.ernet.in/~philip/ Read my writings at http://www.ncst.ernet.in/~philip/writings/ MSN philiptellis Yahoo! philiptellis

Manish Jethani

1 Aug

12:59 a.m.

Sometime today, Philip S Tellis wrote:

...

Basically, anything with the 7th bit (there is no 8th bit) set

87654321
--------
01111111 = 127 = backspace
^ That's the 7th bit? The lowest is 0th, is it?

...

In general, the 7th bit is stripped from a character to get its ASCII code.

Tnx.

Manish J.

Philip S Tellis

10:53 a.m.

Sometime on Jul 31, Manish Jethani assembled some asciibets to say:

...

87654321
--------
01111111 = 127 = backspace
^ That's the 7th bit? The lowest is 0th, is it?

Yup. Welcome to the wholesome world of computer scientists. Whole numbers start with zero. Counting starts with zero. Indices start with zero. Computers understand zero.
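
The zero-based bit numbering can be checked with a short Python sketch. The helper `bit_is_set` is a hypothetical name used only for this illustration.

```python
def bit_is_set(value, n):
    """Return True if bit n is set, counting from 0 at the least significant end."""
    return (value >> n) & 1 == 1

# 127 has bits 0-6 set and bit 7 clear; 128 has only bit 7 set.
for value in (127, 128):
    print(value, format(value, "08b"), bit_is_set(value, 7))
```

With this numbering, every value from 128 to 255 has bit 7 set and every ASCII value has it clear.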

Philip

-- This dungeon is owned and operated by Frobozz Magic Co., Ltd. Visit my webpage at http://www.ncst.ernet.in/~philip/ Read my writings at http://www.ncst.ernet.in/~philip/writings/ MSN philiptellis Yahoo! philiptellis

Keyur Shroff

3:20 p.m.

On Tue, 31 Jul 2001, Philip S Tellis wrote:

...

Sometime Today, Manish Jethani assembled some asciibets to say:

...
BTW what are the chars with the 8th bit set (128-255) doing? Is that by any means standard? I've seen fonts having the (C) and (TM) symbols lurking somewhere in that range.

For the IBM PC, those are Extended ASCII, or something like that. Basically, anything with the 7th bit (there is no 8th bit) set is not portable across devices. Some terminals use 7 bit characters with one parity bit, others use the 7th bit as a stop bit. In general, the 7th bit is stripped from a character to get its ASCII code.

Three variables should be set in order to make bash understand 8-bit characters. The best place is the ~/.inputrc file. The following should be set:

    set meta-flag on
    set convert-meta off
    set output-meta on

For the terminal, set:

    stty pass8

or

    stty -istrip cs8
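
To see why stripping matters, the following Python sketch mimics what happens to an 8-bit character when the top bit is cleared, as a terminal does with input stripping enabled. The function name `strip_high_bit` is an assumption made for illustration; it is not part of stty or readline.

```python
def strip_high_bit(byte):
    """Clear bit 7, keeping only the low seven bits."""
    return byte & 0x7F

# The copyright sign is the single byte 0xA9 in ISO-8859-1.
copyright_latin1 = "\u00a9".encode("latin-1")[0]
stripped = strip_high_bit(copyright_latin1)

print(hex(copyright_latin1), "->", hex(stripped), repr(chr(stripped)))
```

With the high bit removed, the copyright byte turns into a closing parenthesis, so 8-bit input only survives when stripping is turned off.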

- Keyur.

Keyur Shroff

3:11 p.m.

On Tue, 31 Jul 2001, Manish Jethani wrote:

...

BTW what are the chars with the 8th bit set (128-255) doing? Is that by any means standard? I've seen fonts having the (C) and (TM) symbols lurking somewhere in that range.

Manish J.

Before the ISO-10646 standard arrived, there were the ISO-8859-* standards. In each of these standards the range 0-127 was reserved for ASCII characters and the range 128-255 was reserved for other, language-specific characters. For example, the following codesets are defined:

8859-1 - Europe, Latin America (also known as Latin 1)
8859-2 - Eastern Europe
8859-5 - Cyrillic
8859-8 - Hebrew

This way, if a codeset has a copyright or trademark character, it will fall in the extended-ASCII range (128-255). These ISO-8859-* standards are very similar to our ISCII (Indian Script Code for Information Interchange), where the range 128-255 is reserved for the various Indic scripts. When a user selects Hindi, Hindi characters take their place in the range, and if the user selects Tamil, Tamil characters take their place in the same range. The advantage of this scheme (ISCII as well as ISO-8859-*) is that transliteration from one script to another is possible directly. The disadvantage is that no more than two scripts can be displayed simultaneously, because the character codes overlap.
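
The overlap of character codes can be seen by decoding one and the same byte under several ISO-8859 codesets. This is a hedged Python sketch using the standard codecs; the helper `fits_latin1` is a hypothetical name introduced for the example.

```python
def fits_latin1(ch):
    """Return True if the character can be encoded in ISO-8859-1."""
    try:
        ch.encode("iso8859_1")
        return True
    except UnicodeEncodeError:
        return False

byte = b"\xa9"
for codec in ("iso8859_1", "iso8859_2", "iso8859_5"):
    ch = byte.decode(codec)
    # Same byte value, different character in each codeset.
    print(codec, f"U+{ord(ch):04X}", ch)

# The copyright sign has a slot in Latin 1, the trademark sign does not.
print(fits_latin1("\u00a9"), fits_latin1("\u2122"))
```

Byte 0xA9 is the copyright sign only under Latin 1; under the other two codesets it is a letter, so the meaning of a byte in the upper range depends entirely on which codeset is selected.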

The international standard ISO-10646 defines the Universal Character Set (UCS). UCS is a superset of all other character set standards. UCS contains the characters required to represent practically all known languages. Not all systems are expected to support all the advanced mechanisms of UCS such as combining characters. Therefore, ISO-10646 specifies the following three implementation levels:

Level 1 : Combining characters and Hangul Jamo characters (a special, more complicated encoding of the Korean script, where Hangul syllables are coded as two or three subcharacters) are not supported.

Level 2 : Like level 1, except that in some scripts a fixed list of combining characters is allowed (e.g., for Hebrew, Arabic, Devanagari, Bengali, Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, Malayalam, Thai and Lao). These scripts cannot be represented adequately in UCS without support for at least certain combining characters.

Level 3 : All UCS characters are supported, such that for example mathematicians can place a tilde or an arrow (or both) on any arbitrary character.
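
As a rough illustration of combining characters, the Python sketch below attaches a combining tilde to two base letters and normalizes the result with the standard `unicodedata` module. It is an assumption-laden example, not part of ISO-10646 itself; the variable names are made up.

```python
import unicodedata

# "n" followed by COMBINING TILDE (U+0303): two code points.
n_tilde = "n\u0303"
# NFC normalization merges them, because a precomposed character exists.
n_composed = unicodedata.normalize("NFC", n_tilde)

# "x" followed by COMBINING TILDE: no precomposed form exists,
# so the combining character has to be supported as such.
x_tilde = "x\u0303"
x_composed = unicodedata.normalize("NFC", x_tilde)

print(len(n_tilde), len(n_composed), len(x_composed))
# A non-zero combining class marks U+0303 as a combining character.
print(unicodedata.combining("\u0303"))
```

The tilde on "n" can be replaced by a single precomposed character, while the tilde on "x" can only be represented with a combining character, which is the kind of arbitrary combination level 3 allows.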

The Unicode Standard published by the Unicode Consortium contains exactly the ISO 10646-1 Basic Multilingual Plane at implementation level 3. All characters are at the same positions and have the same names in both standards.

The Unicode Standard defines in addition much more semantics associated with some of the characters and is in general a better reference for implementors of high-quality typographic publishing systems. Unicode specifies algorithms for rendering presentation forms of some scripts (say Arabic), handling of bi-directional texts that mix for instance Latin and Hebrew, algorithms for sorting and string comparison, and much more.

The ISO 10646 standard on the other hand is not much more than a simple character set table, comparable to the well-known ISO 8859 standard. It specifies some terminology related to the standard, defines some encoding alternatives, and it contains specifications of how to use UCS in connection with other established ISO standards such as ISO 6429 and ISO 2022. There are other closely related ISO standards, for instance ISO 14651 on sorting UCS strings.

- Keyur
