----- Forwarded message from Pushpak Bhattacharya <pb@cse.iitb.ac.in> -----
Date: Sun, 2 Feb 2003 22:13:52 +0530 (IST)
From: Pushpak Bhattacharya <pb@cse.iitb.ac.in>
Subject: AI Seminar
To: btech3@cse.iitb.ac.in, btech4@cse.iitb.ac.in, mtech1@cse.iitb.ac.in, mtech2@cse.iitb.ac.in, mtech01@it, mtech02@it

I am giving a talk in the CSE seminar hall at 3:30 PM on Monday, 3 February 2003 (tomorrow); particulars are below. You may be interested in attending.
Title: Development and Application of Indian Language Wordnets
Abstract:

It is now widely accepted that meaningful research and development in natural language processing, information extraction and machine translation cannot be carried out without wordnets. In India, wordnets are being built for the following languages: Hindi and Marathi at IIT Bombay; Tamil at Anna University Knowledge Based Center, Chennai; Gujarati at MS University, Baroda (under the supervision of IIT Bombay); Oriya and Sanskrit at Utkal University, Bhubaneswar; and Bengali at IIT Kharagpur. The Hindi wordnet, which pioneered all these efforts, is at an advanced stage of development, with about 11,000 semantically linked synsets along with the associated software and user interface. The final aim is to construct wordnets for the Indian languages, link them to one another to produce the Indo-wordnet, and then link the Indo-wordnet with the English wordnet and the Euro-wordnet (a conglomeration of European language wordnets). This will facilitate automatic translation, multilingual information search and multilingual text processing. The Indo-wordnet project is being spearheaded by IIT Bombay.
In this talk, we describe the computational and linguistic issues involved in building Indian language wordnets, with specific reference to Hindi and Marathi. Synsets are the building blocks of any wordnet, and the basic principles behind constructing them, viz. Minimality, Coverage and Replaceability, will be discussed. In Indian language computing, verbs pose difficult problems, particularly for the storage structure of the wordnet, due to compounding, causative formation and onomatopoeia. Our method of dealing with verbs will be spelt out. Certain semantic relations of the English wordnet and the Euro-wordnet have been further refined in the Hindi wordnet structure. In addition, cross-part-of-speech semantic relations have been explicitly set up to facilitate word sense disambiguation.
We conclude the talk with two important applications of the wordnet: word sense disambiguation and automatic lexicon generation. For the first, we use verb-noun associations to disambiguate nouns through a semantic cluster formed in the wordnet. For the second, the wordnet hierarchy is used to create the disambiguators and the semantic attributes of the words in a document. This facilitates knowledge extraction from the document and automatic translation.
Dr. Pushpak Bhattacharyya
Department of Computer Science and Engineering
Indian Institute of Technology
Mumbai 400 076, India

Tel: 91-22-25767718 (o), 25768718 (r), 25721955 (r)
Fax: 91-22-25720290/25723480
email: pb@cse.iitb.ac.in
homepage: http://www.cse.iitb.ac.in/~pb
----- End forwarded message -----