LREC 2000: 2nd International Conference on Language Resources & Evaluation
Conference Papers
Title | The Concede Model for Lexical Databases |
Authors |
Erjavec Tomaz (Dept. for Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia, tomaz.erjavecg@ijs.si); Evans Roger (Information Technology Research Institute, University of Brighton, Lewes Rd, Brighton, UK, rags@itri.brighton.ac.uk, http://www.itri.brighton.ac.uk/projects/rags); Ide Nancy (Department of Computer Science, Vassar College, Poughkeepsie, NY 12604-0520 USA, ide@cs.vassar.edu); Kilgarriff Adam (ITRI, University of Brighton, Brighton, England, adam@itri.bton.ac.uk) |
Keywords | Dictionary, Lexical Database, TEI, Up-Translation, XML |
Session | Session WP1 - Lexicon |
Abstract | The value of language resources is greatly enhanced if they share a common markup with an explicit minimal semantics. Achieving this goal for lexical databases is difficult, as large-scale resources can realistically only be obtained by up-translation from pre-existing dictionaries, each with its own proprietary structure. This paper describes the approach we have taken in the Concede project, which aims to develop compatible lexical databases for six Central and Eastern European languages. Starting with sample entries from original presentation-oriented electronic representations of dictionaries, we transformed the data into an intermediate TEI-compatible representation to provide a common baseline for evaluating and comparing the dictionaries. We then developed a more restrictive encoding, formalised as an XML DTD with a clearly defined semantic interpretation. We present this DTD and discuss a sample conversion from TEI, together with an application which hyperlinks an HTML representation of the dictionary to on-line concordancing over a corpus. |