LREC 2000 - Papers

LREC 2000 2nd International Conference on Language Resources & Evaluation

Conference Papers

Title The Concede Model for Lexical Databases

Authors Erjavec Tomaz (Dept. for Intelligent Systems, Jozef Stefan Institute, Ljubljana, Slovenia, tomaz.erjavecg@ijs.si)
Evans Roger (Information Technology Research Institute, University of Brighton, Lewes Rd, Brighton, UK, rags@itri.brighton.ac.uk, http://www.itri.brighton.ac.uk/projects/rags)
Ide Nancy (Department of Computer Science, Vassar College, Poughkeepsie, NY 12604-0520 USA, ide@cs.vassar.edu)
Kilgarriff Adam (ITRI, University of Brighton, Brighton, England, adam@itri.bton.ac.uk)

Keywords Dictionary, Lexical Database, TEI, Up-Translation, XML

Session Session WP1 - Lexicon

Abstract The value of language resources is greatly enhanced if they share a common markup with an explicit minimal semantics. Achieving this goal for lexical databases is difficult, as large-scale resources can realistically only be obtained by up-translation from pre-existing dictionaries, each with its own proprietary structure. This paper describes the approach we have taken in the Concede project, which aims to develop compatible lexical databases for six Central and Eastern European languages. Starting with sample entries from original presentation-oriented electronic representations of dictionaries, we transformed the data into an intermediate TEI-compatible representation to provide a common baseline for evaluating and comparing the dictionaries. We then developed a more restrictive encoding, formalised as an XML DTD with a clearly defined semantic interpretation. We present this DTD and discuss a sample conversion from TEI, together with an application which hyperlinks an HTML representation of the dictionary to on-line concordancing over a corpus.
