Title
Enhancing an English-Polish Electronic Dictionary for Multiword Expression Research
Authors
Piotr Bański and Radosław Moszczyński
Abstract
This paper describes a project aimed at converting a legacy representation of English idioms into an XML-based format. The project is set in the context of a large electronic English-Polish dictionary that contains several hundred formalized idiom descriptions and has been released under the terms of a free license. The project consists of three phases: cleaning up the dictionary markup, extracting the legacy idiom representations, and converting them into TEI P5 XML. The resulting XML is constrained by a RELAX NG grammar created for this purpose, which constitutes a module that can be included in the TEI P5 schema. The paper describes each phase in general terms and gives several examples of XML-encoded idioms. It also suggests directions for further research, including abstracting the XML-encoded idiom representations into general syntactic patterns and using them to automatically identify idioms in tagged corpora.
Language
Single language
Topics
MultiWord Expressions & Collocations, Lexicon, lexical database, LR Infrastructures and Architectures
BibTeX
@InProceedings{BASKI08.651,
author = {Piotr Bański and Radosław Moszczyński},
title = {Enhancing an {English-Polish} Electronic Dictionary for Multiword Expression Research},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = may,
eventdate = {2008-05-28/2008-05-30},
address = {Marrakech, Morocco},
editor = {Nicoletta Calzolari and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
url = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
}