Title |
Idioms in Context: The IDIX Corpus |
Authors |
Caroline Sporleder, Linlin Li, Philip Gorinski and Xaver Koch |
Abstract |
Idioms and other figuratively used expressions pose considerable problems to natural language processing applications because they are very frequent and often behave idiosyncratically. Consequently, there has been much research on the automatic detection and extraction of idiomatic expressions. Most studies focus on type-based idiom detection, i.e., distinguishing whether a given expression can (potentially) be used idiomatically. However, many expressions such as "break the ice" can have both literal and non-literal readings and need to be disambiguated in a given context (token-based detection). So far relatively few approaches have attempted context-based idiom detection. One reason for this may be that few annotated resources are available that disambiguate expressions in context. With the IDIX corpus, we aim to address this. IDIX is available as an add-on to the BNC and disambiguates different usages of a subset of idioms. We believe that this resource will be useful both for linguistic and computational linguistic studies. |
Topics |
Corpus (creation, annotation, etc.), MultiWord Expressions & Collocations, Word Sense Disambiguation |
Full paper |
Idioms in Context: The IDIX Corpus |
Slides |
- |
Bibtex |
@InProceedings{SPORLEDER10.618,
  author = {Caroline Sporleder and Linlin Li and Philip Gorinski and Xaver Koch},
  title = {Idioms in Context: The IDIX Corpus},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |