SUMMARY : Session P17-E
Title | Recognizing Acronyms and their Definitions in Swedish Medical Texts |
---|---|
Authors | D. Kokkinakis, D. Dannélls |
Abstract | This paper addresses the task of recognizing acronym-definition pairs in Swedish (medical) texts, as well as the compilation of a freely available sample of such manually annotated pairs. This material is suitable not only for supervised learning experiments, but also as a testbed for evaluating the quality of future acronym-definition recognition systems. A number of approaches to this identification task are described in the literature, particularly within the biomedical domain, but none of them addresses the variation and complexity exhibited in a language other than English. This complexity arises from three facts: two languages, i.e. Swedish and English, can be mixed in the same document and/or sentence; Swedish is a compounding language, which significantly degrades the performance of previous approaches (without adaptations); and, most importantly, the analysed corpora exhibit a large variation of possible acronym-definition permutations, a variation that is usually ignored in previous studies. |
Keywords | acronyms and definitions, medical corpus, machine learning |
Full paper | Recognizing Acronyms and their Definitions in Swedish Medical Texts |