Title |
Towards a Strategy for a Representation of Collocations - Extending the Danish PAROLE-lexicon |
Authors |
Braasch Anna (Center for Sprogteknologi, Njalsgade 80, DK-2300, Denmark, e-mail: anna@cst.ku.dk); Olsen Sussi (Center for Sprogteknologi, Njalsgade 80, DK-2300, Denmark, e-mail: sussi@cst.ku.dk) |
Keywords |
Collocation, NLP-Lexicon, PAROLE, Word Combinations |
Session |
Session WP4 - Lexicon: Semantic and Multilingual Issues |
Full Paper |
47.ps, 47.pdf |
Abstract |
We describe our attempts to formulate a pragmatic definition and a partial typology of the lexical category of 'collocation', taking both lexicographical and computational aspects into consideration. This provides a suitable basis for encoding collocations in an NLP-lexicon. Further, this paper explains the principles of an operational encoding strategy which is applied to a core section of the typology, namely subtypes of verbal collocation. This strategy is adapted to a pre-defined lexicon model which has been developed in the PAROLE-project. The work is carried out within the framework of the STO-project, the aim of which is to extend the Danish PAROLE-lexicon. The encoding of collocations, in addition to single-word lemmas, greatly increases the lexical and linguistic coverage and thereby also the usability of the lexicon as a whole. Decisions concerning the selection of the most frequent types of collocation to be encoded are based on empirical data, i.e. corpus-based recognition. We present linguistic descriptions with a focus on some characteristic syntactic features of collocations that are observed in a newspaper corpus. We then give a few prototypical examples provided with formalised descriptions in order to illustrate the restriction features. Finally, we discuss the perspectives of the work done so far. |