Title
Extraction of German Multiword Expressions from Parsed Corpora Using Context Features
Authors
Marion Weller and Ulrich Heid
Abstract
We report on tools for extracting German multiword expressions (MWEs) from text corpora. We extract word pairs, but also longer MWEs of different patterns, e.g. verb-noun structures with an additional prepositional phrase or adjective. In addition to standard association-based extraction, we focus on morpho-syntactic, syntactic and lexical-choice features of the MWE candidates. A broad range of such properties (e.g. number and definiteness of nouns, adjacency of the MWE components and their position in the sentence, preferred lexical modifiers), along with relevant example sentences, is extracted from dependency-parsed text and stored in a database. A sample precision evaluation and an analysis of extraction errors are provided along with a discussion of our extraction architecture. We furthermore measure the contribution of the features to extraction precision: using both morpho-syntactic and syntactic features yields higher precision in the identification of idiomatic MWEs than using properties of only one type.
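The abstract combines association-based ranking with morpho-syntactic preferences of the candidates, such as the number and definiteness of the noun. The following is a minimal Python sketch of that combination and is not the authors' implementation. It assumes that verb-object pairs have already been read off the dependency parses into a hypothetical `Occurrence` record. It uses Dunning's log-likelihood ratio (G2) as one common association measure, since the abstract does not name the measure. It also measures the "fixedness" of a feature as the share of its most frequent value, which is an illustrative choice.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Occurrence:
    """One verb-object pair read off a dependency parse (hypothetical format)."""
    verb: str        # verb lemma
    noun: str        # object noun lemma
    number: str      # e.g. "Sg" or "Pl"
    determiner: str  # e.g. "def", "indef", "none"


def log_likelihood(o11, r1, c1, n):
    """Dunning's G2 for a 2x2 table: joint freq o11, verb freq r1, noun freq c1, sample size n."""
    observed = [o11, r1 - o11, c1 - o11, n - r1 - c1 + o11]
    expected = [r1 * c1 / n, r1 * (n - c1) / n,
                (n - r1) * c1 / n, (n - r1) * (n - c1) / n]
    # Cells with observed count 0 contribute 0 to the sum.
    return 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def preference(counter):
    """Share of the most frequent feature value; 1.0 means the feature is fully fixed."""
    total = sum(counter.values())
    return max(counter.values()) / total if total else 0.0


def rank_candidates(occurrences, min_freq=2):
    """Rank verb-noun pairs by G2 and report number/determiner fixedness per pair."""
    pair_freq = Counter((o.verb, o.noun) for o in occurrences)
    verb_freq = Counter(o.verb for o in occurrences)
    noun_freq = Counter(o.noun for o in occurrences)
    number = defaultdict(Counter)
    det = defaultdict(Counter)
    for o in occurrences:
        number[(o.verb, o.noun)][o.number] += 1
        det[(o.verb, o.noun)][o.determiner] += 1
    n = len(occurrences)
    rows = []
    for pair, f in pair_freq.items():
        if f < min_freq:
            continue
        rows.append({
            "pair": pair,
            "freq": f,
            "g2": log_likelihood(f, verb_freq[pair[0]], noun_freq[pair[1]], n),
            "number_pref": preference(number[pair]),
            "det_pref": preference(det[pair]),
        })
    rows.sort(key=lambda r: r["g2"], reverse=True)
    return rows


# Toy usage with invented counts (not data from the paper).
data = (
    [Occurrence("spielen", "Rolle", "Sg", "indef")] * 4
    + [Occurrence("stellen", "Frage", "Sg", "indef")] * 3
    + [Occurrence("lesen", "Buch", "Sg", "def"), Occurrence("lesen", "Buch", "Pl", "none")]
)
for row in rank_candidates(data):
    print(row["pair"], row["freq"], round(row["g2"], 2), row["number_pref"], row["det_pref"])
```

In this toy data, "eine Rolle spielen" shows a fully fixed number and determiner (both preferences 1.0). The compositional "Buch lesen" varies on both features (0.5 each). Keeping the fixedness features as separate columns, rather than folding them into the association score, makes it possible to test the contribution of each feature type on its own.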
Topics
MultiWord Expressions & Collocations, Lexicon, lexical database, Parsing
BibTeX
@InProceedings{WELLER10.428,
  author    = {Marion Weller and Ulrich Heid},
  title     = {Extraction of German Multiword Expressions from Parsed Corpora Using Context Features},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}