Title |
Tools for Collocation Extraction: Preferences for Active vs. Passive |
Authors |
Ulrich Heid and Marion Weller |
Abstract |
We present and partially evaluate procedures for the extraction of noun+verb collocation candidates from German text corpora, along with their morphosyntactic preferences, especially for the active vs. passive voice. We start from tokenized, tagged, lemmatized and chunked text, and we use extraction patterns formulated in the CQP corpus query language. We discuss the results of a precision evaluation on administrative texts from the European Union: we find a considerable number of specialized collocations, as well as general ones and complex predicates; overall, the precision is considerably higher than that of a statistical extractor used as a baseline. |
Language |
|
Topics |
MultiWord Expressions & Collocations, Lexicon, lexical database, Acquisition, Machine Learning |
Bibtex |
@InProceedings{HEID08.323,
author = {Ulrich Heid and Marion Weller},
title = {Tools for Collocation Extraction: Preferences for Active vs. Passive},
booktitle = {Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)},
year = {2008},
month = {may},
date = {28-30},
address = {Marrakech, Morocco},
editor = {{Nicoletta Calzolari (Conference Chair)} and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Daniel Tapias},
publisher = {European Language Resources Association (ELRA)},
isbn = {2-9517408-4-0},
note = {http://www.lrec-conf.org/proceedings/lrec2008/},
language = {english}
} |