Title |
Using the Text Corpus to Create a Comprehensive List of Phrasal Verbs |
Authors |
Heiki-Jaan Kaalep (University of Tartu, Tiigi 78-203, 50090 Tartu, Estonia); Kadri Muischnek (University of Tartu, Tiigi 78-203, 50090 Tartu, Estonia) |
Session |
WO2: Acquisition Of Lexical Information |
Abstract |
The paper describes the extraction of Estonian multi-word verbs (MWVs) from text corpora using SENVA, a language- and task-specific software tool based on SENTA, a statistical, language-independent tool (Dias et al., 2000). The outcome is a comprehensive list of 16,000 phrasal verbs. We describe the extraction tool and the principles of manual post-editing. We then evaluate the outcome in terms of precision and recall by comparing the results with human-compiled electronic dictionaries and with the results of a manual extraction experiment on a subset of the MWVs. |
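The evaluation mentioned above compares an extracted MWV list with a reference list, such as a dictionary or a manually extracted subset. The snippet below is a minimal sketch of the standard set-based precision and recall computation. It assumes both lists have already been normalized to the same form, for example lemma strings. The function name `precision_recall` and the sample verbs are illustrative only, and the paper's actual matching criteria may differ.

```python
def precision_recall(extracted, reference):
    """Set-based precision and recall of an extracted MWV list.

    Assumes both inputs are iterables of already-normalized MWV strings;
    duplicates are ignored. Returns (precision, recall), with 0.0 for
    an empty denominator.
    """
    extracted = set(extracted)
    reference = set(reference)
    # MWVs found both by the extractor and in the reference list
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy data for illustration only (not figures from the paper)
extracted = ["aru saama", "läbi saama", "ära minema", "ette võtma"]
reference = ["aru saama", "läbi saama", "ära tegema"]
p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

With a dictionary as the reference, an extracted item missing from the dictionary is not necessarily an error, because dictionaries are rarely exhaustive. For that reason, precision measured this way is best read as a lower bound.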
Keywords |
Text corpus |