Title |
Bootstrapping Language Neutral Term Extraction |
Authors |
Wauter Bosma and Piek Vossen |
Abstract |
A variety of methods exist for extracting terms and relations between terms from a corpus, each of them having strengths and weaknesses. Rather than just using the joint results, we apply different extraction methods in a way that the results of one method are input to another. This gives us the leverage to find terms and relations that otherwise would not be found. Our goal is to create a semantic model of a domain. To that end, we aim to find the complete terminology of the domain, consisting of terms and relations such as hyponymy and meronymy, and connected to generic wordnets and ontologies. Terms are ranked by domain-relevance only as a final step, after terminology extraction is completed. Because term relations are a large part of the semantics of a term, we estimate the relevance from its relation to other terms, in addition to occurrence and document frequencies. In the KYOTO project, we apply language-neutral terminology extraction from a parsed corpus for seven languages. |
Topics |
Lexicon, lexical database, MultiWord Expressions & Collocations, Multilinguality |
Full paper |
Bootstrapping Language Neutral Term Extraction |
Slides |
Bootstrapping Language Neutral Term Extraction |
Bibtex |
@InProceedings{BOSMA10.902,
  author = {Wauter Bosma and Piek Vossen},
  title = {Bootstrapping Language Neutral Term Extraction},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year = {2010},
  month = {may},
  date = {19-21},
  address = {Valletta, Malta},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {2-9517408-6-7},
  language = {english}
} |