Title |
Computer Assisted Semantic Annotation in the DutchSemCor Project |
Authors |
Attila Görög and Piek Vossen |
Abstract |
The goal of this paper is to describe the annotation protocols and the Semantic Annotation Tool (SAT) used in the DutchSemCor project. The DutchSemCor project is aiming at aligning the Cornetto lexical database with the Dutch language corpus SoNaR. 250K corpus occurrences of the 3,000 most frequent and most ambiguous Dutch nouns, adjectives and verbs are being annotated manually using the SAT. This data is then used for bootstrapping 750K extra occurrences which in turn will be checked manually. Our main focus in this paper is the methodology applied in the project to attain the envisaged Inter-annotator Agreement (IA) of =80%. We will also discuss one of the main objectives of DutchSemCor i.e. to provide semantically annotated language data with high scores for quantity, quality and diversity. Sample data with high scores for these three features can yield better results for co-training WSD systems. Finally, we will take a brief look at our annotation tool. |
Topics |
Corpus (creation, annotation, etc.), Word Sense Disambiguation, Tools, systems, applications |
Full paper |
Computer Assisted Semantic Annotation in the DutchSemCor Project |
Slides |
- |
Bibtex |
@InProceedings{GRG10.269,
author = {Attila Görög and Piek Vossen}, title = {Computer Assisted Semantic Annotation in the DutchSemCor Project}, booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)}, year = {2010}, month = {may}, date = {19-21}, address = {Valletta, Malta}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias}, publisher = {European Language Resources Association (ELRA)}, isbn = {2-9517408-6-7}, language = {english} } |