Title |
The MASC Word Sense Corpus |
Authors |
Rebecca J. Passonneau, Collin F. Baker, Christiane Fellbaum and Nancy Ide |
Abstract |
The MASC project has produced a multi-genre corpus with multiple layers of linguistic annotation, together with a sentence corpus containing WordNet 3.1 sense tags for 1000 occurrences of each of 100 words produced by multiple annotators, accompanied by in-depth inter-annotator agreement data. Here we give an overview of the contents of MASC and then focus on the word sense sentence corpus, describing the characteristics that differentiate it from other word sense corpora and detailing the inter-annotator agreement studies that have been performed on the annotations. Finally, we discuss the potential to grow the word sense sentence corpus through crowdsourcing and the plan to enhance the content and annotations of MASC through a community-based collaborative effort. |
Topics |
Corpus (creation, annotation, etc.), Word Sense Disambiguation, Lexicon, lexical database |
Full paper |
The MASC Word Sense Corpus |
Bibtex |
@InProceedings{PASSONNEAU12.589,
  author = {Rebecca J. Passonneau and Collin F. Baker and Christiane Fellbaum and Nancy Ide},
  title = {The MASC Word Sense Corpus},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |