LREC 2012 Proceedings

Summary of the paper

Title	DutchSemCor: Targeting the ideal sense-tagged corpus
Authors	Piek Vossen, Attila Görög, Rubén Izquierdo and Antal Van den Bosch
Abstract	Word Sense Disambiguation (WSD) systems require large sense-tagged corpora along with lexical databases to reach satisfactory results. The number of English language resources for developed WSD increased in the past years while most other languages are still under-resourced. The situation is no different for Dutch. In order to overcome this data bottleneck, the DutchSemCor project will deliver a Dutch corpus that is sense-tagged with senses from the Cornetto lexical database. In this paper, we discuss the different conflicting requirements for a sense-tagged corpus and our strategies to fulfill them. We report on a first series of experiments to support our semi-automatic approach to build the corpus.
Topics	Word Sense Disambiguation, Corpus (creation, annotation, etc.), Statistical and machine learning methods
Full paper	DutchSemCor: Targeting the ideal sense-tagged corpus
Bibtex	@InProceedings{VOSSEN12.187, author = {Piek Vossen and Attila Görög and Rubén Izquierdo and Antal Van den Bosch}, title = {DutchSemCor: Targeting the ideal sense-tagged corpus}, booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} }