LREC 2014 Proceedings

Summary of the paper

Title	Building The Sense-Tagged Multilingual Parallel Corpus
Authors	Shan Wang and Francis Bond
Abstract	Sense-annotated parallel corpora play a crucial role in natural language processing. This paper introduces our progress in creating such a corpus for Asian languages using English as a pivot, which is the first such corpus for these languages. Two sets of tools have been developed for sequential and targeted tagging, which are also easy to set up for any new language in addition to those we are annotating. This paper also briefly presents the general guidelines for doing this project. The current results of monolingual sense-tagging and multilingual linking are illustrated, which indicate the differences among genres and language pairs. All the tools, guidelines and the manually annotated corpus will be freely available at compling.ntu.edu.sg/ntumc.
Topics	Multilinguality, Linked Data
Full paper	Building The Sense-Tagged Multilingual Parallel Corpus
Bibtex	@InProceedings{WANG14.916,
  author = {Shan Wang and Francis Bond},
  title = {Building The Sense-Tagged Multilingual Parallel Corpus},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  year = {2014},
  month = {may},
  date = {26-31},
  address = {Reykjavik, Iceland},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Hrafn Loftsson and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-8-4},
  language = {english}
}