LREC 2016 Proceedings

Summary of the paper

Title	TweetMT: A Parallel Microblog Corpus
Authors	Iñaki San Vicente, Iñaki Alegria, Cristina España-Bonet, Pablo Gamallo, Hugo Gonçalo Oliveira, Eva Martinez Garcia, Antonio Toral, Arkaitz Zubiaga and Nora Aranberri
Abstract	We introduce TweetMT, a parallel corpus of tweets in four language pairs that combine five languages (Spanish from/to Basque, Catalan, Galician and Portuguese), all of which have official status in the Iberian Peninsula. The corpus was created by combining automatic collection with crowdsourcing, and it is publicly available. It is intended for the development and testing of microtext machine translation systems. In this paper we describe the methodology followed to build the corpus and present the results of the shared task in which it was tested.
Topics	Machine Translation, Speech-to-Speech Translation, Corpus (Creation, Annotation, etc.), Social Media Processing
Full paper	TweetMT: A Parallel Microblog Corpus
Bibtex	@InProceedings{SANVICENTE16.465,
  author = {Iñaki San Vicente and Iñaki Alegria and Cristina España-Bonet and Pablo Gamallo and Hugo Gonçalo Oliveira and Eva Martinez Garcia and Antonio Toral and Arkaitz Zubiaga and Nora Aranberri},
  title = {TweetMT: A Parallel Microblog Corpus},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}