Title |
HunOr: A Hungarian―Russian Parallel Corpus |
Authors |
Martina Katalin Szabó, Veronika Vincze and István Nagy T. |
Abstract |
In this paper, we present HunOr, the first multi-domain Hungarian―Russian parallel corpus. Some of the corpus texts have been manually aligned and split into sentences, and named entities have also been annotated in them; the remaining parts are automatically aligned at the sentence level and POS-tagged. The corpus contains texts from the domains of literature, official language use and science, and we would also like to add texts from the news domain. In the future, we plan to carry out a syntactic annotation of the HunOr corpus, which will further enhance its usability in various NLP fields such as transfer-based machine translation and cross-lingual information retrieval. |
Topics |
Corpus (creation, annotation, etc.), Multilinguality, Named Entity recognition |
Full paper |
HunOr: A Hungarian―Russian Parallel Corpus |
Bibtex |
@InProceedings{SZAB12.262,
  author = {Martina Katalin Szabó and Veronika Vincze and István Nagy T.},
  title = {HunOr: A Hungarian―Russian Parallel Corpus},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |