Title |
TimeBankPT: A TimeML Annotated Corpus of Portuguese |
Authors |
Francisco Costa and António Branco |
Abstract |
In this paper, we introduce TimeBankPT, a TimeML annotated corpus of Portuguese. It has been produced by adapting an existing resource for English, namely the data used in the first TempEval challenge. TimeBankPT is the first corpus of Portuguese with rich temporal annotations (i.e. it includes annotations not only of temporal expressions but also about events and temporal relations). In addition, it was subjected to an automated error mining procedure that checks the consistency of the annotated temporal relations based on their logical properties. This procedure allowed for the detection of some errors in the annotations, that also affect the original English corpus. The Portuguese language is currently undergoing a spelling reform, and several countries where Portuguese is official are in a transitional period where old and new orthographies are valid. TimeBankPT adopts the recent spelling reform. This decision is to preserve its future usefulness. TimeBankPT is freely available for download. |
Topics |
Corpus (creation, annotation, etc.), Statistical and machine learning methods, Validation of LRs |
Full paper |
TimeBankPT: A TimeML Annotated Corpus of Portuguese |
Bibtex |
@InProceedings{COSTA12.246,
author = {Francisco Costa and António Branco}, title = {TimeBankPT: A TimeML Annotated Corpus of Portuguese}, booktitle = {Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12)}, year = {2012}, month = {may}, date = {23-25}, address = {Istanbul, Turkey}, editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis}, publisher = {European Language Resources Association (ELRA)}, isbn = {978-2-9517408-7-7}, language = {english} } |