Title |
QurAna: Corpus of the Quran annotated with Pronominal Anaphora |
Authors |
Abdul-Baquee Sharaf and Eric Atwell |
Abstract |
This paper presents QurAna: a large corpus created from the original Quranic text, where personal pronouns are tagged with their antecedents. These antecedents are maintained as an ontological list of concepts, which has proved helpful for information retrieval tasks. QurAna is characterized by: (a) a comparatively large number of pronouns tagged with antecedent information (over 24,500 pronouns), and (b) the maintenance of an ontological concept list built from these antecedents. We have shown useful applications of this corpus. This corpus is the first of its kind for Classical Arabic text, and it could also support applications for Modern Standard Arabic. It would benefit researchers in obtaining empirical rules for building new anaphora resolution approaches. Such a corpus could also be used to train, optimize, and evaluate existing approaches. |
Topics |
Corpus (creation, annotation, etc.), Anaphora, Coreference, Text mining |
Full paper |
QurAna: Corpus of the Quran annotated with Pronominal Anaphora |
Bibtex |
@InProceedings{SHARAF12.123,
  author = {Abdul-Baquee Sharaf and Eric Atwell},
  title = {QurAna: Corpus of the Quran annotated with Pronominal Anaphora},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |