An important kind of discourse annotation is relational annotation, in which texts are analyzed with respect to the coherence relations (relations between text components, such as Cause or Evidence) present in them. Relational annotation according to Rhetorical Structure Theory (Mann and Thompson, 1988) typically begins with segmenting a text into minimal discourse units, which are then linked with each other (and later recursively with larger units) by certain coherence relations. As part of an ongoing corpus development project called the Bangla RST Discourse Treebank (Das and Stede, to appear), we have considered, examined, and implemented a number of segmentation principles and strategies for dividing Bangla texts into minimal discourse units for the purpose of relational annotation. In this paper, we provide an overview of our annotation tasks and describe our segmentation guidelines. We also present a few problems we encountered in segmenting Bangla texts and discuss how we addressed them.
@InProceedings{DAS18.9,
  author    = {Debopam Das},
  title     = {Discourse Segmentation in Bangla},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Girish Nath Jha and Kalika Bali and Sobha L and Atul Kr. Ojha},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-09-2},
  language  = {english}
}