LREC 2016 Proceedings

Summary of the paper

Title	Compilation of an Arabic Children’s Corpus
Authors	Latifa Al-Sulaiti, Noorhan Abbas, Claire Brierley, Eric Atwell and Ayman Alghamdi
Abstract	Inspired by the Oxford Children's Corpus, we have developed a prototype corpus of Arabic texts written and/or selected for children. Our Arabic Children's Corpus of 2950 documents and nearly 2 million words has been collected manually from the web during a 3-month project. It is of high quality, and contains a range of different children's genres based on sources located, including classic tales from The Arabian Nights, and popular fictional characters such as Goha. We anticipate that the current and subsequent versions of our corpus will lead to interesting studies in text classification, language use, and ideology in children's texts.
Topics	Corpus (Creation, Annotation, etc.), Document Classification, Text categorisation, Metadata
Bibtex	@InProceedings{ALSULAITI16.142,
  author = {Latifa Al-Sulaiti and Noorhan Abbas and Claire Brierley and Eric Atwell and Ayman Alghamdi},
  title = {Compilation of an Arabic Children’s Corpus},
  booktitle = {Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016)},
  year = {2016},
  month = {may},
  date = {23-28},
  location = {Portorož, Slovenia},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Sara Goggi and Marko Grobelnik and Bente Maegaard and Joseph Mariani and Helene Mazo and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  address = {Paris, France},
  isbn = {978-2-9517408-9-1},
  language = {english}
}