Title |
Texto4Science: a Quebec French Database of Annotated Short Text Messages |
Authors |
Philippe Langlais, Patrick Drouin, Amélie Paulus, Eugénie Rompré Brodeur and Florent Cottin |
Abstract |
The Quebec French part of the international sms4science project, called texto4science, was launched in October 2009. Over a period of 10 months, we collected slightly more than 7000 SMSs that we carefully annotated. This database is now ready to be used by the community. The purpose of this article is to describe the efforts put into designing this database and to provide some data analysis of the main linguistic phenomena that we have annotated. We also report on a socio-linguistic survey we conducted within the project. |
Topics |
Corpus (creation, annotation, etc.), LR national/international projects, organizational/policy issues, Lexicon, lexical database |
Full paper |
Texto4Science: a Quebec French Database of Annotated Short Text Messages |
Bibtex |
@InProceedings{LANGLAIS12.413,
  author = {Philippe Langlais and Patrick Drouin and Amélie Paulus and Eugénie Rompré Brodeur and Florent Cottin},
  title = {Texto4Science: a Quebec French Database of Annotated Short Text Messages},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year = {2012},
  month = {may},
  date = {23-25},
  address = {Istanbul, Turkey},
  editor = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn = {978-2-9517408-7-7},
  language = {english}
} |