Title |
Syntactic annotation of spontaneous speech: application to call-center conversation data |
Authors |
Thierry Bazillon, Melanie Deplano, Frederic Bechet, Alexis Nasr and Benoit Favre |
Abstract |
This paper describes the syntactic annotation process of the DECODA corpus. This corpus contains manual transcriptions of spoken conversations recorded in the French call-center of the Paris Public Transport Authority (RATP). Three levels of syntactic annotation have been performed with a semi-supervised approach: POS tags, syntactic chunks and dependency parses. The main idea is to use off-the-shelf NLP tools and models, originally developed and trained on written text, to perform a first automatic annotation of the manually transcribed corpus. At the same time, a fully manual annotation process is performed on a subset of the original corpus, called the GOLD corpus. An iterative process is then applied: errors found in the automatic annotations are corrected manually, the linguistic models of the NLP tools are retrained on this corrected corpus, and the quality of the adapted models is checked against the fully manual annotations of the GOLD corpus. This process iterates until a certain error rate is reached. This paper describes this process and the main issues arising when adapting NLP tools to process speech transcriptions, and presents the first evaluations performed with these newly adapted tools. |
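The iterative adaptation loop described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the names `adapt`, `error_rate`, `correct` and `retrain` are hypothetical, annotations are simplified to one label per token (as for POS tags), and the `max_rounds` cap is an added assumption so that the loop always terminates.

```python
from typing import Callable, List, Sequence, Tuple

Sentence = List[str]      # tokens of one transcribed utterance
Annotation = List[str]    # one label per token (simplification: POS-like tags)
Model = Callable[[Sentence], Annotation]
Annotated = Tuple[Sentence, Annotation]


def error_rate(model: Model, gold: Sequence[Annotated]) -> float:
    """Fraction of GOLD tokens whose predicted label differs from the manual one."""
    total = wrong = 0
    for tokens, reference in gold:
        predicted = model(tokens)
        total += len(reference)
        wrong += sum(p != r for p, r in zip(predicted, reference))
    return wrong / total if total else 0.0


def adapt(
    model: Model,
    corpus: Sequence[Sentence],
    gold: Sequence[Annotated],
    correct: Callable[[Sentence, Annotation], Annotation],  # hypothetical manual correction step
    retrain: Callable[[List[Annotated]], Model],            # hypothetical retraining step
    target_error: float,
    max_rounds: int = 10,                                   # assumption: safety cap on iterations
) -> Tuple[Model, List[float]]:
    """Annotate, correct, retrain, and re-evaluate until the target error rate is reached."""
    history = [error_rate(model, gold)]
    rounds = 0
    while history[-1] > target_error and rounds < max_rounds:
        # 1. Automatic annotation with the current model.
        automatic = [(s, model(s)) for s in corpus]
        # 2. Manual correction of the automatic annotations.
        corrected = [(s, correct(s, a)) for s, a in automatic]
        # 3. Retrain on the corrected corpus.
        model = retrain(corrected)
        # 4. Check the adapted model against the GOLD annotations.
        history.append(error_rate(model, gold))
        rounds += 1
    return model, history
```

In this sketch the GOLD subset is used only for evaluation and never passed to `retrain`, which matches the abstract's description of checking adapted models against the fully manual annotations.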
Topics |
Corpus (creation, annotation, etc.), Speech resource/database |
Full paper |
Syntactic annotation of spontaneous speech: application to call-center conversation data |
Bibtex |
@InProceedings{BAZILLON12.682,
  author    = {Thierry Bazillon and Melanie Deplano and Frederic Bechet and Alexis Nasr and Benoit Favre},
  title     = {Syntactic annotation of spontaneous speech: application to call-center conversation data},
  booktitle = {Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)},
  year      = {2012},
  month     = {may},
  date      = {23-25},
  address   = {Istanbul, Turkey},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Uğur Doğan and Bente Maegaard and Joseph Mariani and Asuncion Moreno and Jan Odijk and Stelios Piperidis},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {978-2-9517408-7-7},
  language  = {english}
} |