This paper briefly introduces the Language into Act Theory (L-AcT), which proposes a pragmatic framework for the corpus-based collection and analysis of spontaneous speech. The L-AcT methodology takes the utterance (i.e. the counterpart of a speech act) as the reference unit for analysis. A set of large-scale Romance corpora has been collected in accordance with the L-AcT methodology (LABLITA Corpus, C-ORAL-ROM, C-ORAL-BRASIL, Cor-DiAL). Data from these corpora can be compared across languages, since all of them were built using the same corpus design, which specifies a set of variation parameters relevant for representing spontaneous speech and, specifically, its pragmatic variation. The LABLITA-C-ORAL corpora are text/sound aligned at the utterance level. Empirical research carried out by LABLITA has verified a systematic correspondence between stretches of speech ending with a terminal prosodic break and the accomplishment of an illocutionary force, which makes it possible to identify utterances. Within utterances, a correspondence has been identified between chunks separated by non-terminal breaks and information functions. The IPIC database was created for the cross-linguistic comparison of information structure in Romance languages. With regard to the pragmatic classification of utterances, a working repertory of illocutionary types has been established, induced empirically from pragmatic and prosodic features shared across the Romance corpora.
@InProceedings{CRESTI18.3,
  author    = {Emanuela Cresti and Lorenzo Gregori and Massimo Moneglia and Alessandro Panunzi},
  title     = {The Language into Act Theory: A Pragmatic Approach to Speech in Real-Life},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Hanae Koiso and Patrizia Paggio},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-16-0},
  language  = {english}
}