LREC 2006 - Proceedings sorted by papers

Title	Multilevel corpus analysis: generating and querying an AGset of spoken Italian (SpIt-MDb).
Authors	R. Savy, F. Cutugno, C. Crocco
Abstract	In this paper we present an application of AGTK to a corpus of spoken Italian annotated at many different linguistic levels. The work consists of two parts: a) the presentation of AG-SpIt, a toolkit for corpus data management that we developed according to the AGTK proposals; b) the presentation of the corpus structure, together with some examples and results of cross-level linguistic analyses obtained by querying the database (SpIt-MDb). As this work is still ongoing, the results must be considered preliminary: they serve as a ‘demo’ illustrating the potential of the tool and the advantages it offers for validating linguistic theories and annotation systems. SpIt-MDb is currently a linguistic resource under development; it represents one of the first attempts to create an Italian corpus labelled at various linguistic levels (from acoustic/sub-phonetic to textual/pragmatic) in which the interrelations among levels can be queried.
Keywords	Spoken Italian corpus; multilevel database; cross-level queries
