LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title Portuguese Corpora at CLUL
Authors Bacelar do Nascimento Maria Fernanda (Centro de Linguística da Universidade de Lisboa, Av. 5 de Outubro, Nº85, 5º-6º 1050-050 LISBOA, fbacelar.nascimento@clul.ul.pt)
Pereira Luisa (Centro de Linguística da Universidade de Lisboa, Av. 5 de Outubro, Nº85, 5º-6º 1050-050 LISBOA, luisa.alice.sp@clul.ul.pt)
Saramago João (Centro de Linguística da Universidade de Lisboa, Av. 5 de Outubro, Nº85, 5º-6º 1050-050 LISBOA, j.saramago@clul.ul.pt)
Keywords Applications, Oral Corpora, Portuguese Varieties, Tools, Written Corpora
Session Session WP7 - Corpus Projects
Full Paper 72.ps, 72.pdf
Abstract The Corpus de Referência do Português Contemporâneo (CRPC) has been under development at the Centro de Linguística da Universidade de Lisboa (CLUL) since 1988, with the aim of broadening the empirical basis for research so that concepts and hypotheses can be verified against data rather than intuition alone. This open corpus is intended as an on-line, representative sample collection of contemporary Portuguese in general usage: a large main corpus together with several specialized corpora. The CRPC currently contains around 92 million words. Following common practice in the field, the CRPC project aims to establish a linguistic database accessible to anyone interested in theoretical or practical studies or applications. The dialectal oral corpus of the Atlas Linguístico-Etnográfico de Portugal e da Galiza (ALEPG) consists of approximately 3500 hours of speech collected by the CLUL Dialectal Studies Research Group and recorded on analogue audio tape. This corpus contains mainly directed speech: answers to a linguistic questionnaire that is essentially lexical but also covers some phonetic and morphophonological phenomena. It also includes a substantial portion of spontaneous speech, which supports other kinds of study, such as syntactic, morphological, or phonetic ones.