Title |
Providing Internet Access to Portuguese Corpora: the AC/DC Project |
Authors |
Santos Diana (SINTEF Telecom and Informatics, Postboks 1024 Blindern, N-0314 Oslo, Norway, Diana.Santos@informatics.sintef.no); Bick Eckhard (SINTEF Telecom and Informatics, Postboks 1024 Blindern, N-0314 Oslo, Norway, lineb@hum.au.dk) |
Keywords |
Constraint Grammar, Corpora, Language Resource Creation, Parsing, Web Interfaces |
Session |
Session WO5 - Corpus Tools |
Full Paper |
85.ps, 85.pdf |
Abstract |
In this paper we report on the work of the project Computational Processing of Portuguese (Processamento computacional do português) on providing access to Portuguese corpora through the Internet. One of its activities, the AC/DC project (Acesso a corpora/Disponibilização de Corpora, roughly "Access and Availability of Corpora"), allows a user to query around 40 million words of Portuguese text. After describing the aims of the service, which is still undergoing regular improvements, we focus on the process of tagging and parsing the underlying corpora using a Constraint Grammar parser for Portuguese. |