LREC 2000 - Papers

LREC 2000, 2nd International Conference on Language Resources & Evaluation

Conference Papers


Title An Open Architecture for the Construction and Administration of Corpora

Authors Orasan Constantin (School of Humanities, Languages and Social Sciences, University of Wolverhampton, Stafford Street, Wolverhampton, WV1 1SB, United Kingdom, in6093@wlv.ac.uk)
Krishnamurthy Ramesh (Computational Linguistics Group, School of Humanities, Languages and Social Sciences, University of Wolverhampton, Stafford Street, Wolverhampton, WV1 1SB, United Kingdom, R.Krishnamurthy@wlv.ac.uk)

Keywords Client-Server, Copyright, Corpora, Corpus Administration, Corpus Building, Modular Programming

Session Session WO12 - Language Resources: Infrastructural Issues

Abstract The use of language corpora for a variety of purposes has increased significantly in recent years. General corpora are now available for many languages, but research often requires more specialized corpora. The rapid development of the World Wide Web has greatly improved access to data in electronic form, but research has tended to focus on corpus annotation, rather than on corpus building tools. Therefore many researchers are building their own corpora, solving problems independently, and producing project-specific systems which cannot easily be re-used. This paper proposes an open client-server architecture which can service the basic operations needed in the construction and administration of corpora, but allows customisation by users in order to carry out project-specific tasks. The paper is based partly on recent practical experience of building a corpus of 10 million words of Written Business English from webpages, in a project which was co-funded by ELRA and the University of Wolverhampton.
