LREC 2000 2nd International Conference on Language Resources & Evaluation
Conference Papers
Title | A Self-Expanding Corpus Based on Newspapers on the Web |
Authors |
Hofland Knut (HIT Centre, University of Bergen, Allegt. 27, N-5007 Bergen, Norway, email: Knut.Hofland@hit.uib.no)
Keywords | Batch Download, Corpus, Newspapers, Web, Web-Based Concordance |
Session | Session WO15 - Language Resources Projects |
Abstract | A Unix-based system is presented that automatically collects newspaper articles from the web, converts the texts, and adds them to a newspaper corpus. The corpus can be searched from a web browser. It currently contains 70 million words and grows by 4 million words each month. |
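The abstract does not describe how the conversion step works. The following is a minimal sketch under stated assumptions. It assumes the articles arrive as HTML pages, uses Python's standard `html.parser` module to strip the markup (skipping `<script>` and `<style>` contents), and stores the plain text in an in-memory dictionary keyed by a hypothetical article identifier. Skipping already-stored identifiers stands in for avoiding duplicates across repeated downloads, which is also an assumption. The names `ArticleTextExtractor`, `html_to_text`, and `add_to_corpus` are illustrative only and do not come from the described system.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so entities are decoded
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script or style element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html):
    """Convert one downloaded HTML page to space-separated plain text."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def add_to_corpus(corpus, article_id, html):
    """Store a converted article (hypothetical scheme); return words added."""
    if article_id in corpus:  # assumed: a page fetched again is not re-added
        return 0
    text = html_to_text(html)
    corpus[article_id] = text
    return len(text.split())
```

Summing the return values of `add_to_corpus` over one month's downloads would give that month's growth figure. This is the kind of number the abstract reports, although its actual counting method is not stated.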