Title |
A Self-Expanding Corpus Based on Newspapers on the Web |
Authors |
Hofland Knut (HIT Centre, University of Bergen, Allegt. 27, N-5007 Bergen, Norway, email: Knut.Hofland@hit.uib.no) |
Keywords |
Batch Download, Corpus, Newspapers, Web, Web-Based Concordance |
Session |
Session WO15 - Language Resources Projects |
Full Paper |
362.ps, 362.pdf |
Abstract |
A Unix-based system is presented which automatically collects newspaper articles from the web, converts the texts, and adds them to a newspaper corpus. The corpus can be searched from a web browser. It currently contains 70 million words and grows by 4 million words each month. |