Papers |
pages |
Current Challenges in Web Corpus Building
Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý and Vít Suchomel |
pp. 1‑4 |
Out-of-the-Box and into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools
Adrien Barbaresi and Gaël Lejeune |
pp. 5‑13 |
From Web Crawl to Clean Register-Annotated Corpora
Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi and Sampo Pyysalo |
pp. 14‑22 |
Building Web Corpora for Minority Languages
Heidi Jauhiainen, Tommi Jauhiainen and Krister Lindén |
pp. 23‑32 |
The ELTE.DH Pilot Corpus – Creating a Handcrafted Gigaword Web Corpus with Metadata
Balázs Indig, Árpád Knap, Zsófia Sárközi-Lindner, Mária Timári and Gábor Palkó |
pp. 33‑41 |
Hypernym-LIBre: A Free Web-based Corpus for Hypernym Detection
Shaurya Rawat, Mariano Rico and Oscar Corcho |
pp. 42‑49 |
A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging
Shabnam Behzad and Amir Zeldes |
pp. 50‑56 |
Streaming Language-Specific Twitter Data with Optimal Keywords
Tim Kreutz and Walter Daelemans |
pp. 57‑64 |