Proceedings of the 12th Web as Corpus Workshop

ISBN: 979-10-95546-68-9
EAN: 9791095546689

List of Papers

Paper	Authors	Pages
Current Challenges in Web Corpus Building	Miloš Jakubíček, Vojtěch Kovář, Pavel Rychlý and Vit Suchomel	pp. 1‑4
Out-of-the-Box and into the Ditch? Multilingual Evaluation of Generic Text Extraction Tools	Adrien Barbaresi and Gaël Lejeune	pp. 5‑13
From Web Crawl to Clean Register-Annotated Corpora	Veronika Laippala, Samuel Rönnqvist, Saara Hellström, Juhani Luotolahti, Liina Repo, Anna Salmela, Valtteri Skantsi and Sampo Pyysalo	pp. 14‑22
Building Web Corpora for Minority Languages	Heidi Jauhiainen, Tommi Jauhiainen and Krister Lindén	pp. 23‑32
The ELTE.DH Pilot Corpus – Creating a Handcrafted Gigaword Web Corpus with Metadata	Balázs Indig, Árpád Knap, Zsófia Sárközi-Lindner, Mária Timári and Gábor Palkó	pp. 33‑41
Hypernym-LIBre: A Free Web-based Corpus for Hypernym Detection	Shaurya Rawat, Mariano Rico and Oscar Corcho	pp. 42‑49
A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging	Shabnam Behzad and Amir Zeldes	pp. 50‑56
Streaming Language-Specific Twitter Data with Optimal Keywords	Tim Kreutz and Walter Daelemans	pp. 57‑64