W17 2018 Proceedings

Summary of the paper

Title	Distributed Corpus Search
Authors	Radoslav Rábara, Pavel Rychlý and Ondřej Herman
Abstract	Every year, the amount of text produced and stored increases, and so do the requirements to process and search it. Some of the largest corpora we build for use in Sketch Engine now take months to compile, and searching them leaves a lot to be desired even on today's state-of-the-art machines, mainly due to a lack of parallelization and to storage bottlenecks. We describe our experiments in distributing the processing of corpus operations over a cluster of commodity computers.
Topics	Corpus
Full paper	Distributed Corpus Search
Bibtex	@InProceedings{RÁBARA18.15, author = {Radoslav Rábara and Pavel Rychlý and Ondřej Herman}, title = {Distributed Corpus Search}, booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)}, year = {2018}, month = {may}, date = {7-12}, location = {Miyazaki, Japan}, editor = {Piotr Banski and Marc Kupietz and Adrien Barbaresi and Hanno Biber and Evelyn Breiteneder and Simon Clematide and Andreas Witt}, publisher = {European Language Resources Association (ELRA)}, address = {Paris, France}, isbn = {979-10-95546-14-6}, language = {english} }