Title
Experimental Deployment of a Grid Virtual Organization for Human Language Technologies
Authors
Jan Jona Javoršek and Tomaž Erjavec
Abstract
We propose to create a grid virtual organization for human language technologies, initially with the chief task of enabling linguistic researchers to use the existing distributed computing facilities of the European grid infrastructure for more efficient processing of large data sets. After a brief overview of modern grid computing, we present a number of common use cases of natural language processing tasks running on the grid, notably corpus annotation with morpho-syntactic tagging (a corpus of over 600 million words annotated in less than a day), n-gram statistics processing of a corpus, and the creation of grid-backed, web-accessible services, with annotation and term extraction as examples. Implementation considerations and common problems of using the grid for this type of task are laid out. We conclude with an outline of a simple action plan for evolving the infrastructure created for these experiments into a fully functional Human Language Technology grid Virtual Organization, with the goal of making the power of the European grid infrastructure available to the linguistic community.
Topics
Tools, systems, applications; Corpus (creation, annotation, etc.); LR Infrastructures and Architectures
Bibtex
@InProceedings{JAVOREK10.899,
  author    = {Jan Jona Javoršek and Tomaž Erjavec},
  title     = {Experimental Deployment of a Grid Virtual Organization for Human Language Technologies},
  booktitle = {Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)},
  year      = {2010},
  month     = {may},
  date      = {19-21},
  address   = {Valletta, Malta},
  editor    = {Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis and Mike Rosner and Daniel Tapias},
  publisher = {European Language Resources Association (ELRA)},
  isbn      = {2-9517408-6-7},
  language  = {english}
}