This paper describes a fundamental re-design and extension of the existing general multi-layer corpus search tool ANNIS that simplifies its re-use in other tools. The resulting embeddable corpus search library is called graphANNIS and uses annotation graphs as its internal data model. It has a modular design in which each graph component is implemented by a so-called graph storage that supports efficient reachability queries on that component. We show that using different implementations for different types of graphs is much more efficient than relying on a single strategy. Our approach combines the interoperable data model of a directed graph with adaptable and efficient implementations. We argue that graphANNIS can be a valuable building block for applications that need to embed search functionality for linguistically annotated corpora, for example annotation editors that need a search component to support agile corpus creation. The adaptability of graphANNIS, and its ability to support new kinds of annotation structures efficiently, could make such re-use easier to achieve.
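The modular design described in the abstract can be illustrated with a minimal sketch. It is not the graphANNIS API: the names `GraphStorage`, `AdjacencyListStorage`, `LinearChainStorage` and `is_reachable` are hypothetical and chosen only for illustration. The sketch assumes that one graph component is an arbitrary directed graph (answered by breadth-first search) and that another consists of disjoint chains, such as token order, where reachability reduces to comparing positions.

```python
from abc import ABC, abstractmethod
from collections import deque


class GraphStorage(ABC):
    """Hypothetical interface: one storage object per graph component."""

    @abstractmethod
    def is_reachable(self, source, target):
        """Return True if a non-empty path leads from source to target."""


class AdjacencyListStorage(GraphStorage):
    """General-purpose storage for any directed graph; answers by BFS."""

    def __init__(self, edges):
        self._out = {}
        for src, dst in edges:
            self._out.setdefault(src, []).append(dst)

    def is_reachable(self, source, target):
        seen = set()
        queue = deque(self._out.get(source, []))
        while queue:
            node = queue.popleft()
            if node == target:
                return True
            if node not in seen:
                seen.add(node)
                queue.extend(self._out.get(node, []))
        return False


class LinearChainStorage(GraphStorage):
    """Specialised storage for components made of disjoint chains."""

    def __init__(self, chains):
        # Map each node to (chain id, position within that chain).
        self._pos = {}
        for chain_id, chain in enumerate(chains):
            for index, node in enumerate(chain):
                self._pos[node] = (chain_id, index)

    def is_reachable(self, source, target):
        if source not in self._pos or target not in self._pos:
            return False
        src_chain, src_idx = self._pos[source]
        dst_chain, dst_idx = self._pos[target]
        # Constant-time check: same chain, and target comes later.
        return src_chain == dst_chain and src_idx < dst_idx


# Illustrative data only: a token chain and a small tree over it.
ordering = LinearChainStorage([[1, 2, 3, 4]])
dominance = AdjacencyListStorage([(10, 1), (10, 11), (11, 2), (11, 3)])
```

Both classes answer the same query through the same interface, so a caller can pick the storage that fits the shape of each component. The chain storage answers in constant time where the generic storage has to traverse edges, which is the kind of difference the abstract refers to when comparing specialised implementations with a single strategy.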
@InProceedings{KRAUSE18.12,
  author    = {Thomas Krause and Ulf Leser and Anke Lüdeling and Stephan Druskat},
  title     = {Designing a Re-Usable and Embeddable Corpus Search Library},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year      = {2018},
  month     = {may},
  date      = {7-12},
  location  = {Miyazaki, Japan},
  editor    = {Piotr Banski and Marc Kupietz and Adrien Barbaresi and Hanno Biber and Evelyn Breiteneder and Simon Clematide and Andreas Witt},
  publisher = {European Language Resources Association (ELRA)},
  address   = {Paris, France},
  isbn      = {979-10-95546-14-6},
  language  = {english}
}