LREC 2000 2nd International Conference on Language Resources & Evaluation
 


Title ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
Authors Bird Steven (LDC, 3615 Market Street, Suite 200, Philadelphia, PA, 19104-2608, USA, sb@unagi.cis.upenn.edu)
Day David (The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA, http://www.mitre.org/technology/nlp, day@mitre.org)
Garofolo John (National Institute of Standards and Technology, 100 Bureau Drive, Mailstop 8940, Gaithersburg, MD 20899-8940, USA)
Henderson John (The MITRE Corporation, 202 Burlington Road, Bedford, MA 01730, USA, http://www.mitre.org/technology/nlp, jhndrsn@mitre.org)
Laprun Christophe (National Institute of Standards and Technology, 100 Bureau Drive, Mailstop 8940, Gaithersburg, MD 20899-8940, USA)
Liberman Mark (Linguistic Data Consortium, University of Pennsylvania, Philadelphia, Pennsylvania, USA, myl@ldc.upenn.edu)
Keywords  
Session Session SP5 - Multimodal - Multimedia Resources and Tools
Full Paper 184.ps, 184.pdf
Abstract We describe a formal model for annotating linguistic artifacts, from which we derive an application programming interface (API) to a suite of tools for manipulating these annotations. The abstract logical model provides for a range of storage formats and promotes the reuse of tools that interact through this API. We focus first on “Annotation Graphs,” a graph model for annotations on linear signals (such as text and speech) indexed by intervals, for which efficient database storage and querying techniques are applicable. We note how a wide range of existing annotated corpora can be mapped to this annotation graph model. This model is then generalized to encompass a wider variety of linguistic “signals,” including both naturally occurring phenomena (as recorded in images, video, multi-modal interactions, etc.) and the derived resources that are increasingly important to the engineering of natural language processing systems (such as word lists, dictionaries, aligned bilingual corpora, etc.). We conclude with a review of the current efforts towards implementing key pieces of this architecture.
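
As a rough illustration of the interval-indexed graph model the abstract describes, the following Python sketch represents annotations as labeled arcs between nodes that may carry offsets into a linear signal. It is not the ATLAS API: the names Node, Arc, AnnotationGraph, add, and overlapping are hypothetical, and the use of floating-point offsets in seconds is an assumption made only for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """A point in the graph, optionally anchored to an offset in the signal."""
    id: str
    offset: Optional[float] = None  # assumed unit: seconds; None = unanchored


@dataclass(frozen=True)
class Arc:
    """An annotation spanning the interval between two nodes."""
    start: Node
    end: Node
    type: str   # e.g. "word" or "speaker" (illustrative type names)
    label: str  # the annotation content


@dataclass
class AnnotationGraph:
    """Hypothetical container; a plain list stands in for real storage."""
    arcs: List[Arc] = field(default_factory=list)

    def add(self, start: Node, end: Node, type: str, label: str) -> Arc:
        arc = Arc(start, end, type, label)
        self.arcs.append(arc)
        return arc

    def overlapping(self, begin: float, end: float,
                    type: Optional[str] = None) -> List[Arc]:
        """Return anchored arcs whose interval overlaps [begin, end)."""
        hits = []
        for arc in self.arcs:
            s, e = arc.start.offset, arc.end.offset
            if s is None or e is None:
                continue  # arcs with an unanchored endpoint are skipped here
            if s < end and e > begin and (type is None or arc.type == type):
                hits.append(arc)
        return hits


# Two word annotations and one speaker turn over the same stretch of speech.
n0, n1, n2 = Node("n0", 0.0), Node("n1", 0.32), Node("n2", 0.75)
graph = AnnotationGraph()
graph.add(n0, n1, "word", "hello")
graph.add(n1, n2, "word", "world")
graph.add(n0, n2, "speaker", "A")

words = graph.overlapping(0.4, 0.5, type="word")
everything = graph.overlapping(0.4, 0.5)
```

Here the interval query is a linear scan over a list. The abstract's point that database storage and querying techniques apply suggests a real implementation would index arcs by their offsets instead.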